WO2023119415A1

WO2023119415A1 - Processing device, processing method, and processing program

Info

Publication number: WO2023119415A1
Application number: PCT/JP2021/047300
Authority: WO
Inventors: 修税所; 浩士今村
Original assignee: 日本電信電話株式会社
Priority date: 2021-12-21
Filing date: 2021-12-21
Publication date: 2023-06-29

Abstract

A processing device 1 is provided with: a labeling unit 51 which labels each input data set using labeling functions with reference to function data 12 including labeling functions that label an input data set or abstain if the input data set cannot be labeled; a discriminator 52 which trains a model, enters each input data set into the trained model, and outputs the discrimination uncertainty of the model, wherein said model learns the label assigned to each input data set, also learns input data 11 with votes that includes the ones of the entered input data sets that have been labeled using a number of labeling functions that is equal to or greater than a threshold value, and outputs, from an input data set, a correct label and the confidence level of each labeling function; and an identification unit 54 which identifies, as presentation data 24, a data set for which the value of an acquisition function considering the discrimination uncertainty 22 is maximized among the data sets in input data 15 with no votes that includes the ones of the entered input data sets that have been labeled using a number of labeling functions that is less than the threshold value. A labeling function newly created for the presentation data 24 is inserted into the function data 12.

Description

Processing device, processing method and processing program

The present invention relates to a processing device, a processing method, and a processing program.

Machine learning, especially so-called supervised learning, is widespread in a wide range of fields. In supervised learning, a training data set is prepared in advance, and the discriminator learns based on the training data set. A training data set is data obtained by adding determination results such as identification, classification, regression, and identity to an input data set to be analyzed. The cost of creating a training dataset with correct answers is a problem in machine learning.

Active learning and weakly supervised learning, which add to training data sets by computer processing, have been proposed.

In active learning, existing training datasets and classifiers are used to present datasets that improve the performance of the classifier when the correct answer is known, among the input datasets without correct answers. The presented dataset is marked with the correct answer and added to the training dataset.

In weakly supervised learning, a rule based on the knowledge of the subject who gives correct answers is implemented in the system as a function, and the system gives correct answers to input data sets according to that function. The data sets with correct answers are added to the training data set.

There is an idea that the labeling process can be performed more efficiently by combining active learning and weakly supervised learning.

For example, in weakly supervised learning, there is a technique for adding rules in a manner similar to active learning (Non-Patent Document 1). Non-Patent Document 1 applies the implemented rules to a group of input data sets and extracts the input data sets for which the outputs are split with no majority or for which no rule votes. Rules that lead to the correct answer are then added for input data sets randomly selected from the extracted input data sets.

There is also a method of identifying rules that are likely to be functionalized from the form of functions predetermined by the subject and specific feature amounts, and inquiring of the subject whether or not to add those rules (Non-Patent Document 2).

However, the methods described in Non-Patent Document 1 and Non-Patent Document 2 do not consider the expected improvement in classifier performance, as active learning does. With the methods described in Non-Patent Document 1 and Non-Patent Document 2, efficient learning may be difficult to achieve; for example, it may take time to add appropriate rules.

The present invention has been made in view of the above circumstances, and an object of the present invention is to provide a technique capable of appropriately presenting an input data set to which correct answers should be given in weakly supervised learning.

The processing device according to one embodiment of the present invention includes: a labeling unit that refers to function data including labeling functions, each of which labels an input data set or abstains if labeling is not possible, and labels each input data set with the labeling functions; a discriminator that learns, from the label assigned to each input data set and from voted input data including those input data sets that were labeled by a number of labeling functions equal to or greater than a threshold, a model that outputs a correct label and the reliability of each labeling function from an input data set, inputs each input data set to the trained model, and outputs the uncertainty of identification in the model; and a specifying unit that specifies, as presentation data, the data set that maximizes the value of an acquisition function that takes the uncertainty of identification into account, among the data sets of non-voted input data including those input data sets that were labeled by a number of labeling functions less than the threshold. A labeling function newly created for the presentation data is inserted into the function data.

In the processing method of one aspect of the present invention, a computer refers to function data including labeling functions, each of which labels an input data set or abstains if labeling is not possible, and labels each input data set with the labeling functions; the computer learns, from the label assigned to each input data set and from voted input data including those input data sets that were labeled by a number of labeling functions equal to or greater than a threshold, a model that outputs a correct label and the reliability of each labeling function from an input data set, inputs each input data set to the trained model, and outputs the uncertainty of identification in the model; and the computer specifies, as presentation data, the data set that maximizes the value of an acquisition function that takes the uncertainty of identification into account, among the data sets of non-voted input data including those input data sets that were labeled by a number of labeling functions less than the threshold. A labeling function newly created for the presentation data is inserted into the function data.

One aspect of the present invention is a processing program that causes a computer to function as the processing device.

According to the present invention, it is possible to provide a technology capable of appropriately presenting an input data set to which correct answers should be given in weakly supervised learning.

FIG. 1 is a diagram illustrating functional blocks of a processing device.
FIG. 2 is a diagram illustrating an example of the data structure of input data.
FIG. 3 is a diagram illustrating an example of the data structure of function data.
FIG. 4 is a diagram illustrating an example of the data structure of labeled input data.
FIG. 5 is a flowchart for explaining an outline of processing by the processing device.
FIG. 6 is a diagram illustrating functional blocks of a discriminator.
FIG. 7 is a diagram illustrating an example of the data structure of an identification probability vector.
FIG. 8 is a flowchart for explaining an outline of processing by a model processing unit in the discriminator.
FIG. 9 is a flowchart for explaining an outline of processing by the reliability calculation unit of the model processing unit.
FIG. 10 is a diagram for explaining the hardware configuration of a computer used in the processing device.

Hereinafter, embodiments of the present invention will be described with reference to the drawings. In the description of the drawings, the same parts are denoted by the same reference numerals, and the description thereof is omitted.

In weakly supervised learning, the processing device 1 according to the embodiment identifies an input data set for which a new labeling function is to be created, based on the output results of the existing labeling functions. The identified input data set is presented to subject E. Subject E generates a new labeling function for labeling the identified input data set. The newly generated labeling function is added to the existing labeling functions. In this way, by determining the input data set for which a new labeling function should be generated, the processing device 1 allows labeling functions to be generated efficiently.

The processing device 1 shown in FIG. 1 holds the following data: input data 11, function data 12, labeled input data 13, voted input data 14, non-voted input data 15, output data 21, identification uncertainty 22, cluster data 23, presentation data 24, and new function data 25. It also has the functions of a labeling unit 51, a discriminator 52, a clustering unit 53, a specifying unit 54, and an updating unit 55. Each item of data is stored in the memory 902 or the storage 903. Each function is implemented on the CPU 901.

The input data 11 is data to be labeled by the labeling functions. The input data 11 is a set of multiple input data sets, as shown in FIG. 2. The number of input data sets included in the input data 11 in the embodiment of the present invention is |D|.

The function data 12 is data of the labeling functions that label each input data set of the input data 11. A labeling function is a function that labels an input data set or abstains if labeling is not possible. The function data 12 is a set of labeling functions, as shown in FIG. 3. The number of labeling functions included in the function data 12 in the embodiment of the present invention is |F|.

The labeled input data 13 is data obtained by labeling each input data set of the input data 11 with the labeling functions. In the labeled input data 13, as shown in FIG. 4, a value is set for each combination of an identifier specifying an input data set and an identifier specifying a labeling function. In the example shown in FIG. 4, the first character after the letter v is the identifier of the input data set, and the second character is the identifier of the labeling function that processed that input data set.

If the label of the corresponding input data set can be determined by the corresponding labeling function, the identifier of the determined label is set as the value. If the label cannot be determined by the corresponding labeling function, a value indicating that it could not be determined is set. This value is, for example, 0, and must be a value that is not used as a label identifier.
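As an illustration of the data structure described above, the following Python sketch applies a set of labeling functions to a set of inputs and builds the table of values v. It is a minimal sketch, not the implementation of the embodiment: the two labeling functions, the label identifiers 1 and 2, and the use of short strings as input data sets are all assumptions made for the example. Only the abstain value 0 comes from the text.

```python
from typing import Callable, List

ABSTAIN = 0  # "could not be determined" value, as in the text's example

# Hypothetical labeling functions; label identifiers 1 and 2 are made up.
def lf_contains_refund(text: str) -> int:
    """Vote for label 1 if the text mentions a refund, otherwise abstain."""
    return 1 if "refund" in text.lower() else ABSTAIN

def lf_contains_thanks(text: str) -> int:
    """Vote for label 2 if the text contains a thank-you, otherwise abstain."""
    return 2 if "thank" in text.lower() else ABSTAIN

def apply_labeling_functions(
    inputs: List[str], functions: List[Callable[[str], int]]
) -> List[List[int]]:
    """Return a |D| x |F| table: entry [i][j] is the label that function j
    assigned to input i, or ABSTAIN if it could not decide."""
    return [[f(x) for f in functions] for x in inputs]

inputs = ["I want a refund", "Thank you so much", "Where is my order?"]
functions = [lf_contains_refund, lf_contains_thanks]
votes = apply_labeling_functions(inputs, functions)
```

In this toy run the third input receives no vote from either function, which is the situation the rest of the description is concerned with.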

Each input data set of the input data 11 is classified into either the voted input data 14 or the non-voted input data 15, depending on whether the number of labeling functions that labeled it, according to the processing result of the labeling unit 51, is equal to or greater than a threshold. The threshold is 1 or more.

The voted input data 14 includes those input data sets of the input data 11 that were labeled by a number of labeling functions equal to or greater than the predetermined threshold.

The non-voting input data 15 includes those input data sets of the input data 11 that were labeled by a number of labeling functions less than the predetermined threshold.
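The partition into voted and non-voting input data can be sketched as follows. This is an illustrative sketch only: the function name `split_by_votes` and the representation of the labeled input data as a list of rows are assumptions, and the abstain value 0 follows the example given in the text.

```python
from typing import List, Tuple

ABSTAIN = 0  # assumed "could not be determined" value

def split_by_votes(
    votes: List[List[int]], threshold: int = 1
) -> Tuple[List[int], List[int]]:
    """Split input indices by how many labeling functions actually voted.

    Returns (voted, non_voting): indices with at least `threshold`
    non-abstain votes, and indices with fewer.
    """
    if threshold < 1:
        raise ValueError("threshold must be 1 or more")
    voted, non_voting = [], []
    for i, row in enumerate(votes):
        n_votes = sum(1 for v in row if v != ABSTAIN)
        (voted if n_votes >= threshold else non_voting).append(i)
    return voted, non_voting

example_votes = [[1, 0], [0, 2], [0, 0], [1, 2]]
voted_idx, non_voting_idx = split_by_votes(example_votes, threshold=1)
```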

The output data 21 is the result of labeling each data set of the input data 11 with a correct label. The processing device 1 adds the new function data 25 to the function data 12 and repeats the learning by the discriminator 52 until a predetermined condition is satisfied, after which the discriminator 52 outputs the output data 21.

The identification uncertainty 22 is the identification uncertainty of the learning model in the classifier 52 . The uncertainty of identification 22 is referenced to select a data set to present to the subject E from the non-voting input data 15 .

The cluster data 23 indicates the result of dividing each input data set included in the non-voting input data 15 into a plurality of clusters. For example, the cluster data 23 associates a cluster identifier with an input data set identifier belonging to the cluster.

The presentation data 24 is data presented to the subject E when the subject E creates a new labeling function. Presentation data 24 includes one or more input data sets.

The new function data 25 is data specifying a new labeling function generated by the subject E who confirmed the presentation data 24 . New function data 25 includes one or more labeling functions.

By generating a new labeling function for the input data set included in the presentation data 24, the performance of the discriminator 52 can be improved.

The labeling unit 51 labels each input data set with the labeling functions. The labeling unit 51 stores the labeling result as the labeled input data 13. The labeling unit 51 further partitions the input data sets of the input data 11: input data sets labeled by a number of labeling functions equal to or greater than the threshold are placed in the voted input data 14, and input data sets labeled by a number of labeling functions less than the threshold are placed in the non-voting input data 15.

The discriminator 52 learns, from the labels attached to each input data set in the labeled input data 13 and from the voted input data 14, a model that outputs the correct label and the reliability of each labeling function from an input data set. The discriminator 52 inputs each input data set of the input data 11 to the trained model and outputs the identification uncertainty 22 in the model. The identification uncertainty 22 is referenced to identify the presentation data 24. The processing of the discriminator 52 will be described in detail later.

The clustering unit 53 divides each input data set of the non-voting input data 15 into a plurality of clusters. The clustering unit 53 classifies each input data set into clusters by referring to the value of real number data indicating the characteristics of each input data set.

In the embodiment of the present invention, the clustering unit 53 uses agglomerative clustering to divide each input data set of the non-voting input data 15 into a plurality of clusters. Agglomerative clustering is, for example, HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise).

In agglomerative clustering, each input data set is represented as a leaf on a tree diagram based on the distances between input data sets, and one or more leaves are organized hierarchically. A parameter indicating how many data sets are regarded as one group may be set in advance or may be adjusted automatically. Agglomerative clustering extracts optimal clusters as well as outliers that do not belong to any cluster. Because the strength of cohesion is adjusted by parameters, agglomerative clustering can extract semantically coherent clusters without the number of clusters being specified in advance and without a skewed distribution of leaves having to be taken into account.

The specifying unit 54 generates the presentation data 24 to be presented to the subject E. The specifying unit 54 specifies, as the presentation data 24, the data set that maximizes the value of the acquisition function considering the identification uncertainty 22 among the data sets of the non-voting input data 15. The acquisition function is positively correlated with the identification uncertainty 22. The subject E who sees the presentation data 24 generates a new labeling function for the presentation data 24, thereby generating the new function data 25. The specifying unit 54 can thus present the subject E with the input data set that maximizes the expected value of the performance improvement of the discriminator 52. The acquisition function is, for example, batch BALD (A. Kirsch et al., “BatchBALD: Efficient and diverse batch acquisition for deep Bayesian active learning,” NeurIPS 2019). Batch BALD makes it possible to approximate, by a greedy method, the expected value of the performance improvement of the discriminator 52, which is represented by a submodular function.

The specifying unit 54 specifies, as the presentation data 24, the input data sets belonging to the cluster that contains the input data set maximizing the value of the acquisition function considering the identification uncertainty 22. The specifying unit 54 calculates the value of the acquisition function considering the identification uncertainty 22 for each data set of the non-voting input data 15. Here, the value may be calculated for all of the data sets belonging to each cluster, or only for as many data sets as are presented to the subject E. The specifying unit 54 calculates a representative value for each cluster. The representative value is the maximum value among the input data sets belonging to that cluster. The specifying unit 54 approximates the value of the acquisition function as the amount of decrease in joint entropy for each cluster, and specifies a plurality of input data sets of the cluster to which the data set having the maximum value of the acquisition function belongs. Selecting from one cluster ensures a certain degree of closeness among the presented data sets of the non-voting input data 15, while avoiding the selection of only similar data, so that a variety of data can be shown to the subject E. As a result, the subject E can be expected to create labeling functions more efficiently.
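The cluster-level selection described above can be sketched in Python as follows. This is a simplified sketch under stated assumptions: it takes a precomputed per-data-set acquisition value and a cluster assignment as plain dictionaries, uses the maximum value in each cluster as its representative value, and returns the members of the best cluster. It does not compute batch BALD or joint entropy, and the name `select_presentation_data` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List

def select_presentation_data(
    scores: Dict[int, float], cluster_of: Dict[int, Hashable]
) -> List[int]:
    """Pick the cluster whose best member has the highest acquisition value
    and return that cluster's members, highest value first.

    scores:     data set id -> acquisition value (assumed precomputed)
    cluster_of: data set id -> cluster id
    """
    members = defaultdict(list)
    representative = {}
    for data_id, value in scores.items():
        cluster = cluster_of[data_id]
        members[cluster].append(data_id)
        # Representative value of a cluster = maximum value among its members.
        if cluster not in representative or value > representative[cluster]:
            representative[cluster] = value
    if not representative:
        return []
    best_cluster = max(representative, key=representative.get)
    return sorted(members[best_cluster], key=lambda i: scores[i], reverse=True)

example_scores = {0: 0.2, 1: 0.9, 2: 0.5, 3: 0.1}
example_clusters = {0: "a", 1: "b", 2: "b", 3: "a"}
presentation = select_presentation_data(example_scores, example_clusters)
```

Outliers that the clustering step assigns to no cluster are not handled here; how they are treated is left open in this sketch.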

Also, batch BALD is an acquisition function used in active learning when multiple data sets are presented together. Because it takes the overlap among the data sets into account, it reduces the possibility of selecting identical or strongly similar data.

The update unit 55 inserts each labeling function included in the new function data 25 into the function data 12. When the function data 12 is updated, the processing device 1 refers to the updated function data 12, repeats the learning of the discriminator 52, and presents the presentation data 24 to the subject E.

When a new labeling function is added on the basis of the presentation data 24, the processing device 1 repeats each process of the labeling unit 51, the discriminator 52, and so on, and the discriminator 52 learns the model again. Model learning by the discriminator 52 is repeated until a predetermined condition is satisfied. When the predetermined condition is satisfied, the discriminator 52 inputs each input data set to the learned model and outputs the output data 21. The predetermined condition is defined by, for example, the number of input data sets included in the presentation data 24 being 0, the number of added labeling functions, the number of times of learning, the time required for learning, or a combination of these. By generating the new function data 25 from the presentation data 24 generated by the processing device 1, the function data 12 can be increased efficiently.

The discriminator 52 also optimizes the label attached to each input data set and the reliability of each labeling function as one model. Compared with optimizing them as separate models, this takes into account the correlation between the labels attached to each input data set and the reliability of each labeling function, which makes it possible to improve the performance of the discriminator 52.

(Processing method)
A processing method according to an embodiment of the present invention will be described with reference to FIG. 5.

In step S1 , the processing device 1 labels each input data set of the input data 11 with each labeling function of the function data 12 . The processing device 1 outputs labeled input data 13 in which each input data set is associated with a label assigned by a labeling function.

In step S2, the processing device 1 divides each input data set of the input data 11 into voted input data 14 and non-voted input data 15. Voted input data 14 includes input data sets with votes from labeling functions that are equal to or greater than the threshold. Unvoted input data 15 includes input data sets with votes from less than the threshold number of labeling functions.

In step S3 , the processing device 1 performs model processing using the classifier 52 . In the model processing, the model of the discriminator 52 is learned using the labeled input data 13 and the voted input data 14 .

In step S4, it is determined whether or not it is time to output the learning result. When it is time to output the learning result, the output data 21 output from the model processing unit 71 is output in step S10. If it is not the timing to output the learning result, the process proceeds to step S5.

In step S5, the processing device 1 inputs the input data 11 to the learned model obtained in step S3, and outputs the identification uncertainty 22.

In step S6, the processing device 1 clusters the non-voting input data 15. In step S7, the processing device 1 identifies the cluster containing the data set with the largest value of the acquisition function considering the uncertainty 22 of discrimination. The processing device 1 identifies each data set belonging to the identified cluster as the presentation data 24 .

The presentation data 24 obtained in step S7 is presented to the subject E. In step S8, the processing device 1 acquires the new labeling functions generated from the presentation data 24. In step S9, the new labeling functions acquired in step S8 are added to the function data 12.

The processing device 1 returns to step S1, refers to each labeling function of the updated function data 12, assigns labels again, and updates the model.
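The loop over steps S1 to S9 can be sketched as a Python skeleton. This is only an illustrative outline, not the embodiment itself: `run_labeling_loop`, `choose_presentation`, and `ask_subject` are hypothetical names, inputs are assumed to be strings, model training and uncertainty estimation (steps S3 to S5) are omitted, and the stopping rule shown (empty presentation data, no new function, or a round limit) is just one possible choice among the conditions the description lists.

```python
from typing import Callable, List, Sequence, Tuple

ABSTAIN = 0  # assumed "could not be determined" value
LabelingFunction = Callable[[str], int]

def run_labeling_loop(
    inputs: Sequence[str],
    functions: List[LabelingFunction],
    choose_presentation: Callable[[List[int]], List[int]],
    ask_subject: Callable[[List[str]], List[LabelingFunction]],
    threshold: int = 1,
    max_rounds: int = 10,
) -> Tuple[List[LabelingFunction], List[List[int]]]:
    """Repeat labeling, selection, and function creation.

    Returns the final labeling functions and the votes from the last
    labeling pass."""
    functions = list(functions)
    votes: List[List[int]] = []
    for _ in range(max_rounds):
        # S1: label every input with every labeling function.
        votes = [[f(x) for f in functions] for x in inputs]
        # S2: collect inputs with fewer than `threshold` votes.
        non_voting = [
            i for i, row in enumerate(votes)
            if sum(v != ABSTAIN for v in row) < threshold
        ]
        # S3-S5 (model training, uncertainty output) are omitted in this sketch.
        # S6-S7: choose which non-voting inputs to present.
        presentation = choose_presentation(non_voting)
        if not presentation:
            break  # nothing left to present
        # S8: the subject creates new labeling functions for the presented inputs.
        new_functions = ask_subject([inputs[i] for i in presentation])
        if not new_functions:
            break
        # S9: add the new functions to the function data.
        functions.extend(new_functions)
    return functions, votes
```

In a fuller version, `choose_presentation` would be driven by the identification uncertainty and the clustering result rather than by the index list alone.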

(discriminator)
The discriminator 52 will be described with reference to FIG. 6. The inputs of the discriminator 52 are the labeled input data 13, the voted input data 14, and the non-voted input data 15. The outputs of the discriminator 52 are the output data 21 and the identification uncertainty 22.

The discriminator 52 has a conversion unit 70 and a model processing unit 71 .

The conversion unit 70 converts each data set of the voted input data 14 and the non-voted input data 15 into a real number data set. A real number data set is data that expresses the characteristics of the input data set. In the embodiment of the present invention, the model processing unit 71 processes real number data sets. The conversion unit 70 analyzes the meaning of the input data set with an embedding such as Sentence-BERT (BERT: Bidirectional Encoder Representations from Transformers) and generates the real number data set.

The conversion unit 70 converts each data set of the voted input data 14 into each real number data set to generate the voted real number data 60 . The voted real number data 60 is referred to during model learning in the model processing unit 71 and is referred to when the learned model outputs the output data 21 and the discrimination uncertainty 22 .

The conversion unit 70 converts each data set of the non-voting input data 15 into a real number data set to generate the non-voting real number data 61. The non-voting real number data 61 is referred to when the model learned by the model processing unit 71 outputs the output data 21 and the identification uncertainty 22.

The model processing unit 71 includes a reliability calculation unit 72 , a first vector calculation unit 73 , a second vector calculation unit 74 and a loss function calculation unit 75 .

The reliability calculation unit 72 calculates the reliability of each labeling function from the relationship between the voted real number data set, which indicates the characteristics of the input data set to be processed in the voted input data 14, and the labels attached to that input data set. The reliability of a labeling function is the average of the reliabilities calculated for that labeling function over the input data sets.

The reliability calculation unit 72 acquires the label attached to the input data set to be processed and the real number data set from the labeled input data 13 and the voted real number data 60 . The reliability calculation unit 72 calculates the reliability of each labeling function based on overlaps and contradictions that occur between labeling functions and between real number data sets. The reliability calculator 72 associates the identifier of the labeling function with the reliability of the labeling function, and outputs reliability data 62 .

The reliability calculation unit 72 includes, for example, a neural network and a processing unit that calculates the reliability of each labeling function from the output of the neural network. The neural network comprises a concatenate layer, a linear tanh layer, a cMCdropout (consistent Monte Carlo dropout) layer, and a linear softmax layer. The concatenate layer receives the input data set to be processed of the labeled input data 13 and the real number data set, corresponding to that input data set, that is output from the first cMCdropout layer of the second vector calculation unit 74 described later.

The neural network outputs the reliability of each labeling function for each input data set, regardless of whether a vote was cast. For each labeling function, the processing unit refers to the labeled input data 13, extracts the reliabilities for the input data sets on which the labeling function to be processed voted, and outputs the average of the extracted reliabilities as the reliability of the labeling function to be processed.
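The averaging performed by the processing unit can be sketched as follows. This is a minimal sketch under assumptions: the per-input reliabilities that the neural network would output are passed in as a plain table, the abstain value is 0, and a labeling function that never voted is given a reliability of 0.0, which is a choice made for this example and not something the description specifies.

```python
from typing import List

def labeling_function_reliability(
    per_input_reliability: List[List[float]],
    votes: List[List[int]],
    abstain: int = 0,
) -> List[float]:
    """Average, for each labeling function j, the reliabilities of the
    inputs on which function j actually voted.

    per_input_reliability[i][j]: reliability of function j on input i
    votes[i][j]:                 label voted by function j on input i
    """
    n_functions = len(votes[0]) if votes else 0
    result = []
    for j in range(n_functions):
        kept = [
            per_input_reliability[i][j]
            for i in range(len(votes))
            if votes[i][j] != abstain
        ]
        # Assumption: a function with no votes gets reliability 0.0.
        result.append(sum(kept) / len(kept) if kept else 0.0)
    return result

example_rel = [[0.8, 0.1], [0.6, 0.3], [0.4, 0.5]]
example_votes = [[1, 0], [1, 2], [0, 2]]
reliability = labeling_function_reliability(example_rel, example_votes)
```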

The first vector calculation unit 73 weights the labeling functions by their reliability and calculates, for the input data set to be processed, the provisional correct label 63 and a first identification probability vector 64 containing the probability that the input data set corresponds to each label.

A first identification probability vector 64 is generated for each of the input data sets and includes the probability corresponding to each label, as shown in FIG. 7. The sum of the probabilities corresponding to each label is one.

For the input data set to be processed, the first vector calculation unit 73 sets the reliability of each labeling function as the value corresponding to the labeling function that voted, converts these values into probabilities with the softmax function, and generates the first identification probability vector 64. The first vector calculation unit 73 sets, as the provisional correct label 63, the label assigned to the input data set to be processed by the labeling function with the highest reliability among the labeling functions that voted. The first vector calculation unit 73 calculates a provisional correct label 63 and a first identification probability vector 64 for each input data set of the input data 11.

For example, when three labeling functions produce the results {1, 0, 0} for an input data set, the first vector calculation unit 73 considers the reliability of each labeling function and outputs probabilities of {0.7, 0.15, 0.15} as the probabilities corresponding to each label. For another input data set, when the results produced by the three labeling functions are {0, 0, 1}, the model processing unit 71 sets {0.25, 0.25, 0.5}. A high probability is set for a result determined by a labeling function with high reliability, and a low probability is set for a result determined by a labeling function with low reliability.

Next, the case where the discrimination result is {1, 0, 3}, that is, where a contradiction occurs, will be described. The first vector calculation unit 73 outputs probabilities of {0.55, 0.1, 0.35}, for example. Even if a contradiction occurs, a high probability is set for a result determined by a labeling function with high reliability, and a low probability is set for a result determined by a labeling function with low reliability.

Next, the case where the determination result is {0, 0, 0}, that is, where every labeling function abstains because it cannot make a determination, will be described. Since the first vector calculation unit 73 has no basis for judging the probabilities corresponding to each label, it outputs probabilities of {0.33, 0.33, 0.33}, for example.
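One simple way to realize the behavior illustrated by these examples is sketched below in Python. It is an assumption-laden sketch, not the trained model of the embodiment: it adds each voting function's reliability to the score of the label it voted for, applies a softmax over the label scores, and takes the label of the most reliable voting function as the provisional correct label. Labels are assumed to be numbered from 1 with 0 meaning abstain, and the function name is hypothetical. The sketch does not reproduce the specific probabilities quoted in the text, except that an all-abstain input yields a uniform vector.

```python
import math
from typing import List, Optional, Tuple

def first_identification_vector(
    row: List[int], reliability: List[float], n_labels: int, abstain: int = 0
) -> Tuple[Optional[int], List[float]]:
    """Return (provisional correct label, probability per label).

    row[j] is the label voted by function j (1..n_labels) or `abstain`;
    reliability[j] is the reliability of function j.
    """
    scores = [0.0] * n_labels
    for vote, r in zip(row, reliability):
        if vote != abstain:
            scores[vote - 1] += r  # reliability-weighted vote
    # Numerically stable softmax over the label scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Provisional label: the vote of the most reliable function that voted.
    voted = [(r, vote) for vote, r in zip(row, reliability) if vote != abstain]
    provisional = max(voted, key=lambda t: t[0])[1] if voted else None
    return provisional, probs

label_c, probs_c = first_identification_vector([1, 0, 3], [0.9, 0.5, 0.4], 3)
label_n, probs_n = first_identification_vector([0, 0, 0], [0.9, 0.5, 0.4], 3)
```

With the contradictory votes, the label backed by the more reliable function receives the larger probability; with no votes, every label receives the same probability and no provisional label is set.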

The second vector calculation unit 74 calculates the second identification probability vector 65 from the real number data set of the input data set to be processed. During learning, the input data set to be processed is an input data set of the voted input data 14. Using a loss function calculated with the provisional correct label 63, the second vector calculation unit 74 learns a model that calculates the second identification probability vector 65 from the real number data set of the input data set to be processed so that the loss function is minimized. The second vector calculation unit 74 then inputs the real number data set of the input data set to be processed to the trained model and calculates the second identification probability vector 65 for each of the input data sets.

A second identification probability vector 65 is generated for each of the input data sets and contains the probability corresponding to each label, as shown in FIG. 7. The sum of the probabilities corresponding to each label is one.

The second vector calculation unit 74 inputs, to a neural network, the real number data set corresponding to an input data set to be processed whose number of votes is equal to or greater than the threshold, and outputs the second identification probability vector 65. The neural network comprises a cMCdropout layer, a linear ReLU layer, a cMCdropout layer, and a linear softmax layer.

The loss function calculation unit 75 calculates, for the input data set to be processed, a function that compares each of the first identification probability vector 64 and the second identification probability vector 65 with the provisional correct label 63. For the first identification probability vector 64, this function is positively correlated with the difference between the sum of the probabilities over all labels and the probability of the provisional correct label 63; for the second identification probability vector 65, it is likewise positively correlated with the difference between the sum of the probabilities over all labels and the probability of the provisional correct label 63. In the embodiment of the present invention, the sum of the probabilities is 1, so the function is, for example, a weighted average of two terms: the difference between 1 and the probability corresponding to the provisional correct label 63 in the first identification probability vector 64, and the difference between 1 and the probability corresponding to the provisional correct label 63 in the second identification probability vector 65.

The loss function calculation unit 75 calculates a loss function that is correlated with the function calculated for each input data set of the voted input data 14. The loss function calculation unit 75 calculates the sum of the functions calculated for the voted input data sets as the loss function.
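The example loss described above, a weighted average of the two gaps to 1 summed over the voted input data sets, can be written out as a short Python sketch. The function names and the default weight of 0.5 are assumptions made for illustration; the description only states that a weighted average is used, and labels are assumed to be numbered from 1.

```python
from typing import List, Sequence, Tuple

def per_input_loss(
    first_vector: Sequence[float],
    second_vector: Sequence[float],
    provisional_label: int,
    weight: float = 0.5,  # assumed weighting between the two terms
) -> float:
    """Weighted average of (1 - p) for the provisional label under the
    first and the second identification probability vectors."""
    k = provisional_label - 1  # labels are assumed to start at 1
    gap_first = 1.0 - first_vector[k]
    gap_second = 1.0 - second_vector[k]
    return weight * gap_first + (1.0 - weight) * gap_second

def total_loss(
    batch: List[Tuple[Sequence[float], Sequence[float], int]],
    weight: float = 0.5,
) -> float:
    """Sum of the per-input values over the voted input data sets."""
    return sum(per_input_loss(p1, p2, y, weight) for p1, p2, y in batch)

example_batch = [
    ([0.7, 0.15, 0.15], [0.5, 0.25, 0.25], 1),
    ([0.25, 0.25, 0.5], [0.2, 0.2, 0.6], 3),
]
loss_value = total_loss(example_batch)
```

The value is 0 only when both vectors assign probability 1 to the provisional label, which matches the stated training goal of pushing those probabilities toward 1.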

The model processing unit 71 learns the model so that the loss function calculated by the loss function calculation unit 75 is minimized. In accordance with the calculated loss function, the model processing unit 71 learns so that, for each voted input data set, the probability corresponding to the provisional correct label 63 in each of the first identification probability vector 64 and the second identification probability vector 65 approaches 1.

The model processing unit 71 inputs each input data set of the input data 11 to the trained model and outputs the discrimination uncertainty 22 of the model. The discrimination uncertainty 22 is positively correlated with the entropy of the second identification probability vector 65 calculated for each input data set of the input data 11.

For each input data set of the input data 11, the model processing unit 71 inputs the real number data set converted by the conversion unit 70 to the second vector calculation unit 74, which calculates a second identification probability vector 65 containing the probability corresponding to each label. The model processing unit 71 outputs, for example, the average of the entropies of the second identification probability vectors 65 calculated for the input data sets as the discrimination uncertainty 22.
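The average-entropy example can be sketched as follows. The natural logarithm and the small clipping constant `eps` are assumptions made for numerical convenience; the description only requires a positive correlation with entropy.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy (natural log) of each probability vector in p."""
    p = np.clip(p, eps, 1.0)                # avoid log(0)
    return -(p * np.log(p)).sum(axis=-1)

def discrimination_uncertainty(second_vectors):
    """Average entropy over the second identification probability vectors."""
    return float(entropy(second_vectors).mean())

vectors = np.array([
    [1.0, 0.0, 0.0],                        # fully confident: entropy close to 0
    [1 / 3, 1 / 3, 1 / 3],                  # maximally uncertain: entropy log(3)
])
u = discrimination_uncertainty(vectors)     # about log(3) / 2
```

A uniform vector gives the largest entropy, so input data sets on which the model cannot separate the labels raise the uncertainty.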

After repeating learning and satisfying a predetermined end condition, the model processing unit 71 outputs output data 21, which is the identification result, from the second identification probability vector calculated by the trained model. For each input data set, the model processing unit 71 sets the label with the highest probability in the second identification probability vector 65 calculated from the input data set to be processed as the correct label of the input data set to be processed. The model processing unit 71 outputs the output data 21 in which the identifier of the input data set to be processed and the correct label are associated with each other.
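The construction of the output data 21 amounts to an argmax over each second identification probability vector. In this short sketch the identifiers, label names, and the function `build_output_data` are hypothetical.

```python
import numpy as np

def build_output_data(identifiers, second_vectors, label_names):
    """Associate each identifier with the label of highest probability."""
    best = np.argmax(second_vectors, axis=1)
    return {ident: label_names[i] for ident, i in zip(identifiers, best)}

output_data = build_output_data(
    ["doc-1", "doc-2"],                     # hypothetical identifiers
    np.array([[0.1, 0.9], [0.8, 0.2]]),     # second identification probability vectors
    ["not_spam", "spam"],                   # hypothetical label names
)
```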

The model processing unit 71 adds a Monte Carlo Dropout layer to the input layer of each neural network of the reliability calculation unit 72 and the second vector calculation unit 74, and further adds Monte Carlo Dropout layers at positions other than the final layer. In a Monte Carlo Dropout layer, connections between nodes of the neural network are ignored with a certain probability. Because the model processing unit 71 includes Monte Carlo Dropout layers, it can estimate the uncertainty of discrimination as if a plurality of models with different node connections existed, not only during model training but also when producing output with the trained model. For example, if the output differs each time learning is repeated, the reliability can be judged to be low, and if the output is the same, the reliability can be judged to be high.
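The idea that varying outputs indicate low reliability can be illustrated with a toy model. This sketch assumes that the repetition takes the form of repeated stochastic forward passes through a fixed model; the single-layer model, the weights, and the agreement score are hypothetical simplifications.

```python
import numpy as np

def stochastic_predict(x, w, rate, rng):
    """One forward pass with dropout applied to the input layer."""
    mask = rng.random(x.shape) >= rate      # drop each input with probability `rate`
    logits = (x * mask / (1.0 - rate)) @ w
    e = np.exp(logits - logits.max())
    return e / e.sum()

def agreement(x, w, rate=0.2, passes=200, seed=0):
    """Fraction of passes that predict the most frequent label."""
    rng = np.random.default_rng(seed)
    votes = [int(np.argmax(stochastic_predict(x, w, rate, rng)))
             for _ in range(passes)]
    counts = np.bincount(votes, minlength=w.shape[1])
    return counts.max() / passes

w = np.array([[5.0, 0.0], [0.0, 5.0]])          # hypothetical weights
confident = agreement(np.array([1.0, 0.0]), w)  # every pass predicts label 0
ambiguous = agreement(np.array([1.0, 1.0]), w)  # prediction flips when one input is dropped
```

An agreement of 1.0 corresponds to the "same output" case (high reliability), and a lower value corresponds to outputs that differ between passes (low reliability).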

(Model processing method)
A model processing method performed by the model processing unit 71 will be described with reference to FIGS. 9 and 10. FIGS. 9 and 10 correspond to step S3 in FIG.

In step S101, the model processing unit 71 performs reliability calculation processing.

The model processing unit 71 performs the processing of step S151 for each input data set of the input data 11. In step S151, the model processing unit 71 calculates, for each input data set, the reliability of each labeling function from the relationship between the label attached to that input data set of the input data 11 and the voted real number data 60.

The model processing unit 71 performs the processing of steps S152 and S153 for each labeling function. In step S152, the model processing unit 71 identifies the input data sets for which the labeling function to be processed voted. In step S153, the model processing unit 71 calculates, as the reliability of the labeling function to be processed, the average of the reliabilities of that labeling function over the input data sets identified in step S152.
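Steps S152 and S153 can be sketched as a masked average. The `ABSTAIN` marker, the array layout (one row per input data set, one column per labeling function), and the sample values are assumptions made for illustration.

```python
import numpy as np

ABSTAIN = -1  # hypothetical marker meaning "the labeling function abstained"

def labeling_function_reliability(votes, per_set_reliability):
    """Average per-data-set reliability over the data sets each function voted on.

    votes: (n_sets, n_functions) array of labels or ABSTAIN
    per_set_reliability: (n_sets, n_functions) array from step S151
    """
    voted = votes != ABSTAIN                                   # step S152
    counts = voted.sum(axis=0)
    sums = np.where(voted, per_set_reliability, 0.0).sum(axis=0)
    # step S153: average; functions that never voted get reliability 0
    return np.divide(sums, counts,
                     out=np.zeros_like(sums, dtype=float), where=counts > 0)

votes = np.array([[0, ABSTAIN], [1, 1], [0, ABSTAIN]])
per_set = np.array([[0.9, 0.3], [0.7, 0.6], [0.8, 0.1]])
reliability = labeling_function_reliability(votes, per_set)    # about [0.8, 0.6]
```

The second labeling function voted only on the second data set, so the values 0.3 and 0.1 in its column are ignored.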

When the processing of steps S152 and S153 is completed for each labeling function, the model processing unit 71 performs the processing of steps S102 to S105 for each input data set of the voted input data 14.

In step S102, the model processing unit 71 calculates the first identification probability vector 64 from the reliability of each labeling function calculated in step S101. In step S103, the model processing unit 71 identifies, as the temporary correct label 63, the label corresponding to the highest value in the first identification probability vector 64 calculated in step S102. In step S104, the model processing unit 71 calculates the second identification probability vector 65 from the real number data of the input data set to be processed. In step S105, the model processing unit 71 identifies the function of the input data set to be processed based on the function indicating the difference between the first identification probability vector 64 and the temporary correct label 63 and the function indicating the difference between the second identification probability vector 65 and the temporary correct label 63.
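Steps S102 and S103 can be sketched as follows for a single input data set. The description states that the labeling functions are weighted by their reliability; the specific rule used here (summing the reliabilities per voted label and normalizing to 1) is an assumption, as are the `ABSTAIN` marker and the sample values.

```python
import numpy as np

ABSTAIN = -1  # hypothetical marker meaning "the labeling function abstained"

def first_identification_vector(votes, reliability, n_labels):
    """Reliability-weighted vote shares per label (assumed normalization)."""
    scores = np.zeros(n_labels)
    for label, r in zip(votes, reliability):
        if label != ABSTAIN:
            scores[label] += r              # each vote counts with its function's reliability
    total = scores.sum()
    if total == 0:
        return np.full(n_labels, 1.0 / n_labels)   # no usable votes: uniform vector
    return scores / total

votes = np.array([0, 1, 0, ABSTAIN])        # votes of four labeling functions
reliability = np.array([0.8, 0.6, 0.4, 0.9])
p1 = first_identification_vector(votes, reliability, n_labels=2)   # step S102
temporary_label = int(np.argmax(p1))                               # step S103
```

Label 0 collects 0.8 + 0.4 = 1.2 and label 1 collects 0.6, so `p1` is about [0.667, 0.333] and the temporary correct label is 0.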

When the processing from step S102 to step S105 is completed for each input data set of the voted input data 14, the process proceeds to step S106. In step S106, the model processing unit 71 identifies the loss function of the model from the function of each input data set identified in step S105. In step S107, the model processing unit 71 trains the model so as to minimize the loss function identified in step S106.

(Evaluation)
Evaluation of the processing device 1 according to the embodiment of the present invention will be described. The conversion unit 70 converts the input data set into real number data representing its characteristics using Sentence-BERT embeddings. The dropout rate of Monte Carlo dropout in the neural networks of the first vector calculation unit 73 and the second vector calculation unit 74 is 0.2 in the input layer and 0.5 in the other layers. The clustering unit 53 performs clustering using HDBSCAN.

First, an example of the presentation data 24 output by the processing device 1 according to the embodiment of the present invention is shown.

In the presentation data 24 described above, although the sentences follow various patterns, it can be seen that the input data sets share common information, such as "earn money" and "make money". Therefore, the subject E observing such presentation data 24 can efficiently create a labeling function capable of labeling the presentation data 24.

In the processing device 1 according to the embodiment of the present invention, the discriminator 52 learns both the reliability of each labeling function and the labeling of the input data sets so as to minimize a single loss function. The processing device 1 can also calculate the uncertainty of the model output using the trained model.

Also, according to the processing device 1, the expected value of the performance improvement of the discriminator 52 represented by the submodular function is approximated by the greedy method, and the input data set with the maximum expected value is presented to the subject E. As a result, it is expected that the subject E will create a labeling function more efficiently.
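The greedy approximation mentioned above can be illustrated generically. The sketch below is not the acquisition function of the embodiment: the objective `coverage` (the sum over clusters of the highest uncertainty among the chosen items, a monotone submodular function), the uncertainty values, and the cluster assignments are all hypothetical and serve only to show how a greedy method picks the item with the largest marginal gain at each step.

```python
def greedy_select(candidates, objective, budget):
    """Greedy maximization: repeatedly add the item with the largest marginal gain."""
    selected = []
    remaining = list(candidates)
    for _ in range(min(budget, len(remaining))):
        base = objective(selected)
        best = max(remaining, key=lambda c: objective(selected + [c]) - base)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical data: uncertainty and cluster of three non-voted input data sets.
uncertainty = {"a": 0.9, "b": 0.8, "c": 0.3}
cluster = {"a": 0, "b": 0, "c": 1}

def coverage(chosen):
    """Sum over clusters of the highest uncertainty among the chosen items."""
    best_per_cluster = {}
    for item in chosen:
        k = cluster[item]
        best_per_cluster[k] = max(best_per_cluster.get(k, 0.0), uncertainty[item])
    return sum(best_per_cluster.values())

picked = greedy_select(["a", "b", "c"], coverage, budget=2)
```

Here the first pick is `a` (gain 0.9). The second pick is `c` rather than `b`, because `b` lies in the same cluster as `a` and adds no further gain under this objective.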

The processing device 1 of the present embodiment described above uses, for example, a computer system including a CPU (Central Processing Unit, processor) 901, a memory 902, a storage 903 (HDD: Hard Disk Drive, SSD: Solid State Drive), a communication device 904, an input device 905, and an output device 906. In this computer system, each function of the processing device 1 is realized by the CPU 901 executing a processing program loaded on the memory 902.

It should be noted that the processing device 1 may be implemented by one computer, or may be implemented by a plurality of computers. Also, the processing device 1 may be a virtual machine implemented in a computer.

The program for the processing device 1 can be stored in a computer-readable recording medium such as an HDD, SSD, USB (Universal Serial Bus) memory, CD (Compact Disc), or DVD (Digital Versatile Disc), or can be distributed via a network.

It should be noted that the present invention is not limited to the above embodiments, and many modifications are possible within the scope of the gist.

1 processing device
11 input data
12 function data
13 labeled input data
14 voted input data
15 non-voted input data
21 output data
22 discrimination uncertainty
23 cluster data
24 presentation data
25 new function data
51 labeling unit
52 discriminator
53 clustering unit
54 identification unit
55 update unit
60 voted real number data
61 non-voted real number data
62 reliability data
63 temporary correct label
64 first identification probability vector
65 second identification probability vector
70 conversion unit
71 model processing unit
72 reliability calculation unit
73 first vector calculation unit
74 second vector calculation unit
75 loss function calculation unit
901 CPU
902 memory
903 storage
904 communication device
905 input device
906 output device
E subject

Claims

A processing device comprising:
a labeling unit that, with reference to function data including labeling functions each of which labels an input data set or abstains if labeling is not possible, labels each input data set with the labeling functions;
a discriminator that trains a model which learns the label assigned to each input data set and voted input data including, among the input data sets, those labeled by a number of labeling functions equal to or greater than a threshold, and which outputs, from an input data set, a correct label and the reliability of each labeling function, inputs each of the input data sets to the trained model, and outputs the discrimination uncertainty of the model; and
an identification unit that identifies, as presentation data, the data set for which the value of an acquisition function taking the discrimination uncertainty into account is maximized among the data sets of non-voted input data including, among the input data sets, those labeled by a number of labeling functions less than the threshold,
wherein a labeling function newly created for the presentation data is inserted into the function data.
The processing device according to claim 1, wherein the discriminator comprises:
a reliability calculation unit that calculates the reliability of the labeling function from the relationship between a voted real number data set representing the characteristics of an input data set to be processed of the voted input data and the label attached to the input data set to be processed;
a first vector calculation unit that weights the labeling function with the reliability and calculates a first identification probability vector including the probability that the input data set to be processed corresponds to each label;
a second vector calculation unit that calculates a second identification probability vector from the voted real number data set; and
a loss function calculation unit that calculates, for the input data set to be processed, a function for comparing each of the first identification probability vector and the second identification probability vector with a temporary correct label, and calculates a loss function correlated with the function of each input data set of the voted input data,
wherein the model is trained such that the loss function is minimized.
The processing device according to claim 2, wherein the function is positively correlated,
for the first identification probability vector, with the difference between the sum of the probabilities corresponding to the labels and the probability of the temporary correct label, and,
for the second identification probability vector, with the difference between the sum of the probabilities corresponding to the labels and the probability of the temporary correct label.
The processing device according to claim 2, wherein the reliability of the labeling function is the average of the reliabilities calculated for the labeling function for each input data set.
The processing device according to claim 2, wherein the discriminator calculates, for each of the input data sets, an identification probability vector including the probability corresponding to each label, and
the discrimination uncertainty is positively correlated with the entropy of each identification probability vector.
The processing device according to claim 1, further comprising a clustering unit that divides the input data sets of the non-voted input data into a plurality of clusters,
wherein the identification unit identifies, as the presentation data, the input data sets belonging to the cluster to which the input data set having the maximum value belongs.
A processing method comprising:
a step in which a computer, with reference to function data including labeling functions each of which labels an input data set or abstains if labeling is not possible, labels each input data set with the labeling functions;
a step in which the computer trains a model that learns the label attached to each of the input data sets and voted input data including, among the input data sets, those labeled by a number of labeling functions equal to or greater than a threshold, and that outputs, from an input data set, a correct label and the reliability of each labeling function, inputs each of the input data sets to the trained model, and outputs the discrimination uncertainty of the model; and
a step in which the computer identifies, as presentation data, the data set for which the value of an acquisition function taking the discrimination uncertainty into account is maximized among the data sets of non-voted input data including, among the input data sets, those labeled by a number of labeling functions less than the threshold,
wherein a labeling function newly created for the presentation data is inserted into the function data.
A processing program for causing a computer to function as the processing device according to any one of claims 1 to 6.