WO2022130516A1

WO2022130516A1 - Annotation device, annotation method, and annotation program

Info

Publication number: WO2022130516A1
Application number: PCT/JP2020/046835
Authority: WO
Inventors: 佑樹北岸; 岳至森; 歩相名神山
Original assignee: 日本電信電話株式会社
Priority date: 2020-12-15
Filing date: 2020-12-15
Publication date: 2022-06-23
Also published as: JPWO2022130516A1

Abstract

An annotation device (10) comprises: an acquisition unit (15a) that acquires first learning data pieces used in machine learning; a first distribution unit (15b) that distributes, to a plurality of annotators, the first learning data pieces acquired by the acquisition unit (15a); a classification unit (15c) that classifies the first learning data pieces on the basis of the reliability of first correct answer labels attached respectively to the first learning data pieces by the annotators; and a second distribution unit (15e) that distributes the classification results for the first learning data pieces classified by the classification unit (15c).

Description

Annotation device, annotation method and annotation program

The present invention relates to an annotation device, an annotation method, and an annotation program.

Conventionally, for supervised learning in machine learning, learning data and the corresponding correct answer label are required. In many studies, work (annotation) is performed to add metadata by viewing data with multiple people.

For example, in the case of annotation for audio or moving image, the worker (appropriately, "annotator") watches the presented audio or moving image for several seconds to several tens of seconds, and adds metadata to meet the specifications. Specifically, if it is an annotation for research and development of emotion recognition from voice, the most appropriate emotion is selected for the heard voice, and if it is object detection or object recognition for an image, the object image. Select an area within and give a description for the object.

The conventional annotation method can be divided into the presence or absence of work comparison targets. When there is no comparison target, the annotator watches a still image or a sound or moving image for a few seconds and adds metadata. This method has a low time cost because the number of data views = the total number of samples N. Also, if it is a task that anyone can understand reliably even in a short time (eg transcription, tagging of objects, clearly angry state no matter who sees or hears it, etc.), it is possible to annotate accurately. ..

On the other hand, when there is a comparison target, the annotator watches a long time (several tens of seconds to several minutes) of audio or moving images and attaches metadata about continuous and relative changes in events (for example, non-patented). (Refer to Document 3), a plurality of voices and moving images are viewed and relative rankings and scores are given (see, for example, Non-Patent Document 4). Since this method has an object to be compared, it is possible to reduce blurring between annotators and add more accurate metadata.

However, with the above-mentioned conventional technique, it is not possible to perform lower cost and higher accuracy annotation in supervised learning in machine learning. This is because with an annotation method that has no comparison target, accurate annotation cannot be performed for tasks that are difficult to understand in a short period of viewing, the variation of labels assigned between annotators becomes large, and the reliability of annotation results becomes low. There's a problem.

There are measures to deal with such problems, such as the quality of annotators, consideration of response tendencies, and reduction of the influence of noise with multi-person annotation, but it is a fundamental solution to improve the reliability of the annotation itself. No (see, for example, Non-Patent Documents 1 and 2). For example, when annotating the degree of concentration, if it seems to be very concentrated or not concentrated at all, that is, if anyone can see and hear it clearly, the votes by multiple annotators are likely to match, but they are concentrated. If it is difficult to tell whether it is or not, accurate annotation is difficult, and as a result, subtle differences cannot be expressed.

On the other hand, in the annotation method without a comparison target, it is necessary to watch a large amount of data for a long time, and the annotation requires a huge cost. For example, when viewing a combination of several data at the same time, if n combinations are selected from the data of all _N samples, a maximum of NC _n combinations may exist. It may be possible to reduce the number of combinations while maintaining the quality of annotations by referring to the psychological experimental method, but for that purpose, careful consideration is required as to which combination to exclude.

In order to solve the above-mentioned problems and achieve the object, the annotation device according to the present invention has an acquisition unit for acquiring the first learning data used for machine learning and the first acquisition unit acquired by the acquisition unit. The first learning data is classified based on the reliability of the first distribution unit that distributes the learning data to a plurality of annotators and the reliability of the first correct answer label given to the first learning data by each annotator. It is characterized by including a classification unit and a second distribution unit that distributes the classification result of the first learning data classified by the classification unit.

Further, the annotation method according to the present invention is an annotation method executed by an annotation device, and is an acquisition step of acquiring first learning data used for machine learning, and the first acquisition step acquired by the acquisition step. The first training data is classified based on the first distribution step of distributing the training data to a plurality of annotators and the reliability of the first correct answer label given to the first training data by each annotator. It is characterized by including a classification step and a second distribution step of distributing the classification result of the first learning data classified by the classification step.

Further, the annotation program according to the present invention has an acquisition step for acquiring the first learning data used for machine learning, and a first distribution for distributing the first learning data acquired by the acquisition step to a plurality of annotators. A classification step for classifying the first learning data based on the reliability of the step and the first correct answer label given to the first learning data by each annotator, and the classification step classified by the classification step. It is characterized by having a computer execute a second distribution step of distributing the classification result of the first learning data.

In the present invention, it is possible to perform low-cost and high-precision annotation in supervised learning in machine learning.

FIG. 1 is a diagram showing a configuration example of an annotation system according to the first embodiment. FIG. 2 is a block diagram showing a configuration example of the annotation device according to the first embodiment. FIG. 3 is a diagram showing an example of learning data according to the first embodiment. FIG. 4 is a diagram showing an example of the first learning data and the first correct answer label according to the first embodiment. FIG. 5 is a diagram showing an example of the second learning data and the second correct answer label according to the first embodiment. FIG. 6 is a flowchart showing an example of the flow of annotation processing according to the first embodiment. FIG. 7 is a flowchart showing an example of the flow of the first learning data classification process according to the first embodiment. FIG. 8 is a diagram showing a computer that executes a program.

Hereinafter, the annotation device, the annotation method, and the embodiment of the annotation program according to the present invention will be described in detail with reference to the drawings. The present invention is not limited to the embodiments described below.

[First Embodiment]
Hereinafter, the configuration of the annotation system, the configuration of the annotation device, the specific example of the annotation processing, the flow of the annotation processing, and the flow of the data classification processing according to the present embodiment will be described in order, and finally, the effect of the present embodiment will be described. ..

[Annotation system configuration]
The configuration of the annotation system (as appropriate, this system) 100 according to the present embodiment will be described in detail with reference to FIG. 1. FIG. 1 is a diagram showing an example of an annotation system according to the first embodiment. The annotation system 100 has an annotation device 10 such as a server, annotators 20 (20A, 20B, 20C) such as various terminals, and various databases 30 (30A, 30B, 30C).

Here, the annotation device 10, the annotator 20, and the database 30 are connected so as to be communicable by wire or wirelessly via a predetermined communication network (not shown). The annotation system 100 shown in FIG. 1 may include a plurality of annotation devices 10.

First, the annotation device 10 acquires learning data necessary for research and development as the first learning data from various databases 30 (step S1). Here, the learning data to be acquired is data such as voice, image, and moving image, and is acquired in a medium and scale according to the purpose of the research or development.

Next, the annotation device 10 distributes the acquired first learning data to the annotator 20 (step S2). Here, the annotator 20 is a terminal that assigns a correct answer label to the distributed learning data, and a user of the terminal, but is not particularly limited. The annotator 20 may be a machine learning model that can be given a specific correct answer label created separately.

Subsequently, the annotator 20 assigns a correct answer label (first correct answer label) to the delivered first learning data (step S3). Further, the annotation device 10 acquires the first learning data to which the correct answer label is attached (step S4).

After that, the annotation device 10 classifies the first learning data based on the first correct answer label (step S5). At this time, the annotation device 10 selects the learning data to which the reliable correct answer data is given as the reference point (appropriately, “reference data”) S based on the answer obtained from the annotator 20. Further, the annotation device 10 further adds data other than the reference point S to the data that is easy to accurately label the correct answer (appropriately “data D”) and data that is difficult to accurately label the correct answer (appropriately “data E”). ”) And.

Further, the annotation device 10 generates the second learning data from the classified first learning data (step S6). At this time, the annotation device 10 generates a data group including the reference point S, the data E, and the data D having the same source. The classification of the first learning data and the generation of the second learning data will be described later.

Then, the annotation device 10 distributes the generated second learning data to the annotator 20 (step S7). At this time, when the data group of the second learning data is distributed, the annotation device 10 distributes each data to the annotator 20 so that the reference point S is viewed and then the data E and the data D are viewed. Further, the annotator 20 assigns a correct answer label (second correct answer label) to the distributed second learning data (step S8). Finally, the annotation device 10 acquires the second learning data to which the correct answer label is attached (step S9).

In the annotation system 100 according to the present embodiment, the annotation device 10 includes reliable data of the event to which the correct answer label is to be given in the data group, and clearly indicates it. Therefore, the annotator 20 can utilize the data as a comparison target, and more accurate annotation can be realized.

[Annotation device configuration]
The configuration of the annotation device 10 according to the present embodiment will be described in detail with reference to FIG. FIG. 2 is a block diagram showing a configuration example of the annotation device according to the present embodiment. The annotation device 10 includes an input unit 11, an output unit 12, a communication unit 13, a storage unit 14, and a control unit 15.

The input unit 11 controls the input of various information to the annotation device 10. The input unit 11 is, for example, a mouse, a keyboard, or the like, and receives input of setting information or the like to the annotation device 10. Further, the output unit 12 controls the output of various information from the annotation device 10. The output unit 12 is, for example, a display or the like, and outputs setting information or the like stored in the annotation device 10.

The communication unit 13 controls data communication with other devices. For example, the communication unit 13 performs data communication with each communication device. Further, the communication unit 13 can perform data communication with a terminal of an operator (not shown).

The storage unit 14 stores various information referred to when the control unit 15 operates and various information acquired when the control unit 15 operates. Here, the storage unit 14 is, for example, a RAM (Random Access Memory), a semiconductor memory element such as a flash memory, or a storage device such as a hard disk or an optical disk. In the example of FIG. 2, the storage unit 14 is installed inside the annotation device 10, but it may be installed outside the annotation device 10, or a plurality of storage units may be installed.

The storage unit 14 is the first learning data acquired from the database 30 described later, the first learning data with the first correct answer label acquired from the annotator 20, and the classification result classified by the classification unit 15c of the control unit 15. , The second learning data generated by the generation unit 15d, the second learning data with the second correct answer label acquired from the annotator 20, etc., as well as the user name and the identification of the machine learning model as the information of the annotator 20. Memorize numbers etc.

The control unit 15 controls the entire annotation device 10. The control unit 15 includes an acquisition unit 15a, a first distribution unit 15b, a classification unit 15c, a generation unit 15d, and a second distribution unit 15e. Here, the control unit 15 is, for example, an electronic circuit such as a CPU (Central Processing Unit) or an MPU (Micro Processing Unit) or an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

The acquisition unit 15a acquires the first learning data used for machine learning. For example, the acquisition unit 15a acquires the first learning data including audio, an image, or a moving image. Further, the acquisition unit 15a acquires the first learning data from the database 30. Further, the acquisition unit 15a acquires the learning data to which the correct answer label is attached from the annotator 20. Further, the acquisition unit 15a stores the first learning data, the learning data to which the correct answer label is attached, and the like in the storage unit 14.

The first distribution unit 15b distributes the first learning data acquired by the acquisition unit 15a to a plurality of annotators 20. For example, the first distribution unit 15b distributes the first learning data in a format in which a predetermined number is given as the first correct answer label. Further, the first distribution unit 15b distributes the first learning data to the machine learning model as the annotator 20. The detailed processing of the first learning data and the first correct answer label will be described later.

The classification unit 15c classifies the first learning data based on the reliability of the first correct answer label given to the first learning data by each annotator. For example, the classification unit 15c assigns the first learning data as reference data, data that is easy to give an accurate correct label, or difficult to give an accurate label based on the variance of the first correct label as the reliability. Classify into data. Further, the classification unit 15c classifies the first learning data based on the posterior probability of the first correct answer label as the reliability. Further, the classification unit 15a stores the calculation result of the reliability of the first correct answer label and the classification result based on the reliability in the storage unit 14.

Here, the reliability is the variance of the numerical value of the correct answer label of each annotator for a certain learning data when the annotator 20 is a person, but is not particularly limited. The index used for the reliability may be any index showing the variation of the numerical value, and the smaller the variation of the numerical value, the higher the reliability of the correct label. Further, the reliability is a posterior probability of a numerical value that is an estimation result of the machine learning model for a certain learning data when the annotator 20 is a machine learning model, but is not particularly limited. The index used for the reliability may be any index that represents the accuracy of the estimation result of the machine learning model, and the higher the accuracy of the estimation result, the higher the reliability of the correct label.

As a classification result, the generation unit 15d includes a plurality of reference data having different extreme values, data in which it is easy to give an accurate correct label, and data in which it is difficult to give an accurate correct label, and each data. Generate a second training data whose source is the same data group. Further, the generation unit 15d stores the classification result of the second learning data or the like in the storage unit 14.

Here, the extreme value is, for example, a case where the learning data is in a format in which the degree of a specific state such as the degree of concentration is determined by numbers in five stages {1, 2, 3, 4, 5} as a correct answer label. , The minimum number "1" and the maximum number "5", but are not particularly limited. The extreme value may be a number indicating that the annotator 20 can be clearly determined to be in an extreme state, and is not limited to the minimum value or the maximum value in the range of numbers set in advance as the correct label.

The second distribution unit 15e distributes the classification result of the first learning data classified by the classification unit 15c. For example, the second distribution unit 15e first distributes a plurality of reference data having different extreme values. Further, the second distribution unit 15e distributes the classification result to a plurality of annotators to which the first learning data is distributed, or to a predetermined annotator other than the plurality of annotators to which the first learning data is distributed.

Here, the classification result is the first learning data classified by the classification unit 15c based on the reliability of the given correct answer label, and is, for example, the reference point S (reference data) and the data E (correctly correct answer). The training data is labeled in three categories: data that is easy to label) and data D (data that is difficult to label accurately), but is not particularly limited. The classification result may be learning data in which the reliability of the correct answer label is labeled, or may be learning data selected by the generation unit 15d.

[Specific example of annotation processing]
A specific example of the annotation process of the annotation device 10 according to the present embodiment will be described with reference to FIGS. 3 to 5. FIG. 3 is a diagram showing an example of learning data according to the first embodiment. FIG. 4 is a diagram showing an example of the first learning data and the first correct answer label according to the first embodiment. FIG. 5 is a diagram showing an example of the second learning data and the second correct answer label according to the first embodiment.

(First annotation processing)
First, the first annotation process from the acquisition of the first learning data to the acquisition of the first learning data to which the first correct answer label is attached will be described. First, the first learning data acquired by the annotation device 10 from the database 30 or the like will be described with reference to FIG. Here, the first learning data to be acquired is data such as voice, image, and moving image, and is data acquired on a medium and scale according to the purpose of research or development. For example, when annotating for the realization of concentration estimation from voice, the annotation device 10 acquires voice data from a voice database that stores voice data in the database 30.

FIG. 3 is a diagram showing a data set X that holds audio data, and the data set X contains audio data of {x ₀ , x ₁ , x ₂ , x ₃ , x ₄ , ... X _N }. Is included. The voice data shown in FIG. 3 represents the voice waveform as the relationship between the passage of time and the voice signal strength.

In the following description, annotation processing using voice data as the first learning data will be described, but the type of learning data is not particularly limited. The first learning data may be image data, moving image data, or a combination thereof, in addition to audio data. Further, the first learning data may be data obtained by digitizing or textizing the above-mentioned voice data or the like.

Next, with reference to FIG. 4, the first learning data delivered by the annotation device 10 to the annotator 20 and the first correct label given to the first learning data acquired by the annotation device 10 from the annotator 20 will be described. do. For example, when annotating for the realization of concentration estimation from voice, the annotation device 10 has five levels of concentration (“1”: not concentrated, “2”: slightly not concentrated, “3”. ": Neither can be said (flat)," 4 ": slightly concentrated," 5 ": concentrated) is set in advance, and which concentration is applied to each voice data held by the data set X. The first learning data to be given a correct answer label as to whether the degree is most suitable is delivered to the annotator 20.

When the annotation device 10 annotates for the realization of concentration estimation from voice, for example, the students who are asked are concentrated in the lesson from the voice data related to the questions and dialogues between the teacher and the students in the lesson. We will deliver learning data to determine whether or not we were concentrated, and how much we were concentrated, etc. in 5 stages. Further, when the annotation device 10 assigns a correct answer label regarding the concentration degree from the image data or the video data, the annotation device 10 causes the annotator 20 to read the student's facial expression or the like from the image or the video in the lesson, and determines the concentration degree. Learning data may be distributed.

Then, the annotation device 10 acquires the first learning data to which the correct answer label is given by the annotator 20. In FIG. 3, correct labels for the voice data “x ₀ ” to “x _N ” of the data sets X of the “annotator 01” to the “annotator 03” are shown (see FIG. 4 “ANNOT1 (X)”). For example, the correct answer labels given by "Anotator 01" to "Anotator 03" for the voice data x ₀ are "2", "1", and "1", respectively.

(Second annotation processing)
Secondly, the second annotation processing from the classification of the first learning data to which the first correct answer label is attached to the acquisition of the second learning data to which the second correct answer label is attached will be described. First, a specific example of the first training data classification process based on the reliability of the correct answer label obtained from the annotator 20 will be described with reference to FIG. The annotation device 10 calculates the numerical values of the average and the variance of the correct answer labels given to each voice data.

In FIG. 4, at _x0 , the average is “1.3” and the variance is “0.3” (small variance). Similarly, for x ₁ , the mean "5" and the variance "0" (all annotator answers match), and for x ₂ , the mean "1" and the variance "0" (all the annotator answers match), x ₃ Then, the average is "3.3", the variance is "0.3" (small variance), the mean is " _4.0 " for _x4 , the variance is "1.0" (large variance), and the mean is "1. 6 ”, variance“ 1.3 ”(large variance).

At this time, the annotation device 10 selects the learning data to which the reliable correct answer data is given from the answers obtained from the annotator 20 as the reference point S. In the example of FIG. 4, it is assumed that the answers of all the annotators are the same and the numerical value of the given correct answer label is an extremum, and x ₁ (extreme value “5”) and x ₂ (extreme value “1”). ”) Is selected.

Further, the annotation device 10 further classifies the learning data other than the reference point S as data D, which is easy to accurately label the correct answer, and as data E, which is difficult to accurately label the correct answer. For example, the annotation device 10 sets a threshold value of reliability and classifies it as data D if the variance is 1.0 or more, and classifies it as data E otherwise. In the example of FIG. 4, the annotation device 10 classifies x ₀ and x ₃ into data E because the variance is less than 1.0, and classifies x ₄ and x _N into data D because the variance is 1.0 or more.

When the annotation device 10 uses the estimation result by the machine learning model created separately as the correct answer label, for example, the training data in which the posterior probability of the numerical value to be the estimation result is 80% or more is set as the reference point S and the posterior. The training data having a probability of 50% or more and less than 80% is classified as data E, and the training data having a posterior probability of less than 50% is classified as data D. Further, the annotation device 10 can statically or dynamically change the classification method such as the number of classifications of learning data and the threshold value.

Subsequently, with reference to FIG. 5, a specific example of the second learning data generation processing and distribution processing based on the first learning data classification will be described. The annotation device 10 generates a data group including three types of data, the reference point S, the data E, and the data D, as the second training data. At this time, the reference point S must include the data of the respective extreme values "1" and "5". In addition, the sources of data (speakers, people in moving images, objects, etc.) of each data group are the same.

For example, the annotation device 10 generates a data group set P having a data group {p ₀ , p ₁ , ... p _M } as an element. Here, the data group p ₀ includes {x ₀ , x ₁ , x ₂ , x ₃ , x ₄ , x _N } (see FIGS. 3 and 4) as elements, and the data group p _M includes. It is assumed that {x _a , x _b , x _c , x _d , x _e , x _f } (not shown in FIGS. 3 and 4) are included as elements. In the example of FIG. 5, the annotation device 10 selects the reference point S {x ₁ , x ₂ }, the data E {x ₀ , x ₃ }, and the data D {x ₄ , x _N } as the data group p ₀ . , The reference point S {x _a , x _b }, the data E {x _c , x _d }, and the data D {x _e , x _f } are selected as the data group p _M.

Note that the number of data in each data group and the selection method can be arbitrarily changed as long as they include the reference point S which is a different extremum as described above and satisfy the conditions of the same source. For example, the number of data may be a random number within a certain range. Further, as the data to be included, two data E and two data D may be prepared so that the average value of the results of each annotation is closer to "1" or "5".

After that, the annotation device 10 distributes the data group selected as described above to the annotator 20 as the second learning data. At this time, when the annotation device 10 distributes the data group of the second learning data, the reference point S is first distributed for each data group, and then the data E and the data D are distributed to the annotator 20. In the example of FIG. 5, when the data group p ₀ is delivered, the annotation device 10 of the reference point S {x ₁ , x ₂ }, the data E {x ₀ , x ₃ }, and the data D {x ₄ , x _N }. The data group p _M is distributed in order, and the reference point S {x _a , x _b }, the data E {x _c , x _d }, and the data D {x _e , x _f } are distributed in this order.

Note that the annotation device 10 may instruct the annotator 20 to view the data in the order of data E and data D. Further, the annotation device 10 may distribute the data E and the data D so as to be viewed at random. Further, the annotation device 10 may present the correct answer label at the same time as the reference point S delivered first and instruct the annotator 20 not to give the correct answer label, regardless of the classification of the learning data. , You may instruct all the training data to be given the correct label.

Finally, the annotation device 10 acquires the second training data to which the correct answer label is given by the annotator 20. In the example of FIG. 5, the annotation device 10 acquires the correct answer labels of the data E and the data D excluding the reference point S for each annotator for each data group. For example, for the “notator 01”, the data group _p0 . For the training data of {x ₀ , x _3, x ₄ , x _N }, the correct answer labels of {1, 4 _, 3, 2} are acquired in order, and {x _c , x _d , of the data group p _M , For the training data of x _e , x _f }, the correct answer labels of {2 _{, 4, 3,} 3} are acquired in order (see FIG. 5 “ANNOT2 (X)”).

The final processing of the correct answer label given to the second learning data is not particularly limited. The annotation device 10 may take a majority vote for each training data and determine the most correct answer label as the final correct answer label, or calculate the average score of the numerical values and determine the numerical value as the final correct answer label. You may.

[Flow of annotation processing]
The flow of annotation processing according to this embodiment will be described in detail with reference to FIG. FIG. 6 is a flowchart showing an example of the flow of annotation processing according to the first embodiment.

First, the acquisition unit 15a of the annotation device 10 acquires the first learning data including voice, image, video, etc. from the database 30 or the like (step S101). At this time, the acquisition unit 15a may acquire the first learning data from the storage unit 14. Further, the acquisition unit 15a may process the original data such as voice data acquired from the database 30 or the storage unit 14, divide the original data into an appropriate size as learning data, or classify the data appropriately. Further, the acquisition unit 15a may acquire voice data or the like from the outside via the input unit 11.

Next, the first distribution unit 15b distributes the first learning data to the annotator 20 (step S102). At this time, the first distribution unit 15b may select the annotator 20 to be distributed according to the first learning data. Further, the acquisition unit 15a acquires the first learning data to which the first correct answer label is given by the annotator 20 (step S103).

Then, the classification unit 15c classifies the first learning data based on the reliability of the first correct answer label (step S104). Further, the generation unit 15d generates the second learning data from the classified first learning data (step S105). Subsequently, the second distribution unit 15e distributes the second learning data to the annotator 20 (step S106).

The second distribution unit 15e can also distribute the second learning data to annotators other than the annotator 20 that has distributed the first learning data. For example, the second distribution unit 15e can distribute the first learning data to the annotator who is a human being, and can also distribute the second learning data to the annotator which is a machine learning model.

Finally, the acquisition unit 15a acquires the second learning data to which the second correct answer label is given by the annotator 20 (step S107), and the process ends. If the accuracy of the acquired second correct label is not sufficient, the processes of steps S104 to S107 may be performed again.

[Flow of classification processing of the first learning data]
With reference to FIG. 7, the flow of the first learning data classification process according to the present embodiment will be described in detail. FIG. 7 is a flowchart showing an example of the flow of the first learning data classification process according to the first embodiment. First, the acquisition unit 15a of the annotation device 10 acquires the first correct answer label given to the first learning data from the annotator 20 (step S201). Next, when the annotator 20 is a person (step S202: the annotator is a person), the classification unit 15c performs the classification processing of steps S208 to S210 based on the processing of steps S203 to S205.

When the answers of all the annotators match (step S203: affirmative) and the answers are extreme values (step S204: affirmative), the classification unit 15c uses the first learning data to which the correct answer label is attached as the reference point S. (Step S208). Further, when the answer of the annotator 20 includes something that does not match the answer of the annotator 20 (step S203: negative), or when the answer of the annotator 20 is not an extreme value (step S204: negative), the classification unit 15c performs the process of step S205. ..

When the variance of the answers of the annotator 20 is 1.0 or more (step S205: affirmative), the classification unit 15c classifies the first learning data to which the correct answer label is given into the data E (step S209). Further, when the variance of the answers of the annotator 20 is less than 1.0 (step S205: negation), the first learning data to which the correct answer label is attached is classified into the data D (step S210). When the classification process of steps S208 to S210 is completed, the classification unit 15c ends the process.

On the other hand, when the annotator 20 is a machine learning model (step S202: the annotator is a machine learning model), the classification unit 15c performs the classification processing of steps S208 to S210 based on the processing of steps S206 to S207. When the posterior probability of the value that is the estimation result of the annotator 20 is 80% or more (step S206: affirmative), the classification unit 15c classifies the first learning data to which the correct answer label is given into the reference point S (step S206: affirmative). Step S208).

Further, in the classification unit 15c, when the posterior probability of the value that is the estimation result of the annotator 20 is less than 80% (step S206: negative) and the posterior probability is 50% or more (step S207: affirmative), the correct answer. The labeled first training data is classified into data E (step S209). Further, when the posterior probability of the value that is the estimation result of the annotator 20 is less than 50% (step S207: negation), the classification unit 15c classifies the first learning data to which the correct answer label is given into the data D. (Step S210). When the classification process of steps S208 to S210 is completed, the classification unit 15c ends the process.

[Effect of the first embodiment]
First, in the annotation process according to the present embodiment described above, the first learning data used for machine learning is acquired, the acquired first learning data is distributed to a plurality of annotators, and each annotator makes a first. The first learning data is classified based on the reliability of the first correct answer label given to each learning data, and the classification result of the classified first learning data is delivered. Therefore, in this process, it is possible to perform low-cost and high-precision annotation in supervised learning in machine learning.

Secondly, in the annotation process according to the present embodiment described above, the first learning in the form of acquiring the first learning data including voice, image or moving image and assigning a predetermined number as the first correct answer label. Data is distributed, and the first learning data is converted into reference data, data that is easy to give an accurate correct answer label, or data that is difficult to give an accurate correct answer label based on the distribution of the first correct answer label as reliability. Classify. Therefore, in this process, in supervised learning in machine learning, it is possible to assign a highly reliable correct label even when there is no comparison target, and it is possible to perform low-cost and high-precision annotation.

Thirdly, in the annotation process according to the present embodiment described above, as an annotator, the first learning data is delivered to the machine learning model, and the first learning data is used as the posterior probability of the first correct answer label as the reliability. Classify based on. Therefore, in this process, in supervised learning in machine learning, it is possible to assign a highly reliable correct label even when the annotator is not a human being, and it is possible to perform annotation at lower cost and with higher accuracy.

Fourth, in the above-mentioned annotation processing according to the present embodiment, as the classification result, a plurality of reference data having different extreme values, data that can be easily given an accurate correct answer label, and accurate correct answer labels are given. The second training data, which includes data that is difficult to handle and whose source of each data is the same data group, is generated, and a plurality of reference data are distributed first. Therefore, in this process, in supervised learning in machine learning, it is possible to assign a correct label with high reliability and efficiency even when there is no comparison target, and annotation is performed at lower cost and with higher accuracy. be able to.

Fifth, in the annotation process according to the present embodiment described above, the classification result is distributed to a plurality of annotators to which the first learning data is distributed, or to a predetermined annotator other than the plurality of annotators to which the first learning data is distributed. .. Therefore, in this process, in supervised learning in machine learning, it is possible to assign correct answer labels with high reliability, efficiency, and flexibility even when there is no comparison target, and it is possible to assign correct answer labels at lower cost and higher accuracy. Annotation can be performed.

[System configuration, etc.]
Each component of each of the illustrated devices according to the above embodiment is a functional concept and does not necessarily have to be physically configured as shown in the figure. That is, the specific form of distribution / integration of each device is not limited to the one shown in the figure, and all or part of them may be functionally or physically distributed / physically distributed in any unit according to various loads and usage conditions. Can be integrated and configured. Further, each processing function performed by each device may be realized by a CPU and a program analyzed and executed by the CPU, or may be realized as hardware by wired logic.

Further, among the processes described in the above-described embodiment, all or a part of the processes described as being automatically performed can be manually performed, or the processes described as being manually performed can be performed. All or part of it can be done automatically by a known method. In addition, the processing procedure, control procedure, specific name, and information including various data and parameters shown in the above document and drawings can be arbitrarily changed unless otherwise specified.

〔program〕
It is also possible to create a program in which the processing executed by the annotation device 10 described in the above embodiment is described in a language that can be executed by a computer. In this case, the same effect as that of the above embodiment can be obtained by executing the program by the computer. Further, the same process as that of the above embodiment may be realized by recording the program on a computer-readable recording medium, reading the program recorded on the recording medium into the computer, and executing the program.

FIG. 8 is a diagram showing a computer that executes a program. As illustrated in FIG. 8, the computer 1000 includes, for example, a memory 1010, a CPU (Central Processing Unit) 1020, a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and the like. It has a network interface 1070 and each of these parts is connected by a bus 1080.

The memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM 1012, as illustrated in FIG. The ROM 1011 stores, for example, a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to the hard disk drive 1090, as illustrated in FIG. The disk drive interface 1040 is connected to the disk drive 1100 as illustrated in FIG. For example, a removable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 1100. The serial port interface 1050 is connected to, for example, a mouse 1110 and a keyboard 1120, as illustrated in FIG. The video adapter 1060 is connected, for example, to a display 1130, as illustrated in FIG.

Here, as illustrated in FIG. 8, the hard disk drive 1090 stores, for example, the OS 1091, the application program 1092, the program module 1093, and the program data 1094. That is, the above program is stored in, for example, the hard disk drive 1090 as a program module in which a command executed by the computer 1000 is described.

Further, the various data described in the above embodiment are stored as program data in, for example, the memory 1010 or the hard disk drive 1090. Then, the CPU 1020 reads the program module 1093 and the program data 1094 stored in the memory 1010 and the hard disk drive 1090 into the RAM 1012 as needed, and executes various processing procedures.

The program module 1093 and program data 1094 related to the program are not limited to those stored in the hard disk drive 1090, and may be stored in, for example, a removable storage medium and read by the CPU 1020 via a disk drive or the like. .. Alternatively, the program module 1093 and the program data 1094 related to the program are stored in another computer connected via a network (LAN (Local Area Network), WAN (Wide Area Network), etc.) and stored via the network interface 1070. It may be read by the CPU 1020.

The above embodiments and modifications thereof are included in the invention described in the claims and the equivalent scope thereof, as included in the technique disclosed in the present application.

10 Annotation device 11 Input unit 12 Output unit 13 Communication unit 14 Storage unit 15 Control unit 15a Acquisition unit 15b First distribution unit 15c Classification unit 15d Generation unit 15e

Second distribution unit

20, 20A, 20B,

20C Annotator

30, 30A, 30B , 30C Database 100 Annotation System

Claims

An acquisition unit that acquires the first learning data used for machine learning,
A first distribution unit that distributes the first learning data acquired by the acquisition unit to a plurality of annotators, and a first distribution unit.
A classification unit that classifies the first learning data based on the reliability of the first correct answer label given to the first learning data by each annotator.
An annotation device including a second distribution unit that distributes the classification result of the first learning data classified by the classification unit.
The acquisition unit acquires the first learning data including audio, images, or moving images, and obtains the first learning data.
The first distribution unit distributes the first learning data in a format in which a predetermined number is given as the first correct answer label.
The classification unit assigns the first learning data as reference data, data that can be easily given an accurate correct label, or accurate correct label based on the variance of the first correct label as the reliability. The annotation device according to claim 1, wherein the data is classified into difficult data.
The first distribution unit distributes the first learning data to the machine learning model as the annotator.
The annotation device according to claim 1 or 2, wherein the classification unit classifies the first learning data as the reliability based on the posterior probability of the first correct answer label.
As the classification result, a plurality of reference data having different extreme values, the data in which the correct answer label is easily attached, and the data in which the correct answer label is difficult to be attached are included, and each data is included. Further equipped with a generator for generating a second training data whose source is the same data group,
The annotation device according to claim 2 or 3, wherein the second distribution unit first distributes the plurality of reference data.
The annotation device according to any one of claims 1 to 4, wherein the second distribution unit distributes the classification result to the plurality of annotators or predetermined annotators other than the plurality of annotators.
Annotation method performed by the annotation device,
The acquisition process to acquire the first learning data used for machine learning,
A first distribution step of distributing the first learning data acquired by the acquisition process to a plurality of annotators, and a first distribution process.
A classification step of classifying the first learning data based on the reliability of the first correct answer label given to the first learning data by each annotator.
An annotation method comprising a second distribution step of distributing the classification result of the first learning data classified by the classification step.
The acquisition step to acquire the first learning data used for machine learning,
The first distribution step of distributing the first learning data acquired by the acquisition step to a plurality of annotators, and
A classification step for classifying the first training data based on the reliability of the first correct label given to the first training data by each annotator, and a classification step.
An annotation program characterized by causing a computer to execute a second distribution step of distributing the classification result of the first learning data classified by the classification step.