WO2022130516A1 - Annotation device, annotation method, and annotation program - Google Patents


Info

Publication number
WO2022130516A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
learning data
annotation
learning
classification
Application number
PCT/JP2020/046835
Other languages
French (fr)
Japanese (ja)
Inventor
佑樹 北岸
岳至 森
歩相名 神山
Original Assignee
Nippon Telegraph and Telephone Corporation (日本電信電話株式会社)
Application filed by Nippon Telegraph and Telephone Corporation
Priority to PCT/JP2020/046835 priority Critical patent/WO2022130516A1/en
Priority to JP2022569380A priority patent/JPWO2022130516A1/ja
Publication of WO2022130516A1 publication Critical patent/WO2022130516A1/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Definitions

  • the present invention relates to an annotation device, an annotation method, and an annotation program.
  • In the case of annotation for audio or moving images, the worker (hereinafter, "annotator") watches the presented audio or moving image for several seconds to several tens of seconds and adds metadata according to the specifications. Specifically, in annotation for research and development of emotion recognition from voice, the annotator selects the most appropriate emotion for the voice heard; in object detection or object recognition for images, the annotator selects a region within the object image and gives a description of the object.
  • Conventional annotation methods can be divided according to the presence or absence of a comparison target during the work.
  • The annotator watches a still image, a sound, or a moving image for a few seconds and adds metadata.
  • For example, when annotating the degree of concentration (Non-Patent Documents 1 and 2), if the subject seems very concentrated or not concentrated at all, that is, if anyone can see and hear it clearly, the votes of multiple annotators are likely to match; but if it is difficult to tell whether the subject is concentrated or not, accurate annotation is difficult, and as a result subtle differences cannot be expressed.
  • The annotation device is characterized by including an acquisition unit that acquires first learning data used for machine learning, a first distribution unit that distributes the first learning data acquired by the acquisition unit to a plurality of annotators, a classification unit that classifies the first learning data based on the reliability of the first correct answer label given to the first learning data by each annotator, and a second distribution unit that distributes the classification result of the first learning data classified by the classification unit.
  • The annotation method is executed by an annotation device and is characterized by including an acquisition step of acquiring first learning data used for machine learning, a first distribution step of distributing the first learning data acquired in the acquisition step to a plurality of annotators, a classification step of classifying the first learning data based on the reliability of the first correct answer label given to the first learning data by each annotator, and a second distribution step of distributing the classification result of the first learning data classified in the classification step.
  • The annotation program causes a computer to execute an acquisition step of acquiring first learning data used for machine learning and a first distribution step of distributing the first learning data acquired in the acquisition step to a plurality of annotators.
  • FIG. 1 is a diagram showing a configuration example of an annotation system according to the first embodiment.
  • FIG. 2 is a block diagram showing a configuration example of the annotation device according to the first embodiment.
  • FIG. 3 is a diagram showing an example of learning data according to the first embodiment.
  • FIG. 4 is a diagram showing an example of the first learning data and the first correct answer label according to the first embodiment.
  • FIG. 5 is a diagram showing an example of the second learning data and the second correct answer label according to the first embodiment.
  • FIG. 6 is a flowchart showing an example of the flow of annotation processing according to the first embodiment.
  • FIG. 7 is a flowchart showing an example of the flow of the first learning data classification process according to the first embodiment.
  • FIG. 8 is a diagram showing a computer that executes a program.
  • FIG. 1 is a diagram showing an example of an annotation system according to the first embodiment.
  • the annotation system 100 has an annotation device 10 such as a server, annotators 20 (20A, 20B, 20C) such as various terminals, and various databases 30 (30A, 30B, 30C).
  • The annotation device 10, the annotator 20, and the database 30 are connected so as to be communicable by wire or wirelessly via a predetermined communication network (not shown).
  • the annotation system 100 shown in FIG. 1 may include a plurality of annotation devices 10.
  • the annotation device 10 acquires learning data necessary for research and development as the first learning data from various databases 30 (step S1).
  • the learning data to be acquired is data such as voice, image, and moving image, and is acquired in a medium and scale according to the purpose of the research or development.
  • the annotation device 10 distributes the acquired first learning data to the annotator 20 (step S2).
  • The annotator 20 is, for example, a terminal that assigns a correct answer label to the distributed learning data, or a user of such a terminal, but is not particularly limited.
  • the annotator 20 may be a machine learning model that can be given a specific correct answer label created separately.
  • the annotator 20 assigns a correct answer label (first correct answer label) to the delivered first learning data (step S3). Further, the annotation device 10 acquires the first learning data to which the correct answer label is attached (step S4).
  • the annotation device 10 classifies the first learning data based on the first correct answer label (step S5).
  • The annotation device 10 selects, based on the answers obtained from the annotators 20, the learning data to which a reliable correct answer label is given as reference points (hereinafter, "reference data") S. Further, the annotation device 10 classifies the data other than the reference points S into data that is easy to accurately label with a correct answer (hereinafter, "data E") and data that is difficult to accurately label with a correct answer (hereinafter, "data D").
  • the annotation device 10 generates the second learning data from the classified first learning data (step S6). At this time, the annotation device 10 generates a data group including the reference point S, the data E, and the data D having the same source. The classification of the first learning data and the generation of the second learning data will be described later.
  • the annotation device 10 distributes the generated second learning data to the annotator 20 (step S7).
  • The annotation device 10 distributes each data to the annotator 20 so that the reference points S are viewed first and the data E and the data D are viewed afterwards.
  • the annotator 20 assigns a correct answer label (second correct answer label) to the distributed second learning data (step S8).
  • the annotation device 10 acquires the second learning data to which the correct answer label is attached (step S9).
  • In this way, the annotation device 10 includes reliable data on the event to be labeled in each data group and clearly indicates it. The annotator 20 can therefore use that data as a comparison target, and more accurate annotation can be realized.
  • FIG. 2 is a block diagram showing a configuration example of the annotation device according to the present embodiment.
  • the annotation device 10 includes an input unit 11, an output unit 12, a communication unit 13, a storage unit 14, and a control unit 15.
  • the input unit 11 controls the input of various information to the annotation device 10.
  • the input unit 11 is, for example, a mouse, a keyboard, or the like, and receives input of setting information or the like to the annotation device 10.
  • the output unit 12 controls the output of various information from the annotation device 10.
  • the output unit 12 is, for example, a display or the like, and outputs setting information or the like stored in the annotation device 10.
  • the communication unit 13 controls data communication with other devices. For example, the communication unit 13 performs data communication with each communication device. Further, the communication unit 13 can perform data communication with a terminal of an operator (not shown).
  • the storage unit 14 stores various information referred to when the control unit 15 operates and various information acquired when the control unit 15 operates.
  • the storage unit 14 is, for example, a RAM (Random Access Memory), a semiconductor memory element such as a flash memory, or a storage device such as a hard disk or an optical disk.
  • the storage unit 14 is installed inside the annotation device 10, but it may be installed outside the annotation device 10, or a plurality of storage units may be installed.
  • The storage unit 14 stores the first learning data acquired from the database 30 described later, the first learning data with the first correct answer label acquired from the annotator 20, the classification result produced by the classification unit 15c of the control unit 15, the second learning data generated by the generation unit 15d, and the second learning data with the second correct answer label acquired from the annotator 20, as well as information on the annotator 20 such as a user name or the identification number of a machine learning model.
  • the control unit 15 controls the entire annotation device 10.
  • the control unit 15 includes an acquisition unit 15a, a first distribution unit 15b, a classification unit 15c, a generation unit 15d, and a second distribution unit 15e.
  • the control unit 15 is, for example, an electronic circuit such as a CPU (Central Processing Unit) or an MPU (Micro Processing Unit) or an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).
  • the acquisition unit 15a acquires the first learning data used for machine learning. For example, the acquisition unit 15a acquires the first learning data including audio, an image, or a moving image. Further, the acquisition unit 15a acquires the first learning data from the database 30. Further, the acquisition unit 15a acquires the learning data to which the correct answer label is attached from the annotator 20. Further, the acquisition unit 15a stores the first learning data, the learning data to which the correct answer label is attached, and the like in the storage unit 14.
  • the first distribution unit 15b distributes the first learning data acquired by the acquisition unit 15a to a plurality of annotators 20. For example, the first distribution unit 15b distributes the first learning data in a format in which a predetermined number is given as the first correct answer label. Further, the first distribution unit 15b distributes the first learning data to the machine learning model as the annotator 20. The detailed processing of the first learning data and the first correct answer label will be described later.
  • The classification unit 15c classifies the first learning data based on the reliability of the first correct answer label given to the first learning data by each annotator. For example, the classification unit 15c classifies the first learning data into reference data, data that is easy to give an accurate correct answer label, or data that is difficult to give an accurate correct answer label, using the variance of the first correct answer labels as the reliability. Further, the classification unit 15c classifies the first learning data based on the posterior probability of the first correct answer label as the reliability. Further, the classification unit 15c stores the calculated reliability of the first correct answer label and the classification result based on it in the storage unit 14.
  • When the annotator 20 is a person, the reliability is, for example, the variance of the numeric correct answer labels given by the annotators to a certain piece of learning data, but it is not particularly limited.
  • the index used for the reliability may be any index showing the variation of the numerical value, and the smaller the variation of the numerical value, the higher the reliability of the correct label.
  • When the annotator 20 is a machine learning model, the reliability is, for example, the posterior probability of the numerical value that is the estimation result of the machine learning model for a certain piece of learning data, but it is not particularly limited.
  • the index used for the reliability may be any index that represents the accuracy of the estimation result of the machine learning model, and the higher the accuracy of the estimation result, the higher the reliability of the correct label.
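  • The variance-based reliability measure for human annotators described above can be sketched as follows; this is an illustrative sketch, and the function name and the second set of sample labels are assumptions, not taken from the patent.

```python
# A minimal sketch of the variance-based reliability measure for human
# annotators: smaller variance means closer agreement, hence a more
# reliable correct answer label.

def label_variance(labels):
    """Population variance of the numeric correct answer labels."""
    mean = sum(labels) / len(labels)
    return sum((value - mean) ** 2 for value in labels) / len(labels)


# Labels "2", "1", "1" given to voice data x0 by annotators 01 to 03 (FIG. 4)
print(label_variance([2, 1, 1]))  # ~0.22: close agreement, high reliability
print(label_variance([1, 3, 5]))  # ~2.67: wide disagreement, low reliability
```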
  • The generation unit 15d generates, as the second learning data, a data group that includes a plurality of reference data having different extreme values, data that is easy to give an accurate correct answer label, and data that is difficult to give an accurate correct answer label, where all the data share the same source. Further, the generation unit 15d stores the generated second learning data and the like in the storage unit 14.
  • The extreme values are, for example, the minimum number "1" and the maximum number "5" when the learning data is labeled with the degree of a specific state, such as the degree of concentration, on a five-level numeric scale {1, 2, 3, 4, 5} as the correct answer label, but they are not particularly limited.
  • The extreme values may be any numbers indicating a state that the annotator 20 can clearly determine to be extreme, and are not limited to the minimum and maximum of the number range set in advance for the correct answer label.
  • the second distribution unit 15e distributes the classification result of the first learning data classified by the classification unit 15c. For example, the second distribution unit 15e first distributes a plurality of reference data having different extreme values. Further, the second distribution unit 15e distributes the classification result to a plurality of annotators to which the first learning data is distributed, or to a predetermined annotator other than the plurality of annotators to which the first learning data is distributed.
  • The classification result is the first learning data classified by the classification unit 15c based on the reliability of the given correct answer labels; for example, the learning data is labeled in three categories, the reference points S (reference data), the data E (data that is easy to label accurately), and the data D (data that is difficult to label accurately), but the result is not particularly limited.
  • the classification result may be learning data in which the reliability of the correct answer label is labeled, or may be learning data selected by the generation unit 15d.
  • FIG. 3 is a diagram showing an example of learning data according to the first embodiment.
  • FIG. 4 is a diagram showing an example of the first learning data and the first correct answer label according to the first embodiment.
  • FIG. 5 is a diagram showing an example of the second learning data and the second correct answer label according to the first embodiment.
  • the first annotation process from the acquisition of the first learning data to the acquisition of the first learning data to which the first correct answer label is attached will be described.
  • the first learning data acquired by the annotation device 10 from the database 30 or the like will be described with reference to FIG.
  • the first learning data to be acquired is data such as voice, image, and moving image, and is data acquired on a medium and scale according to the purpose of research or development.
  • the annotation device 10 acquires voice data from a voice database that stores voice data in the database 30.
  • FIG. 3 shows a data set X that holds audio data; the data set X contains the audio data {x0, x1, x2, x3, x4, ..., xN}.
  • the voice data shown in FIG. 3 represents the voice waveform as the relationship between the passage of time and the voice signal strength.
  • The first learning data may be image data, moving image data, or a combination thereof, in addition to audio data. Further, the first learning data may be data obtained by converting the above-mentioned voice data or the like into numerical or text form.
  • The first learning data delivered by the annotation device 10 to the annotator 20, and the first correct answer label given to the first learning data acquired by the annotation device 10 from the annotator 20, will now be described.
  • The annotation device 10 sets in advance five levels of the degree of concentration ("1": not concentrated, "2": slightly not concentrated, "3": neither (flat), "4": slightly concentrated, "5": concentrated) as the correct answer labels, and delivers to the annotator 20 the first learning data to be given a correct answer label indicating which degree of concentration best fits each piece of voice data held in the data set X.
  • When annotating for the realization of concentration estimation from voice, the annotation device 10 delivers, for example, learning data for determining, from voice data of questions and dialogues between a teacher and students in a lesson, whether the students were concentrating on the lesson and, on the five-level scale, how concentrated they were. Further, when assigning a correct answer label regarding the degree of concentration from image data or moving image data, the annotation device 10 may distribute learning data that causes the annotator 20 to read the students' facial expressions and the like from the images or video of the lesson and judge the degree of concentration.
  • the annotation device 10 acquires the first learning data to which the correct answer label is given by the annotator 20.
  • FIG. 4 shows the correct answer labels given by "annotator 01" to "annotator 03" for the voice data x0 to xN of the data set X (see "ANNOT1(X)" in FIG. 4).
  • For example, the correct answer labels given by "annotator 01" to "annotator 03" for the voice data x0 are "2", "1", and "1", respectively.
  • The annotation device 10 selects, from the answers obtained from the annotators 20, the learning data to which a reliable correct answer label is given as the reference points S.
  • In the example of FIG. 4, x1 (extreme value "5") and x2 (extreme value "1"), for which the answers of all the annotators match and the numerical value of the given correct answer label is an extreme value, are selected.
  • The annotation device 10 further classifies the learning data other than the reference points S into data E, which is easy to accurately label with a correct answer, and data D, which is difficult to accurately label with a correct answer. For example, the annotation device 10 sets a reliability threshold and classifies a piece of data as data D if the variance of its labels is 1.0 or more, and as data E otherwise. In the example of FIG. 4, the annotation device 10 classifies x0 and x3 into data E because their variance is less than 1.0, and classifies x4 and xN into data D because their variance is 1.0 or more.
  • When the annotation device 10 uses the estimation result of a separately created machine learning model as the correct answer label, for example, learning data for which the posterior probability of the estimated value is 80% or more is set as a reference point S, learning data with a posterior probability of 50% or more and less than 80% is classified as data E, and learning data with a posterior probability of less than 50% is classified as data D.
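  • The posterior-probability rule for a machine learning annotator can be sketched as follows; the thresholds come from the description, while the example posterior values are hypothetical.

```python
# Sketch of the posterior-probability classification rule:
# posterior >= 80% -> reference point S, 50% to under 80% -> data E,
# under 50% -> data D.

def classify_by_posterior(posterior):
    if posterior >= 0.80:
        return "S"  # highly reliable estimate: usable as reference data
    if posterior >= 0.50:
        return "E"  # moderately reliable: easier to label accurately
    return "D"      # unreliable: difficult to label accurately


print(classify_by_posterior(0.92))  # S
print(classify_by_posterior(0.65))  # E
print(classify_by_posterior(0.30))  # D
```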
  • the annotation device 10 can statically or dynamically change the classification method such as the number of classifications of learning data and the threshold value.
  • The annotation device 10 generates, as the second learning data, data groups each including three types of data: the reference points S, the data E, and the data D. The reference points S must include data for each of the extreme values "1" and "5", and the sources of the data (speakers, people in moving images, objects, etc.) in each data group must be the same.
  • For example, the annotation device 10 generates a data group set P having the data groups {p0, p1, ..., pM} as elements. Suppose the data group p0 includes {x0, x1, x2, x3, x4, xN} (see FIGS. 3 and 4) as elements, and the data group pM includes {xa, xb, xc, xd, xe, xf} (not shown in FIGS. 3 and 4) as elements. In the example of FIG. 4, the annotation device 10 selects the reference points S {x1, x2}, the data E {x0, x3}, and the data D {x4, xN} as the data group p0. Similarly, the reference points S {xa, xb}, the data E {xc, xd}, and the data D {xe, xf} are selected as the data group pM.
  • The number of data in each data group and the selection method can be arbitrarily changed as long as each group includes reference points S with different extreme values as described above and satisfies the same-source condition.
  • the number of data may be a random number within a certain range.
  • For example, two pieces of data E and two pieces of data D may be prepared according to whether the average value of the respective annotation results is closer to "1" or to "5".
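  • The data-group conditions above can be sketched as follows: the reference points must cover both extreme values and are placed first so that they are viewed first. This is an illustrative sketch; the function and variable names are assumptions, and the same-source condition is taken as already satisfied by the caller.

```python
# Hypothetical assembly of one data group of the second learning data.

def make_group(reference, easy, hard):
    """Return the group ordered as reference points S, data E, data D."""
    extreme_labels = {label for _name, label in reference}
    if not {1, 5} <= extreme_labels:
        raise ValueError("reference data must include both extreme values")
    return [name for name, _label in reference] + easy + hard


# Data group p0 from the example: S = {x1 ("5"), x2 ("1")},
# E = {x0, x3}, D = {x4, xN}
p0 = make_group([("x1", 5), ("x2", 1)], ["x0", "x3"], ["x4", "xN"])
print(p0)  # ['x1', 'x2', 'x0', 'x3', 'x4', 'xN']
```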
  • the annotation device 10 distributes the data group selected as described above to the annotator 20 as the second learning data.
  • the reference point S is first distributed for each data group, and then the data E and the data D are distributed to the annotator 20.
  • For example, the annotation device 10 distributes the data group p0 in the order of the reference points S {x1, x2}, the data E {x0, x3}, and the data D {x4, xN}, and distributes the data group pM in the order of the reference points S {xa, xb}, the data E {xc, xd}, and the data D {xe, xf}.
  • The annotation device 10 may instruct the annotator 20 to view the data in the order of data E and then data D, or may distribute the data E and the data D so that they are viewed in random order. Further, the annotation device 10 may present the correct answer label together with the reference points S delivered first and instruct the annotator 20 not to label them, or may instruct the annotator 20 to give correct answer labels to all the learning data regardless of classification.
  • the annotation device 10 acquires the second training data to which the correct answer label is given by the annotator 20.
  • For example, the annotation device 10 acquires, for each data group and for each annotator, the correct answer labels of the data E and the data D, excluding the reference points S; for "annotator 01", for example, the labels for the data group p0 are acquired.
  • the final processing of the correct answer label given to the second learning data is not particularly limited.
  • For example, the annotation device 10 may take a majority vote for each piece of learning data and adopt the most frequent correct answer label as the final correct answer label, or may calculate the average score of the numerical values and adopt that value as the final correct answer label.
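  • The two finalization options mentioned above can be sketched as functions; the names and the sample votes are illustrative.

```python
# Majority vote over the second correct answer labels, or their average.
from collections import Counter


def majority_label(labels):
    """Most frequent correct answer label."""
    return Counter(labels).most_common(1)[0][0]


def average_label(labels):
    """Average score of the numeric correct answer labels."""
    return sum(labels) / len(labels)


votes = [4, 4, 5]  # hypothetical second correct answer labels for one datum
print(majority_label(votes))  # 4
print(average_label(votes))   # ~4.33
```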
  • FIG. 6 is a flowchart showing an example of the flow of annotation processing according to the first embodiment.
  • the acquisition unit 15a of the annotation device 10 acquires the first learning data including voice, image, video, etc. from the database 30 or the like (step S101).
  • the acquisition unit 15a may acquire the first learning data from the storage unit 14.
  • the acquisition unit 15a may process the original data such as voice data acquired from the database 30 or the storage unit 14, divide the original data into an appropriate size as learning data, or classify the data appropriately.
  • the acquisition unit 15a may acquire voice data or the like from the outside via the input unit 11.
  • the first distribution unit 15b distributes the first learning data to the annotator 20 (step S102). At this time, the first distribution unit 15b may select the annotator 20 to be distributed according to the first learning data. Further, the acquisition unit 15a acquires the first learning data to which the first correct answer label is given by the annotator 20 (step S103).
  • the classification unit 15c classifies the first learning data based on the reliability of the first correct answer label (step S104). Further, the generation unit 15d generates the second learning data from the classified first learning data (step S105). Subsequently, the second distribution unit 15e distributes the second learning data to the annotator 20 (step S106).
  • the second distribution unit 15e can also distribute the second learning data to annotators other than the annotator 20 that has distributed the first learning data.
  • For example, while the first learning data is distributed to human annotators, the second distribution unit 15e can also distribute the second learning data to an annotator that is a machine learning model.
  • the acquisition unit 15a acquires the second learning data to which the second correct answer label is given by the annotator 20 (step S107), and the process ends. If the accuracy of the acquired second correct label is not sufficient, the processes of steps S104 to S107 may be performed again.
  • FIG. 7 is a flowchart showing an example of the flow of the first learning data classification process according to the first embodiment.
  • the acquisition unit 15a of the annotation device 10 acquires the first correct answer label given to the first learning data from the annotator 20 (step S201).
  • the classification unit 15c performs the classification processing of steps S208 to S210 based on the processing of steps S203 to S205.
  • When the answers of all the annotators match (step S203: affirmative) and the answer is an extreme value (step S204: affirmative), the classification unit 15c classifies the first learning data to which the correct answer label is attached as a reference point S (step S208). When the answers of the annotators 20 include a mismatch (step S203: negative), or when the answer is not an extreme value (step S204: negative), the classification unit 15c performs the process of step S205.
  • When the variance of the answers of the annotators 20 is 1.0 or more (step S205: affirmative), the classification unit 15c classifies the first learning data to which the correct answer label is given into the data D (step S210). When the variance is less than 1.0 (step S205: negative), the classification unit 15c classifies the first learning data into the data E (step S209). When the classification process of steps S208 to S210 is completed, the classification unit 15c ends the process.
  • On the other hand, when the annotator 20 is a machine learning model (step S202: the annotator is a machine learning model), the classification unit 15c performs the classification processing of steps S208 to S210 based on the processing of steps S206 to S207.
  • When the posterior probability of the value that is the estimation result of the annotator 20 is 80% or more (step S206: affirmative), the classification unit 15c classifies the first learning data to which the correct answer label is given as a reference point S (step S208).
  • When the posterior probability of the value that is the estimation result of the annotator 20 is less than 80% (step S206: negative) but 50% or more (step S207: affirmative), the classification unit 15c classifies the first learning data to which the correct answer label is given into the data E (step S209).
  • When the posterior probability is less than 50% (step S207: negative), the classification unit 15c classifies the first learning data to which the correct answer label is given into the data D (step S210).
  • the classification unit 15c ends the process.
  • As described above, in this process, the first learning data used for machine learning is acquired, the acquired first learning data is distributed to a plurality of annotators, the first learning data is classified based on the reliability of the first correct answer label given to the first learning data by each annotator, and the classification result of the classified first learning data is delivered. Therefore, this process enables low-cost, high-precision annotation in supervised machine learning.
  • Further, in this process, the first learning data, including voice, images, or moving images, is acquired and distributed in a format in which a predetermined number is given as the first correct answer label, and the first learning data is classified into reference data, data that is easy to give an accurate correct answer label, or data that is difficult to give an accurate correct answer label, based on the variance of the first correct answer labels as the reliability. Therefore, in supervised machine learning, a highly reliable correct answer label can be assigned even when there is no comparison target, enabling low-cost, high-precision annotation.
  • Further, in this process, the first learning data is delivered to a machine learning model, and the first learning data is classified based on the posterior probability of the first correct answer label as the reliability. Therefore, in supervised machine learning, a highly reliable correct answer label can be assigned even when the annotator is not human, enabling annotation at lower cost and with higher accuracy.
  • Further, in this process, the second learning data is generated as a data group that includes a plurality of reference data having different extreme values, data that is easy to give an accurate correct answer label, and data that is difficult to give an accurate correct answer label, all sharing the same source, and the plurality of reference data are distributed first. Therefore, in supervised machine learning, a highly reliable correct answer label can be assigned efficiently even when there is no comparison target, enabling annotation at lower cost and with higher accuracy.
  • Further, in this process, the classification result is distributed to the plurality of annotators to which the first learning data was distributed, or to predetermined annotators other than those annotators. Therefore, in supervised machine learning, correct answer labels can be assigned with high reliability, efficiency, and flexibility even when there is no comparison target, enabling annotation at lower cost and with higher accuracy.
  • Each component of each of the illustrated devices according to the above embodiment is a functional concept and does not necessarily have to be physically configured as shown in the figures. That is, the specific form of distribution and integration of each device is not limited to the one shown in the figures, and all or part of each device may be functionally or physically distributed or integrated in arbitrary units according to various loads and usage conditions. Further, each processing function performed by each device may be realized by a CPU and a program analyzed and executed by the CPU, or may be realized as hardware by wired logic.
  • [Program] It is also possible to create a program in which the processing executed by the annotation device 10 described in the above embodiment is written in a language that can be executed by a computer. In this case, the same effects as those of the above embodiment can be obtained by having the computer execute the program. Further, the same processing as that of the above embodiment may be realized by recording the program on a computer-readable recording medium and having a computer read and execute the program recorded on the recording medium.
  • FIG. 8 is a diagram showing a computer that executes a program.
  • The computer 1000 includes, for example, a memory 1010, a CPU (Central Processing Unit) 1020, a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070, and these parts are connected by a bus 1080.
  • The memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM 1012, as illustrated in FIG. 8.
  • The ROM 1011 stores, for example, a boot program such as a BIOS (Basic Input Output System).
  • The hard disk drive interface 1030 is connected to the hard disk drive 1090, as illustrated in FIG. 8.
  • The disk drive interface 1040 is connected to the disk drive 1100, as illustrated in FIG. 8.
  • A removable storage medium such as a magnetic disk or an optical disk is inserted into the disk drive 1100.
  • The serial port interface 1050 is connected to, for example, a mouse 1110 and a keyboard 1120, as illustrated in FIG. 8.
  • The video adapter 1060 is connected to, for example, a display 1130, as illustrated in FIG. 8.
  • The hard disk drive 1090 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. That is, the above program is stored in, for example, the hard disk drive 1090 as a program module in which commands to be executed by the computer 1000 are written.
  • The various data described in the above embodiment are stored as program data in, for example, the memory 1010 or the hard disk drive 1090. Then, the CPU 1020 reads the program module 1093 and the program data 1094 stored in the memory 1010 and the hard disk drive 1090 into the RAM 1012 as needed, and executes the various processing procedures.
  • The program module 1093 and the program data 1094 related to the program are not limited to those stored in the hard disk drive 1090, and may be stored in, for example, a removable storage medium and read by the CPU 1020 via the disk drive or the like. Alternatively, the program module 1093 and the program data 1094 related to the program may be stored in another computer connected via a network (LAN (Local Area Network), WAN (Wide Area Network), etc.) and read by the CPU 1020 via the network interface 1070.
  • 10 Annotation device 11 Input unit 12 Output unit 13 Communication unit 14 Storage unit 15 Control unit 15a Acquisition unit 15b First distribution unit 15c Classification unit 15d Generation unit 15e Second distribution unit 20, 20A, 20B, 20C Annotator 30, 30A, 30B, 30C Database 100 Annotation system

Abstract

An annotation device (10) comprises: an acquisition unit (15a) that acquires first learning data pieces used in machine learning; a first distribution unit (15b) that distributes, to a plurality of annotators, the first learning data pieces acquired by the acquisition unit (15a); a classification unit (15c) that classifies the first learning data pieces on the basis of the reliability of first correct answer labels attached respectively to the first learning data pieces by the annotators; and a second distribution unit (15e) that distributes the classification results for the first learning data pieces classified by the classification unit (15c).

Description

Annotation device, annotation method, and annotation program
The present invention relates to an annotation device, an annotation method, and an annotation program.
Conventionally, supervised learning in machine learning requires learning data and corresponding correct labels. In many studies, annotation, the work of viewing data and attaching metadata to it, is performed by a plurality of workers.
For example, in the case of annotation of audio or video, a worker (as appropriate, an "annotator") views a presented audio or video clip of several seconds to several tens of seconds and attaches metadata that meets the specifications. Specifically, for annotation aimed at research and development of emotion recognition from speech, the annotator selects the most appropriate emotion for the audio heard; for object detection or object recognition in images, the annotator selects a region within the image containing an object and attaches a description of the object.
Conventional annotation methods can be divided according to whether the work involves a comparison target. When there is no comparison target, the annotator views a still image or a few seconds of audio or video and attaches metadata. This method has a low time cost, because the number of data viewings equals the total number of samples N. Moreover, for tasks that anyone can understand reliably even in a short time (e.g., transcription, tagging of objects, or a state that is obviously angry no matter who sees or hears it), accurate annotation is possible.
On the other hand, when there is a comparison target, the annotator views long (several tens of seconds to several minutes) audio or video and attaches metadata on continuous and relative changes in events (see, for example, Non-Patent Document 3), or views a plurality of audio or video clips and assigns relative ranks or scores (see, for example, Non-Patent Document 4). Because there is an object to compare against, this method reduces variation among annotators and yields more accurate metadata.
However, the above conventional techniques cannot perform lower-cost and higher-accuracy annotation in supervised learning in machine learning. With an annotation method that has no comparison target, tasks that are difficult to understand from a short viewing cannot be annotated accurately, the labels assigned by different annotators vary widely, and the reliability of the annotation results is low.
Countermeasures exist for such problems, such as accounting for annotator quality and response tendencies, or reducing the influence of noise through multi-annotator voting, but these do not fundamentally solve the problem of raising the reliability of the annotation itself (see, for example, Non-Patent Documents 1 and 2). For example, when annotating the degree of concentration, the votes of multiple annotators tend to agree when the subject appears very concentrated or not concentrated at all, that is, in states that are obvious to anyone; but when it is hard to tell whether the subject is concentrating or not, accurate annotation is difficult, and as a result subtle differences cannot be expressed.
On the other hand, with an annotation method that uses comparison targets, long or large amounts of data must be viewed, and the annotation requires an enormous cost. For example, when several data items are viewed together as a combination, selecting n items at a time from all N samples can produce up to NCn (N choose n) combinations. It may be possible to reduce the number of combinations while maintaining annotation quality by drawing on psychological experimental methods, but doing so requires careful consideration of which combinations to exclude.
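As a rough illustration of this combinatorial cost, the number of candidate n-item comparison sets NCn can be computed directly; the corpus size N = 1000 below is an illustrative value, not one from the embodiment:

```python
from math import comb

# Number of n-item comparison sets that can be drawn from N samples.
# N = 1000 is an illustrative corpus size.
N = 1000
for n in (2, 3, 4):
    print(n, comb(N, n))  # comb(N, n) = N! / (n! * (N - n)!)
```

Even for a modest corpus the count explodes (roughly half a million pairs at n = 2, billions of sets at n = 4), which is why pruning the combinations to view must be done carefully.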
In order to solve the above-mentioned problems and achieve the object, an annotation device according to the present invention includes: an acquisition unit that acquires first learning data used for machine learning; a first distribution unit that distributes the first learning data acquired by the acquisition unit to a plurality of annotators; a classification unit that classifies the first learning data on the basis of the reliability of first correct labels respectively assigned to the first learning data by the annotators; and a second distribution unit that distributes the classification result of the first learning data classified by the classification unit.
An annotation method according to the present invention is an annotation method executed by an annotation device, and includes: an acquisition step of acquiring first learning data used for machine learning; a first distribution step of distributing the first learning data acquired in the acquisition step to a plurality of annotators; a classification step of classifying the first learning data on the basis of the reliability of first correct labels respectively assigned to the first learning data by the annotators; and a second distribution step of distributing the classification result of the first learning data classified in the classification step.
An annotation program according to the present invention causes a computer to execute: an acquisition step of acquiring first learning data used for machine learning; a first distribution step of distributing the first learning data acquired in the acquisition step to a plurality of annotators; a classification step of classifying the first learning data on the basis of the reliability of first correct labels respectively assigned to the first learning data by the annotators; and a second distribution step of distributing the classification result of the first learning data classified in the classification step.
According to the present invention, lower-cost and higher-accuracy annotation can be performed in supervised learning in machine learning.
FIG. 1 is a diagram showing a configuration example of the annotation system according to the first embodiment. FIG. 2 is a block diagram showing a configuration example of the annotation device according to the first embodiment. FIG. 3 is a diagram showing an example of learning data according to the first embodiment. FIG. 4 is a diagram showing an example of the first learning data and the first correct labels according to the first embodiment. FIG. 5 is a diagram showing an example of the second learning data and the second correct labels according to the first embodiment. FIG. 6 is a flowchart showing an example of the flow of annotation processing according to the first embodiment. FIG. 7 is a flowchart showing an example of the flow of classification processing of the first learning data according to the first embodiment. FIG. 8 is a diagram showing a computer that executes the program.
Hereinafter, embodiments of an annotation device, an annotation method, and an annotation program according to the present invention will be described in detail with reference to the drawings. The present invention is not limited to the embodiments described below.
[First Embodiment]
Hereinafter, the configuration of the annotation system according to the present embodiment, the configuration of the annotation device, a specific example of annotation processing, the flow of annotation processing, and the flow of data classification processing will be described in order, and finally the effects of the present embodiment will be described.
[Configuration of the annotation system]
The configuration of the annotation system 100 (as appropriate, "the present system") according to the present embodiment will be described in detail with reference to FIG. 1. FIG. 1 is a diagram showing an example of the annotation system according to the first embodiment. The annotation system 100 has an annotation device 10 such as a server, annotators 20 (20A, 20B, 20C) such as various terminals, and various databases 30 (30A, 30B, 30C).
Here, the annotation device 10, the annotators 20, and the databases 30 are communicably connected by wire or wirelessly via a predetermined communication network (not shown). The annotation system 100 shown in FIG. 1 may include a plurality of annotation devices 10.
First, the annotation device 10 acquires, from the various databases 30, learning data necessary for research or development as first learning data (step S1). Here, the learning data to be acquired is data such as audio, images, or video, and is acquired in a medium and at a scale that suit the purpose of the research or development.
Next, the annotation device 10 distributes the acquired first learning data to the annotators 20 (step S2). Here, each annotator 20 is a terminal that assigns a correct label to the distributed learning data and the user of that terminal, but is not particularly limited to this. An annotator 20 may be a separately created machine learning model capable of assigning specific correct labels.
Subsequently, the annotators 20 assign correct labels (first correct labels) to the distributed first learning data (step S3). The annotation device 10 then acquires the first learning data to which the correct labels have been assigned (step S4).
After that, the annotation device 10 classifies the first learning data on the basis of the first correct labels (step S5). At this time, based on the answers obtained from the annotators 20, the annotation device 10 selects learning data to which reliable correct labels have been assigned as reference points (as appropriate, "reference data") S. The annotation device 10 further classifies the learning data other than the reference points S into data to which an accurate correct label is easy to assign (as appropriate, "data D") and data to which an accurate correct label is difficult to assign (as appropriate, "data E").
Further, the annotation device 10 generates second learning data from the classified first learning data (step S6). At this time, the annotation device 10 generates data groups each including a reference point S, data E, and data D that originate from the same source. The classification of the first learning data and the generation of the second learning data will be described later.
Then, the annotation device 10 distributes the generated second learning data to the annotators 20 (step S7). At this time, when distributing a data group of the second learning data, the annotation device 10 distributes the data to the annotators 20 so that the reference point S is viewed first, followed by the data E and the data D. The annotators 20 assign correct labels (second correct labels) to the distributed second learning data (step S8). Finally, the annotation device 10 acquires the second learning data to which the correct labels have been assigned (step S9).
In the annotation system 100 according to the present embodiment, the annotation device 10 includes, in each data group, reliable data on the event to be labeled, and makes this explicit. The annotators 20 can therefore use those data as comparison targets, which realizes more accurate annotation.
[Configuration of the annotation device]
The configuration of the annotation device 10 according to the present embodiment will be described in detail with reference to FIG. 2. FIG. 2 is a block diagram showing a configuration example of the annotation device according to the present embodiment. The annotation device 10 includes an input unit 11, an output unit 12, a communication unit 13, a storage unit 14, and a control unit 15.
The input unit 11 handles the input of various information to the annotation device 10. The input unit 11 is, for example, a mouse or a keyboard, and receives input such as setting information for the annotation device 10. The output unit 12 handles the output of various information from the annotation device 10. The output unit 12 is, for example, a display, and outputs setting information and the like stored in the annotation device 10.
The communication unit 13 handles data communication with other devices. For example, the communication unit 13 performs data communication with each communication device. The communication unit 13 can also perform data communication with a terminal of an operator (not shown).
The storage unit 14 stores various information referred to when the control unit 15 operates and various information acquired when the control unit 15 operates. Here, the storage unit 14 is, for example, a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or a storage device such as a hard disk or an optical disk. In the example of FIG. 2, the storage unit 14 is installed inside the annotation device 10, but it may be installed outside the annotation device 10, and a plurality of storage units may be installed.
The storage unit 14 stores the first learning data acquired from the databases 30 described later, the first learning data with the first correct labels acquired from the annotators 20, the classification results produced by the classification unit 15c of the control unit 15, the second learning data generated by the generation unit 15d, and the second learning data with the second correct labels acquired from the annotators 20, as well as, as information on the annotators 20, user names, identification numbers of machine learning models, and the like.
The control unit 15 controls the entire annotation device 10. The control unit 15 includes an acquisition unit 15a, a first distribution unit 15b, a classification unit 15c, a generation unit 15d, and a second distribution unit 15e. Here, the control unit 15 is, for example, an electronic circuit such as a CPU (Central Processing Unit) or an MPU (Micro Processing Unit), or an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).
The acquisition unit 15a acquires the first learning data used for machine learning. For example, the acquisition unit 15a acquires first learning data including audio, images, or video. The acquisition unit 15a acquires the first learning data from the databases 30. The acquisition unit 15a also acquires learning data with correct labels from the annotators 20. Further, the acquisition unit 15a stores the first learning data, the learning data with correct labels, and the like in the storage unit 14.
The first distribution unit 15b distributes the first learning data acquired by the acquisition unit 15a to the plurality of annotators 20. For example, the first distribution unit 15b distributes the first learning data in a format in which a predetermined number is to be assigned as the first correct label. The first distribution unit 15b may also distribute the first learning data to a machine learning model serving as an annotator 20. Detailed processing of the first learning data and the first correct labels will be described later.
The classification unit 15c classifies the first learning data on the basis of the reliability of the first correct labels respectively assigned to the first learning data by the annotators. For example, the classification unit 15c classifies the first learning data into reference data, data to which an accurate correct label is easy to assign, or data to which an accurate correct label is difficult to assign, based on the variance of the first correct labels as the reliability. The classification unit 15c also classifies the first learning data based on the posterior probability of the first correct label as the reliability. Further, the classification unit 15c stores the calculation results of the reliability of the first correct labels and the classification results based on the reliability in the storage unit 14.
Here, when the annotators 20 are people, the reliability is, for example, the variance of the numerical correct labels given by the annotators to a given piece of learning data, but it is not particularly limited to this. The index used as the reliability need only represent the variation of the numerical values; the smaller the variation, the higher the reliability of the correct label. When the annotator 20 is a machine learning model, the reliability is, for example, the posterior probability of the numerical value that is the estimation result of the machine learning model for a given piece of learning data, but it is not particularly limited to this. The index used as the reliability need only represent the accuracy of the estimation result of the machine learning model; the higher the accuracy of the estimation result, the higher the reliability of the correct label.
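As a minimal sketch of the variance-based branch of this classification, a sample could be routed to S, D, or E as below; the zero-variance rule for reference data, the threshold value, and the function name are illustrative assumptions, since the embodiment only states that lower variance means higher reliability:

```python
from statistics import pvariance

def classify_sample(labels, threshold=1.0):
    """Classify one piece of learning data from the numeric correct labels
    its annotators assigned (e.g. 1-5 concentration scores).

    The zero-variance rule and the threshold are illustrative choices.
    """
    v = pvariance(labels)  # population variance of the assigned labels
    if v == 0.0:
        return "S"  # unanimous labels: candidate reference data
    return "D" if v <= threshold else "E"  # easy vs. hard to label

print(classify_sample([5, 5, 5]))  # unanimous -> "S"
print(classify_sample([4, 5, 4]))  # small spread -> "D"
print(classify_sample([1, 3, 5]))  # large spread -> "E"
```

In this sketch, the same function could not serve a machine-learning-model annotator; for that branch the posterior probability of the predicted label would replace the variance as the reliability index.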
The generation unit 15d generates, as the classification result, second learning data that is a data group including a plurality of reference data having different extreme values, data to which an accurate correct label is easy to assign, and data to which an accurate correct label is difficult to assign, where all of the data originate from the same source. Further, the generation unit 15d stores the second learning data and other classification results in the storage unit 14.
Here, the extreme values are, for example, the minimum number "1" and the maximum number "5" in the case of learning data in a format in which the degree of a specific state, such as the degree of concentration, is judged as a correct label on a five-point scale {1, 2, 3, 4, 5}, but they are not particularly limited to these. An extreme value need only be a number indicating that an annotator 20 can clearly judge the state to be extreme, and is not limited to the minimum or maximum of the range of numbers set in advance for the correct label.
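The generation step above can be sketched as follows; the record layout ("id", "source", "cls", "label") and the rule requiring at least two distinct extreme reference labels are illustrative assumptions for the sketch, not details fixed by the embodiment:

```python
def build_second_learning_data(classified):
    """Build per-source data groups for the second annotation round.

    `classified` is a list of records like
    {"id": "x0", "source": "spk1", "cls": "S", "label": 5},
    where "cls" is "S" (reference), "D" (easy), or "E" (hard).
    A group is emitted only when its source provides reference data with
    differing extreme values plus at least one D and one E item.
    """
    by_source = {}
    for rec in classified:
        by_source.setdefault(rec["source"], []).append(rec)

    groups = {}
    for source, items in by_source.items():
        refs = [r for r in items if r["cls"] == "S"]
        others = [r for r in items if r["cls"] != "S"]
        has_extremes = len({r["label"] for r in refs}) >= 2
        has_d = any(r["cls"] == "D" for r in others)
        has_e = any(r["cls"] == "E" for r in others)
        if has_extremes and has_d and has_e:
            # reference data first, so annotators view S before D and E
            groups[source] = refs + others
    return groups

sample = [
    {"id": "x0", "source": "spk1", "cls": "S", "label": 1},
    {"id": "x1", "source": "spk1", "cls": "S", "label": 5},
    {"id": "x2", "source": "spk1", "cls": "D", "label": 4},
    {"id": "x3", "source": "spk1", "cls": "E", "label": 3},
    {"id": "x4", "source": "spk2", "cls": "D", "label": 2},  # no refs: dropped
]
groups = build_second_learning_data(sample)
print(sorted(groups))  # -> ['spk1']
```

Ordering the reference data first mirrors the distribution rule that annotators view the reference point S before data E and data D.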
The second distribution unit 15e distributes the classification result of the first learning data classified by the classification unit 15c. For example, the second distribution unit 15e distributes the plurality of reference data having different extreme values first. The second distribution unit 15e distributes the classification result to the plurality of annotators to which the first learning data was distributed, or to predetermined annotators other than that plurality of annotators.
Here, the classification result is the first learning data classified by the classification unit 15c based on the reliability of the assigned correct labels; for example, it is learning data labeled with the three classes of reference point S (reference data), data D (data to which an accurate correct label is easy to assign), and data E (data to which an accurate correct label is difficult to assign), but it is not particularly limited to this. The classification result may be learning data labeled with the reliability of the correct labels, or learning data selected by the generation unit 15d.
[Specific example of annotation processing]
A specific example of the annotation processing of the annotation device 10 according to the present embodiment will be described with reference to FIGS. 3 to 5. FIG. 3 is a diagram showing an example of learning data according to the first embodiment. FIG. 4 is a diagram showing an example of the first learning data and the first correct labels according to the first embodiment. FIG. 5 is a diagram showing an example of the second learning data and the second correct labels according to the first embodiment.
(First annotation processing)
 First, the first annotation processing, from the acquisition of the first learning data to the acquisition of the first learning data with the first correct labels attached, will be described. To begin with, the first learning data that the annotation device 10 acquires from the database 30 or the like will be described with reference to FIG. 3. The first learning data to be acquired is data such as audio, images, or video, acquired in a medium and at a scale suited to the purpose of the research or development. For example, when annotating toward realizing concentration-level estimation from speech, the annotation device 10 acquires audio data from an audio database, within the database 30, that stores audio data.
 FIG. 3 shows a data set X holding audio data; the data set X contains the audio data {x0, x1, x2, x3, x4, ..., xN}. The audio data shown in FIG. 3 represents each speech waveform as the relationship between elapsed time and audio signal strength.
 The following description covers annotation processing that uses audio data as the first learning data, but the type of learning data is not particularly limited. Besides audio data, the first learning data may be image data, video data, or a combination thereof. Furthermore, the first learning data may be data obtained by converting the above audio data or the like into numerical or text form.
 Next, the first learning data that the annotation device 10 distributes to the annotators 20, and the first correct labels attached to the first learning data that the annotation device 10 acquires from the annotators 20, will be described with reference to FIG. 4. For example, when annotating toward realizing concentration-level estimation from speech, the annotation device 10 sets a five-level concentration scale in advance ("1": not concentrating, "2": slightly not concentrating, "3": neutral (flat), "4": somewhat concentrating, "5": concentrating), and distributes to the annotators 20 first learning data that asks them to assign, to each piece of audio data held in the data set X, the correct label indicating which concentration level fits best.
 When annotating toward realizing concentration-level estimation from speech, the annotation device 10 distributes, for example, learning data for judging on the five-level scale, from audio data of questions and dialogue between a teacher and students during a lesson, whether the student who was asked was concentrating, not concentrating, and to what degree. When attaching correct labels about concentration from image or video data, the annotation device 10 may distribute learning data for judging the concentration level by having the annotators 20 read the students' facial expressions and the like from images or videos taken during the lesson.
 The annotation device 10 then acquires the first learning data to which the annotators 20 have attached correct labels. FIG. 4 shows the correct labels that "annotator 01" to "annotator 03" assigned to the audio data x0 to xN of the data set X (see "ANNOT1(X)" in FIG. 4). For example, the correct labels that "annotator 01" to "annotator 03" assigned to the audio data x0 are "2", "1", and "1", respectively.
(Second annotation processing)
 Second, the second annotation processing, from the classification of the first learning data with the first correct labels attached to the acquisition of the second learning data with the second correct labels attached, will be described. First, a specific example of the classification processing of the first learning data based on the reliability of the correct labels acquired from the annotators 20 will be described with reference to FIG. 4. The annotation device 10 calculates the mean and variance of the correct labels attached to each piece of audio data.
 In FIG. 4, x0 has a mean of 1.3 and a variance of 0.3 (small variance). Similarly, x1 has a mean of 5 and a variance of 0 (all annotators agree), x2 has a mean of 1 and a variance of 0 (all annotators agree), x3 has a mean of 3.3 and a variance of 0.3 (small variance), x4 has a mean of 4.0 and a variance of 1.0 (large variance), and xN has a mean of 1.6 and a variance of 1.3 (large variance).
 At this point, the annotation device 10 selects, from the answers acquired from the annotators 20, the learning data to which reliable correct labels have been attached as reference points S. In the example of FIG. 4, x1 (extreme value "5") and x2 (extreme value "1") are selected, since all annotators' answers agree and the assigned correct-label values are extreme values.
 The annotation device 10 further classifies the learning data other than the reference points S into data E, data that is easy to label accurately, and data D, data that is difficult to label accurately. For example, the annotation device 10 sets a reliability threshold: data whose variance is 1.0 or more is classified as data D, and the rest as data E. In the example of FIG. 4, the annotation device 10 classifies x0 and x3 as data E because their variance is less than 1.0, and classifies x4 and xN as data D because their variance is 1.0 or more.
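As an illustration only (not part of the patent text), the mean/variance computation and the three-way split into reference point S, data E, and data D described above could be sketched as follows. The label sets for x3, x4, and xN are assumed values chosen to reproduce the means and variances of FIG. 4, and the sample variance is used because it matches those figures:

```python
from statistics import variance

LABEL_MIN, LABEL_MAX = 1, 5   # five-level concentration scale
VAR_THRESHOLD = 1.0           # reliability threshold from the example

def classify(labels):
    """Classify one item from its annotator labels into S, E, or D."""
    v = variance(labels)      # sample variance, matching FIG. 4's values
    # Reference point S: all annotators agree and the label is an extreme value.
    if v == 0 and labels[0] in (LABEL_MIN, LABEL_MAX):
        return "S"
    # Large variance means the item is hard to label accurately (data D).
    return "D" if v >= VAR_THRESHOLD else "E"

answers = {
    "x0": [2, 1, 1],   # labels from FIG. 4
    "x1": [5, 5, 5],
    "x2": [1, 1, 1],
    "x3": [3, 3, 4],   # assumed; gives mean 3.3, variance 0.3
    "x4": [5, 4, 3],   # assumed; gives mean 4.0, variance 1.0
    "xN": [1, 1, 3],   # assumed; gives mean 1.6, variance 1.3
}
result = {name: classify(labels) for name, labels in answers.items()}
# result: x1, x2 -> "S"; x0, x3 -> "E"; x4, xN -> "D"
```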
 When the annotation device 10 uses, as correct labels, estimation results from a separately created machine learning model, it classifies, for example, learning data for which the posterior probability of the estimated value is 80% or more as reference points S, learning data with a posterior probability of 50% or more and less than 80% as data E, and learning data with a posterior probability of less than 50% as data D. The annotation device 10 can also change the classification scheme, such as the number of classes and the thresholds, either statically or dynamically.
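When a machine learning model serves as the annotator, the same split can be driven by the posterior probability of the estimated value. A minimal sketch under the thresholds stated above (80% and 50%); the function name and the keyword arguments for changing the thresholds statically or dynamically are assumptions:

```python
def classify_by_posterior(posterior, high=0.80, low=0.50):
    """Classify an item from the posterior probability of the model's estimate."""
    if posterior >= high:
        return "S"            # reliable enough to serve as reference data
    return "E" if posterior >= low else "D"

# Thresholds can be changed statically or dynamically, e.g.:
# classify_by_posterior(0.75, high=0.90, low=0.40)
```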
 Next, a specific example of the generation and distribution processing of the second learning data based on the classification of the first learning data will be described with reference to FIG. 5. The annotation device 10 generates, as the second learning data, data groups each containing the three types of data: reference points S, data E, and data D. Each data group must contain reference points S with both extreme values, "1" and "5". In addition, the data in each data group must come from the same source (the same speaker, the same person appearing in the video, the same object, and so on).
 For example, the annotation device 10 generates a data group set P whose elements are the data groups {p0, p1, ..., pM}. Here, the data group p0 contains {x0, x1, x2, x3, x4, xN} as elements (see FIGS. 3 and 4), and the data group pM contains {xa, xb, xc, xd, xe, xf} as elements (not shown in FIGS. 3 and 4). In the example of FIG. 5, the annotation device 10 selects, for the data group p0, the reference points S {x1, x2}, the data E {x0, x3}, and the data D {x4, xN}; for the data group pM, it selects the reference points S {xa, xb}, the data E {xc, xd}, and the data D {xe, xf}.
 Note that the number of data items in each data group and the selection method can be changed arbitrarily, as long as the group contains reference points S with different extreme values as described above and satisfies the same-source condition. For example, the number of data items may be chosen at random within a certain range. The group may also contain two items each of data E and data D, chosen so that the average of their annotation results leans toward "1" or "5".
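As an illustrative sketch only, one way to assemble a data group satisfying the two constraints, reference points S with both extreme values and a single common source, might look like this. The input format, per-class counts, and helper name are assumptions, not part of the patent text:

```python
def build_group(items, n_e=2, n_d=2):
    """Assemble one data group from classified items.

    `items` maps data id -> (classification, label, source), e.g.
    {"x1": ("S", 5, "spk0"), ...}. Raises ValueError if the constraints
    (single source, reference points with both extremes) are not met.
    """
    sources = {src for _, _, src in items.values()}
    if len(sources) != 1:
        raise ValueError("all data in a group must share one source")
    s = [k for k, (c, _, _) in items.items() if c == "S"]
    if {items[k][1] for k in s} != {1, 5}:
        raise ValueError("both extreme reference points (1 and 5) are required")
    e = [k for k, (c, _, _) in items.items() if c == "E"][:n_e]
    d = [k for k, (c, _, _) in items.items() if c == "D"][:n_d]
    return s + e + d   # delivery order: S first, then data E, then data D

group_p0 = build_group({
    "x1": ("S", 5, "spk0"), "x2": ("S", 1, "spk0"),
    "x0": ("E", 1, "spk0"), "x3": ("E", 3, "spk0"),
    "x4": ("D", 4, "spk0"), "xN": ("D", 2, "spk0"),
})
# group_p0 == ["x1", "x2", "x0", "x3", "x4", "xN"]
```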
 Thereafter, the annotation device 10 distributes the data groups selected as described above to the annotators 20 as the second learning data. When distributing the data groups of the second learning data, the annotation device 10 distributes, for each data group, the reference points S first, followed by the data E and the data D. In the example of FIG. 5, when distributing the data group p0, the annotation device 10 distributes the reference points S {x1, x2}, the data E {x0, x3}, and the data D {x4, xN} in that order; when distributing the data group pM, it distributes the reference points S {xa, xb}, the data E {xc, xd}, and the data D {xe, xf} in that order.
 The annotation device 10 may instruct the annotators 20 to view the data in the order of data E, then data D. Alternatively, the annotation device 10 may distribute data E and data D so that they are viewed in random order. Furthermore, for the reference points S distributed first, the annotation device 10 may present the correct labels at the time of distribution and instruct the annotators 20 not to assign correct labels to them, or it may instruct the annotators to assign correct labels to all of the learning data regardless of classification.
 Finally, the annotation device 10 acquires the second learning data to which the annotators 20 have attached correct labels. In the example of FIG. 5, the annotation device 10 acquires, for each data group and for each annotator, the correct labels of data E and data D, excluding the reference points S. For example, for "annotator 01", it acquires the correct labels {1, 4, 3, 2}, in order, for the learning data {x0, x3, x4, xN} of the data group p0, and the correct labels {2, 4, 3, 3}, in order, for the learning data {xc, xd, xe, xf} of the data group pM (see "ANNOT2(X)" in FIG. 5).
 The final processing of the correct labels attached to the second learning data is not particularly limited. The annotation device 10 may take a majority vote for each piece of learning data and adopt the most frequent correct label as the final one, or it may calculate the average of the numeric values and adopt that value as the final correct label.
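The two aggregation options just mentioned, majority vote or numeric averaging, can be written compactly. The tie-breaking behavior of the majority vote (first-seen among equally frequent labels) is an implementation assumption not specified in the text:

```python
from collections import Counter

def final_label_majority(labels):
    """Adopt the most frequent correct label as the final one."""
    return Counter(labels).most_common(1)[0][0]

def final_label_mean(labels):
    """Adopt the average of the numeric labels as the final value."""
    return sum(labels) / len(labels)

# For x0 in FIG. 4 the annotators answered 2, 1, 1:
# majority vote -> 1; mean -> about 1.3
```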
[Flow of annotation processing]
 The flow of the annotation processing according to the present embodiment will be described in detail with reference to FIG. 6. FIG. 6 is a flowchart showing an example of the flow of the annotation processing according to the first embodiment.
 First, the acquisition unit 15a of the annotation device 10 acquires, from the database 30 or the like, the first learning data including audio, images, video, and so on (step S101). At this time, the acquisition unit 15a may acquire the first learning data from the storage unit 14. The acquisition unit 15a may also process the original data, such as audio data acquired from the database 30 or the storage unit 14, dividing it into sizes appropriate for learning data or classifying it appropriately. Furthermore, the acquisition unit 15a may acquire audio data or the like from outside via the input unit 11.
 Next, the first distribution unit 15b distributes the first learning data to the annotators 20 (step S102). At this time, the first distribution unit 15b may select the annotators 20 to distribute to according to the first learning data. The acquisition unit 15a then acquires the first learning data to which the annotators 20 have attached the first correct labels (step S103).
 The classification unit 15c then classifies the first learning data based on the reliability of the first correct labels (step S104). The generation unit 15d generates the second learning data from the classified first learning data (step S105). Subsequently, the second distribution unit 15e distributes the second learning data to the annotators 20 (step S106).
 The second distribution unit 15e can also distribute the second learning data to annotators other than the annotators 20 to which the first learning data was distributed. For example, the second distribution unit 15e can distribute the first learning data to human annotators and the second learning data to an annotator that is a machine learning model.
 Finally, the acquisition unit 15a acquires the second learning data to which the annotators 20 have attached the second correct labels (step S107), and the processing ends. If the accuracy of the acquired second correct labels is not sufficient, the processing of steps S104 to S107 may be performed again.
[Flow of the classification processing of the first learning data]
 The flow of the classification processing of the first learning data according to the present embodiment will be described in detail with reference to FIG. 7. FIG. 7 is a flowchart showing an example of the flow of the classification processing of the first learning data according to the first embodiment. First, the acquisition unit 15a of the annotation device 10 acquires, from the annotators 20, the first correct labels attached to the first learning data (step S201). Next, when the annotators 20 are human (step S202: the annotators are human), the classification unit 15c performs the classification processing of steps S208 to S210 based on the processing of steps S203 to S205.
 When all annotators' answers agree (step S203: Yes) and the answer is an extreme value (step S204: Yes), the classification unit 15c classifies the first learning data with that correct label as a reference point S (step S208). When the annotators' answers include disagreement (step S203: No), or when the agreed answer is not an extreme value (step S204: No), the classification unit 15c performs the processing of step S205.
 When the variance of the annotators' answers is 1.0 or more (step S205: Yes), the classification unit 15c classifies the first learning data with that correct label as data D (step S210), consistent with the classification example of FIG. 4. When the variance of the annotators' answers is less than 1.0 (step S205: No), it classifies the first learning data with that correct label as data E (step S209). When the classification processing of steps S208 to S210 is complete, the classification unit 15c ends the processing.
 Meanwhile, when the annotator 20 is a machine learning model (step S202: the annotator is a machine learning model), the classification unit 15c performs the classification processing of steps S208 to S210 based on the processing of steps S206 to S207. When the posterior probability of the value estimated by the annotator 20 is 80% or more (step S206: Yes), the classification unit 15c classifies the first learning data with that correct label as a reference point S (step S208).
 When the posterior probability of the value estimated by the annotator 20 is less than 80% (step S206: No) and 50% or more (step S207: Yes), the classification unit 15c classifies the first learning data with that correct label as data E (step S209). When the posterior probability of the estimated value is less than 50% (step S207: No), it classifies the first learning data with that correct label as data D (step S210). When the classification processing of steps S208 to S210 is complete, the classification unit 15c ends the processing.
[Effects of the first embodiment]
 First, in the annotation processing according to the present embodiment described above, the first learning data used for machine learning is acquired, the acquired first learning data is distributed to a plurality of annotators, the first learning data is classified based on the reliability of the first correct labels attached to it by each annotator, and the classification result of the classified first learning data is distributed. This processing therefore enables lower-cost, higher-accuracy annotation in supervised machine learning.
 Second, in the annotation processing according to the present embodiment described above, the first learning data including audio, images, or video is acquired; the first learning data is distributed in a format that has the annotators assign a predetermined number as the first correct label; and the first learning data is classified, using the variance of the first correct labels as the reliability, into reference data, data that is easy to label accurately, or data that is difficult to label accurately. This processing therefore enables the assignment of highly reliable correct labels in supervised machine learning even when there is no basis for comparison, achieving lower-cost, higher-accuracy annotation.
 Third, in the annotation processing according to the present embodiment described above, the first learning data is distributed to a machine learning model serving as an annotator, and the first learning data is classified using the posterior probability of the first correct label as the reliability. This processing therefore enables the assignment of highly reliable correct labels in supervised machine learning even when the annotator is not human, achieving lower-cost, higher-accuracy annotation.
 Fourth, in the annotation processing according to the present embodiment described above, second learning data is generated as the classification result: data groups that contain a plurality of pieces of reference data with different extreme values, data that is easy to label accurately, and data that is difficult to label accurately, with all data in a group coming from the same source; the plurality of pieces of reference data are distributed first. This processing therefore enables reliable and efficient assignment of correct labels in supervised machine learning even when there is no basis for comparison, achieving lower-cost, higher-accuracy annotation.
 Fifth, in the annotation processing according to the present embodiment described above, the classification result is distributed either to the plurality of annotators to which the first learning data was distributed, or to predetermined annotators other than them. This processing therefore enables reliable, efficient, and more flexible assignment of correct labels in supervised machine learning even when there is no basis for comparison, achieving lower-cost, higher-accuracy annotation.
[System configuration, etc.]
 Each component of the illustrated devices according to the above embodiment is functional and conceptual, and does not necessarily have to be physically configured as illustrated. That is, the specific form of distribution and integration of the devices is not limited to the illustrated one; all or part of them can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like. Furthermore, all or an arbitrary part of the processing functions performed by each device may be realized by a CPU and a program analyzed and executed by that CPU, or realized as hardware based on wired logic.
 Of the processes described in the above embodiment, all or part of a process described as automatic can also be performed manually, and all or part of a process described as manual can also be performed automatically by a known method. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters shown in the above documents and drawings can be changed arbitrarily unless otherwise specified.
[Program]
 It is also possible to create a program in which the processing executed by the annotation device 10 described in the above embodiment is written in a computer-executable language. In this case, the same effects as in the above embodiment can be obtained by having a computer execute the program. Furthermore, the same processing as in the above embodiment may be realized by recording such a program on a computer-readable recording medium and having a computer read and execute the program recorded on that medium.
 FIG. 8 is a diagram showing a computer that executes the program. As illustrated in FIG. 8, the computer 1000 includes, for example, a memory 1010, a CPU (Central Processing Unit) 1020, a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070, and these components are connected by a bus 1080.
 As illustrated in FIG. 8, the memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM 1012. The ROM 1011 stores, for example, a boot program such as a BIOS (Basic Input Output System). The hard disk drive interface 1030 is connected to a hard disk drive 1090, as illustrated in FIG. 8. The disk drive interface 1040 is connected to a disk drive 1100, as illustrated in FIG. 8. A removable storage medium such as a magnetic disk or an optical disc, for example, is inserted into the disk drive 1100. The serial port interface 1050 is connected to, for example, a mouse 1110 and a keyboard 1120, as illustrated in FIG. 8. The video adapter 1060 is connected to, for example, a display 1130, as illustrated in FIG. 8.
 Here, as illustrated in FIG. 8, the hard disk drive 1090 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. That is, the above program is stored in, for example, the hard disk drive 1090 as a program module in which commands to be executed by the computer 1000 are written.
 The various data described in the above embodiment are stored as program data in, for example, the memory 1010 or the hard disk drive 1090. The CPU 1020 then reads the program module 1093 and the program data 1094 stored in the memory 1010 or the hard disk drive 1090 into the RAM 1012 as needed, and executes the various processing procedures.
 The program module 1093 and the program data 1094 related to the program are not limited to being stored in the hard disk drive 1090; for example, they may be stored in a removable storage medium and read by the CPU 1020 via the disk drive or the like. Alternatively, the program module 1093 and the program data 1094 related to the program may be stored in another computer connected via a network (such as a LAN (Local Area Network) or a WAN (Wide Area Network)) and read by the CPU 1020 via the network interface 1070.
 The above embodiments and their modifications are included in the invention described in the claims and the scope of its equivalents, just as they are included in the technique disclosed in the present application.
 10 Annotation device
 11 Input unit
 12 Output unit
 13 Communication unit
 14 Storage unit
 15 Control unit
 15a Acquisition unit
 15b First distribution unit
 15c Classification unit
 15d Generation unit
 15e Second distribution unit
 20, 20A, 20B, 20C Annotator
 30, 30A, 30B, 30C Database
 100 Annotation system

Claims (7)

  1.  An annotation device comprising:
     an acquisition unit that acquires first learning data used for machine learning;
     a first distribution unit that distributes the first learning data acquired by the acquisition unit to a plurality of annotators;
     a classification unit that classifies the first learning data based on the reliability of first correct labels given to the first learning data by the respective annotators; and
     a second distribution unit that distributes a classification result of the first learning data classified by the classification unit.
  2.  The annotation device according to claim 1, wherein
     the acquisition unit acquires the first learning data including audio, images, or moving images,
     the first distribution unit distributes the first learning data in a format in which a predetermined number is given as the first correct label, and
     the classification unit classifies the first learning data, based on the variance of the first correct labels as the reliability, into reference data, data to which a correct label is easily assigned accurately, or data to which a correct label is difficult to assign accurately.
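The variance-based split in claim 2 can be sketched as follows. This is a minimal illustration assuming numeric labels on a fixed rating scale; the variance threshold and the extreme-value rule for reference data are assumptions chosen for demonstration, not values taken from the publication.

```python
import statistics

def classify_item(labels, var_thr=1.0, scale=(1, 5)):
    """Classify one data item by the variance of its annotator labels.

    labels: numeric labels (e.g., 1-5 ratings) given by multiple annotators.
    var_thr and the extreme-value rule are illustrative assumptions.
    """
    var = statistics.pvariance(labels)
    mean = statistics.mean(labels)
    if var > var_thr:
        return "hard"       # low agreement: hard to label accurately
    if mean in scale:       # unanimous extreme value: candidate reference data
        return "reference"
    return "easy"           # high agreement elsewhere: easy to label accurately
```

For example, unanimous extreme labels such as `[5, 5, 5]` would be treated as candidate reference data, while widely scattered labels such as `[1, 3, 5]` would be classified as hard to label accurately.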
  3.  The annotation device according to claim 1 or 2, wherein
     the first distribution unit distributes the first learning data to a machine learning model serving as one of the annotators, and
     the classification unit classifies the first learning data based on the posterior probability of the first correct label as the reliability.
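When a machine learning model acts as one of the annotators, as in claim 3, the model's posterior over classes can serve as the reliability. A hedged sketch using a softmax over raw class scores; the confidence threshold 0.8 is an assumed value, not one specified in the publication.

```python
import math

def posterior_confidence(logits):
    """Return the maximum softmax posterior of a model's class scores."""
    m = max(logits)                              # shift for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return max(e / total for e in exps)

def classify_by_posterior(logits, thr=0.8):
    """Treat a confident posterior as an easy item, an uncertain one as hard."""
    return "easy" if posterior_confidence(logits) >= thr else "hard"
```

A sharply peaked score vector yields a high posterior and an "easy" classification, while a flat one yields a posterior near 1/num_classes and a "hard" classification.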
  4.  The annotation device according to claim 2 or 3, further comprising a generation unit that generates second learning data, the second learning data being a data group that includes, as the classification result, a plurality of the reference data having different extreme values, the data to which a correct label is easily assigned accurately, and the data to which a correct label is difficult to assign accurately, each data item originating from the same source, wherein
     the second distribution unit distributes the plurality of reference data first.
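The ordering in claim 4 can be sketched minimally: assemble the second learning data so that the reference data are delivered first. The tuple representation and category names below are assumptions for illustration.

```python
def build_second_dataset(classified):
    """Order items so that reference data are distributed first.

    classified: list of (item, category) pairs drawn from the same source
    data group, with categories "reference", "easy", and "hard".
    Relative order within each group is preserved.
    """
    refs = [item for item, cat in classified if cat == "reference"]
    rest = [item for item, cat in classified if cat != "reference"]
    return refs + rest
```

Presenting the reference data (anchors with different extreme values) first gives each annotator a common frame of reference before the remaining items are labeled.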
  5.  The annotation device according to any one of claims 1 to 4, wherein the second distribution unit distributes the classification result to the plurality of annotators or to predetermined annotators other than the plurality of annotators.
  6.  An annotation method executed by an annotation device, the method comprising:
     an acquisition step of acquiring first learning data used for machine learning;
     a first distribution step of distributing the first learning data acquired in the acquisition step to a plurality of annotators;
     a classification step of classifying the first learning data based on the reliability of first correct labels given to the first learning data by the respective annotators; and
     a second distribution step of distributing a classification result of the first learning data classified in the classification step.
  7.  An annotation program that causes a computer to execute:
     an acquisition step of acquiring first learning data used for machine learning;
     a first distribution step of distributing the first learning data acquired in the acquisition step to a plurality of annotators;
     a classification step of classifying the first learning data based on the reliability of first correct labels given to the first learning data by the respective annotators; and
     a second distribution step of distributing a classification result of the first learning data classified in the classification step.
PCT/JP2020/046835 2020-12-15 2020-12-15 Annotation device, annotation method, and annotation program WO2022130516A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2020/046835 WO2022130516A1 (en) 2020-12-15 2020-12-15 Annotation device, annotation method, and annotation program
JP2022569380A JPWO2022130516A1 (en) 2020-12-15 2020-12-15

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2020/046835 WO2022130516A1 (en) 2020-12-15 2020-12-15 Annotation device, annotation method, and annotation program

Publications (1)

Publication Number Publication Date
WO2022130516A1 true WO2022130516A1 (en) 2022-06-23

Family

ID=82059222

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/046835 WO2022130516A1 (en) 2020-12-15 2020-12-15 Annotation device, annotation method, and annotation program

Country Status (2)

Country Link
JP (1) JPWO2022130516A1 (en)
WO (1) WO2022130516A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2019521443A (en) * 2016-06-30 2019-07-25 コニカ ミノルタ ラボラトリー ユー.エス.エー.,インコーポレイテッド Cell annotation method and annotation system using adaptive additional learning
JP2020144755A (en) * 2019-03-08 2020-09-10 日立オートモティブシステムズ株式会社 Operation device


Also Published As

Publication number Publication date
JPWO2022130516A1 (en) 2022-06-23

Similar Documents

Publication Publication Date Title
Kim et al. On learning associations of faces and voices
Bachorowski Vocal expression and perception of emotion
Sauter et al. Perceptual cues in nonverbal vocal expressions of emotion
Mariooryad et al. Exploring cross-modality affective reactions for audiovisual emotion recognition
Fung et al. ROC speak: semi-automated personalized feedback on nonverbal behavior from recorded videos
US20190147760A1 (en) Cognitive content customization
US10353996B2 (en) Automated summarization based on physiological data
US9922644B2 (en) Analysis of professional-client interactions
Sapru et al. Automatic recognition of emergent social roles in small group interactions
Jones et al. Good vibrations: Human interval timing in the vibrotactile modality
Rao S. B et al. Automatic assessment of communication skill in non-conventional interview settings: A comparative study
Yordanova et al. Automatic detection of everyday social behaviours and environments from verbatim transcripts of daily conversations
Zhang et al. On rater reliability and agreement based dynamic active learning
Harvill et al. Quantifying emotional similarity in speech
Siew The influence of 2-hop network density on spoken word recognition
Okada et al. Predicting performance of collaborative storytelling using multimodal analysis
US20220051670A1 (en) Learning support device, learning support method, and recording medium
WO2022130516A1 (en) Annotation device, annotation method, and annotation program
Lavan et al. Speaker sex perception from spontaneous and volitional nonverbal vocalizations
Metze et al. A review of personality in voice-based man machine interaction
Singh et al. A Survey on: Personality Prediction from Multimedia through Machine Learning
Viegas et al. Entheos: A multimodal dataset for studying enthusiasm
CN113901793A (en) Event extraction method and device combining RPA and AI
Schrank et al. Automatic detection of uncertainty in spontaneous german dialogue
CN113691382A (en) Conference recording method, conference recording device, computer equipment and medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20965904

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022569380

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20965904

Country of ref document: EP

Kind code of ref document: A1