WO2020045341A1 - Selection device, learning device, selection method, learning method, and program - Google Patents

Selection device, learning device, selection method, learning method, and program

Info

Publication number
WO2020045341A1
Authority
WO
WIPO (PCT)
Prior art keywords
evaluator
learning
value
evaluation
data
Application number
PCT/JP2019/033290
Other languages
French (fr)
Japanese (ja)
Inventor
歩相名 神山
厚志 安藤
亮 増村
哲 小橋川
Original Assignee
日本電信電話株式会社
Application filed by 日本電信電話株式会社
Publication of WO2020045341A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 Administration; Management
    • G06Q 10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q 50/10 Services
    • G06Q 50/20 Education


Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Administration (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Development Economics (AREA)
  • Operations Research (AREA)
  • Health & Medical Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Primary Health Care (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Educational Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Complex Calculations (AREA)

Abstract

Provided is a selection device that selects a specific evaluator from among a plurality of evaluators who have evaluated a plurality of evaluation targets, the device comprising: a match-degree tallying unit that, on the basis of input data including evaluators and evaluation values for each evaluation target, computes, for each evaluator, the degree of match between that evaluator's evaluation values and those of the other evaluators; a probability value calculation unit that computes a probability value for each evaluator on the basis of that evaluator's degree of match, the probability value indicating whether the match between the evaluator's evaluation values and those of the other evaluators is significant; and an evaluator selection unit that, on the basis of the probability values, selects evaluators whose evaluation values significantly match those of the other evaluators.

Description

Selection device, learning device, selection method, learning method, and program

The present invention relates to a technique for selecting a specific evaluator from among a plurality of evaluators who have evaluated a plurality of evaluation targets.

In tests that measure conversational skill as one item of a skills test, such as the likability of telephone speech (Non-Patent Document 1) or the quality of pronunciation and fluency in a foreign language (Non-Patent Document 2), quantitative impression values are assigned to the speech (for example, a five-grade rating from good to bad, likability from 5 (high) to 1 (low), naturalness from 5 (high) to 1 (low), and so on).

Currently, experts in each skill evaluate the impression of the speech and decide pass or fail. If the evaluation could be performed automatically, it could be used for preliminary screening in examinations, or as a reference value for experts who are not yet accustomed to evaluation (for example, someone who has just become an evaluator). A technique for automatically estimating the impression of speech is therefore needed.

To realize automatic estimation of the impression of data using machine learning, a machine learning model can be trained from impression value data and features of the data. However, because different people judge impressions by different criteria, impression values may differ from person to person even for the same data. To make it possible to estimate an average impression, impression values can be assigned to each piece of data by many people, and the average impression can be learned by using the mean of those impression values. To estimate the average impression value stably, impression values should be assigned by as many people as possible. For example, in the impression data created for Non-Patent Document 3, impression values are assigned to each speech sample by ten people.

In actual operation, it is difficult to collect a large number of impression values for every piece of data because of staffing constraints, so the data are distributed among many people; as a result, each piece of data receives impression values from at most about one or two evaluators. In this case, even if an average impression value is computed, the evaluators' criteria of impression may differ, and the result may deviate greatly from the value that would be obtained by averaging over many people. Such differences in evaluation criteria can prevent the averaged labels from being learned correctly.

This problem is not limited to the case where the evaluation target is speech; it can arise in various fields whenever evaluation targets are evaluated by multiple evaluators.

The present invention has been made in view of the above points, and its object is to provide a technique that makes it possible to select evaluators who give average evaluations even when the number of evaluators per evaluation target is small.
According to the disclosed technique, there is provided a selection device that selects a specific evaluator from a plurality of evaluators who have evaluated a plurality of evaluation targets, the selection device comprising:
a match-degree tallying unit that calculates, for each evaluator, based on input data including an evaluator and an evaluation value for each evaluation target, the degree of match between the evaluator's evaluation values and those of the other evaluators;
a probability value calculation unit that calculates, for each evaluator, based on the degree of match for each evaluator, a probability value indicating whether the evaluator's evaluation values significantly match those of the other evaluators; and
an evaluator selection unit that selects, based on the probability values, an evaluator whose evaluation values significantly match those of the other evaluators.

According to the disclosed technique, evaluators who give average evaluations can be selected even when the number of evaluators per evaluation target is small.
FIG. 1 is a functional configuration diagram of the learning device 100 according to the embodiment of the present invention. FIG. 2 is a diagram showing an example of the hardware configuration of the device. FIG. 3 is a diagram showing an example of the learning label data. FIG. 4 is a diagram showing an example of the learning feature data. FIG. 5 is a diagram showing the evaluation values of two evaluators and the corresponding value of m.

Hereinafter, an embodiment of the present invention (the present embodiment) will be described with reference to the drawings. The embodiment described below is merely an example, and embodiments to which the present invention is applied are not limited to it. For example, the following embodiment shows an example of selecting evaluators who evaluate impression values of speech, but the present invention is not limited to the evaluation of speech impression values and can be applied to the selection of evaluators for various kinds of evaluation targets.
(Functional configuration of the device)

FIG. 1 shows the functional configuration of the learning device 100 according to the present embodiment. As shown in FIG. 1, the learning device 100 includes a match count tallying unit 110, a probability value calculation unit 120, an evaluator selection unit 130, and a learning unit 140. There are also a learning label data DB 150 and a learning feature data DB 160, which are databases storing the data used by the learning device 100. The learning label data DB 150 and the learning feature data DB 160 may be provided as storage units inside the learning device 100, or may be external to the learning device 100 and connected to it via a network. The match count tallied by the match count tallying unit 110 is one example of a degree of match; more generally, the match count tallying unit 110 may be called a match-degree tallying unit 110.

A device (called a selection device) that includes the match count tallying unit 110, the probability value calculation unit 120, and the evaluator selection unit 130, but not the learning unit 140, may also be provided. In this case, to perform learning, the selection result (which may be the selected evaluators or the selected label data) is transmitted from the selection device to an arbitrary learning device that includes a learning unit.

A device including the match count tallying unit 110, the probability value calculation unit 120, the evaluator selection unit 130, and the learning unit 140 may also be called a selection device. Alternatively, within the learning device 100, the part consisting of the match count tallying unit 110, the probability value calculation unit 120, and the evaluator selection unit 130 may be called a selection device.

The contents of the data stored in the DBs described above and the detailed operation of each unit will be described later in the example (and its modification).
(Example of hardware configuration)

The learning device 100 and the selection device described above (hereinafter collectively referred to as the "device") can be realized, for example, by causing a computer to execute a program describing the processing described in the present embodiment.

That is, the device can be realized by executing a program corresponding to the processing performed by the device, using hardware resources such as a CPU and memory built into a computer. The program can be recorded on a computer-readable recording medium (portable memory or the like) and stored or distributed. The program can also be provided over a network such as the Internet or by e-mail.

FIG. 2 shows an example of the hardware configuration of the computer in the present embodiment. The computer in FIG. 2 has a drive device 170, an auxiliary storage device 172, a memory device 173, a CPU 174, an interface device 175, a display device 176, an input device 177, and the like, interconnected by a bus B.

The program that realizes the processing on the computer is provided by a recording medium 171 such as a CD-ROM or a memory card. When the recording medium 171 storing the program is set in the drive device 170, the program is installed from the recording medium 171 into the auxiliary storage device 172 via the drive device 170. However, the program need not necessarily be installed from the recording medium 171; it may instead be downloaded from another computer via a network. The auxiliary storage device 172 stores the installed program as well as necessary files and data.

When an instruction to start the program is given, the memory device 173 reads the program out of the auxiliary storage device 172 and stores it. The CPU 174 realizes the functions of the device according to the program stored in the memory device 173. The interface device 175 is used as an interface for connecting to a network. The display device 176 displays a GUI (Graphical User Interface) or the like according to the program. The input device 177 consists of a keyboard and mouse, buttons, a touch panel, or the like, and is used to input various operation instructions.

Hereinafter, as an example, the learning label data and learning feature data stored in the learning label data DB 150 and the learning feature data DB 160, and the operation of each unit, will be described in detail.
(Example)

<Learning label data and learning feature data>

FIG. 3 shows an example of the learning label data in this example, and FIG. 4 shows an example of the learning feature data.

As shown in FIG. 3, for each record number i (i = 0, 1, ..., I) the learning label data contain a data number y(i,0), an evaluator number y(i,1), and an impression value label y(i,2). The data number y(i,0) ∈ {0, 1, 2, ..., J} indicates the data number j of the learning feature data. The evaluator number y(i,1) ∈ {1, 2, 3, ..., K} is the number of the evaluator who evaluated the data. The impression value label is the impression value (evaluation value) for the data. In this example, it is assumed that binary labels, 0 and 1, are assigned.

Thus, in the learning label data, multiple impression labels are assigned to each piece of learning feature data by multiple people.

As shown in FIG. 4, the learning feature data are data x(j) for data number j (j = 0, 1, ..., J). The learning feature data are, for example, speech signals, or values such as feature vectors extracted from speech signals.
<Match count tallying unit 110>

Next, the operation of the match count tallying unit 110 will be described. The learning label data are input to the match count tallying unit 110 from the learning label data DB 150.

Based on the input learning label data, the match count tallying unit 110 tallies, for each evaluator, the number of matches and mismatches with the other evaluators, and obtains the match count data C(k, m) and the evaluation count N(k) of evaluator k. The match count data C(k, m) may also be called a degree of match.

Regarding the match count data C(k, m): when m = 0, C(k, m) is the number of times evaluator k disagreed with another evaluator, and when m = 1, C(k, m) is the number of times evaluator k agreed with another evaluator. The evaluation count N(k) is the total of these counts and satisfies N(k) = C(k,0) + C(k,1). The operation of the match count tallying unit 110 is described in more detail below. It consists of steps (1) and (2); a code sketch follows the steps.
(1) First, the learning label data are converted into functions according to rules 1 and 2 below.

Rule 1: If a record exists in the learning label data for evaluator k and learning feature data number j, the label of that record is denoted f(k, j); if no such record exists, f(k, j) = None. For example, in FIG. 3, focusing on evaluator 1 (k = 1) and learning feature data 0 (j = 0), the label is 0, so f(1, 0) = 0. Likewise, f(2, 0) = 0.

Rule 2: Let L(j) be the set of the evaluator numbers of the evaluators who evaluated data number j. For example, in FIG. 3, data number 0 was evaluated by evaluators 1 and 2, so L(0) = {1, 2}. L(j) is obtained for each j.

(2) As a loop over evaluators k (k = 1, ..., K), the following steps (2-1) and (2-2) are executed for each k.

(2-1) To initialize the match count data C(k, m) and the evaluation count N(k) of evaluator k, set C(k, 0) = 0, C(k, 1) = 0, and N(k) = 0.

(2-2) As a loop over data numbers j (j = 0, 1, ..., J), the following steps (2-2-1) and (2-2-2) are executed for each j.

(2-2-1) If f(k, j) = None, proceed to the next iteration of the loop over j.

(2-2-2) If f(k, j) ≠ None, compute L' = L(j) − {k} and, as a loop over l ∈ L' (the set of evaluators other than k who evaluated data j), execute (2-2-2-1), (2-2-2-2), and (2-2-2-3).

(2-2-2-1) N(k) += 1
(2-2-2-2) If f(l, j) = f(k, j) (the label matches that of the other evaluator), increment the match count: C(k, 1) += 1

(2-2-2-3) If f(l, j) ≠ f(k, j) (the label does not match that of the other evaluator), increment the mismatch count: C(k, 0) += 1
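
To make steps (1) and (2) concrete, here is a minimal Python sketch of the binary match tallying. It is an illustration, not code from the patent: the function and variable names are hypothetical, and each label record is assumed to be a tuple (data number, evaluator number, label) mirroring y(i,0), y(i,1), y(i,2) of FIG. 3.

```python
from collections import defaultdict

def tally_matches(label_records, num_evaluators):
    """Tally agreements and disagreements per evaluator for binary labels.

    label_records: list of (data_number, evaluator_number, label) tuples.
    Returns C with C[k][1] = matches, C[k][0] = mismatches, and N[k].
    """
    # Rule 1: f(k, j) = label if a record exists (absent keys play the
    # role of f(k, j) = None).
    f = {(k, j): label for j, k, label in label_records}
    # Rule 2: L(j) = set of evaluators who rated data j.
    L = defaultdict(set)
    for j, k, _ in label_records:
        L[j].add(k)

    C = {k: [0, 0] for k in range(1, num_evaluators + 1)}
    N = {k: 0 for k in range(1, num_evaluators + 1)}
    for (k, j), label in f.items():
        for l in L[j] - {k}:              # evaluators of data j other than k
            N[k] += 1
            if f[(l, j)] == label:
                C[k][1] += 1              # labels agree (m = 1)
            else:
                C[k][0] += 1              # labels disagree (m = 0)
    return C, N
```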
<Probability value calculation unit 120>

Next, the operation of the probability value calculation unit 120 will be described. The probability value calculation unit 120 obtains, for each evaluator, a probability value for determining whether the evaluator's number of matches with the other evaluators is significantly high. Specifically, this is done as follows.

Let p be the probability of agreeing with another evaluator. An evaluator whose number of matches is significantly high agrees readily with a variety of evaluators, and can therefore be regarded as one who evaluates stably among the various evaluators. Here, it is assumed that agreement and disagreement between two evaluators follow a Bernoulli distribution with probabilities p and (1 − p), and the probability value is computed based on the binomial distribution.

Assuming that the counts C(k, 0) and C(k, 1) follow a binomial distribution, the probability value calculation unit 120 performs a binomial test according to the following equation to obtain the significance probability P(k).
[Equation 1: binomial test yielding the significance probability P(k)]
Normcdf in the above equation is the standard normal cumulative distribution function. For the P obtained by this calculation, an evaluator who has many evaluations and agrees with the others many times has a small value of P. The agreement probability p can be set to, for example, p = 0.5.
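
Since Equation 1 appears only as an image in the original, the sketch below assumes the standard one-sided normal approximation to the binomial test, which fits the surrounding description: Normcdf is the standard normal CDF, and P(k) becomes small when the match count C(k,1) is large relative to N(k)p.

```python
import math

def normcdf(x):
    """Standard normal cumulative distribution function (Normcdf)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def significance_probability(c_mismatch, c_match, p=0.5):
    """P(k) via a one-sided normal approximation to the binomial test.

    c_mismatch = C(k,0), c_match = C(k,1). The exact expression of
    Equation 1 is an assumption; the original shows it only as an image.
    """
    n = c_mismatch + c_match              # N(k) = C(k,0) + C(k,1)
    if n == 0:
        return 1.0
    z = (c_match - n * p) / math.sqrt(n * p * (1.0 - p))
    return 1.0 - normcdf(z)               # small when matches dominate
```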
<Evaluator selection unit 130>

Next, the operation of the evaluator selection unit 130 will be described. Based on the P(k) obtained by the probability value calculation unit 120, the evaluator selection unit 130 uses a threshold δ to select the evaluators whose evaluation values significantly match those of the others, and selects only the data evaluated by those evaluators. In this example, the evaluator selection unit 130 calculates and outputs the set I' of label data of the selected evaluators according to the following equation.
I' = {i | P(k) < δ ∧ C(k,1) > C(k,0) ∧ y(i,1) = k}
Alternatively, the evaluators k satisfying "P(k) < δ ∧ C(k,1) > C(k,0)" may be selected and output. In that case, the learning unit 140 can extract the label data evaluated by the selected evaluators.

The criterion "P(k) < δ ∧ C(k,1) > C(k,0)" for selecting evaluators is only an example; evaluators may be selected by other criteria.
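
The selection is then a filter over the label records. A minimal sketch, reusing the hypothetical data layout introduced above:

```python
def select_label_records(label_records, C, P, delta):
    """Return I': the records whose evaluator k satisfies
    P(k) < delta and C(k,1) > C(k,0)."""
    chosen = {k for k in P if P[k] < delta and C[k][1] > C[k][0]}
    return [rec for rec in label_records if rec[1] in chosen]
```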
<Learning unit 140>

Next, the operation of the learning unit 140 will be described. The learning unit 140 trains a machine learning model using, as teacher data, the label data selected by the evaluator selection unit 130 and the corresponding learning feature data. A general machine learning method (for example, an SVM or a neural network) can be used.

As the label data, only the label data set I' selected by the evaluator selection unit 130 is used. Specifically, the sets X and Y of learning feature data and label data are obtained as follows.
(X, Y)_{i∈I'} = {(x(y(i,0)), y(i,2)) | i ∈ I'}
The learning unit 140 then performs learning using X and Y as teacher data. For example, the model is trained with X as the model input and Y as the correct answers. The trained model is used, for example, for automatic evaluation of speech.
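
As one possible realization of this training step (the patent only requires a generic method such as an SVM or a neural network), a scikit-learn sketch might look as follows, with features[j] standing in for x(j):

```python
from sklearn.svm import SVC

def train_model(selected_records, features):
    """Train on the selected records; names here are illustrative."""
    X = [features[j] for j, _, _ in selected_records]   # x(y(i,0))
    Y = [label for _, _, label in selected_records]     # y(i,2)
    model = SVC()
    model.fit(X, Y)
    return model
```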
(Modification)

Next, a modification will be described. The example above assumed binary labels (0 or 1), but stable evaluators can also be selected when the labels take more than two levels, for example five levels or ten levels. The modification describes processing that is applicable even when labels with more than two levels are used. In the example of FIG. 3, the learning label data in this case are such that the number of label values per data number y(i,0) is the number of levels (not limited to two).

The operations of the match count tallying unit 110 and the probability value calculation unit 120 in the modification are described below. The operations of the evaluator selection unit 130 and the learning unit 140 are the same as in the example.
<Modification: match count tallying unit 110>

The match count tallying unit 110 in the modification tallies the differences between evaluators. Here, the match count data C(k, m) (m = 0, ..., M − 1, where M is the maximum label value; for example, M = 5 for a five-level evaluation whose minimum value is 1) are tallies of the score differences between pairs of evaluators. When m = 0, C(k, m) is the number of times evaluator k agreed with another evaluator at some level; when m ≠ 0, C(k, m) is the number of times evaluator k's view differed from another evaluator's by m levels. N(k) = Σ_{m=0}^{M−1} C(k, m) holds. The operation of the match count tallying unit 110 is described in more detail below. It consists of steps (1) and (2); a code sketch follows the steps.
(1) First, the learning label data are converted into functions according to rules 1 and 2 below.

Rule 1: If a record exists in the learning label data for evaluator k and learning feature data number j, the label of that record is denoted f(k, j); if no such record exists, f(k, j) = None. For example, in FIG. 3, f(1, 0) = 0 and f(2, 0) = 0.

Rule 2: Let L(j) be the set of the evaluator numbers of the evaluators who evaluated data number j. For example, in FIG. 3, L(0) = {1, 2}.

(2) As a loop over evaluators k (k = 1, ..., K), the following steps (2-1), (2-2), and (2-3) are executed for each k.

(2-1) With the minimum label value 1 and the maximum label value M, set C(k, m) = 0 (m = 0, 1, 2, ..., M − 1).

(2-2) Set N(k) = 0.

(2-3) As a loop over data numbers j (j = 0, 1, ..., J), the following steps (2-3-1) and (2-3-2) are executed for each j.

(2-3-1) If f(k, j) = None, proceed to the next iteration of the loop over j.

(2-3-2) If f(k, j) ≠ None, compute L' = L(j) − {k} and, as a loop over l ∈ L' (the set of evaluators other than k who evaluated data j), execute (2-3-2-1) and (2-3-2-2).

(2-3-2-1) Compute the difference between the two scores, s = |f(l, j) − f(k, j)|, and set C(k, s) += 1.

(2-3-2-2) N(k) += 1
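
A sketch of the multi-level tallying follows; it differs from the binary version above only in that the tally bin is the absolute score difference s. Names are again hypothetical:

```python
from collections import defaultdict

def tally_differences(label_records, num_evaluators, num_levels):
    """Tally score differences per evaluator for M-level labels.

    C[k][s] = number of times evaluator k differed from another
    evaluator by s levels (s = 0 means agreement).
    """
    f = {(k, j): label for j, k, label in label_records}
    L = defaultdict(set)
    for j, k, _ in label_records:
        L[j].add(k)

    C = {k: [0] * num_levels for k in range(1, num_evaluators + 1)}
    N = {k: 0 for k in range(1, num_evaluators + 1)}
    for (k, j), label in f.items():
        for l in L[j] - {k}:
            s = abs(f[(l, j)] - label)    # difference of the two scores
            C[k][s] += 1
            N[k] += 1
    return C, N
```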
<Modification: probability value calculation unit 120>

Next, the operation of the probability value calculation unit 120 in the modification will be described. In the example, the labels were binary, so it was assumed that agreement and disagreement between two evaluators follow a Bernoulli distribution with probabilities p and (1 − p), and the probability value was computed based on the binomial distribution.

In the modification, the labels take multiple level values, not just two, so P(k) is obtained on the assumption that C(k, s) follows a multinomial distribution. Letting p_m be the probability that the difference between the two evaluations is m, specifically, the χ² value below is computed and P(k) is obtained from it.
[Equation 2: χ² value computed from the counts C(k, s), the evaluation count N(k), and the probabilities p_m]
P(k) = Chi_p(χ²)

Chi_p in the above equation is a function that obtains the P value of a χ² value.

When the evaluation has M levels, letting q_v be the probability that a score v is assigned, p_m can be set, for example, as follows.
[Equation 3: definition of p_m in terms of the level probabilities q_v]
The above equation means that the differences between two evaluators' scores are enumerated as in the table of FIG. 5, and the probability values for which the score difference is the same are summed.
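
Putting this together, the sketch below computes p_m by summing q_u · q_v over all score pairs (u, v) with |u − v| = m, as FIG. 5 illustrates, and then performs a goodness-of-fit test. Because Equations 2 and 3 appear only as images, the Pearson form of the χ² statistic and the degrees of freedom M − 1 are assumptions:

```python
from scipy.stats import chi2

def difference_probs(q):
    """p[m] = sum of q[u]*q[v] over score pairs (u, v) with |u - v| = m,
    where q[v-1] is the probability of score v on an M-level scale."""
    M = len(q)
    p = [0.0] * M
    for u in range(1, M + 1):
        for v in range(1, M + 1):
            p[abs(u - v)] += q[u - 1] * q[v - 1]
    return p

def chi_square_p_value(counts, q):
    """counts[s] = C(k, s); returns P(k) = Chi_p(chi^2).

    The Pearson statistic and df = M - 1 are assumptions; Equations 2
    and 3 appear only as images in the original.
    """
    n = sum(counts)
    p = difference_probs(q)
    stat = sum((c - n * pm) ** 2 / (n * pm)
               for c, pm in zip(counts, p) if pm > 0)
    return chi2.sf(stat, df=len(counts) - 1)
```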
In this modification, a multi-level evaluation (for example, five levels) was assumed, but even for continuous values following a normal distribution, or continuous values between 0 and 1 following a beta distribution, P(k) can be obtained by assuming a distribution for the difference between the two evaluators' values and performing a test.

Also, this modification assumed a one-dimensional evaluation, namely an impression value, but it can be extended to the case where multiple items are evaluated. In that case, for example, a P value is obtained for each of the items in the same way as in the modification, and evaluators are selected per item.

When multiple items are evaluated, evaluators whose P values are low for all items may instead be selected. When the evaluations of all items are combined in this way to select evaluators, the P value over all items can be calculated using, for example, either or both of the two patterns shown in (1) and (2) below.
(1) Average all P values without weighting (that is, with identical weights).

(2) Average all P values with a weighted sum.

In case (1), for example, when there are three items A, B, and C, and an evaluator's P values for items A, B, and C are PA, PB, and PC, the evaluator's P value is calculated as (PA + PB + PC) / 3.
When averaging with a weighted sum as in (2), weights can be assigned according to the number of evaluation levels so that the stability of an evaluator whose evaluations match is rated more highly when there are more levels; for example, the P value for agreement under a seven-level evaluation is valued more highly than for agreement under a two-level evaluation. Specifically, weighting according to the number of evaluation levels means, for example, that for an evaluation item with M levels, the value 1/M is used as the weight, and the average of the products of each P value and its weight is used for evaluator selection.

For example, when there are three items A, B, and C with MA, MB, and MC levels respectively, and an evaluator's P values for items A, B, and C are PA, PB, and PC, the evaluator's P value is calculated as ((PA/MA) + (PB/MB) + (PC/MC)) / 3.
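
Both averaging patterns fit in a few lines; the following sketch uses hypothetical names and passes the per-item level counts alongside the per-item P values:

```python
def combined_p_value(p_values, levels=None):
    """Pattern (1): plain average when levels is None.
    Pattern (2): average of P/M, with M the item's number of levels."""
    if levels is None:
        return sum(p_values) / len(p_values)
    return sum(p / m for p, m in zip(p_values, levels)) / len(p_values)

# Example: combined_p_value([PA, PB, PC]) for pattern (1), or
# combined_p_value([PA, PB, PC], [MA, MB, MC]) for pattern (2).
```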
(Effects of the embodiment)

With the technique according to the present embodiment described in the above example and modification, evaluators who can assign average impression value labels are selected based on a probability distribution, using the number of evaluations, the number of matches, and so on as criteria. Therefore, even when the number of evaluators per piece of data is small (for example, two), non-average evaluators can be excluded and evaluators who give average evaluations can be selected. Furthermore, average impression values can be learned.
(Summary of the embodiment)

As described above, according to the present embodiment, there is provided a selection device that selects a specific evaluator from a plurality of evaluators who have evaluated a plurality of evaluation targets, the selection device comprising: a match-degree tallying unit that calculates, for each evaluator, based on input data including an evaluator and an evaluation value for each evaluation target, the degree of match between the evaluator's evaluation values and those of the other evaluators; a probability value calculation unit that calculates, for each evaluator, based on the degree of match for each evaluator, a probability value indicating whether the evaluator's evaluation values significantly match those of the other evaluators; and an evaluator selection unit that selects, based on the probability values, an evaluator whose evaluation values significantly match those of the other evaluators.

The evaluation value is, for example, a value set for each evaluation target, and is a multi-level value with two or more levels for one or more items, or a continuous value. The probability value calculation unit may calculate the probability value on the assumption that the degree of match follows a predetermined probability distribution.

Further, according to the present embodiment, there is provided a learning device comprising a learning unit that receives, as teacher data, the evaluation targets and evaluation values evaluated by the evaluators selected by the above selection device, and trains a machine learning model using the teacher data.

Although the present embodiment has been described above, the present invention is not limited to this specific embodiment, and various modifications and changes are possible within the scope of the gist of the present invention described in the claims.
REFERENCE SIGNS LIST
100 Learning device
110 Match count tallying unit
120 Probability value calculation unit
130 Evaluator selection unit
140 Learning unit
150 Learning label data DB
160 Learning feature data DB
170 Drive device
171 Recording medium
172 Auxiliary storage device
173 Memory device
174 CPU
175 Interface device
176 Display device
177 Input device

Claims (8)

  1.  A selection device for selecting a specific evaluator from a plurality of evaluators who have evaluated a plurality of evaluation targets, the selection device comprising:
     a degree-of-match aggregation unit that calculates, for each evaluator, a degree of match between the evaluator's evaluation values and those of the other evaluators, on the basis of input data including the evaluators and the evaluation values for each evaluation target;
     a probability value calculation unit that calculates, for each evaluator, on the basis of the degree of match, a probability value indicating whether the evaluator's evaluation values significantly match those of the other evaluators; and
     an evaluator selection unit that selects, on the basis of the probability values, an evaluator whose evaluation values significantly match those of the other evaluators.
  2.  The selection device according to claim 1, wherein the evaluation value is a value set for each evaluation target, and is either a multi-level value having two or more levels for each of one or more items, or a continuous value.
  3.  The selection device according to claim 1 or 2, wherein the probability value calculation unit calculates the probability value on the assumption that the degree of match follows a predetermined probability distribution.
  4.  A learning device comprising a learning unit that receives, as teacher data, evaluation targets and evaluation values evaluated by an evaluator selected by the selection device according to any one of claims 1 to 3, and trains a machine learning model using the teacher data.
  5.  A selection method executed by a selection device for selecting a specific evaluator from a plurality of evaluators who have evaluated a plurality of evaluation targets, the selection method comprising:
     a degree-of-match aggregation step of calculating, for each evaluator, a degree of match between the evaluator's evaluation values and those of the other evaluators, on the basis of input data including the evaluators and the evaluation values for each evaluation target;
     a probability value calculation step of calculating, for each evaluator, on the basis of the degree of match, a probability value indicating whether the evaluator's evaluation values significantly match those of the other evaluators; and
     an evaluator selection step of selecting, on the basis of the probability values, an evaluator whose evaluation values significantly match those of the other evaluators.
  6.  A learning method executed by a learning device, the learning method comprising a learning step of receiving, as teacher data, evaluation targets and evaluation values evaluated by an evaluator selected by the selection device according to any one of claims 1 to 3, and training a machine learning model using the teacher data.
  7.  A program for causing a computer to function as each unit of the selection device according to any one of claims 1 to 3.
  8.  A program for causing a computer to function as the learning unit of the learning device according to claim 4.
PCT/JP2019/033290 2018-08-27 2019-08-26 Selection device, learning device, selection method, learning method, and program WO2020045341A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2018-158652 2018-08-27
JP2018158652A JP2020035018A (en) 2018-08-27 2018-08-27 Selection device, learning device, selection method, learning method, and program

Publications (1)

Publication Number Publication Date
WO2020045341A1 true

Family

ID=69643025

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2019/033290 WO2020045341A1 (en) 2018-08-27 2019-08-26 Selection device, learning device, selection method, learning method, and program

Country Status (2)

Country Link
JP (1) JP2020035018A (en)
WO (1) WO2020045341A1 (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004227208A (en) * 2003-01-22 2004-08-12 Matsushita Electric Ind Co Ltd User-adaptive action determination device and action determination method
JP2018063536A (en) * 2016-10-12 2018-04-19 株式会社野村総合研究所 Housekeeping book management support system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ANDO, ATSUSHI: "Speech emotion classification based on soft target learning using ambiguous emotion utterances", LECTURE PROCEEDINGS OF 2018 SPRING RESEARCH CONFERENCE OF THE ACOUSTICAL SOCIETY OF JAPAN CD-ROM, 27 February 2018 (2018-02-27), pages 41-42 *
NEYATANI, TAKUROU: "Optimization of KANSEI Retrieval Agent Using the Neural Network", PROCEEDINGS OF THE 2012 GENERAL CONFERENCE OF IEICE: INFORMATION AND SYSTEM 1, 6 March 2012 (2012-03-06), pages 20 *
SUZUKI, YOSUKE: "Stacked Denoising Autoencoder-based Deep Collaborative Filtering using the change of similarity", DOCUMENTS OF SPECIAL INTEREST GROUP ON KNOWLEDGE-BASED SYSTEMS, 1 November 2016 (2016-11-01), pages 7-12, XP033099453 *
TAKAKI, SHINJI: "Unsupervised speaker adaptation based on speaker similarity for DNN-based speech synthesis", IPSJ SIG TECHNICAL REPORTS, SPOKEN LANGUAGE PROCESSING (SLP) 2017-SLP-118, 6 October 2017 (2017-10-06), pages 1-6, XP033525977 *

Also Published As

Publication number Publication date
JP2020035018A (en) 2020-03-05


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19855792

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19855792

Country of ref document: EP

Kind code of ref document: A1