WO2021255840A1

WO2021255840A1 - Estimation method, estimation device, and program

Info

Publication number: WO2021255840A1
Application number: PCT/JP2020/023644
Authority: WO
Inventors: 隆明長谷川; 節夫山田; 和之磯; 正之杉崎
Original assignee: 日本電信電話株式会社
Priority date: 2020-06-16
Filing date: 2020-06-16
Publication date: 2021-12-23
Also published as: JPWO2021256043A1; WO2021256043A1; JP7425368B2

Abstract

An estimation device (30) according to the present disclosure is provided with a determination unit (32) and a paragraph estimation unit (33). The determination unit (32) uses a binary classification model (1) learned on the basis of teacher data to which a binary label indicating whether a talk is changed or not is given with respect to an utterance which constitutes series data of a dialogue including a plurality of topics or divided units thereof, thereby determining whether the utterance which constitutes the series data to be processed is an utterance of changing the talk or not. On the basis of a result of the determination by the determination unit (32), the paragraph estimation unit (33) estimates a range of paragraphs from a change of the talk to a preceding utterance of the next change or to an utterance at an end of the dialogue in the series data to be processed.

Description

Estimation method, estimation device, and program

This disclosure relates to an estimation method, an estimation device and a program.

In a department (a so-called contact center) where operators respond to customer inquiries about products or services, operators are expected to help solve the problems that customers have. In the contact center, a history of each operator's customer interactions (a response log) is created, stored, and shared. An operator, a contact center manager, or the like can review the accumulated response logs to analyze customer inquiries and improve the quality of customer response. When a response log is reviewed to look back on an interaction with a customer, the review becomes more efficient if the dialogue between the operator and the customer can be divided by topic.

The dialogue between the operator and the customer can be regarded as series data composed of multiple utterances along the time axis. By preparing teacher data in which labels indicating topics are attached to a set of series data, a classification model for classifying the topics in a dialogue can be learned by machine learning using a DNN (Deep Neural Network) such as an LSTM (Long Short-Term Memory) (see Non-Patent Document 1).

In general, contact centers handle a variety of tasks depending on the type of product or service, so the topics may be few enough to count in some cases and very numerous in others. When the model described in Non-Patent Document 1 is used to classify the topics in a dialogue into many topic types, a small amount of teacher data lowers the classification accuracy, while preparing the large amount of teacher data needed to improve the accuracy is costly.

In addition, because words and phrases are relatively often omitted in the utterances that make up the series data, utterances may be short, that is, they may contain few words. Moreover, even when there are few topic types, the topics may be similar to each other, or the order in which they appear may not be fixed. In these cases as well, preparing teacher data to build a classification model capable of classifying topics is costly.

In order to estimate topics in the series data of a dialogue containing multiple topics, it is effective to estimate the range of each paragraph, which extends from a talk switch to the utterance immediately before the next switch or to the utterance at the end of the dialogue. If this paragraph range can be estimated, topic estimation can be limited to the utterances contained in that paragraph, so the topic can be estimated with higher accuracy.

An object of the present disclosure, made in view of the above problems, is to provide an estimation method, an estimation device, and a program capable of estimating the range of paragraphs in the series data of a dialogue including a plurality of topics.

In order to solve the above problems, the estimation method according to the present disclosure includes: a determination step of determining whether or not an utterance constituting series data to be processed is a talk-switch utterance, using a first model trained on first teacher data in which a first label indicating whether or not there is a talk switch is attached to the utterances constituting series data of a dialogue including a plurality of topics, or to division units obtained by dividing those utterances; and a paragraph estimation step of estimating, based on the result of the determination, the range of a paragraph in the series data to be processed, from a talk switch to the utterance immediately before the next switch or to the utterance at the end of the dialogue.

Further, in order to solve the above problems, the estimation device according to the present disclosure includes: a determination unit that determines whether or not an utterance constituting series data to be processed is a talk-switch utterance, using a first model trained on first teacher data in which a first label indicating whether or not there is a talk switch is attached to the utterances constituting series data of a dialogue including a plurality of topics, or to division units obtained by dividing those utterances; and a paragraph estimation unit that estimates, based on the result of the determination, the range of a paragraph in the series data to be processed, from a talk switch to the utterance immediately before the next switch or to the utterance at the end of the dialogue.

Further, in order to solve the above-mentioned problems, the program according to the present disclosure causes a computer to execute the above-mentioned estimation method.

According to the estimation method, estimation device and program according to the present disclosure, it is possible to estimate the range of paragraphs in the series data of the dialogue including a plurality of topics.

A diagram showing a configuration example of a learning device that trains a binary classification model.
A diagram showing a configuration example of a learning device that trains a multi-value classification model.
A diagram showing an example of the configuration of an estimation device according to one embodiment of the present disclosure.
A diagram showing another example of the configuration of the estimation device according to one embodiment of the present disclosure.
A diagram showing still another example of the configuration of the estimation device according to one embodiment of the present disclosure.
A flowchart showing an example of the operation of the multi-value label complement unit shown in FIG.
A flowchart showing an example of the operation of the estimation device shown in FIG.
A flowchart showing an example of the operation of the estimation device shown in FIG.
A flowchart showing an example of the operation of estimating the paragraph range by the estimation device shown in FIG.
A flowchart showing an example of the operation of estimating a topic by the estimation device shown in FIG.
A diagram for explaining the learning of the binary classification model and the multi-value classification model.
A diagram for explaining the estimation of the topic by the estimation device shown in FIG.
A diagram for explaining the estimation of the topic by the estimation device shown in FIG.
A diagram for explaining the estimation of the topic by the estimation device shown in FIG.
A diagram for explaining the estimation of the topic by the estimation device shown in FIG.

Hereinafter, embodiments of the present disclosure will be described with reference to the drawings.

First, the outline of this disclosure will be explained.

The present disclosure relates to estimating, in the series data of a dialogue involving multiple topics such as a dialogue between an operator and a customer, the range of a paragraph from one talk switch to the utterance immediately before the next switch or to the utterance at the end of the dialogue, and to estimating the topic in the estimated paragraph.

In the following, a dialogue between an operator and a customer at a contact center is considered as an example. Cases in which the operator takes the lead in the dialogue include a case where, in solving a customer's problem, the operator asks the customer about the current situation or the history so far in order to identify the cause, and a case where the operator prepares documents necessary for business procedures while interviewing the customer about the customer's situation.

In a dialogue of the kind described above, the unit of content that the operator is asking about can be regarded as one topic. However, it is difficult to uniquely determine the most appropriate topic type from among many topic types. In addition, all the topics in such a dialogue fall within the range of a specific business, so one topic is often similar to another, and similar topics are difficult to distinguish. Therefore, it is difficult to divide the entire dialogue into a series of topics.

However, when moving on to the next talk, the operator often utters words such as "this time", "in", and "after" to tell the customer that the talk is about to change. In addition, at the end of a talk, the operator often responds to the customer's utterance with words such as "smart" and "acknowledged" to signal that the talk is finished. Because these words do not depend on the content of the talk, they are useful for detecting a talk switch (a break in the talk).

In the present disclosure, for example, a rule for determining whether or not an utterance in the series data is a talk-switch utterance is created using the above-mentioned words and phrases that indicate a talk switch, and whether or not an utterance in the series data is a talk-switch utterance is determined based on the created rule. Further, in the present disclosure, for example, a model for determining whether or not an utterance is a talk-switch utterance is created based on teacher data in which talk-switch utterances are given a label indicating a talk switch and all other utterances are given a label indicating that they are not a talk switch. The determination result of the created model is then used to estimate the range of a paragraph from a talk switch to the utterance immediately before the next switch or to the utterance at the end of the dialogue. In addition, in the present disclosure, the topic in the paragraph, or in the utterances contained in the paragraph, is estimated. Even if the dialogue contains many topics or similar topics, once the paragraph range has been estimated, topic estimation can focus on the utterances included in the paragraph, so the topic can be estimated with higher accuracy.
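
The rule-based determination described above could be sketched as follows. This is only an illustration under stated assumptions: the cue phrases in `SWITCH_CUES` and the function name `is_talk_switch` are hypothetical, and the rule (an utterance that begins with a cue phrase is a talk switch) is one possible rule rather than the rule of the disclosure.

```python
# Hypothetical cue phrases that an operator might say when changing the talk.
# The disclosure gives only translated examples, so these are assumptions.
SWITCH_CUES = ("next", "moving on", "now then")


def is_talk_switch(utterance: str, cues=SWITCH_CUES) -> bool:
    """Return True if the utterance begins with one of the cue phrases."""
    text = utterance.strip().lower()
    return any(text.startswith(cue) for cue in cues)
```

Because the cue phrases do not depend on the content of the talk, the same rule can in principle be reused across businesses; only the phrase list would need to be adapted to the language and the operators' habits.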

As described above, in the present disclosure, using a model learned based on teacher data, it is determined whether or not the utterances constituting the series data are utterances of switching talks. Further, in the present disclosure, a model learned based on teacher data may be used for estimating the topic in the paragraph. First, the learning of these models will be described.

FIG. 1 is a diagram showing a configuration example of a learning device 10 for learning a binary classification model 1 for determining whether or not an utterance constituting the series data is an utterance of switching talks.

The learning device 10 shown in FIG. 1 includes an input unit 11 and a binary classification learning unit 12.

The input unit 11 inputs the series data of the dialogue including a plurality of topics. The series data is, for example, text data in which time-series utterances of an operator and a customer are voice-recognized. The series data input to the input unit 11 may be an utterance unit or a division unit (for example, a word unit or a character unit) in which the utterance is divided. When the series data is input online, the input unit 11 may sequentially input the text data obtained by the voice recognition of each utterance during the dialogue. When the series data is input offline, the input unit 11 may sort the start time or end time of each utterance during the dialogue and input the text data of each utterance.
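
As a minimal sketch of the offline case, the utterances of one dialogue can be held as records and ordered by time before being passed on. The `Utterance` record and the `build_series` function are hypothetical names introduced for illustration; the disclosure does not prescribe a data format.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str  # e.g. "operator" or "customer" (assumed field)
    start: float  # start time in seconds (assumed unit)
    end: float    # end time in seconds
    text: str     # speech recognition result for this utterance


def build_series(utterances):
    """Return the utterances ordered by start time, ties broken by end time."""
    return sorted(utterances, key=lambda u: (u.start, u.end))
```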

Further, the input unit 11 receives a binary label (first label) indicating whether or not there is a talk switch, attached to each utterance constituting the series data or to each division unit obtained by dividing the utterance. The binary label is, for example, a label such as "1 (talk switch)" or "0 (not a talk switch)", or "True (talk switch)" or "False (not a talk switch)". Further, the input unit 11 may regard an utterance or division unit as "True (talk switch)" if it has been given any label indicating a talk switch, and as "False (not a talk switch)" if no such label has been given.
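
The label handling described above can be illustrated with a small normalization function. The function name `to_binary_label` and the exact set of raw values treated as "no label" are assumptions made for this sketch.

```python
def to_binary_label(raw) -> bool:
    """Map a raw annotation to True (talk switch) or False (not a talk switch)."""
    if raw is None:
        return False          # no label given
    if isinstance(raw, bool):
        return raw            # already True / False
    if isinstance(raw, (int, float)):
        return raw != 0       # 1 / 0 style labels
    text = str(raw).strip().lower()
    if text in ("", "0", "false"):
        return False
    return True               # any other label is treated as a talk switch
```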

The binary label is manually attached in advance to the utterances that make up the series data or to their division units. As mentioned above, certain words and phrases are often spoken at a talk switch, and binary labels are given, for example, based on these phrases. Taking device failure as an example, if the goal is to classify whether or not a topic relates to device failure, the topic of any utterance about the failure is "device failure" regardless of the cause. On the other hand, if the goal is to classify topics by cause of failure, the topic differs for each cause. Therefore, depending on how the topics to be classified are defined, the topic may not change even where the talk is divided. For this reason, when binary labels are assigned, it is preferable to attach a label indicating a talk switch to any utterance or division unit that may be a talk switch, even when the utterance moves from a topic to the same topic. Doing so increases the number of positive examples of talk-switch utterances and improves the accuracy of determining talk-switch utterances.

In this way, the input unit 11 receives the series data of a dialogue including a plurality of topics, together with binary labels indicating whether or not each utterance constituting the series data, or each division unit thereof, is a talk switch. The input unit 11 outputs the input series data and binary labels to the binary classification learning unit 12.

The binary classification learning unit 12 uses the series data and the binary labels output from the input unit 11 as teacher data to learn the binary classification model 1 (first model), which determines whether or not an utterance in the series data is a talk-switch utterance. The binary classification model 1 is therefore a model trained on teacher data (first teacher data) in which a binary label (first label) indicating whether or not there is a talk switch is attached to the utterances, or their division units, constituting the series data of a dialogue including a plurality of topics. For model training, an LSTM or another method suited to learning time-series data can be used.
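
To make the shape of this training step concrete, the following sketch fits a binary classifier on utterances paired with talk-switch labels. It deliberately swaps in a bag-of-words logistic regression for the LSTM mentioned above so that the example stays self-contained; unlike an LSTM, it ignores word order and the surrounding utterances. The function names and hyperparameters are assumptions.

```python
import math
from collections import defaultdict


def train_switch_classifier(utterances, labels, epochs=50, lr=0.5):
    """Fit bag-of-words logistic regression by SGD; returns (weights, bias)."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in zip(utterances, labels):
            tokens = set(text.lower().split())
            z = bias + sum(weights[t] for t in tokens)
            p = 1.0 / (1.0 + math.exp(-z))       # predicted P(talk switch)
            grad = p - (1.0 if label else 0.0)   # gradient of the log loss
            bias -= lr * grad
            for t in tokens:
                weights[t] -= lr * grad
    return weights, bias


def predict_switch(model, text):
    """Return True if the model scores the utterance as a talk switch."""
    weights, bias = model
    tokens = set(text.lower().split())
    return bias + sum(weights.get(t, 0.0) for t in tokens) > 0.0
```

The teacher data here is simply a list of utterance texts and a parallel list of binary labels, which matches the pairing of series data and first labels described above.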

As described above, in the teacher data used for learning the binary classification model 1, a label indicating a talk switch is given to every utterance or division unit that may be a talk switch, including utterances that move from one topic to the same topic. Therefore, the binary classification model 1 learned from such teacher data may determine that an utterance is a talk switch even within a section where, depending on how the topics to be classified are defined, the topic does not change and utterances on the same topic continue.

Next, with reference to FIG. 2, the configuration of the learning device 20 for learning the multi-value classification model 2 for classifying (estimating) topics will be described.

As shown in FIG. 2, the learning device 20 includes an input unit 21, a multi-value label complement unit 22, and a multi-value classification learning unit 23.

The input unit 21 receives the series data of a dialogue including a plurality of topics. The input unit 21 also receives binary labels indicating whether or not each utterance constituting the series data, or each division unit thereof, is a talk switch. Further, the input unit 21 receives the ranges in the series data over which one topic continues and multi-valued labels indicating the topic in each range. The series data and the binary labels are the same as those input to the input unit 11 of the learning device 10. Multi-valued labels are given manually. Specifically, a range over which one topic continues is specified in the series data, and a multi-valued label indicating the topic in the specified range is selected from the labels of a plurality of topics. The binary labels and the multi-valued labels for one piece of series data may be input in separate files or together in one file.

The input unit 21 outputs the input series data, binary label, and multi-value label to the multi-value label complement unit 22.

The multi-value label complement unit 22 generates teacher data (second teacher data) for learning the multi-value classification model 2 from the series data, the binary labels, and the multi-valued labels input from the input unit 21. Specifically, to each utterance or division unit that has been given a label indicating a talk switch, the multi-value label complement unit 22 assigns the multi-valued label indicating the topic of the range that includes that utterance. As described above, when binary labels are assigned as teacher data, a label indicating a talk switch is given to every utterance or division unit that may be a talk switch, including utterances that move from a topic to the same topic. Therefore, a label indicating a talk switch may be given even to an utterance inside a range where utterances on the same topic continue. The multi-value label complement unit 22 also assigns to such an utterance or division unit the multi-valued label indicating the topic of the range that includes it. Doing so increases the amount of teacher data for utterances related to each topic and improves the accuracy of topic estimation.
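
The label complementing described above can be sketched as follows. The representation of a topic range as a tuple `(start, end, topic)` with inclusive utterance indexes, and the function name `complement_multi_labels`, are assumptions made for illustration.

```python
def complement_multi_labels(utterances, switch_flags, topic_ranges):
    """Build second teacher data as (utterance, topic) pairs.

    utterances   -- list of utterance texts in dialogue order
    switch_flags -- list of bools, True where the utterance is a talk switch
    topic_ranges -- list of (start, end, topic), inclusive indexes (assumed format)
    """
    teacher_data = []
    for idx, (text, is_switch) in enumerate(zip(utterances, switch_flags)):
        if not is_switch:
            continue  # only talk-switch utterances receive a topic label
        for start, end, topic in topic_ranges:
            if start <= idx <= end:
                teacher_data.append((text, topic))
                break
    return teacher_data
```

Note that a talk-switch utterance in the middle of a topic range still receives that range's topic, which is what increases the number of training examples per topic.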

The multi-value label complement unit 22 outputs the utterances or division units to which multi-valued labels have been assigned, together with those multi-valued labels, to the multi-value classification learning unit 23.

The multi-value classification learning unit 23 learns the multi-value classification model 2 (second model) using, as teacher data (second teacher data), the utterances or division units output from the multi-value label complement unit 22 and the multi-valued labels assigned to them. The multi-value classification model 2 is therefore a model trained on teacher data (second teacher data) in which a multi-valued label (second label) indicating the topic to which an utterance relates is attached to the utterances, or their division units, constituting the series data. This teacher data is generated from series data in which talk-switch utterances or their division units have been given a binary label indicating a talk switch and in which the ranges over which a topic continues, and the topic in each range, have been specified: each utterance or division unit labeled as a talk switch is given the multi-valued label indicating the topic of the range that includes it.

Next, the configuration of the estimation device 30 according to the present embodiment will be described. The estimation device 30 according to the present embodiment estimates, in the series data of a dialogue including a plurality of topics such as a dialogue between an operator and a customer, the range of a paragraph from one talk switch to the utterance immediately before the next switch or to the utterance at the end of the dialogue, and estimates the topic in that paragraph.

As shown in FIG. 3, the estimation device 30 according to the present embodiment includes an input unit 31, a determination unit 32, a paragraph estimation unit 33, a topic estimation unit 34, and an output unit 35.

The input unit 31 inputs series data including a plurality of topics. The series data input to the input unit 31 is data to be processed that is the target of estimation of the paragraph range and the topic in the paragraph. The series data is, for example, text data in which time-series utterances of an operator and a customer are voice-recognized. When the series data is input online, the input unit 31 may sequentially input the text data obtained by the voice recognition of each utterance during the dialogue. Further, when the series data is input offline, the input unit 31 may sort by the start time or the end time of each utterance during the dialogue and input the text data of each utterance. The input unit 31 outputs the input series data to the determination unit 32.

The determination unit 32 uses the binary classification model 1 (first model) to determine whether or not each utterance constituting the series data output from the input unit 31 is a talk-switch utterance, and outputs the result of the determination to the paragraph estimation unit 33. As described above, the binary classification model 1 is a model trained on teacher data (first teacher data) in which a binary label (first label) indicating whether or not there is a talk switch is attached to the utterances, or their division units, constituting the series data of a dialogue including a plurality of topics.

The paragraph estimation unit 33 estimates, based on the result of the determination by the determination unit 32, the range of each paragraph in the series data, from a talk switch to the utterance immediately before the next switch or to the utterance at the end of the dialogue. Specifically, the paragraph estimation unit 33 estimates as one paragraph the range from an utterance that the determination unit 32 has determined to be a talk-switch utterance to the utterance immediately before the next utterance determined to be a talk-switch utterance. As described above, in the teacher data used for learning the binary classification model 1, a label indicating a talk switch may be given even to an utterance inside a range where utterances on the same topic continue. Therefore, the paragraph estimation unit 33 may divide a range into a plurality of paragraphs even where utterances on the same topic continue.
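
The paragraph range estimation described above reduces to a single pass over the per-utterance determination results. The sketch below assumes the results are given as a list of booleans and returns inclusive index pairs; the function name is hypothetical. It also assumes that utterances before the first detected talk switch belong to no paragraph, a case the passage above does not address.

```python
def estimate_paragraphs(switch_flags):
    """Return (start, end) inclusive index pairs, one per estimated paragraph."""
    paragraphs = []
    start = None  # index of the talk switch that opened the current paragraph
    for idx, is_switch in enumerate(switch_flags):
        if is_switch:
            if start is not None:
                # Close the current paragraph just before this talk switch.
                paragraphs.append((start, idx - 1))
            start = idx
    if start is not None:
        # The last paragraph runs to the utterance at the end of the dialogue.
        paragraphs.append((start, len(switch_flags) - 1))
    return paragraphs
```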

The topic estimation unit 34 uses the multi-value classification model 2 (second model) to estimate the topic in each paragraph whose range has been estimated by the paragraph estimation unit 33, or in the utterances contained in that paragraph. As described above, the multi-value classification model 2 is a model trained on teacher data (second teacher data) in which a multi-valued label (second label) indicating the topic to which an utterance relates is attached to the utterances, or their division units, constituting the series data. This teacher data is generated from series data in which talk-switch utterances or their division units have been given a binary label indicating a talk switch and in which the ranges over which a topic continues, and the topic in each range, have been specified. Specifically, it is generated by giving each utterance or division unit whose binary label indicates a talk switch the multi-valued label indicating the topic of the range that includes that utterance.

The output unit 35 outputs, for each paragraph whose range has been estimated in the series data, the utterances constituting the paragraph. Further, the output unit 35 may output a multi-valued label indicating the topic in the paragraph, the start time and end time of the paragraph, and the like.

As described above, in the present embodiment, the determination unit 32 determines whether or not each utterance constituting the series data to be processed is a talk-switch utterance, using the binary classification model 1 trained on teacher data in which a binary label indicating whether or not there is a talk switch is attached to the utterances, or their division units, constituting the series data of a dialogue including a plurality of topics. The paragraph estimation unit 33 then estimates the range of each paragraph in the series data to be processed based on the result of the determination by the determination unit 32. Further, the topic estimation unit 34 uses the multi-value classification model 2 to estimate the topic in each paragraph whose range has been estimated by the paragraph estimation unit 33, or in the utterances included in that paragraph. The output unit 35 outputs the utterances of each paragraph whose range has been estimated, a multi-valued label indicating the topic in the paragraph, the start time and end time of the paragraph, and the like.

Further, in the present embodiment, the learning device 10 can generate the binary classification model 1, which determines whether or not an utterance constituting the series data is a talk-switch utterance, by using teacher data in which a binary label indicating whether or not there is a talk switch is attached to the utterances or their division units. The learning device 20 can learn the multi-value classification model 2, which determines the topic in a paragraph or in the utterances included in the paragraph, by using teacher data in which a multi-valued label indicating the topic to which an utterance relates is attached to the utterances constituting the series data or to division units obtained by dividing those utterances. The estimation device 30 can estimate the range of paragraphs in the series data based on the determination result of the binary classification model 1, and can use the multi-value classification model 2 to estimate the topic in a paragraph whose range has been estimated or in the utterances constituting that paragraph. Therefore, the estimation device 30 according to the present embodiment can estimate, from the series data of a dialogue including a plurality of topics, the range of each paragraph from a talk switch to the utterance immediately before the next switch or to the utterance at the end of the dialogue. Further, because estimating the paragraph range allows topic estimation to be limited to the utterances included in the paragraph, the estimation device 30 according to the present embodiment can improve the accuracy of topic estimation.

In FIG. 3, the estimation device 30 has been described using an example in which the topic is estimated with the multi-value classification model 2, but the present disclosure is not limited to this. As described above, learning the multi-value classification model 2 uses teacher data in which the ranges over which one topic continues in the series data, and the topic in each range, are specified manually. Such teacher data is relatively easy to prepare when a small number of topics is targeted. On the other hand, it may be difficult to prepare when, for example, a large number of topics is targeted. In the present disclosure, the topic can be estimated even in such a case, without using the multi-value classification model 2.

FIG. 4 is a diagram showing a configuration example of an estimation device 30a for estimating a topic without using the multi-value classification model 2 according to the present embodiment. In FIG. 4, the same components as those in FIG. 3 are designated by the same reference numerals, and the description thereof will be omitted.

As shown in FIG. 4, the estimation device 30a includes an input unit 31, a determination unit 32, a paragraph estimation unit 33, a keyword extraction unit 36, a topic estimation unit 34a, and an output unit 35. The estimation device 30a shown in FIG. 4 is different from the estimation device 30 shown in FIG. 3 in that the keyword extraction unit 36 is added and the topic estimation unit 34 is changed to the topic estimation unit 34a.

The keyword extraction unit 36 extracts at least one keyword from the utterances included in the paragraph whose range is estimated by the paragraph estimation unit 33. Any method can be used as the method for extracting keywords, and for example, an existing method such as tf-idf (Term Frequency-Inverse Document Frequency) can be used. The number of keywords extracted by the keyword extraction unit 36 may be limited to a predetermined number in advance, or may be specified by the user.
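
As one possible reading of the tf-idf option, the following sketch scores the terms of a target paragraph against all estimated paragraphs, treating each paragraph as one document. That treatment, the pre-tokenized input, and the function name `extract_keywords` are assumptions; ties are broken alphabetically only to make the result deterministic.

```python
import math
from collections import Counter


def extract_keywords(paragraphs, target, top_k=3):
    """Return up to top_k terms of paragraphs[target] ranked by tf-idf.

    paragraphs -- list of token lists, one list per estimated paragraph
    target     -- index of the paragraph to extract keywords from
    """
    n_docs = len(paragraphs)
    doc_freq = Counter()
    for tokens in paragraphs:
        doc_freq.update(set(tokens))  # count each term once per paragraph

    term_freq = Counter(paragraphs[target])
    total = len(paragraphs[target])
    scores = {
        term: (count / total) * math.log(n_docs / doc_freq[term])
        for term, count in term_freq.items()
    }
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:top_k]
```

The `top_k` parameter corresponds to the number of keywords, which may be fixed in advance or specified by the user.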

The topic estimation unit 34a estimates the topic in the paragraph, or in the utterances contained in the paragraph, based on the keywords that the keyword extraction unit 36 extracted from the utterances included in the paragraph. For example, the topic estimation unit 34a may take the extracted keyword itself as the topic in the paragraph or in the utterances contained in the paragraph. Alternatively, the topic estimation unit 34a may select, from a plurality of predetermined topics, a topic having a high similarity to the extracted keyword and take it as the topic in the paragraph or in the utterances contained in the paragraph.
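
The similarity-based selection could look like the sketch below. The passage above does not say how similarity is measured, so Jaccard similarity between the extracted keywords and a set of descriptive terms per predetermined topic is used here purely as an assumption; `topic_terms` and `estimate_topic` are hypothetical names.

```python
def estimate_topic(keywords, topic_terms):
    """Return the predetermined topic most similar to the extracted keywords.

    keywords    -- iterable of keywords extracted from a paragraph
    topic_terms -- dict mapping topic name to a set of descriptive terms (assumed)
    """
    keyword_set = set(keywords)

    def jaccard(terms):
        union = keyword_set | terms
        return len(keyword_set & terms) / len(union) if union else 0.0

    # On ties, max() keeps the first topic in the dict's insertion order.
    return max(topic_terms, key=lambda name: jaccard(topic_terms[name]))
```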

As described above, the estimation device 30a shown in FIG. 4 can estimate the topic in a paragraph, or in the utterances contained in the paragraph, without using the multi-value classification model 2. Therefore, the topics in the series data can be estimated even when it is difficult to prepare a large amount of teacher data in which the ranges of topics and the topics in those ranges are specified.

FIG. 5 is a diagram showing a configuration example of the estimation device 30b according to the present embodiment. Like the estimation device 30a shown in FIG. 4, the estimation device 30b shown in FIG. 5 estimates the topic without using the multi-value classification model 2. In FIG. 5, the same components as those in FIG. 4 are designated by the same reference numerals, and the description thereof will be omitted.

As shown in FIG. 5, the estimation device 30b includes an input unit 31, a determination unit 32, a paragraph estimation unit 33, a clustering unit 37, a keyword extraction unit 36b, a topic estimation unit 34b, and an output unit 35. The estimation device 30b shown in FIG. 5 differs from the estimation device 30a shown in FIG. 4 in that a clustering unit 37 is added, the keyword extraction unit 36 is changed to a keyword extraction unit 36b, and the topic estimation unit 34a is changed to a topic estimation unit 34b.

In the estimation device 30b shown in FIG. 5, one or more pieces of series data are input. The clustering unit 37 groups the plurality of paragraphs whose ranges are estimated by the paragraph estimation unit 33 for the one or more input series data into clusters of similar paragraphs. As the clustering method, any existing method can be used. The clustering unit 37 determines a representative paragraph in each cluster consisting of similar paragraphs. The clustering unit 37 determines, for example, the paragraph at the center of the cluster as the representative paragraph among the paragraphs constituting the cluster. Alternatively, the clustering unit 37 may determine any paragraph among the paragraphs constituting the cluster as the representative paragraph.
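Since any existing clustering method may be used, the following sketch picks one simple possibility purely for illustration: greedy single-pass clustering over bag-of-words vectors with cosine similarity, with the "paragraph at the center of the cluster" interpreted as the medoid. The function names, the `threshold` value, and the token-list representation of paragraphs are assumptions, not details from the source.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_paragraphs(paragraphs, threshold=0.5):
    """Greedy single-pass clustering; returns clusters as lists of paragraph indices.

    paragraphs: list of token lists. threshold is a hypothetical tuning value.
    """
    vectors = [Counter(p) for p in paragraphs]
    clusters = []
    for i, vec in enumerate(vectors):
        for members in clusters:
            # Join the first cluster whose seed paragraph is similar enough.
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            # No similar cluster found: start a new one.
            clusters.append([i])
    return clusters


def representative(members, paragraphs):
    """Return the medoid: the member most similar, in total, to the other members."""
    vectors = {i: Counter(paragraphs[i]) for i in members}
    return max(
        members,
        key=lambda i: sum(cosine(vectors[i], vectors[j]) for j in members if j != i),
    )
```

A production system would more likely use an established clustering library; the point here is only how a representative paragraph can be chosen once clusters exist.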

The keyword extraction unit 36b extracts keywords from the utterances included in the representative paragraph determined by the clustering unit 37 among the paragraphs constituting the cluster.

The topic estimation unit 34b estimates the topic in the paragraph constituting the cluster based on the keywords extracted by the keyword extraction unit 36b from the utterances included in the paragraph representing the cluster. Specifically, the topic estimation unit 34b estimates a topic estimated based on a keyword extracted from an utterance included in a paragraph representing a cluster as a topic in all paragraphs constituting the cluster.

Further, in FIGS. 3 to 5, the estimation devices 30, 30a, and 30b have been described using an example of processing the result of voice recognition of the dialogue between the operator and the customer in the contact center, but the present disclosure is not limited to this. For example, in the estimation devices 30, 30a, and 30b, a morphological analysis unit that performs morphological analysis on text chat may be provided after the input unit 31.

Further, in FIGS. 3 to 5, the description has been made using an example in which series data in which a plurality of utterances are arranged in chronological order is input, but the present disclosure is not limited to this. In order to input the utterances constituting the series data one by one, a function unit for extracting the utterances one by one from the series data may be provided in front of the input unit 31.

FIG. 6 is a flowchart for explaining the complementation of the multi-valued label in the learning device 20 shown in FIG.

The multi-valued label complementing unit 22 reads, one by one from the series data input to the input unit 21, the utterances to which the multi-valued label indicating the topic and the binary label indicating a switching of the talk are attached (step S11). The multi-valued label is given only to the first utterance in the range indicating the topic, and is not given to other utterances. The binary label indicating a change of talk is given only to the utterances showing a change of talk, and is not given to other utterances.

The multi-valued label complementing unit 22 determines whether or not a multi-valued label indicating a topic is attached to the read utterance (step S12).

When it is determined that the multi-value label is attached (step S12: Yes), the multi-value label complementing unit 22 stores the multi-value label of the read utterance in a multi-value label temporary storage device. When a multi-value label is already stored in the multi-value label temporary storage device, the multi-value label complementing unit 22 replaces the stored multi-value label with the multi-value label attached to the read utterance (step S13).

When it is determined that the multi-valued label is not attached (step S12: No), or after the multi-valued label attached to the read utterance has been stored, the multi-valued label complementing unit 22 determines whether or not a binary label indicating a switching of the talk is attached to the read utterance (step S14).

When it is determined that the binary label indicating a switching of the talk is attached (step S14: Yes), the multi-value label complementing unit 22 gives the multi-value label stored in the multi-value label temporary storage device to the read utterance (step S15). In this way, when the read utterance is given a binary label indicating a switching of the talk, the multi-valued label complementing unit 22 gives the utterance a multi-valued label indicating the topic in the range of the series data that includes the utterance.

When it is determined that the binary label indicating a switching of the talk is not attached (step S14: No), or after the multi-value label has been given to the read utterance, the multi-value label complementing unit 22 determines whether or not the read utterance is the utterance at the end of the dialogue (step S16).

When it is determined that the read utterance is the utterance at the end of the dialogue (step S16: Yes), the multi-value label complementing unit 22 ends the process.

When it is determined that the read utterance is not the utterance at the end of the dialogue (step S16: No), the multi-value label complementing unit 22 returns to the process of step S11 and reads the next utterance.

In FIG. 6, the multi-valued label is given only to the first utterance in the range indicating the topic, and is not given to other utterances. However, all the utterances in the range indicating the topic may be given the multi-valued label for that topic in advance. In this case, deleting the multi-valued label from the utterances that are not given the binary label indicating a change of talk leaves the multi-valued label indicating the topic only on the utterances that are given the binary label indicating a change of talk.

In this way, any method may be used as long as a multi-valued label indicating the topic is attached to the utterance of the change of story.
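The label complementing flow of steps S11 to S16 can be sketched as a single pass over the utterances. The dictionary keys `"topic"` and `"switch"` and the function name `complement_labels` are hypothetical; the variable `current_topic` stands in for the multi-value label temporary storage device.

```python
def complement_labels(utterances):
    """Copy the current topic label onto every utterance marked as a talk switch.

    utterances: list of dicts with hypothetical keys
        "topic"  - multi-valued label, or None (given only at the start of a topic range)
        "switch" - True if the binary label marks a switching of the talk
    Returns a new list; the input dicts are not modified.
    """
    current_topic = None  # plays the role of the temporary label storage
    result = []
    for utt in utterances:                # S11: read utterances one by one
        utt = dict(utt)                   # work on a copy
        if utt.get("topic") is not None:  # S12: multi-valued label attached?
            current_topic = utt["topic"]  # S13: store or update the label
        if utt.get("switch"):             # S14: binary switch label attached?
            utt["topic"] = current_topic  # S15: give the stored label
        result.append(utt)
    return result                         # S16: the loop ends at the last utterance
```

For example, a switch utterance in the middle of the range of topic A, which initially has no multi-valued label, receives the label A after this pass.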

Next, the operation of the estimation device 30 shown in FIG. 3 will be described. FIG. 7 is a flowchart showing an example of the operation of the estimation device 30, and is a diagram for explaining an estimation method by the estimation device 30.

The determination unit 32 reads the utterances one by one from the series data of the processing target input to the input unit 31 (step S21). The determination unit 32 uses the binary classification model 1 to determine whether or not the read utterance is a talk switching utterance (step S22).

The paragraph estimation unit 33 determines whether the read utterance has been determined by the determination unit 32 to be a switching utterance, or whether the read utterance is the utterance at the end of the dialogue (step S23).

When it is determined that the read utterance is neither an utterance of the switching of the talk nor the utterance at the end of the dialogue (step S23: No), the paragraph estimation unit 33 accumulates the read utterance as an utterance constituting the paragraph (step S24). Once the read utterance has been accumulated, the process is repeated from step S21.

When it is determined that the read utterance is an utterance of the switching of the talk, or that the read utterance is the utterance at the end of the dialogue (step S23: Yes), the paragraph estimation unit 33 determines whether or not there are accumulated utterances (step S25).

When it is determined that there are accumulated utterances (step S25: Yes), the paragraph estimation unit 33 estimates that the range of the accumulated utterances is a paragraph, and outputs the accumulated utterances to the topic estimation unit 34 as the utterances constituting the paragraph. The topic estimation unit 34 estimates the topic in the paragraph whose range has been estimated by the paragraph estimation unit 33 using the multi-value classification model 2 (step S26).

In FIG. 7, the explanation is given using an example of estimating a topic for each paragraph using the multi-value classification model 2, but the present disclosure is not limited to this. The topic estimation unit 34 may estimate the topic in units of at least one utterance included in the paragraph. In this case, the topic estimation unit 34 may estimate the topic using only the first utterance of the paragraph, or may estimate the topic using a predetermined number of utterances from the first utterance of the paragraph. When a topic is estimated in units of one or more utterances, the multi-value classification model 2 is learned based on teacher data in which a multi-value label is attached to each unit for estimating a topic.

The topic estimation unit 34 attaches a multi-valued label indicating the estimated topic to the paragraph (step S27). The paragraph estimation unit 33 resets the accumulation of utterances (step S28), and determines whether or not the read utterance is the utterance at the end of the dialogue (step S29).

When it is determined that the read utterance is not the utterance at the end of the dialogue (step S29: No), the paragraph estimation unit 33 returns to the process of step S24 and accumulates the read utterance. By doing this, the read utterance is accumulated as the first utterance of a new paragraph.

When it is determined that the read utterance is the utterance at the end of the dialogue (step S29: Yes), the paragraph estimation unit 33 ends the process.
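The paragraph estimation loop of FIG. 7 can be approximated by the sketch below. The callable `is_switch` is a hypothetical stand-in for the binary classification model 1, and the function name is invented for this example. The sketch also assumes that the final utterance of the dialogue closes the paragraph that is open at that point, in line with a paragraph extending "to the utterance at the end of the dialogue".

```python
def split_into_paragraphs(utterances, is_switch):
    """Group utterances into paragraphs that start at each talk-switch utterance.

    utterances: list of utterances in chronological order.
    is_switch: callable standing in for binary classification model 1
        (hypothetical interface: utterance -> bool).
    """
    paragraphs = []
    current = []  # accumulated utterances of the paragraph being built
    for utt in utterances:
        # A switch utterance closes the accumulated paragraph, if any (S23, S25),
        # and then becomes the first utterance of a new paragraph (S28, S24).
        if is_switch(utt) and current:
            paragraphs.append(current)
            current = []
        current.append(utt)
    # The end of the dialogue closes the last open paragraph (assumption above).
    if current:
        paragraphs.append(current)
    return paragraphs
```

Topic estimation (steps S26 and S27) would then be applied to each returned paragraph, or to selected utterances within it.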

As described above, the estimation method by the estimation device 30 includes a determination step (step S22) and a paragraph estimation step (steps S23 to S25). In the determination step, the binary classification model 1 (first model) is used to determine whether or not the utterances constituting the series data to be processed are utterances of a switching of the talk. The binary classification model 1 is learned based on teacher data (first teacher data) in which a binary label (first label) indicating whether or not the talk is switched is given to the utterances, or division units thereof, constituting the series data of a dialogue including a plurality of topics. In the paragraph estimation step, based on the result of the determination, the range of each paragraph in the series data to be processed is estimated as extending from a switching of the talk to the utterance immediately before the next switching or to the utterance at the end of the dialogue.

By using teacher data in which a binary label indicating whether or not the talk is switched is attached to the utterances or division units thereof, it is possible to generate a binary classification model 1 for determining whether or not an utterance constituting the series data is an utterance of a switching of the talk. Then, based on the result of the determination by the binary classification model 1, the range of paragraphs in the series data to be processed can be estimated. Therefore, it is possible to estimate the range of paragraphs in the series data of a dialogue containing a plurality of topics.

Further, the estimation method according to the present embodiment may further include a topic estimation step (step S26). In the topic estimation step, the topic in the paragraph, or in an utterance contained in the paragraph, is estimated using the multi-value classification model 2 (second model), which is learned based on teacher data (second teacher data) in which a multi-valued label (second label) indicating the topic to which an utterance relates is given to the utterances constituting the series data or division units thereof. By estimating the range of the paragraph, the topic can be estimated only for the utterances included in the paragraph, so that the estimation accuracy of the topic can be improved.

Next, the operation of the estimation device 30a shown in FIG. 4 will be described. FIG. 8 is a flowchart showing an example of the operation of the estimation device 30a shown in FIG. 4, and is a diagram for explaining an estimation method by the estimation device 30a. In FIG. 8, the same processing as in FIG. 7 is designated by the same reference numerals, and the description thereof will be omitted.

When it is determined that there are accumulated utterances (step S25: Yes), the paragraph estimation unit 33 estimates that the range of the accumulated utterances is a paragraph, and outputs the accumulated utterances to the keyword extraction unit 36. The keyword extraction unit 36 extracts keywords from the utterances included in the paragraph whose range is estimated by the paragraph estimation unit 33 (step S31). The topic estimation unit 34a estimates the topic in the paragraph or the utterance included in the paragraph based on the keyword extracted by the keyword extraction unit 36 from the utterance included in the paragraph (step S32).

As described above, the estimation method by the estimation device 30a includes a keyword extraction step (step S31) and a topic estimation step (step S32). In the keyword extraction step, keywords are extracted from the utterances contained in the paragraph whose range is estimated. In the topic estimation step, the topic in the paragraph or the utterance contained in the paragraph is estimated based on the keywords extracted from the utterance contained in the paragraph.

Next, the operation of the estimation device 30b shown in FIG. 5 will be described. FIG. 9 is a flowchart showing an example of the operation of estimating the range of the paragraph by the estimation device 30b shown in FIG. 5, and is a diagram for explaining the estimation method by the estimation device 30b. In FIG. 9, the same processing as in FIG. 7 is designated by the same reference numerals, and the description thereof will be omitted.

When the estimation device 30b determines that there are accumulated utterances (step S25: Yes), the paragraph estimation unit 33 estimates that the range of the accumulated utterances is a paragraph. Then, the paragraph estimation unit 33 resets the accumulation of utterances (step S28).

FIG. 10 is a flowchart showing an example of the operation of estimating a topic by the estimation device 30b shown in FIG. 5, and is a diagram for explaining an estimation method by the estimation device 30b.

The clustering unit 37 reads the paragraph whose range has been estimated by the paragraph estimation unit 33 (step S41). The clustering unit 37 reads a plurality of paragraphs contained in at least one or more series data. That is, the clustering unit 37 repeats the process of step S41 as many times as necessary.

The clustering unit 37 clusters a plurality of read paragraphs for each similar paragraph (step S42).

Next, the clustering unit 37 determines whether or not there are unprocessed clusters (step S43). An unprocessed cluster is a cluster in which paragraphs contained in the cluster are not given multi-value labels.

When it is determined that an unprocessed cluster exists (step S43: No), the clustering unit 37 selects one of the unprocessed clusters as the cluster to be processed, and determines a representative paragraph from among the paragraphs included in the cluster to be processed (step S44). The clustering unit 37 determines, for example, the paragraph at the center of the cluster as the representative paragraph.

The keyword extraction unit 36b extracts keywords from the utterances included in the representative paragraph of the cluster determined by the clustering unit 37 (step S45).

The topic estimation unit 34b estimates the topic in the paragraph representing the cluster based on the keywords extracted by the keyword extraction unit 36b (step S46). Next, the topic estimation unit 34b determines whether or not there is an unprocessed paragraph (step S47). The unprocessed paragraph is a paragraph included in the cluster to be processed and is not given a multi-value label.

When it is determined that there is an unprocessed paragraph (step S47: No), the topic estimation unit 34b gives the unprocessed paragraph included in the cluster a multi-valued label indicating the topic estimated based on the keyword extracted from the representative paragraph of the cluster (step S48). Then, the topic estimation unit 34b returns to the process of step S47.

When the topic estimation unit 34b determines that there is no unprocessed paragraph (step S47: Yes), the process is repeated from step S43.

As described above, the estimation method by the estimation device 30b further includes a clustering step (step S42). In the clustering step, a plurality of paragraphs whose range is estimated based on one or a plurality of series data are clustered for each similar paragraph. In the keyword extraction step, keywords are extracted from the utterances included in the representative paragraph among the paragraphs included in the cluster consisting of similar paragraphs. In the topic estimation step, the topic in the paragraphs constituting the cluster including the representative paragraph is estimated based on the keywords extracted from the utterances included in the representative paragraph.
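The per-cluster labeling of steps S43 to S48 can be sketched as follows. The two callables are hypothetical placeholders: `pick_representative` stands for the choice made by the clustering unit 37, and `topic_of` stands for keyword extraction followed by topic estimation on a single paragraph.

```python
def label_paragraphs_by_cluster(clusters, pick_representative, topic_of):
    """Estimate one topic per cluster and give it to every member paragraph.

    clusters: list of lists of paragraph ids (output of any clustering method).
    pick_representative: callable (hypothetical) mapping a cluster to one member id.
    topic_of: callable (hypothetical) mapping a paragraph id to a topic label.
    Returns a dict {paragraph id: topic label}.
    """
    labels = {}
    for members in clusters:                # S43: take the next unprocessed cluster
        rep = pick_representative(members)  # S44: choose the representative paragraph
        topic = topic_of(rep)               # S45, S46: keywords and topic, representative only
        for pid in members:                 # S47, S48: same label for every member
            labels[pid] = topic
    return labels
```

Because `topic_of` is called once per cluster rather than once per paragraph, keyword extraction cost grows with the number of clusters, not the number of paragraphs.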

Next, model learning (binary classification model 1 and multi-value classification model 2) will be described using a specific example shown in FIG. In the following, it is assumed that the series data includes five topics, "topic A", "topic B", "topic C", "topic D", and "topic E".

As shown in FIG. 11, in the series data used as teacher data, the range in which one topic continues and the topic in that range are manually specified, and a multi-valued label indicating the topic is attached to each range in which one topic continues. In addition, a binary label indicating whether or not the talk is switched is manually attached to the utterances constituting the series data. In FIG. 11, for simplicity, the binary label is shown only for the utterances of a change of talk. As described above, even within the range in which the utterances related to one topic continue, a binary label indicating a switching of the talk is given to the utterances at which the talk switches. Therefore, in FIG. 11, for example, an utterance in the middle of the range in which the utterances related to topic A continue may be given a binary label indicating that the utterance is a change of talk.

The above-mentioned series data and binary label are input to the learning device 10, and the binary classification model 1 is trained using LSTM or the like based on the input series data and binary label.

Further, the above-mentioned series data, binary label and multi-value label are input to the learning device 20. In the learning device 20, the multi-valued label is complemented. That is, as shown in FIG. 11, for an utterance to which a label indicating that the utterance is switched is given, a multi-valued label indicating a topic in the range of series data including the utterance is given. By doing so, teacher data is created with a multi-valued label indicating the topic to which the utterance is related to the utterances constituting the series data. As described above, a multi-valued label indicating a topic related to the utterance may be attached to the division unit of the utterance constituting the series data.

Based on the created teacher data, the multi-value classification model 2 is learned using LSTM or the like. The multi-value classification model 2 may be learned using only the utterances with the multi-value label, or using the utterances of the entire paragraph including the utterances with the multi-value label.

FIG. 12 is a diagram showing an example of topic estimation by the estimation device 30 shown in FIG. In FIG. 12, it is assumed that the multi-valued classification model 2 is learned in utterance units.

When the series data of one dialogue is input to the estimation device 30, as shown in FIG. 12, the binary classification model 1 is used to determine whether or not each utterance constituting the series data is an utterance of a switching of the talk. Then, the range from an utterance of a change of talk to the utterance immediately before the next utterance of a change of talk, or to the utterance at the end of the dialogue, is estimated to be one paragraph.

Next, as shown in FIG. 12, among the utterances included in the paragraph whose range is estimated, the topic in the utterance determined to be the utterance of the switching of the talk is estimated by the multi-value classification model 2. The multi-valued classification model 2 may be learned not in utterance units but in paragraph units. In this case, as shown in FIG. 13, the topic is estimated in paragraph units by the multi-value classification model 2.

FIG. 14 is a diagram showing an example of topic estimation by the estimation device 30a shown in FIG.

When the series data of one dialogue is input to the estimation device 30a, as shown in FIG. 14, the binary classification model 1 is used to determine whether or not each utterance constituting the series data is an utterance of a switching of the talk. Then, the range from an utterance of a change of talk to the utterance immediately before the next utterance of a change of talk is estimated to be one paragraph.

Next, keywords are extracted from the utterances included in the paragraph whose range is estimated, the topic of that paragraph is estimated based on the extracted keywords, and a multi-valued label indicating the estimated topic is given. In this way, the topic in the paragraph can be estimated without using the multi-valued classification model 2. Therefore, even when it is difficult to prepare the teacher data necessary for learning the multi-valued classification model 2, the topic of the paragraph included in the series data can be estimated. Note that FIG. 14 shows an example in which different multi-value labels ("Topic 1" to "Topic 10") are assigned to each paragraph, but this does not necessarily mean that these are different topics.

FIG. 15 is a diagram showing an example of topic estimation by the estimation device 30b shown in FIG.

When the series data of one or more dialogues is input to the estimation device 30b, as shown in FIG. 15, the binary classification model 1 is used to determine whether or not each utterance constituting the series data is an utterance of a switching of the talk. Then, the range from an utterance of a change of talk to the utterance immediately before the next utterance of a change of talk is estimated to be one paragraph.

Next, as shown in FIG. 15, a plurality of paragraphs whose range is estimated are clustered for each similar paragraph. A representative paragraph is determined from a cluster of similar paragraphs, and keywords are extracted from the utterances contained in the representative paragraph. In FIG. 15, the paragraph shown by the thick line indicates the representative paragraph.

Next, the topic in the representative paragraph is estimated based on the keywords extracted from the utterances included in the representative paragraph of the cluster, and a multi-valued label indicating the estimated topic is given to the representative paragraph. Further, as shown in FIG. 15, other paragraphs constituting the cluster are also given the same multi-valued label as the representative paragraph of the cluster.

In order to show the effectiveness of the estimation method according to the present disclosure (hereinafter sometimes referred to as "the present method"), an experimental comparison with a conventional method was carried out. In the experiment, 349 calls were used for learning the model and 50 calls were used for verification. As multi-valued labels indicating a topic, eight types of labels indicating topic A to topic H, and a label indicating a fixed topic S that covers the span from the first utterance of a call to the first switching of the talk, were prepared. The conventional method learns a binary classification model using, as teacher data, data in which the binary label indicating whether or not an utterance is a change of talk is attached only to the utterances at which the multi-value label switches, and learns a multi-value classification model using only the utterances at which the multi-value label switches as teacher data.

First, we compared the estimation accuracy of the paragraph range (the accuracy of dividing the series data in paragraph units) based on the judgment of whether or not the story was switched by the binary classification model. The comparison results are shown in Table 1.

As mentioned above, in this method, the range of paragraphs is estimated by including the utterances that transition from a certain topic to the same topic in the utterances that change the story. Therefore, as shown in Table 1, in this method, the precision rate is lower than that in the conventional method. However, in this method, it has become possible to detect paragraphs and utterances of story switching that could not be detected by the conventional method, so that the recall rate of paragraph division has increased.

Next, we compared the accuracy of topic estimation by the multi-value classification model for utterances that were determined to be a switching of the talk by the binary classification model. As described above, in the conventional method, the multi-value classification model is learned using teacher data in which the multi-value label indicating the topic is manually attached only to the utterances at which the multi-value label switches. In this method, on the other hand, the multi-value classification model 2 was learned using teacher data in which the multi-value label was complemented for the utterances manually labeled as a switching of the talk. Using the multi-value classification model learned by the conventional method and the multi-value classification model 2 learned by this method, the topics were estimated for the utterances determined to be switching utterances by the binary classification models learned by the conventional method and by this method, respectively, and were compared with the correct topics given manually to those utterances. The results of the comparison (compliance rate) are shown in Table 2.

As shown in Table 2, it was found that this method can estimate, with high accuracy, the topic in the utterances determined to be utterances of a change of talk, including utterances that transition from a certain topic to the same topic. The topic S was not evaluated because its utterance of the change of talk is the first utterance of the call.

Finally, the results (F value) of classifying the topics of all utterances were evaluated for the 100 calls targeted for evaluation. This is a comprehensive evaluation of both the determination of utterances of a switching of the talk by the binary classification model and the estimation of topics by the multi-value classification model. In this method, the binary classification model 1 determines that an utterance that transitions from a certain topic to the same topic is also an utterance of a switching of the talk, but the multi-value classification model 2 classified many of these utterances into the correct topic. Therefore, as shown in Table 3, the overall evaluation result of this method was higher than that of the conventional method.
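For reference, the precision, recall, and F value used in these comparisons can be computed as in the sketch below. The source does not state which F variant was used, so the balanced F value (harmonic mean of precision and recall) is an assumption here, and the function name and set-based inputs are hypothetical.

```python
def precision_recall_f(predicted, gold):
    """Compute precision, recall and F value for two collections of items.

    predicted: items output by the system
        (e.g. indices of utterances judged to be talk switches).
    gold: manually annotated correct items.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: share of system outputs that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of correct items that the system found.
    recall = true_positives / len(gold) if gold else 0.0
    # Balanced F value: harmonic mean of precision and recall (assumed variant).
    total = precision + recall
    f_value = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_value
```

Detecting more switch utterances, including same-topic transitions, tends to raise recall while lowering precision, which matches the trade-off described for the paragraph division results.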

As described above, in the present embodiment, the estimation device 30 includes a determination unit 32 and a paragraph estimation unit 33. The determination unit 32 uses the binary classification model 1 (first model) to determine whether or not the utterances constituting the series data to be processed are utterances of a switching of the talk. The binary classification model 1 is learned based on teacher data (first teacher data) in which a binary label (first label) indicating whether or not the talk is switched is given to the utterances, or division units thereof, constituting the series data of a dialogue including a plurality of topics. Based on the result of the determination by the determination unit 32, the paragraph estimation unit 33 estimates the range of each paragraph in the series data to be processed as extending from a switching of the talk to the utterance immediately before the next switching or to the utterance at the end of the dialogue.

By using teacher data in which a binary label indicating whether or not the talk is switched is given to the utterances or division units thereof, it is possible to generate a binary classification model 1 for determining whether or not an utterance constituting the series data is an utterance of a change of talk. Then, based on the result of the determination by the binary classification model 1, the range of paragraphs in the series data can be estimated. Further, by estimating the range of the paragraph in the series data, the range over which the topic is estimated can be limited to the utterances included in the paragraph, so that the accuracy of estimating the topic in the paragraph can be improved.

A computer can be suitably used to function as each part of the above-mentioned estimation devices 30, 30a, and 30b. Such a computer can be realized by storing a program describing the processing contents that realize the functions of the estimation devices 30, 30a, and 30b in the storage unit of the computer, and having the CPU (Central Processing Unit) of the computer read and execute this program. That is, the program can make the computer function as the estimation devices 30, 30a, and 30b described above.

Further, this program may be recorded on a computer-readable medium, from which it can be installed on a computer. Here, the computer-readable medium on which the program is recorded may be a non-transitory recording medium. The non-transitory recording medium is not particularly limited, but may be, for example, a recording medium such as a CD-ROM or a DVD-ROM. This program can also be provided via a network.

The present disclosure is not limited to the configuration specified in each of the above-described embodiments, and various modifications can be made without departing from the gist of the invention described in the claims. For example, the functions included in each component can be rearranged as long as no logical inconsistency arises, and a plurality of components can be combined into one or a single component can be divided.

1 Binary classification model (first model)
2 Multi-value classification model (second model)
10 Learning device
11 Input unit
12 Binary classification learning unit
20 Learning device
21 Input unit
22 Multi-value label complementing unit
23 Multi-value classification learning unit
30, 30a, 30b Estimation device
31 Input unit
32 Determination unit
33 Paragraph estimation unit
34, 34a, 34b Topic estimation unit
35 Output unit
36, 36b Keyword extraction unit
37 Clustering unit

Claims

An estimation method including: a determination step of determining whether or not an utterance constituting series data to be processed is an utterance of a talk switch, using a first model trained based on first teacher data in which a first label indicating whether or not the talk is switched is attached to an utterance constituting series data of a dialogue including a plurality of topics or to a division unit obtained by dividing the utterance; and
a paragraph estimation step of estimating, based on the result of the determination, the range of a paragraph in the series data to be processed, from a talk switch to the utterance immediately before the next switch or to the utterance at the end of the dialogue.
The estimation method according to claim 1,
further comprising a topic estimation step of estimating a topic of the paragraph, or of an utterance contained in the paragraph, using a second model trained on second teacher data in which a second label indicating the topic to which an utterance relates is attached to each utterance constituting the series data, or to each division unit obtained by dividing the utterance.
The estimation method according to claim 2,
wherein the second teacher data is data generated by taking series data in which the first label indicating a talk change is attached to each talk-change utterance, or to each division unit obtained by dividing the utterance, and in which the range over which a topic continues and the topic in that range are specified, and attaching, to each utterance or division unit to which the first label is attached, the second label indicating the topic of the range that includes that utterance.
The estimation method according to claim 1, further comprising:
a keyword extraction step of extracting keywords from the utterances contained in the paragraph; and
a topic estimation step of estimating a topic of the paragraph, or of an utterance contained in the paragraph, based on the keywords extracted from the utterances contained in the paragraph.
The estimation method according to claim 4,
further comprising a clustering step of clustering, into groups of similar paragraphs, a plurality of paragraphs whose ranges have been estimated from one or more pieces of series data to be processed, wherein
in the keyword extraction step, keywords are extracted from the utterances contained in a representative paragraph among the paragraphs in a cluster of similar paragraphs, and
in the topic estimation step, the topic of the paragraphs constituting the cluster that includes the representative paragraph is estimated based on the keywords extracted from the utterances contained in the representative paragraph.
An estimation device comprising:
a determination unit that determines, using a first model trained on first teacher data in which a first label indicating whether or not an utterance is a talk change is attached to each utterance constituting series data of a dialogue including a plurality of topics, or to each division unit obtained by dividing the utterance, whether or not an utterance constituting series data to be processed is a talk-change utterance; and
a paragraph estimation unit that estimates, based on the result of the determination by the determination unit, the range of a paragraph in the series data to be processed, from a talk change to the utterance immediately before the next talk change or to the utterance at the end of the dialogue.
A program for causing a computer to execute the estimation method according to any one of claims 1 to 5.
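
As a non-limiting illustration of the paragraph estimation step recited above, the following is a minimal sketch in Python. It assumes that the determination step has already produced one Boolean per utterance (True for a talk-change utterance). The function name estimate_paragraphs and the list-of-Booleans input are hypothetical and do not appear in the disclosure. The sketch also assumes that any utterances preceding the first talk change form a paragraph of their own, a case the claims do not address.

```python
from typing import List, Tuple


def estimate_paragraphs(is_talk_change: List[bool]) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) utterance index ranges, one per paragraph.

    is_talk_change[i] is the (hypothetical) output of the determination step
    for the i-th utterance of the series data to be processed.
    """
    paragraphs: List[Tuple[int, int]] = []
    # Assumption: the first utterance of the dialogue always opens a paragraph,
    # even if it was not itself determined to be a talk change.
    start = 0
    for i, changed in enumerate(is_talk_change):
        if changed and i > start:
            # A new talk change closes the current paragraph at the
            # utterance immediately before it.
            paragraphs.append((start, i - 1))
            start = i
    if is_talk_change:
        # The last paragraph runs to the utterance at the end of the dialogue.
        paragraphs.append((start, len(is_talk_change) - 1))
    return paragraphs


if __name__ == "__main__":
    flags = [True, False, False, True, False, True]
    print(estimate_paragraphs(flags))
```

Each returned range begins at a talk-change utterance (or, under the assumption above, at the start of the dialogue) and ends at the utterance immediately before the next talk change or at the end of the dialogue. A topic estimation step could then be applied to each range.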