WO2021256043A1 - Estimation device, estimation method, learning device, learning method and program - Google Patents

Estimation device, estimation method, learning device, learning method and program Download PDF

Info

Publication number
WO2021256043A1
Authority
WO
WIPO (PCT)
Prior art keywords
utterance
unit
topic
paragraph
estimation
Prior art date
Application number
PCT/JP2021/012692
Other languages
French (fr)
Japanese (ja)
Inventor
隆明 長谷川
節夫 山田
和之 磯
正之 杉崎
Original Assignee
日本電信電話株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電信電話株式会社
Priority to JP2022532313A (JP7425368B2)
Publication of WO2021256043A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60: Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/65: Clustering; Classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/08: Speech classification or search
    • G10L15/10: Speech classification or search using distance or distortion measures between unknown speech and reference templates

Definitions

  • it is effective to estimate the range of a paragraph, that is, the span from a conversational switch (a break in the dialogue) to the utterance immediately before the next switch, or from a conversational switch to the last utterance of the dialogue. If the range of a paragraph can be estimated, the topic can be estimated from only the utterances included in that paragraph, so the topic can be estimated with higher accuracy.
  • the present disclosure relates to estimating, in series data of a dialogue containing multiple topics, such as a dialogue between an operator and a customer, the range of a paragraph extending from one conversational switch to the utterance immediately before the next switch, or from a conversational switch to the end of the dialogue, and to estimating the topic in that paragraph.
  • a binary label (switching label) indicating whether or not the conversation switches, assigned to each utterance constituting the series data or to each of its division units, is input.
  • the binary label is, for example, "1 (switch)" or "0 (no switch)", or "True (switch)" or "False (no switch)". Alternatively, if an utterance or its division unit carries any label indicating a conversational switch, the input unit 11 may treat it as "True (switch)", and if no such label is present, treat it as "False (no switch)".
  • the binary labels are attached manually in advance to the utterances constituting the series data or to their division units. As mentioned above, certain words and phrases are often spoken when the conversation switches, and the binary labels are assigned based on such expressions, for example. Taking a device failure as an example, if one only wants to classify whether or not a topic relates to the failure of a device, the topic of any utterance about the failure is "device failure" regardless of its cause. If, on the other hand, topics are to be classified by the cause of the failure, each cause becomes a different topic. Thus, depending on how the topics to be classified are defined, the topic may not change even where the conversation switches.
  • the multi-valued label complementing unit 22 also assigns, to such an utterance or its division unit, a multi-valued label indicating the topic of the range that includes the utterance. This increases the amount of teacher data for utterances related to each topic and improves the accuracy of topic estimation.
  • the multi-valued label complementing unit 22 outputs the utterances or division units to which multi-valued labels have been attached, together with those labels, to the multi-value classification learning unit 23.
  • the multi-value classification learning unit 23 trains multi-value classification model 2 (second model) using, as teacher data (second teacher data), the utterances or division units output from the multi-valued label complementing unit 22 together with the multi-valued labels attached to them. Multi-value classification model 2 is therefore a model trained in advance on teacher data (second teacher data) for the utterances constituting the series data or their division units.
  • the teacher data used for training multi-value classification model 2 is generated from series data in which the switching utterances or their division units carry a binary label indicating a conversational switch, and in which the range over which each topic continues and the topic in that range have been specified.
  • the input unit 31 receives series data including a plurality of topics.
  • the series data input to the input unit 31 is the data to be processed, that is, the target of estimation of the paragraph ranges and of the topic in each paragraph.
  • the series data is, for example, text data obtained by speech recognition of the time-series utterances of an operator and a customer.
  • the input unit 31 may receive the text data obtained by speech recognition of each utterance sequentially while the dialogue is in progress. When the series data is input offline, the input unit 31 may sort the utterances by their start time or end time in the dialogue and then input the text data of each utterance.
  • the input unit 31 outputs the input series data to the determination unit 32.
  • the topic estimation unit 34 uses multi-value classification model 2 (second model) to estimate the topic of the paragraph whose range has been estimated by the paragraph estimation unit 33, or of the utterances contained in that paragraph.
  • multi-value classification model 2 is a model trained in advance on teacher data in which the utterances constituting the series data, or their division units, are given multi-valued labels indicating the topics to which those utterances relate.
  • the teacher data used for training multi-value classification model 2 is generated using series data in which the switching utterances or their division units carry a binary label indicating a conversational switch, and in which the range over which each topic continues and the topic in that range have been specified.
  • the teacher data used for training multi-value classification model 2 is generated by attaching, to each utterance or division unit whose binary label indicates a conversational switch, a multi-valued label indicating the topic of the range of the series data that includes that utterance.
  • the output unit 35 outputs, for each paragraph whose range has been estimated in the series data, the utterances constituting that paragraph. The output unit 35 may also output a multi-valued label indicating the topic of the paragraph, the start time and end time of the paragraph, and so on.
  • FIG. 4 is a diagram showing a configuration example of an estimation device 30a for estimating a topic without using the multi-value classification model 2 according to the present embodiment.
  • the same components as those in FIG. 3 are designated by the same reference numerals, and the description thereof will be omitted.
  • the keyword extraction unit 36 extracts at least one keyword from the utterances included in the paragraph whose range is estimated by the paragraph estimation unit 33. Any method can be used as the method for extracting keywords, and for example, an existing method such as tf-idf (Term Frequency-Inverse Document Frequency) can be used.
  • the number of keywords extracted by the keyword extraction unit 36 may be limited to a predetermined number in advance, or may be specified by the user.
  • the topic estimation unit 34a estimates the topic of the paragraph, or of the utterances contained in it, based on the keywords extracted by the keyword extraction unit 36 from the utterances included in the paragraph.
  • the topic estimation unit 34a may, for example, take an extracted keyword itself as the topic of the paragraph or of the utterances contained in it. Alternatively, the topic estimation unit 34a may select, from a plurality of predetermined topics, a topic highly similar to the extracted keywords and take it as the topic of the paragraph or of the utterances contained in it.
  • with the estimation device 30a shown in FIG. 4, the topic of a paragraph, or of the utterances contained in it, can be estimated without using multi-value classification model 2. Therefore, topics in series data can be estimated even when it is difficult to prepare a large amount of teacher data in which topic ranges and the topics in those ranges are specified. A minimal sketch of such keyword-based estimation is given below.
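  • the following is a minimal Python sketch of keyword-based topic estimation, assuming scikit-learn and whitespace-tokenized text; the function names, the candidate-topic strings, and the use of cosine similarity are illustrative assumptions rather than parts of the disclosure.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def extract_keywords(paragraph_utterances, top_k=3):
    """Return the top_k terms of a paragraph ranked by aggregated tf-idf weight."""
    vectorizer = TfidfVectorizer()
    tfidf = vectorizer.fit_transform(paragraph_utterances)   # one row per utterance
    scores = tfidf.sum(axis=0).A1                            # total weight per term
    terms = vectorizer.get_feature_names_out()
    ranked = sorted(zip(terms, scores), key=lambda pair: -pair[1])
    return [term for term, _ in ranked[:top_k]]

def estimate_topic(keywords, candidate_topics):
    """Pick the predetermined topic most similar to the extracted keywords."""
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform([" ".join(keywords)] + list(candidate_topics))
    sims = cosine_similarity(matrix[0:1], matrix[1:]).ravel()
    return candidate_topics[int(sims.argmax())]
```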
  • FIG. 5 is a diagram showing a configuration example of the estimation device 30b according to the present embodiment. Like the estimation device 30a shown in FIG. 4, the estimation device 30b shown in FIG. 5 estimates the topic without using the multi-value classification model 2.
  • the same components as those in FIG. 4 are designated by the same reference numerals, and the description thereof will be omitted.
  • the keyword extraction unit 36b extracts keywords from the utterances included in the representative paragraph that the clustering unit 37 determines from among the paragraphs constituting each cluster.
  • the topic estimation unit 34b estimates the topic of the paragraphs constituting a cluster based on the keywords that the keyword extraction unit 36b extracts from the utterances included in the cluster's representative paragraph. Specifically, the topic estimated from the keywords extracted from the representative paragraph is taken as the topic of all paragraphs constituting that cluster.
  • in FIGS. 3 to 5, the description uses an example in which series data consisting of a plurality of utterances arranged in chronological order is input, but the present disclosure is not limited to this.
  • a functional unit that extracts the utterances one by one from the series data may be provided upstream of the input unit 31.
  • the multi-valued label complementing unit 22 reads, one by one from the series data input to the input unit 21, the utterances to which a multi-valued label indicating a topic and a binary label indicating a conversational switch have been attached (step S11).
  • the multi-valued label is attached only to the first utterance of the range corresponding to a topic, and not to the other utterances.
  • the binary label indicating a conversational switch is attached only to the switching utterances, and not to the other utterances.
  • the multi-valued label complementing unit 22 determines whether or not a multi-valued label indicating a topic is attached to the read utterance (step S12).
  • when it is determined that no multi-valued label is attached (step S12: No), or after the multi-valued label attached to the read utterance has been stored as the updated value, the multi-valued label complementing unit 22 determines whether or not the read utterance carries a binary label indicating a conversational switch (step S14).
  • when the binary label is present (step S14: Yes), the multi-valued label complementing unit 22 attaches to the read utterance the multi-valued label held in the multi-valued label temporary storage device (step S15). In this way, when a read utterance carries a binary label indicating a conversational switch, the multi-valued label complementing unit 22 gives it the multi-valued label indicating the topic of the range of the series data that includes that utterance.
  • when it is determined that no binary label indicating a conversational switch is attached (step S14: No), or after a multi-valued label has been attached to the read utterance, the multi-valued label complementing unit 22 determines whether or not the read utterance is the last utterance of the dialogue (step S16).
  • when it is determined that the read utterance is the last utterance of the dialogue (step S16: Yes), the multi-valued label complementing unit 22 ends the process.
  • when it is determined that the read utterance is not the last utterance of the dialogue (step S16: No), the multi-valued label complementing unit 22 returns to step S11 and reads the next utterance.
  • in the above description the multi-valued label is attached only to the first utterance of the range corresponding to a topic and not to the other utterances, but all utterances in that range may instead be given the multi-valued label of that topic in advance. In that case, deleting the multi-valued label from the utterances that do not carry the binary label indicating a conversational switch leaves the topic label only on the utterances that carry the switching label.
  • any method may be used as long as a multi-valued label indicating the topic ends up attached to each switching utterance. A minimal sketch of this complementing flow is given below.
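  • the following is a minimal Python sketch of the complementing flow in steps S11 to S16, assuming each utterance is represented as a dictionary; the field names "topic" and "is_switch" are illustrative assumptions, not terms from the disclosure.

```python
def complement_topic_labels(utterances):
    """Copy the most recently seen topic label onto every switch-labelled utterance."""
    current_topic = None                          # stands in for the temporary label storage
    for utt in utterances:                        # S11: read the utterances one by one
        if utt.get("topic") is not None:          # S12: a multi-valued label is already attached
            current_topic = utt["topic"]          # update the stored label
        elif utt.get("is_switch") and current_topic is not None:
            utt["topic"] = current_topic          # S14/S15: complement the label on a switch utterance
        # S16: the loop ends naturally at the last utterance of the dialogue
    return utterances
```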
  • FIG. 7 is a flowchart showing an example of the operation of the estimation device 30, and is a diagram for explaining an estimation method by the estimation device 30.
  • the determination unit 32 reads the utterances one by one from the series data of the processing target input to the input unit 31 (step S21).
  • the determination unit 32 uses the binary classification model 1 to determine whether or not the read utterance is a talk switching utterance (step S22).
  • when it is determined that the read utterance is neither a switching utterance nor the last utterance of the dialogue (step S23: No), the paragraph estimation unit 33 accumulates the read utterance as one of the utterances constituting the current paragraph (step S24). After the utterance is accumulated, the process is repeated from step S21.
  • otherwise, the paragraph estimation unit 33 estimates that the range of the accumulated utterances is one paragraph and outputs the accumulated utterances, as the utterances constituting that paragraph, to the topic estimation unit 34.
  • the topic estimation unit 34 estimates the topic in the paragraph whose range has been estimated by the paragraph estimation unit 33 using the multi-value classification model 2 (step S26).
  • the topic estimation unit 34 may instead estimate the topic for at least one individual utterance included in the paragraph. In this case, it may estimate the topic using only the first utterance of the paragraph, or using a predetermined number of utterances counted from the first utterance of the paragraph.
  • the multi-value classification model 2 is learned based on teacher data to which a multi-value label is attached to each unit for estimating a topic.
  • the topic estimation unit 34 attaches a multi-valued label indicating the estimated topic to the paragraph (step S27).
  • the paragraph estimation unit 33 resets the accumulation of utterances (step S28) and determines whether or not the read utterance is the last utterance of the dialogue (step S29).
  • when it is determined that the read utterance is not the last utterance of the dialogue (step S29: No), the paragraph estimation unit 33 returns to step S24 and accumulates the read utterance. The read utterance is thereby accumulated as the first utterance of a new paragraph.
  • when it is determined that the read utterance is the last utterance of the dialogue (step S29: Yes), the paragraph estimation unit 33 ends the process. A minimal sketch of this loop is given below.
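  • the following is a minimal Python sketch of the loop in steps S21 to S29, where `is_switch` stands in for binary classification model 1 and `classify_topic` for multi-value classification model 2; both callables and the data layout are illustrative assumptions.

```python
def estimate_paragraphs(utterances, is_switch, classify_topic):
    """Split a dialogue into paragraphs at switching utterances and label each with a topic."""
    paragraphs = []                         # (estimated topic, utterances) per paragraph
    current = []                            # utterances accumulated for the open paragraph
    for utt in utterances:                  # S21: read the utterances one by one
        if is_switch(utt) and current:      # S22/S23: a switching utterance closes the open paragraph
            paragraphs.append((classify_topic(current), current))   # S26/S27: estimate and attach the topic
            current = []                    # S28: reset the accumulation
        current.append(utt)                 # S24: accumulate (head of a new paragraph after a switch)
    if current:                             # S29: the end of the dialogue closes the last paragraph
        paragraphs.append((classify_topic(current), current))
    return paragraphs
```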
  • the estimation method by the estimation device 30 includes a determination step (step S22) and a paragraph estimation step (steps S23 to S25).
  • in the determination step, binary classification model 1 (first model), trained in advance on teacher data (first teacher data) in which a binary label (first label) indicating whether or not the conversation switches is attached to the utterances constituting series data of a dialogue including a plurality of topics, or to their division units, is used to determine whether or not each utterance constituting the series data to be processed is a switching utterance.
  • in the paragraph estimation step, based on the result of that determination, the range of a paragraph extending from a conversational switch to the utterance immediately before the next switch, or from a conversational switch to the last utterance of the dialogue, is estimated in the series data to be processed.
  • the estimation method according to the present embodiment may further include a topic estimation step (step S26).
  • in the topic estimation step, multi-value classification model 2 (second model), trained in advance on teacher data in which a multi-valued label (second label) indicating the related topic is attached to the utterances constituting the series data or to their division units, is used to estimate the topic of the paragraph or of the utterances contained in it. Because the range of the paragraph has been estimated, the topic can be estimated from only the utterances included in the paragraph, which improves the accuracy of topic estimation.
  • FIG. 8 is a flowchart showing an example of the operation of the estimation device 30a shown in FIG. 4, and is a diagram for explaining an estimation method by the estimation device 30a.
  • the same processing as in FIG. 7 is designated by the same reference numerals, and the description thereof will be omitted.
  • the paragraph estimation unit 33 estimates that the range of the accumulated utterances is a paragraph, and outputs the accumulated utterances to the keyword extraction unit 36.
  • the keyword extraction unit 36 extracts keywords from the utterances included in the paragraph whose range is estimated by the paragraph estimation unit 33 (step S31).
  • the topic estimation unit 34a estimates the topic in the paragraph or the utterance included in the paragraph based on the keyword extracted by the keyword extraction unit 36 from the utterance included in the paragraph (step S32).
  • the estimation method by the estimation device 30a includes a keyword extraction step (step S31) and a topic estimation step (step S32).
  • in the keyword extraction step, keywords are extracted from the utterances contained in the paragraph whose range has been estimated.
  • in the topic estimation step, the topic of the paragraph, or of the utterances contained in it, is estimated based on the keywords extracted from the utterances contained in the paragraph.
  • FIG. 9 is a flowchart showing an example of the operation of estimating the range of the paragraph by the estimation device 30b shown in FIG. 5, and is a diagram for explaining the estimation method by the estimation device 30b.
  • the same processing as in FIG. 7 is designated by the same reference numerals, and the description thereof will be omitted.
  • when the result of step S25 is Yes, the paragraph estimation unit 33 estimates that the range of the accumulated utterances is one paragraph. The paragraph estimation unit 33 then resets the accumulation of utterances (step S28).
  • FIG. 10 is a flowchart showing an example of the operation of estimating a topic by the estimation device 30b shown in FIG. 5, and is a diagram for explaining an estimation method by the estimation device 30b.
  • the clustering unit 37 reads the paragraph whose range has been estimated by the paragraph estimation unit 33 (step S41).
  • the clustering unit 37 reads a plurality of paragraphs contained in at least one or more series data. That is, the clustering unit 37 repeats the process of step S41 as many times as necessary.
  • the clustering unit 37 clusters a plurality of read paragraphs for each similar paragraph (step S42).
  • the clustering unit 37 determines whether or not there are unprocessed clusters (step S43).
  • An unprocessed cluster is a cluster in which paragraphs contained in the cluster are not given multi-value labels.
  • when there is an unprocessed cluster (step S43: Yes), the clustering unit 37 selects one of the unprocessed clusters as the cluster to be processed and determines a representative paragraph from among the paragraphs included in that cluster (step S44). For example, the clustering unit 37 takes the paragraph at the center of the cluster as the representative paragraph.
  • the keyword extraction unit 36b extracts keywords from the utterances included in the representative paragraph of the cluster determined by the clustering unit 37 (step S45).
  • the topic estimation unit 34b estimates the topic in the paragraph representing the cluster based on the keywords extracted by the keyword extraction unit 36b (step S46). Next, the topic estimation unit 34b determines whether or not there is an unprocessed paragraph (step S47).
  • the unprocessed paragraph is a paragraph included in the cluster to be processed that is not given a multi-value label.
  • when it is determined that there is an unprocessed paragraph (step S47: No), the topic estimation unit 34b gives that unprocessed paragraph in the cluster a multi-valued label indicating the topic estimated from the keywords extracted from the representative paragraph of the cluster (step S48). The topic estimation unit 34b then returns to the process of step S47.
  • otherwise (step S47: Yes), the process is repeated from step S43.
  • the estimation method by the estimation device 30b further includes a clustering step (step S42).
  • in the clustering step, a plurality of paragraphs whose ranges have been estimated from one or more series data are clustered into groups of similar paragraphs.
  • keywords are extracted from the utterances included in the representative paragraph among the paragraphs of a cluster consisting of similar paragraphs.
  • in the topic estimation step, the topic of the paragraphs constituting the cluster that includes the representative paragraph is estimated based on the keywords extracted from the utterances included in the representative paragraph. A minimal sketch of this clustering-based flow is given below.
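  • the following is a minimal Python sketch of the clustering-based flow in steps S41 to S48, assuming scikit-learn, tf-idf paragraph vectors, and k-means; the number of clusters, the keyword count, and all helper names are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def label_paragraphs_by_cluster(paragraphs, n_clusters=5, top_k=3):
    """paragraphs: list of lists of utterance strings; returns one keyword topic per paragraph."""
    texts = [" ".join(p) for p in paragraphs]
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(texts)                     # S41: one vector per paragraph
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(X)    # S42: cluster similar paragraphs
    terms = vectorizer.get_feature_names_out()
    labels = [None] * len(paragraphs)
    for c in range(n_clusters):                             # S43: process every cluster once
        members = np.where(km.labels_ == c)[0]
        # S44: take the paragraph closest to the cluster center as the representative
        dists = np.linalg.norm(X[members].toarray() - km.cluster_centers_[c], axis=1)
        rep = members[int(dists.argmin())]
        # S45/S46: the top keywords of the representative paragraph become the cluster topic
        weights = X[rep].toarray().ravel()
        topic = ", ".join(terms[i] for i in weights.argsort()[::-1][:top_k])
        for m in members:                                   # S47/S48: copy the topic to all members
            labels[m] = topic
    return labels
```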
  • model learning (of binary classification model 1 and multi-value classification model 2) will be described using the specific example shown in FIG. 11. In the following, it is assumed that the series data includes five topics, "topic A", "topic B", "topic C", "topic D", and "topic E".
  • the range over which one topic continues and the topic in that range are specified manually, and for each such range a multi-valued label indicating the topic in that range is manually attached to the utterances constituting the series data.
  • a binary label indicating a conversational switch is attached only to the switching utterances. Therefore, in FIG. 11, for example, an utterance located in the middle of the range over which utterances related to topic A continue may still carry a binary label indicating that it is a switching utterance.
  • the above series data and binary labels are input to the learning device 10, and binary classification model 1 is trained, using an LSTM or the like, on the input series data and binary labels.
  • the above series data, binary labels, and multi-valued labels are input to the learning device 20.
  • the multi-valued labels are then complemented. That is, as shown in FIG. 11, each utterance carrying a label indicating a conversational switch is given a multi-valued label indicating the topic of the range of the series data that includes that utterance.
  • in this way, teacher data is created in which the utterances constituting the series data carry multi-valued labels indicating the topics to which they relate.
  • a multi-valued label indicating the related topic may also be attached to the division units of the utterances constituting the series data.
  • multi-value classification model 2 is then trained using an LSTM or the like. In this training, only the utterances carrying a multi-valued label may be used, or all utterances of the paragraphs that include the labelled utterances may be used. A minimal sketch of such a model is given below.
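  • the following is a minimal PyTorch sketch of an LSTM-based utterance classifier of the kind described above; with num_classes=2 it plays the role of binary classification model 1, and with one class per topic the role of multi-value classification model 2. The network sizes, the tokenization into integer IDs, and the data loader are illustrative assumptions.

```python
import torch
import torch.nn as nn

class UtteranceClassifier(nn.Module):
    def __init__(self, vocab_size, num_classes, emb_dim=128, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, num_classes)

    def forward(self, token_ids):               # token_ids: (batch, seq_len) integer tensor
        h, _ = self.lstm(self.embed(token_ids))
        return self.out(h[:, -1, :])            # classify from the last hidden state

def train(model, loader, epochs=5):
    """loader yields (token_ids, label) batches built from the teacher data."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for token_ids, labels in loader:
            opt.zero_grad()
            loss = loss_fn(model(token_ids), labels)
            loss.backward()
            opt.step()
```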
  • FIG. 12 is a diagram showing an example of topic estimation by the estimation device 30 shown in FIG. 3. In FIG. 12, it is assumed that multi-value classification model 2 has been trained in utterance units.
  • when the series data of one dialogue is input to the estimation device 30, as shown in FIG. 12, binary classification model 1 is used to determine whether or not each utterance constituting the series data is a switching utterance. The range from a switching utterance to the utterance immediately before the next switching utterance, or from a switching utterance to the last utterance of the dialogue, is then estimated to be one paragraph.
  • FIG. 14 is a diagram showing an example of topic estimation by the estimation device 30a shown in FIG.
  • when the series data of one dialogue is input to the estimation device 30a, as shown in FIG. 14, binary classification model 1 is used to determine whether or not each utterance constituting the series data is a switching utterance. The range from a switching utterance to the utterance immediately before the next switching utterance is then estimated to be one paragraph.
  • FIG. 14 shows an example in which different multi-valued labels ("Topic 1" to "Topic 10") are assigned to the respective paragraphs, but this does not mean that they are necessarily different topics.
  • when the series data of one or more dialogues is input to the estimation device 30b, as shown in FIG. 15, binary classification model 1 is used to determine whether or not each utterance constituting the series data is a switching utterance. The range from a switching utterance to the utterance immediately before the next switching utterance is then estimated to be one paragraph.
  • as shown in FIG. 15, the plurality of paragraphs whose ranges have been estimated are clustered into groups of similar paragraphs.
  • a representative paragraph is determined from a cluster of similar paragraphs, and keywords are extracted from the utterances contained in the representative paragraph.
  • the paragraph shown by the thick line indicates the representative paragraph.
  • the topic in the representative paragraph is estimated based on the keywords extracted from the utterances included in the representative paragraph of the cluster, and a multi-valued label indicating the estimated topic is given to the representative paragraph. Further, as shown in FIG. 15, other paragraphs constituting the cluster are also given the same multi-valued label as the representative paragraph of the cluster.
  • in order to show the effectiveness of the estimation method according to this embodiment (hereinafter sometimes referred to as "this method"), it was compared with a conventional method by experiment. In the experiment, 349 calls were used for training the models and 50 calls were used for verification. As multi-valued labels indicating topics, labels for eight topics, topic A to topic H, together with a fixed topic S covering the span from the first utterance of a call to the first conversational switch, were prepared. The conventional method trains a binary classification model using, as teacher data, data in which a binary label indicating a conversational switch is attached only to the utterances at which the multi-valued label changes, and trains a multi-value classification model using only those switching utterances as teacher data.
  • in this method, the range of paragraphs is estimated while also counting utterances that transition from a topic to that same topic as switching utterances. Therefore, as shown in Table 1, the precision of this method is lower than that of the conventional method. However, this method can detect paragraphs and switching utterances that the conventional method could not detect, so the recall of paragraph division increased. (Precision, recall, and the F value are defined below.)
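  • for reference, the precision, recall, and F value reported in this evaluation follow the standard definitions, where TP, FP, and FN denote true positives, false positives, and false negatives with respect to the items being evaluated (for paragraph division, the detected switching utterances):

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
```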
  • in the conventional method, the multi-value classification model is generated by training on teacher data in which a multi-valued label indicating the topic of the utterance is manually attached only to the utterances at which the multi-valued label changes.
  • in this method, multi-value classification model 2 was generated by training on teacher data in which the multi-valued labels were complemented for the utterances to which a label indicating a conversational switch had been manually assigned.
  • for each utterance determined to be a switching utterance by the binary classification models trained with the conventional method and with this method, the topic of the utterance was estimated and compared with the correct topic given manually to that utterance.
  • the results of the comparison are shown in Table 2.
  • the results (F value) of the classification of all utterance topics were evaluated in the 100 calls targeted for evaluation.
  • This evaluation is a comprehensive evaluation of the determination of utterances of story switching by the binary classification model and the estimation of topics by the multi-value classification model.
  • in this method, an utterance that transitions from a topic to that same topic is also determined to be a switching utterance, but multi-value classification model 2 classified many of these same-topic transitions into the correct topic. Therefore, as shown in Table 3, the overall evaluation result of this method was higher than that of the conventional method.
  • the estimation device 30 includes a determination unit 32 and a paragraph estimation unit 33.
  • the determination unit 32 uses binary classification model 1 (first model), trained in advance on teacher data (first teacher data) in which a binary label indicating whether or not the conversation switches is attached to the utterances, or their division units, constituting series data of a dialogue including a plurality of topics, to determine whether or not each utterance constituting the series data to be processed is a switching utterance.
  • based on the result of that determination, the paragraph estimation unit 33 estimates, in the series data to be processed, the range of a paragraph extending from a conversational switch to the utterance immediately before the next switch, or from a conversational switch to the end of the dialogue.
  • by training on teacher data in which a binary label indicating whether or not the conversation switches is attached to the utterances or their division units, it is possible to generate binary classification model 1, which determines whether or not an utterance constituting the series data is a switching utterance. Based on the determination results of binary classification model 1, the range of each paragraph in the series data can then be estimated. Furthermore, by estimating the paragraph ranges in the series data, topic estimation can be limited to the utterances included in each paragraph, which improves the accuracy of estimating the topic of each paragraph.
  • a method called TextTiling is known as a method of dividing series data into sections of objectively classified topics (see, for example, Reference 1).
  • in TextTiling, the text is divided at the points where the degree of cohesion, computed from the cohesiveness of neighboring words in the text, reaches a local minimum.
  • TopicTiling, which divides text using Latent Dirichlet Allocation (LDA), a representative topic model, has also been proposed (see Reference 2). A minimal sketch of the underlying cohesion-based idea is given below.
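  • the following is a minimal Python sketch of the cohesion-minimum idea behind such objective segmentation methods: score the lexical similarity of adjacent windows of sentences and treat the lowest-scoring gaps as candidate boundaries. The window size, whitespace tokenization, and scoring are illustrative assumptions and not the exact algorithms of References 1 and 2.

```python
import math
from collections import Counter

def cohesion(left_tokens, right_tokens):
    """Cosine similarity of the bag-of-words vectors of two adjacent windows."""
    a, b = Counter(left_tokens), Counter(right_tokens)
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def boundary_candidates(sentences, window=3):
    """Return gap indices ordered by increasing cohesion (lowest first = likely topic break)."""
    scores = []
    for i in range(window, len(sentences) - window):
        left = [w for s in sentences[i - window:i] for w in s.split()]
        right = [w for s in sentences[i:i + window] for w in s.split()]
        scores.append((cohesion(left, right), i))
    return [i for _, i in sorted(scores)]
```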
  • subjective topics are topics categorized, for example, from the perspective of isolating the cause of a customer's inability to use a particular service, or from the perspective of the operator interviewing the customer about needs or wishes on a sales call.
  • the same keywords, such as service names, product names, and related vocabulary, appear throughout such dialogues, so even content that one wants to distinguish subjectively is, on a superficial and objective level, indistinguishable, and such topics make up the majority of the dialogue. Therefore, the methods described in References 1 and 2 cannot accurately divide and classify dialogues by subjective topic.
  • individual utterances are short, and for some utterances it is impossible to determine uniquely which topic they belong to. Such utterances end up labelled with a topic different from their true topic, and a model trained on teacher data labelled differently from the true topic loses classification accuracy. Therefore, with the method described in Reference 3, it is difficult to appropriately classify each utterance, including short conversational inputs arriving in chronological order, by subjective topic.
  • the estimation device 30c according to the present embodiment determines whether or not the utterance constituting the series data or the division unit thereof is a topic switching, and estimates the range of the paragraph based on the determination result.
  • FIG. 16 is a diagram showing a configuration example of the estimation device 30c according to the present embodiment.
  • the estimation device 30c includes an input unit 41, a determination unit 42, a topic estimation unit 43, a paragraph estimation unit 44, and an output unit 45.
  • the input unit 41 inputs the series data of the dialogue including a plurality of topics.
  • the series data input to the input unit 41 is data to be processed that is the target of estimation of the range of paragraphs and topics in paragraphs.
  • the series data is, for example, text data in which time-series utterances of an operator and a customer are voice-recognized.
  • the input unit 41 may receive the text data obtained by speech recognition of each utterance sequentially while the dialogue is in progress. When the series data is input offline, the input unit 41 may sort the utterances by their start time or end time in the dialogue and then input the text data of each utterance.
  • the input unit 41 outputs the input series data to the determination unit 42.
  • the determination unit 42 uses the binary classification model 1a to determine whether or not the utterance constituting the series data output from the input unit 41 is a topic switching utterance.
  • the binary classification model 1a is a model learned in advance so as to determine whether or not the topic is switched with respect to the utterance or the division unit thereof constituting the series data of the dialogue.
  • binary classification model 1a can be created by training, with the learning device 10 described above, on teacher data in which a binary label (switching label) indicating whether or not the topic switches is attached to the utterances constituting the series data or to their division units.
  • from the determination result obtained with binary classification model 1a, the determination unit 42 determines whether or not each utterance constituting the series data, or its division unit, is to be processed by the topic estimation unit 43 described later. Specifically, the determination unit 42 designates the utterances or division units determined to be topic switches as the processing targets of the topic estimation unit 43. The determination unit 42 outputs this determination result to the topic estimation unit 43 and the paragraph estimation unit 44.
  • the topic estimation unit 43 uses multi-value classification model 2a to give each utterance determined by the determination unit 42 to be a processing target (a topic-switching utterance), or its division unit, a multi-valued label indicating the topic of the range that includes that utterance.
  • multi-value classification model 2a is a model that estimates, for an utterance or its division unit, the topic of the range that includes that utterance.
  • multi-value classification model 2a can be created by training, with the learning device 20 described above, on teacher data in which a multi-valued label (topic label) indicating the related topic is attached to the utterances constituting the series data or to their division units.
  • learning about topic transitions may be performed only on the topic-switching utterances, that is, only on the utterances to which a multi-valued label is attached.
  • by excluding from the learning target the utterances lying between one topic-switching utterance and the next, noise for topic classification can be removed.
  • the topic estimation unit 43 stores the topic estimation result (multi-valued label corresponding to the estimated topic) in the label information table.
  • the label information table is an area for storing the estimation result of the topic for the data to be processed, and may be a memory on a computer, a database, or a file.
  • the paragraph estimation unit 44 estimates that the range from an utterance determined by the determination unit 42 to be a processing target (a topic-switching utterance) to the utterance immediately before the next such utterance is the range of one paragraph.
  • the paragraph estimation unit 44 attaches the multi-valued label stored in the label information table to the utterances included in the paragraph whose range has been estimated. Specifically, to the utterances from a topic-switching utterance up to the utterance immediately before the next topic-switching utterance, the paragraph estimation unit 44 attaches the multi-valued label that was given to that topic-switching utterance and stored in the label information table.
  • the output unit 45 outputs, for each paragraph whose range has been estimated in the series data, the utterances constituting that paragraph. The output unit 45 may also output a multi-valued label indicating the topic of the paragraph, the start time and end time of the paragraph, and so on.
  • a morphological analysis unit that performs morphological analysis on text chat may be provided downstream of the input unit 41. Furthermore, when the series data to be processed is input offline, the estimation device 30c may be configured to estimate the paragraph ranges using all of the topic-switch determination results and topic estimation results at once. In this case, based on the determination of whether or not a topic switch occurs and on the topic estimation results, the paragraph estimation unit 44 may attach the multi-valued label estimated by the topic estimation unit 43 to the utterances in the range from a topic switch to the utterance immediately before the next topic switch.
  • FIG. 17 is a flowchart showing an example of the operation of the estimation device 30c according to the present embodiment.
  • the determination unit 42 determines whether or not the dialogue in the series data of the processing target input to the input unit 41 has been completed (step S51).
  • when it is determined that the dialogue has been completed (step S51: Yes), the estimation device 30c ends the process.
  • otherwise (step S51: No), the determination unit 42 reads the utterance to be processed (step S52).
  • the determination unit 42 uses the binary classification model 1a to determine whether or not the read utterance is a topic-switching utterance (step S53).
  • when it is determined that the read utterance is not a topic-switching utterance (step S54: No), the process proceeds to step S57, described later.
  • when it is determined that the read utterance is a topic-switching utterance (step S54: Yes), the topic estimation unit 43 estimates the topic of the read utterance using multi-value classification model 2a (step S55). The topic estimation unit 43 stores the estimated topic in the label information table, thereby updating the table (step S56). That is, the label information table is updated every time the read utterance is a topic-switching utterance.
  • the paragraph estimation unit 44 assigns a multi-valued label stored in the label information table to the read utterance (step S57).
  • the label information table is updated every time a read utterance is a topic-switching utterance. Therefore, the same multi-valued label is assigned to the utterances from a topic-switching utterance up to the utterance immediately before the next topic-switching utterance, which together constitute one paragraph.
  • once a multi-valued label has been attached to the read utterance, the determination unit 42 returns to step S51 with the next utterance in the series data as the processing target (step S58). A minimal sketch of this loop is given below.
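  • the following is a minimal Python sketch of the streaming flow in steps S51 to S58, where `is_topic_switch` stands in for binary classification model 1a and `classify_topic` for multi-value classification model 2a; both callables and the dictionary used as the label information table are illustrative assumptions.

```python
def label_stream(utterances, is_topic_switch, classify_topic):
    """Attach to every utterance the topic of the most recent topic-switching utterance."""
    label_table = {"topic": None}                        # the label information table
    labelled = []
    for utt in utterances:                               # S51/S52: read until the dialogue ends
        if is_topic_switch(utt):                         # S53/S54: topic-switching utterance?
            label_table["topic"] = classify_topic(utt)   # S55/S56: estimate the topic, update the table
        labelled.append((utt, label_table["topic"]))     # S57: attach the stored multi-valued label
    return labelled                                      # S58: proceed to the next utterance each pass
```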
  • FIG. 18 is a diagram showing an example of topic estimation by the estimation device 30c according to the present embodiment. In FIG. 18, it is assumed that the binary classification model 1a and the multi-value classification model 2a are learned in utterance units.
  • as shown in FIG. 18, the determination unit 42 uses binary classification model 1a to determine whether or not each utterance constituting the series data is a topic-switching utterance.
  • the topic estimation unit 43 estimates the topic of the utterance determined to be the switching of the topic by using the multi-value classification model 2a, and stores the multi-value label indicating the estimated topic in the label information table.
  • the paragraph estimation unit 44 estimates the range from the utterance of the topic change to the utterance immediately before the utterance of the next topic change as one paragraph. Then, the paragraph estimation unit 44 assigns a multi-valued label indicating the topic of the utterance at the beginning of the paragraph, which is stored in the label information table, to all the utterances constituting the paragraph.
  • as described above, the estimation device 30c uses binary classification model 1a to determine whether or not each utterance constituting the series data is a topic-switching utterance, uses multi-value classification model 2a to estimate the topic of each topic-switching utterance, and estimates the range of a paragraph as extending from a topic-switching utterance to the utterance immediately before the next topic-switching utterance, taking the topic estimated for that topic-switching utterance as the topic of the paragraph containing it.
  • the division unit of an utterance is, for example, a word unit obtained by dividing the utterance into words, or, when punctuation has been added to the utterance, a unit delimited by commas or periods. In the first and second embodiments described above, when the topic of an utterance is estimated, it is estimated for the utterance or for a predetermined division unit, and that division unit of the utterance is fixed.
  • however, the topic does not always change at the boundary of such a predetermined unit.
  • the response history may be recorded separately for the scene of confirming the presence or absence of an injury and the scene of confirming the damage of a car.
  • the dialogue between the customer and the person in charge of responding shown in utterances 1 to 4 will be described with an example of dividing the dialogue into a scene for confirming the presence or absence of an injury and a scene for confirming damage to the car.
  • Responsible person: "I heard that you had an accident when you put the car in the garage."
  • utterance 1 and utterance 2 are utterances in a scene where damage to the car is confirmed.
  • the scene then switches from confirming damage to the car to confirming the presence or absence of injury, and the injury-confirmation scene continues through utterance 4.
  • within utterance 3, the part "That's right, because I rubbed the bumper behind the car with a utility pole when I put it in the garage" belongs to the scene of confirming damage to the car, and the part "Was your body okay?" belongs to the scene of confirming the presence or absence of injury.
  • it is therefore desirable to give the unit "That's right, I rubbed the bumper behind the car with a utility pole when I put it in the garage" a label indicating the scene of confirming damage to the car, and to give the unit "Is your body okay?" a label indicating the scene of confirming the presence or absence of injury, but it is difficult to decide such units in advance.
  • in the present embodiment, therefore, the learning unit is not fixed: positive examples, negative examples, and non-target learning data are created dynamically in various units from the teacher data. That is, the learning data is created by making the division unit of the utterance variable. In this way, even when the conversation (scene) switches in the middle of an utterance, learning data can be created for training a model that can estimate the switching point with high accuracy. Furthermore, by using a model trained on learning data created without fixing the learning unit, each scene within an utterance can be estimated even when the scene switches in the middle of the utterance.
  • FIG. 19 is a diagram showing a configuration example of the learning data creating device 50 according to the present embodiment.
  • the learning data creating device 50 according to the present embodiment dynamically creates positive examples, negative examples, and non-target learning data in various units from the teacher data.
  • the learning data creating device 50 includes an input unit 51, a learning data creating unit 52, and an output unit 53.
  • the dialogue series data is input to the input unit 51.
  • the series data is, for example, voice data of a time-series dialogue between an operator and a customer, or text data in which utterances included in the dialogue are voice-recognized.
  • the input unit 51 outputs the input series data to the learning data creation unit 52.
  • the learning data creation unit 52 inputs the series data output from the input unit 51 and the teacher data.
  • the teacher data is data in which the range of utterances necessary for specifying a scene in the utterances constituting the series data is labeled before the learning data is created. Labels in teacher data are manually assigned.
  • based on the input series data and teacher data, the learning data creation unit 52 creates learning data used for training a model that estimates the topic (scene) within an utterance in arbitrary division units of the utterance.
  • FIG. 20 is a diagram showing a configuration example of the learning data creation unit 52.
  • the learning data creation unit 52 includes a sentence output unit 521, an ID assignment unit 522, a combination generation unit 523, and an assignment unit 524.
  • the sentence output unit 521 outputs the utterance character string constituting the series data input from the input unit 51 as a sentence.
  • the sentence output unit 521 outputs a sentence divided into word units by morphological analysis.
  • the sentence output unit 521 outputs a sentence divided into word units by voice recognition.
  • the ID assignment unit 522 generates, from the sentence output by the sentence output unit 521, elements obtained by dividing the utterance according to a predetermined rule.
  • the division unit (the unit of an element) used by the ID assignment unit 522 may be any unit that can be specified, such as a word unit, a comma or period unit, a speech recognition unit, or an end-of-utterance unit.
  • the ID assigning unit 522 assigns an ID to each of the elements in which the utterance is divided, and stores the ID assigned to each element in the ID set.
  • the combination generation unit 523 generates a combination of IDs (combination ID string) necessary for learning the model based on the IDs stored in the ID set.
  • FIG. 21 is a diagram showing a configuration example of the combination generation unit 523.
  • the combination generation unit 523 includes an ID extraction unit 5231, a combination target ID storage unit 5232, a combination generation ID storage unit 5233, and a combination ID generation unit 5234.
  • the ID extraction unit 5231 extracts the IDs belonging to a predetermined longest unit from the ID set and stores them in the longest unit ID set.
  • the longest unit may be any unit that is longer than the division unit used when the sentence is output by the sentence output unit 521 and that can be specified in advance. For example, if the division unit at sentence output is a word unit, the longest unit is a comma unit or a period unit, both of which are longer than a word unit. Likewise, if the division unit at sentence output is a comma unit, the longest unit is a period unit or a speech recognition unit, both of which are longer than a comma unit.
  • the combination target ID storage unit 5232 extracts the IDs in the range to be combined from the longest unit ID set and stores them in the combination target ID set.
  • the combination generation ID storage unit 5233 acquires the combination generation ID for generating the combination ID string from the combination target ID set and stores it in the combination generation ID set.
  • the combination ID generation unit 5234 generates a combination ID string based on the set of combination generation IDs, stores it in the set of combination ID columns, and updates the set of combination ID columns.
  • the combination generation unit 523 outputs the generated combination ID strings to the assignment unit 524.
  • the combination ID strings output from the combination generation unit 523 and the teacher data are input to the assignment unit 524.
  • the assignment unit 524 creates learning data by assigning, based on the teacher data, a positive-example label, a negative-example label, or a label indicating exclusion from learning to each division unit obtained by replacing a combination ID string with its character string.
  • FIG. 22 is a diagram showing a configuration example of the assignment unit 524.
  • the assignment unit 524 includes a positive-example assignment unit 5241, a negative-example assignment unit 5242, and a non-target assignment unit 5243.
  • based on the teacher data, the positive-example assignment unit 5241 assigns a label indicating a positive example to predetermined ID strings in the set of combination ID strings. A positive-example label is thereby given to the division units obtained by replacing those ID strings with their character strings.
  • the negative-example assignment unit 5242 assigns a label indicating a negative example to predetermined ID strings in the set of combination ID strings. A negative-example label is thereby given to the division units obtained by replacing those ID strings with their character strings.
  • the non-target assignment unit 5243 assigns a label indicating exclusion from learning to predetermined ID strings in the set of combination ID strings. A label indicating exclusion is thereby given to the division units obtained by replacing those combination ID strings with their character strings.
  • the non-target assignment unit 5243 deletes the combination ID strings carrying the label indicating exclusion from learning, and the division units corresponding to the combination ID strings carrying a positive-example or negative-example label are output, together with those labels, as learning data. The details of the operation of the assignment unit 524 are described later.
  • the output unit 53 outputs the learning data created by the learning data creation unit 52.
  • the operation of the learning data creation unit 52 will be described.
  • a case of creating learning data for learning a model for determining whether or not a scene (story) is switched will be described as an example.
  • since the above-mentioned utterance 3 includes a scene change, utterance 3 is used as the example.
  • the label "T” is given to the range determined to be the change of the scene, and the label "F” is given to the range not determined to be the change of the scene.
  • here it is assumed that the division unit of the sentence is a comma unit and the longest unit is a period unit.
  • as teacher data, the label "T" is given to the range of utterance 3 determined to be the scene change ("Is your body okay?").
  • the ID assignment unit 522 divides utterance 3 at the commas and assigns an ID to each resulting element. In the following, it is assumed that the IDs are assigned as follows:
    ID1: Was that so?
    ID2: When you put it in the garage
    ID3: Because I rubbed the bumper behind the car with a utility pole,
    ID4: Your body is
    ID5: Is that okay?
  • the ID assigning unit 522 stores the ID assigned to each element of the utterance in the ID set.
  • the combination generation unit 523 creates combinations (ID strings) of the IDs of the comma-delimited elements, within the range of the predetermined longest unit, from the ID set.
  • the operation of the combination generation unit 523 will be described with reference to FIG. 23.
  • FIG. 23 is a flowchart showing an example of the operation of the combination generation unit 523.
  • the ID extraction unit 5231 extracts all IDs belonging to each longest unit from the ID set and stores them in the longest unit ID set (step S61). As described above, since the longest unit here is the period unit, the range of the longest unit is ID1 to ID5. The ID extraction unit 5231 therefore extracts IDs 1 to 5 from the ID set and stores (1, 2, 3, 4, 5) in the longest unit ID set.
  • the combination target ID storage unit 5232 deletes the smallest ID among the IDs stored in the longest unit ID set from the longest unit ID set, and stores the ID in the combination target ID set (step S62).
  • the combination target ID storage unit 5232 takes out ID1 from the ID set of the longest unit and stores it in the combination target ID set. Further, the combination target ID storage unit 5232 deletes ID1 from the ID set of the longest unit. Therefore, (2,3,4,5) is stored in the ID set of the longest unit.
  • the combination generation ID storage unit 5233 arranges all the IDs included in the combination target ID set in ascending order and stores them in the combination generation ID set and the combination ID string set (step S63).
  • the combination sequence in which all the IDs are arranged in ascending order is [1].
  • the combination generation ID storage unit 5233 stores (1) in the set of combination generation IDs, and stores [1] in the set of combination ID columns.
  • the combination ID generation unit 5234 deletes the smallest ID from the combination generation ID set, arranges the remaining IDs in ascending order, and stores the resulting ID string in the set of combination ID strings (step S64).
  • (1) is stored in the set of combination generation IDs. Therefore, the combination ID generation unit 5234 deletes the smallest ID1.
  • the combination ID generation unit 5234 determines whether or not the set of combination generation IDs is empty (step S65). In the above example, the set of combination generation IDs is empty because ID1 is deleted.
  • If it is determined that the set of combination generation IDs is not empty (step S65: No), the combination ID generation unit 5234 repeats the process of step S64.
  • Next, the combination target ID storage unit 5232 determines whether or not the longest unit ID set is empty (step S66). In the above example, since (2, 3, 4, 5) is stored in the longest unit ID set, the longest unit ID set is not empty.
  • When it is determined that the longest unit ID set is not empty (step S66: No), the combination target ID storage unit 5232 returns to the process of step S62.
  • Since (2, 3, 4, 5) is stored in the longest unit ID set, the combination target ID storage unit 5232 takes out the smallest ID 2 and stores it in the combination target ID set. Further, the combination target ID storage unit 5232 deletes ID 2 from the longest unit ID set. Therefore, (3, 4, 5) is stored in the longest unit ID set.
  • Subsequently, steps S63 and S64 are performed. Since (1, 2) is now stored in the combination target ID set, the ID string in which all the IDs are arranged in ascending order is [1, 2]; (1, 2) is stored in the combination generation ID set, and the combination ID string [1, 2] is added to the set of combination ID strings, which becomes ([1], [1, 2]).
  • the combination ID generation unit 5234 deletes the smallest ID among the ID columns stored in the combination generation ID set, arranges the remaining IDs in ascending order, and stores them in the combination ID column set.
  • (1, 2) is stored in the set of combination generation IDs. Therefore, the combination ID generation unit 5234 deletes the smallest ID1. ID1 is deleted, and (2) remains in the set of combination generation IDs. Since (2) remains in the set of combination generation IDs, the combination ID generation unit 5234 stores [2] in the set of combination ID strings. Therefore, the set of combination ID columns is ([1], [1,2], [2]).
  • In this way, the combination generation unit 523 generates combination ID strings, each corresponding to a division unit composed of one element or a plurality of consecutive elements obtained by dividing the utterance according to a predetermined rule. In the above example, the following 15 combination ID strings are generated: [1], [1, 2], [2], [1, 2, 3], [2, 3], [3], [1, 2, 3, 4], [2, 3, 4], [3, 4], [4], [1, 2, 3, 4, 5], [2, 3, 4, 5], [3, 4, 5], [4, 5], [5].
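  • A compact Python sketch of this enumeration is shown below; it is an equivalent rendering of the flowchart of FIG. 23 for illustration, not a line-by-line transcription, and the function name is invented here. Called with the IDs 1 to 5 of the example, it reproduces the 15 combination ID strings above in the same order.

    def generate_combination_id_strings(longest_unit_ids):
        """Enumerate every contiguous combination ID string within the longest unit.

        The order mirrors the example: for each new largest ID, all contiguous
        strings ending at that ID are emitted with ascending start IDs.
        """
        ids = sorted(longest_unit_ids)
        combos = []
        for end in range(len(ids)):
            for start in range(end + 1):
                combos.append(ids[start:end + 1])
        return combos

    combos = generate_combination_id_strings({1, 2, 3, 4, 5})
    # combos == [[1], [1, 2], [2], [1, 2, 3], [2, 3], [3], ..., [4, 5], [5]]  (15 strings)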
  • When it is determined that the longest unit ID set is empty (step S66: Yes), the ID extraction unit 5231 determines whether or not there is any ID in the ID set that has not yet been stored in the longest unit ID set (step S67).
  • If it is determined that there is an ID that has not been stored in the longest unit ID set (step S67: Yes), the ID extraction unit 5231 returns to the process of step S61.
  • If it is determined that there is no such ID (step S67: No), the combination generation unit 523 ends the process.
  • FIG. 24 is a flowchart showing an example of the operation of the granting unit 524.
  • The regular example assigning unit 5241 assigns a label indicating a positive example to all the ID strings, among those included in the set of combination ID strings generated by the combination generation unit 523, whose range matches the teacher data (step S71). As described above, it is assumed that the label "T" is attached, as teacher data, to the range of utterance 3 determined to be the scene change ("Is your body okay?"). Therefore, the regular example assigning unit 5241 assigns the label ("T") indicating a positive example to the ID string [4, 5], which corresponds to the same range as "Is your body okay?" in utterance 3.
  • The negative example assigning unit 5242 assigns a label indicating a negative example to all the combination ID strings included in the set of combination ID strings that do not include any ID belonging to an ID string labeled as a positive example (step S72).
  • In the above example, the ID string [4, 5] is given the label indicating a positive example. Therefore, the negative example assigning unit 5242 assigns a label ("F") indicating a negative example to the combination ID strings that include neither ID4 nor ID5, namely [1], [1, 2], [2], [1, 2, 3], [2, 3], and [3].
  • The non-target granting unit 5243 assigns a label indicating non-target to all the combination ID strings, among those included in the set of combination ID strings, to which neither the label indicating a positive example nor the label indicating a negative example has been assigned (step S73). In the above-mentioned example, the non-target granting unit 5243 assigns the label indicating non-target to the following combination ID strings.
    [1, 2, 3, 4]: Not applicable
    [2, 3, 4]: Not applicable
    [3, 4]: Not applicable
    [4]: Not applicable
    [1, 2, 3, 4, 5]: Not applicable
    [2, 3, 4, 5]: Not applicable
    [3, 4, 5]: Not applicable
    [5]: Not applicable
  • The non-target granting unit 5243 deletes the combination ID strings to which the label indicating non-target is attached from the set of combination ID strings. Then, the non-target granting unit 5243 stores, in the learning data, the division units corresponding to the combination ID strings to which a label indicating a positive example or a negative example is attached. In the above-mentioned example, the division units corresponding to the following combination ID strings are stored in the learning data.
    [1]: F
    [1, 2]: F
    [2]: F
    [1, 2, 3]: F
    [2, 3]: F
    [3]: F
    [4, 5]: T
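  • The labeling of steps S71 to S73 can be summarized by the following Python sketch, which reuses generate_combination_id_strings() from the earlier sketch; the function name is an assumption, and exact-range matching against a single teacher span is a simplification of the rule described above.

    def label_combinations(combos, positive_ids):
        """Keep only the positive (T) and negative (F) combination ID strings.

        `positive_ids` is the ID string whose range matches the teacher data;
        strings that partially overlap it are treated as non-target and dropped.
        """
        positive = set(positive_ids)
        training_data = []
        for combo in combos:
            ids = set(combo)
            if ids == positive:
                training_data.append((combo, "T"))   # positive example
            elif ids.isdisjoint(positive):
                training_data.append((combo, "F"))   # negative example
            # otherwise: partial overlap -> not a learning target, dropped
        return training_data

    # Reusing generate_combination_id_strings() from the earlier sketch:
    print(label_combinations(generate_combination_id_strings({1, 2, 3, 4, 5}), [4, 5]))
    # [([1], 'F'), ([1, 2], 'F'), ([2], 'F'), ([1, 2, 3], 'F'),
    #  ([2, 3], 'F'), ([3], 'F'), ([4, 5], 'T')]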
  • In this way, the learning data creation device 50 creates learning data by assigning a label to each division unit composed of one element or a plurality of consecutive elements obtained by dividing the utterance according to a predetermined rule (for example, a punctuation mark unit).
  • the learning data includes division units having different numbers of constituent elements.
  • Therefore, even when the scene (story) changes in the middle of an utterance, learning data can be created in utterance division units that match the change.
  • By using the learning data created in this way, it is possible to create a model that can estimate a scene change with high accuracy even when the scene (story) changes in the middle of an utterance.
  • The estimation device 30d according to the present embodiment uses a model trained on the training data created by the learning data creation device 50 to estimate scene (story) changes in utterance division units having different numbers of constituent elements.
  • FIG. 25 is a diagram showing a configuration example of the estimation device 30d according to the present embodiment.
  • the estimation device 30d includes an input unit 61, an estimation unit 62, and an output unit 63.
  • the dialogue series data is input to the input unit 61.
  • the input unit 61 includes a sentence output unit 611. Similar to the sentence output unit 521, the sentence output unit 611 outputs the utterance character string constituting the series data input to the input unit 61 to the estimation unit 62 as a sentence.
  • For example, the sentence output unit 611 may output a sentence divided into word units by morphological analysis, or may output a sentence divided into word units obtained as a result of voice recognition.
  • the estimation unit 62 estimates the change of story from the sentence output from the input unit 61 by using the estimation model 3.
  • the estimation model 3 is a model created by learning the learning data created by the learning data creation device 50.
  • The learning data created by the learning data creation device 50 includes division units having different numbers of constituent elements, and each division unit is given a label indicating whether or not it is a story change. Therefore, the estimation model 3 is a model trained in advance to determine, for each of the division units having different numbers of constituent elements, whether or not the story is switched.
  • The estimation unit 62 generates division units having different numbers of constituent elements from the utterances constituting the series data to be processed, and uses the estimation model 3 as the first model to determine, for each of the generated division units, whether or not it is a story change.
  • the output unit 63 outputs the estimation result by the estimation unit 62.
  • FIG. 27 is a diagram showing a configuration example of the estimation unit 62.
  • the estimation unit 62 includes an ID assignment unit 621, a combination generation unit 622, and a switching estimation unit 623.
  • the ID assignment unit 621 generates an element in which the utterance is divided according to a predetermined rule from the sentence output from the sentence output unit 611.
  • the unit of division by the ID assigning unit 621 may be any identifiable unit such as a word unit, a punctuation mark unit, a voice recognition unit, and a speech end unit.
  • the ID assigning unit 621 assigns an ID to each of the elements in which the utterance is divided, and stores the ID assigned to each element in the ID set.
  • the combination generation unit 622 generates a combination of IDs (combination ID string) used for estimating the switching of the story based on the IDs stored in the ID set.
  • FIG. 28 is a diagram showing a configuration example of the combination generation unit 622.
  • the combination generation unit 622 includes an ID extraction unit 6221, a combination target ID storage unit 6222, a combination generation ID storage unit 6223, and a combination ID generation unit 6224.
  • the ID extraction unit 6221 extracts a predetermined longest unit ID from the ID set and stores it in the longest unit ID set.
  • the combination target ID storage unit 6222 extracts the IDs in the range to be combined from the longest unit ID set and stores them in the combination target ID set.
  • the combination generation ID storage unit 6223 acquires the combination generation ID for generating the combination ID string from the combination target ID set and stores it in the combination generation ID storage unit.
  • Similar to the combination ID generation unit 5234, the combination ID generation unit 6224 generates combination ID strings based on the combination generation ID set, stores them in the set of combination ID strings, and updates the set of combination ID strings.
  • The combination generation unit 622 outputs the set of generated combination ID strings to the switching estimation unit 623.
  • The set of combination ID strings output from the combination generation unit 622 is input to the switching estimation unit 623.
  • the switching estimation unit 623 uses the estimation model 3 to determine for each division unit corresponding to the combination ID string whether or not the division unit is a story change, and outputs the determination result.
  • the operation of the estimation unit 62 will be described focusing on the operation of the switching estimation unit 623. Since the operation of generating the combination ID string by the combination generation unit 622 is the same as the operation of the combination generation unit 523 described with reference to FIG. 23, the description thereof will be omitted.
  • FIG. 29 is a flowchart showing an example of the operation of the switching estimation unit 623.
  • The switching estimation unit 623 extracts, from the set of combination ID strings, one combination ID string consisting only of IDs for which it has not yet been estimated whether or not the story is switched (step S81).
  • the switching estimation unit 623 replaces the extracted combination ID string with a word string (step S82). That is, the switching estimation unit 623 replaces the ID included in the combination ID string with the utterance element corresponding to the ID.
  • the switching estimation unit 623 estimates whether or not the character string (speech division unit) in which the combination ID string is replaced is a story switching using the estimation model 3 (step S83).
  • the switching estimation unit 623 determines whether or not the estimation result is a positive example (whether the story is switched) (step S84).
  • When it is determined that the estimation result is not a positive example (step S84: No), the switching estimation unit 623 determines whether or not the set of combination ID strings is empty (step S85).
  • When it is determined that the set of combination ID strings is not empty (step S85: No), the switching estimation unit 623 returns to the process of step S81.
  • When it is determined that the set of combination ID strings is empty (step S85: Yes), the switching estimation unit 623 outputs the estimation result for each ID via the output unit 63 (step S86), and ends the process.
  • When it is determined that the estimation result is a positive example (step S84: Yes), the switching estimation unit 623 determines whether or not the set of combination ID strings contains a combination ID string consisting only of IDs for which it has not been estimated whether or not the story is switched (step S87).
  • When it is determined that there is a combination ID string consisting only of IDs for which it has not been estimated whether or not the story is switched (step S87: Yes), the switching estimation unit 623 returns to the process of step S81.
  • When it is determined that there is no combination ID string consisting only of IDs for which it has not been estimated whether or not the story is switched (step S87: No), the switching estimation unit 623 outputs the estimation result and the estimation unit for each ID via the output unit 63 (step S88), and ends the process.
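  • Before turning to the concrete examples, the following Python sketch gives a simplified reading of the loop of steps S81 to S88; is_switch stands in for estimation model 3, and the order in which candidate strings are picked is an assumption based on the examples below.

    def estimate_switching(elements, combos, is_switch):
        """Simplified rendition of steps S81 to S88.

        `elements` maps each ID to its utterance fragment, `combos` is the
        ordered set of combination ID strings, and `is_switch` stands in for
        estimation model 3 (it receives the concatenated word string of one
        division unit). Returns, for each ID, the combination ID string
        estimated to be a story change, or None if no containing division
        unit was estimated to be a positive example.
        """
        decided = {i: None for i in elements}
        remaining = list(combos)
        while remaining:
            # Step S81: take the next string made only of still-undecided IDs.
            candidates = [c for c in remaining if all(decided[i] is None for i in c)]
            if not candidates:            # step S87: No -> output and end
                break
            combo = candidates[0]
            remaining.remove(combo)
            text = "".join(elements[i] for i in combo)   # step S82
            if is_switch(text):                          # steps S83-S84
                for i in combo:                          # record the estimation unit
                    decided[i] = combo
        return decided                                   # steps S86 / S88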
  • the operation of the estimation unit 62 will be further described with reference to a specific example.
  • the ID assigning unit 621 divides the above-mentioned utterance into four elements in units of punctuation marks, and assigns IDs (ID1 to ID4) to each element.
  • the combination generation unit 622 generates a combination ID string by the process described with reference to FIG. 23.
  • As a result, the combination generation unit 622 generates 10 combination ID strings ([1], [1, 2], [2], [1, 2, 3], [2, 3], [3], [1, 2, 3, 4], [2, 3, 4], [3, 4], [4]).
  • The switching estimation unit 623 extracts one combination ID string from the generated set of combination ID strings, and estimates whether or not the division unit corresponding to the extracted combination ID string is a story change. As shown in FIG. 30B, the switching estimation unit 623 estimates, in order, whether each division unit corresponding to a combination ID string in the set is a story change, until a positive example (story change) is estimated. In the following, it is assumed that the division units corresponding to the combination ID strings [1], [1, 2], [2], [1, 2, 3], [2, 3], [3], [1, 2, 3, 4], and [2, 3, 4] are estimated not to be positive examples, and the division unit corresponding to the combination ID string [3, 4] is estimated to be a positive example.
  • Since there is no longer any combination ID string consisting only of unestimated IDs, the switching estimation unit 623 outputs the estimation result and the estimation unit for each ID via the output unit 63. Since the division unit corresponding to the combination ID string [3, 4] was estimated to be a positive example, the switching estimation unit 623 outputs, for ID3 and ID4, that the estimation result is a positive example and that the unit estimated to be a positive example (the estimation unit) is the combination string [3, 4], as shown in FIG. 30B.
  • the operation of the estimation unit 62 will be further described by giving another specific example.
  • the ID assigning unit 621 divides the above-mentioned utterance into four elements in units of punctuation marks, and assigns IDs (ID1 to ID4) to each element.
  • the combination generation unit 622 generates a combination ID string by the process described with reference to FIG. 23.
  • As a result, the combination generation unit 622 generates 10 combination ID strings ([1], [1, 2], [2], [1, 2, 3], [2, 3], [3], [1, 2, 3, 4], [2, 3, 4], [3, 4], [4]).
  • The switching estimation unit 623 extracts one combination ID string from the generated set of combination ID strings, and estimates whether or not the division unit corresponding to the extracted combination ID string is a story change. As shown in FIG. 31B, the switching estimation unit 623 estimates, in order, whether each division unit corresponding to a combination ID string in the set is a story change, until a positive example (story change) is estimated. In the following, it is assumed that the division unit corresponding to the combination ID string [1] is estimated not to be a positive example, and the division unit corresponding to the combination ID string [1, 2] is estimated to be a positive example.
  • Since there are combination ID strings ([3], [3, 4], [4]) consisting only of IDs (ID3 and ID4) for which it has not yet been estimated whether or not they are a positive example, the switching estimation unit 623 further estimates whether or not the division units corresponding to these ID strings are positive examples. In the following, it is assumed that the division unit corresponding to the combination ID string [3] is estimated not to be a positive example, and the division unit corresponding to the combination ID string [3, 4] is estimated to be a positive example.
  • Since there is no longer any combination ID string consisting only of unestimated IDs, the switching estimation unit 623 outputs the estimation result and the estimation unit for each ID via the output unit 63. Since the division units corresponding to the combination ID strings [1, 2] and [3, 4] were estimated to be positive examples, the switching estimation unit 623 outputs, for ID1 and ID2, that the estimation result is a positive example and that the estimation unit is the combination string [1, 2], as shown in FIG. 31B. Further, the switching estimation unit 623 outputs, for ID3 and ID4, that the estimation result is a positive example and that the estimation unit is the combination string [3, 4].
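  • Under the stub below, which reuses generate_combination_id_strings() and estimate_switching() from the earlier sketches, the same per-ID result as this second example is obtained; the element texts are placeholders because the utterance of FIG. 31A is not reproduced here.

    # The stub model treats the division units [1, 2] and [3, 4] as story
    # changes, as assumed in the example above.
    elements = {1: "e1,", 2: "e2,", 3: "e3,", 4: "e4."}
    combos = generate_combination_id_strings({1, 2, 3, 4})
    result = estimate_switching(elements, combos,
                                is_switch=lambda text: text in {"e1,e2,", "e3,e4."})
    # result == {1: [1, 2], 2: [1, 2], 3: [3, 4], 4: [3, 4]}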
  • In this way, the estimation model 3 is trained to determine, for each division unit composed of one element or a plurality of consecutive elements obtained by dividing an utterance according to a predetermined rule, whether or not the utterance is a story change, over division units having different numbers of constituent elements.
  • The estimation device 30d generates division units having different numbers of constituent elements from the utterances constituting the series data to be processed, and uses the estimation model 3, trained on such learning data, to determine whether or not each division unit is a story change.
  • Therefore, even when the story changes in the middle of an utterance, the switching point can be estimated with high accuracy.
  • In the above description, the binary classification model 1 is created by the learning device 10 and the multi-value classification model 2 is created by the learning device 20. However, the present invention is not limited to this.
  • One learning device 70 may create both the binary classification model 1 and the multi-value classification model 2.
  • The learning device 70 includes an input unit 11, a binary classification learning unit 12 as a first model learning unit, an input unit 21, a multi-value label complementing unit 22, and a multi-value classification learning unit 23 as a second model learning unit.
  • The operations of the input unit 11 and the binary classification learning unit 12, which handle the teacher data (first teacher data) in which a binary label (first label) indicating whether or not an utterance is a story change is given to the utterances constituting the series data of a dialogue including a plurality of topics, or to the division units obtained by dividing those utterances, are the same as those described above.
  • The operations of the input unit 21, the multi-value label complementing unit 22, and the multi-value classification learning unit 23 are the same as those described with reference to FIG. 2.
  • That is, the multi-value classification learning unit 23 learns, based on the teacher data (second teacher data) in which a multi-value label (second label) indicating the topic in a range is given to a range in which one topic in the series data continues, the multi-value classification model 2 (second model) that estimates the topic in the utterances constituting the series data to be processed.
  • FIG. 33 is a diagram showing an example of the operation of the learning device 70, and is a diagram for explaining a learning method by the learning device 70.
  • The binary classification learning unit 12 learns, based on the teacher data (first teacher data) in which a binary label indicating whether or not an utterance is a story change is given to the utterances constituting the series data of a dialogue including a plurality of topics, or to the division units obtained by dividing those utterances, the binary classification model 1 for determining whether or not an utterance constituting the series data to be processed is a story-change utterance (step S91).
  • The multi-value classification learning unit 23 learns, based on the teacher data in which a multi-value label indicating the topic in a range is given to a range in which one topic in the series data continues, the multi-value classification model 2 that estimates the topic in the utterances constituting the series data to be processed (step S92).
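  • As a hedged illustration of this two-step learning flow (steps S91 and S92), the sketch below uses simple scikit-learn text classifiers as stand-ins for the binary classification model 1 and the multi-value classification model 2; the disclosure itself contemplates sequence models such as LSTMs, and the data layout assumed here (parallel label lists and (start, end, topic) spans) is hypothetical.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    def train_models(utterances, switch_labels, topic_spans):
        """Step S91: learn the binary (switch / not switch) classifier.
        Step S92: learn the multi-class topic classifier over labelled spans.

        `utterances` is a list of utterance strings, `switch_labels` the
        parallel binary labels, and `topic_spans` a list of
        (start, end, topic) tuples marking ranges where one topic continues.
        """
        binary_model = make_pipeline(TfidfVectorizer(), LogisticRegression())
        binary_model.fit(utterances, switch_labels)               # step S91

        span_texts = [" ".join(utterances[s:e + 1]) for s, e, _ in topic_spans]
        span_topics = [topic for _, _, topic in topic_spans]
        topic_model = make_pipeline(TfidfVectorizer(), LogisticRegression())
        topic_model.fit(span_texts, span_topics)                  # step S92
        return binary_model, topic_model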
  • the hardware configuration of the estimation devices 30 to 30d will be described.
  • the estimation devices 30a to 30d may have the same hardware configuration.
  • the learning devices 10, 20, 70 and the learning data creating device 50 may have the same hardware configuration.
  • FIG. 34 is a block diagram showing a hardware configuration when the estimation device 30 of the present disclosure is a computer capable of executing a program instruction.
  • the computer may be a general-purpose computer, a dedicated computer, a workstation, a PC (Personal Computer), an electronic notepad, or the like.
  • the program instruction may be a program code, a code segment, or the like for executing a necessary task.
  • the estimation device 30 includes a processor 110, a ROM (Read Only Memory) 120, a RAM (Random Access Memory) 130, a storage 140, an input unit 150, a display unit 160, and a communication interface (I / F) 170.
  • The processor 110 is a CPU (Central Processing Unit), an MPU (Micro Processing Unit), a GPU (Graphics Processing Unit), a DSP (Digital Signal Processor), a SoC (System on a Chip), or the like, and may be composed of a plurality of processors of the same type or different types.
  • the processor 110 controls each configuration and executes various arithmetic processes. That is, the processor 110 reads the program from the ROM 120 or the storage 140, and executes the program using the RAM 130 as a work area. The processor 110 controls each of the above configurations of the estimation device 30 and performs various arithmetic processes according to the program stored in the ROM 120 or the storage 140. In the present embodiment, the program according to the present disclosure is stored in the ROM 120 or the storage 140. The processor 110 reads and executes the program.
  • the determination unit 32, the paragraph estimation unit 33, and the topic estimation unit 34 constitute a control unit 38 (FIG. 3).
  • The control unit 38 may be configured by dedicated hardware such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field-Programmable Gate Array), or may be configured by one or more processors as described above.
  • the control unit 61 may be configured by dedicated hardware such as an ASIC or FPGA, or may be configured by one or more processors as described above.
  • The program may be provided in a form stored in a non-transitory storage medium such as a CD-ROM (Compact Disk Read Only Memory), a DVD-ROM (Digital Versatile Disk Read Only Memory), or a USB (Universal Serial Bus) memory. Further, the program may be downloaded from an external device via a network.
  • the ROM 120 stores various programs and various data.
  • the RAM 130 temporarily stores a program or data as a work area.
  • the storage 140 is composed of an HDD (Hard Disk Drive) or an SSD (Solid State Drive), and stores various programs including an operating system and various data.
  • the storage 140 stores the created binary classification models 1, 1a, multi-value classification models 2, 2a, and estimation model 3.
  • the input unit 150 includes a pointing device such as a mouse and a keyboard, and is used for performing various inputs.
  • the display unit 160 is, for example, a liquid crystal display and displays various information.
  • the display unit 160 may adopt a touch panel method and function as an input unit 150.
  • the communication interface 170 is an interface for communicating with other devices such as an external device (not shown), and for example, standards such as Ethernet (registered trademark), FDDI, and Wi-Fi (registered trademark) are used.
  • (Appendix 1) An estimation device comprising a processor, wherein the processor determines, using a first model learned in advance based on first teacher data for the utterances constituting series data of a dialogue including a plurality of topics or for the division units obtained by dividing the utterances, whether or not an utterance constituting the series data to be processed is a story-change utterance, and estimates, based on the result of the determination, the range of a paragraph in the series data to be processed from a story change to the utterance immediately before the next change, or of a paragraph from a story change to the utterance at the end of the dialogue.
  • (Appendix 2) A learning device comprising a processor, wherein the processor learns, based on first teacher data in which a first label indicating whether or not an utterance is a story change is given to the utterances constituting series data of a dialogue including a plurality of topics or to the division units obtained by dividing the utterances, a first model for determining whether or not an utterance constituting the series data to be processed is a story-change utterance, and learns, based on second teacher data in which a range in which one topic in the series data continues is given a second label indicating the topic in that range, a second model for estimating the topic in the utterances constituting the series data to be processed.
  • (Appendix 3) A non-transitory storage medium storing a program executable by a computer, the program causing the computer to function as the estimation device according to Appendix 1.
  • (Appendix 4) A non-transitory storage medium storing a program executable by a computer, the program causing the computer to function as the learning device according to Appendix 2.
  • a computer can be suitably used to function as each part of the estimation device 30, 30a, 30b, 30c, 30d and the learning device 70 described above.
  • Such a computer can be realized by storing, in the storage unit of the computer, a program describing the processing contents that realize the functions of the estimation devices 30, 30a, and 30b, and by having the processor of the computer read and execute the program. That is, the program can cause the computer to function as the estimation devices 30, 30a, 30b, 30c, 30d and the learning device 70 described above.
  • This program may be recorded on a computer-readable medium, and can be installed on a computer by using the computer-readable medium.
  • the computer-readable medium on which the program is recorded may be a non-transient recording medium.
  • the non-transient recording medium is not particularly limited, but may be, for example, a recording medium such as a CD-ROM or a DVD-ROM. This program can also be provided via a network.
  • Each component can be rearranged so long as no logical inconsistency arises, and a plurality of components can be combined into one or divided.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The estimation device (30) according to the present disclosure comprises: a determination unit (32) which determines whether or not speech that constitutes series data to be processed is speech indicating a change of subject, using a binary classification model (1), which is pre-learned on the basis of training data for speech that constitutes dialogue series data including a plurality of topics, or for units of division of said speech; and a paragraph estimation unit (33) which, on the basis of the result of the determination by the determination unit (32), estimates, in the series data to be processed, the range of a paragraph from a change of subject until speech just before a subsequent change, or of a paragraph from a change of subject until end-of-dialogue speech.

Description

推定装置、推定方法、学習装置、学習方法およびプログラムEstimator, estimation method, learning device, learning method and program
 本開示は、推定装置、推定方法、学習装置、学習方法およびプログラムに関する。 This disclosure relates to an estimation device, an estimation method, a learning device, a learning method and a program.
 オペレータがカスタマ(顧客)からの商品あるいはサービスなどについての問い合わせに応対する部門(いわゆる、コンタクトセンタ)では、カスタマが抱えている問題に対する解決のサポートなどが求められる。コンタクトセンタでは、オペレータによるカスタマとの応対の履歴(応対ログ)が作成され、蓄積・共有される。オペレータあるいはコンタクトセンタの管理者などが、蓄積された応対ログを見直すことで、カスタマからの問い合わせを分析したり、カスタマへの応対の質の向上を図ったりすることができる。応対ログを見直し、カスタマとの応対を振り返る際に、オペレータとカスタマとの対話を話あるいは話題ごとに分割することができれば、応対の振り返りの作業効率を上げることができる。 In the department (so-called contact center) where the operator responds to inquiries about products or services from customers (customers), support for solving problems that customers have is required. In the contact center, the history of customer service by the operator (response log) is created, stored and shared. An operator, a contact center manager, or the like can review the accumulated response log to analyze inquiries from customers and improve the quality of response to customers. When reviewing the response log and looking back on the response with the customer, if the dialogue between the operator and the customer can be divided into talks or topics, the work efficiency of the response review can be improved.
 オペレータとカスタマとの対話は、時間軸に沿った複数の発話から構成された系列データとみなすことができる。一連の系列データに対して、系列データにおける話題を示すラベルを付与した教師データを準備することで、LSTM(Long Short-Term Memory)などのDNN(Deep Neural Network)を用いた機械学習により、対話における話題を分類する分類モデルの学習が可能である(非特許文献1参照)。 The dialogue between the operator and the customer can be regarded as series data composed of multiple utterances along the time axis. By preparing teacher data with a label indicating the topic in the series data for a series of series data, dialogue is performed by machine learning using DNN (Deep Neural Network) such as RSTM (Long Short-Term Memory). It is possible to learn a classification model for classifying topics in (see Non-Patent Document 1).
 一般に、コンタクトセンタで扱うタスクは様々であり、取り扱う商品あるいはサービスの種類によっては、数えられる程度の少数の種類の話題で済む場合もあれば、非常に多くの、数えきれない種類の話題に至る場合もある。対話における話題を、非特許文献1に記載のモデルを用いて、多くの種類の話題に分類しようとすると、少量の教師データでは分類の精度が低下し、精度を上げるために大量の教師データを準備するには、多くのコストがかかってしまう。 In general, contact centers deal with a variety of tasks, depending on the type of product or service they handle, with a small number of topics that can be counted, or a large number of topics that can be counted. In some cases. When trying to classify topics in dialogue into many types of topics using the model described in Non-Patent Document 1, a small amount of teacher data reduces the accuracy of classification, and a large amount of teacher data is used to improve the accuracy. It costs a lot to prepare.
 上記のような問題点に鑑みてなされた本開示の目的は、複数の話題を含む対話の系列データにおける、段落の範囲を推定することができる推定方法、推定装置、学習装置、学習方法およびプログラムを提供することにある。 An object of the present disclosure made in view of the above problems is an estimation method, an estimation device, a learning device, a learning method, and a program capable of estimating a paragraph range in a series of dialogue data including a plurality of topics. Is to provide.
 上記課題を解決するため、本開示に係る推定装置は、複数の話題を含む対話の系列データを構成する発話または前記発話を分割した分割単位に対して、第1の教師データに基づいて予め学習された第1のモデルを用いて、処理対象の系列データを構成する発話が、話の切り替わりの発話であるか否かを判定する判定部と、前記判定の結果に基づき、前記処理対象の系列データにおける、話の切り替わりから次の切り替わりの直前の発話までの段落または前記話の切り替わりから前記対話の終わりの発話までの段落の範囲を推定する段落推定部と、を備える。 In order to solve the above problems, the estimation device according to the present disclosure learns in advance based on the first teacher data for the utterances constituting the series data of the dialogue including a plurality of topics or the divided units obtained by dividing the utterances. Using the first model, the determination unit that determines whether or not the utterance constituting the series data of the processing target is the utterance of the switching of the talk, and the series of the processing target based on the result of the determination. The data includes a paragraph estimation unit that estimates the range of the paragraph from the switching of the talk to the utterance immediately before the next switching or the paragraph from the switching of the talk to the utterance at the end of the dialogue.
 また、上記課題を解決するため、本開示に係る推定方法は、複数の話題を含む対話の系列データを構成する発話または前記発話を分割した分割単位に対して、第1の教師データに基づいて予め学習された第1のモデルを用いて、処理対象の系列データを構成する発話が、話の切り替わりの発話であるか否かを判定する判定ステップと、前記判定の結果に基づき、前記処理対象の系列データにおける、話の切り替わりから次の切り替わりの直前の発話までの段落または前記話の切り替わりから前記対話の終わりの発話までの段落の範囲を推定する段落推定ステップと、を含む。 Further, in order to solve the above-mentioned problem, the estimation method according to the present disclosure is based on the first teacher data for an utterance constituting the series data of a dialogue including a plurality of topics or a division unit obtained by dividing the utterance. Using the first model learned in advance, the processing target is based on the determination step of determining whether or not the utterance constituting the series data of the processing target is the utterance of switching the talk, and the result of the determination. Includes a paragraph estimation step that estimates the range of the paragraph from one talk to the utterance immediately before the next switch or from the change of the talk to the utterance at the end of the dialogue in the series data of.
 また、上記課題を解決するため、本開示に係る学習装置は、複数の話題を含む対話の系列データを構成する発話または前記発話を分割した分割単位に対して、話の切り替わりであるか否かを示す第1のラベルが付与された第1の教師データに基づき、処理対象の系列データを構成する発話が、話の切り替わりの発話であるか否かを判定する第1のモデルを学習する第1のモデル学習部と、前記系列データにおける1つの話題が続く範囲に、前記範囲における話題を示す第2のラベルが付与された第2の教師データに基づき、前記処理対象の系列データを構成する発話における話題を推定する第2のモデルを学習する第2のモデル学習部と、を備える。 Further, in order to solve the above-mentioned problem, whether or not the learning device according to the present disclosure switches the talk with respect to the utterance constituting the series data of the dialogue including a plurality of topics or the divided unit obtained by dividing the utterance. Based on the first teacher data to which the first label indicating is attached, the first model for determining whether or not the utterance constituting the series data to be processed is the utterance of the switching of the talk is learned. The series data to be processed is configured based on the model learning unit 1 and the second teacher data to which the second label indicating the topic in the range is added to the range in which one topic in the series data continues. It includes a second model learning unit that learns a second model that estimates a topic in speech.
 また、上記課題を解決するため、本開示に係る学習方法は、複数の話題を含む対話の系列データを構成する発話または前記発話を分割した分割単位に対して、話の切り替わりであるか否かを示す第1のラベルが付与された第1の教師データに基づき、処理対象の系列データを構成する発話が、話の切り替わりの発話であるか否かを判定する第1のモデルを学習する第1の学習ステップと、前記系列データにおける1つの話題が続く範囲に、前記範囲における話題を示す第2のラベルが付与された第2の教師データに基づき、前記処理対象の系列データを構成する発話における話題を推定する第2のモデルを学習する第2の学習ステップと、を含む。 Further, in order to solve the above-mentioned problem, whether or not the learning method according to the present disclosure is a change of talk with respect to an utterance constituting series data of a dialogue including a plurality of topics or a division unit obtained by dividing the utterance. Based on the first teacher data to which the first label indicating is attached, the first model for determining whether or not the utterance constituting the series data to be processed is the utterance of the switching of the talk is learned. An utterance that constitutes the series data to be processed based on the second teacher data in which the learning step 1 and the range in which one topic in the series data continues is given a second label indicating the topic in the range. Includes a second learning step of learning a second model that estimates the topic in.
 また、上記課題を解決するため、本開示に係るプログラムは、コンピュータを、上述した推定装置として動作させる。 Further, in order to solve the above-mentioned problems, the program according to the present disclosure operates a computer as the above-mentioned estimation device.
 本開示に係る推定装置、推定方法、学習装置、学習方法およびプログラムによれば、複数の話題を含む対話の系列データにおける、段落の範囲を推定することができる。 According to the estimation device, estimation method, learning device, learning method and program according to the present disclosure, it is possible to estimate the range of paragraphs in the series data of the dialogue including a plurality of topics.
FIG. 1 is a diagram showing a configuration example of a learning device that trains a binary classification model.
FIG. 2 is a diagram showing a configuration example of a learning device that trains a multi-value classification model.
FIG. 3 is a diagram showing an example of the configuration of the estimation device according to the first embodiment of the present disclosure.
FIG. 4 is a diagram showing another example of the configuration of the estimation device according to the first embodiment of the present disclosure.
FIG. 5 is a diagram showing still another example of the configuration of the estimation device according to the first embodiment of the present disclosure.
FIG. 6 is a flowchart showing an example of the operation of the multi-value label complementing unit shown in FIG. 2.
FIG. 7 is a flowchart showing an example of the operation of the estimation device shown in FIG. 3.
FIG. 8 is a flowchart showing an example of the operation of the estimation device shown in FIG. 4.
FIG. 9 is a flowchart showing an example of the operation of paragraph range estimation by the estimation device shown in FIG. 5.
FIG. 10 is a flowchart showing an example of the operation of topic estimation by the estimation device shown in FIG. 5.
FIG. 11 is a diagram for explaining the learning of the binary classification model and the multi-value classification model.
FIG. 12 is a diagram for explaining topic estimation by the estimation device shown in FIG. 3.
FIG. 13 is a diagram for explaining topic estimation by the estimation device shown in FIG. 3.
FIG. 14 is a diagram for explaining topic estimation by the estimation device shown in FIG. 4.
FIG. 15 is a diagram for explaining topic estimation by the estimation device shown in FIG. 5.
FIG. 16 is a diagram showing an example of the configuration of the estimation device according to the second embodiment of the present disclosure.
FIG. 17 is a flowchart showing an example of the operation of the estimation device shown in FIG. 16.
FIG. 18 is a diagram for explaining topic estimation by the estimation device shown in FIG. 16.
FIG. 19 is a diagram showing a configuration example of the learning data creation device according to the third embodiment of the present disclosure.
FIG. 20 is a diagram showing a configuration example of the learning data creation unit shown in FIG. 19.
FIG. 21 is a diagram showing a configuration example of the combination generation unit shown in FIG. 20.
FIG. 22 is a diagram showing a configuration example of the granting unit shown in FIG. 20.
FIG. 23 is a flowchart showing an example of the operation of the combination generation unit shown in FIG. 21.
FIG. 24 is a flowchart showing an example of the operation of the granting unit shown in FIG. 22.
FIG. 25 is a diagram showing a configuration example of the estimation device according to the third embodiment of the present disclosure.
FIG. 26 is a diagram showing a configuration example of the input unit shown in FIG. 25.
FIG. 27 is a diagram showing a configuration example of the estimation unit shown in FIG. 25.
FIG. 28 is a diagram showing a configuration example of the combination generation unit shown in FIG. 27.
FIG. 29 is a flowchart showing an example of the operation of the switching estimation unit shown in FIG. 27.
FIG. 30A is a diagram for explaining an example of the operation from the division of a sentence to the generation of combination ID strings by the estimation unit shown in FIG. 27.
FIG. 30B is a diagram for explaining an example of the operation from estimation using the estimation model to the output of the estimation result by the estimation unit shown in FIG. 27.
FIG. 31A is a diagram for explaining another example of the operation from the division of a sentence to the generation of combination ID strings by the estimation unit shown in FIG. 27.
FIG. 31B is a diagram for explaining another example of the operation from estimation using the estimation model to the output of the estimation result by the estimation unit shown in FIG. 27.
FIG. 32 is a diagram showing another configuration example of the learning device according to the present disclosure.
FIG. 33 is a flowchart showing an example of the operation of the learning device shown in FIG. 32.
FIG. 34 is a diagram showing an example of the hardware configuration of the estimation device shown in FIG. 3.
 以下、本開示の実施の形態について図面を参照して説明する。 Hereinafter, embodiments of the present disclosure will be described with reference to the drawings.
 (第1の実施形態) (First Embodiment)
 まず、本開示の概要について説明する。 First, the outline of the present disclosure will be described.
 系列データを構成する発話においては、語句が省略されることが比較的多いため、発話の長さ、すなわち、単語数が少なくなる場合がある。また、話題の種類が少なくても、話題同士が類似していたり、話題の出現順序が不定であったりする場合がある。これらの場合にも話題の分類が可能な分類モデルを構築するためには、教師データの準備に多くのコストがかかってしまう。 In the utterances that make up the series data, words and phrases are relatively often omitted, so the length of the utterance, that is, the number of words may be reduced. Moreover, even if there are few types of topics, the topics may be similar to each other or the order of appearance of the topics may be indefinite. Even in these cases, it takes a lot of cost to prepare teacher data in order to construct a classification model capable of classifying topics.
 複数の話題を含む対話の系列データにおける話題を推定するためには、話の切り替わり(区切り)から次の切り替わりの直前の発話までの段落または話の切り替わりから対話の終わりの発話までの段落の範囲を推定することが有効である。段落の範囲を推定することができれば、その段落に含まれる発話に範囲を限定して話題を推定することができるので、話題の推定をより高精度に行うことができる。 In order to estimate a topic in the series data of a dialogue containing multiple topics, the range of the paragraph from the change of story (separation) to the utterance immediately before the next change or the paragraph from the change of story to the utterance at the end of the dialogue. It is effective to estimate. If the range of a paragraph can be estimated, the topic can be estimated by limiting the range to the utterances included in the paragraph, so that the topic can be estimated with higher accuracy.
 本開示は、オペレータとカスタマとの対話といった、複数の話題を含む対話の系列データにおける、話の切り替わりから次の切り替わりの直前の発話までの段落または話の切り替わりから対話の終わりの発話までの段落の範囲の推定、および、段落における話題の推定に関する。 The present disclosure is a paragraph from one story change to the utterance immediately before the next switch or a paragraph from the story switch to the end of the dialogue in a series of dialogue data containing multiple topics, such as a dialogue between an operator and a customer. Regarding the estimation of the range of and the estimation of the topic in the paragraph.
 以下では、コンタクトセンタにおけるオペレータとカスタマとの対話を例として考える。オペレータが主導して対話を進めるケースとして、カスタマが抱えている問題を解決するに当たり、オペレータが現在の状況あるいはこれまでの経緯などをカスタマに問診しながら原因を突き止めていくケース、オペレータがカスタマの状況についてインタビューを行いながら業務の手続きに必要な書類を作成するケースなどが存在する。 In the following, the dialogue between the operator and the customer at the contact center will be considered as an example. As a case where the operator takes the initiative in conducting dialogue, when solving the problem that the customer has, the operator asks the customer about the current situation or the history so far to find out the cause, and the operator is the customer. There are cases where documents necessary for business procedures are created while conducting interviews about the situation.
 上述したようなケースの対話では、オペレータが質問している内容の単位を1つの話題と捉えることができる。ただし、多くの話題の種類から最も適切な話題の種類を一意に決定することは難しい。また、上述したような対話における話題はいずれも特定の業務に関連した範囲の話題であり、ある話題と別の話題とが類似していることが多い。そして、類似している話題を区別することは難しい。そのため、対話全体を、話題ごとの一連のまとまりに分割するのは困難である。 In the dialogue in the case described above, the unit of the content that the operator is asking can be regarded as one topic. However, it is difficult to uniquely determine the most appropriate topic type from many topic types. In addition, all the topics in the dialogue as described above are topics in the range related to a specific business, and one topic and another topic are often similar. And it is difficult to distinguish between similar topics. Therefore, it is difficult to divide the entire dialogue into a series of topics.
 しかしながら、オペレータが次の話に移る際には、オペレータは、「このたび」、「では」、「あと」といった、話が切り替わることをカスタマに伝える語句を発することが多い。また、話が終わる際には、オペレータは、カスタマの発話を受けて、「かしこまりました」、「承知いたしました」といった、話が終わることをカスタマに伝える語句を発することが多い。これらの語句は、話の内容に依存しないため、話の切り替わり(話の区切り)を検出する上で有用である。 However, when the operator moves on to the next story, the operator often utters words such as "this time", "in", and "after" to tell the customer that the story will change. In addition, at the end of the talk, the operator often receives the customer's utterance and utters words such as "smart" and "acknowledged" to inform the customer that the talk is over. Since these words do not depend on the content of the story, they are useful for detecting the change of story (break of story).
 本開示においては、例えば、上述した話の切り替わりを示す語句などを利用して、系列データにおける発話が、話の切り替わり発話であるか否かを判定するルールを作成する。そして、本開示においては、作成したルールに基づき、系列データにおける発話が、話の切り替わりの発話であるか否かを判定する。また、本開示においては、例えば、話の切り替わりの発話には、話の切り替わりであることを示すラベルを付与し、その他の発話には、話の切り替わりの発話でないことを示すラベルを付与した教師データに基づき、話の切り替わりの発話であるか否かを判定するモデルを作成し、作成したモデルの判定の結果を用いて、話の切り替わりから次の切り替わりの直前の発話までの段落または話の切り替わりから対話の終わりの発話までの段落の範囲を推定する。また、本開示においては、段落あるいは段落に含まれる発話における話題を推定する。対話に多くの話題あるいは類似した内容の話題が含まれている場合であっても、話の切り替わりから次の切り替わりの直前の発話までの段落または話の切り替わりから対話の終わりの発話までの段落の範囲を推定することができれば、その段落に含まれる発話に絞って話題を推定することができるので、より精度の高い話題の推定が可能となる。 In the present disclosure, for example, a rule for determining whether or not the utterance in the series data is a story switching utterance is created by using the above-mentioned words and phrases indicating the story switching. Then, in the present disclosure, it is determined whether or not the utterance in the series data is the utterance of the switching of the talk, based on the created rule. Further, in the present disclosure, for example, a teacher who assigns a label indicating that the utterance of the story change is a story change utterance and a label indicating that the other utterances are not the story change utterances. Based on the data, create a model that determines whether or not the utterance is a change of story, and use the judgment result of the created model to describe the paragraph or story from the change of story to the utterance immediately before the next change. Estimate the range of paragraphs from the transition to the utterance at the end of the dialogue. In addition, in this disclosure, the topic in the paragraph or the utterance contained in the paragraph is estimated. Even if the dialogue contains many topics or similar topics, the paragraph from one talk to the utterance immediately before the next one or the paragraph from the talk to the end of the dialogue If the range can be estimated, the topic can be estimated by focusing on the utterances included in the paragraph, so that the topic can be estimated with higher accuracy.
 上述したように、本開示においては、予め学習されたモデルを用いて、系列データを構成する発話が、話の切り替わりの発話であるか否かを判定する。また、本開示においては、段落における話題の推定に、教師データに基づき学習されたモデルを用いてもよい。まず、これらのモデルの学習について説明する。 As described above, in the present disclosure, it is determined whether or not the utterances constituting the series data are the utterances of the switching of the talks by using the model learned in advance. Further, in the present disclosure, a model learned based on teacher data may be used for estimating the topic in the paragraph. First, the learning of these models will be described.
 系列データを構成する発話が、話題の切り替わりの発話であるか否かを判定するモデルを用いて、系列データを構成する発話が、話題の切り替わりの発話であるか否かを判定し、その判定結果を用いて、段落の範囲を推定してもよい。ただし、系列データを構成する発話が話題の切り替わりの発話であるか否かを判定するモデルの作成のためには、系列データを構成する発話ごとに話題を示す多値ラベルが付与された教師データが必要となる。通常、そのような教師データを作成することは、手間がかかり、困難であることが多い。そこで、本実施形態においては、系列データを構成する発話が、話の切り替わりの発話であるか否かを判定し、その判定結果を用いて、段落の範囲を推定する。ただし、系列データを構成する発話ごとに話題を示す多値ラベルが付与された教師データを用意することができれば、話題の切り替わりに基づき、段落の範囲を推定してもよい。従って、本開示における「話の切り替わり」は、「話題の切り替わり」も含む概念である。 Using a model that determines whether or not the utterances that make up the series data are utterances that switch topics, it is determined whether or not the utterances that make up the series data are utterances that switch topics, and that determination is made. The results may be used to estimate the range of paragraphs. However, in order to create a model that determines whether or not the utterances that make up the series data are utterances that switch topics, teacher data with a multi-valued label that indicates the topic for each utterance that makes up the series data. Is required. Creating such teacher data is usually laborious and often difficult. Therefore, in the present embodiment, it is determined whether or not the utterance constituting the series data is an utterance in which the story is switched, and the range of the paragraph is estimated using the determination result. However, if teacher data with a multi-valued label indicating a topic can be prepared for each utterance constituting the series data, the range of paragraphs may be estimated based on the change of topics. Therefore, the "switching of stories" in the present disclosure is a concept including "switching of topics".
 図1は、系列データを構成する発話が話の切り替わりの発話であるか否かを判定する二値分類モデル1を学習する学習装置10の構成例を示す図である。 FIG. 1 is a diagram showing a configuration example of a learning device 10 for learning a binary classification model 1 for determining whether or not an utterance constituting the series data is an utterance of switching talks.
 図1に示す学習装置10は、入力部11と、二値分類学習部12とを備える。 The learning device 10 shown in FIG. 1 includes an input unit 11 and a binary classification learning unit 12.
The input unit 11 receives series data of a dialogue containing a plurality of topics. The series data is, for example, text data obtained by speech recognition of the time-series utterances of an operator and a customer. The series data input to the input unit 11 may be in utterance units, or in division units obtained by dividing the utterances (for example, word units, character units, or units delimited by periods). When the series data is input online, the input unit 11 may sequentially receive the text data obtained by speech recognition of each utterance during the dialogue. When the series data is input offline, the input unit 11 may receive the text data of the utterances sorted by the start time or end time of each utterance in the dialogue.
The input unit 11 also receives binary labels (switching labels), assigned to the utterances constituting the series data or to their division units, indicating whether each utterance is a change of talk. A binary label is, for example, a label such as "1 (a change of talk)" or "0 (not a change of talk)", or "True (a change of talk)" or "False (not a change of talk)". Alternatively, if some label indicating a change of talk is attached to an utterance or its division unit, the input unit 11 may treat it as "True (a change of talk)", and if no such label is attached, it may treat it as "False (not a change of talk)".
The binary labels are manually assigned in advance to the utterances constituting the series data or to their division units. As mentioned above, there are words and phrases that are often uttered at a change of talk, and the binary labels are assigned, for example, on the basis of such words and phrases. Taking equipment failure as an example, if one only wants to classify whether or not an utterance concerns an equipment failure, the topic of any utterance about an equipment failure is "equipment failure" regardless of the cause. On the other hand, if one wants to classify topics according to the cause of the failure, each cause corresponds to a different topic. Therefore, depending on how the topics to be classified are defined, the topic may not change even where the talk is divided. For this reason, when assigning binary labels, it is preferable to assign a label indicating a change of talk to any utterance, or division unit, that may be a change of talk, even if it is an utterance that transitions from one topic to the same topic. Doing so increases the number of positive examples of talk-switching utterances and improves the accuracy of determining talk-switching utterances.
In this way, the input unit 11 receives series data of a dialogue containing a plurality of topics, together with binary labels (first labels), assigned to the utterances constituting the series data or to their division units, indicating whether each is a change of talk. The input unit 11 outputs the input series data and binary labels to the binary classification learning unit 12.
The binary classification learning unit 12 performs learning using the series data and binary labels output from the input unit 11 as teacher data, and learns binary classification model 1 (a first model) that determines whether an utterance in the series data is a talk-switching utterance. Binary classification model 1 is therefore a model learned in advance, on the basis of teacher data (first teacher data), for the utterances constituting series data of a dialogue containing a plurality of topics or their division units. The teacher data (first teacher data) used to learn binary classification model 1 is data in which a binary label indicating whether an utterance is a change of talk is assigned to each utterance, or each division unit, constituting the series data of a dialogue containing a plurality of topics. For model learning, an LSTM or the like, which is suited to learning time-series data, can be used.
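The following is a minimal sketch, not the patent's implementation, of how binary classification model 1 could be realized: an LSTM that reads a dialogue as a sequence of utterance vectors and outputs, per utterance, a score for whether it is a talk-switching utterance. The utterance encoder (a mean of word embeddings), the vocabulary size, and all hyperparameters are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class SwitchDetector(nn.Module):
    def __init__(self, vocab_size=30000, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, 1)  # one switch logit per utterance

    def forward(self, dialog_word_ids):
        # dialog_word_ids: (batch, n_utterances, n_words) of word IDs
        word_vecs = self.embed(dialog_word_ids)   # (B, U, W, E)
        utt_vecs = word_vecs.mean(dim=2)          # crude utterance encoding
        hidden, _ = self.lstm(utt_vecs)           # (B, U, H) over the dialogue
        return self.out(hidden).squeeze(-1)       # (B, U) switch logits

model = SwitchDetector()
loss_fn = nn.BCEWithLogitsLoss()                  # binary "switch / not switch" labels
x = torch.randint(1, 30000, (2, 10, 20))          # 2 dialogues, 10 utterances, 20 words
y = torch.randint(0, 2, (2, 10)).float()          # manually assigned binary labels
loss = loss_fn(model(x), y)
loss.backward()
```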
As described above, in the teacher data used to learn binary classification model 1, a label indicating a change of talk is assigned to any utterance, or division unit, that may be a change of talk, including utterances that transition from one topic to the same topic. Consequently, with binary classification model 1 learned from such teacher data, an utterance may be determined to be a talk-switching utterance even when, depending on how the topics to be classified are defined, the topic has not changed and the utterance lies within a section in which utterances related to the same topic continue.
Next, the configuration of a learning device 20 that learns multi-value classification model 2 for classifying (estimating) topics will be described with reference to FIG. 2.
As shown in FIG. 2, the learning device 20 includes an input unit 21, a multi-value label complementing unit 22, and a multi-value classification learning unit 23.
The input unit 21 receives series data of a dialogue containing a plurality of topics. The input unit 21 also receives binary labels, assigned to the utterances constituting the series data or to their division units, indicating whether each is a change of talk. In addition, the input unit 21 receives multi-valued labels (second labels) indicating the ranges of the series data over which a single topic continues and the topic of each such range. The series data and binary labels are the same as those input to the input unit 11 shown in FIG. 1. The multi-valued labels are assigned manually: in the series data, a range over which a single topic continues is identified, and a multi-valued label indicating the topic of that range is selected from among the labels of the plurality of topics. The binary labels and multi-valued labels for one piece of series data may be input as separate files or together in a single file.
The input unit 21 outputs the input series data, binary labels, and multi-valued labels to the multi-value label complementing unit 22.
The multi-value label complementing unit 22 generates, from the series data, binary labels, and multi-valued labels input from the input unit 21, teacher data (second teacher data) for learning multi-value classification model 2. Specifically, the multi-value label complementing unit 22 assigns, to each utterance (or division unit) labeled as a change of talk, a multi-valued label indicating the topic of the range that contains the utterance. As described above, when binary labels are assigned as teacher data, a label indicating a change of talk is assigned to any utterance, or division unit, that may be a change of talk, including utterances that transition from one topic to the same topic. Therefore, even an utterance within a range in which utterances related to the same topic continue may carry a label indicating a change of talk. The multi-value label complementing unit 22 also assigns to such an utterance, or its division unit, a multi-valued label indicating the topic of the range that contains it. Doing so increases the teacher data for utterances related to each topic and improves the accuracy of topic estimation.
The multi-value label complementing unit 22 outputs the utterances (or division units) to which it has assigned multi-valued labels, together with the assigned multi-valued labels, to the multi-value classification learning unit 23.
The multi-value classification learning unit 23 learns multi-value classification model 2 (a second model) using, as teacher data (second teacher data), the utterances or division units output from the multi-value label complementing unit 22 and the multi-valued labels assigned to them. Multi-value classification model 2 is therefore a model learned in advance, on the basis of teacher data (second teacher data), for the utterances constituting series data or their division units. The teacher data used to learn multi-value classification model 2 is generated from series data in which talk-switching utterances, or their division units, carry binary labels indicating a change of talk and in which the ranges over which a topic continues, and the topic of each range, have been identified, by assigning to each utterance (or division unit) labeled as a change of talk a multi-valued label indicating the topic of the range containing that utterance.
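The following is a minimal sketch, under illustrative assumptions rather than the patent's actual implementation, of multi-value classification model 2: an LSTM encoder over the words of one talk-switching utterance (or of a whole paragraph) followed by a softmax over topic labels. NUM_TOPICS and all hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

NUM_TOPICS = 8  # illustrative number of topic labels

class TopicClassifier(nn.Module):
    def __init__(self, vocab_size=30000, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, NUM_TOPICS)

    def forward(self, word_ids):
        # word_ids: (batch, n_words) for one utterance or one paragraph
        emb = self.embed(word_ids)
        _, (h_n, _) = self.lstm(emb)      # final hidden state summarizes the text
        return self.out(h_n[-1])          # (batch, NUM_TOPICS) topic logits

model = TopicClassifier()
loss_fn = nn.CrossEntropyLoss()
x = torch.randint(1, 30000, (4, 50))      # 4 labeled utterances, 50 words each
y = torch.randint(0, NUM_TOPICS, (4,))    # complemented multi-valued labels
loss = loss_fn(model(x), y)
loss.backward()
```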
Next, the configuration of the estimation device 30 according to the present embodiment will be described with reference to FIG. 3. In series data of a dialogue containing a plurality of topics, such as a dialogue between an operator and a customer, the estimation device 30 according to the present embodiment estimates the range of each paragraph, that is, the range from a change of talk to the utterance immediately before the next change, or from a change of talk to the last utterance of the dialogue, and estimates the topic of each paragraph.
As shown in FIG. 3, the estimation device 30 according to the present embodiment includes an input unit 31, a determination unit 32, a paragraph estimation unit 33, a topic estimation unit 34, and an output unit 35.
The input unit 31 receives series data containing a plurality of topics. The series data input to the input unit 31 is the data to be processed, for which the paragraph ranges and the topics of the paragraphs are to be estimated. The series data is, for example, text data obtained by speech recognition of the time-series utterances of an operator and a customer. When the series data is input online, the input unit 31 may sequentially receive the text data obtained by speech recognition of each utterance during the dialogue. When the series data is input offline, the input unit 31 may receive the text data of the utterances sorted by the start time or end time of each utterance in the dialogue. The input unit 31 outputs the input series data to the determination unit 32.
Using binary classification model 1 (the first model), the determination unit 32 determines whether each utterance constituting the series data output from the input unit 31 is a talk-switching utterance, and outputs the result of the determination to the paragraph estimation unit 33. As described above, binary classification model 1 is a model learned in advance on the basis of teacher data (first teacher data) in which a binary label indicating whether an utterance is a change of talk is assigned to each utterance, or division unit, constituting series data of a dialogue containing a plurality of topics.
Based on the result of the determination by the determination unit 32, the paragraph estimation unit 33 estimates, in the series data, the range of each paragraph from a change of talk to the utterance immediately before the next change, or from a change of talk to the last utterance of the dialogue. Specifically, the paragraph estimation unit 33 estimates as one paragraph the range from an utterance determined by the determination unit 32 to be a talk-switching utterance up to the utterance immediately before the next utterance determined to be a talk-switching utterance. As described above, in the teacher data used to learn binary classification model 1, even an utterance within a range in which utterances related to the same topic continue may carry a label indicating a change of talk. Therefore, the paragraph estimation unit 33 may divide a range in which utterances related to the same topic continue into a plurality of paragraphs.
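The following is a minimal sketch of the grouping performed by the paragraph estimation unit 33, assuming a simple list-based data layout (not the patent's code): a paragraph runs from each detected talk switch up to the utterance just before the next switch, or to the end of the dialogue.

```python
from typing import List

def split_into_paragraphs(utterances: List[str],
                          is_switch: List[bool]) -> List[List[str]]:
    """is_switch[i] is True when utterance i was judged to be a talk-switching utterance."""
    paragraphs: List[List[str]] = []
    current: List[str] = []
    for utt, switch in zip(utterances, is_switch):
        if switch and current:          # a new paragraph starts at each switch
            paragraphs.append(current)
            current = []
        current.append(utt)
    if current:                         # paragraph running to the end of the dialogue
        paragraphs.append(current)
    return paragraphs

# Example: switches at utterances 0, 2 and 4 yield three paragraphs.
paras = split_into_paragraphs(["u0", "u1", "u2", "u3", "u4"],
                              [True, False, True, False, True])
# paras == [["u0", "u1"], ["u2", "u3"], ["u4"]]
```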
Using multi-value classification model 2 (the second model), the topic estimation unit 34 estimates the topic of each paragraph whose range has been estimated by the paragraph estimation unit 33, or of the utterances included in the paragraph. As described above, multi-value classification model 2 is a model learned in advance on the basis of teacher data in which multi-valued labels indicating the topics to which utterances relate are assigned to the utterances constituting series data or to their division units. The teacher data used to learn multi-value classification model 2 is generated from series data in which talk-switching utterances, or their division units, carry binary labels indicating a change of talk and in which the ranges over which a topic continues, and the topic of each range, have been identified. Specifically, the teacher data is generated by assigning, to each utterance or division unit in that series data carrying a binary label indicating a change of talk, a multi-valued label indicating the topic of the range containing that utterance.
The output unit 35 outputs, for each paragraph whose range has been estimated in the series data, the utterances constituting the paragraph. The output unit 35 may also output the multi-valued label indicating the topic of the paragraph, the start time and end time of the paragraph, and the like.
As described above, in the present embodiment, the determination unit 32 uses binary classification model 1, learned in advance on the basis of teacher data in which binary labels indicating whether an utterance is a change of talk are assigned to the utterances constituting series data of a dialogue containing a plurality of topics or to their division units, to determine whether each utterance constituting the series data to be processed is a talk-switching utterance. The paragraph estimation unit 33 then estimates the ranges of the paragraphs in the series data to be processed based on the result of the determination by the determination unit 32. The topic estimation unit 34 uses multi-value classification model 2 to estimate the topic of each paragraph whose range has been estimated by the paragraph estimation unit 33, or of the utterances included in the paragraph. The output unit 35 outputs the utterances of each paragraph whose range has been estimated, the multi-valued label indicating the topic of the paragraph, the start time and end time of the paragraph, and the like.
In the present embodiment, by learning teacher data in which binary labels indicating whether an utterance is a change of talk are assigned to utterances or their division units, the learning device 10 can generate binary classification model 1, which determines whether an utterance constituting series data is a talk-switching utterance. By learning teacher data in which multi-valued labels indicating the related topics are assigned to the utterances constituting series data or their division units, the learning device 20 can learn multi-value classification model 2, which determines the topic of a paragraph or of the utterances included in a paragraph. The estimation device 30 can estimate the ranges of the paragraphs in the series data based on the determinations made by binary classification model 1, and can use multi-value classification model 2 to estimate the topic of each paragraph whose range has been estimated or of the utterances constituting the paragraph. Therefore, according to the estimation device 30 of the present embodiment, the ranges of paragraphs, each running from a change of talk to the utterance immediately before the next change or from a change of talk to the last utterance of the dialogue, can be estimated from series data of a dialogue containing a plurality of topics. Furthermore, according to the estimation device 30 of the present embodiment, by estimating the paragraph ranges in the series data, the topic can be estimated using only the utterances included in each paragraph, which improves the accuracy of topic estimation.
Although FIG. 3 has been described using an example in which the estimation device 30 estimates topics using multi-value classification model 2, the present disclosure is not limited to this. As described above, learning multi-value classification model 2 requires teacher data in which the ranges of the series data over which a single topic continues, and the topic of each range, have been identified manually. When only a small number of topics are targeted, preparing such teacher data is relatively easy. On the other hand, when a large number of topics are targeted, it may be difficult to prepare teacher data that identifies the ranges over which a single topic continues and the topics of those ranges. In the present disclosure, topics can also be estimated in such cases without using multi-value classification model 2.
FIG. 4 is a diagram showing a configuration example of an estimation device 30a according to the present embodiment that estimates topics without using multi-value classification model 2. In FIG. 4, the same components as in FIG. 3 are given the same reference numerals, and their description is omitted.
As shown in FIG. 4, the estimation device 30a includes an input unit 31, a determination unit 32, a paragraph estimation unit 33, a keyword extraction unit 36, a topic estimation unit 34a, and an output unit 35. The estimation device 30a shown in FIG. 4 differs from the estimation device 30 shown in FIG. 3 in that the keyword extraction unit 36 is added and the topic estimation unit 34 is replaced with the topic estimation unit 34a.
The keyword extraction unit 36 extracts at least one keyword from the utterances included in a paragraph whose range has been estimated by the paragraph estimation unit 33. Any keyword extraction method can be used; for example, an existing method such as tf-idf (Term Frequency - Inverse Document Frequency) can be used. The number of keywords extracted by the keyword extraction unit 36 may be limited to a predetermined number in advance, or may be specified by the user.
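The following is a minimal sketch of one possible realization of the keyword extraction in unit 36 (not the patent's implementation): tf-idf over the paragraphs of a dialogue, taking the top-k terms of one paragraph as its keywords. The default whitespace tokenizer and the top_k value are assumptions; for Japanese text, a morphological analyzer would normally supply the tokens.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

def extract_keywords(paragraph_texts, target_index, top_k=3):
    """paragraph_texts: one string per paragraph (its utterances joined by spaces)."""
    vectorizer = TfidfVectorizer()
    tfidf = vectorizer.fit_transform(paragraph_texts)   # (n_paragraphs, n_terms)
    terms = vectorizer.get_feature_names_out()
    row = tfidf[target_index].toarray().ravel()
    top = row.argsort()[::-1][:top_k]                   # highest tf-idf first
    return [terms[i] for i in top if row[i] > 0]

keywords = extract_keywords(
    ["the router keeps rebooting after the firmware update",
     "i would like to change my billing address",
     "the invoice amount for last month looks wrong"],
    target_index=2)
```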
Based on the keywords extracted by the keyword extraction unit 36 from the utterances included in a paragraph, the topic estimation unit 34a estimates the topic of that paragraph or of the utterances included in it. The topic estimation unit 34a may, for example, take an extracted keyword itself as the topic of the paragraph or of the utterances included in the paragraph. Alternatively, the topic estimation unit 34a may, for example, estimate, from among a plurality of predefined topics, the topic most similar to the extracted keywords as the topic of the paragraph or of the utterances included in the paragraph.
In this way, according to the estimation device 30a shown in FIG. 4, the topic of a paragraph or of the utterances included in a paragraph can be estimated without using multi-value classification model 2. Therefore, even when it is difficult to prepare a large amount of teacher data in which topic ranges and the topics of those ranges are identified, the topics in series data can be estimated.
FIG. 5 is a diagram showing a configuration example of an estimation device 30b according to the present embodiment. Like the estimation device 30a shown in FIG. 4, the estimation device 30b shown in FIG. 5 estimates topics without using multi-value classification model 2. In FIG. 5, the same components as in FIG. 4 are given the same reference numerals, and their description is omitted.
As shown in FIG. 5, the estimation device 30b includes an input unit 31, a determination unit 32, a paragraph estimation unit 33, a clustering unit 37, a keyword extraction unit 36b, a topic estimation unit 34b, and an output unit 35. The estimation device 30b shown in FIG. 5 differs from the estimation device 30a shown in FIG. 4 in that the clustering unit 37 is added, the keyword extraction unit 36 is replaced with the keyword extraction unit 36b, and the topic estimation unit 34a is replaced with the topic estimation unit 34b.
At least one piece of series data is input to the estimation device 30b shown in FIG. 5. The clustering unit 37 clusters the plurality of paragraphs whose ranges have been estimated by the paragraph estimation unit 33 for the one or more pieces of input series data, grouping similar paragraphs together. Any existing clustering method can be used. The clustering unit 37 determines a representative paragraph within each cluster of similar paragraphs. For example, the clustering unit 37 may determine the paragraph at the center of a cluster to be the representative paragraph, or it may determine an arbitrary paragraph in the cluster to be the representative paragraph.
The keyword extraction unit 36b extracts keywords from the utterances included in the representative paragraph of each cluster, as determined by the clustering unit 37.
Based on the keywords extracted by the keyword extraction unit 36b from the utterances included in the representative paragraph of a cluster, the topic estimation unit 34b estimates the topic of the paragraphs constituting that cluster. Specifically, the topic estimation unit 34b takes the topic estimated from the keywords extracted from the utterances of the representative paragraph of a cluster as the topic of all paragraphs constituting that cluster.
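The following is a minimal sketch, under illustrative assumptions only, of the processing of units 37, 36b, and 34b: cluster tf-idf vectors of paragraphs with k-means, take the paragraph closest to each cluster center as the representative, use its top tf-idf terms as the cluster's topic, and propagate that topic to every paragraph in the cluster. The number of clusters and the choice of "joined keywords as the topic label" are assumptions, not the patent's prescription.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def cluster_and_label(paragraph_texts, n_clusters=3, top_k=3):
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(paragraph_texts)
    terms = vectorizer.get_feature_names_out()

    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    dists = km.transform(X)              # distance of each paragraph to each center

    topics = [None] * len(paragraph_texts)
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        rep = members[np.argmin(dists[members, c])]           # representative paragraph
        row = X[rep].toarray().ravel()
        keywords = [terms[i] for i in row.argsort()[::-1][:top_k] if row[i] > 0]
        topic = " / ".join(keywords)                           # topic label for the cluster
        for i in members:                                      # propagate to the whole cluster
            topics[i] = topic
    return topics
```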
FIGS. 3 to 5 have been described using the example of processing the results of speech recognition of dialogues between operators and customers at a contact center, but the present disclosure is not limited to this. For example, in the estimation devices 30, 30a, and 30b, a morphological analysis unit that performs morphological analysis on text chat may be provided after the input unit 31.
FIGS. 3 to 5 have also been described using an example in which series data in which a plurality of utterances are arranged in chronological order is input, but the present disclosure is not limited to this. In order to input the utterances constituting the series data one at a time, a functional unit that extracts utterances one at a time from the series data may be provided before the input unit 31.
FIG. 6 is a flowchart for explaining the complementation of multi-valued labels in the learning device 20 shown in FIG. 2.
The multi-value label complementing unit 22 reads, one at a time, the utterances of the series data input to the input unit 21, which has been annotated with multi-valued labels indicating topics and binary labels indicating changes of talk (step S11). A multi-valued label is assigned only to the first utterance of the range corresponding to its topic and not to the other utterances, and a binary label indicating a change of talk is assigned only to the utterances that are changes of talk and not to the other utterances.
The multi-value label complementing unit 22 determines whether a multi-valued label indicating a topic is assigned to the read utterance (step S12).
When it determines that a multi-valued label is assigned (step S12: Yes), the multi-value label complementing unit 22 stores that multi-valued label in a multi-value label temporary storage device (not shown), separately from the utterance, so that the multi-valued label of the read utterance can be identified. If a multi-valued label is already stored in the multi-value label temporary storage device, the multi-value label complementing unit 22 updates the stored multi-valued label to the one assigned to the read utterance and stores it in the multi-value label temporary storage device (step S13).
When it determines that no multi-valued label is assigned (step S12: No), or after it has updated and stored the multi-valued label assigned to the read utterance, the multi-value label complementing unit 22 determines whether a binary label indicating a change of talk is assigned to the read utterance (step S14).
When it determines that a binary label indicating a change of talk is assigned (step S14: Yes), the multi-value label complementing unit 22 assigns the multi-valued label stored in the multi-value label temporary storage device to the read utterance (step S15). In this way, when the read utterance carries a binary label indicating a change of talk, the multi-value label complementing unit 22 assigns to it the multi-valued label indicating the topic of the range of the series data that contains the utterance.
When it determines that no binary label indicating a change of talk is assigned (step S14: No), or after it has assigned a multi-valued label to the read utterance, the multi-value label complementing unit 22 determines whether the read utterance is the last utterance of the dialogue (step S16).
When it determines that the read utterance is the last utterance of the dialogue (step S16: Yes), the multi-value label complementing unit 22 ends the processing.
When it determines that the read utterance is not the last utterance of the dialogue (step S16: No), the multi-value label complementing unit 22 returns to the processing of step S11 and reads the next utterance.
FIG. 6 has been described using an example in which a multi-valued label is assigned only to the first utterance of the range corresponding to its topic and not to the other utterances; however, all of the utterances in the range corresponding to a topic may instead be assigned the multi-valued label of that topic in advance. In that case, deleting the multi-valued labels from the utterances that do not carry a binary label indicating a change of talk leaves multi-valued labels indicating topics only on the utterances that carry a binary label indicating a change of talk.
In this way, any method may be used as long as a multi-valued label indicating the topic is assigned to each talk-switching utterance.
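The following is a minimal sketch of the multi-value label complementation of FIG. 6, under an assumed list-based data layout (not the patent's code): walk the dialogue once, keep the most recently seen topic label in a temporary variable (standing in for the temporary storage device), and copy it onto every utterance flagged as a change of talk.

```python
from typing import List, Optional

def complement_topic_labels(topic_labels: List[Optional[str]],
                            is_switch: List[bool]) -> List[Optional[str]]:
    """topic_labels[i] is the manually assigned topic of utterance i (None if absent);
    is_switch[i] is True when utterance i carries the talk-switch binary label."""
    current_topic: Optional[str] = None
    complemented: List[Optional[str]] = []
    for topic, switch in zip(topic_labels, is_switch):
        if topic is not None:          # step S13: remember / update the current topic
            current_topic = topic
        # step S15: a switch utterance inherits the topic of the range containing it
        complemented.append(current_topic if switch else None)
    return complemented

# Example: topic "A" is marked only on the first utterance of its range, yet both
# switch utterances inside that range receive the label "A".
labels = complement_topic_labels(["A", None, None, "B", None],
                                 [True, False, True, True, False])
# labels == ["A", None, "A", "B", None]
```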
Next, the operation of the estimation device 30 shown in FIG. 3 will be described. FIG. 7 is a flowchart showing an example of the operation of the estimation device 30 and is a diagram for explaining the estimation method performed by the estimation device 30.
The determination unit 32 reads utterances one at a time from the series data to be processed that has been input to the input unit 31 (step S21). Using binary classification model 1, the determination unit 32 determines whether the read utterance is a talk-switching utterance (step S22).
The paragraph estimation unit 33 determines whether the read utterance has been determined by the determination unit 32 to be a talk-switching utterance, or whether the read utterance is the last utterance of the dialogue (step S23).
When it determines that the read utterance is neither a talk-switching utterance nor the last utterance of the dialogue (step S23: No), the paragraph estimation unit 33 accumulates the read utterance as an utterance constituting the current paragraph (step S24). Once the read utterance has been accumulated, the processing is repeated from step S21.
When the read utterance has been determined to be a talk-switching utterance, or to be the last utterance of the dialogue (step S23: Yes), the paragraph estimation unit 33 determines whether there are accumulated utterances (step S25).
When it determines that there are accumulated utterances (step S25: Yes), the paragraph estimation unit 33 estimates that the range of the accumulated utterances is a paragraph and outputs the accumulated utterances to the topic estimation unit 34 as the utterances constituting the paragraph. Using multi-value classification model 2, the topic estimation unit 34 estimates the topic of the paragraph whose range has been estimated by the paragraph estimation unit 33 (step S26).
Although FIG. 7 is described using an example in which the topic is estimated for each paragraph using multi-value classification model 2, the present disclosure is not limited to this. The topic estimation unit 34 may estimate the topic in units of at least one utterance included in the paragraph. In that case, the topic estimation unit 34 may estimate the topic using only the first utterance of the paragraph, or using a predetermined number of utterances starting from the first utterance of the paragraph. When the topic is estimated in units of one or more utterances, multi-value classification model 2 is learned on the basis of teacher data in which a multi-valued label is assigned to each unit for which the topic is estimated.
The topic estimation unit 34 assigns a multi-valued label indicating the estimated topic to the paragraph (step S27). The paragraph estimation unit 33 resets the accumulation of utterances (step S28) and determines whether the read utterance is the last utterance of the dialogue (step S29).
When it determines that the read utterance is not the last utterance of the dialogue (step S29: No), the paragraph estimation unit 33 returns to the processing of step S24 and accumulates the read utterance. In this way, the read utterance is accumulated as the first utterance of a new paragraph.
When it determines that the read utterance is the last utterance of the dialogue (step S29: Yes), the paragraph estimation unit 33 ends the processing.
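The following is a minimal sketch, illustrative only and not the patent's code, of the loop of FIG. 7: utterances are read one at a time, the current paragraph is closed at each detected talk switch, and the closed paragraph is labeled with a topic. The callables is_switch_utterance and classify_topic are hypothetical stand-ins for binary classification model 1 and multi-value classification model 2.

```python
from typing import Callable, List, Tuple

def estimate_paragraphs_and_topics(
        utterances: List[str],
        is_switch_utterance: Callable[[str], bool],   # stands in for step S22
        classify_topic: Callable[[List[str]], str]    # stands in for step S26
) -> List[Tuple[List[str], str]]:
    labeled_paragraphs: List[Tuple[List[str], str]] = []
    buffer: List[str] = []                            # accumulated utterances (step S24)
    for utt in utterances:                            # step S21
        if is_switch_utterance(utt) and buffer:       # steps S23, S25
            topic = classify_topic(buffer)            # steps S26, S27
            labeled_paragraphs.append((buffer, topic))
            buffer = []                               # step S28
        buffer.append(utt)
    if buffer:                                        # paragraph ending with the dialogue
        labeled_paragraphs.append((buffer, classify_topic(buffer)))
    return labeled_paragraphs
```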
As described above, the estimation method performed by the estimation device 30 includes a determination step (step S22) and a paragraph estimation step (steps S23 to S25). In the determination step, binary classification model 1 (the first model), learned in advance on the basis of teacher data (first teacher data) in which binary labels (first labels) indicating whether an utterance is a change of talk are assigned to the utterances constituting series data of a dialogue containing a plurality of topics or to their division units, is used to determine whether each utterance constituting the series data to be processed is a talk-switching utterance. In the paragraph estimation step, based on the result of the determination, the ranges of the paragraphs in the series data to be processed are estimated, each paragraph running from a change of talk to the utterance immediately before the next change or from a change of talk to the last utterance of the dialogue.
By learning teacher data in which binary labels indicating whether an utterance is a change of talk are assigned to utterances or their division units, binary classification model 1, which determines whether an utterance constituting series data is a talk-switching utterance, can be generated. The ranges of the paragraphs in the series data to be processed can then be estimated on the basis of the determinations made by binary classification model 1. Therefore, the ranges of paragraphs in series data of a dialogue containing a plurality of topics can be estimated.
The estimation method according to the present embodiment may further include a topic estimation step (step S26). In the topic estimation step, multi-value classification model 2 (the second model), learned in advance on the basis of teacher data in which multi-valued labels (second labels) indicating the topics to which utterances relate are assigned to the utterances constituting series data or to their division units, is used to estimate the topic of each paragraph or of the utterances included in the paragraph. By estimating the paragraph ranges, the topic can be estimated using only the utterances included in each paragraph, which improves the accuracy of topic estimation.
Next, the operation of the estimation device 30a shown in FIG. 4 will be described. FIG. 8 is a flowchart showing an example of the operation of the estimation device 30a shown in FIG. 4 and is a diagram for explaining the estimation method performed by the estimation device 30a. In FIG. 8, the same processing as in FIG. 7 is given the same reference numerals, and its description is omitted.
When it determines that there are accumulated utterances (step S25: Yes), the paragraph estimation unit 33 estimates that the range of the accumulated utterances is a paragraph and outputs the accumulated utterances to the keyword extraction unit 36. The keyword extraction unit 36 extracts keywords from the utterances included in the paragraph whose range has been estimated by the paragraph estimation unit 33 (step S31). Based on the keywords extracted by the keyword extraction unit 36 from the utterances included in the paragraph, the topic estimation unit 34a estimates the topic of that paragraph or of the utterances included in it (step S32).
As described above, the estimation method performed by the estimation device 30a includes a keyword extraction step (step S31) and a topic estimation step (step S32). In the keyword extraction step, keywords are extracted from the utterances included in a paragraph whose range has been estimated. In the topic estimation step, the topic of the paragraph or of the utterances included in the paragraph is estimated based on the keywords extracted from the utterances included in the paragraph.
Next, the operation of the estimation device 30b shown in FIG. 5 will be described. FIG. 9 is a flowchart showing an example of the paragraph range estimation performed by the estimation device 30b shown in FIG. 5 and is a diagram for explaining the estimation method performed by the estimation device 30b. In FIG. 9, the same processing as in FIG. 7 is given the same reference numerals, and its description is omitted.
In the estimation device 30b, when it determines that there are accumulated utterances (step S25: Yes), the paragraph estimation unit 33 estimates that the range of the accumulated utterances is a paragraph. The paragraph estimation unit 33 then resets the accumulation of utterances (step S28).
FIG. 10 is a flowchart showing an example of the topic estimation performed by the estimation device 30b shown in FIG. 5 and is a diagram for explaining the estimation method performed by the estimation device 30b.
The clustering unit 37 reads the paragraphs whose ranges have been estimated by the paragraph estimation unit 33 (step S41). The clustering unit 37 reads a plurality of paragraphs contained in at least one piece of series data; that is, the clustering unit 37 repeats the processing of step S41 as many times as necessary.
The clustering unit 37 clusters the plurality of read paragraphs, grouping similar paragraphs together (step S42).
Next, the clustering unit 37 determines whether there are any unprocessed clusters (step S43). An unprocessed cluster is a cluster whose paragraphs have not yet been assigned multi-valued labels.
When it determines that an unprocessed cluster exists (step S43: No), the clustering unit 37 selects one of the unprocessed clusters as the cluster to be processed and determines a representative paragraph from among the paragraphs included in the cluster to be processed (step S44). For example, the clustering unit 37 determines the paragraph at the center of the cluster to be the representative paragraph.
The keyword extraction unit 36b extracts keywords from the utterances included in the representative paragraph of the cluster determined by the clustering unit 37 (step S45).
The topic estimation unit 34b estimates the topic of the representative paragraph of the cluster based on the keywords extracted by the keyword extraction unit 36b (step S46). Next, the topic estimation unit 34b determines whether there are any unprocessed paragraphs (step S47). An unprocessed paragraph is a paragraph in the cluster to be processed that has not yet been assigned a multi-valued label.
When it determines that there is an unprocessed paragraph (step S47: No), the topic estimation unit 34b assigns to the unprocessed paragraph in the cluster a multi-valued label indicating the topic estimated from the keywords extracted from the representative paragraph of that cluster (step S48). The topic estimation unit 34b then returns to the processing of step S47.
When the topic estimation unit 34b determines that there are no unprocessed paragraphs (step S47: Yes), the processing is repeated from step S43.
As described above, the estimation method performed by the estimation device 30b further includes a clustering step (step S42). In the clustering step, a plurality of paragraphs whose ranges have been estimated from one or more pieces of series data are clustered, grouping similar paragraphs together. In the keyword extraction step, keywords are extracted from the utterances included in the representative paragraph among the paragraphs included in a cluster of similar paragraphs. In the topic estimation step, the topic of the paragraphs constituting the cluster containing the representative paragraph is estimated based on the keywords extracted from the utterances included in the representative paragraph.
Next, the learning of the models (binary classification model 1 and multi-value classification model 2) will be described using the specific example shown in FIG. 11. In the following, the series data is assumed to contain five topics: "topic A", "topic B", "topic C", "topic D", and "topic E".
As shown in FIG. 11, in the series data used as teacher data, the ranges over which a single topic continues and the topic of each range are identified manually, and a multi-valued label indicating the topic of the range is assigned to each range over which a single topic continues. In addition, binary labels indicating whether an utterance is a change of talk are assigned manually to the utterances constituting the series data. In FIG. 11, for simplicity, only the talk-switching utterances are marked as changes of talk. As described above, even within a range in which utterances related to a single topic continue, a binary flag indicating a change of talk is assigned to each talk-switching utterance. Therefore, in FIG. 11, for example, an utterance in the middle of the range in which utterances related to topic A continue may also carry a binary label indicating a change of talk.
The above series data and binary labels are input to the learning device 10, and binary classification model 1 is learned using an LSTM or the like based on the input series data and binary labels.
The above series data, binary labels, and multi-valued labels are also input to the learning device 20. In the learning device 20, the multi-valued labels are complemented; that is, as shown in FIG. 11, each utterance carrying a label indicating a change of talk is assigned a multi-valued label indicating the topic of the range of the series data that contains the utterance. In this way, teacher data is created in which each utterance constituting the series data is assigned a multi-valued label indicating the topic to which the utterance relates. As described above, the multi-valued labels indicating the related topics may instead be assigned to the division units of the utterances constituting the series data.
Multi-value classification model 2 is learned using an LSTM or the like based on the created teacher data. In learning multi-value classification model 2, only the utterances to which multi-valued labels have been assigned may be learned, or the utterances of the entire paragraphs containing those utterances may be learned.
FIG. 12 is a diagram showing an example of topic estimation by the estimation device 30 shown in FIG. 3. In FIG. 12, multi-value classification model 2 is assumed to have been learned in utterance units.
When the series data of one dialogue is input to the estimation device 30, binary classification model 1 is used to determine, as shown in FIG. 12, whether each utterance constituting the series data is a talk-switching utterance. The range from a talk-switching utterance to the utterance immediately before the next talk-switching utterance, or from a talk-switching utterance to the last utterance of the dialogue, is then estimated to be one paragraph.
Next, as shown in FIG. 12, for each utterance determined to be a talk-switching utterance among the utterances included in a paragraph whose range has been estimated, the topic of that utterance is estimated by multi-value classification model 2. Multi-value classification model 2 may also be learned in paragraph units rather than utterance units; in that case, as shown in FIG. 13, the topic is estimated in paragraph units by multi-value classification model 2.
FIG. 14 is a diagram showing an example of topic estimation by the estimation device 30a shown in FIG. 4.
When the series data of one dialogue is input to the estimation device 30a, binary classification model 1 is used to determine, as shown in FIG. 14, whether each utterance constituting the series data is a talk-switching utterance. The range from a talk-switching utterance to the utterance immediately before the next talk-switching utterance is then estimated to be one paragraph.
Next, keywords are extracted from the utterances included in each paragraph whose range has been estimated, the topic of the paragraph is estimated based on the extracted keywords, and a multi-valued label indicating the estimated topic is assigned. In this way, the topic of a paragraph can be estimated without using multi-value classification model 2. Therefore, even when it is difficult to prepare the teacher data required to learn multi-value classification model 2, the topics of the paragraphs contained in the series data can be estimated. Although FIG. 14 shows an example in which different multi-valued labels ("topic 1" to "topic 10") are assigned to the individual paragraphs, this does not necessarily mean that they are all different topics.
FIG. 15 is a diagram showing an example of topic estimation by the estimation device 30b shown in FIG. 5.
When the series data of one or more dialogues is input to the estimation device 30b, binary classification model 1 is used to determine, as shown in FIG. 15, whether each utterance constituting the series data is a talk-switching utterance. The range from a talk-switching utterance to the utterance immediately before the next talk-switching utterance is then estimated to be one paragraph.
Next, as shown in FIG. 15, the plurality of paragraphs whose ranges have been estimated are clustered, grouping similar paragraphs together. A representative paragraph is determined for each cluster of similar paragraphs, and keywords are extracted from the utterances included in the representative paragraph. In FIG. 15, the paragraphs drawn with thick lines are the representative paragraphs.
Next, based on the keywords extracted from the utterances included in the representative paragraph of a cluster, the topic of the representative paragraph is estimated, and a multi-valued label indicating the estimated topic is assigned to the representative paragraph. Furthermore, as shown in FIG. 15, the other paragraphs constituting the cluster are assigned the same multi-valued label as the representative paragraph of the cluster.
To demonstrate the effectiveness of the estimation method according to the present embodiment (hereinafter sometimes referred to as "the present method"), an experimental comparison with a conventional method was performed. In the experiment, 349 calls were used for model learning and 50 calls for validation. As multi-valued labels indicating topics, eight kinds of labels were prepared, indicating topic A to topic H and a fixed topic S covering the portion from the first utterance of a call to the first change of talk. In the conventional method, a binary classification model is learned from teacher data in which binary labels indicating whether an utterance is a change of talk are assigned only to the utterances at which the multi-valued label changes, and a multi-value classification model is learned using only the utterances at which the multi-valued label changes as teacher data.
First, the accuracy of paragraph range estimation (the accuracy of dividing series data into paragraphs) based on the binary classification models' determinations of whether an utterance is a change of talk was compared. The comparison results are shown in Table 1.
[Table 1]
 As described above, the present method estimates paragraph ranges while also treating utterances that transition from one topic to the same topic as switch-of-story utterances. Therefore, as shown in Table 1, the precision of the present method is lower than that of the conventional method. However, the present method can detect paragraphs and switch-of-story utterances that the conventional method failed to detect, so the recall of paragraph segmentation is higher.
 Next, we compared the accuracy of topic estimation by the multi-value classification model for the utterances determined by the binary classification model to be switches of the story. As described above, in the conventional method, the multi-value classification model was generated by training on teacher data in which a multi-value label indicating the topic was manually assigned only to utterances at which the multi-value label changes. In the present method, on the other hand, the multi-value classification model 2 was generated by training on teacher data in which the multi-value labels were complemented for the utterances manually labeled as switches of the story. Using the multi-value classification model trained by the conventional method and the multi-value classification model 2 trained by the present method, the topic was estimated for each utterance that the binary classification model trained by the respective method determined to be a switch-of-story utterance, and the result was compared with the correct topic manually assigned to that utterance. The comparison results (precision) are shown in Table 2.
[Table 2]
 As shown in Table 2, the present method was found to estimate, with high accuracy, the topic of utterances determined to be switch-of-story utterances, including utterances that transition from one topic to the same topic. Topic S was not evaluated, because its switch-of-story utterance is always the first utterance of a call.
 Finally, the classification results (F-measure) for the topics of all utterances were evaluated on the 100 calls targeted for evaluation. This evaluation comprehensively assesses both the determination of switch-of-story utterances by the binary classification model and the topic estimation by the multi-value classification model. In the present method, utterances that transition from one topic to the same topic are also determined to be switch-of-story utterances, and the multi-value classification model 2 classified many of these same-topic transition utterances into the correct topic. Therefore, as shown in Table 3, the present method obtained a higher overall evaluation result than the conventional method.
[Table 3]
 As described above, in the present embodiment, the estimation device 30 includes the determination unit 32 and the paragraph estimation unit 33. The determination unit 32 uses the binary classification model 1 (first model), trained in advance on teacher data (first teacher data) in which a binary label indicating whether or not the story switches is assigned to each utterance, or division unit thereof, constituting series data of a dialogue containing a plurality of topics, to determine whether or not each utterance constituting the series data to be processed is a switch-of-story utterance. Based on the determination result of the determination unit 32, the paragraph estimation unit 33 estimates, in the series data to be processed, the range of each paragraph, that is, from a switch of the story to the utterance immediately before the next switch, or from a switch of the story to the utterance at the end of the dialogue.
 By training on teacher data in which a binary label indicating whether or not the story switches is assigned to each utterance or division unit thereof, the binary classification model 1, which determines whether or not an utterance constituting series data is a switch-of-story utterance, can be generated. Then, based on the determination results of the binary classification model 1, the ranges of the paragraphs in the series data can be estimated. Furthermore, by estimating the paragraph ranges in the series data, the range over which a topic is estimated can be limited to the utterances contained in a paragraph, so the accuracy of topic estimation for each paragraph can be improved.
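 The step of turning per-utterance switch decisions into paragraph ranges can be sketched as follows in Python. The binary classifier is abstracted as a callable that returns True for a switch-of-story utterance; the function name and the (start, end) index representation of a paragraph are illustrative assumptions.

    def estimate_paragraph_ranges(utterances, is_switch):
        """utterances: list of utterance texts in time order.
        is_switch: callable standing in for binary classification model 1;
        returns True if the utterance is a switch-of-story utterance.
        Returns a list of (start_index, end_index) pairs, end inclusive,
        where each pair is one paragraph: from a switch utterance up to the
        utterance just before the next switch (or the end of the dialogue)."""
        switch_indices = [i for i, u in enumerate(utterances) if is_switch(u)]
        if not switch_indices or switch_indices[0] != 0:
            switch_indices = [0] + switch_indices  # the dialogue start opens the first paragraph
        ranges = []
        for k, start in enumerate(switch_indices):
            end = (switch_indices[k + 1] - 1) if k + 1 < len(switch_indices) else len(utterances) - 1
            ranges.append((start, end))
        return ranges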
 (Second embodiment)
 In the first embodiment, the description used an example in which it is determined whether or not each utterance, or division unit thereof, constituting the series data is a switch of the story, and the paragraph ranges are estimated based on the determination results. However, as described above, it may instead be determined whether or not each utterance or division unit thereof constituting the series data is a switch of the topic, and the paragraph ranges may be estimated based on those determination results.
 As described above, a dialogue between an operator and a customer at a contact center can be regarded as series data along the time axis. A method called Text Tiling is known as a method of dividing series data into sections of objectively classified topics (see, for example, Reference 1). In this method, the text is divided at local minima of cohesion, based on the lexical cohesion between words in the vicinity of each point in the text. A method called TopicTiling, which divides text using Latent Dirichlet Allocation (LDA), a representative topic model, has also been proposed (see Reference 2). Further, a method has been proposed that classifies each item of time-series data into the label to which it belongs, based on a model trained from teacher data annotated with predefined classification labels (see Reference 3).
 [Reference 1]
 Tsutomu Hirao, Kei Kitauchi, Tsuyoshi Kitani, "Text Segmentation Based on Lexical Cohesion and Word Importance", IPSJ Journal, 41 (SIG_3 (TOD_6)), pp. 24-36, 2000-05-15.
 [Reference 2]
 M. Riedl and C. Biemann, "TopicTiling: A Text Segmentation Algorithm based on LDA", Proceedings of the 50th ACL 2012, 2012.
 [Reference 3]
 Yuta Tsuboi et al., "Natural Language Processing by Deep Learning", Kodansha, May 24, 2017, pp. 32-36.
 However, in dialogues concerning a specific service or product, such as dialogues at a contact center, the utterances are required to be classified into subjective topics seen from the contact center's point of view, so that various analyses can be performed later, for example whether the response followed a script prepared in advance. Subjective topics are, for example, topics classified from the viewpoint of the operator isolating the cause, on the customer side, of the customer being unable to use a particular service, or from the viewpoint of interviewing the customer about needs or requests in a sales call from the operator to the customer. In these dialogues, the same keywords, such as service names, product names, and related vocabulary, appear throughout the dialogue, so topics that should be distinguished subjectively by content but cannot be distinguished superficially or objectively account for the majority of the dialogue. Therefore, the methods described in References 1 and 2 cannot accurately divide and classify such dialogues by subjective topic.
 Furthermore, in contact center dialogues, some utterances are themselves short, and for some utterances the topic to which they belong cannot be determined uniquely from the utterance alone. Such utterances end up being labeled with a topic different from their true topic. A model trained on teacher data labeled with topics different from the true topics suffers a drop in classification accuracy. Therefore, with the method described in Reference 3, it is difficult to appropriately classify, by subjective topic, each of the utterances, including short ones, input in chronological order.
 In the following, the configuration and operation of an estimation device 30c according to the second embodiment of the present disclosure will be described. The estimation device 30c according to the present embodiment determines whether or not each utterance, or division unit thereof, constituting the series data is a switch of the topic, and estimates the paragraph ranges based on the determination results.
 FIG. 16 is a diagram showing a configuration example of the estimation device 30c according to the present embodiment.
 As shown in FIG. 16, the estimation device 30c according to the present embodiment includes an input unit 41, a determination unit 42, a topic estimation unit 43, a paragraph estimation unit 44, and an output unit 45.
 Series data of a dialogue containing a plurality of topics is input to the input unit 41. The series data input to the input unit 41 is the data to be processed, that is, the data for which the paragraph ranges and the topics of the paragraphs are to be estimated. The series data is, for example, text data obtained by speech recognition of the time-series utterances of an operator and a customer. When the series data is input online, the text data obtained by speech recognition of each utterance during the dialogue may be input to the input unit 41 sequentially. When the series data is input offline, the text data of the utterances may be input to the input unit 41 sorted by the start time or end time of each utterance in the dialogue. The input unit 41 outputs the input series data to the determination unit 42.
 The determination unit 42 uses the binary classification model 1a to determine whether or not each utterance constituting the series data output from the input unit 41 is a switch-of-topic utterance. Here, the binary classification model 1a is a model trained in advance to determine whether or not an utterance, or division unit thereof, constituting series data of a dialogue is a switch of the topic. The binary classification model 1a can be created, for example, by having the learning device 10 described with reference to FIG. 1 learn teacher data in which a binary label (switching label) indicating whether or not the topic switches is assigned to each utterance, or division unit thereof, constituting series data.
 From the determination results using the binary classification model 1a, the determination unit 42 decides whether or not each utterance, or division unit thereof, constituting the series data is to be processed by the topic estimation unit 43 described later. Specifically, the determination unit 42 decides that an utterance, or division unit thereof, determined to be a switch of the topic is to be processed by the topic estimation unit 43. The determination unit 42 outputs the decision results on whether or not each utterance is to be processed by the topic estimation unit 43 to the topic estimation unit 43 and the paragraph estimation unit 44.
 The topic estimation unit 43 uses the multi-value classification model 2a to assign, to each utterance decided by the determination unit 42 to be a processing target (a switch-of-topic utterance), or division unit thereof, a multi-value label indicating the topic of the range that includes that utterance. Here, the multi-value classification model 2a is a model that estimates, for an utterance or division unit thereof, the topic of the range that includes that utterance. The multi-value classification model 2a can be created, for example, by having the learning device 20 described with reference to FIG. 2 learn teacher data in which a multi-value label (topic label) indicating the topic to which the utterance relates is assigned to each utterance, or division unit thereof, constituting series data. In training the multi-value classification model 2a, the learning of topic transitions may be performed only on utterances that are switches of the topic and to which a multi-value label has been assigned. By excluding from the training target the utterances between one switch-of-topic utterance and the next, noise affecting the topic classification can be removed.
 The topic estimation unit 43 stores the topic estimation result (the multi-value label corresponding to the estimated topic) in a label information table. The label information table is an area for storing the topic estimation result for the data being processed, and may be a memory on a computer, a database, or a file.
 The paragraph estimation unit 44 estimates that the range from an utterance decided by the determination unit 42 to be a processing target (a switch-of-topic utterance) to the utterance immediately before the next utterance decided to be a processing target is one paragraph. The paragraph estimation unit 44 assigns the multi-value label stored in the label information table to the utterances included in the paragraph whose range has been estimated. Specifically, the paragraph estimation unit 44 assigns, to the utterances from a switch-of-topic utterance to the utterance immediately before the next switch-of-topic utterance, the multi-value label stored in the label information table that was assigned to that switch-of-topic utterance.
 The output unit 45 outputs, for each paragraph whose range has been estimated in the series data, the utterances constituting that paragraph. The output unit 45 may also output the multi-value label indicating the topic of the paragraph, the start time and end time of the paragraph, and the like.
 As in the first embodiment, the estimation device 30c may be provided with a morphological analysis unit, following the input unit 41, that performs morphological analysis on text chat. When the series data to be processed is input offline, the estimation device 30c may be configured to estimate the paragraph ranges using all of the results of the switch-of-topic determination and the topic estimation at once. In this case, the paragraph estimation unit 44 may assign the multi-value label estimated by the topic estimation unit 43 to the utterances in the range from a switch of the topic to the utterance immediately before the next switch of the topic, based on the determination results of whether each utterance is a switch of the topic and on the topic estimation results.
 FIG. 17 is a flowchart showing an example of the operation of the estimation device 30c according to the present embodiment.
 The determination unit 42 determines whether or not the dialogue in the series data to be processed, input to the input unit 41, has ended (step S51).
 When it is determined that the dialogue has ended (step S51: Yes), the estimation device 30c ends the processing.
 When it is determined that the dialogue has not ended (step S51: No), the determination unit 42 reads the utterance to be processed (step S52). The determination unit 42 uses the binary classification model 1a to determine whether or not the read utterance is a switch-of-topic utterance (step S53).
 When it is determined that the read utterance is not a switch-of-topic utterance (step S54: No), the processing of step S57, described later, is performed.
 When it is determined that the read utterance is a switch-of-topic utterance (step S54: Yes), the topic estimation unit 43 estimates the topic of the read utterance using the multi-value classification model 2a (step S55). The topic estimation unit 43 stores the estimated topic in the label information table and updates the label information table (step S56). That is, the label information table is updated each time the read utterance is a switch-of-topic utterance.
 The paragraph estimation unit 44 assigns the multi-value label stored in the label information table to the read utterance (step S57). As described above, the label information table is updated each time the read utterance is a switch-of-topic utterance. Therefore, the same multi-value label is assigned to the utterances constituting one paragraph, from a switch-of-topic utterance to the utterance immediately before the next switch-of-topic utterance.
 When the multi-value label has been assigned to the read utterance, the determination unit 42 sets the next utterance in the series data as the processing target (step S58) and returns to the processing of step S51.
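 A minimal sketch of this online loop (steps S51 to S58), assuming the two models are available as callables, is shown below in Python. The callable names is_topic_switch and estimate_topic, and the use of a single variable as the label information table, are illustrative assumptions.

    def label_utterances_online(utterance_stream, is_topic_switch, estimate_topic):
        """utterance_stream: iterable yielding utterances in time order (S51/S52/S58).
        is_topic_switch: stands in for binary classification model 1a (S53/S54).
        estimate_topic: stands in for multi-value classification model 2a (S55).
        Yields (utterance, topic_label) pairs."""
        label_info = None  # label information table holding the current paragraph's topic
        for utterance in utterance_stream:
            if is_topic_switch(utterance):              # S54: Yes
                label_info = estimate_topic(utterance)  # S55, S56: update the table
            yield utterance, label_info                 # S57: assign the stored label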
 FIG. 18 is a diagram showing an example of topic estimation by the estimation device 30c according to the present embodiment. In FIG. 18, it is assumed that the binary classification model 1a and the multi-value classification model 2a have been trained in units of utterances.
 When the series data of one dialogue is input to the estimation device 30c, the determination unit 42 uses the binary classification model 1a, as shown in FIG. 18, to determine whether or not each utterance constituting the series data is a switch-of-topic utterance. The topic estimation unit 43 uses the multi-value classification model 2a to estimate the topic of each utterance determined to be a switch of the topic, and stores the multi-value label indicating the estimated topic in the label information table. The paragraph estimation unit 44 estimates the range from a switch-of-topic utterance to the utterance immediately before the next switch-of-topic utterance as one paragraph. Then, the paragraph estimation unit 44 assigns, to all the utterances constituting that paragraph, the multi-value label stored in the label information table that indicates the topic of the first utterance of that paragraph.
 As described above, in the present embodiment, the estimation device 30c uses the binary classification model 1a to determine whether or not each utterance constituting the series data is a switch-of-topic utterance. The estimation device 30c also uses the multi-value classification model 2a to estimate the topic of each switch-of-topic utterance. Further, the estimation device 30c estimates the range from a switch-of-topic utterance to the utterance immediately before the next switch-of-topic utterance as a paragraph, and estimates the topic estimated for the switch-of-topic utterance to be the topic of the paragraph that includes that utterance.
 As a result, even in dialogues in which similar topics account for the majority or in which the order of topics is indefinite, switch-of-topic utterances can be detected and the multi-value labels to be assigned to those utterances can be estimated. Therefore, the utterances from a switch-of-topic utterance to the utterance immediately before the next switch-of-topic utterance can be estimated to be a paragraph consisting of one topic.
 (Third embodiment)
 In the first and second embodiments described above, the model for determining whether or not an utterance is a switch of the story (topic) and the model for estimating the topic were created in units of utterances or their division units. As described above, a division unit of an utterance is, for example, a word unit obtained by dividing the utterance into words. A division unit of an utterance is also, for example, when punctuation marks are attached to the utterance, a unit obtained by dividing the utterance at punctuation marks or at sentence-ending periods. Further, in the first and second embodiments described above, when estimating the topic of an utterance, the topic was estimated in units of the utterance or of a predetermined division unit. That is, in the first and second embodiments, the division unit of the utterance was fixed.
 However, in a dialogue between a customer and an agent at a contact center, the topic (scene) does not necessarily switch at a predetermined unit. For example, when a contact center handles a car accident, the response history may be recorded with the scene of confirming whether there was an injury separated from the scene of confirming the damage to the car. In the following, the dialogue between a customer and an agent shown in utterances 1 to 4 is used as an example of dividing a dialogue into a scene of confirming whether there was an injury and a scene of confirming the damage to the car.
 Agent: "I heard that you had an accident when putting your car into the garage. What was the situation?" (utterance 1)
 Customer: "When I was putting it into the garage, the rear bumper of the car hit a utility pole and got scratched." (utterance 2)
 Agent: "I see, so you scraped the rear bumper of the car against a utility pole when putting it into the garage. Were you yourself all right?" (utterance 3)
 Customer: "I was not injured." (utterance 4)
 In the above example, utterance 1 and utterance 2 are utterances in the scene of confirming the damage to the car. In the middle of utterance 3, the scene switches from confirming the damage to the car to confirming whether there was an injury, and the scene of confirming whether there was an injury continues into utterance 4. Specifically, the part of utterance 3 up to "I see, so you scraped the rear bumper of the car against a utility pole when putting it into the garage," belongs to the scene of confirming the damage to the car, and the part from "Were you yourself all right?" belongs to the scene of confirming whether there was an injury.
 In the first and second embodiments, the unit must be decided in advance and the learning data prepared accordingly. It is therefore difficult to create a model that handles cases in which the scene switches in the middle of an utterance, as in utterance 3 above. In the example of utterance 3, it is desirable to assign, to the unit "I see, so you scraped the rear bumper of the car against a utility pole when putting it into the garage," a label indicating the scene of confirming the damage to the car, and to the unit "Were you yourself all right?" a label indicating the scene of confirming whether there was an injury; however, it is difficult to determine such units in advance.
 For example, if division at punctuation marks is adopted, utterance 3 is divided into units such as "I see," "when putting it into the garage," "so you scraped the rear bumper of the car against a utility pole," "were you yourself" and "all right?". However, with units such as "I see," "were you yourself" and "all right?" alone, it is not possible to identify what kind of scene they belong to, and it is difficult to assign a label to them.
 When learning data is created by joining predetermined units, it is possible to create learning data by joining "I see," "when putting it into the garage," and "so you scraped the rear bumper of the car against a utility pole" into one unit, and joining "were you yourself" and "all right?" into another unit. However, it is difficult to create learning data by judging whether other joined units, such as "I see," "when putting it into the garage," or "I see, when putting it into the garage," should be treated as negative examples.
 Furthermore, when estimating the point at which the story (topic) switches in the middle of an utterance, it is difficult to determine the unit of the utterance before the estimation.
 In the present embodiment, the unit of learning is not fixed; positive examples, negative examples, and out-of-scope learning data are dynamically created in various units from the teacher data. That is, in the present embodiment, the learning data is created with a variable division unit of the utterance. In this way, even when the story (scene) switches in the middle of an utterance, learning data can be created for training a model capable of estimating the switching point with high accuracy. Furthermore, by using a model trained on learning data created without fixing the unit of learning, each scene within an utterance can be estimated even when the scene switches in the middle of the utterance.
 FIG. 19 is a diagram showing a configuration example of a learning data creation device 50 according to the present embodiment. The learning data creation device 50 according to the present embodiment dynamically creates positive examples, negative examples, and out-of-scope learning data in various units from the teacher data.
 As shown in FIG. 19, the learning data creation device 50 according to the present embodiment includes an input unit 51, a learning data creation unit 52, and an output unit 53.
 Series data of a dialogue is input to the input unit 51. The series data is, for example, voice data of a time-series dialogue between an operator and a customer, or text data obtained by speech recognition of the utterances included in that dialogue. The input unit 51 outputs the input series data to the learning data creation unit 52.
 The series data output from the input unit 51 and teacher data are input to the learning data creation unit 52. The teacher data is data in which, before the learning data is created, labels are assigned to the minimum ranges of utterances necessary for identifying a scene in the utterances constituting the series data. The labels in the teacher data are assigned manually. Based on the input series data and teacher data, the learning data creation unit 52 creates learning data used for training a model that estimates the topic (scene) of an utterance in arbitrary division units of the utterance.
 FIG. 20 is a diagram showing a configuration example of the learning data creation unit 52.
 As shown in FIG. 20, the learning data creation unit 52 includes a sentence output unit 521, an ID assignment unit 522, a combination generation unit 523, and an assignment unit 524.
 The sentence output unit 521 outputs, as sentences, the character strings of the utterances constituting the series data input from the input unit 51. When the series data is text data, the sentence output unit 521 outputs sentences divided into word units by morphological analysis. When the series data is voice data, the sentence output unit 521 outputs sentences divided into word units by speech recognition.
 The ID assignment unit 522 generates, from the sentences output from the sentence output unit 521, elements obtained by dividing the utterance according to a predetermined rule. The unit of division (the unit of an element) used by the ID assignment unit 522 may be any identifiable unit, such as a word unit, a punctuation unit, a speech recognition unit, or an end-of-speech unit. The ID assignment unit 522 assigns an ID to each element obtained by dividing the utterance, and stores the IDs assigned to the elements in an ID set.
 The combination generation unit 523 generates the combinations of IDs (combination ID sequences) necessary for training the model, based on the IDs stored in the ID set.
 FIG. 21 is a diagram showing a configuration example of the combination generation unit 523.
 As shown in FIG. 21, the combination generation unit 523 includes an ID extraction unit 5231, a combination target ID storage unit 5232, a combination generation ID storage unit 5233, and a combination ID generation unit 5234.
 The ID extraction unit 5231 extracts, from the ID set, the IDs of a predetermined longest unit and stores them in a longest-unit ID set. Here, the longest unit is a unit longer than the unit into which the sentence was divided when output by the sentence output unit 521, and may be any unit that can be specified in advance. For example, if the unit of division when outputting the sentence is a word unit, the longest unit is a unit longer than a word unit, such as a punctuation unit or a sentence unit. Also, for example, if the unit of division when outputting the sentence is a punctuation unit, the longest unit is a unit longer than a punctuation unit, such as a sentence unit or a speech recognition unit.
 The combination target ID storage unit 5232 extracts, from the longest-unit ID set, the IDs of the range to be combined and stores them in a combination target ID set.
 The combination generation ID storage unit 5233 acquires, from the combination target ID set, the combination generation IDs for generating combination ID sequences, and stores them in a combination generation ID set.
 The combination ID generation unit 5234 generates combination ID sequences based on the combination generation ID set, stores them in the set of combination ID sequences, and updates the set of combination ID sequences.
 Referring again to FIG. 20, the combination generation unit 523 outputs the generated combination ID sequences to the assignment unit 524.
 The combination ID sequences output from the combination generation unit 523 and the teacher data are input to the assignment unit 524. For each division unit obtained by replacing a combination ID sequence with its character string, the assignment unit 524 assigns, based on the teacher data, a label indicating a positive example, a negative example, or exclusion from learning, and thereby creates the learning data.
 FIG. 22 is a diagram showing a configuration example of the assignment unit 524.
 As shown in FIG. 22, the assignment unit 524 includes a positive example assignment unit 5241, a negative example assignment unit 5242, and an out-of-scope assignment unit 5243.
 Based on the teacher data, the positive example assignment unit 5241 assigns a label indicating a positive example to predetermined ID sequences in the set of combination ID sequences. In this way, a label indicating a positive example is assigned to the division units obtained by replacing those ID sequences with their character strings.
 The negative example assignment unit 5242 assigns a label indicating a negative example to predetermined ID sequences in the set of combination ID sequences. In this way, a label indicating a negative example is assigned to the division units obtained by replacing those ID sequences with their character strings.
 The out-of-scope assignment unit 5243 assigns, to predetermined ID sequences in the set of combination ID sequences, a label indicating that they are excluded from learning. In this way, a label indicating exclusion is assigned to the division units obtained by replacing those combination ID sequences with their character strings. The out-of-scope assignment unit 5243 deletes the combination ID sequences labeled as excluded from learning, and outputs, as learning data, the division units corresponding to the combination ID sequences labeled as positive or negative examples together with their labels. The details of the operation of the assignment unit 524 will be described later.
 Referring again to FIG. 19, the output unit 53 outputs the learning data created by the learning data creation unit 52.
 Next, the operation of the learning data creation unit 52 will be described. In the following, the case of creating learning data for training a model that determines whether or not a unit is a switch of the scene (story) is described as an example. Specifically, since utterance 3 described above contains a switch of the scene, utterance 3 is used as the example. In the following, the label "T" is assigned to a range determined to be a switch of the scene, and the label "F" is assigned to a range not determined to be a switch of the scene. The division unit of the sentence is the punctuation unit, and the longest unit is the sentence unit. As teacher data, it is assumed that the label "T" is assigned to the range of utterance 3 determined to be a switch of the scene ("were you yourself all right?").
 The ID assignment unit 522 divides utterance 3 at punctuation marks and assigns an ID to each element obtained by the division. In the following, it is assumed that the ID assignment unit 522 assigns IDs as follows.
  ID1: "I see,"
  ID2: "when putting it into the garage,"
  ID3: "so you scraped the rear bumper of the car against a utility pole,"
  ID4: "were you yourself"
  ID5: "all right?"
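 A minimal sketch of this splitting and ID assignment, assuming punctuation-based division and Python, is shown below; the delimiter set and the dictionary representation of the ID set are illustrative assumptions.

    import re

    def assign_ids(utterance, delimiters=r"[、。,.?？!！]"):
        """Split an utterance after each punctuation mark and assign sequential
        IDs (1, 2, ...) to the resulting elements. Returns a dict: ID -> element."""
        pieces = re.split(f"({delimiters})", utterance)
        # Re-attach each delimiter to the text that precedes it.
        elements, buf = [], ""
        for piece in pieces:
            buf += piece
            if re.fullmatch(delimiters, piece):
                elements.append(buf)
                buf = ""
        if buf.strip():
            elements.append(buf)
        return {i + 1: elem.strip() for i, elem in enumerate(elements)}

    # Applied to utterance 3 above, this would yield IDs 1 to 5.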
 The ID assignment unit 522 stores the IDs assigned to the elements of the utterance in the ID set.
 The combination generation unit 523 creates, from the ID set, the combinations (ID sequences) of the punctuation-divided elements within the range of the predetermined longest unit. The operation of the combination generation unit 523 will be described with reference to FIG. 23. FIG. 23 is a flowchart showing an example of the operation of the combination generation unit 523.
 The ID extraction unit 5231 extracts all the IDs for each longest unit from the ID set and stores them in the longest-unit ID set (step S61). As described above, since the longest unit is the sentence unit, the range of the longest unit is ID1 to ID5. The ID extraction unit 5231 extracts ID1 to ID5 from the ID set and stores (1, 2, 3, 4, 5) in the longest-unit ID set.
 The combination target ID storage unit 5232 deletes the smallest ID among the IDs stored in the longest-unit ID set from the longest-unit ID set, and stores it in the combination target ID set (step S62). In the example described above, the combination target ID storage unit 5232 takes ID1 out of the longest-unit ID set and stores it in the combination target ID set. The combination target ID storage unit 5232 also deletes ID1 from the longest-unit ID set. Therefore, (2, 3, 4, 5) is stored in the longest-unit ID set.
 The combination generation ID storage unit 5233 arranges all the IDs included in the combination target ID set in ascending order, and stores the result in the combination generation ID set and in the set of combination ID sequences (step S63). In the example described above, since (1) is stored in the combination target ID set, the sequence obtained by arranging all the IDs in ascending order is [1]. The combination generation ID storage unit 5233 stores (1) in the combination generation ID set and [1] in the set of combination ID sequences.
 The combination ID generation unit 5234 deletes the smallest ID from the ID sequence stored in the combination generation ID set, arranges the remaining IDs in ascending order, and stores the result in the set of combination ID sequences (step S64). In the example described above, (1) is stored in the combination generation ID set. Therefore, the combination ID generation unit 5234 deletes the smallest ID, ID1.
 Next, the combination ID generation unit 5234 determines whether or not the combination generation ID set is empty (step S65). In the example described above, since ID1 has been deleted, the combination generation ID set is empty.
 If it determines that the combination generation ID set is not empty (step S65: No), the combination ID generation unit 5234 repeats the processing of step S64.
 When the combination ID generation unit 5234 determines that the combination generation ID set is empty (step S65: Yes), the combination target ID storage unit 5232 determines whether or not the longest-unit ID set is empty (step S66). In the example described above, since (2, 3, 4, 5) is stored in the longest-unit ID set, the longest-unit ID set is not empty.
 If it determines that the longest-unit ID set is not empty (step S66: No), the combination target ID storage unit 5232 returns to the processing of step S62. In the example described above, since (2, 3, 4, 5) is stored in the longest-unit ID set, the combination target ID storage unit 5232 takes out the smallest ID, ID2, and stores it in the combination target ID set. The combination target ID storage unit 5232 also deletes ID2 from the longest-unit ID set. Therefore, (3, 4, 5) is stored in the longest-unit ID set.
 Thereafter, the processing of steps S63 and S64 is performed, and (1, 2) is stored in the combination target ID set. In addition, the ID sequence obtained by arranging all the IDs stored in the combination target ID set in ascending order is stored in the combination generation ID set and in the set of combination ID sequences. Since (1, 2) is stored in the combination target ID set, the sequence obtained by arranging all the IDs in ascending order is [1, 2], and (1, 2) is stored in the combination generation ID set. The sequence [1, 2] is also added to the set of combination ID sequences, which becomes ([1], [1, 2]).
 The combination ID generation unit 5234 deletes the smallest ID from the ID sequence stored in the combination generation ID set, arranges the remaining IDs in ascending order, and stores the result in the set of combination ID sequences. In the example described above, (1, 2) is stored in the combination generation ID set. Therefore, the combination ID generation unit 5234 deletes the smallest ID, ID1. After ID1 is deleted, (2) remains in the combination generation ID set. Since (2) remains, the combination ID generation unit 5234 stores [2] in the set of combination ID sequences. Therefore, the set of combination ID sequences becomes ([1], [1, 2], [2]).
 Thereafter, the same processing is repeated until the longest-unit ID set becomes empty. When the longest-unit ID set has become empty, the following ID sequences are stored in the set of combination ID sequences. In this way, the combination generation unit 523 generates combination ID sequences each consisting of one element, or a plurality of consecutive elements, obtained by dividing the utterance according to the predetermined rule.
  [1]
  [1,2]
  [2]
  [1,2,3]
  [2,3]
  [3]
  [1,2,3,4]
  [2,3,4]
  [3,4]
  [4]
  [1,2,3,4,5]
  [2,3,4,5]
  [3,4,5]
  [4,5]
  [5]
 When the combination target ID storage unit 5232 determines that the longest-unit ID set is empty (step S66: Yes), the ID extraction unit 5231 determines whether or not there are any IDs in the ID set that have not been stored in the longest-unit ID set (step S67).
 If it determines that there are IDs that have not been stored in the longest-unit ID set (step S67: Yes), the ID extraction unit 5231 returns to the processing of step S61.
 When it is determined that there are no IDs that have not been stored in the longest-unit ID set (step S67: No), the combination generation unit 523 ends the processing.
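 The net effect of steps S61 to S67 on one longest unit is to enumerate every run of consecutive element IDs. A compact functional sketch in Python is shown below; it produces the same fifteen sequences as the worked example for IDs 1 to 5. The nested-loop formulation is an illustrative simplification of the flowchart, not a transcription of it.

    def generate_combination_id_sequences(longest_unit_ids):
        """Return every contiguous run of IDs within one longest unit,
        in the same order as the worked example: for each end position,
        all runs that finish there, from the longest to the shortest."""
        ids = sorted(longest_unit_ids)
        sequences = []
        for end in range(len(ids)):
            for start in range(end + 1):
                sequences.append(ids[start:end + 1])
        return sequences

    print(generate_combination_id_sequences([1, 2, 3, 4, 5]))
    # [[1], [1, 2], [2], [1, 2, 3], [2, 3], [3], ..., [4, 5], [5]]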
 Next, the operation of the assignment unit 524 will be described with reference to FIG. 24. FIG. 24 is a flowchart showing an example of the operation of the assignment unit 524.
 The positive example assignment unit 5241 assigns a label indicating a positive example to every ID sequence, among the ID sequences included in the set of combination ID sequences generated by the combination generation unit 523, whose range matches the teacher data (step S71). As described above, it is assumed that, as teacher data, the label "T" is assigned to the range of utterance 3 determined to be a switch of the scene ("were you yourself all right?"). Therefore, the positive example assignment unit 5241 assigns the label indicating a positive example ("T") to the ID sequence [4,5], which covers the same range as "were you yourself all right?" in utterance 3.
 The negative example assignment unit 5242 assigns a label indicating a negative example to every combination ID sequence, among the ID sequences included in the set of combination ID sequences, that contains none of the IDs included in an ID sequence labeled as a positive example (step S72). In the example described above, the ID sequence [4,5] is labeled as a positive example. Therefore, the negative example assignment unit 5242 assigns the label indicating a negative example ("F") to all of the following combination ID sequences, which contain neither ID4 nor ID5.
  [1]: F
  [1,2]: F
  [2]: F
  [1,2,3]: F
  [2,3]: F
  [3]: F
 The out-of-scope assignment unit 5243 assigns a label indicating exclusion to every combination ID sequence, among the ID sequences included in the set of combination ID sequences, that has been assigned neither the label indicating a positive example nor the label indicating a negative example (step S73). In the example described above, the out-of-scope assignment unit 5243 assigns the label indicating exclusion to the following combination ID sequences.
  [1,2,3,4]: excluded
  [2,3,4]: excluded
  [3,4]: excluded
  [4]: excluded
  [1,2,3,4,5]: excluded
  [2,3,4,5]: excluded
  [3,4,5]: excluded
  [5]: excluded
 The out-of-scope assignment unit 5243 deletes, from the set of combination ID sequences, the combination ID sequences to which the label indicating exclusion has been assigned. Then, the out-of-scope assignment unit 5243 stores, in the learning data, the division units corresponding to the combination ID sequences to which the label indicating a positive example or a negative example has been assigned. In the example described above, the division units corresponding to the following combination ID sequences are stored in the learning data.
  [1]: F
  [1,2]: F
  [2]: F
  [1,2,3]: F
  [2,3]: F
  [3]: F
  [4,5]: T
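 A minimal sketch of steps S71 to S73, assuming the teacher data is given as the set of IDs covering the manually labeled positive range and reusing the generate_combination_id_sequences sketch shown earlier, is given below in Python. The function name and the label strings "T" and "F" mirror the worked example; the representation of the teacher data as an ID set is an illustrative assumption.

    def label_combination_sequences(sequences, positive_ids):
        """sequences: list of combination ID sequences (lists of ints).
        positive_ids: set of IDs covering the range labeled 'T' in the teacher data.
        Returns {tuple(sequence): 'T' or 'F'}; excluded sequences are simply dropped."""
        labeled = {}
        for seq in sequences:
            seq_ids = set(seq)
            if seq_ids == positive_ids:          # S71: exact match with the teacher range
                labeled[tuple(seq)] = "T"
            elif not (seq_ids & positive_ids):   # S72: shares no ID with the positive range
                labeled[tuple(seq)] = "F"
            # S73: everything else is out of scope and not kept
        return labeled

    sequences = generate_combination_id_sequences([1, 2, 3, 4, 5])
    print(label_combination_sequences(sequences, {4, 5}))
    # {(1,): 'F', (1, 2): 'F', (2,): 'F', (1, 2, 3): 'F', (2, 3): 'F', (3,): 'F', (4, 5): 'T'}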
 As described above, the learning data creation device 50 according to the present embodiment creates learning data by assigning labels to division units each consisting of one element, or a plurality of consecutive elements, obtained by dividing an utterance according to a predetermined rule (for example, the punctuation unit). Here, in the present embodiment, the learning data includes division units with different numbers of constituent elements.
 Therefore, even when the scene (story) switches in the middle of an utterance, learning data can be created in division units of the utterance that correspond to that switch. Furthermore, by training on the learning data created in this way, a model can be created that can estimate a switch of the scene with high accuracy even when the scene (story) switches in the middle of an utterance.
 次に、本実施形態に係る推定装置30dについて説明する。本実施形態に係る推定装置30dは、学習データ作成装置50により作成された学習データに基づいて学習したモデルを用いて、構成する要素の数が異なる発話の分割単位で、場面(話)の切り替わりを推定するものである Next, the estimation device 30d according to the present embodiment will be described. The estimation device 30d according to the present embodiment uses a model trained based on the training data created by the training data creation device 50, and switches scenes (story) in utterance division units having different numbers of constituent elements. Is to estimate
 FIG. 25 shows a configuration example of the estimation device 30d according to the present embodiment.
 As shown in FIG. 25, the estimation device 30d according to the present embodiment includes an input unit 61, an estimation unit 62, and an output unit 63.
 Series data of a dialogue is input to the input unit 61. As shown in FIG. 26, the input unit 61 includes a sentence output unit 611. Like the sentence output unit 521, the sentence output unit 611 outputs the character strings of the utterances constituting the series data input to the input unit 61 to the estimation unit 62 as sentences. When the series data is text data, the sentence output unit 611 outputs sentences divided into words by morphological analysis; when the series data is speech data, the sentence output unit 611 outputs sentences divided into words by speech recognition.
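 For text input, the word segmentation performed by the sentence output unit 611 could be realized with an off-the-shelf morphological analyzer. The sketch below assumes the MeCab analyzer and its Python binding (mecab-python3) are installed; this toolkit choice is an assumption of the illustration, not something specified by the disclosure.

```python
# Minimal sketch of word segmentation for text input (assumes mecab-python3 and a
# MeCab dictionary are installed; not part of this disclosure).
import MeCab

tagger = MeCab.Tagger("-Owakati")  # output surface forms separated by spaces

def to_word_sequence(utterance: str) -> list[str]:
    return tagger.parse(utterance).split()

print(to_word_sequence("信号で止まっている時に、追突されたと伺っております。"))
```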
 Referring again to FIG. 25, the estimation unit 62 uses the estimation model 3 to estimate story changes from the sentences output by the input unit 61. The estimation model 3 is a model created by training on the learning data created by the learning data creation device 50. As described above, that learning data contains division units made up of different numbers of elements, each labeled as to whether or not it is a change of story. The estimation model 3 is therefore a model trained in advance to determine, for division units with differing numbers of elements, whether each is a change of story. The estimation unit 62 generates division units with differing numbers of elements from the utterances constituting the series data to be processed and, for each generated division unit, uses the estimation model 3 as the first model to determine whether it is a change of story.
 The output unit 63 outputs the estimation result obtained by the estimation unit 62.
 Next, the configuration of the estimation unit 62 is described. FIG. 27 shows a configuration example of the estimation unit 62.
 As shown in FIG. 27, the estimation unit 62 includes an ID assigning unit 621, a combination generation unit 622, and a switching estimation unit 623.
 The ID assigning unit 621 generates, from the sentences output by the sentence output unit 611, elements obtained by splitting each utterance according to a predetermined rule. The unit of splitting used by the ID assigning unit 621 may be any identifiable unit, such as words, punctuation-delimited segments, speech recognition units, or end-of-speech units. The ID assigning unit 621 assigns an ID to each element of the split utterance and stores the assigned IDs in an ID set.
 The combination generation unit 622 generates, based on the IDs stored in the ID set, combinations of IDs (combination ID strings) to be used for estimating story changes.
 FIG. 28 shows a configuration example of the combination generation unit 622. As shown in FIG. 28, the combination generation unit 622 includes an ID extraction unit 6221, a combination target ID storage unit 6222, a combination generation ID storage unit 6223, and a combination ID generation unit 6224.
 Like the ID extraction unit 5231, the ID extraction unit 6221 extracts the IDs of a predetermined longest unit from the ID set and stores them in a longest-unit ID set.
 Like the combination target ID storage unit 5232, the combination target ID storage unit 6222 extracts from the longest-unit ID set the IDs in the range to be combined and stores them in a combination target ID set.
 Like the combination generation ID storage unit 5233, the combination generation ID storage unit 6223 acquires, from the combination target ID set, the combination generation IDs used to generate combination ID strings and stores them in a set of combination generation IDs.
 Like the combination ID generation unit 5234, the combination ID generation unit 6224 generates combination ID strings based on the set of combination generation IDs, stores them in the set of combination ID strings, and updates that set.
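 Although the generation procedure itself is the one described with reference to FIG. 23 and is not reproduced here, the ten combination ID strings listed later for a four-element utterance (FIG. 30A) correspond exactly to every run of consecutive IDs, grouped by their last ID. A minimal sketch under that assumption:

```python
# Sketch: generate combination ID strings as all runs of consecutive element IDs,
# grouped by their last ID; this reproduces the ten strings of FIG. 30A for ids = [1, 2, 3, 4].
# It is an illustration inferred from that example, not the procedure of FIG. 23 itself.
def generate_combination_id_strings(ids: list[int]) -> list[list[int]]:
    strings = []
    for end in range(len(ids)):
        for start in range(end + 1):
            strings.append(ids[start:end + 1])
    return strings

print(generate_combination_id_strings([1, 2, 3, 4]))
# [[1], [1, 2], [2], [1, 2, 3], [2, 3], [3], [1, 2, 3, 4], [2, 3, 4], [3, 4], [4]]
```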
 Referring again to FIG. 27, the combination generation unit 622 outputs the generated set of combination ID strings to the switching estimation unit 623.
 The set of combination ID strings output from the combination generation unit 622 is input to the switching estimation unit 623. Using the estimation model 3, the switching estimation unit 623 determines, for each division unit corresponding to a combination ID string, whether that division unit is a change of story, and outputs the determination result.
 Next, the operation of the estimation unit 62 is described, focusing on the operation of the switching estimation unit 623. The generation of combination ID strings by the combination generation unit 622 is the same as the operation of the combination generation unit 523 described with reference to FIG. 23, and its description is therefore omitted.
 FIG. 29 is a flowchart showing an example of the operation of the switching estimation unit 623.
 The switching estimation unit 623 takes out, from the set of combination ID strings, one combination ID string consisting only of IDs for which it has not yet been estimated whether they are part of a change of story (step S81).
 The switching estimation unit 623 replaces the extracted combination ID string with a word string (step S82). That is, it replaces each ID in the combination ID string with the utterance element corresponding to that ID.
 Next, using the estimation model 3, the switching estimation unit 623 estimates whether the character string obtained by this replacement (the utterance division unit) is a change of story (step S83).
 Next, the switching estimation unit 623 determines whether the estimation result was a positive example, that is, a change of story (step S84).
 If the result was not a positive example (step S84: No), the switching estimation unit 623 determines whether the set of combination ID strings is empty (step S85).
 If the set of combination ID strings is not empty (step S85: No), the switching estimation unit 623 returns to step S81.
 If the set of combination ID strings is empty (step S85: Yes), the switching estimation unit 623 outputs the estimation result for each ID via the output unit 63 (step S86) and ends the processing.
 If the result was a positive example (step S84: Yes), the switching estimation unit 623 determines whether the set of combination ID strings still contains a combination ID string consisting only of IDs for which a change of story has not yet been estimated (step S87).
 If such a combination ID string exists (step S87: Yes), the switching estimation unit 623 returns to step S81.
 If no such combination ID string exists (step S87: No), the switching estimation unit 623 outputs, for each ID, the estimation result and the estimation unit via the output unit 63 (step S88) and ends the processing.
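 Read together, steps S81 to S88 amount to scanning candidate division units until one is estimated to be a change of story and then continuing with combination ID strings made only of IDs whose result is still undetermined. The sketch below is one possible reading of that loop; the abstract model(text) callable stands in for the estimation model 3, and the function name and bookkeeping are assumptions introduced for illustration.

```python
# Sketch of the loop in FIG. 29 (steps S81-S88); `model(text) -> bool` stands in for
# the estimation model 3 and returns True when a division unit is a change of story.
def estimate_switches(elements, model):
    ids = list(range(1, len(elements) + 1))
    # all runs of consecutive IDs, in the same order as FIG. 30A
    pending = [ids[s:e + 1] for e in range(len(ids)) for s in range(e + 1)]
    results = {i: None for i in ids}   # per-ID result; True once part of a positive unit
    units = {}                         # per-ID estimation unit (the positive combination ID string)
    while True:
        # S81/S87: combination ID strings made only of IDs not yet estimated as positive
        candidates = [s for s in pending if all(results[i] is None for i in s)]
        if not candidates:
            break                                        # S87: No -> S88 (output per ID)
        seq = candidates[0]
        pending.remove(seq)
        text = "".join(elements[i - 1] for i in seq)     # S82: replace IDs with the word string
        if model(text):                                  # S83/S84: change of story?
            for i in seq:
                results[i] = True
                units[i] = seq
        elif not pending:
            break                                        # S85: Yes -> S86 (output per ID)
    return results, units
```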
 The operation of the estimation unit 62 is further described below using a specific example.
 Consider the following utterance as an example.
 Utterance: 「信号で止まっている時に、追突されたと伺っておりますが、お怪我は、大丈夫でしょうか。」 ("I understand you were rear-ended while stopped at a traffic light; are you all right?")
 As shown in FIG. 30A, the ID assigning unit 621 splits the above utterance into four elements at punctuation marks and assigns IDs (ID1 to ID4) to the elements. The combination generation unit 622 generates combination ID strings by the process described with reference to FIG. 23. In the example shown in FIG. 30A, the combination generation unit 622 generates ten combination ID strings ([1], [1,2], [2], [1,2,3], [2,3], [3], [1,2,3,4], [2,3,4], [3,4], [4]).
 The switching estimation unit 623 takes one combination ID string from the generated set and estimates whether the division unit corresponding to it is a change of story. As shown in FIG. 30B, the switching estimation unit 623 estimates the division units corresponding to the combination ID strings in the set, one after another, until one is estimated to be a positive example (a change of story). Suppose that the division units corresponding to the combination ID strings [1], [1,2], [2], [1,2,3], [2,3], [3], [1,2,3,4] and [2,3,4] are estimated not to be positive examples, and that the division unit corresponding to [3,4] is estimated to be a positive example.
 Since no combination ID string consisting only of IDs that have not yet been estimated remains, the switching estimation unit 623 outputs, for each ID, the estimation result and the estimation unit via the output unit 63. Because the division unit corresponding to the combination ID string [3,4] was estimated to be a positive example, the switching estimation unit 623 outputs, as shown in FIG. 30B, that the estimation result for ID3 and ID4 is a positive example and that the unit estimated to be positive (the estimation unit) is the combination string [3,4].
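 Running the sketch above on this four-element split reproduces the outcome of FIG. 30B when a stand-in model flags only the last two segments; the stub below is purely illustrative and is not the trained estimation model 3.

```python
# Illustrative stub reproducing the FIG. 30B outcome with the estimate_switches sketch above.
elements = ["信号で止まっている時に、", "追突されたと伺っておりますが、", "お怪我は、", "大丈夫でしょうか。"]

def stub_model(text: str) -> bool:
    # pretends only "お怪我は、大丈夫でしょうか。" is a change of story
    return text == elements[2] + elements[3]

results, units = estimate_switches(elements, stub_model)
print(results)   # {1: None, 2: None, 3: True, 4: True}
print(units)     # {3: [3, 4], 4: [3, 4]}
```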
 The operation of the estimation unit 62 is further described using another specific example.
 Consider the following utterance as an example.
 Utterance: 「では、お車の状況を詳しく教えて頂きたいのですが、今回は、等級が下がることはございません。」 ("Now, I would like you to tell me about the condition of your car in detail; this time, your grade will not go down.")
 As shown in FIG. 31A, the ID assigning unit 621 splits the above utterance into four elements at punctuation marks and assigns IDs (ID1 to ID4) to the elements. The combination generation unit 622 generates combination ID strings by the process described with reference to FIG. 23. In the example shown in FIG. 31A, the combination generation unit 622 generates ten combination ID strings ([1], [1,2], [2], [1,2,3], [2,3], [3], [1,2,3,4], [2,3,4], [3,4], [4]).
 The switching estimation unit 623 takes one combination ID string from the generated set and estimates whether the division unit corresponding to it is a change of story. As shown in FIG. 31B, the switching estimation unit 623 estimates the division units corresponding to the combination ID strings in the set, one after another, until one is estimated to be a positive example (a change of story). Suppose that the division unit corresponding to the combination ID string [1] is estimated not to be a positive example and that the division unit corresponding to [1,2] is estimated to be a positive example.
 Since combination ID strings consisting only of IDs (ID3 and ID4) for which it has not yet been estimated whether they are positive examples remain ([3], [3,4], [4]), the switching estimation unit 623 further estimates whether these are positive examples. Suppose that the division unit corresponding to [3] is estimated not to be a positive example and that the division unit corresponding to [3,4] is estimated to be a positive example.
 Since no combination ID string consisting only of IDs that have not yet been estimated remains, the switching estimation unit 623 outputs, for each ID, the estimation result and the estimation unit via the output unit 63. Because the division units corresponding to the combination ID strings [1,2] and [3,4] were estimated to be positive examples, the switching estimation unit 623 outputs, as shown in FIG. 31B, that the estimation result for ID1 and ID2 is a positive example with estimation unit [1,2], and that the estimation result for ID3 and ID4 is a positive example with estimation unit [3,4].
 Next, the results of comparing the estimation accuracy for story changes between the case where the range of the division unit is variable, as in the present embodiment, and the case where it is fixed, as in the first and second embodiments, are described. With a fixed division unit range, the precision was 0.46, the recall was 0.33, and the F value was 0.38. With a variable division unit range, the precision was 0.49, the recall was 0.35, and the F value was 0.41. These results confirm that a variable division unit range yields higher estimation accuracy than a fixed one.
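 The reported F values are consistent with the usual harmonic mean of precision P and recall R, which is assumed here since the disclosure does not define the F value explicitly:

```latex
F = \frac{2PR}{P + R}, \qquad
\frac{2 \cdot 0.46 \cdot 0.33}{0.46 + 0.33} \approx 0.38, \qquad
\frac{2 \cdot 0.49 \cdot 0.35}{0.49 + 0.35} \approx 0.41
```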
 As described above, in the present embodiment, learning data is created in which each division unit, consisting of one element or a run of consecutive elements obtained by splitting an utterance according to a predetermined rule, with the number of elements differing between division units, is given a label indicating whether or not it is a change of story. Furthermore, division units with differing numbers of elements are generated from the utterances constituting the series data to be processed, and, for each generated division unit, the estimation model 3 trained on that learning data is used to determine whether it is a change of story.
 Therefore, even when the story changes in the middle of an utterance, the point of change can be estimated with high accuracy.
 In the first embodiment, the binary classification model 1 is created by the learning device 10 and the multi-value classification model 2 by the learning device 20, but the present disclosure is not limited to this. For example, as shown in FIG. 32, a single learning device 70 may create both the binary classification model 1 and the multi-value classification model 2.
 As shown in FIG. 32, the learning device 70 includes the input unit 11, the binary classification learning unit 12 as a first model learning unit, the input unit 21, the multi-value label complementing unit 22, and the multi-value classification learning unit 23 as a second model learning unit.
 The operations of the input unit 11 and the binary classification learning unit 12 are the same as those described with reference to FIG. 1. Although a detailed description is omitted, the binary classification learning unit 12 trains the binary classification model 1 (first model), which determines whether an utterance constituting the series data to be processed is an utterance at a change of story, based on teacher data (first teacher data) in which utterances constituting series data of a dialogue containing multiple topics, or division units obtained by dividing those utterances, are given a binary label (first label) indicating whether or not they are a change of story.
 The operations of the input unit 21, the multi-value label complementing unit 22, and the multi-value classification learning unit 23 are the same as those described with reference to FIG. 2. Although a detailed description is omitted, the multi-value classification learning unit 23 trains the multi-value classification model 2 (second model), which estimates the topic of the utterances constituting the series data to be processed, based on teacher data (second teacher data) in which a range over which one topic continues in the series data is given a multi-value label (second label) indicating the topic of that range.
 FIG. 33 shows an example of the operation of the learning device 70 and illustrates the learning method performed by the learning device 70.
 The binary classification learning unit 12 trains the binary classification model 1, which determines whether an utterance constituting the series data to be processed is an utterance at a change of story, based on teacher data (first teacher data) in which utterances constituting series data of a dialogue containing multiple topics, or division units obtained by dividing those utterances, are given a binary label indicating whether or not they are a change of story (step S91).
 The multi-value classification learning unit 23 trains the multi-value classification model 2, which estimates the topic of the utterances constituting the series data to be processed, based on teacher data in which a range over which one topic continues in the series data is given a multi-value label indicating the topic of that range (step S92).
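 As a concrete but simplified illustration of steps S91 and S92, the two models could be trained as independent text classifiers. The sketch below uses scikit-learn with character n-gram features as one possible toolkit; the library choice, the toy teacher data, and the topic labels are assumptions of this illustration, not part of the disclosure.

```python
# Sketch of step S91 (binary classification model 1) and step S92 (multi-value
# classification model 2); toolkit, data and labels are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# First teacher data: division units with a binary "change of story" label.
switch_texts = ["では、お車の状況を詳しく教えて頂きたいのですが、",
                "お怪我は、大丈夫でしょうか。",
                "はい、そうです。"]
switch_labels = [1, 1, 0]

# Second teacher data: utterances with the topic label of the range they belong to.
topic_texts = ["お怪我は、大丈夫でしょうか。",
               "では、お車の状況を詳しく教えて頂きたいのですが、",
               "今回は、等級が下がることはございません。"]
topic_labels = ["injury", "accident details", "insurance grade"]

def text_classifier():
    # Character n-gram features avoid the need for a tokenizer in this toy example.
    return make_pipeline(TfidfVectorizer(analyzer="char", ngram_range=(1, 3)),
                         LogisticRegression())

binary_classification_model_1 = text_classifier().fit(switch_texts, switch_labels)      # S91
multi_value_classification_model_2 = text_classifier().fit(topic_texts, topic_labels)   # S92
```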
 Next, the hardware configuration of the estimation devices 30 to 30d according to the present disclosure is described. Although the hardware configuration of the estimation device 30 is described below, the estimation devices 30a to 30d may have the same hardware configuration, as may the learning devices 10, 20 and 70 and the learning data creation device 50.
 FIG. 34 is a block diagram showing the hardware configuration when the estimation device 30 of the present disclosure is a computer capable of executing program instructions. Here, the computer may be a general-purpose computer, a dedicated computer, a workstation, a PC (Personal Computer), an electronic notepad, or the like. The program instructions may be program code, code segments, or the like for executing the necessary tasks.
 In the example shown in FIG. 34, the estimation device 30 has a processor 110, a ROM (Read Only Memory) 120, a RAM (Random Access Memory) 130, a storage 140, an input unit 150, a display unit 160, and a communication interface (I/F) 170. These components are connected so as to be able to communicate with one another via a bus 190. The processor 110 is specifically a CPU (Central Processing Unit), MPU (Micro Processing Unit), GPU (Graphics Processing Unit), DSP (Digital Signal Processor), SoC (System on a Chip), or the like, and may be composed of a plurality of processors of the same or different types.
 The processor 110 controls each component and executes various kinds of arithmetic processing. That is, the processor 110 reads a program from the ROM 120 or the storage 140 and executes it using the RAM 130 as a working area. The processor 110 controls the above components of the estimation device 30 and performs various kinds of arithmetic processing in accordance with the program stored in the ROM 120 or the storage 140. In the present embodiment, the program according to the present disclosure is stored in the ROM 120 or the storage 140, and the processor 110 reads and executes it. The determination unit 32, the paragraph estimation unit 33 and the topic estimation unit 34 constitute a control unit 38 (FIG. 3). The control unit 38 may be configured by dedicated hardware such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field-Programmable Gate Array), or by one or more processors as described above. When the learning device 70 has the hardware configuration shown in FIG. 34, the binary classification learning unit 12, the multi-value label complementing unit 22 and the multi-value classification learning unit 23 constitute a control unit 71. The control unit 71 may likewise be configured by dedicated hardware such as an ASIC or FPGA, or by one or more processors as described above.
 The program may be provided in a form stored in a non-transitory storage medium such as a CD-ROM (Compact Disk Read Only Memory), a DVD-ROM (Digital Versatile Disk Read Only Memory), or a USB (Universal Serial Bus) memory. The program may also be downloaded from an external device via a network.
 The ROM 120 stores various programs and various data. The RAM 130 temporarily stores programs or data as a working area. The storage 140 is configured by an HDD (Hard Disk Drive) or an SSD (Solid State Drive) and stores various programs, including an operating system, and various data. For example, the storage 140 stores the created binary classification models 1 and 1a, the multi-value classification models 2 and 2a, and the estimation model 3.
 The input unit 150 includes a pointing device such as a mouse and a keyboard, and is used for various kinds of input.
 The display unit 160 is, for example, a liquid crystal display and displays various kinds of information. The display unit 160 may adopt a touch panel system and also function as the input unit 150.
 The communication interface 170 is an interface for communicating with other equipment such as an external device (not shown); standards such as Ethernet (registered trademark), FDDI and Wi-Fi (registered trademark) are used, for example.
 Regarding the above embodiments, the following supplementary notes are further disclosed.
 (Supplementary note 1)
 An estimation device comprising a processor, the processor being configured to:
 determine whether an utterance constituting series data to be processed is an utterance at a change of story, using a first model trained in advance based on first teacher data, applied to utterances constituting series data of a dialogue containing a plurality of topics or to division units obtained by dividing the utterances; and
 estimate, based on a result of the determination, a range of a paragraph in the series data to be processed from a change of story to an utterance immediately before a next change of story, or from a change of story to an utterance at an end of the dialogue.
 (Supplementary note 2)
 A learning device comprising a processor, the processor being configured to:
 train a first model that determines whether an utterance constituting series data to be processed is an utterance at a change of story, based on first teacher data in which utterances constituting series data of a dialogue containing a plurality of topics, or division units obtained by dividing the utterances, are given a first label indicating whether or not they are a change of story; and
 train a second model that estimates a topic of the utterances constituting the series data to be processed, based on second teacher data in which a range over which one topic continues in the series data is given a second label indicating the topic of that range.
 (Supplementary note 3)
 A non-transitory storage medium storing a program executable by a computer, the program causing the computer to function as the estimation device according to supplementary note 1.
 (Supplementary note 4)
 A non-transitory storage medium storing a program executable by a computer, the program causing the computer to function as the learning device according to supplementary note 2.
 All publications, patent applications and technical standards mentioned in this specification are incorporated herein by reference to the same extent as if each individual publication, patent application or technical standard were specifically and individually indicated to be incorporated by reference.
 A computer can suitably be used to function as each unit of the estimation devices 30, 30a, 30b, 30c and 30d and the learning device 70 described above. Such a computer can be realized by storing, in its storage unit, a program describing the processing that implements the functions of these devices, and by having the processor of the computer read and execute that program. That is, the program can cause the computer to function as the estimation devices 30, 30a, 30b, 30c and 30d and the learning device 70 described above.
 The program may also be recorded on a computer-readable medium, by means of which it can be installed on a computer. The computer-readable medium on which the program is recorded may be a non-transient recording medium; such a medium is not particularly limited but may be, for example, a CD-ROM or a DVD-ROM. The program can also be provided via a network.
 The present disclosure is not limited to the configurations specified in the embodiments described above, and various modifications are possible without departing from the gist of the invention described in the claims. For example, the functions included in the respective components can be rearranged so as not to be logically inconsistent, and a plurality of components can be combined into one or divided.
 1, 1a  binary classification model (first model)
 2, 2a  multi-value classification model (second model)
 3  estimation model
 10  learning device
 11  input unit
 12  binary classification learning unit (first model learning unit)
 20  learning device
 21  input unit
 22  multi-value label complementing unit
 23  multi-value classification learning unit (second model learning unit)
 30, 30a, 30b, 30c, 30d  estimation device
 31  input unit
 32  determination unit
 33  paragraph estimation unit
 34, 34a, 34b  topic estimation unit
 35  output unit
 36, 36b  keyword extraction unit
 37  clustering unit
 38  control unit (processor)
 41  input unit
 42  determination unit
 43  topic estimation unit
 44  paragraph estimation unit
 45  output unit
 50  learning data creation device
 51  input unit
 52  learning data creation unit
 53  output unit
 61  input unit
 62  estimation unit
 63  output unit
 521  sentence output unit
 522  ID assigning unit
 523  combination generation unit
 524  assigning unit
 611  sentence output unit
 621  ID assigning unit
 622  combination generation unit
 623  switching estimation unit
 5231  ID extraction unit
 5232  combination target ID storage unit
 5233  combination generation ID storage unit
 5234  combination ID generation unit
 5241  positive example assigning unit
 5242  negative example assigning unit
 5243  non-target assigning unit
 6221  ID extraction unit
 6222  combination target ID storage unit
 6223  combination generation ID storage unit
 6224  combination ID generation unit
 110  processor
 120  ROM
 130  RAM
 140  storage
 150  input unit
 160  display unit
 170  communication interface
 190  bus
 70  learning device
 71  control unit (processor)

Claims (10)

  1.  An estimation device comprising:
      a determination unit that determines whether an utterance constituting series data to be processed is an utterance at a change of story, using a first model trained in advance based on first teacher data, applied to utterances constituting series data of a dialogue containing a plurality of topics or to division units obtained by dividing the utterances; and
      a paragraph estimation unit that estimates, based on a result of the determination, a range of a paragraph in the series data to be processed from a change of story to an utterance immediately before a next change of story, or from a change of story to an utterance at an end of the dialogue.

  2.  The estimation device according to claim 1, further comprising
      a topic estimation unit that estimates a topic of the paragraph or of the utterances included in the paragraph, using a second model trained in advance based on second teacher data, applied to the utterances constituting the series data or to division units obtained by dividing the utterances.

  3.  The estimation device according to claim 1, further comprising:
      a keyword extraction unit that extracts keywords from the utterances included in the paragraph; and
      a topic estimation unit that estimates, based on the keywords extracted from the utterances included in the paragraph, a topic of the paragraph or of the utterances included in the paragraph.

  4.  The estimation device according to claim 3, further comprising
      a clustering unit that clusters, by similarity, a plurality of paragraphs whose ranges have been estimated based on one or more pieces of series data to be processed, wherein
      the keyword extraction unit extracts keywords from the utterances included in a representative paragraph among the paragraphs included in a cluster of similar paragraphs, and
      the topic estimation unit estimates, based on the keywords extracted from the utterances included in the representative paragraph, a topic of the paragraphs constituting the cluster that includes the representative paragraph.

  5.  The estimation device according to any one of claims 1 to 4, wherein
      the division unit of an utterance consists of one element or a plurality of consecutive elements obtained by dividing the utterance according to a predetermined rule, and
      the first model is a model trained in advance on learning data that includes division units having different numbers of constituent elements, each division unit being given a label indicating whether or not it is a change of story.

  6.  The estimation device according to claim 5, wherein
      division units having different numbers of constituent elements are generated from the utterances constituting the series data to be processed, and whether each generated division unit is a change of story is determined using the first model.

  7.  An estimation method comprising:
      a determination step of determining whether an utterance constituting series data to be processed is an utterance at a change of story, using a first model trained in advance based on first teacher data, applied to utterances constituting series data of a dialogue containing a plurality of topics or to division units obtained by dividing the utterances; and
      a paragraph estimation step of estimating, based on a result of the determination, a range of a paragraph in the series data to be processed from a change of story to an utterance immediately before a next change of story, or from a change of story to an utterance at an end of the dialogue.

  8.  A learning device comprising:
      a first model learning unit that trains a first model that determines whether an utterance constituting series data to be processed is an utterance at a change of story, based on first teacher data in which utterances constituting series data of a dialogue containing a plurality of topics, or division units obtained by dividing the utterances, are given a first label indicating whether or not they are a change of story; and
      a second model learning unit that trains a second model that estimates a topic of the utterances constituting the series data to be processed, based on second teacher data in which a range over which one topic continues in the series data is given a second label indicating the topic of that range.

  9.  A learning method comprising:
      a first learning step of training a first model that determines whether an utterance constituting series data to be processed is an utterance at a change of story, based on first teacher data in which utterances constituting series data of a dialogue containing a plurality of topics, or division units obtained by dividing the utterances, are given a first label indicating whether or not they are a change of story; and
      a second learning step of training a second model that estimates a topic of the utterances constituting the series data to be processed, based on second teacher data in which a range over which one topic continues in the series data is given a second label indicating the topic of that range.

  10.  A program that causes a computer to operate as the estimation device according to any one of claims 1 to 6.
PCT/JP2021/012692 2020-06-16 2021-03-25 Estimation device, estimation method, learning device, learning method and program WO2021256043A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2022532313A JP7425368B2 (en) 2020-06-16 2021-03-25 Estimation device, estimation method, learning device, learning method and program

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JPPCT/JP2020/023644 2020-06-16
PCT/JP2020/023644 WO2021255840A1 (en) 2020-06-16 2020-06-16 Estimation method, estimation device, and program

Publications (1)

Publication Number Publication Date
WO2021256043A1 true WO2021256043A1 (en) 2021-12-23

Family

ID=79267817

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/JP2020/023644 WO2021255840A1 (en) 2020-06-16 2020-06-16 Estimation method, estimation device, and program
PCT/JP2021/012692 WO2021256043A1 (en) 2020-06-16 2021-03-25 Estimation device, estimation method, learning device, learning method and program

Family Applications Before (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/023644 WO2021255840A1 (en) 2020-06-16 2020-06-16 Estimation method, estimation device, and program

Country Status (2)

Country Link
JP (1) JP7425368B2 (en)
WO (2) WO2021255840A1 (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010100853A1 (en) * 2009-03-04 2010-09-10 日本電気株式会社 Language model adaptation device, speech recognition device, language model adaptation method, and computer-readable recording medium
JP2012247912A (en) * 2011-05-26 2012-12-13 Chubu Electric Power Co Inc Speech signal processing apparatus
JP2018045639A (en) * 2016-09-16 2018-03-22 株式会社東芝 Dialog log analyzer, dialog log analysis method, and program
JP2018128575A (en) * 2017-02-08 2018-08-16 日本電信電話株式会社 End-of-talk determination device, end-of-talk determination method and program
JP2019053126A (en) * 2017-09-13 2019-04-04 株式会社日立製作所 Growth type interactive device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MIMURA, MASATO ET AL.: "Automatic Indexing of Speakers and Topics for Panel Discussion Speech", IPSJ SIG TECHNICAL REPORT, vol. 96, no. 55, 28 May 1996 (1996-05-28), pages 13 - 18 *
TAKAAKI HASEGAWA: "Automatic Knowledge Assistance System Supporting Operator Responses", NTT TECHNICAL REVIEW, vol. 17, no. 9, 1 September 2019 (2019-09-01), pages 15 - 18, XP055874245 *

Also Published As

Publication number Publication date
JPWO2021256043A1 (en) 2021-12-23
JP7425368B2 (en) 2024-01-31
WO2021255840A1 (en) 2021-12-23


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21824906

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022532313

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21824906

Country of ref document: EP

Kind code of ref document: A1