WO2021256043A1 - Estimation device, estimation method, learning device, learning method and program - Google Patents

Estimation device, estimation method, learning device, learning method and program Download PDF

Info

Publication number
WO2021256043A1
Authority
WO
WIPO (PCT)
Prior art keywords
utterance
unit
topic
paragraph
estimation
Prior art date
Application number
PCT/JP2021/012692
Other languages
French (fr)
Japanese (ja)
Inventor
隆明 長谷川
節夫 山田
和之 磯
正之 杉崎
Original Assignee
日本電信電話株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電信電話株式会社
Priority to JP2022532313A (JP7425368B2)
Publication of WO2021256043A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60: Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/65: Clustering; Classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/08: Speech classification or search
    • G10L15/10: Speech classification or search using distance or distortion measures between unknown speech and reference templates

Definitions

  • it is effective to estimate the range of a paragraph, that is, the span from a conversational switch (a break in the dialogue) to the utterance immediately before the next switch, or from a conversational switch to the last utterance of the dialogue. If the range of a paragraph can be estimated, the topic can be estimated from only the utterances included in that paragraph, so the topic can be estimated with higher accuracy.
  • the present disclosure relates to estimating, in series data of a dialogue containing multiple topics, such as a dialogue between an operator and a customer, the range of a paragraph extending from one conversational switch to the utterance immediately before the next switch, or from a conversational switch to the end of the dialogue, and to estimating the topic in that paragraph.
  • a binary label (switching label) indicating whether or not the conversation switches, assigned to each utterance constituting the series data or to each of its division units, is input.
  • the binary label is, for example, "1 (switch)" or "0 (no switch)", or "True (switch)" or "False (no switch)". Alternatively, if an utterance or its division unit carries any label indicating a conversational switch, the input unit 11 may treat it as "True (switch)", and if no such label is present, treat it as "False (no switch)".
  • the binary labels are attached manually in advance to the utterances constituting the series data or to their division units. As mentioned above, certain words and phrases are often spoken when the conversation switches, and the binary labels are assigned based on such expressions, for example. Taking a device failure as an example, if one only wants to classify whether or not a topic relates to the failure of a device, the topic of any utterance about the failure is "device failure" regardless of its cause. If, on the other hand, topics are to be classified by the cause of the failure, each cause becomes a different topic. Thus, depending on how the topics to be classified are defined, the topic may not change even where the conversation switches.
  • the multi-valued label complementing unit 22 also assigns, to such an utterance or its division unit, a multi-valued label indicating the topic of the range that includes the utterance. This increases the amount of teacher data for utterances related to each topic and improves the accuracy of topic estimation.
  • the multi-valued label complementing unit 22 outputs the utterances or division units to which multi-valued labels have been attached, together with those labels, to the multi-value classification learning unit 23.
  • the multi-value classification learning unit 23 trains multi-value classification model 2 (second model) using, as teacher data (second teacher data), the utterances or division units output from the multi-valued label complementing unit 22 together with the multi-valued labels attached to them. Multi-value classification model 2 is therefore a model trained in advance on teacher data (second teacher data) for the utterances constituting the series data or their division units.
  • the teacher data used for training multi-value classification model 2 is generated from series data in which the switching utterances or their division units carry a binary label indicating a conversational switch, and in which the range over which each topic continues and the topic in that range have been specified.
  • the input unit 31 receives series data including a plurality of topics.
  • the series data input to the input unit 31 is the data to be processed, that is, the target of estimation of the paragraph ranges and of the topic in each paragraph.
  • the series data is, for example, text data obtained by speech recognition of the time-series utterances of an operator and a customer.
  • the input unit 31 may receive the text data obtained by speech recognition of each utterance sequentially while the dialogue is in progress. When the series data is input offline, the input unit 31 may sort the utterances by their start time or end time in the dialogue and then input the text data of each utterance.
  • the input unit 31 outputs the input series data to the determination unit 32.
  • the topic estimation unit 34 uses multi-value classification model 2 (second model) to estimate the topic of the paragraph whose range has been estimated by the paragraph estimation unit 33, or of the utterances contained in that paragraph.
  • multi-value classification model 2 is a model trained in advance on teacher data in which the utterances constituting the series data, or their division units, are given multi-valued labels indicating the topics to which those utterances relate.
  • the teacher data used for training multi-value classification model 2 is generated using series data in which the switching utterances or their division units carry a binary label indicating a conversational switch, and in which the range over which each topic continues and the topic in that range have been specified.
  • the teacher data used for training multi-value classification model 2 is generated by attaching, to each utterance or division unit whose binary label indicates a conversational switch, a multi-valued label indicating the topic of the range of the series data that includes that utterance.
  • the output unit 35 outputs, for each paragraph whose range has been estimated in the series data, the utterances constituting that paragraph. The output unit 35 may also output a multi-valued label indicating the topic of the paragraph, the start time and end time of the paragraph, and so on.
  • FIG. 4 is a diagram showing a configuration example of an estimation device 30a for estimating a topic without using the multi-value classification model 2 according to the present embodiment.
  • the same components as those in FIG. 3 are designated by the same reference numerals, and the description thereof will be omitted.
  • the keyword extraction unit 36 extracts at least one keyword from the utterances included in the paragraph whose range is estimated by the paragraph estimation unit 33. Any method can be used as the method for extracting keywords, and for example, an existing method such as tf-idf (Term Frequency-Inverse Document Frequency) can be used.
  • the number of keywords extracted by the keyword extraction unit 36 may be limited to a predetermined number in advance, or may be specified by the user.
  • the topic estimation unit 34a estimates the topic of the paragraph, or of the utterances contained in it, based on the keywords extracted by the keyword extraction unit 36 from the utterances included in the paragraph.
  • the topic estimation unit 34a may, for example, take an extracted keyword itself as the topic of the paragraph or of the utterances contained in it. Alternatively, the topic estimation unit 34a may select, from a plurality of predetermined topics, a topic highly similar to the extracted keywords and take it as the topic of the paragraph or of the utterances contained in it.
  • with the estimation device 30a shown in FIG. 4, the topic of a paragraph, or of the utterances contained in it, can be estimated without using multi-value classification model 2. Therefore, topics in series data can be estimated even when it is difficult to prepare a large amount of teacher data in which topic ranges and the topics in those ranges are specified. A minimal sketch of such keyword-based estimation is given below.
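  • the following is a minimal Python sketch of keyword-based topic estimation, assuming scikit-learn and whitespace-tokenized text; the function names, the candidate-topic strings, and the use of cosine similarity are illustrative assumptions rather than parts of the disclosure.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def extract_keywords(paragraph_utterances, top_k=3):
    """Return the top_k terms of a paragraph ranked by aggregated tf-idf weight."""
    vectorizer = TfidfVectorizer()
    tfidf = vectorizer.fit_transform(paragraph_utterances)   # one row per utterance
    scores = tfidf.sum(axis=0).A1                            # total weight per term
    terms = vectorizer.get_feature_names_out()
    ranked = sorted(zip(terms, scores), key=lambda pair: -pair[1])
    return [term for term, _ in ranked[:top_k]]

def estimate_topic(keywords, candidate_topics):
    """Pick the predetermined topic most similar to the extracted keywords."""
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform([" ".join(keywords)] + list(candidate_topics))
    sims = cosine_similarity(matrix[0:1], matrix[1:]).ravel()
    return candidate_topics[int(sims.argmax())]
```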
  • FIG. 5 is a diagram showing a configuration example of the estimation device 30b according to the present embodiment. Like the estimation device 30a shown in FIG. 4, the estimation device 30b shown in FIG. 5 estimates the topic without using the multi-value classification model 2.
  • the same components as those in FIG. 4 are designated by the same reference numerals, and the description thereof will be omitted.
  • the keyword extraction unit 36b extracts keywords from the utterances included in the representative paragraph that the clustering unit 37 determines from among the paragraphs constituting each cluster.
  • the topic estimation unit 34b estimates the topic of the paragraphs constituting a cluster based on the keywords that the keyword extraction unit 36b extracts from the utterances included in the cluster's representative paragraph. Specifically, the topic estimated from the keywords extracted from the representative paragraph is taken as the topic of all paragraphs constituting that cluster.
  • in FIGS. 3 to 5, the description uses an example in which series data consisting of a plurality of utterances arranged in chronological order is input, but the present disclosure is not limited to this.
  • a functional unit that extracts the utterances one by one from the series data may be provided upstream of the input unit 31.
  • the multi-valued label complementing unit 22 reads, one by one from the series data input to the input unit 21, the utterances to which a multi-valued label indicating a topic and a binary label indicating a conversational switch have been attached (step S11).
  • the multi-valued label is attached only to the first utterance of the range corresponding to a topic, and not to the other utterances.
  • the binary label indicating a conversational switch is attached only to the switching utterances, and not to the other utterances.
  • the multi-valued label complementing unit 22 determines whether or not a multi-valued label indicating a topic is attached to the read utterance (step S12).
  • when it is determined that no multi-valued label is attached (step S12: No), or after the multi-valued label attached to the read utterance has been stored as the updated value, the multi-valued label complementing unit 22 determines whether or not the read utterance carries a binary label indicating a conversational switch (step S14).
  • when the binary label is present (step S14: Yes), the multi-valued label complementing unit 22 attaches to the read utterance the multi-valued label held in the multi-valued label temporary storage device (step S15). In this way, when a read utterance carries a binary label indicating a conversational switch, the multi-valued label complementing unit 22 gives it the multi-valued label indicating the topic of the range of the series data that includes that utterance.
  • when it is determined that no binary label indicating a conversational switch is attached (step S14: No), or after a multi-valued label has been attached to the read utterance, the multi-valued label complementing unit 22 determines whether or not the read utterance is the last utterance of the dialogue (step S16).
  • when it is determined that the read utterance is the last utterance of the dialogue (step S16: Yes), the multi-valued label complementing unit 22 ends the process.
  • when it is determined that the read utterance is not the last utterance of the dialogue (step S16: No), the multi-valued label complementing unit 22 returns to step S11 and reads the next utterance.
  • in the above description the multi-valued label is attached only to the first utterance of the range corresponding to a topic and not to the other utterances, but all utterances in that range may instead be given the multi-valued label of that topic in advance. In that case, deleting the multi-valued label from the utterances that do not carry the binary label indicating a conversational switch leaves the topic label only on the utterances that carry the switching label.
  • any method may be used as long as a multi-valued label indicating the topic ends up attached to each switching utterance. A minimal sketch of this complementing flow is given below.
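  • the following is a minimal Python sketch of the complementing flow in steps S11 to S16, assuming each utterance is represented as a dictionary; the field names "topic" and "is_switch" are illustrative assumptions, not terms from the disclosure.

```python
def complement_topic_labels(utterances):
    """Copy the most recently seen topic label onto every switch-labelled utterance."""
    current_topic = None                          # stands in for the temporary label storage
    for utt in utterances:                        # S11: read the utterances one by one
        if utt.get("topic") is not None:          # S12: a multi-valued label is already attached
            current_topic = utt["topic"]          # update the stored label
        elif utt.get("is_switch") and current_topic is not None:
            utt["topic"] = current_topic          # S14/S15: complement the label on a switch utterance
        # S16: the loop ends naturally at the last utterance of the dialogue
    return utterances
```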
  • FIG. 7 is a flowchart showing an example of the operation of the estimation device 30, and is a diagram for explaining an estimation method by the estimation device 30.
  • the determination unit 32 reads the utterances one by one from the series data of the processing target input to the input unit 31 (step S21).
  • the determination unit 32 uses the binary classification model 1 to determine whether or not the read utterance is a talk switching utterance (step S22).
  • when it is determined that the read utterance is neither a switching utterance nor the last utterance of the dialogue (step S23: No), the paragraph estimation unit 33 accumulates the read utterance as one of the utterances constituting the current paragraph (step S24). After the utterance is accumulated, the process is repeated from step S21.
  • otherwise, the paragraph estimation unit 33 estimates that the range of the accumulated utterances is one paragraph and outputs the accumulated utterances, as the utterances constituting that paragraph, to the topic estimation unit 34.
  • the topic estimation unit 34 estimates the topic in the paragraph whose range has been estimated by the paragraph estimation unit 33 using the multi-value classification model 2 (step S26).
  • the topic estimation unit 34 may instead estimate the topic for at least one individual utterance included in the paragraph. In this case, it may estimate the topic using only the first utterance of the paragraph, or using a predetermined number of utterances counted from the first utterance of the paragraph.
  • the multi-value classification model 2 is learned based on teacher data to which a multi-value label is attached to each unit for estimating a topic.
  • the topic estimation unit 34 attaches a multi-valued label indicating the estimated topic to the paragraph (step S27).
  • the paragraph estimation unit 33 resets the accumulation of utterances (step S28) and determines whether or not the read utterance is the last utterance of the dialogue (step S29).
  • when it is determined that the read utterance is not the last utterance of the dialogue (step S29: No), the paragraph estimation unit 33 returns to step S24 and accumulates the read utterance. The read utterance is thereby accumulated as the first utterance of a new paragraph.
  • when it is determined that the read utterance is the last utterance of the dialogue (step S29: Yes), the paragraph estimation unit 33 ends the process. A minimal sketch of this loop is given below.
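  • the following is a minimal Python sketch of the loop in steps S21 to S29, where `is_switch` stands in for binary classification model 1 and `classify_topic` for multi-value classification model 2; both callables and the data layout are illustrative assumptions.

```python
def estimate_paragraphs(utterances, is_switch, classify_topic):
    """Split a dialogue into paragraphs at switching utterances and label each with a topic."""
    paragraphs = []                         # (estimated topic, utterances) per paragraph
    current = []                            # utterances accumulated for the open paragraph
    for utt in utterances:                  # S21: read the utterances one by one
        if is_switch(utt) and current:      # S22/S23: a switching utterance closes the open paragraph
            paragraphs.append((classify_topic(current), current))   # S26/S27: estimate and attach the topic
            current = []                    # S28: reset the accumulation
        current.append(utt)                 # S24: accumulate (head of a new paragraph after a switch)
    if current:                             # S29: the end of the dialogue closes the last paragraph
        paragraphs.append((classify_topic(current), current))
    return paragraphs
```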
  • the estimation method by the estimation device 30 includes a determination step (step S22) and a paragraph estimation step (steps S23 to S25).
  • in the determination step, binary classification model 1 (first model), trained in advance on teacher data (first teacher data) in which a binary label (first label) indicating whether or not the conversation switches is attached to the utterances constituting series data of a dialogue including a plurality of topics, or to their division units, is used to determine whether or not each utterance constituting the series data to be processed is a switching utterance.
  • in the paragraph estimation step, based on the result of that determination, the range of a paragraph extending from a conversational switch to the utterance immediately before the next switch, or from a conversational switch to the last utterance of the dialogue, is estimated in the series data to be processed.
  • the estimation method according to the present embodiment may further include a topic estimation step (step S26).
  • in the topic estimation step, multi-value classification model 2 (second model), trained in advance on teacher data in which a multi-valued label (second label) indicating the related topic is attached to the utterances constituting the series data or to their division units, is used to estimate the topic of the paragraph or of the utterances contained in it. Because the range of the paragraph has been estimated, the topic can be estimated from only the utterances included in the paragraph, which improves the accuracy of topic estimation.
  • FIG. 8 is a flowchart showing an example of the operation of the estimation device 30a shown in FIG. 4, and is a diagram for explaining an estimation method by the estimation device 30a.
  • the same processing as in FIG. 7 is designated by the same reference numerals, and the description thereof will be omitted.
  • the paragraph estimation unit 33 estimates that the range of the accumulated utterances is a paragraph, and outputs the accumulated utterances to the keyword extraction unit 36.
  • the keyword extraction unit 36 extracts keywords from the utterances included in the paragraph whose range is estimated by the paragraph estimation unit 33 (step S31).
  • the topic estimation unit 34a estimates the topic in the paragraph or the utterance included in the paragraph based on the keyword extracted by the keyword extraction unit 36 from the utterance included in the paragraph (step S32).
  • the estimation method by the estimation device 30a includes a keyword extraction step (step S31) and a topic estimation step (step S32).
  • in the keyword extraction step, keywords are extracted from the utterances contained in the paragraph whose range has been estimated.
  • in the topic estimation step, the topic of the paragraph, or of the utterances contained in it, is estimated based on the keywords extracted from the utterances contained in the paragraph.
  • FIG. 9 is a flowchart showing an example of the operation of estimating the range of the paragraph by the estimation device 30b shown in FIG. 5, and is a diagram for explaining the estimation method by the estimation device 30b.
  • the same processing as in FIG. 7 is designated by the same reference numerals, and the description thereof will be omitted.
  • when the result of step S25 is Yes, the paragraph estimation unit 33 estimates that the range of the accumulated utterances is one paragraph. The paragraph estimation unit 33 then resets the accumulation of utterances (step S28).
  • FIG. 10 is a flowchart showing an example of the operation of estimating a topic by the estimation device 30b shown in FIG. 5, and is a diagram for explaining an estimation method by the estimation device 30b.
  • the clustering unit 37 reads the paragraph whose range has been estimated by the paragraph estimation unit 33 (step S41).
  • the clustering unit 37 reads a plurality of paragraphs contained in at least one or more series data. That is, the clustering unit 37 repeats the process of step S41 as many times as necessary.
  • the clustering unit 37 clusters a plurality of read paragraphs for each similar paragraph (step S42).
  • the clustering unit 37 determines whether or not there are unprocessed clusters (step S43).
  • An unprocessed cluster is a cluster in which paragraphs contained in the cluster are not given multi-value labels.
  • when there is an unprocessed cluster (step S43: Yes), the clustering unit 37 selects one of the unprocessed clusters as the cluster to be processed and determines a representative paragraph from among the paragraphs included in that cluster (step S44). For example, the clustering unit 37 takes the paragraph at the center of the cluster as the representative paragraph.
  • the keyword extraction unit 36b extracts keywords from the utterances included in the representative paragraph of the cluster determined by the clustering unit 37 (step S45).
  • the topic estimation unit 34b estimates the topic in the paragraph representing the cluster based on the keywords extracted by the keyword extraction unit 36b (step S46). Next, the topic estimation unit 34b determines whether or not there is an unprocessed paragraph (step S47).
  • the unprocessed paragraph is a paragraph included in the cluster to be processed that is not given a multi-value label.
  • when it is determined that there is an unprocessed paragraph (step S47: No), the topic estimation unit 34b gives that unprocessed paragraph in the cluster a multi-valued label indicating the topic estimated from the keywords extracted from the representative paragraph of the cluster (step S48). The topic estimation unit 34b then returns to the process of step S47.
  • otherwise (step S47: Yes), the process is repeated from step S43.
  • the estimation method by the estimation device 30b further includes a clustering step (step S42).
  • in the clustering step, a plurality of paragraphs whose ranges have been estimated from one or more series data are clustered into groups of similar paragraphs.
  • keywords are extracted from the utterances included in the representative paragraph among the paragraphs of a cluster consisting of similar paragraphs.
  • in the topic estimation step, the topic of the paragraphs constituting the cluster that includes the representative paragraph is estimated based on the keywords extracted from the utterances included in the representative paragraph. A minimal sketch of this clustering-based flow is given below.
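  • the following is a minimal Python sketch of the clustering-based flow in steps S41 to S48, assuming scikit-learn, tf-idf paragraph vectors, and k-means; the number of clusters, the keyword count, and all helper names are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def label_paragraphs_by_cluster(paragraphs, n_clusters=5, top_k=3):
    """paragraphs: list of lists of utterance strings; returns one keyword topic per paragraph."""
    texts = [" ".join(p) for p in paragraphs]
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(texts)                     # S41: one vector per paragraph
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(X)    # S42: cluster similar paragraphs
    terms = vectorizer.get_feature_names_out()
    labels = [None] * len(paragraphs)
    for c in range(n_clusters):                             # S43: process every cluster once
        members = np.where(km.labels_ == c)[0]
        # S44: take the paragraph closest to the cluster center as the representative
        dists = np.linalg.norm(X[members].toarray() - km.cluster_centers_[c], axis=1)
        rep = members[int(dists.argmin())]
        # S45/S46: the top keywords of the representative paragraph become the cluster topic
        weights = X[rep].toarray().ravel()
        topic = ", ".join(terms[i] for i in weights.argsort()[::-1][:top_k])
        for m in members:                                   # S47/S48: copy the topic to all members
            labels[m] = topic
    return labels
```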
  • model learning (of binary classification model 1 and multi-value classification model 2) will be described using the specific example shown in FIG. 11. In the following, it is assumed that the series data includes five topics, "topic A", "topic B", "topic C", "topic D", and "topic E".
  • the range over which one topic continues and the topic in that range are specified manually, and for each such range a multi-valued label indicating the topic in that range is manually attached to the utterances constituting the series data.
  • a binary label indicating a conversational switch is attached only to the switching utterances. Therefore, in FIG. 11, for example, an utterance located in the middle of the range over which utterances related to topic A continue may still carry a binary label indicating that it is a switching utterance.
  • the above series data and binary labels are input to the learning device 10, and binary classification model 1 is trained, using an LSTM or the like, on the input series data and binary labels.
  • the above series data, binary labels, and multi-valued labels are input to the learning device 20.
  • the multi-valued labels are then complemented. That is, as shown in FIG. 11, each utterance carrying a label indicating a conversational switch is given a multi-valued label indicating the topic of the range of the series data that includes that utterance.
  • in this way, teacher data is created in which the utterances constituting the series data carry multi-valued labels indicating the topics to which they relate.
  • a multi-valued label indicating the related topic may also be attached to the division units of the utterances constituting the series data.
  • multi-value classification model 2 is then trained using an LSTM or the like. In this training, only the utterances carrying a multi-valued label may be used, or all utterances of the paragraphs that include the labelled utterances may be used. A minimal sketch of such a model is given below.
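  • the following is a minimal PyTorch sketch of an LSTM-based utterance classifier of the kind described above; with num_classes=2 it plays the role of binary classification model 1, and with one class per topic the role of multi-value classification model 2. The network sizes, the tokenization into integer IDs, and the data loader are illustrative assumptions.

```python
import torch
import torch.nn as nn

class UtteranceClassifier(nn.Module):
    def __init__(self, vocab_size, num_classes, emb_dim=128, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, num_classes)

    def forward(self, token_ids):               # token_ids: (batch, seq_len) integer tensor
        h, _ = self.lstm(self.embed(token_ids))
        return self.out(h[:, -1, :])            # classify from the last hidden state

def train(model, loader, epochs=5):
    """loader yields (token_ids, label) batches built from the teacher data."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for token_ids, labels in loader:
            opt.zero_grad()
            loss = loss_fn(model(token_ids), labels)
            loss.backward()
            opt.step()
```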
  • FIG. 12 is a diagram showing an example of topic estimation by the estimation device 30 shown in FIG. 3. In FIG. 12, it is assumed that multi-value classification model 2 has been trained in utterance units.
  • when the series data of one dialogue is input to the estimation device 30, as shown in FIG. 12, binary classification model 1 is used to determine whether or not each utterance constituting the series data is a switching utterance. The range from a switching utterance to the utterance immediately before the next switching utterance, or from a switching utterance to the last utterance of the dialogue, is then estimated to be one paragraph.
  • FIG. 14 is a diagram showing an example of topic estimation by the estimation device 30a shown in FIG.
  • when the series data of one dialogue is input to the estimation device 30a, as shown in FIG. 14, binary classification model 1 is used to determine whether or not each utterance constituting the series data is a switching utterance. The range from a switching utterance to the utterance immediately before the next switching utterance is then estimated to be one paragraph.
  • FIG. 14 shows an example in which different multi-valued labels ("Topic 1" to "Topic 10") are assigned to the respective paragraphs, but this does not mean that they are necessarily different topics.
  • when the series data of one or more dialogues is input to the estimation device 30b, as shown in FIG. 15, binary classification model 1 is used to determine whether or not each utterance constituting the series data is a switching utterance. The range from a switching utterance to the utterance immediately before the next switching utterance is then estimated to be one paragraph.
  • as shown in FIG. 15, the plurality of paragraphs whose ranges have been estimated are clustered into groups of similar paragraphs.
  • a representative paragraph is determined from a cluster of similar paragraphs, and keywords are extracted from the utterances contained in the representative paragraph.
  • the paragraph shown by the thick line indicates the representative paragraph.
  • the topic in the representative paragraph is estimated based on the keywords extracted from the utterances included in the representative paragraph of the cluster, and a multi-valued label indicating the estimated topic is given to the representative paragraph. Further, as shown in FIG. 15, other paragraphs constituting the cluster are also given the same multi-valued label as the representative paragraph of the cluster.
  • in order to show the effectiveness of the estimation method according to this embodiment (hereinafter sometimes referred to as "this method"), it was compared with a conventional method by experiment. In the experiment, 349 calls were used for training the models and 50 calls were used for verification. As multi-valued labels indicating topics, labels for eight topics, topic A to topic H, together with a fixed topic S covering the span from the first utterance of a call to the first conversational switch, were prepared. The conventional method trains a binary classification model using, as teacher data, data in which a binary label indicating a conversational switch is attached only to the utterances at which the multi-valued label changes, and trains a multi-value classification model using only those switching utterances as teacher data.
  • in this method, the range of paragraphs is estimated while also counting utterances that transition from a topic to that same topic as switching utterances. Therefore, as shown in Table 1, the precision of this method is lower than that of the conventional method. However, this method can detect paragraphs and switching utterances that the conventional method could not detect, so the recall of paragraph division increased. (Precision, recall, and the F value are defined below.)
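  • for reference, the precision, recall, and F value reported in this evaluation follow the standard definitions, where TP, FP, and FN denote true positives, false positives, and false negatives with respect to the items being evaluated (for paragraph division, the detected switching utterances):

```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
```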
  • in the conventional method, the multi-value classification model is generated by training on teacher data in which a multi-valued label indicating the topic of the utterance is manually attached only to the utterances at which the multi-valued label changes.
  • in this method, multi-value classification model 2 was generated by training on teacher data in which the multi-valued labels were complemented for the utterances to which a label indicating a conversational switch had been manually assigned.
  • for each utterance determined to be a switching utterance by the binary classification models trained with the conventional method and with this method, the topic of the utterance was estimated and compared with the correct topic given manually to that utterance.
  • the results of the comparison are shown in Table 2.
  • the results (F value) of the classification of all utterance topics were evaluated in the 100 calls targeted for evaluation.
  • This evaluation is a comprehensive evaluation of the determination of utterances of story switching by the binary classification model and the estimation of topics by the multi-value classification model.
  • in this method, an utterance that transitions from a topic to that same topic is also determined to be a switching utterance, but multi-value classification model 2 classified many of these same-topic transitions into the correct topic. Therefore, as shown in Table 3, the overall evaluation result of this method was higher than that of the conventional method.
  • the estimation device 30 includes a determination unit 32 and a paragraph estimation unit 33.
  • the determination unit 32 uses binary classification model 1 (first model), trained in advance on teacher data (first teacher data) in which a binary label indicating whether or not the conversation switches is attached to the utterances, or their division units, constituting series data of a dialogue including a plurality of topics, to determine whether or not each utterance constituting the series data to be processed is a switching utterance.
  • based on the result of that determination, the paragraph estimation unit 33 estimates, in the series data to be processed, the range of a paragraph extending from a conversational switch to the utterance immediately before the next switch, or from a conversational switch to the end of the dialogue.
  • by training on teacher data in which a binary label indicating whether or not the conversation switches is attached to the utterances or their division units, it is possible to generate binary classification model 1, which determines whether or not an utterance constituting the series data is a switching utterance. Based on the determination results of binary classification model 1, the range of each paragraph in the series data can then be estimated. Furthermore, by estimating the paragraph ranges in the series data, topic estimation can be limited to the utterances included in each paragraph, which improves the accuracy of estimating the topic of each paragraph.
  • a method called TextTiling is known as a method of dividing series data into sections of objectively classified topics (see, for example, Reference 1).
  • in TextTiling, the text is divided at the points where the degree of cohesion, computed from the cohesiveness of neighboring words in the text, reaches a local minimum.
  • TopicTiling, which divides text using Latent Dirichlet Allocation (LDA), a representative topic model, has also been proposed (see Reference 2). A minimal sketch of the underlying cohesion-based idea is given below.
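  • the following is a minimal Python sketch of the cohesion-minimum idea behind such objective segmentation methods: score the lexical similarity of adjacent windows of sentences and treat the lowest-scoring gaps as candidate boundaries. The window size, whitespace tokenization, and scoring are illustrative assumptions and not the exact algorithms of References 1 and 2.

```python
import math
from collections import Counter

def cohesion(left_tokens, right_tokens):
    """Cosine similarity of the bag-of-words vectors of two adjacent windows."""
    a, b = Counter(left_tokens), Counter(right_tokens)
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def boundary_candidates(sentences, window=3):
    """Return gap indices ordered by increasing cohesion (lowest first = likely topic break)."""
    scores = []
    for i in range(window, len(sentences) - window):
        left = [w for s in sentences[i - window:i] for w in s.split()]
        right = [w for s in sentences[i:i + window] for w in s.split()]
        scores.append((cohesion(left, right), i))
    return [i for _, i in sorted(scores)]
```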
  • subjective topics are topics categorized, for example, from the perspective of isolating the cause of a customer's inability to use a particular service, or from the perspective of the operator interviewing the customer about needs or wishes on a sales call.
  • the same keywords, such as service names, product names, and related vocabulary, appear throughout such dialogues, so even content that one wants to distinguish subjectively is, on a superficial and objective level, indistinguishable, and such topics make up the majority of the dialogue. Therefore, the methods described in References 1 and 2 cannot accurately divide and classify dialogues by subjective topic.
  • individual utterances are short, and for some utterances it is impossible to determine uniquely which topic they belong to. Such utterances end up labelled with a topic different from their true topic, and a model trained on teacher data labelled differently from the true topic loses classification accuracy. Therefore, with the method described in Reference 3, it is difficult to appropriately classify each utterance, including short conversational inputs arriving in chronological order, by subjective topic.
  • the estimation device 30c according to the present embodiment determines whether or not the utterance constituting the series data or the division unit thereof is a topic switching, and estimates the range of the paragraph based on the determination result.
  • FIG. 16 is a diagram showing a configuration example of the estimation device 30c according to the present embodiment.
  • the estimation device 30c includes an input unit 41, a determination unit 42, a topic estimation unit 43, a paragraph estimation unit 44, and an output unit 45.
  • the input unit 41 inputs the series data of the dialogue including a plurality of topics.
  • the series data input to the input unit 41 is data to be processed that is the target of estimation of the range of paragraphs and topics in paragraphs.
  • the series data is, for example, text data in which time-series utterances of an operator and a customer are voice-recognized.
  • the input unit 41 may receive the text data obtained by speech recognition of each utterance sequentially while the dialogue is in progress. When the series data is input offline, the input unit 41 may sort the utterances by their start time or end time in the dialogue and then input the text data of each utterance.
  • the input unit 41 outputs the input series data to the determination unit 42.
  • the determination unit 42 uses the binary classification model 1a to determine whether or not the utterance constituting the series data output from the input unit 41 is a topic switching utterance.
  • the binary classification model 1a is a model learned in advance so as to determine whether or not the topic is switched with respect to the utterance or the division unit thereof constituting the series data of the dialogue.
  • binary classification model 1a can be created by training, with the learning device 10 described above, on teacher data in which a binary label (switching label) indicating whether or not the topic switches is attached to the utterances constituting the series data or to their division units.
  • from the determination result obtained with binary classification model 1a, the determination unit 42 determines whether or not each utterance constituting the series data, or its division unit, is to be processed by the topic estimation unit 43 described later. Specifically, the determination unit 42 designates the utterances or division units determined to be topic switches as the processing targets of the topic estimation unit 43. The determination unit 42 outputs this determination result to the topic estimation unit 43 and the paragraph estimation unit 44.
  • the topic estimation unit 43 uses multi-value classification model 2a to give each utterance determined by the determination unit 42 to be a processing target (a topic-switching utterance), or its division unit, a multi-valued label indicating the topic of the range that includes that utterance.
  • multi-value classification model 2a is a model that estimates, for an utterance or its division unit, the topic of the range that includes that utterance.
  • multi-value classification model 2a can be created by training, with the learning device 20 described above, on teacher data in which a multi-valued label (topic label) indicating the related topic is attached to the utterances constituting the series data or to their division units.
  • learning about topic transitions may be performed only on the topic-switching utterances, that is, only on the utterances to which a multi-valued label is attached.
  • by excluding from the learning target the utterances lying between one topic-switching utterance and the next, noise for topic classification can be removed.
  • the topic estimation unit 43 stores the topic estimation result (multi-valued label corresponding to the estimated topic) in the label information table.
  • the label information table is an area for storing the estimation result of the topic for the data to be processed, and may be a memory on a computer, a database, or a file.
  • the paragraph estimation unit 44 estimates that the range from an utterance determined by the determination unit 42 to be a processing target (a topic-switching utterance) to the utterance immediately before the next such utterance is the range of one paragraph.
  • the paragraph estimation unit 44 attaches the multi-valued label stored in the label information table to the utterances included in the paragraph whose range has been estimated. Specifically, to the utterances from a topic-switching utterance up to the utterance immediately before the next topic-switching utterance, the paragraph estimation unit 44 attaches the multi-valued label that was given to that topic-switching utterance and stored in the label information table.
  • the output unit 45 outputs, for each paragraph whose range has been estimated in the series data, the utterances constituting that paragraph. The output unit 45 may also output a multi-valued label indicating the topic of the paragraph, the start time and end time of the paragraph, and so on.
  • a morphological analysis unit that performs morphological analysis on text chat may be provided downstream of the input unit 41. Furthermore, when the series data to be processed is input offline, the estimation device 30c may be configured to estimate the paragraph ranges using all of the topic-switch determination results and topic estimation results at once. In this case, based on the determination of whether or not a topic switch occurs and on the topic estimation results, the paragraph estimation unit 44 may attach the multi-valued label estimated by the topic estimation unit 43 to the utterances in the range from a topic switch to the utterance immediately before the next topic switch.
  • FIG. 17 is a flowchart showing an example of the operation of the estimation device 30c according to the present embodiment.
  • the determination unit 42 determines whether or not the dialogue in the series data of the processing target input to the input unit 41 has been completed (step S51).
  • when it is determined that the dialogue has been completed (step S51: Yes), the estimation device 30c ends the process.
  • otherwise (step S51: No), the determination unit 42 reads the utterance to be processed (step S52).
  • the determination unit 42 uses the binary classification model 1a to determine whether or not the read utterance is a topic-switching utterance (step S53).
  • when it is determined that the read utterance is not a topic-switching utterance (step S54: No), the process proceeds to step S57, described later.
  • when it is determined that the read utterance is a topic-switching utterance (step S54: Yes), the topic estimation unit 43 estimates the topic of the read utterance using multi-value classification model 2a (step S55). The topic estimation unit 43 stores the estimated topic in the label information table, thereby updating the table (step S56). That is, the label information table is updated every time the read utterance is a topic-switching utterance.
  • the paragraph estimation unit 44 assigns a multi-valued label stored in the label information table to the read utterance (step S57).
  • the label information table is updated every time a read utterance is a topic-switching utterance. Therefore, the same multi-valued label is assigned to the utterances from a topic-switching utterance up to the utterance immediately before the next topic-switching utterance, which together constitute one paragraph.
  • once a multi-valued label has been attached to the read utterance, the determination unit 42 returns to step S51 with the next utterance in the series data as the processing target (step S58). A minimal sketch of this loop is given below.
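  • the following is a minimal Python sketch of the streaming flow in steps S51 to S58, where `is_topic_switch` stands in for binary classification model 1a and `classify_topic` for multi-value classification model 2a; both callables and the dictionary used as the label information table are illustrative assumptions.

```python
def label_stream(utterances, is_topic_switch, classify_topic):
    """Attach to every utterance the topic of the most recent topic-switching utterance."""
    label_table = {"topic": None}                        # the label information table
    labelled = []
    for utt in utterances:                               # S51/S52: read until the dialogue ends
        if is_topic_switch(utt):                         # S53/S54: topic-switching utterance?
            label_table["topic"] = classify_topic(utt)   # S55/S56: estimate the topic, update the table
        labelled.append((utt, label_table["topic"]))     # S57: attach the stored multi-valued label
    return labelled                                      # S58: proceed to the next utterance each pass
```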
  • FIG. 18 is a diagram showing an example of topic estimation by the estimation device 30c according to the present embodiment. In FIG. 18, it is assumed that the binary classification model 1a and the multi-value classification model 2a are learned in utterance units.
  • as shown in FIG. 18, the determination unit 42 uses binary classification model 1a to determine whether or not each utterance constituting the series data is a topic-switching utterance.
  • the topic estimation unit 43 estimates the topic of the utterance determined to be the switching of the topic by using the multi-value classification model 2a, and stores the multi-value label indicating the estimated topic in the label information table.
  • the paragraph estimation unit 44 estimates the range from the utterance of the topic change to the utterance immediately before the utterance of the next topic change as one paragraph. Then, the paragraph estimation unit 44 assigns a multi-valued label indicating the topic of the utterance at the beginning of the paragraph, which is stored in the label information table, to all the utterances constituting the paragraph.
  • as described above, the estimation device 30c uses binary classification model 1a to determine whether or not each utterance constituting the series data is a topic-switching utterance, uses multi-value classification model 2a to estimate the topic of each topic-switching utterance, and estimates the range of a paragraph as extending from a topic-switching utterance to the utterance immediately before the next topic-switching utterance, taking the topic estimated for that topic-switching utterance as the topic of the paragraph containing it.
  • the division unit of an utterance is, for example, a word unit obtained by dividing the utterance into words, or, when punctuation has been added to the utterance, a unit delimited by commas or periods. In the first and second embodiments described above, when the topic of an utterance is estimated, it is estimated for the utterance or for a predetermined division unit, and that division unit of the utterance is fixed.
  • however, the topic does not always change at the boundary of such a predetermined unit.
  • the response history may be recorded separately for the scene of confirming the presence or absence of an injury and the scene of confirming the damage of a car.
  • the dialogue between the customer and the person in charge of responding shown in utterances 1 to 4 will be described with an example of dividing the dialogue into a scene for confirming the presence or absence of an injury and a scene for confirming damage to the car.
  • Responsible person: "I heard that you had an accident when you put the car in the garage."
  • utterance 1 and utterance 2 are utterances in a scene where damage to the car is confirmed.
  • the scene then switches from confirming damage to the car to confirming the presence or absence of injury, and the injury-confirmation scene continues through utterance 4.
  • within utterance 3, the part "That's right, because I rubbed the bumper behind the car with a utility pole when I put it in the garage" belongs to the scene of confirming damage to the car, and the part "Was your body okay?" belongs to the scene of confirming the presence or absence of injury.
  • it is therefore desirable to give the unit "That's right, I rubbed the bumper behind the car with a utility pole when I put it in the garage" a label indicating the scene of confirming damage to the car, and to give the unit "Is your body okay?" a label indicating the scene of confirming the presence or absence of injury, but it is difficult to decide such units in advance.
  • in the present embodiment, therefore, the learning unit is not fixed: positive examples, negative examples, and non-target learning data are created dynamically in various units from the teacher data. That is, the learning data is created by making the division unit of the utterance variable. In this way, even when the conversation (scene) switches in the middle of an utterance, learning data can be created for training a model that can estimate the switching point with high accuracy. Furthermore, by using a model trained on learning data created without fixing the learning unit, each scene within an utterance can be estimated even when the scene switches in the middle of the utterance.
  • FIG. 19 is a diagram showing a configuration example of the learning data creating device 50 according to the present embodiment.
  • the learning data creating device 50 according to the present embodiment dynamically creates positive examples, negative examples, and non-target learning data in various units from the teacher data.
  • the learning data creating device 50 includes an input unit 51, a learning data creating unit 52, and an output unit 53.
  • the dialogue series data is input to the input unit 51.
  • the series data is, for example, voice data of a time-series dialogue between an operator and a customer, or text data in which utterances included in the dialogue are voice-recognized.
  • the input unit 51 outputs the input series data to the learning data creation unit 52.
  • the learning data creation unit 52 inputs the series data output from the input unit 51 and the teacher data.
  • the teacher data is data in which the range of utterances necessary for specifying a scene in the utterances constituting the series data is labeled before the learning data is created. Labels in teacher data are manually assigned.
  • based on the input series data and teacher data, the learning data creation unit 52 creates learning data used for training a model that estimates the topic (scene) within an utterance in arbitrary division units of the utterance.
  • FIG. 20 is a diagram showing a configuration example of the learning data creation unit 52.
  • the learning data creation unit 52 includes a sentence output unit 521, an ID assignment unit 522, a combination generation unit 523, and an assignment unit 524.
  • the sentence output unit 521 outputs the utterance character string constituting the series data input from the input unit 51 as a sentence.
  • the sentence output unit 521 outputs a sentence divided into word units by morphological analysis.
  • the sentence output unit 521 outputs a sentence divided into word units by voice recognition.
  • the ID assignment unit 522 generates, from the sentence output by the sentence output unit 521, elements obtained by dividing the utterance according to a predetermined rule.
  • the division unit (the unit of an element) used by the ID assignment unit 522 may be any unit that can be specified, such as a word unit, a comma or period unit, a speech recognition unit, or an end-of-utterance unit.
  • the ID assigning unit 522 assigns an ID to each of the elements in which the utterance is divided, and stores the ID assigned to each element in the ID set.
  • the combination generation unit 523 generates a combination of IDs (combination ID string) necessary for learning the model based on the IDs stored in the ID set.
  • FIG. 21 is a diagram showing a configuration example of the combination generation unit 523.
  • the combination generation unit 523 includes an ID extraction unit 5231, a combination target ID storage unit 5232, a combination generation ID storage unit 5233, and a combination ID generation unit 5234.
  • the ID extraction unit 5231 extracts the IDs belonging to a predetermined longest unit from the ID set and stores them in the longest unit ID set.
  • the longest unit may be any unit that is longer than the division unit used when the sentence is output by the sentence output unit 521 and that can be specified in advance. For example, if the division unit at sentence output is a word unit, the longest unit is a comma unit or a period unit, both of which are longer than a word unit. Likewise, if the division unit at sentence output is a comma unit, the longest unit is a period unit or a speech recognition unit, both of which are longer than a comma unit.
  • the combination target ID storage unit 5232 extracts the IDs in the range to be combined from the longest unit ID set and stores them in the combination target ID set.
  • the combination generation ID storage unit 5233 acquires the combination generation ID for generating the combination ID string from the combination target ID set and stores it in the combination generation ID set.
  • the combination ID generation unit 5234 generates a combination ID string based on the set of combination generation IDs, stores it in the set of combination ID columns, and updates the set of combination ID columns.
  • the combination generation unit 523 outputs the generated combination ID strings to the assignment unit 524.
  • the combination ID strings output from the combination generation unit 523 and the teacher data are input to the assignment unit 524.
  • the assignment unit 524 creates learning data by assigning, based on the teacher data, a positive-example label, a negative-example label, or a label indicating exclusion from learning to each division unit obtained by replacing a combination ID string with its character string.
  • FIG. 22 is a diagram showing a configuration example of the assignment unit 524.
  • the assignment unit 524 includes a positive-example assignment unit 5241, a negative-example assignment unit 5242, and a non-target assignment unit 5243.
  • based on the teacher data, the positive-example assignment unit 5241 assigns a label indicating a positive example to predetermined ID strings in the set of combination ID strings. A positive-example label is thereby given to the division units obtained by replacing those ID strings with their character strings.
  • the negative-example assignment unit 5242 assigns a label indicating a negative example to predetermined ID strings in the set of combination ID strings. A negative-example label is thereby given to the division units obtained by replacing those ID strings with their character strings.
  • the non-target assignment unit 5243 assigns a label indicating exclusion from learning to predetermined ID strings in the set of combination ID strings. A label indicating exclusion is thereby given to the division units obtained by replacing those combination ID strings with their character strings.
  • the non-target assignment unit 5243 deletes the combination ID strings carrying the label indicating exclusion from learning, and the division units corresponding to the combination ID strings carrying a positive-example or negative-example label are output, together with those labels, as learning data. The details of the operation of the assignment unit 524 are described later.
  • the output unit 53 outputs the learning data created by the learning data creation unit 52.
  • the operation of the learning data creation unit 52 will be described.
  • a case of creating learning data for learning a model for determining whether or not a scene (story) is switched will be described as an example.
  • since the above-mentioned utterance 3 includes a scene change, utterance 3 is used as the example.
  • the label "T” is given to the range determined to be the change of the scene, and the label "F” is given to the range not determined to be the change of the scene.
  • here it is assumed that the division unit of the sentence is a comma unit and the longest unit is a period unit.
  • as teacher data, the label "T" is given to the range of utterance 3 determined to be the scene change ("Is your body okay?").
  • the ID assignment unit 522 divides utterance 3 at the commas and assigns an ID to each resulting element. In the following, it is assumed that the IDs are assigned as follows:
    ID1: Was that so?
    ID2: When you put it in the garage
    ID3: Because I rubbed the bumper behind the car with a utility pole,
    ID4: Your body is
    ID5: Is that okay?
  • the ID assigning unit 522 stores the ID assigned to each element of the utterance in the ID set.
  • the combination generation unit 523 creates combinations (ID strings) of the IDs of the comma-delimited elements, within the range of the predetermined longest unit, from the ID set.
  • the operation of the combination generation unit 523 will be described with reference to FIG. 23.
  • FIG. 23 is a flowchart showing an example of the operation of the combination generation unit 523.
  • the ID extraction unit 5231 extracts all IDs belonging to each longest unit from the ID set and stores them in the longest unit ID set (step S61). As described above, since the longest unit here is the period unit, the range of the longest unit is ID1 to ID5. The ID extraction unit 5231 therefore extracts IDs 1 to 5 from the ID set and stores (1, 2, 3, 4, 5) in the longest unit ID set.
  • the combination target ID storage unit 5232 deletes the smallest ID among the IDs stored in the longest unit ID set from the longest unit ID set, and stores the ID in the combination target ID set (step S62).
  • the combination target ID storage unit 5232 takes out ID1 from the ID set of the longest unit and stores it in the combination target ID set. Further, the combination target ID storage unit 5232 deletes ID1 from the ID set of the longest unit. Therefore, (2,3,4,5) is stored in the ID set of the longest unit.
  • the combination generation ID storage unit 5233 arranges all the IDs included in the combination target ID set in ascending order and stores them in the combination generation ID set and the combination ID string set (step S63).
  • the combination sequence in which all the IDs are arranged in ascending order is [1].
  • the combination generation ID storage unit 5233 stores (1) in the set of combination generation IDs, and stores [1] in the set of combination ID columns.
  • the combination ID generation unit 5234 deletes the smallest ID from the combination generation ID set, arranges the remaining IDs in ascending order, and stores the resulting ID string in the set of combination ID strings (step S64).
  • (1) is stored in the set of combination generation IDs. Therefore, the combination ID generation unit 5234 deletes the smallest ID1.
  • the combination ID generation unit 5234 determines whether or not the set of combination generation IDs is empty (step S65). In the above example, the set of combination generation IDs is empty because ID1 is deleted.
  • If it is determined that the set of combination generation IDs is not empty (step S65: No), the combination ID generation unit 5234 repeats the process of step S64.
  • Next, the combination target ID storage unit 5232 determines whether or not the longest unit ID set is empty (step S66). In the above example, since (2, 3, 4, 5) is stored in the longest unit ID set, the longest unit ID set is not empty.
  • When it is determined that the longest unit ID set is not empty (step S66: No), the combination target ID storage unit 5232 returns to the process of step S62.
  • Since (2, 3, 4, 5) is stored in the longest unit ID set, the combination target ID storage unit 5232 takes out the smallest ID 2 and stores it in the combination target ID set. Further, the combination target ID storage unit 5232 deletes ID 2 from the longest unit ID set. Therefore, (3, 4, 5) is stored in the longest unit ID set.
  • Subsequently, steps S63 and S64 are performed. Since (1, 2) is now stored in the combination target ID set, the ID string in which all the IDs are arranged in ascending order is [1, 2]; (1, 2) is stored in the combination generation ID set, and the combination ID string [1, 2] is added to the set of combination ID strings, which becomes ([1], [1, 2]).
  • the combination ID generation unit 5234 deletes the smallest ID among the ID columns stored in the combination generation ID set, arranges the remaining IDs in ascending order, and stores them in the combination ID column set.
  • (1, 2) is stored in the set of combination generation IDs. Therefore, the combination ID generation unit 5234 deletes the smallest ID1. ID1 is deleted, and (2) remains in the set of combination generation IDs. Since (2) remains in the set of combination generation IDs, the combination ID generation unit 5234 stores [2] in the set of combination ID strings. Therefore, the set of combination ID columns is ([1], [1,2], [2]).
  • In this way, the combination generation unit 523 generates combination ID strings, each corresponding to a division unit composed of one element or a plurality of consecutive elements obtained by dividing the utterance according to a predetermined rule. In the above example, the following 15 combination ID strings are generated: [1], [1, 2], [2], [1, 2, 3], [2, 3], [3], [1, 2, 3, 4], [2, 3, 4], [3, 4], [4], [1, 2, 3, 4, 5], [2, 3, 4, 5], [3, 4, 5], [4, 5], [5].
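  • A compact Python sketch of this enumeration is shown below; it is an equivalent rendering of the flowchart of FIG. 23 for illustration, not a line-by-line transcription, and the function name is invented here. Called with the IDs 1 to 5 of the example, it reproduces the 15 combination ID strings above in the same order.

    def generate_combination_id_strings(longest_unit_ids):
        """Enumerate every contiguous combination ID string within the longest unit.

        The order mirrors the example: for each new largest ID, all contiguous
        strings ending at that ID are emitted with ascending start IDs.
        """
        ids = sorted(longest_unit_ids)
        combos = []
        for end in range(len(ids)):
            for start in range(end + 1):
                combos.append(ids[start:end + 1])
        return combos

    combos = generate_combination_id_strings({1, 2, 3, 4, 5})
    # combos == [[1], [1, 2], [2], [1, 2, 3], [2, 3], [3], ..., [4, 5], [5]]  (15 strings)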
  • When it is determined that the longest unit ID set is empty (step S66: Yes), the ID extraction unit 5231 determines whether or not there is any ID in the ID set that has not yet been stored in the longest unit ID set (step S67).
  • If it is determined that there is an ID that has not been stored in the longest unit ID set (step S67: Yes), the ID extraction unit 5231 returns to the process of step S61.
  • If it is determined that there is no such ID (step S67: No), the combination generation unit 523 ends the process.
  • FIG. 24 is a flowchart showing an example of the operation of the granting unit 524.
  • The regular example assigning unit 5241 assigns a label indicating a positive example to all the ID strings, among those included in the set of combination ID strings generated by the combination generation unit 523, whose range matches the teacher data (step S71). As described above, it is assumed that the label "T" is attached, as teacher data, to the range of utterance 3 determined to be the scene change ("Is your body okay?"). Therefore, the regular example assigning unit 5241 assigns the label ("T") indicating a positive example to the ID string [4, 5], which corresponds to the same range as "Is your body okay?" in utterance 3.
  • The negative example assigning unit 5242 assigns a label indicating a negative example to all the combination ID strings included in the set of combination ID strings that do not include any ID belonging to an ID string labeled as a positive example (step S72).
  • In the above example, the ID string [4, 5] is given the label indicating a positive example. Therefore, the negative example assigning unit 5242 assigns a label ("F") indicating a negative example to the combination ID strings that include neither ID4 nor ID5, namely [1], [1, 2], [2], [1, 2, 3], [2, 3], and [3].
  • The non-target granting unit 5243 assigns a label indicating non-target to all the combination ID strings, among those included in the set of combination ID strings, to which neither the label indicating a positive example nor the label indicating a negative example has been assigned (step S73). In the above-mentioned example, the non-target granting unit 5243 assigns the label indicating non-target to the following combination ID strings.
    [1, 2, 3, 4]: Not applicable
    [2, 3, 4]: Not applicable
    [3, 4]: Not applicable
    [4]: Not applicable
    [1, 2, 3, 4, 5]: Not applicable
    [2, 3, 4, 5]: Not applicable
    [3, 4, 5]: Not applicable
    [5]: Not applicable
  • The non-target granting unit 5243 deletes the combination ID strings to which the label indicating non-target is attached from the set of combination ID strings. Then, the non-target granting unit 5243 stores, in the learning data, the division units corresponding to the combination ID strings to which a label indicating a positive example or a negative example is attached. In the above-mentioned example, the division units corresponding to the following combination ID strings are stored in the learning data.
    [1]: F
    [1, 2]: F
    [2]: F
    [1, 2, 3]: F
    [2, 3]: F
    [3]: F
    [4, 5]: T
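  • The labeling of steps S71 to S73 can be summarized by the following Python sketch, which reuses generate_combination_id_strings() from the earlier sketch; the function name is an assumption, and exact-range matching against a single teacher span is a simplification of the rule described above.

    def label_combinations(combos, positive_ids):
        """Keep only the positive (T) and negative (F) combination ID strings.

        `positive_ids` is the ID string whose range matches the teacher data;
        strings that partially overlap it are treated as non-target and dropped.
        """
        positive = set(positive_ids)
        training_data = []
        for combo in combos:
            ids = set(combo)
            if ids == positive:
                training_data.append((combo, "T"))   # positive example
            elif ids.isdisjoint(positive):
                training_data.append((combo, "F"))   # negative example
            # otherwise: partial overlap -> not a learning target, dropped
        return training_data

    # Reusing generate_combination_id_strings() from the earlier sketch:
    print(label_combinations(generate_combination_id_strings({1, 2, 3, 4, 5}), [4, 5]))
    # [([1], 'F'), ([1, 2], 'F'), ([2], 'F'), ([1, 2, 3], 'F'),
    #  ([2, 3], 'F'), ([3], 'F'), ([4, 5], 'T')]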
  • In this way, the learning data creation device 50 creates learning data by assigning a label to each division unit composed of one element or a plurality of consecutive elements obtained by dividing the utterance according to a predetermined rule (for example, a punctuation mark unit).
  • the learning data includes division units having different numbers of constituent elements.
  • Therefore, even when the scene (story) changes in the middle of an utterance, learning data can be created in utterance division units that match the change.
  • By using the learning data created in this way, it is possible to create a model that can estimate a scene change with high accuracy even when the scene (story) changes in the middle of an utterance.
  • The estimation device 30d according to the present embodiment uses a model trained on the training data created by the learning data creation device 50 to estimate scene (story) changes in utterance division units having different numbers of constituent elements.
  • FIG. 25 is a diagram showing a configuration example of the estimation device 30d according to the present embodiment.
  • the estimation device 30d includes an input unit 61, an estimation unit 62, and an output unit 63.
  • the dialogue series data is input to the input unit 61.
  • the input unit 61 includes a sentence output unit 611. Similar to the sentence output unit 521, the sentence output unit 611 outputs the utterance character string constituting the series data input to the input unit 61 to the estimation unit 62 as a sentence.
  • For example, the sentence output unit 611 may output a sentence divided into word units by morphological analysis, or may output a sentence divided into word units obtained as a result of voice recognition.
  • the estimation unit 62 estimates the change of story from the sentence output from the input unit 61 by using the estimation model 3.
  • the estimation model 3 is a model created by learning the learning data created by the learning data creation device 50.
  • The learning data created by the learning data creation device 50 includes division units having different numbers of constituent elements, and each division unit is given a label indicating whether or not it is a story change. Therefore, the estimation model 3 is a model trained in advance to determine, for each of the division units having different numbers of constituent elements, whether or not the story is switched.
  • The estimation unit 62 generates division units having different numbers of constituent elements from the utterances constituting the series data to be processed, and uses the estimation model 3 as the first model to determine, for each of the generated division units, whether or not it is a story change.
  • the output unit 63 outputs the estimation result by the estimation unit 62.
  • FIG. 27 is a diagram showing a configuration example of the estimation unit 62.
  • the estimation unit 62 includes an ID assignment unit 621, a combination generation unit 622, and a switching estimation unit 623.
  • the ID assignment unit 621 generates an element in which the utterance is divided according to a predetermined rule from the sentence output from the sentence output unit 611.
  • the unit of division by the ID assigning unit 621 may be any identifiable unit such as a word unit, a punctuation mark unit, a voice recognition unit, and a speech end unit.
  • the ID assigning unit 621 assigns an ID to each of the elements in which the utterance is divided, and stores the ID assigned to each element in the ID set.
  • the combination generation unit 622 generates a combination of IDs (combination ID string) used for estimating the switching of the story based on the IDs stored in the ID set.
  • FIG. 28 is a diagram showing a configuration example of the combination generation unit 622.
  • the combination generation unit 622 includes an ID extraction unit 6221, a combination target ID storage unit 6222, a combination generation ID storage unit 6223, and a combination ID generation unit 6224.
  • the ID extraction unit 6221 extracts a predetermined longest unit ID from the ID set and stores it in the longest unit ID set.
  • the combination target ID storage unit 6222 extracts the IDs in the range to be combined from the longest unit ID set and stores them in the combination target ID set.
  • the combination generation ID storage unit 6223 acquires the combination generation ID for generating the combination ID string from the combination target ID set and stores it in the combination generation ID storage unit.
  • Similar to the combination ID generation unit 5234, the combination ID generation unit 6224 generates combination ID strings based on the combination generation ID set, stores them in the set of combination ID strings, and updates the set of combination ID strings.
  • The combination generation unit 622 outputs the set of generated combination ID strings to the switching estimation unit 623.
  • The set of combination ID strings output from the combination generation unit 622 is input to the switching estimation unit 623.
  • the switching estimation unit 623 uses the estimation model 3 to determine for each division unit corresponding to the combination ID string whether or not the division unit is a story change, and outputs the determination result.
  • the operation of the estimation unit 62 will be described focusing on the operation of the switching estimation unit 623. Since the operation of generating the combination ID string by the combination generation unit 622 is the same as the operation of the combination generation unit 523 described with reference to FIG. 23, the description thereof will be omitted.
  • FIG. 29 is a flowchart showing an example of the operation of the switching estimation unit 623.
  • The switching estimation unit 623 extracts, from the set of combination ID strings, one combination ID string consisting only of IDs for which it has not yet been estimated whether or not the story is switched (step S81).
  • the switching estimation unit 623 replaces the extracted combination ID string with a word string (step S82). That is, the switching estimation unit 623 replaces the ID included in the combination ID string with the utterance element corresponding to the ID.
  • the switching estimation unit 623 estimates whether or not the character string (speech division unit) in which the combination ID string is replaced is a story switching using the estimation model 3 (step S83).
  • the switching estimation unit 623 determines whether or not the estimation result is a positive example (whether the story is switched) (step S84).
  • When it is determined that the estimation result is not a positive example (step S84: No), the switching estimation unit 623 determines whether or not the set of combination ID strings is empty (step S85).
  • When it is determined that the set of combination ID strings is not empty (step S85: No), the switching estimation unit 623 returns to the process of step S81.
  • When it is determined that the set of combination ID strings is empty (step S85: Yes), the switching estimation unit 623 outputs the estimation result for each ID via the output unit 63 (step S86), and ends the process.
  • When it is determined that the estimation result is a positive example (step S84: Yes), the switching estimation unit 623 determines whether or not the set of combination ID strings contains a combination ID string consisting only of IDs for which it has not been estimated whether or not the story is switched (step S87).
  • When it is determined that there is a combination ID string consisting only of IDs for which it has not been estimated whether or not the story is switched (step S87: Yes), the switching estimation unit 623 returns to the process of step S81.
  • When it is determined that there is no combination ID string consisting only of IDs for which it has not been estimated whether or not the story is switched (step S87: No), the switching estimation unit 623 outputs the estimation result and the estimation unit for each ID via the output unit 63 (step S88), and ends the process.
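  • Before turning to the concrete examples, the following Python sketch gives a simplified reading of the loop of steps S81 to S88; is_switch stands in for estimation model 3, and the order in which candidate strings are picked is an assumption based on the examples below.

    def estimate_switching(elements, combos, is_switch):
        """Simplified rendition of steps S81 to S88.

        `elements` maps each ID to its utterance fragment, `combos` is the
        ordered set of combination ID strings, and `is_switch` stands in for
        estimation model 3 (it receives the concatenated word string of one
        division unit). Returns, for each ID, the combination ID string
        estimated to be a story change, or None if no containing division
        unit was estimated to be a positive example.
        """
        decided = {i: None for i in elements}
        remaining = list(combos)
        while remaining:
            # Step S81: take the next string made only of still-undecided IDs.
            candidates = [c for c in remaining if all(decided[i] is None for i in c)]
            if not candidates:            # step S87: No -> output and end
                break
            combo = candidates[0]
            remaining.remove(combo)
            text = "".join(elements[i] for i in combo)   # step S82
            if is_switch(text):                          # steps S83-S84
                for i in combo:                          # record the estimation unit
                    decided[i] = combo
        return decided                                   # steps S86 / S88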
  • the operation of the estimation unit 62 will be further described with reference to a specific example.
  • the ID assigning unit 621 divides the above-mentioned utterance into four elements in units of punctuation marks, and assigns IDs (ID1 to ID4) to each element.
  • the combination generation unit 622 generates a combination ID string by the process described with reference to FIG. 23.
  • As a result, the combination generation unit 622 generates 10 combination ID strings ([1], [1, 2], [2], [1, 2, 3], [2, 3], [3], [1, 2, 3, 4], [2, 3, 4], [3, 4], [4]).
  • The switching estimation unit 623 extracts one combination ID string from the generated set of combination ID strings, and estimates whether or not the division unit corresponding to the extracted combination ID string is a story change. As shown in FIG. 30B, the switching estimation unit 623 estimates, in order, whether each division unit corresponding to a combination ID string in the set is a story change, until a positive example (story change) is estimated. In the following, it is assumed that the division units corresponding to the combination ID strings [1], [1, 2], [2], [1, 2, 3], [2, 3], [3], [1, 2, 3, 4], and [2, 3, 4] are estimated not to be positive examples, and the division unit corresponding to the combination ID string [3, 4] is estimated to be a positive example.
  • Since there is no longer any combination ID string consisting only of unestimated IDs, the switching estimation unit 623 outputs the estimation result and the estimation unit for each ID via the output unit 63. Since the division unit corresponding to the combination ID string [3, 4] was estimated to be a positive example, the switching estimation unit 623 outputs, for ID3 and ID4, that the estimation result is a positive example and that the unit estimated to be a positive example (the estimation unit) is the combination string [3, 4], as shown in FIG. 30B.
  • the operation of the estimation unit 62 will be further described by giving another specific example.
  • the ID assigning unit 621 divides the above-mentioned utterance into four elements in units of punctuation marks, and assigns IDs (ID1 to ID4) to each element.
  • the combination generation unit 622 generates a combination ID string by the process described with reference to FIG. 23.
  • As a result, the combination generation unit 622 generates 10 combination ID strings ([1], [1, 2], [2], [1, 2, 3], [2, 3], [3], [1, 2, 3, 4], [2, 3, 4], [3, 4], [4]).
  • The switching estimation unit 623 extracts one combination ID string from the generated set of combination ID strings, and estimates whether or not the division unit corresponding to the extracted combination ID string is a story change. As shown in FIG. 31B, the switching estimation unit 623 estimates, in order, whether each division unit corresponding to a combination ID string in the set is a story change, until a positive example (story change) is estimated. In the following, it is assumed that the division unit corresponding to the combination ID string [1] is estimated not to be a positive example, and the division unit corresponding to the combination ID string [1, 2] is estimated to be a positive example.
  • Since there are combination ID strings ([3], [3, 4], [4]) consisting only of IDs (ID3 and ID4) for which it has not yet been estimated whether or not they are a positive example, the switching estimation unit 623 further estimates whether or not the division units corresponding to these ID strings are positive examples. In the following, it is assumed that the division unit corresponding to the combination ID string [3] is estimated not to be a positive example, and the division unit corresponding to the combination ID string [3, 4] is estimated to be a positive example.
  • Since there is no longer any combination ID string consisting only of unestimated IDs, the switching estimation unit 623 outputs the estimation result and the estimation unit for each ID via the output unit 63. Since the division units corresponding to the combination ID strings [1, 2] and [3, 4] were estimated to be positive examples, the switching estimation unit 623 outputs, for ID1 and ID2, that the estimation result is a positive example and that the estimation unit is the combination string [1, 2], as shown in FIG. 31B. Further, the switching estimation unit 623 outputs, for ID3 and ID4, that the estimation result is a positive example and that the estimation unit is the combination string [3, 4].
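  • Under the stub below, which reuses generate_combination_id_strings() and estimate_switching() from the earlier sketches, the same per-ID result as this second example is obtained; the element texts are placeholders because the utterance of FIG. 31A is not reproduced here.

    # The stub model treats the division units [1, 2] and [3, 4] as story
    # changes, as assumed in the example above.
    elements = {1: "e1,", 2: "e2,", 3: "e3,", 4: "e4."}
    combos = generate_combination_id_strings({1, 2, 3, 4})
    result = estimate_switching(elements, combos,
                                is_switch=lambda text: text in {"e1,e2,", "e3,e4."})
    # result == {1: [1, 2], 2: [1, 2], 3: [3, 4], 4: [3, 4]}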
  • In this way, the estimation model 3 is trained to determine, for each division unit composed of one element or a plurality of consecutive elements obtained by dividing an utterance according to a predetermined rule, whether or not the utterance is a story change, over division units having different numbers of constituent elements.
  • The estimation device 30d generates division units having different numbers of constituent elements from the utterances constituting the series data to be processed, and uses the estimation model 3, trained on such learning data, to determine whether or not each division unit is a story change.
  • Therefore, even when the story changes in the middle of an utterance, the switching point can be estimated with high accuracy.
  • In the above description, the binary classification model 1 is created by the learning device 10 and the multi-value classification model 2 is created by the learning device 20. However, the present invention is not limited to this.
  • One learning device 70 may create both the binary classification model 1 and the multi-value classification model 2.
  • The learning device 70 includes an input unit 11, a binary classification learning unit 12 as a first model learning unit, an input unit 21, a multi-value label complementing unit 22, and a multi-value classification learning unit 23 as a second model learning unit.
  • The operations of the input unit 11 and the binary classification learning unit 12, which handle the teacher data (first teacher data) in which a binary label (first label) indicating whether or not an utterance is a story change is given to the utterances constituting the series data of a dialogue including a plurality of topics, or to the division units obtained by dividing those utterances, are the same as those described above.
  • The operations of the input unit 21, the multi-value label complementing unit 22, and the multi-value classification learning unit 23 are the same as those described with reference to FIG. 2.
  • That is, the multi-value classification learning unit 23 learns, based on the teacher data (second teacher data) in which a multi-value label (second label) indicating the topic in a range is given to a range in which one topic in the series data continues, the multi-value classification model 2 (second model) that estimates the topic in the utterances constituting the series data to be processed.
  • FIG. 33 is a diagram showing an example of the operation of the learning device 70, and is a diagram for explaining a learning method by the learning device 70.
  • The binary classification learning unit 12 learns, based on the teacher data (first teacher data) in which a binary label indicating whether or not an utterance is a story change is given to the utterances constituting the series data of a dialogue including a plurality of topics, or to the division units obtained by dividing those utterances, the binary classification model 1 for determining whether or not an utterance constituting the series data to be processed is a story-change utterance (step S91).
  • The multi-value classification learning unit 23 learns, based on the teacher data in which a multi-value label indicating the topic in a range is given to a range in which one topic in the series data continues, the multi-value classification model 2 that estimates the topic in the utterances constituting the series data to be processed (step S92).
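  • As a hedged illustration of this two-step learning flow (steps S91 and S92), the sketch below uses simple scikit-learn text classifiers as stand-ins for the binary classification model 1 and the multi-value classification model 2; the disclosure itself contemplates sequence models such as LSTMs, and the data layout assumed here (parallel label lists and (start, end, topic) spans) is hypothetical.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    def train_models(utterances, switch_labels, topic_spans):
        """Step S91: learn the binary (switch / not switch) classifier.
        Step S92: learn the multi-class topic classifier over labelled spans.

        `utterances` is a list of utterance strings, `switch_labels` the
        parallel binary labels, and `topic_spans` a list of
        (start, end, topic) tuples marking ranges where one topic continues.
        """
        binary_model = make_pipeline(TfidfVectorizer(), LogisticRegression())
        binary_model.fit(utterances, switch_labels)               # step S91

        span_texts = [" ".join(utterances[s:e + 1]) for s, e, _ in topic_spans]
        span_topics = [topic for _, _, topic in topic_spans]
        topic_model = make_pipeline(TfidfVectorizer(), LogisticRegression())
        topic_model.fit(span_texts, span_topics)                  # step S92
        return binary_model, topic_model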
  • the hardware configuration of the estimation devices 30 to 30d will be described.
  • the estimation devices 30a to 30d may have the same hardware configuration.
  • the learning devices 10, 20, 70 and the learning data creating device 50 may have the same hardware configuration.
  • FIG. 34 is a block diagram showing a hardware configuration when the estimation device 30 of the present disclosure is a computer capable of executing a program instruction.
  • the computer may be a general-purpose computer, a dedicated computer, a workstation, a PC (Personal Computer), an electronic notepad, or the like.
  • the program instruction may be a program code, a code segment, or the like for executing a necessary task.
  • the estimation device 30 includes a processor 110, a ROM (Read Only Memory) 120, a RAM (Random Access Memory) 130, a storage 140, an input unit 150, a display unit 160, and a communication interface (I / F) 170.
  • The processor 110 is a CPU (Central Processing Unit), an MPU (Micro Processing Unit), a GPU (Graphics Processing Unit), a DSP (Digital Signal Processor), a SoC (System on a Chip), or the like, and may be composed of a plurality of processors of the same type or different types.
  • the processor 110 controls each configuration and executes various arithmetic processes. That is, the processor 110 reads the program from the ROM 120 or the storage 140, and executes the program using the RAM 130 as a work area. The processor 110 controls each of the above configurations of the estimation device 30 and performs various arithmetic processes according to the program stored in the ROM 120 or the storage 140. In the present embodiment, the program according to the present disclosure is stored in the ROM 120 or the storage 140. The processor 110 reads and executes the program.
  • the determination unit 32, the paragraph estimation unit 33, and the topic estimation unit 34 constitute a control unit 38 (FIG. 3).
  • The control unit 38 may be configured by dedicated hardware such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field-Programmable Gate Array), or may be configured by one or more processors as described above.
  • the control unit 61 may be configured by dedicated hardware such as an ASIC or FPGA, or may be configured by one or more processors as described above.
  • The program may be provided in a form stored in a non-transitory storage medium such as a CD-ROM (Compact Disk Read Only Memory), a DVD-ROM (Digital Versatile Disk Read Only Memory), or a USB (Universal Serial Bus) memory. Further, the program may be downloaded from an external device via a network.
  • the ROM 120 stores various programs and various data.
  • the RAM 130 temporarily stores a program or data as a work area.
  • the storage 140 is composed of an HDD (Hard Disk Drive) or an SSD (Solid State Drive), and stores various programs including an operating system and various data.
  • the storage 140 stores the created binary classification models 1, 1a, multi-value classification models 2, 2a, and estimation model 3.
  • the input unit 150 includes a pointing device such as a mouse and a keyboard, and is used for performing various inputs.
  • the display unit 160 is, for example, a liquid crystal display and displays various information.
  • the display unit 160 may adopt a touch panel method and function as an input unit 150.
  • the communication interface 170 is an interface for communicating with other devices such as an external device (not shown), and for example, standards such as Ethernet (registered trademark), FDDI, and Wi-Fi (registered trademark) are used.
  • (Appendix 1) An estimation device comprising a processor, wherein the processor determines, using a first model learned in advance based on first teacher data for the utterances constituting series data of a dialogue including a plurality of topics or for the division units obtained by dividing the utterances, whether or not an utterance constituting the series data to be processed is a story-change utterance, and estimates, based on the result of the determination, the range of a paragraph in the series data to be processed from a story change to the utterance immediately before the next change, or of a paragraph from a story change to the utterance at the end of the dialogue.
  • (Appendix 2) A learning device comprising a processor, wherein the processor learns, based on first teacher data in which a first label indicating whether or not an utterance is a story change is given to the utterances constituting series data of a dialogue including a plurality of topics or to the division units obtained by dividing the utterances, a first model for determining whether or not an utterance constituting the series data to be processed is a story-change utterance, and learns, based on second teacher data in which a range in which one topic in the series data continues is given a second label indicating the topic in that range, a second model for estimating the topic in the utterances constituting the series data to be processed.
  • (Appendix 3) A non-transitory storage medium storing a program executable by a computer, the program causing the computer to function as the estimation device according to Appendix 1.
  • (Appendix 4) A non-transitory storage medium storing a program executable by a computer, the program causing the computer to function as the learning device according to Appendix 2.
  • a computer can be suitably used to function as each part of the estimation device 30, 30a, 30b, 30c, 30d and the learning device 70 described above.
  • Such a computer can be realized by storing, in the storage unit of the computer, a program describing the processing contents that realize the functions of the estimation devices 30, 30a, and 30b, and by having the processor of the computer read and execute the program. That is, the program can cause the computer to function as the estimation devices 30, 30a, 30b, 30c, 30d and the learning device 70 described above.
  • This program may be recorded on a computer-readable medium, and can be installed on a computer by using the computer-readable medium.
  • the computer-readable medium on which the program is recorded may be a non-transient recording medium.
  • the non-transient recording medium is not particularly limited, but may be, for example, a recording medium such as a CD-ROM or a DVD-ROM. This program can also be provided via a network.
  • Each component can be rearranged so long as no logical inconsistency arises, and a plurality of components can be combined into one or divided.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The estimation device (30) according to the present disclosure comprises: a determination unit (32) which determines whether or not speech that constitutes series data to be processed is speech indicating a change of subject, using a binary classification model (1), which is pre-learned on the basis of training data for speech that constitutes dialogue series data including a plurality of topics, or for units of division of said speech; and a paragraph estimation unit (33) which, on the basis of the result of the determination by the determination unit (32), estimates, in the series data to be processed, the range of a paragraph from a change of subject until speech just before a subsequent change, or of a paragraph from a change of subject until end-of-dialogue speech.

Description

推定装置、推定方法、学習装置、学習方法およびプログラムEstimator, estimation method, learning device, learning method and program
 本開示は、推定装置、推定方法、学習装置、学習方法およびプログラムに関する。 This disclosure relates to an estimation device, an estimation method, a learning device, a learning method and a program.
 オペレータがカスタマ(顧客)からの商品あるいはサービスなどについての問い合わせに応対する部門(いわゆる、コンタクトセンタ)では、カスタマが抱えている問題に対する解決のサポートなどが求められる。コンタクトセンタでは、オペレータによるカスタマとの応対の履歴(応対ログ)が作成され、蓄積・共有される。オペレータあるいはコンタクトセンタの管理者などが、蓄積された応対ログを見直すことで、カスタマからの問い合わせを分析したり、カスタマへの応対の質の向上を図ったりすることができる。応対ログを見直し、カスタマとの応対を振り返る際に、オペレータとカスタマとの対話を話あるいは話題ごとに分割することができれば、応対の振り返りの作業効率を上げることができる。 In the department (so-called contact center) where the operator responds to inquiries about products or services from customers (customers), support for solving problems that customers have is required. In the contact center, the history of customer service by the operator (response log) is created, stored and shared. An operator, a contact center manager, or the like can review the accumulated response log to analyze inquiries from customers and improve the quality of response to customers. When reviewing the response log and looking back on the response with the customer, if the dialogue between the operator and the customer can be divided into talks or topics, the work efficiency of the response review can be improved.
 オペレータとカスタマとの対話は、時間軸に沿った複数の発話から構成された系列データとみなすことができる。一連の系列データに対して、系列データにおける話題を示すラベルを付与した教師データを準備することで、LSTM(Long Short-Term Memory)などのDNN(Deep Neural Network)を用いた機械学習により、対話における話題を分類する分類モデルの学習が可能である(非特許文献1参照)。 The dialogue between the operator and the customer can be regarded as series data composed of multiple utterances along the time axis. By preparing teacher data with a label indicating the topic in the series data for a series of series data, dialogue is performed by machine learning using DNN (Deep Neural Network) such as RSTM (Long Short-Term Memory). It is possible to learn a classification model for classifying topics in (see Non-Patent Document 1).
 一般に、コンタクトセンタで扱うタスクは様々であり、取り扱う商品あるいはサービスの種類によっては、数えられる程度の少数の種類の話題で済む場合もあれば、非常に多くの、数えきれない種類の話題に至る場合もある。対話における話題を、非特許文献1に記載のモデルを用いて、多くの種類の話題に分類しようとすると、少量の教師データでは分類の精度が低下し、精度を上げるために大量の教師データを準備するには、多くのコストがかかってしまう。 In general, contact centers deal with a variety of tasks, depending on the type of product or service they handle, with a small number of topics that can be counted, or a large number of topics that can be counted. In some cases. When trying to classify topics in dialogue into many types of topics using the model described in Non-Patent Document 1, a small amount of teacher data reduces the accuracy of classification, and a large amount of teacher data is used to improve the accuracy. It costs a lot to prepare.
 上記のような問題点に鑑みてなされた本開示の目的は、複数の話題を含む対話の系列データにおける、段落の範囲を推定することができる推定方法、推定装置、学習装置、学習方法およびプログラムを提供することにある。 An object of the present disclosure made in view of the above problems is an estimation method, an estimation device, a learning device, a learning method, and a program capable of estimating a paragraph range in a series of dialogue data including a plurality of topics. Is to provide.
 上記課題を解決するため、本開示に係る推定装置は、複数の話題を含む対話の系列データを構成する発話または前記発話を分割した分割単位に対して、第1の教師データに基づいて予め学習された第1のモデルを用いて、処理対象の系列データを構成する発話が、話の切り替わりの発話であるか否かを判定する判定部と、前記判定の結果に基づき、前記処理対象の系列データにおける、話の切り替わりから次の切り替わりの直前の発話までの段落または前記話の切り替わりから前記対話の終わりの発話までの段落の範囲を推定する段落推定部と、を備える。 In order to solve the above problems, the estimation device according to the present disclosure learns in advance based on the first teacher data for the utterances constituting the series data of the dialogue including a plurality of topics or the divided units obtained by dividing the utterances. Using the first model, the determination unit that determines whether or not the utterance constituting the series data of the processing target is the utterance of the switching of the talk, and the series of the processing target based on the result of the determination. The data includes a paragraph estimation unit that estimates the range of the paragraph from the switching of the talk to the utterance immediately before the next switching or the paragraph from the switching of the talk to the utterance at the end of the dialogue.
 また、上記課題を解決するため、本開示に係る推定方法は、複数の話題を含む対話の系列データを構成する発話または前記発話を分割した分割単位に対して、第1の教師データに基づいて予め学習された第1のモデルを用いて、処理対象の系列データを構成する発話が、話の切り替わりの発話であるか否かを判定する判定ステップと、前記判定の結果に基づき、前記処理対象の系列データにおける、話の切り替わりから次の切り替わりの直前の発話までの段落または前記話の切り替わりから前記対話の終わりの発話までの段落の範囲を推定する段落推定ステップと、を含む。 Further, in order to solve the above-mentioned problem, the estimation method according to the present disclosure is based on the first teacher data for an utterance constituting the series data of a dialogue including a plurality of topics or a division unit obtained by dividing the utterance. Using the first model learned in advance, the processing target is based on the determination step of determining whether or not the utterance constituting the series data of the processing target is the utterance of switching the talk, and the result of the determination. Includes a paragraph estimation step that estimates the range of the paragraph from one talk to the utterance immediately before the next switch or from the change of the talk to the utterance at the end of the dialogue in the series data of.
 また、上記課題を解決するため、本開示に係る学習装置は、複数の話題を含む対話の系列データを構成する発話または前記発話を分割した分割単位に対して、話の切り替わりであるか否かを示す第1のラベルが付与された第1の教師データに基づき、処理対象の系列データを構成する発話が、話の切り替わりの発話であるか否かを判定する第1のモデルを学習する第1のモデル学習部と、前記系列データにおける1つの話題が続く範囲に、前記範囲における話題を示す第2のラベルが付与された第2の教師データに基づき、前記処理対象の系列データを構成する発話における話題を推定する第2のモデルを学習する第2のモデル学習部と、を備える。 Further, in order to solve the above-mentioned problem, whether or not the learning device according to the present disclosure switches the talk with respect to the utterance constituting the series data of the dialogue including a plurality of topics or the divided unit obtained by dividing the utterance. Based on the first teacher data to which the first label indicating is attached, the first model for determining whether or not the utterance constituting the series data to be processed is the utterance of the switching of the talk is learned. The series data to be processed is configured based on the model learning unit 1 and the second teacher data to which the second label indicating the topic in the range is added to the range in which one topic in the series data continues. It includes a second model learning unit that learns a second model that estimates a topic in speech.
 また、上記課題を解決するため、本開示に係る学習方法は、複数の話題を含む対話の系列データを構成する発話または前記発話を分割した分割単位に対して、話の切り替わりであるか否かを示す第1のラベルが付与された第1の教師データに基づき、処理対象の系列データを構成する発話が、話の切り替わりの発話であるか否かを判定する第1のモデルを学習する第1の学習ステップと、前記系列データにおける1つの話題が続く範囲に、前記範囲における話題を示す第2のラベルが付与された第2の教師データに基づき、前記処理対象の系列データを構成する発話における話題を推定する第2のモデルを学習する第2の学習ステップと、を含む。 Further, in order to solve the above-mentioned problem, whether or not the learning method according to the present disclosure is a change of talk with respect to an utterance constituting series data of a dialogue including a plurality of topics or a division unit obtained by dividing the utterance. Based on the first teacher data to which the first label indicating is attached, the first model for determining whether or not the utterance constituting the series data to be processed is the utterance of the switching of the talk is learned. An utterance that constitutes the series data to be processed based on the second teacher data in which the learning step 1 and the range in which one topic in the series data continues is given a second label indicating the topic in the range. Includes a second learning step of learning a second model that estimates the topic in.
 また、上記課題を解決するため、本開示に係るプログラムは、コンピュータを、上述した推定装置として動作させる。 Further, in order to solve the above-mentioned problems, the program according to the present disclosure operates a computer as the above-mentioned estimation device.
 本開示に係る推定装置、推定方法、学習装置、学習方法およびプログラムによれば、複数の話題を含む対話の系列データにおける、段落の範囲を推定することができる。 According to the estimation device, estimation method, learning device, learning method and program according to the present disclosure, it is possible to estimate the range of paragraphs in the series data of the dialogue including a plurality of topics.
FIG. 1 is a diagram showing a configuration example of a learning device that trains a binary classification model.
FIG. 2 is a diagram showing a configuration example of a learning device that trains a multi-value classification model.
FIG. 3 is a diagram showing an example of the configuration of the estimation device according to the first embodiment of the present disclosure.
FIG. 4 is a diagram showing another example of the configuration of the estimation device according to the first embodiment of the present disclosure.
FIG. 5 is a diagram showing still another example of the configuration of the estimation device according to the first embodiment of the present disclosure.
FIG. 6 is a flowchart showing an example of the operation of the multi-value label complementing unit shown in FIG. 2.
FIG. 7 is a flowchart showing an example of the operation of the estimation device shown in FIG. 3.
FIG. 8 is a flowchart showing an example of the operation of the estimation device shown in FIG. 4.
FIG. 9 is a flowchart showing an example of the operation of paragraph range estimation by the estimation device shown in FIG. 5.
FIG. 10 is a flowchart showing an example of the operation of topic estimation by the estimation device shown in FIG. 5.
FIG. 11 is a diagram for explaining the learning of the binary classification model and the multi-value classification model.
FIG. 12 is a diagram for explaining topic estimation by the estimation device shown in FIG. 3.
FIG. 13 is a diagram for explaining topic estimation by the estimation device shown in FIG. 3.
FIG. 14 is a diagram for explaining topic estimation by the estimation device shown in FIG. 4.
FIG. 15 is a diagram for explaining topic estimation by the estimation device shown in FIG. 5.
FIG. 16 is a diagram showing an example of the configuration of the estimation device according to the second embodiment of the present disclosure.
FIG. 17 is a flowchart showing an example of the operation of the estimation device shown in FIG. 16.
FIG. 18 is a diagram for explaining topic estimation by the estimation device shown in FIG. 16.
FIG. 19 is a diagram showing a configuration example of the learning data creation device according to the third embodiment of the present disclosure.
FIG. 20 is a diagram showing a configuration example of the learning data creation unit shown in FIG. 19.
FIG. 21 is a diagram showing a configuration example of the combination generation unit shown in FIG. 20.
FIG. 22 is a diagram showing a configuration example of the granting unit shown in FIG. 20.
FIG. 23 is a flowchart showing an example of the operation of the combination generation unit shown in FIG. 21.
FIG. 24 is a flowchart showing an example of the operation of the granting unit shown in FIG. 22.
FIG. 25 is a diagram showing a configuration example of the estimation device according to the third embodiment of the present disclosure.
FIG. 26 is a diagram showing a configuration example of the input unit shown in FIG. 25.
FIG. 27 is a diagram showing a configuration example of the estimation unit shown in FIG. 25.
FIG. 28 is a diagram showing a configuration example of the combination generation unit shown in FIG. 27.
FIG. 29 is a flowchart showing an example of the operation of the switching estimation unit shown in FIG. 27.
FIG. 30A is a diagram for explaining an example of the operation from the division of a sentence to the generation of combination ID strings by the estimation unit shown in FIG. 27.
FIG. 30B is a diagram for explaining an example of the operation from estimation using the estimation model to the output of the estimation result by the estimation unit shown in FIG. 27.
FIG. 31A is a diagram for explaining another example of the operation from the division of a sentence to the generation of combination ID strings by the estimation unit shown in FIG. 27.
FIG. 31B is a diagram for explaining another example of the operation from estimation using the estimation model to the output of the estimation result by the estimation unit shown in FIG. 27.
FIG. 32 is a diagram showing another configuration example of the learning device according to the present disclosure.
FIG. 33 is a flowchart showing an example of the operation of the learning device shown in FIG. 32.
FIG. 34 is a diagram showing an example of the hardware configuration of the estimation device shown in FIG. 3.
 以下、本開示の実施の形態について図面を参照して説明する。 Hereinafter, embodiments of the present disclosure will be described with reference to the drawings.
 (第1の実施形態) (First Embodiment)
 まず、本開示の概要について説明する。 First, the outline of the present disclosure will be described.
 系列データを構成する発話においては、語句が省略されることが比較的多いため、発話の長さ、すなわち、単語数が少なくなる場合がある。また、話題の種類が少なくても、話題同士が類似していたり、話題の出現順序が不定であったりする場合がある。これらの場合にも話題の分類が可能な分類モデルを構築するためには、教師データの準備に多くのコストがかかってしまう。 In the utterances that make up the series data, words and phrases are relatively often omitted, so the length of the utterance, that is, the number of words may be reduced. Moreover, even if there are few types of topics, the topics may be similar to each other or the order of appearance of the topics may be indefinite. Even in these cases, it takes a lot of cost to prepare teacher data in order to construct a classification model capable of classifying topics.
 複数の話題を含む対話の系列データにおける話題を推定するためには、話の切り替わり(区切り)から次の切り替わりの直前の発話までの段落または話の切り替わりから対話の終わりの発話までの段落の範囲を推定することが有効である。段落の範囲を推定することができれば、その段落に含まれる発話に範囲を限定して話題を推定することができるので、話題の推定をより高精度に行うことができる。 In order to estimate a topic in the series data of a dialogue containing multiple topics, the range of the paragraph from the change of story (separation) to the utterance immediately before the next change or the paragraph from the change of story to the utterance at the end of the dialogue. It is effective to estimate. If the range of a paragraph can be estimated, the topic can be estimated by limiting the range to the utterances included in the paragraph, so that the topic can be estimated with higher accuracy.
 本開示は、オペレータとカスタマとの対話といった、複数の話題を含む対話の系列データにおける、話の切り替わりから次の切り替わりの直前の発話までの段落または話の切り替わりから対話の終わりの発話までの段落の範囲の推定、および、段落における話題の推定に関する。 The present disclosure is a paragraph from one story change to the utterance immediately before the next switch or a paragraph from the story switch to the end of the dialogue in a series of dialogue data containing multiple topics, such as a dialogue between an operator and a customer. Regarding the estimation of the range of and the estimation of the topic in the paragraph.
 以下では、コンタクトセンタにおけるオペレータとカスタマとの対話を例として考える。オペレータが主導して対話を進めるケースとして、カスタマが抱えている問題を解決するに当たり、オペレータが現在の状況あるいはこれまでの経緯などをカスタマに問診しながら原因を突き止めていくケース、オペレータがカスタマの状況についてインタビューを行いながら業務の手続きに必要な書類を作成するケースなどが存在する。 In the following, the dialogue between the operator and the customer at the contact center will be considered as an example. As a case where the operator takes the initiative in conducting dialogue, when solving the problem that the customer has, the operator asks the customer about the current situation or the history so far to find out the cause, and the operator is the customer. There are cases where documents necessary for business procedures are created while conducting interviews about the situation.
 上述したようなケースの対話では、オペレータが質問している内容の単位を1つの話題と捉えることができる。ただし、多くの話題の種類から最も適切な話題の種類を一意に決定することは難しい。また、上述したような対話における話題はいずれも特定の業務に関連した範囲の話題であり、ある話題と別の話題とが類似していることが多い。そして、類似している話題を区別することは難しい。そのため、対話全体を、話題ごとの一連のまとまりに分割するのは困難である。 In the dialogue in the case described above, the unit of the content that the operator is asking can be regarded as one topic. However, it is difficult to uniquely determine the most appropriate topic type from many topic types. In addition, all the topics in the dialogue as described above are topics in the range related to a specific business, and one topic and another topic are often similar. And it is difficult to distinguish between similar topics. Therefore, it is difficult to divide the entire dialogue into a series of topics.
 しかしながら、オペレータが次の話に移る際には、オペレータは、「このたび」、「では」、「あと」といった、話が切り替わることをカスタマに伝える語句を発することが多い。また、話が終わる際には、オペレータは、カスタマの発話を受けて、「かしこまりました」、「承知いたしました」といった、話が終わることをカスタマに伝える語句を発することが多い。これらの語句は、話の内容に依存しないため、話の切り替わり(話の区切り)を検出する上で有用である。 However, when the operator moves on to the next story, the operator often utters words such as "this time", "in", and "after" to tell the customer that the story will change. In addition, at the end of the talk, the operator often receives the customer's utterance and utters words such as "smart" and "acknowledged" to inform the customer that the talk is over. Since these words do not depend on the content of the story, they are useful for detecting the change of story (break of story).
 本開示においては、例えば、上述した話の切り替わりを示す語句などを利用して、系列データにおける発話が、話の切り替わり発話であるか否かを判定するルールを作成する。そして、本開示においては、作成したルールに基づき、系列データにおける発話が、話の切り替わりの発話であるか否かを判定する。また、本開示においては、例えば、話の切り替わりの発話には、話の切り替わりであることを示すラベルを付与し、その他の発話には、話の切り替わりの発話でないことを示すラベルを付与した教師データに基づき、話の切り替わりの発話であるか否かを判定するモデルを作成し、作成したモデルの判定の結果を用いて、話の切り替わりから次の切り替わりの直前の発話までの段落または話の切り替わりから対話の終わりの発話までの段落の範囲を推定する。また、本開示においては、段落あるいは段落に含まれる発話における話題を推定する。対話に多くの話題あるいは類似した内容の話題が含まれている場合であっても、話の切り替わりから次の切り替わりの直前の発話までの段落または話の切り替わりから対話の終わりの発話までの段落の範囲を推定することができれば、その段落に含まれる発話に絞って話題を推定することができるので、より精度の高い話題の推定が可能となる。 In the present disclosure, for example, a rule for determining whether or not the utterance in the series data is a story switching utterance is created by using the above-mentioned words and phrases indicating the story switching. Then, in the present disclosure, it is determined whether or not the utterance in the series data is the utterance of the switching of the talk, based on the created rule. Further, in the present disclosure, for example, a teacher who assigns a label indicating that the utterance of the story change is a story change utterance and a label indicating that the other utterances are not the story change utterances. Based on the data, create a model that determines whether or not the utterance is a change of story, and use the judgment result of the created model to describe the paragraph or story from the change of story to the utterance immediately before the next change. Estimate the range of paragraphs from the transition to the utterance at the end of the dialogue. In addition, in this disclosure, the topic in the paragraph or the utterance contained in the paragraph is estimated. Even if the dialogue contains many topics or similar topics, the paragraph from one talk to the utterance immediately before the next one or the paragraph from the talk to the end of the dialogue If the range can be estimated, the topic can be estimated by focusing on the utterances included in the paragraph, so that the topic can be estimated with higher accuracy.
 上述したように、本開示においては、予め学習されたモデルを用いて、系列データを構成する発話が、話の切り替わりの発話であるか否かを判定する。また、本開示においては、段落における話題の推定に、教師データに基づき学習されたモデルを用いてもよい。まず、これらのモデルの学習について説明する。 As described above, in the present disclosure, it is determined whether or not the utterances constituting the series data are the utterances of the switching of the talks by using the model learned in advance. Further, in the present disclosure, a model learned based on teacher data may be used for estimating the topic in the paragraph. First, the learning of these models will be described.
 系列データを構成する発話が、話題の切り替わりの発話であるか否かを判定するモデルを用いて、系列データを構成する発話が、話題の切り替わりの発話であるか否かを判定し、その判定結果を用いて、段落の範囲を推定してもよい。ただし、系列データを構成する発話が話題の切り替わりの発話であるか否かを判定するモデルの作成のためには、系列データを構成する発話ごとに話題を示す多値ラベルが付与された教師データが必要となる。通常、そのような教師データを作成することは、手間がかかり、困難であることが多い。そこで、本実施形態においては、系列データを構成する発話が、話の切り替わりの発話であるか否かを判定し、その判定結果を用いて、段落の範囲を推定する。ただし、系列データを構成する発話ごとに話題を示す多値ラベルが付与された教師データを用意することができれば、話題の切り替わりに基づき、段落の範囲を推定してもよい。従って、本開示における「話の切り替わり」は、「話題の切り替わり」も含む概念である。 Using a model that determines whether or not the utterances that make up the series data are utterances that switch topics, it is determined whether or not the utterances that make up the series data are utterances that switch topics, and that determination is made. The results may be used to estimate the range of paragraphs. However, in order to create a model that determines whether or not the utterances that make up the series data are utterances that switch topics, teacher data with a multi-valued label that indicates the topic for each utterance that makes up the series data. Is required. Creating such teacher data is usually laborious and often difficult. Therefore, in the present embodiment, it is determined whether or not the utterance constituting the series data is an utterance in which the story is switched, and the range of the paragraph is estimated using the determination result. However, if teacher data with a multi-valued label indicating a topic can be prepared for each utterance constituting the series data, the range of paragraphs may be estimated based on the change of topics. Therefore, the "switching of stories" in the present disclosure is a concept including "switching of topics".
 図1は、系列データを構成する発話が話の切り替わりの発話であるか否かを判定する二値分類モデル1を学習する学習装置10の構成例を示す図である。 FIG. 1 is a diagram showing a configuration example of a learning device 10 for learning a binary classification model 1 for determining whether or not an utterance constituting the series data is an utterance of switching talks.
 図1に示す学習装置10は、入力部11と、二値分類学習部12とを備える。 The learning device 10 shown in FIG. 1 includes an input unit 11 and a binary classification learning unit 12.
The input unit 11 receives series data of a dialogue containing a plurality of topics. The series data is, for example, text data obtained by speech recognition of the time-series utterances of an operator and a customer. The series data input to the input unit 11 may be in utterance units, or in division units obtained by dividing the utterances (for example, word units, character units, or units delimited by periods). When the series data is input online, the input unit 11 may sequentially receive the text data obtained by speech recognition of each utterance during the dialogue. When the series data is input offline, the input unit 11 may receive the text data of the utterances sorted by the start time or end time of each utterance in the dialogue.
The input unit 11 also receives binary labels (switching labels), assigned to the utterances constituting the series data or to their division units, indicating whether each utterance is a change of talk. A binary label is, for example, a label such as "1 (a change of talk)" or "0 (not a change of talk)", or "True (a change of talk)" or "False (not a change of talk)". Alternatively, if some label indicating a change of talk is attached to an utterance or its division unit, the input unit 11 may treat it as "True (a change of talk)", and if no such label is attached, it may treat it as "False (not a change of talk)".
The binary labels are manually assigned in advance to the utterances constituting the series data or to their division units. As mentioned above, there are words and phrases that are often uttered at a change of talk, and the binary labels are assigned, for example, on the basis of such words and phrases. Taking equipment failure as an example, if one only wants to classify whether or not an utterance concerns an equipment failure, the topic of any utterance about an equipment failure is "equipment failure" regardless of the cause. On the other hand, if one wants to classify topics according to the cause of the failure, each cause corresponds to a different topic. Therefore, depending on how the topics to be classified are defined, the topic may not change even where the talk is divided. For this reason, when assigning binary labels, it is preferable to assign a label indicating a change of talk to any utterance, or division unit, that may be a change of talk, even if it is an utterance that transitions from one topic to the same topic. Doing so increases the number of positive examples of talk-switching utterances and improves the accuracy of determining talk-switching utterances.
In this way, the input unit 11 receives series data of a dialogue containing a plurality of topics, together with binary labels (first labels), assigned to the utterances constituting the series data or to their division units, indicating whether each is a change of talk. The input unit 11 outputs the input series data and binary labels to the binary classification learning unit 12.
The binary classification learning unit 12 performs learning using the series data and binary labels output from the input unit 11 as teacher data, and learns binary classification model 1 (a first model) that determines whether an utterance in the series data is a talk-switching utterance. Binary classification model 1 is therefore a model learned in advance, on the basis of teacher data (first teacher data), for the utterances constituting series data of a dialogue containing a plurality of topics or their division units. The teacher data (first teacher data) used to learn binary classification model 1 is data in which a binary label indicating whether an utterance is a change of talk is assigned to each utterance, or each division unit, constituting the series data of a dialogue containing a plurality of topics. For model learning, an LSTM or the like, which is suited to learning time-series data, can be used.
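The following is a minimal sketch, not the patent's implementation, of how binary classification model 1 could be realized: an LSTM that reads a dialogue as a sequence of utterance vectors and outputs, per utterance, a score for whether it is a talk-switching utterance. The utterance encoder (a mean of word embeddings), the vocabulary size, and all hyperparameters are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class SwitchDetector(nn.Module):
    def __init__(self, vocab_size=30000, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, 1)  # one switch logit per utterance

    def forward(self, dialog_word_ids):
        # dialog_word_ids: (batch, n_utterances, n_words) of word IDs
        word_vecs = self.embed(dialog_word_ids)   # (B, U, W, E)
        utt_vecs = word_vecs.mean(dim=2)          # crude utterance encoding
        hidden, _ = self.lstm(utt_vecs)           # (B, U, H) over the dialogue
        return self.out(hidden).squeeze(-1)       # (B, U) switch logits

model = SwitchDetector()
loss_fn = nn.BCEWithLogitsLoss()                  # binary "switch / not switch" labels
x = torch.randint(1, 30000, (2, 10, 20))          # 2 dialogues, 10 utterances, 20 words
y = torch.randint(0, 2, (2, 10)).float()          # manually assigned binary labels
loss = loss_fn(model(x), y)
loss.backward()
```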
As described above, in the teacher data used to learn binary classification model 1, a label indicating a change of talk is assigned to any utterance, or division unit, that may be a change of talk, including utterances that transition from one topic to the same topic. Consequently, with binary classification model 1 learned from such teacher data, an utterance may be determined to be a talk-switching utterance even when, depending on how the topics to be classified are defined, the topic has not changed and the utterance lies within a section in which utterances related to the same topic continue.
Next, the configuration of a learning device 20 that learns multi-value classification model 2 for classifying (estimating) topics will be described with reference to FIG. 2.
As shown in FIG. 2, the learning device 20 includes an input unit 21, a multi-value label complementing unit 22, and a multi-value classification learning unit 23.
The input unit 21 receives series data of a dialogue containing a plurality of topics. The input unit 21 also receives binary labels, assigned to the utterances constituting the series data or to their division units, indicating whether each is a change of talk. In addition, the input unit 21 receives multi-valued labels (second labels) indicating the ranges of the series data over which a single topic continues and the topic of each such range. The series data and binary labels are the same as those input to the input unit 11 shown in FIG. 1. The multi-valued labels are assigned manually: in the series data, a range over which a single topic continues is identified, and a multi-valued label indicating the topic of that range is selected from among the labels of the plurality of topics. The binary labels and multi-valued labels for one piece of series data may be input as separate files or together in a single file.
The input unit 21 outputs the input series data, binary labels, and multi-valued labels to the multi-value label complementing unit 22.
The multi-value label complementing unit 22 generates, from the series data, binary labels, and multi-valued labels input from the input unit 21, teacher data (second teacher data) for learning multi-value classification model 2. Specifically, the multi-value label complementing unit 22 assigns, to each utterance (or division unit) labeled as a change of talk, a multi-valued label indicating the topic of the range that contains the utterance. As described above, when binary labels are assigned as teacher data, a label indicating a change of talk is assigned to any utterance, or division unit, that may be a change of talk, including utterances that transition from one topic to the same topic. Therefore, even an utterance within a range in which utterances related to the same topic continue may carry a label indicating a change of talk. The multi-value label complementing unit 22 also assigns to such an utterance, or its division unit, a multi-valued label indicating the topic of the range that contains it. Doing so increases the teacher data for utterances related to each topic and improves the accuracy of topic estimation.
The multi-value label complementing unit 22 outputs the utterances (or division units) to which it has assigned multi-valued labels, together with the assigned multi-valued labels, to the multi-value classification learning unit 23.
The multi-value classification learning unit 23 learns multi-value classification model 2 (a second model) using, as teacher data (second teacher data), the utterances or division units output from the multi-value label complementing unit 22 and the multi-valued labels assigned to them. Multi-value classification model 2 is therefore a model learned in advance, on the basis of teacher data (second teacher data), for the utterances constituting series data or their division units. The teacher data used to learn multi-value classification model 2 is generated from series data in which talk-switching utterances, or their division units, carry binary labels indicating a change of talk and in which the ranges over which a topic continues, and the topic of each range, have been identified, by assigning to each utterance (or division unit) labeled as a change of talk a multi-valued label indicating the topic of the range containing that utterance.
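The following is a minimal sketch, under illustrative assumptions rather than the patent's actual implementation, of multi-value classification model 2: an LSTM encoder over the words of one talk-switching utterance (or of a whole paragraph) followed by a softmax over topic labels. NUM_TOPICS and all hyperparameters are assumptions.

```python
import torch
import torch.nn as nn

NUM_TOPICS = 8  # illustrative number of topic labels

class TopicClassifier(nn.Module):
    def __init__(self, vocab_size=30000, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, NUM_TOPICS)

    def forward(self, word_ids):
        # word_ids: (batch, n_words) for one utterance or one paragraph
        emb = self.embed(word_ids)
        _, (h_n, _) = self.lstm(emb)      # final hidden state summarizes the text
        return self.out(h_n[-1])          # (batch, NUM_TOPICS) topic logits

model = TopicClassifier()
loss_fn = nn.CrossEntropyLoss()
x = torch.randint(1, 30000, (4, 50))      # 4 labeled utterances, 50 words each
y = torch.randint(0, NUM_TOPICS, (4,))    # complemented multi-valued labels
loss = loss_fn(model(x), y)
loss.backward()
```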
Next, the configuration of the estimation device 30 according to the present embodiment will be described with reference to FIG. 3. In series data of a dialogue containing a plurality of topics, such as a dialogue between an operator and a customer, the estimation device 30 according to the present embodiment estimates the range of each paragraph, that is, the range from a change of talk to the utterance immediately before the next change, or from a change of talk to the last utterance of the dialogue, and estimates the topic of each paragraph.
As shown in FIG. 3, the estimation device 30 according to the present embodiment includes an input unit 31, a determination unit 32, a paragraph estimation unit 33, a topic estimation unit 34, and an output unit 35.
The input unit 31 receives series data containing a plurality of topics. The series data input to the input unit 31 is the data to be processed, for which the paragraph ranges and the topics of the paragraphs are to be estimated. The series data is, for example, text data obtained by speech recognition of the time-series utterances of an operator and a customer. When the series data is input online, the input unit 31 may sequentially receive the text data obtained by speech recognition of each utterance during the dialogue. When the series data is input offline, the input unit 31 may receive the text data of the utterances sorted by the start time or end time of each utterance in the dialogue. The input unit 31 outputs the input series data to the determination unit 32.
Using binary classification model 1 (the first model), the determination unit 32 determines whether each utterance constituting the series data output from the input unit 31 is a talk-switching utterance, and outputs the result of the determination to the paragraph estimation unit 33. As described above, binary classification model 1 is a model learned in advance on the basis of teacher data (first teacher data) in which a binary label indicating whether an utterance is a change of talk is assigned to each utterance, or division unit, constituting series data of a dialogue containing a plurality of topics.
Based on the result of the determination by the determination unit 32, the paragraph estimation unit 33 estimates, in the series data, the range of each paragraph from a change of talk to the utterance immediately before the next change, or from a change of talk to the last utterance of the dialogue. Specifically, the paragraph estimation unit 33 estimates as one paragraph the range from an utterance determined by the determination unit 32 to be a talk-switching utterance up to the utterance immediately before the next utterance determined to be a talk-switching utterance. As described above, in the teacher data used to learn binary classification model 1, even an utterance within a range in which utterances related to the same topic continue may carry a label indicating a change of talk. Therefore, the paragraph estimation unit 33 may divide a range in which utterances related to the same topic continue into a plurality of paragraphs.
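The following is a minimal sketch of the grouping performed by the paragraph estimation unit 33, assuming a simple list-based data layout (not the patent's code): a paragraph runs from each detected talk switch up to the utterance just before the next switch, or to the end of the dialogue.

```python
from typing import List

def split_into_paragraphs(utterances: List[str],
                          is_switch: List[bool]) -> List[List[str]]:
    """is_switch[i] is True when utterance i was judged to be a talk-switching utterance."""
    paragraphs: List[List[str]] = []
    current: List[str] = []
    for utt, switch in zip(utterances, is_switch):
        if switch and current:          # a new paragraph starts at each switch
            paragraphs.append(current)
            current = []
        current.append(utt)
    if current:                         # paragraph running to the end of the dialogue
        paragraphs.append(current)
    return paragraphs

# Example: switches at utterances 0, 2 and 4 yield three paragraphs.
paras = split_into_paragraphs(["u0", "u1", "u2", "u3", "u4"],
                              [True, False, True, False, True])
# paras == [["u0", "u1"], ["u2", "u3"], ["u4"]]
```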
Using multi-value classification model 2 (the second model), the topic estimation unit 34 estimates the topic of each paragraph whose range has been estimated by the paragraph estimation unit 33, or of the utterances included in the paragraph. As described above, multi-value classification model 2 is a model learned in advance on the basis of teacher data in which multi-valued labels indicating the topics to which utterances relate are assigned to the utterances constituting series data or to their division units. The teacher data used to learn multi-value classification model 2 is generated from series data in which talk-switching utterances, or their division units, carry binary labels indicating a change of talk and in which the ranges over which a topic continues, and the topic of each range, have been identified. Specifically, the teacher data is generated by assigning, to each utterance or division unit in that series data carrying a binary label indicating a change of talk, a multi-valued label indicating the topic of the range containing that utterance.
The output unit 35 outputs, for each paragraph whose range has been estimated in the series data, the utterances constituting the paragraph. The output unit 35 may also output the multi-valued label indicating the topic of the paragraph, the start time and end time of the paragraph, and the like.
As described above, in the present embodiment, the determination unit 32 uses binary classification model 1, learned in advance on the basis of teacher data in which binary labels indicating whether an utterance is a change of talk are assigned to the utterances constituting series data of a dialogue containing a plurality of topics or to their division units, to determine whether each utterance constituting the series data to be processed is a talk-switching utterance. The paragraph estimation unit 33 then estimates the ranges of the paragraphs in the series data to be processed based on the result of the determination by the determination unit 32. The topic estimation unit 34 uses multi-value classification model 2 to estimate the topic of each paragraph whose range has been estimated by the paragraph estimation unit 33, or of the utterances included in the paragraph. The output unit 35 outputs the utterances of each paragraph whose range has been estimated, the multi-valued label indicating the topic of the paragraph, the start time and end time of the paragraph, and the like.
In the present embodiment, by learning teacher data in which binary labels indicating whether an utterance is a change of talk are assigned to utterances or their division units, the learning device 10 can generate binary classification model 1, which determines whether an utterance constituting series data is a talk-switching utterance. By learning teacher data in which multi-valued labels indicating the related topics are assigned to the utterances constituting series data or their division units, the learning device 20 can learn multi-value classification model 2, which determines the topic of a paragraph or of the utterances included in a paragraph. The estimation device 30 can estimate the ranges of the paragraphs in the series data based on the determinations made by binary classification model 1, and can use multi-value classification model 2 to estimate the topic of each paragraph whose range has been estimated or of the utterances constituting the paragraph. Therefore, according to the estimation device 30 of the present embodiment, the ranges of paragraphs, each running from a change of talk to the utterance immediately before the next change or from a change of talk to the last utterance of the dialogue, can be estimated from series data of a dialogue containing a plurality of topics. Furthermore, according to the estimation device 30 of the present embodiment, by estimating the paragraph ranges in the series data, the topic can be estimated using only the utterances included in each paragraph, which improves the accuracy of topic estimation.
Although FIG. 3 has been described using an example in which the estimation device 30 estimates topics using multi-value classification model 2, the present disclosure is not limited to this. As described above, learning multi-value classification model 2 requires teacher data in which the ranges of the series data over which a single topic continues, and the topic of each range, have been identified manually. When only a small number of topics are targeted, preparing such teacher data is relatively easy. On the other hand, when a large number of topics are targeted, it may be difficult to prepare teacher data that identifies the ranges over which a single topic continues and the topics of those ranges. In the present disclosure, topics can also be estimated in such cases without using multi-value classification model 2.
FIG. 4 is a diagram showing a configuration example of an estimation device 30a according to the present embodiment that estimates topics without using multi-value classification model 2. In FIG. 4, the same components as in FIG. 3 are given the same reference numerals, and their description is omitted.
As shown in FIG. 4, the estimation device 30a includes an input unit 31, a determination unit 32, a paragraph estimation unit 33, a keyword extraction unit 36, a topic estimation unit 34a, and an output unit 35. The estimation device 30a shown in FIG. 4 differs from the estimation device 30 shown in FIG. 3 in that the keyword extraction unit 36 is added and the topic estimation unit 34 is replaced with the topic estimation unit 34a.
The keyword extraction unit 36 extracts at least one keyword from the utterances included in a paragraph whose range has been estimated by the paragraph estimation unit 33. Any keyword extraction method can be used; for example, an existing method such as tf-idf (Term Frequency - Inverse Document Frequency) can be used. The number of keywords extracted by the keyword extraction unit 36 may be limited to a predetermined number in advance, or may be specified by the user.
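The following is a minimal sketch of one possible realization of the keyword extraction in unit 36 (not the patent's implementation): tf-idf over the paragraphs of a dialogue, taking the top-k terms of one paragraph as its keywords. The default whitespace tokenizer and the top_k value are assumptions; for Japanese text, a morphological analyzer would normally supply the tokens.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

def extract_keywords(paragraph_texts, target_index, top_k=3):
    """paragraph_texts: one string per paragraph (its utterances joined by spaces)."""
    vectorizer = TfidfVectorizer()
    tfidf = vectorizer.fit_transform(paragraph_texts)   # (n_paragraphs, n_terms)
    terms = vectorizer.get_feature_names_out()
    row = tfidf[target_index].toarray().ravel()
    top = row.argsort()[::-1][:top_k]                   # highest tf-idf first
    return [terms[i] for i in top if row[i] > 0]

keywords = extract_keywords(
    ["the router keeps rebooting after the firmware update",
     "i would like to change my billing address",
     "the invoice amount for last month looks wrong"],
    target_index=2)
```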
Based on the keywords extracted by the keyword extraction unit 36 from the utterances included in a paragraph, the topic estimation unit 34a estimates the topic of that paragraph or of the utterances included in it. The topic estimation unit 34a may, for example, take an extracted keyword itself as the topic of the paragraph or of the utterances included in the paragraph. Alternatively, the topic estimation unit 34a may, for example, estimate, from among a plurality of predefined topics, the topic most similar to the extracted keywords as the topic of the paragraph or of the utterances included in the paragraph.
In this way, according to the estimation device 30a shown in FIG. 4, the topic of a paragraph or of the utterances included in a paragraph can be estimated without using multi-value classification model 2. Therefore, even when it is difficult to prepare a large amount of teacher data in which topic ranges and the topics of those ranges are identified, the topics in series data can be estimated.
FIG. 5 is a diagram showing a configuration example of an estimation device 30b according to the present embodiment. Like the estimation device 30a shown in FIG. 4, the estimation device 30b shown in FIG. 5 estimates topics without using multi-value classification model 2. In FIG. 5, the same components as in FIG. 4 are given the same reference numerals, and their description is omitted.
As shown in FIG. 5, the estimation device 30b includes an input unit 31, a determination unit 32, a paragraph estimation unit 33, a clustering unit 37, a keyword extraction unit 36b, a topic estimation unit 34b, and an output unit 35. The estimation device 30b shown in FIG. 5 differs from the estimation device 30a shown in FIG. 4 in that the clustering unit 37 is added, the keyword extraction unit 36 is replaced with the keyword extraction unit 36b, and the topic estimation unit 34a is replaced with the topic estimation unit 34b.
At least one piece of series data is input to the estimation device 30b shown in FIG. 5. The clustering unit 37 clusters the plurality of paragraphs whose ranges have been estimated by the paragraph estimation unit 33 for the one or more pieces of input series data, grouping similar paragraphs together. Any existing clustering method can be used. The clustering unit 37 determines a representative paragraph within each cluster of similar paragraphs. For example, the clustering unit 37 may determine the paragraph at the center of a cluster to be the representative paragraph, or it may determine an arbitrary paragraph in the cluster to be the representative paragraph.
The keyword extraction unit 36b extracts keywords from the utterances included in the representative paragraph of each cluster, as determined by the clustering unit 37.
Based on the keywords extracted by the keyword extraction unit 36b from the utterances included in the representative paragraph of a cluster, the topic estimation unit 34b estimates the topic of the paragraphs constituting that cluster. Specifically, the topic estimation unit 34b takes the topic estimated from the keywords extracted from the utterances of the representative paragraph of a cluster as the topic of all paragraphs constituting that cluster.
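The following is a minimal sketch, under illustrative assumptions only, of the processing of units 37, 36b, and 34b: cluster tf-idf vectors of paragraphs with k-means, take the paragraph closest to each cluster center as the representative, use its top tf-idf terms as the cluster's topic, and propagate that topic to every paragraph in the cluster. The number of clusters and the choice of "joined keywords as the topic label" are assumptions, not the patent's prescription.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def cluster_and_label(paragraph_texts, n_clusters=3, top_k=3):
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(paragraph_texts)
    terms = vectorizer.get_feature_names_out()

    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    dists = km.transform(X)              # distance of each paragraph to each center

    topics = [None] * len(paragraph_texts)
    for c in range(n_clusters):
        members = np.where(km.labels_ == c)[0]
        rep = members[np.argmin(dists[members, c])]           # representative paragraph
        row = X[rep].toarray().ravel()
        keywords = [terms[i] for i in row.argsort()[::-1][:top_k] if row[i] > 0]
        topic = " / ".join(keywords)                           # topic label for the cluster
        for i in members:                                      # propagate to the whole cluster
            topics[i] = topic
    return topics
```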
FIGS. 3 to 5 have been described using the example of processing the results of speech recognition of dialogues between operators and customers at a contact center, but the present disclosure is not limited to this. For example, in the estimation devices 30, 30a, and 30b, a morphological analysis unit that performs morphological analysis on text chat may be provided after the input unit 31.
FIGS. 3 to 5 have also been described using an example in which series data in which a plurality of utterances are arranged in chronological order is input, but the present disclosure is not limited to this. In order to input the utterances constituting the series data one at a time, a functional unit that extracts utterances one at a time from the series data may be provided before the input unit 31.
FIG. 6 is a flowchart for explaining the complementation of multi-valued labels in the learning device 20 shown in FIG. 2.
The multi-value label complementing unit 22 reads, one at a time, the utterances of the series data input to the input unit 21, which has been annotated with multi-valued labels indicating topics and binary labels indicating changes of talk (step S11). A multi-valued label is assigned only to the first utterance of the range corresponding to its topic and not to the other utterances, and a binary label indicating a change of talk is assigned only to the utterances that are changes of talk and not to the other utterances.
The multi-value label complementing unit 22 determines whether a multi-valued label indicating a topic is assigned to the read utterance (step S12).
When it determines that a multi-valued label is assigned (step S12: Yes), the multi-value label complementing unit 22 stores that multi-valued label in a multi-value label temporary storage device (not shown), separately from the utterance, so that the multi-valued label of the read utterance can be identified. If a multi-valued label is already stored in the multi-value label temporary storage device, the multi-value label complementing unit 22 updates the stored multi-valued label to the one assigned to the read utterance and stores it in the multi-value label temporary storage device (step S13).
When it determines that no multi-valued label is assigned (step S12: No), or after it has updated and stored the multi-valued label assigned to the read utterance, the multi-value label complementing unit 22 determines whether a binary label indicating a change of talk is assigned to the read utterance (step S14).
When it determines that a binary label indicating a change of talk is assigned (step S14: Yes), the multi-value label complementing unit 22 assigns the multi-valued label stored in the multi-value label temporary storage device to the read utterance (step S15). In this way, when the read utterance carries a binary label indicating a change of talk, the multi-value label complementing unit 22 assigns to it the multi-valued label indicating the topic of the range of the series data that contains the utterance.
When it determines that no binary label indicating a change of talk is assigned (step S14: No), or after it has assigned a multi-valued label to the read utterance, the multi-value label complementing unit 22 determines whether the read utterance is the last utterance of the dialogue (step S16).
When it determines that the read utterance is the last utterance of the dialogue (step S16: Yes), the multi-value label complementing unit 22 ends the processing.
When it determines that the read utterance is not the last utterance of the dialogue (step S16: No), the multi-value label complementing unit 22 returns to the processing of step S11 and reads the next utterance.
FIG. 6 has been described using an example in which a multi-valued label is assigned only to the first utterance of the range corresponding to its topic and not to the other utterances; however, all of the utterances in the range corresponding to a topic may instead be assigned the multi-valued label of that topic in advance. In that case, deleting the multi-valued labels from the utterances that do not carry a binary label indicating a change of talk leaves multi-valued labels indicating topics only on the utterances that carry a binary label indicating a change of talk.
In this way, any method may be used as long as a multi-valued label indicating the topic is assigned to each talk-switching utterance.
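The following is a minimal sketch of the multi-value label complementation of FIG. 6, under an assumed list-based data layout (not the patent's code): walk the dialogue once, keep the most recently seen topic label in a temporary variable (standing in for the temporary storage device), and copy it onto every utterance flagged as a change of talk.

```python
from typing import List, Optional

def complement_topic_labels(topic_labels: List[Optional[str]],
                            is_switch: List[bool]) -> List[Optional[str]]:
    """topic_labels[i] is the manually assigned topic of utterance i (None if absent);
    is_switch[i] is True when utterance i carries the talk-switch binary label."""
    current_topic: Optional[str] = None
    complemented: List[Optional[str]] = []
    for topic, switch in zip(topic_labels, is_switch):
        if topic is not None:          # step S13: remember / update the current topic
            current_topic = topic
        # step S15: a switch utterance inherits the topic of the range containing it
        complemented.append(current_topic if switch else None)
    return complemented

# Example: topic "A" is marked only on the first utterance of its range, yet both
# switch utterances inside that range receive the label "A".
labels = complement_topic_labels(["A", None, None, "B", None],
                                 [True, False, True, True, False])
# labels == ["A", None, "A", "B", None]
```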
Next, the operation of the estimation device 30 shown in FIG. 3 will be described. FIG. 7 is a flowchart showing an example of the operation of the estimation device 30 and is a diagram for explaining the estimation method performed by the estimation device 30.
The determination unit 32 reads utterances one at a time from the series data to be processed that has been input to the input unit 31 (step S21). Using binary classification model 1, the determination unit 32 determines whether the read utterance is a talk-switching utterance (step S22).
The paragraph estimation unit 33 determines whether the read utterance has been determined by the determination unit 32 to be a talk-switching utterance, or whether the read utterance is the last utterance of the dialogue (step S23).
When it determines that the read utterance is neither a talk-switching utterance nor the last utterance of the dialogue (step S23: No), the paragraph estimation unit 33 accumulates the read utterance as an utterance constituting the current paragraph (step S24). Once the read utterance has been accumulated, the processing is repeated from step S21.
When the read utterance has been determined to be a talk-switching utterance, or to be the last utterance of the dialogue (step S23: Yes), the paragraph estimation unit 33 determines whether there are accumulated utterances (step S25).
When it determines that there are accumulated utterances (step S25: Yes), the paragraph estimation unit 33 estimates that the range of the accumulated utterances is a paragraph and outputs the accumulated utterances to the topic estimation unit 34 as the utterances constituting the paragraph. Using multi-value classification model 2, the topic estimation unit 34 estimates the topic of the paragraph whose range has been estimated by the paragraph estimation unit 33 (step S26).
Although FIG. 7 is described using an example in which the topic is estimated for each paragraph using multi-value classification model 2, the present disclosure is not limited to this. The topic estimation unit 34 may estimate the topic in units of at least one utterance included in the paragraph. In that case, the topic estimation unit 34 may estimate the topic using only the first utterance of the paragraph, or using a predetermined number of utterances starting from the first utterance of the paragraph. When the topic is estimated in units of one or more utterances, multi-value classification model 2 is learned on the basis of teacher data in which a multi-valued label is assigned to each unit for which the topic is estimated.
The topic estimation unit 34 assigns a multi-valued label indicating the estimated topic to the paragraph (step S27). The paragraph estimation unit 33 resets the accumulation of utterances (step S28) and determines whether the read utterance is the last utterance of the dialogue (step S29).
When it determines that the read utterance is not the last utterance of the dialogue (step S29: No), the paragraph estimation unit 33 returns to the processing of step S24 and accumulates the read utterance. In this way, the read utterance is accumulated as the first utterance of a new paragraph.
When it determines that the read utterance is the last utterance of the dialogue (step S29: Yes), the paragraph estimation unit 33 ends the processing.
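The following is a minimal sketch, illustrative only and not the patent's code, of the loop of FIG. 7: utterances are read one at a time, the current paragraph is closed at each detected talk switch, and the closed paragraph is labeled with a topic. The callables is_switch_utterance and classify_topic are hypothetical stand-ins for binary classification model 1 and multi-value classification model 2.

```python
from typing import Callable, List, Tuple

def estimate_paragraphs_and_topics(
        utterances: List[str],
        is_switch_utterance: Callable[[str], bool],   # stands in for step S22
        classify_topic: Callable[[List[str]], str]    # stands in for step S26
) -> List[Tuple[List[str], str]]:
    labeled_paragraphs: List[Tuple[List[str], str]] = []
    buffer: List[str] = []                            # accumulated utterances (step S24)
    for utt in utterances:                            # step S21
        if is_switch_utterance(utt) and buffer:       # steps S23, S25
            topic = classify_topic(buffer)            # steps S26, S27
            labeled_paragraphs.append((buffer, topic))
            buffer = []                               # step S28
        buffer.append(utt)
    if buffer:                                        # paragraph ending with the dialogue
        labeled_paragraphs.append((buffer, classify_topic(buffer)))
    return labeled_paragraphs
```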
As described above, the estimation method performed by the estimation device 30 includes a determination step (step S22) and a paragraph estimation step (steps S23 to S25). In the determination step, binary classification model 1 (the first model), learned in advance on the basis of teacher data (first teacher data) in which binary labels (first labels) indicating whether an utterance is a change of talk are assigned to the utterances constituting series data of a dialogue containing a plurality of topics or to their division units, is used to determine whether each utterance constituting the series data to be processed is a talk-switching utterance. In the paragraph estimation step, based on the result of the determination, the ranges of the paragraphs in the series data to be processed are estimated, each paragraph running from a change of talk to the utterance immediately before the next change or from a change of talk to the last utterance of the dialogue.
By learning teacher data in which binary labels indicating whether an utterance is a change of talk are assigned to utterances or their division units, binary classification model 1, which determines whether an utterance constituting series data is a talk-switching utterance, can be generated. The ranges of the paragraphs in the series data to be processed can then be estimated on the basis of the determinations made by binary classification model 1. Therefore, the ranges of paragraphs in series data of a dialogue containing a plurality of topics can be estimated.
The estimation method according to the present embodiment may further include a topic estimation step (step S26). In the topic estimation step, multi-value classification model 2 (the second model), learned in advance on the basis of teacher data in which multi-valued labels (second labels) indicating the topics to which utterances relate are assigned to the utterances constituting series data or to their division units, is used to estimate the topic of each paragraph or of the utterances included in the paragraph. By estimating the paragraph ranges, the topic can be estimated using only the utterances included in each paragraph, which improves the accuracy of topic estimation.
Next, the operation of the estimation device 30a shown in FIG. 4 will be described. FIG. 8 is a flowchart showing an example of the operation of the estimation device 30a shown in FIG. 4 and is a diagram for explaining the estimation method performed by the estimation device 30a. In FIG. 8, the same processing as in FIG. 7 is given the same reference numerals, and its description is omitted.
When it determines that there are accumulated utterances (step S25: Yes), the paragraph estimation unit 33 estimates that the range of the accumulated utterances is a paragraph and outputs the accumulated utterances to the keyword extraction unit 36. The keyword extraction unit 36 extracts keywords from the utterances included in the paragraph whose range has been estimated by the paragraph estimation unit 33 (step S31). Based on the keywords extracted by the keyword extraction unit 36 from the utterances included in the paragraph, the topic estimation unit 34a estimates the topic of that paragraph or of the utterances included in it (step S32).
As described above, the estimation method performed by the estimation device 30a includes a keyword extraction step (step S31) and a topic estimation step (step S32). In the keyword extraction step, keywords are extracted from the utterances included in a paragraph whose range has been estimated. In the topic estimation step, the topic of the paragraph or of the utterances included in the paragraph is estimated based on the keywords extracted from the utterances included in the paragraph.
Next, the operation of the estimation device 30b shown in FIG. 5 will be described. FIG. 9 is a flowchart showing an example of the paragraph range estimation performed by the estimation device 30b shown in FIG. 5 and is a diagram for explaining the estimation method performed by the estimation device 30b. In FIG. 9, the same processing as in FIG. 7 is given the same reference numerals, and its description is omitted.
In the estimation device 30b, when it determines that there are accumulated utterances (step S25: Yes), the paragraph estimation unit 33 estimates that the range of the accumulated utterances is a paragraph. The paragraph estimation unit 33 then resets the accumulation of utterances (step S28).
FIG. 10 is a flowchart showing an example of the topic estimation performed by the estimation device 30b shown in FIG. 5 and is a diagram for explaining the estimation method performed by the estimation device 30b.
The clustering unit 37 reads the paragraphs whose ranges have been estimated by the paragraph estimation unit 33 (step S41). The clustering unit 37 reads a plurality of paragraphs contained in at least one piece of series data; that is, the clustering unit 37 repeats the processing of step S41 as many times as necessary.
The clustering unit 37 clusters the plurality of read paragraphs, grouping similar paragraphs together (step S42).
Next, the clustering unit 37 determines whether there are any unprocessed clusters (step S43). An unprocessed cluster is a cluster whose paragraphs have not yet been assigned multi-valued labels.
When it determines that an unprocessed cluster exists (step S43: No), the clustering unit 37 selects one of the unprocessed clusters as the cluster to be processed and determines a representative paragraph from among the paragraphs included in the cluster to be processed (step S44). For example, the clustering unit 37 determines the paragraph at the center of the cluster to be the representative paragraph.
The keyword extraction unit 36b extracts keywords from the utterances included in the representative paragraph of the cluster determined by the clustering unit 37 (step S45).
The topic estimation unit 34b estimates the topic of the representative paragraph of the cluster based on the keywords extracted by the keyword extraction unit 36b (step S46). Next, the topic estimation unit 34b determines whether there are any unprocessed paragraphs (step S47). An unprocessed paragraph is a paragraph in the cluster to be processed that has not yet been assigned a multi-valued label.
When it determines that there is an unprocessed paragraph (step S47: No), the topic estimation unit 34b assigns to the unprocessed paragraph in the cluster a multi-valued label indicating the topic estimated from the keywords extracted from the representative paragraph of that cluster (step S48). The topic estimation unit 34b then returns to the processing of step S47.
When the topic estimation unit 34b determines that there are no unprocessed paragraphs (step S47: Yes), the processing is repeated from step S43.
As described above, the estimation method performed by the estimation device 30b further includes a clustering step (step S42). In the clustering step, a plurality of paragraphs whose ranges have been estimated from one or more pieces of series data are clustered, grouping similar paragraphs together. In the keyword extraction step, keywords are extracted from the utterances included in the representative paragraph among the paragraphs included in a cluster of similar paragraphs. In the topic estimation step, the topic of the paragraphs constituting the cluster containing the representative paragraph is estimated based on the keywords extracted from the utterances included in the representative paragraph.
Next, the learning of the models (binary classification model 1 and multi-value classification model 2) will be described using the specific example shown in FIG. 11. In the following, the series data is assumed to contain five topics: "topic A", "topic B", "topic C", "topic D", and "topic E".
As shown in FIG. 11, in the series data used as teacher data, the ranges over which a single topic continues and the topic of each range are identified manually, and a multi-valued label indicating the topic of the range is assigned to each range over which a single topic continues. In addition, binary labels indicating whether an utterance is a change of talk are assigned manually to the utterances constituting the series data. In FIG. 11, for simplicity, only the talk-switching utterances are marked as changes of talk. As described above, even within a range in which utterances related to a single topic continue, a binary flag indicating a change of talk is assigned to each talk-switching utterance. Therefore, in FIG. 11, for example, an utterance in the middle of the range in which utterances related to topic A continue may also carry a binary label indicating a change of talk.
The above series data and binary labels are input to the learning device 10, and binary classification model 1 is learned using an LSTM or the like based on the input series data and binary labels.
The above series data, binary labels, and multi-valued labels are also input to the learning device 20. In the learning device 20, the multi-valued labels are complemented; that is, as shown in FIG. 11, each utterance carrying a label indicating a change of talk is assigned a multi-valued label indicating the topic of the range of the series data that contains the utterance. In this way, teacher data is created in which each utterance constituting the series data is assigned a multi-valued label indicating the topic to which the utterance relates. As described above, the multi-valued labels indicating the related topics may instead be assigned to the division units of the utterances constituting the series data.
Multi-value classification model 2 is learned using an LSTM or the like based on the created teacher data. In learning multi-value classification model 2, only the utterances to which multi-valued labels have been assigned may be learned, or the utterances of the entire paragraphs containing those utterances may be learned.
FIG. 12 is a diagram showing an example of topic estimation by the estimation device 30 shown in FIG. 3. In FIG. 12, multi-value classification model 2 is assumed to have been learned in utterance units.
When the series data of one dialogue is input to the estimation device 30, binary classification model 1 is used to determine, as shown in FIG. 12, whether each utterance constituting the series data is a talk-switching utterance. The range from a talk-switching utterance to the utterance immediately before the next talk-switching utterance, or from a talk-switching utterance to the last utterance of the dialogue, is then estimated to be one paragraph.
Next, as shown in FIG. 12, for each utterance determined to be a talk-switching utterance among the utterances included in a paragraph whose range has been estimated, the topic of that utterance is estimated by multi-value classification model 2. Multi-value classification model 2 may also be learned in paragraph units rather than utterance units; in that case, as shown in FIG. 13, the topic is estimated in paragraph units by multi-value classification model 2.
FIG. 14 is a diagram showing an example of topic estimation by the estimation device 30a shown in FIG. 4.
When the series data of one dialogue is input to the estimation device 30a, binary classification model 1 is used to determine, as shown in FIG. 14, whether each utterance constituting the series data is a talk-switching utterance. The range from a talk-switching utterance to the utterance immediately before the next talk-switching utterance is then estimated to be one paragraph.
Next, keywords are extracted from the utterances included in each paragraph whose range has been estimated, the topic of the paragraph is estimated based on the extracted keywords, and a multi-valued label indicating the estimated topic is assigned. In this way, the topic of a paragraph can be estimated without using multi-value classification model 2. Therefore, even when it is difficult to prepare the teacher data required to learn multi-value classification model 2, the topics of the paragraphs contained in the series data can be estimated. Although FIG. 14 shows an example in which different multi-valued labels ("topic 1" to "topic 10") are assigned to the individual paragraphs, this does not necessarily mean that they are all different topics.
FIG. 15 is a diagram showing an example of topic estimation by the estimation device 30b shown in FIG. 5.
When the series data of one or more dialogues is input to the estimation device 30b, binary classification model 1 is used to determine, as shown in FIG. 15, whether each utterance constituting the series data is a talk-switching utterance. The range from a talk-switching utterance to the utterance immediately before the next talk-switching utterance is then estimated to be one paragraph.
Next, as shown in FIG. 15, the plurality of paragraphs whose ranges have been estimated are clustered, grouping similar paragraphs together. A representative paragraph is determined for each cluster of similar paragraphs, and keywords are extracted from the utterances included in the representative paragraph. In FIG. 15, the paragraphs drawn with thick lines are the representative paragraphs.
Next, based on the keywords extracted from the utterances included in the representative paragraph of a cluster, the topic of the representative paragraph is estimated, and a multi-valued label indicating the estimated topic is assigned to the representative paragraph. Furthermore, as shown in FIG. 15, the other paragraphs constituting the cluster are assigned the same multi-valued label as the representative paragraph of the cluster.
To demonstrate the effectiveness of the estimation method according to the present embodiment (hereinafter sometimes referred to as "the present method"), an experimental comparison with a conventional method was performed. In the experiment, 349 calls were used for model learning and 50 calls for validation. As multi-valued labels indicating topics, eight kinds of labels were prepared, indicating topic A to topic H and a fixed topic S covering the portion from the first utterance of a call to the first change of talk. In the conventional method, a binary classification model is learned from teacher data in which binary labels indicating whether an utterance is a change of talk are assigned only to the utterances at which the multi-valued label changes, and a multi-value classification model is learned using only the utterances at which the multi-valued label changes as teacher data.
First, the accuracy of paragraph range estimation (the accuracy of dividing series data into paragraphs) based on the binary classification models' determinations of whether an utterance is a change of talk was compared. The comparison results are shown in Table 1.
[Table 1]
 As described above, the present method estimates paragraph ranges while also treating utterances that transition from one topic to the same topic as switch-of-story utterances. Therefore, as shown in Table 1, the precision of the present method is lower than that of the conventional method. However, the present method can detect paragraphs and switch-of-story utterances that the conventional method failed to detect, so the recall of paragraph segmentation is higher.
 Next, we compared the accuracy of topic estimation by the multi-value classification model for the utterances determined by the binary classification model to be switches of the story. As described above, in the conventional method, the multi-value classification model was generated by training on teacher data in which a multi-value label indicating the topic was manually assigned only to utterances at which the multi-value label changes. In the present method, on the other hand, the multi-value classification model 2 was generated by training on teacher data in which the multi-value labels were complemented for the utterances manually labeled as switches of the story. Using the multi-value classification model trained by the conventional method and the multi-value classification model 2 trained by the present method, the topic was estimated for each utterance that the binary classification model trained by the respective method determined to be a switch-of-story utterance, and the result was compared with the correct topic manually assigned to that utterance. The comparison results (precision) are shown in Table 2.
[Table 2]
 As shown in Table 2, the present method was found to estimate, with high accuracy, the topic of utterances determined to be switch-of-story utterances, including utterances that transition from one topic to the same topic. Topic S was not evaluated, because its switch-of-story utterance is always the first utterance of a call.
 Finally, the classification results (F-measure) for the topics of all utterances were evaluated on the 100 calls targeted for evaluation. This evaluation comprehensively assesses both the determination of switch-of-story utterances by the binary classification model and the topic estimation by the multi-value classification model. In the present method, utterances that transition from one topic to the same topic are also determined to be switch-of-story utterances, and the multi-value classification model 2 classified many of these same-topic transition utterances into the correct topic. Therefore, as shown in Table 3, the present method obtained a higher overall evaluation result than the conventional method.
[Table 3]
 As described above, in the present embodiment, the estimation device 30 includes the determination unit 32 and the paragraph estimation unit 33. The determination unit 32 uses the binary classification model 1 (first model), trained in advance on teacher data (first teacher data) in which a binary label indicating whether or not the story switches is assigned to each utterance, or division unit thereof, constituting series data of a dialogue containing a plurality of topics, to determine whether or not each utterance constituting the series data to be processed is a switch-of-story utterance. Based on the determination result of the determination unit 32, the paragraph estimation unit 33 estimates, in the series data to be processed, the range of each paragraph, that is, from a switch of the story to the utterance immediately before the next switch, or from a switch of the story to the utterance at the end of the dialogue.
 By training on teacher data in which a binary label indicating whether or not the story switches is assigned to each utterance or division unit thereof, the binary classification model 1, which determines whether or not an utterance constituting series data is a switch-of-story utterance, can be generated. Then, based on the determination results of the binary classification model 1, the ranges of the paragraphs in the series data can be estimated. Furthermore, by estimating the paragraph ranges in the series data, the range over which a topic is estimated can be limited to the utterances contained in a paragraph, so the accuracy of topic estimation for each paragraph can be improved.
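 The step of turning per-utterance switch decisions into paragraph ranges can be sketched as follows in Python. The binary classifier is abstracted as a callable that returns True for a switch-of-story utterance; the function name and the (start, end) index representation of a paragraph are illustrative assumptions.

    def estimate_paragraph_ranges(utterances, is_switch):
        """utterances: list of utterance texts in time order.
        is_switch: callable standing in for binary classification model 1;
        returns True if the utterance is a switch-of-story utterance.
        Returns a list of (start_index, end_index) pairs, end inclusive,
        where each pair is one paragraph: from a switch utterance up to the
        utterance just before the next switch (or the end of the dialogue)."""
        switch_indices = [i for i, u in enumerate(utterances) if is_switch(u)]
        if not switch_indices or switch_indices[0] != 0:
            switch_indices = [0] + switch_indices  # the dialogue start opens the first paragraph
        ranges = []
        for k, start in enumerate(switch_indices):
            end = (switch_indices[k + 1] - 1) if k + 1 < len(switch_indices) else len(utterances) - 1
            ranges.append((start, end))
        return ranges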
 (Second embodiment)
 In the first embodiment, the description used an example in which it is determined whether or not each utterance, or division unit thereof, constituting the series data is a switch of the story, and the paragraph ranges are estimated based on the determination results. However, as described above, it may instead be determined whether or not each utterance or division unit thereof constituting the series data is a switch of the topic, and the paragraph ranges may be estimated based on those determination results.
 As described above, a dialogue between an operator and a customer at a contact center can be regarded as series data along the time axis. A method called Text Tiling is known as a method of dividing series data into sections of objectively classified topics (see, for example, Reference 1). In this method, the text is divided at local minima of cohesion, based on the lexical cohesion between words in the vicinity of each point in the text. A method called TopicTiling, which divides text using Latent Dirichlet Allocation (LDA), a representative topic model, has also been proposed (see Reference 2). Further, a method has been proposed that classifies each item of time-series data into the label to which it belongs, based on a model trained from teacher data annotated with predefined classification labels (see Reference 3).
 [Reference 1]
 Tsutomu Hirao, Kei Kitauchi, Tsuyoshi Kitani, "Text Segmentation Based on Lexical Cohesion and Word Importance", IPSJ Journal, 41 (SIG_3 (TOD_6)), pp. 24-36, 2000-05-15.
 [Reference 2]
 M. Riedl and C. Biemann, "TopicTiling: A Text Segmentation Algorithm based on LDA", Proceedings of the 50th ACL 2012, 2012.
 [Reference 3]
 Yuta Tsuboi et al., "Natural Language Processing by Deep Learning", Kodansha, May 24, 2017, pp. 32-36.
 However, in dialogues concerning a specific service or product, such as dialogues at a contact center, the utterances are required to be classified into subjective topics seen from the contact center's point of view, so that various analyses can be performed later, for example whether the response followed a script prepared in advance. Subjective topics are, for example, topics classified from the viewpoint of the operator isolating the cause, on the customer side, of the customer being unable to use a particular service, or from the viewpoint of interviewing the customer about needs or requests in a sales call from the operator to the customer. In these dialogues, the same keywords, such as service names, product names, and related vocabulary, appear throughout the dialogue, so topics that should be distinguished subjectively by content but cannot be distinguished superficially or objectively account for the majority of the dialogue. Therefore, the methods described in References 1 and 2 cannot accurately divide and classify such dialogues by subjective topic.
 Furthermore, in contact center dialogues, some utterances are themselves short, and for some utterances the topic to which they belong cannot be determined uniquely from the utterance alone. Such utterances end up being labeled with a topic different from their true topic. A model trained on teacher data labeled with topics different from the true topics suffers a drop in classification accuracy. Therefore, with the method described in Reference 3, it is difficult to appropriately classify, by subjective topic, each of the utterances, including short ones, input in chronological order.
 In the following, the configuration and operation of an estimation device 30c according to the second embodiment of the present disclosure will be described. The estimation device 30c according to the present embodiment determines whether or not each utterance, or division unit thereof, constituting the series data is a switch of the topic, and estimates the paragraph ranges based on the determination results.
 FIG. 16 is a diagram showing a configuration example of the estimation device 30c according to the present embodiment.
 As shown in FIG. 16, the estimation device 30c according to the present embodiment includes an input unit 41, a determination unit 42, a topic estimation unit 43, a paragraph estimation unit 44, and an output unit 45.
 Series data of a dialogue containing a plurality of topics is input to the input unit 41. The series data input to the input unit 41 is the data to be processed, that is, the data for which the paragraph ranges and the topics of the paragraphs are to be estimated. The series data is, for example, text data obtained by speech recognition of the time-series utterances of an operator and a customer. When the series data is input online, the text data obtained by speech recognition of each utterance during the dialogue may be input to the input unit 41 sequentially. When the series data is input offline, the text data of the utterances may be input to the input unit 41 sorted by the start time or end time of each utterance in the dialogue. The input unit 41 outputs the input series data to the determination unit 42.
 The determination unit 42 uses the binary classification model 1a to determine whether or not each utterance constituting the series data output from the input unit 41 is a switch-of-topic utterance. Here, the binary classification model 1a is a model trained in advance to determine whether or not an utterance, or division unit thereof, constituting series data of a dialogue is a switch of the topic. The binary classification model 1a can be created, for example, by having the learning device 10 described with reference to FIG. 1 learn teacher data in which a binary label (switching label) indicating whether or not the topic switches is assigned to each utterance, or division unit thereof, constituting series data.
 From the determination results using the binary classification model 1a, the determination unit 42 decides whether or not each utterance, or division unit thereof, constituting the series data is to be processed by the topic estimation unit 43 described later. Specifically, the determination unit 42 decides that an utterance, or division unit thereof, determined to be a switch of the topic is to be processed by the topic estimation unit 43. The determination unit 42 outputs the decision results on whether or not each utterance is to be processed by the topic estimation unit 43 to the topic estimation unit 43 and the paragraph estimation unit 44.
 The topic estimation unit 43 uses the multi-value classification model 2a to assign, to each utterance decided by the determination unit 42 to be a processing target (a switch-of-topic utterance), or division unit thereof, a multi-value label indicating the topic of the range that includes that utterance. Here, the multi-value classification model 2a is a model that estimates, for an utterance or division unit thereof, the topic of the range that includes that utterance. The multi-value classification model 2a can be created, for example, by having the learning device 20 described with reference to FIG. 2 learn teacher data in which a multi-value label (topic label) indicating the topic to which the utterance relates is assigned to each utterance, or division unit thereof, constituting series data. In training the multi-value classification model 2a, the learning of topic transitions may be performed only on utterances that are switches of the topic and to which a multi-value label has been assigned. By excluding from the training target the utterances between one switch-of-topic utterance and the next, noise affecting the topic classification can be removed.
 The topic estimation unit 43 stores the topic estimation result (the multi-value label corresponding to the estimated topic) in a label information table. The label information table is an area for storing the topic estimation result for the data being processed, and may be a memory on a computer, a database, or a file.
 The paragraph estimation unit 44 estimates that the range from an utterance decided by the determination unit 42 to be a processing target (a switch-of-topic utterance) to the utterance immediately before the next utterance decided to be a processing target is one paragraph. The paragraph estimation unit 44 assigns the multi-value label stored in the label information table to the utterances included in the paragraph whose range has been estimated. Specifically, the paragraph estimation unit 44 assigns, to the utterances from a switch-of-topic utterance to the utterance immediately before the next switch-of-topic utterance, the multi-value label stored in the label information table that was assigned to that switch-of-topic utterance.
 The output unit 45 outputs, for each paragraph whose range has been estimated in the series data, the utterances constituting that paragraph. The output unit 45 may also output the multi-value label indicating the topic of the paragraph, the start time and end time of the paragraph, and the like.
 As in the first embodiment, the estimation device 30c may be provided with a morphological analysis unit, following the input unit 41, that performs morphological analysis on text chat. When the series data to be processed is input offline, the estimation device 30c may be configured to estimate the paragraph ranges using all of the results of the switch-of-topic determination and the topic estimation at once. In this case, the paragraph estimation unit 44 may assign the multi-value label estimated by the topic estimation unit 43 to the utterances in the range from a switch of the topic to the utterance immediately before the next switch of the topic, based on the determination results of whether each utterance is a switch of the topic and on the topic estimation results.
 FIG. 17 is a flowchart showing an example of the operation of the estimation device 30c according to the present embodiment.
 The determination unit 42 determines whether or not the dialogue in the series data to be processed, input to the input unit 41, has ended (step S51).
 When it is determined that the dialogue has ended (step S51: Yes), the estimation device 30c ends the processing.
 When it is determined that the dialogue has not ended (step S51: No), the determination unit 42 reads the utterance to be processed (step S52). The determination unit 42 uses the binary classification model 1a to determine whether or not the read utterance is a switch-of-topic utterance (step S53).
 When it is determined that the read utterance is not a switch-of-topic utterance (step S54: No), the processing of step S57, described later, is performed.
 When it is determined that the read utterance is a switch-of-topic utterance (step S54: Yes), the topic estimation unit 43 estimates the topic of the read utterance using the multi-value classification model 2a (step S55). The topic estimation unit 43 stores the estimated topic in the label information table and updates the label information table (step S56). That is, the label information table is updated each time the read utterance is a switch-of-topic utterance.
 The paragraph estimation unit 44 assigns the multi-value label stored in the label information table to the read utterance (step S57). As described above, the label information table is updated each time the read utterance is a switch-of-topic utterance. Therefore, the same multi-value label is assigned to the utterances constituting one paragraph, from a switch-of-topic utterance to the utterance immediately before the next switch-of-topic utterance.
 When the multi-value label has been assigned to the read utterance, the determination unit 42 sets the next utterance in the series data as the processing target (step S58) and returns to the processing of step S51.
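 A minimal sketch of this online loop (steps S51 to S58), assuming the two models are available as callables, is shown below in Python. The callable names is_topic_switch and estimate_topic, and the use of a single variable as the label information table, are illustrative assumptions.

    def label_utterances_online(utterance_stream, is_topic_switch, estimate_topic):
        """utterance_stream: iterable yielding utterances in time order (S51/S52/S58).
        is_topic_switch: stands in for binary classification model 1a (S53/S54).
        estimate_topic: stands in for multi-value classification model 2a (S55).
        Yields (utterance, topic_label) pairs."""
        label_info = None  # label information table holding the current paragraph's topic
        for utterance in utterance_stream:
            if is_topic_switch(utterance):              # S54: Yes
                label_info = estimate_topic(utterance)  # S55, S56: update the table
            yield utterance, label_info                 # S57: assign the stored label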
 FIG. 18 is a diagram showing an example of topic estimation by the estimation device 30c according to the present embodiment. In FIG. 18, it is assumed that the binary classification model 1a and the multi-value classification model 2a have been trained in units of utterances.
 When the series data of one dialogue is input to the estimation device 30c, the determination unit 42 uses the binary classification model 1a, as shown in FIG. 18, to determine whether or not each utterance constituting the series data is a switch-of-topic utterance. The topic estimation unit 43 uses the multi-value classification model 2a to estimate the topic of each utterance determined to be a switch of the topic, and stores the multi-value label indicating the estimated topic in the label information table. The paragraph estimation unit 44 estimates the range from a switch-of-topic utterance to the utterance immediately before the next switch-of-topic utterance as one paragraph. Then, the paragraph estimation unit 44 assigns, to all the utterances constituting that paragraph, the multi-value label stored in the label information table that indicates the topic of the first utterance of that paragraph.
 As described above, in the present embodiment, the estimation device 30c uses the binary classification model 1a to determine whether or not each utterance constituting the series data is a switch-of-topic utterance. The estimation device 30c also uses the multi-value classification model 2a to estimate the topic of each switch-of-topic utterance. Further, the estimation device 30c estimates the range from a switch-of-topic utterance to the utterance immediately before the next switch-of-topic utterance as a paragraph, and estimates the topic estimated for the switch-of-topic utterance to be the topic of the paragraph that includes that utterance.
 As a result, even in dialogues in which similar topics account for the majority or in which the order of topics is indefinite, switch-of-topic utterances can be detected and the multi-value labels to be assigned to those utterances can be estimated. Therefore, the utterances from a switch-of-topic utterance to the utterance immediately before the next switch-of-topic utterance can be estimated to be a paragraph consisting of one topic.
 (Third embodiment)
 In the first and second embodiments described above, the model for determining whether or not an utterance is a switch of the story (topic) and the model for estimating the topic were created in units of utterances or their division units. As described above, a division unit of an utterance is, for example, a word unit obtained by dividing the utterance into words. A division unit of an utterance is also, for example, when punctuation marks are attached to the utterance, a unit obtained by dividing the utterance at punctuation marks or at sentence-ending periods. Further, in the first and second embodiments described above, when estimating the topic of an utterance, the topic was estimated in units of the utterance or of a predetermined division unit. That is, in the first and second embodiments, the division unit of the utterance was fixed.
 However, in a dialogue between a customer and an agent at a contact center, the topic (scene) does not necessarily switch at a predetermined unit. For example, when a contact center handles a car accident, the response history may be recorded with the scene of confirming whether there was an injury separated from the scene of confirming the damage to the car. In the following, the dialogue between a customer and an agent shown in utterances 1 to 4 is used as an example of dividing a dialogue into a scene of confirming whether there was an injury and a scene of confirming the damage to the car.
 Agent: "I heard that you had an accident when putting your car into the garage. What was the situation?" (utterance 1)
 Customer: "When I was putting it into the garage, the rear bumper of the car hit a utility pole and got scratched." (utterance 2)
 Agent: "I see, so you scraped the rear bumper of the car against a utility pole when putting it into the garage. Were you yourself all right?" (utterance 3)
 Customer: "I was not injured." (utterance 4)
 In the above example, utterance 1 and utterance 2 are utterances in the scene of confirming the damage to the car. In the middle of utterance 3, the scene switches from confirming the damage to the car to confirming whether there was an injury, and the scene of confirming whether there was an injury continues into utterance 4. Specifically, the part of utterance 3 up to "I see, so you scraped the rear bumper of the car against a utility pole when putting it into the garage," belongs to the scene of confirming the damage to the car, and the part from "Were you yourself all right?" belongs to the scene of confirming whether there was an injury.
 In the first and second embodiments, the unit must be decided in advance and the learning data prepared accordingly. It is therefore difficult to create a model that handles cases in which the scene switches in the middle of an utterance, as in utterance 3 above. In the example of utterance 3, it is desirable to assign, to the unit "I see, so you scraped the rear bumper of the car against a utility pole when putting it into the garage," a label indicating the scene of confirming the damage to the car, and to the unit "Were you yourself all right?" a label indicating the scene of confirming whether there was an injury; however, it is difficult to determine such units in advance.
 For example, if division at punctuation marks is adopted, utterance 3 is divided into units such as "I see," "when putting it into the garage," "so you scraped the rear bumper of the car against a utility pole," "were you yourself" and "all right?". However, with units such as "I see," "were you yourself" and "all right?" alone, it is not possible to identify what kind of scene they belong to, and it is difficult to assign a label to them.
 When learning data is created by joining predetermined units, it is possible to create learning data by joining "I see," "when putting it into the garage," and "so you scraped the rear bumper of the car against a utility pole" into one unit, and joining "were you yourself" and "all right?" into another unit. However, it is difficult to create learning data by judging whether other joined units, such as "I see," "when putting it into the garage," or "I see, when putting it into the garage," should be treated as negative examples.
 Furthermore, when estimating the point at which the story (topic) switches in the middle of an utterance, it is difficult to determine the unit of the utterance before the estimation.
 In the present embodiment, the unit of learning is not fixed; positive examples, negative examples, and out-of-scope learning data are dynamically created in various units from the teacher data. That is, in the present embodiment, the learning data is created with a variable division unit of the utterance. In this way, even when the story (scene) switches in the middle of an utterance, learning data can be created for training a model capable of estimating the switching point with high accuracy. Furthermore, by using a model trained on learning data created without fixing the unit of learning, each scene within an utterance can be estimated even when the scene switches in the middle of the utterance.
 FIG. 19 is a diagram showing a configuration example of a learning data creation device 50 according to the present embodiment. The learning data creation device 50 according to the present embodiment dynamically creates positive examples, negative examples, and out-of-scope learning data in various units from the teacher data.
 As shown in FIG. 19, the learning data creation device 50 according to the present embodiment includes an input unit 51, a learning data creation unit 52, and an output unit 53.
 Series data of a dialogue is input to the input unit 51. The series data is, for example, voice data of a time-series dialogue between an operator and a customer, or text data obtained by speech recognition of the utterances included in that dialogue. The input unit 51 outputs the input series data to the learning data creation unit 52.
 The series data output from the input unit 51 and teacher data are input to the learning data creation unit 52. The teacher data is data in which, before the learning data is created, labels are assigned to the minimum ranges of utterances necessary for identifying a scene in the utterances constituting the series data. The labels in the teacher data are assigned manually. Based on the input series data and teacher data, the learning data creation unit 52 creates learning data used for training a model that estimates the topic (scene) of an utterance in arbitrary division units of the utterance.
 FIG. 20 is a diagram showing a configuration example of the learning data creation unit 52.
 As shown in FIG. 20, the learning data creation unit 52 includes a sentence output unit 521, an ID assignment unit 522, a combination generation unit 523, and an assignment unit 524.
 The sentence output unit 521 outputs, as sentences, the character strings of the utterances constituting the series data input from the input unit 51. When the series data is text data, the sentence output unit 521 outputs sentences divided into word units by morphological analysis. When the series data is voice data, the sentence output unit 521 outputs sentences divided into word units by speech recognition.
 The ID assignment unit 522 generates, from the sentences output from the sentence output unit 521, elements obtained by dividing the utterance according to a predetermined rule. The unit of division (the unit of an element) used by the ID assignment unit 522 may be any identifiable unit, such as a word unit, a punctuation unit, a speech recognition unit, or an end-of-speech unit. The ID assignment unit 522 assigns an ID to each element obtained by dividing the utterance, and stores the IDs assigned to the elements in an ID set.
 The combination generation unit 523 generates the combinations of IDs (combination ID sequences) necessary for training the model, based on the IDs stored in the ID set.
 FIG. 21 is a diagram showing a configuration example of the combination generation unit 523.
 As shown in FIG. 21, the combination generation unit 523 includes an ID extraction unit 5231, a combination target ID storage unit 5232, a combination generation ID storage unit 5233, and a combination ID generation unit 5234.
 The ID extraction unit 5231 extracts, from the ID set, the IDs of a predetermined longest unit and stores them in a longest-unit ID set. Here, the longest unit is a unit longer than the unit into which the sentence was divided when output by the sentence output unit 521, and may be any unit that can be specified in advance. For example, if the unit of division when outputting the sentence is a word unit, the longest unit is a unit longer than a word unit, such as a punctuation unit or a sentence unit. Also, for example, if the unit of division when outputting the sentence is a punctuation unit, the longest unit is a unit longer than a punctuation unit, such as a sentence unit or a speech recognition unit.
 The combination target ID storage unit 5232 extracts, from the longest-unit ID set, the IDs of the range to be combined and stores them in a combination target ID set.
 The combination generation ID storage unit 5233 acquires, from the combination target ID set, the combination generation IDs for generating combination ID sequences, and stores them in a combination generation ID set.
 The combination ID generation unit 5234 generates combination ID sequences based on the combination generation ID set, stores them in the set of combination ID sequences, and updates the set of combination ID sequences.
 Referring again to FIG. 20, the combination generation unit 523 outputs the generated combination ID sequences to the assignment unit 524.
 The combination ID sequences output from the combination generation unit 523 and the teacher data are input to the assignment unit 524. For each division unit obtained by replacing a combination ID sequence with its character string, the assignment unit 524 assigns, based on the teacher data, a label indicating a positive example, a negative example, or exclusion from learning, and thereby creates the learning data.
 FIG. 22 is a diagram showing a configuration example of the assignment unit 524.
 As shown in FIG. 22, the assignment unit 524 includes a positive example assignment unit 5241, a negative example assignment unit 5242, and an out-of-scope assignment unit 5243.
 Based on the teacher data, the positive example assignment unit 5241 assigns a label indicating a positive example to predetermined ID sequences in the set of combination ID sequences. In this way, a label indicating a positive example is assigned to the division units obtained by replacing those ID sequences with their character strings.
 The negative example assignment unit 5242 assigns a label indicating a negative example to predetermined ID sequences in the set of combination ID sequences. In this way, a label indicating a negative example is assigned to the division units obtained by replacing those ID sequences with their character strings.
 The out-of-scope assignment unit 5243 assigns, to predetermined ID sequences in the set of combination ID sequences, a label indicating that they are excluded from learning. In this way, a label indicating exclusion is assigned to the division units obtained by replacing those combination ID sequences with their character strings. The out-of-scope assignment unit 5243 deletes the combination ID sequences labeled as excluded from learning, and outputs, as learning data, the division units corresponding to the combination ID sequences labeled as positive or negative examples together with their labels. The details of the operation of the assignment unit 524 will be described later.
 Referring again to FIG. 19, the output unit 53 outputs the learning data created by the learning data creation unit 52.
 Next, the operation of the learning data creation unit 52 will be described. In the following, the case of creating learning data for training a model that determines whether or not a unit is a switch of the scene (story) is described as an example. Specifically, since utterance 3 described above contains a switch of the scene, utterance 3 is used as the example. In the following, the label "T" is assigned to a range determined to be a switch of the scene, and the label "F" is assigned to a range not determined to be a switch of the scene. The division unit of the sentence is the punctuation unit, and the longest unit is the sentence unit. As teacher data, it is assumed that the label "T" is assigned to the range of utterance 3 determined to be a switch of the scene ("were you yourself all right?").
 The ID assignment unit 522 divides utterance 3 at punctuation marks and assigns an ID to each element obtained by the division. In the following, it is assumed that the ID assignment unit 522 assigns IDs as follows.
  ID1: "I see,"
  ID2: "when putting it into the garage,"
  ID3: "so you scraped the rear bumper of the car against a utility pole,"
  ID4: "were you yourself"
  ID5: "all right?"
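 A minimal sketch of this splitting and ID assignment, assuming punctuation-based division and Python, is shown below; the delimiter set and the dictionary representation of the ID set are illustrative assumptions.

    import re

    def assign_ids(utterance, delimiters=r"[、。,.?？!！]"):
        """Split an utterance after each punctuation mark and assign sequential
        IDs (1, 2, ...) to the resulting elements. Returns a dict: ID -> element."""
        pieces = re.split(f"({delimiters})", utterance)
        # Re-attach each delimiter to the text that precedes it.
        elements, buf = [], ""
        for piece in pieces:
            buf += piece
            if re.fullmatch(delimiters, piece):
                elements.append(buf)
                buf = ""
        if buf.strip():
            elements.append(buf)
        return {i + 1: elem.strip() for i, elem in enumerate(elements)}

    # Applied to utterance 3 above, this would yield IDs 1 to 5.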
 The ID assignment unit 522 stores the IDs assigned to the elements of the utterance in the ID set.
 The combination generation unit 523 creates, from the ID set, the combinations (ID sequences) of the punctuation-divided elements within the range of the predetermined longest unit. The operation of the combination generation unit 523 will be described with reference to FIG. 23. FIG. 23 is a flowchart showing an example of the operation of the combination generation unit 523.
 The ID extraction unit 5231 extracts all the IDs for each longest unit from the ID set and stores them in the longest-unit ID set (step S61). As described above, since the longest unit is the sentence unit, the range of the longest unit is ID1 to ID5. The ID extraction unit 5231 extracts ID1 to ID5 from the ID set and stores (1, 2, 3, 4, 5) in the longest-unit ID set.
 The combination target ID storage unit 5232 deletes the smallest ID among the IDs stored in the longest-unit ID set from the longest-unit ID set, and stores it in the combination target ID set (step S62). In the example described above, the combination target ID storage unit 5232 takes ID1 out of the longest-unit ID set and stores it in the combination target ID set. The combination target ID storage unit 5232 also deletes ID1 from the longest-unit ID set. Therefore, (2, 3, 4, 5) is stored in the longest-unit ID set.
 The combination generation ID storage unit 5233 arranges all the IDs included in the combination target ID set in ascending order, and stores the result in the combination generation ID set and in the set of combination ID sequences (step S63). In the example described above, since (1) is stored in the combination target ID set, the sequence obtained by arranging all the IDs in ascending order is [1]. The combination generation ID storage unit 5233 stores (1) in the combination generation ID set and [1] in the set of combination ID sequences.
 The combination ID generation unit 5234 deletes the smallest ID from the ID sequence stored in the combination generation ID set, arranges the remaining IDs in ascending order, and stores the result in the set of combination ID sequences (step S64). In the example described above, (1) is stored in the combination generation ID set. Therefore, the combination ID generation unit 5234 deletes the smallest ID, ID1.
 Next, the combination ID generation unit 5234 determines whether or not the combination generation ID set is empty (step S65). In the example described above, since ID1 has been deleted, the combination generation ID set is empty.
 If it determines that the combination generation ID set is not empty (step S65: No), the combination ID generation unit 5234 repeats the processing of step S64.
 When the combination ID generation unit 5234 determines that the combination generation ID set is empty (step S65: Yes), the combination target ID storage unit 5232 determines whether or not the longest-unit ID set is empty (step S66). In the example described above, since (2, 3, 4, 5) is stored in the longest-unit ID set, the longest-unit ID set is not empty.
 If it determines that the longest-unit ID set is not empty (step S66: No), the combination target ID storage unit 5232 returns to the processing of step S62. In the example described above, since (2, 3, 4, 5) is stored in the longest-unit ID set, the combination target ID storage unit 5232 takes out the smallest ID, ID2, and stores it in the combination target ID set. The combination target ID storage unit 5232 also deletes ID2 from the longest-unit ID set. Therefore, (3, 4, 5) is stored in the longest-unit ID set.
 Thereafter, the processing of steps S63 and S64 is performed, and (1, 2) is stored in the combination target ID set. In addition, the ID sequence obtained by arranging all the IDs stored in the combination target ID set in ascending order is stored in the combination generation ID set and in the set of combination ID sequences. Since (1, 2) is stored in the combination target ID set, the sequence obtained by arranging all the IDs in ascending order is [1, 2], and (1, 2) is stored in the combination generation ID set. The sequence [1, 2] is also added to the set of combination ID sequences, which becomes ([1], [1, 2]).
 The combination ID generation unit 5234 deletes the smallest ID from the ID sequence stored in the combination generation ID set, arranges the remaining IDs in ascending order, and stores the result in the set of combination ID sequences. In the example described above, (1, 2) is stored in the combination generation ID set. Therefore, the combination ID generation unit 5234 deletes the smallest ID, ID1. After ID1 is deleted, (2) remains in the combination generation ID set. Since (2) remains, the combination ID generation unit 5234 stores [2] in the set of combination ID sequences. Therefore, the set of combination ID sequences becomes ([1], [1, 2], [2]).
 Thereafter, the same processing is repeated until the longest-unit ID set becomes empty. When the longest-unit ID set has become empty, the following ID sequences are stored in the set of combination ID sequences. In this way, the combination generation unit 523 generates combination ID sequences each consisting of one element, or a plurality of consecutive elements, obtained by dividing the utterance according to the predetermined rule.
  [1]
  [1,2]
  [2]
  [1,2,3]
  [2,3]
  [3]
  [1,2,3,4]
  [2,3,4]
  [3,4]
  [4]
  [1,2,3,4,5]
  [2,3,4,5]
  [3,4,5]
  [4,5]
  [5]
 When the combination target ID storage unit 5232 determines that the longest-unit ID set is empty (step S66: Yes), the ID extraction unit 5231 determines whether or not there are any IDs in the ID set that have not been stored in the longest-unit ID set (step S67).
 If it determines that there are IDs that have not been stored in the longest-unit ID set (step S67: Yes), the ID extraction unit 5231 returns to the processing of step S61.
 When it is determined that there are no IDs that have not been stored in the longest-unit ID set (step S67: No), the combination generation unit 523 ends the processing.
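 The net effect of steps S61 to S67 on one longest unit is to enumerate every run of consecutive element IDs. A compact functional sketch in Python is shown below; it produces the same fifteen sequences as the worked example for IDs 1 to 5. The nested-loop formulation is an illustrative simplification of the flowchart, not a transcription of it.

    def generate_combination_id_sequences(longest_unit_ids):
        """Return every contiguous run of IDs within one longest unit,
        in the same order as the worked example: for each end position,
        all runs that finish there, from the longest to the shortest."""
        ids = sorted(longest_unit_ids)
        sequences = []
        for end in range(len(ids)):
            for start in range(end + 1):
                sequences.append(ids[start:end + 1])
        return sequences

    print(generate_combination_id_sequences([1, 2, 3, 4, 5]))
    # [[1], [1, 2], [2], [1, 2, 3], [2, 3], [3], ..., [4, 5], [5]]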
 Next, the operation of the assignment unit 524 will be described with reference to FIG. 24. FIG. 24 is a flowchart showing an example of the operation of the assignment unit 524.
 The positive example assignment unit 5241 assigns a label indicating a positive example to every ID sequence, among the ID sequences included in the set of combination ID sequences generated by the combination generation unit 523, whose range matches the teacher data (step S71). As described above, it is assumed that, as teacher data, the label "T" is assigned to the range of utterance 3 determined to be a switch of the scene ("were you yourself all right?"). Therefore, the positive example assignment unit 5241 assigns the label indicating a positive example ("T") to the ID sequence [4,5], which covers the same range as "were you yourself all right?" in utterance 3.
 The negative example assignment unit 5242 assigns a label indicating a negative example to every combination ID sequence, among the ID sequences included in the set of combination ID sequences, that contains none of the IDs included in an ID sequence labeled as a positive example (step S72). In the example described above, the ID sequence [4,5] is labeled as a positive example. Therefore, the negative example assignment unit 5242 assigns the label indicating a negative example ("F") to all of the following combination ID sequences, which contain neither ID4 nor ID5.
  [1]: F
  [1,2]: F
  [2]: F
  [1,2,3]: F
  [2,3]: F
  [3]: F
 The out-of-scope assignment unit 5243 assigns a label indicating exclusion to every combination ID sequence, among the ID sequences included in the set of combination ID sequences, that has been assigned neither the label indicating a positive example nor the label indicating a negative example (step S73). In the example described above, the out-of-scope assignment unit 5243 assigns the label indicating exclusion to the following combination ID sequences.
  [1,2,3,4]: excluded
  [2,3,4]: excluded
  [3,4]: excluded
  [4]: excluded
  [1,2,3,4,5]: excluded
  [2,3,4,5]: excluded
  [3,4,5]: excluded
  [5]: excluded
 The out-of-scope assignment unit 5243 deletes, from the set of combination ID sequences, the combination ID sequences to which the label indicating exclusion has been assigned. Then, the out-of-scope assignment unit 5243 stores, in the learning data, the division units corresponding to the combination ID sequences to which the label indicating a positive example or a negative example has been assigned. In the example described above, the division units corresponding to the following combination ID sequences are stored in the learning data.
  [1]: F
  [1,2]: F
  [2]: F
  [1,2,3]: F
  [2,3]: F
  [3]: F
  [4,5]: T
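 A minimal sketch of steps S71 to S73, assuming the teacher data is given as the set of IDs covering the manually labeled positive range and reusing the generate_combination_id_sequences sketch shown earlier, is given below in Python. The function name and the label strings "T" and "F" mirror the worked example; the representation of the teacher data as an ID set is an illustrative assumption.

    def label_combination_sequences(sequences, positive_ids):
        """sequences: list of combination ID sequences (lists of ints).
        positive_ids: set of IDs covering the range labeled 'T' in the teacher data.
        Returns {tuple(sequence): 'T' or 'F'}; excluded sequences are simply dropped."""
        labeled = {}
        for seq in sequences:
            seq_ids = set(seq)
            if seq_ids == positive_ids:          # S71: exact match with the teacher range
                labeled[tuple(seq)] = "T"
            elif not (seq_ids & positive_ids):   # S72: shares no ID with the positive range
                labeled[tuple(seq)] = "F"
            # S73: everything else is out of scope and not kept
        return labeled

    sequences = generate_combination_id_sequences([1, 2, 3, 4, 5])
    print(label_combination_sequences(sequences, {4, 5}))
    # {(1,): 'F', (1, 2): 'F', (2,): 'F', (1, 2, 3): 'F', (2, 3): 'F', (3,): 'F', (4, 5): 'T'}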
 As described above, the learning data creation device 50 according to the present embodiment creates learning data by assigning labels to division units each consisting of one element, or a plurality of consecutive elements, obtained by dividing an utterance according to a predetermined rule (for example, the punctuation unit). Here, in the present embodiment, the learning data includes division units with different numbers of constituent elements.
 Therefore, even when the scene (story) switches in the middle of an utterance, learning data can be created in division units of the utterance that correspond to that switch. Furthermore, by training on the learning data created in this way, a model can be created that can estimate a switch of the scene with high accuracy even when the scene (story) switches in the middle of an utterance.
 次に、本実施形態に係る推定装置30dについて説明する。本実施形態に係る推定装置30dは、学習データ作成装置50により作成された学習データに基づいて学習したモデルを用いて、構成する要素の数が異なる発話の分割単位で、場面(話)の切り替わりを推定するものである Next, the estimation device 30d according to the present embodiment will be described. The estimation device 30d according to the present embodiment uses a model trained based on the training data created by the training data creation device 50, and switches scenes (story) in utterance division units having different numbers of constituent elements. Is to estimate
 FIG. 25 shows a configuration example of the estimation device 30d according to the present embodiment.
 As shown in FIG. 25, the estimation device 30d according to the present embodiment includes an input unit 61, an estimation unit 62, and an output unit 63.
 Series data of a dialogue is input to the input unit 61. As shown in FIG. 26, the input unit 61 includes a sentence output unit 611. Like the sentence output unit 521, the sentence output unit 611 outputs the character strings of the utterances constituting the series data input to the input unit 61 to the estimation unit 62 as sentences. When the series data is text data, the sentence output unit 611 outputs sentences divided into words by morphological analysis; when the series data is speech data, the sentence output unit 611 outputs sentences divided into words by speech recognition.
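 For text input, the word segmentation performed by the sentence output unit 611 could be realized with an off-the-shelf morphological analyzer. The sketch below assumes the MeCab analyzer and its Python binding (mecab-python3) are installed; this toolkit choice is an assumption of the illustration, not something specified by the disclosure.

```python
# Minimal sketch of word segmentation for text input (assumes mecab-python3 and a
# MeCab dictionary are installed; not part of this disclosure).
import MeCab

tagger = MeCab.Tagger("-Owakati")  # output surface forms separated by spaces

def to_word_sequence(utterance: str) -> list[str]:
    return tagger.parse(utterance).split()

print(to_word_sequence("信号で止まっている時に、追突されたと伺っております。"))
```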
 Referring again to FIG. 25, the estimation unit 62 uses the estimation model 3 to estimate story changes from the sentences output by the input unit 61. The estimation model 3 is a model created by training on the learning data created by the learning data creation device 50. As described above, that learning data contains division units made up of different numbers of elements, each labeled as to whether or not it is a change of story. The estimation model 3 is therefore a model trained in advance to determine, for division units with differing numbers of elements, whether each is a change of story. The estimation unit 62 generates division units with differing numbers of elements from the utterances constituting the series data to be processed and, for each generated division unit, uses the estimation model 3 as the first model to determine whether it is a change of story.
 The output unit 63 outputs the estimation result obtained by the estimation unit 62.
 Next, the configuration of the estimation unit 62 is described. FIG. 27 shows a configuration example of the estimation unit 62.
 As shown in FIG. 27, the estimation unit 62 includes an ID assigning unit 621, a combination generation unit 622, and a switching estimation unit 623.
 The ID assigning unit 621 generates, from the sentences output by the sentence output unit 611, elements obtained by splitting each utterance according to a predetermined rule. The unit of splitting used by the ID assigning unit 621 may be any identifiable unit, such as words, punctuation-delimited segments, speech recognition units, or end-of-speech units. The ID assigning unit 621 assigns an ID to each element of the split utterance and stores the assigned IDs in an ID set.
 The combination generation unit 622 generates, based on the IDs stored in the ID set, combinations of IDs (combination ID strings) to be used for estimating story changes.
 FIG. 28 shows a configuration example of the combination generation unit 622. As shown in FIG. 28, the combination generation unit 622 includes an ID extraction unit 6221, a combination target ID storage unit 6222, a combination generation ID storage unit 6223, and a combination ID generation unit 6224.
 Like the ID extraction unit 5231, the ID extraction unit 6221 extracts the IDs of a predetermined longest unit from the ID set and stores them in a longest-unit ID set.
 Like the combination target ID storage unit 5232, the combination target ID storage unit 6222 extracts from the longest-unit ID set the IDs in the range to be combined and stores them in a combination target ID set.
 Like the combination generation ID storage unit 5233, the combination generation ID storage unit 6223 acquires, from the combination target ID set, the combination generation IDs used to generate combination ID strings and stores them in a set of combination generation IDs.
 Like the combination ID generation unit 5234, the combination ID generation unit 6224 generates combination ID strings based on the set of combination generation IDs, stores them in the set of combination ID strings, and updates that set.
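 Although the generation procedure itself is the one described with reference to FIG. 23 and is not reproduced here, the ten combination ID strings listed later for a four-element utterance (FIG. 30A) correspond exactly to every run of consecutive IDs, grouped by their last ID. A minimal sketch under that assumption:

```python
# Sketch: generate combination ID strings as all runs of consecutive element IDs,
# grouped by their last ID; this reproduces the ten strings of FIG. 30A for ids = [1, 2, 3, 4].
# It is an illustration inferred from that example, not the procedure of FIG. 23 itself.
def generate_combination_id_strings(ids: list[int]) -> list[list[int]]:
    strings = []
    for end in range(len(ids)):
        for start in range(end + 1):
            strings.append(ids[start:end + 1])
    return strings

print(generate_combination_id_strings([1, 2, 3, 4]))
# [[1], [1, 2], [2], [1, 2, 3], [2, 3], [3], [1, 2, 3, 4], [2, 3, 4], [3, 4], [4]]
```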
 Referring again to FIG. 27, the combination generation unit 622 outputs the generated set of combination ID strings to the switching estimation unit 623.
 The set of combination ID strings output from the combination generation unit 622 is input to the switching estimation unit 623. Using the estimation model 3, the switching estimation unit 623 determines, for each division unit corresponding to a combination ID string, whether that division unit is a change of story, and outputs the determination result.
 Next, the operation of the estimation unit 62 is described, focusing on the operation of the switching estimation unit 623. The generation of combination ID strings by the combination generation unit 622 is the same as the operation of the combination generation unit 523 described with reference to FIG. 23, and its description is therefore omitted.
 FIG. 29 is a flowchart showing an example of the operation of the switching estimation unit 623.
 The switching estimation unit 623 takes out, from the set of combination ID strings, one combination ID string consisting only of IDs for which it has not yet been estimated whether they are part of a change of story (step S81).
 The switching estimation unit 623 replaces the extracted combination ID string with a word string (step S82). That is, it replaces each ID in the combination ID string with the utterance element corresponding to that ID.
 Next, using the estimation model 3, the switching estimation unit 623 estimates whether the character string obtained by this replacement (the utterance division unit) is a change of story (step S83).
 Next, the switching estimation unit 623 determines whether the estimation result was a positive example, that is, a change of story (step S84).
 If the result was not a positive example (step S84: No), the switching estimation unit 623 determines whether the set of combination ID strings is empty (step S85).
 If the set of combination ID strings is not empty (step S85: No), the switching estimation unit 623 returns to step S81.
 If the set of combination ID strings is empty (step S85: Yes), the switching estimation unit 623 outputs the estimation result for each ID via the output unit 63 (step S86) and ends the processing.
 If the result was a positive example (step S84: Yes), the switching estimation unit 623 determines whether the set of combination ID strings still contains a combination ID string consisting only of IDs for which a change of story has not yet been estimated (step S87).
 If such a combination ID string exists (step S87: Yes), the switching estimation unit 623 returns to step S81.
 If no such combination ID string exists (step S87: No), the switching estimation unit 623 outputs, for each ID, the estimation result and the estimation unit via the output unit 63 (step S88) and ends the processing.
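 Read together, steps S81 to S88 amount to scanning candidate division units until one is estimated to be a change of story and then continuing with combination ID strings made only of IDs whose result is still undetermined. The sketch below is one possible reading of that loop; the abstract model(text) callable stands in for the estimation model 3, and the function name and bookkeeping are assumptions introduced for illustration.

```python
# Sketch of the loop in FIG. 29 (steps S81-S88); `model(text) -> bool` stands in for
# the estimation model 3 and returns True when a division unit is a change of story.
def estimate_switches(elements, model):
    ids = list(range(1, len(elements) + 1))
    # all runs of consecutive IDs, in the same order as FIG. 30A
    pending = [ids[s:e + 1] for e in range(len(ids)) for s in range(e + 1)]
    results = {i: None for i in ids}   # per-ID result; True once part of a positive unit
    units = {}                         # per-ID estimation unit (the positive combination ID string)
    while True:
        # S81/S87: combination ID strings made only of IDs not yet estimated as positive
        candidates = [s for s in pending if all(results[i] is None for i in s)]
        if not candidates:
            break                                        # S87: No -> S88 (output per ID)
        seq = candidates[0]
        pending.remove(seq)
        text = "".join(elements[i - 1] for i in seq)     # S82: replace IDs with the word string
        if model(text):                                  # S83/S84: change of story?
            for i in seq:
                results[i] = True
                units[i] = seq
        elif not pending:
            break                                        # S85: Yes -> S86 (output per ID)
    return results, units
```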
 The operation of the estimation unit 62 is further described below using a specific example.
 Consider the following utterance as an example.
 Utterance: 「信号で止まっている時に、追突されたと伺っておりますが、お怪我は、大丈夫でしょうか。」 ("I understand you were rear-ended while stopped at a traffic light; are you all right?")
 As shown in FIG. 30A, the ID assigning unit 621 splits the above utterance into four elements at punctuation marks and assigns IDs (ID1 to ID4) to the elements. The combination generation unit 622 generates combination ID strings by the process described with reference to FIG. 23. In the example shown in FIG. 30A, the combination generation unit 622 generates ten combination ID strings ([1], [1,2], [2], [1,2,3], [2,3], [3], [1,2,3,4], [2,3,4], [3,4], [4]).
 The switching estimation unit 623 takes one combination ID string from the generated set and estimates whether the division unit corresponding to it is a change of story. As shown in FIG. 30B, the switching estimation unit 623 estimates the division units corresponding to the combination ID strings in the set, one after another, until one is estimated to be a positive example (a change of story). Suppose that the division units corresponding to the combination ID strings [1], [1,2], [2], [1,2,3], [2,3], [3], [1,2,3,4] and [2,3,4] are estimated not to be positive examples, and that the division unit corresponding to [3,4] is estimated to be a positive example.
 Since no combination ID string consisting only of IDs that have not yet been estimated remains, the switching estimation unit 623 outputs, for each ID, the estimation result and the estimation unit via the output unit 63. Because the division unit corresponding to the combination ID string [3,4] was estimated to be a positive example, the switching estimation unit 623 outputs, as shown in FIG. 30B, that the estimation result for ID3 and ID4 is a positive example and that the unit estimated to be positive (the estimation unit) is the combination string [3,4].
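 Running the sketch above on this four-element split reproduces the outcome of FIG. 30B when a stand-in model flags only the last two segments; the stub below is purely illustrative and is not the trained estimation model 3.

```python
# Illustrative stub reproducing the FIG. 30B outcome with the estimate_switches sketch above.
elements = ["信号で止まっている時に、", "追突されたと伺っておりますが、", "お怪我は、", "大丈夫でしょうか。"]

def stub_model(text: str) -> bool:
    # pretends only "お怪我は、大丈夫でしょうか。" is a change of story
    return text == elements[2] + elements[3]

results, units = estimate_switches(elements, stub_model)
print(results)   # {1: None, 2: None, 3: True, 4: True}
print(units)     # {3: [3, 4], 4: [3, 4]}
```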
 The operation of the estimation unit 62 is further described using another specific example.
 Consider the following utterance as an example.
 Utterance: 「では、お車の状況を詳しく教えて頂きたいのですが、今回は、等級が下がることはございません。」 ("Now, I would like you to tell me about the condition of your car in detail; this time, your grade will not go down.")
 As shown in FIG. 31A, the ID assigning unit 621 splits the above utterance into four elements at punctuation marks and assigns IDs (ID1 to ID4) to the elements. The combination generation unit 622 generates combination ID strings by the process described with reference to FIG. 23. In the example shown in FIG. 31A, the combination generation unit 622 generates ten combination ID strings ([1], [1,2], [2], [1,2,3], [2,3], [3], [1,2,3,4], [2,3,4], [3,4], [4]).
 The switching estimation unit 623 takes one combination ID string from the generated set and estimates whether the division unit corresponding to it is a change of story. As shown in FIG. 31B, the switching estimation unit 623 estimates the division units corresponding to the combination ID strings in the set, one after another, until one is estimated to be a positive example (a change of story). Suppose that the division unit corresponding to the combination ID string [1] is estimated not to be a positive example and that the division unit corresponding to [1,2] is estimated to be a positive example.
 Since combination ID strings consisting only of IDs (ID3 and ID4) for which it has not yet been estimated whether they are positive examples remain ([3], [3,4], [4]), the switching estimation unit 623 further estimates whether these are positive examples. Suppose that the division unit corresponding to [3] is estimated not to be a positive example and that the division unit corresponding to [3,4] is estimated to be a positive example.
 Since no combination ID string consisting only of IDs that have not yet been estimated remains, the switching estimation unit 623 outputs, for each ID, the estimation result and the estimation unit via the output unit 63. Because the division units corresponding to the combination ID strings [1,2] and [3,4] were estimated to be positive examples, the switching estimation unit 623 outputs, as shown in FIG. 31B, that the estimation result for ID1 and ID2 is a positive example with estimation unit [1,2], and that the estimation result for ID3 and ID4 is a positive example with estimation unit [3,4].
 Next, the results of comparing the estimation accuracy for story changes between the case where the range of the division unit is variable, as in the present embodiment, and the case where it is fixed, as in the first and second embodiments, are described. With a fixed division unit range, the precision was 0.46, the recall was 0.33, and the F value was 0.38. With a variable division unit range, the precision was 0.49, the recall was 0.35, and the F value was 0.41. These results confirm that a variable division unit range yields higher estimation accuracy than a fixed one.
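 The reported F values are consistent with the usual harmonic mean of precision P and recall R, which is assumed here since the disclosure does not define the F value explicitly:

```latex
F = \frac{2PR}{P + R}, \qquad
\frac{2 \cdot 0.46 \cdot 0.33}{0.46 + 0.33} \approx 0.38, \qquad
\frac{2 \cdot 0.49 \cdot 0.35}{0.49 + 0.35} \approx 0.41
```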
 As described above, in the present embodiment, learning data is created in which each division unit, consisting of one element or a run of consecutive elements obtained by splitting an utterance according to a predetermined rule, with the number of elements differing between division units, is given a label indicating whether or not it is a change of story. Furthermore, division units with differing numbers of elements are generated from the utterances constituting the series data to be processed, and, for each generated division unit, the estimation model 3 trained on that learning data is used to determine whether it is a change of story.
 Therefore, even when the story changes in the middle of an utterance, the point of change can be estimated with high accuracy.
 In the first embodiment, the binary classification model 1 is created by the learning device 10 and the multi-value classification model 2 by the learning device 20, but the present disclosure is not limited to this. For example, as shown in FIG. 32, a single learning device 70 may create both the binary classification model 1 and the multi-value classification model 2.
 As shown in FIG. 32, the learning device 70 includes the input unit 11, the binary classification learning unit 12 as a first model learning unit, the input unit 21, the multi-value label complementing unit 22, and the multi-value classification learning unit 23 as a second model learning unit.
 The operations of the input unit 11 and the binary classification learning unit 12 are the same as those described with reference to FIG. 1. Although a detailed description is omitted, the binary classification learning unit 12 trains the binary classification model 1 (first model), which determines whether an utterance constituting the series data to be processed is an utterance at a change of story, based on teacher data (first teacher data) in which utterances constituting series data of a dialogue containing multiple topics, or division units obtained by dividing those utterances, are given a binary label (first label) indicating whether or not they are a change of story.
 The operations of the input unit 21, the multi-value label complementing unit 22, and the multi-value classification learning unit 23 are the same as those described with reference to FIG. 2. Although a detailed description is omitted, the multi-value classification learning unit 23 trains the multi-value classification model 2 (second model), which estimates the topic of the utterances constituting the series data to be processed, based on teacher data (second teacher data) in which a range over which one topic continues in the series data is given a multi-value label (second label) indicating the topic of that range.
 FIG. 33 shows an example of the operation of the learning device 70 and illustrates the learning method performed by the learning device 70.
 The binary classification learning unit 12 trains the binary classification model 1, which determines whether an utterance constituting the series data to be processed is an utterance at a change of story, based on teacher data (first teacher data) in which utterances constituting series data of a dialogue containing multiple topics, or division units obtained by dividing those utterances, are given a binary label indicating whether or not they are a change of story (step S91).
 The multi-value classification learning unit 23 trains the multi-value classification model 2, which estimates the topic of the utterances constituting the series data to be processed, based on teacher data in which a range over which one topic continues in the series data is given a multi-value label indicating the topic of that range (step S92).
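 As a concrete but simplified illustration of steps S91 and S92, the two models could be trained as independent text classifiers. The sketch below uses scikit-learn with character n-gram features as one possible toolkit; the library choice, the toy teacher data, and the topic labels are assumptions of this illustration, not part of the disclosure.

```python
# Sketch of step S91 (binary classification model 1) and step S92 (multi-value
# classification model 2); toolkit, data and labels are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# First teacher data: division units with a binary "change of story" label.
switch_texts = ["では、お車の状況を詳しく教えて頂きたいのですが、",
                "お怪我は、大丈夫でしょうか。",
                "はい、そうです。"]
switch_labels = [1, 1, 0]

# Second teacher data: utterances with the topic label of the range they belong to.
topic_texts = ["お怪我は、大丈夫でしょうか。",
               "では、お車の状況を詳しく教えて頂きたいのですが、",
               "今回は、等級が下がることはございません。"]
topic_labels = ["injury", "accident details", "insurance grade"]

def text_classifier():
    # Character n-gram features avoid the need for a tokenizer in this toy example.
    return make_pipeline(TfidfVectorizer(analyzer="char", ngram_range=(1, 3)),
                         LogisticRegression())

binary_classification_model_1 = text_classifier().fit(switch_texts, switch_labels)      # S91
multi_value_classification_model_2 = text_classifier().fit(topic_texts, topic_labels)   # S92
```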
 Next, the hardware configuration of the estimation devices 30 to 30d according to the present disclosure is described. Although the hardware configuration of the estimation device 30 is described below, the estimation devices 30a to 30d may have the same hardware configuration, as may the learning devices 10, 20 and 70 and the learning data creation device 50.
 FIG. 34 is a block diagram showing the hardware configuration when the estimation device 30 of the present disclosure is a computer capable of executing program instructions. Here, the computer may be a general-purpose computer, a dedicated computer, a workstation, a PC (Personal Computer), an electronic notepad, or the like. The program instructions may be program code, code segments, or the like for executing the necessary tasks.
 In the example shown in FIG. 34, the estimation device 30 has a processor 110, a ROM (Read Only Memory) 120, a RAM (Random Access Memory) 130, a storage 140, an input unit 150, a display unit 160, and a communication interface (I/F) 170. These components are connected so as to be able to communicate with one another via a bus 190. The processor 110 is specifically a CPU (Central Processing Unit), MPU (Micro Processing Unit), GPU (Graphics Processing Unit), DSP (Digital Signal Processor), SoC (System on a Chip), or the like, and may be composed of a plurality of processors of the same or different types.
 The processor 110 controls each component and executes various kinds of arithmetic processing. That is, the processor 110 reads a program from the ROM 120 or the storage 140 and executes it using the RAM 130 as a working area. The processor 110 controls the above components of the estimation device 30 and performs various kinds of arithmetic processing in accordance with the program stored in the ROM 120 or the storage 140. In the present embodiment, the program according to the present disclosure is stored in the ROM 120 or the storage 140, and the processor 110 reads and executes it. The determination unit 32, the paragraph estimation unit 33 and the topic estimation unit 34 constitute a control unit 38 (FIG. 3). The control unit 38 may be configured by dedicated hardware such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field-Programmable Gate Array), or by one or more processors as described above. When the learning device 70 has the hardware configuration shown in FIG. 34, the binary classification learning unit 12, the multi-value label complementing unit 22 and the multi-value classification learning unit 23 constitute a control unit 71. The control unit 71 may likewise be configured by dedicated hardware such as an ASIC or FPGA, or by one or more processors as described above.
 The program may be provided in a form stored in a non-transitory storage medium such as a CD-ROM (Compact Disk Read Only Memory), a DVD-ROM (Digital Versatile Disk Read Only Memory), or a USB (Universal Serial Bus) memory. The program may also be downloaded from an external device via a network.
 The ROM 120 stores various programs and various data. The RAM 130 temporarily stores programs or data as a working area. The storage 140 is configured by an HDD (Hard Disk Drive) or an SSD (Solid State Drive) and stores various programs, including an operating system, and various data. For example, the storage 140 stores the created binary classification models 1 and 1a, the multi-value classification models 2 and 2a, and the estimation model 3.
 The input unit 150 includes a pointing device such as a mouse and a keyboard, and is used for various kinds of input.
 The display unit 160 is, for example, a liquid crystal display and displays various kinds of information. The display unit 160 may adopt a touch panel system and also function as the input unit 150.
 The communication interface 170 is an interface for communicating with other equipment such as an external device (not shown); standards such as Ethernet (registered trademark), FDDI and Wi-Fi (registered trademark) are used, for example.
 Regarding the above embodiments, the following supplementary notes are further disclosed.
 (Supplementary note 1)
 An estimation device comprising a processor, the processor being configured to:
 determine whether an utterance constituting series data to be processed is an utterance at a change of story, using a first model trained in advance based on first teacher data, applied to utterances constituting series data of a dialogue containing a plurality of topics or to division units obtained by dividing the utterances; and
 estimate, based on a result of the determination, a range of a paragraph in the series data to be processed from a change of story to an utterance immediately before a next change of story, or from a change of story to an utterance at an end of the dialogue.
 (Supplementary note 2)
 A learning device comprising a processor, the processor being configured to:
 train a first model that determines whether an utterance constituting series data to be processed is an utterance at a change of story, based on first teacher data in which utterances constituting series data of a dialogue containing a plurality of topics, or division units obtained by dividing the utterances, are given a first label indicating whether or not they are a change of story; and
 train a second model that estimates a topic of the utterances constituting the series data to be processed, based on second teacher data in which a range over which one topic continues in the series data is given a second label indicating the topic of that range.
 (Supplementary note 3)
 A non-transitory storage medium storing a program executable by a computer, the program causing the computer to function as the estimation device according to supplementary note 1.
 (Supplementary note 4)
 A non-transitory storage medium storing a program executable by a computer, the program causing the computer to function as the learning device according to supplementary note 2.
 All publications, patent applications and technical standards mentioned in this specification are incorporated herein by reference to the same extent as if each individual publication, patent application or technical standard were specifically and individually indicated to be incorporated by reference.
 A computer can suitably be used to function as each unit of the estimation devices 30, 30a, 30b, 30c and 30d and the learning device 70 described above. Such a computer can be realized by storing, in its storage unit, a program describing the processing that implements the functions of these devices, and by having the processor of the computer read and execute that program. That is, the program can cause the computer to function as the estimation devices 30, 30a, 30b, 30c and 30d and the learning device 70 described above.
 The program may also be recorded on a computer-readable medium, by means of which it can be installed on a computer. The computer-readable medium on which the program is recorded may be a non-transient recording medium; such a medium is not particularly limited but may be, for example, a CD-ROM or a DVD-ROM. The program can also be provided via a network.
 The present disclosure is not limited to the configurations specified in the embodiments described above, and various modifications are possible without departing from the gist of the invention described in the claims. For example, the functions included in the respective components can be rearranged so as not to be logically inconsistent, and a plurality of components can be combined into one or divided.
 1, 1a  binary classification model (first model)
 2, 2a  multi-value classification model (second model)
 3  estimation model
 10  learning device
 11  input unit
 12  binary classification learning unit (first model learning unit)
 20  learning device
 21  input unit
 22  multi-value label complementing unit
 23  multi-value classification learning unit (second model learning unit)
 30, 30a, 30b, 30c, 30d  estimation device
 31  input unit
 32  determination unit
 33  paragraph estimation unit
 34, 34a, 34b  topic estimation unit
 35  output unit
 36, 36b  keyword extraction unit
 37  clustering unit
 38  control unit (processor)
 41  input unit
 42  determination unit
 43  topic estimation unit
 44  paragraph estimation unit
 45  output unit
 50  learning data creation device
 51  input unit
 52  learning data creation unit
 53  output unit
 61  input unit
 62  estimation unit
 63  output unit
 521  sentence output unit
 522  ID assigning unit
 523  combination generation unit
 524  assigning unit
 611  sentence output unit
 621  ID assigning unit
 622  combination generation unit
 623  switching estimation unit
 5231  ID extraction unit
 5232  combination target ID storage unit
 5233  combination generation ID storage unit
 5234  combination ID generation unit
 5241  positive example assigning unit
 5242  negative example assigning unit
 5243  non-target assigning unit
 6221  ID extraction unit
 6222  combination target ID storage unit
 6223  combination generation ID storage unit
 6224  combination ID generation unit
 110  processor
 120  ROM
 130  RAM
 140  storage
 150  input unit
 160  display unit
 170  communication interface
 190  bus
 70  learning device
 71  control unit (processor)

Claims (10)

  1.  An estimation device comprising:
      a determination unit that determines whether an utterance constituting series data to be processed is an utterance at a change of story, using a first model trained in advance based on first teacher data, applied to utterances constituting series data of a dialogue containing a plurality of topics or to division units obtained by dividing the utterances; and
      a paragraph estimation unit that estimates, based on a result of the determination, a range of a paragraph in the series data to be processed from a change of story to an utterance immediately before a next change of story, or from a change of story to an utterance at an end of the dialogue.

  2.  The estimation device according to claim 1, further comprising
      a topic estimation unit that estimates a topic of the paragraph or of the utterances included in the paragraph, using a second model trained in advance based on second teacher data, applied to the utterances constituting the series data or to division units obtained by dividing the utterances.

  3.  The estimation device according to claim 1, further comprising:
      a keyword extraction unit that extracts keywords from the utterances included in the paragraph; and
      a topic estimation unit that estimates, based on the keywords extracted from the utterances included in the paragraph, a topic of the paragraph or of the utterances included in the paragraph.

  4.  The estimation device according to claim 3, further comprising
      a clustering unit that clusters, by similarity, a plurality of paragraphs whose ranges have been estimated based on one or more pieces of series data to be processed, wherein
      the keyword extraction unit extracts keywords from the utterances included in a representative paragraph among the paragraphs included in a cluster of similar paragraphs, and
      the topic estimation unit estimates, based on the keywords extracted from the utterances included in the representative paragraph, a topic of the paragraphs constituting the cluster that includes the representative paragraph.

  5.  The estimation device according to any one of claims 1 to 4, wherein
      the division unit of an utterance consists of one element or a plurality of consecutive elements obtained by dividing the utterance according to a predetermined rule, and
      the first model is a model trained in advance on learning data that includes division units having different numbers of constituent elements, each division unit being given a label indicating whether or not it is a change of story.

  6.  The estimation device according to claim 5, wherein
      division units having different numbers of constituent elements are generated from the utterances constituting the series data to be processed, and whether each generated division unit is a change of story is determined using the first model.

  7.  An estimation method comprising:
      a determination step of determining whether an utterance constituting series data to be processed is an utterance at a change of story, using a first model trained in advance based on first teacher data, applied to utterances constituting series data of a dialogue containing a plurality of topics or to division units obtained by dividing the utterances; and
      a paragraph estimation step of estimating, based on a result of the determination, a range of a paragraph in the series data to be processed from a change of story to an utterance immediately before a next change of story, or from a change of story to an utterance at an end of the dialogue.

  8.  A learning device comprising:
      a first model learning unit that trains a first model that determines whether an utterance constituting series data to be processed is an utterance at a change of story, based on first teacher data in which utterances constituting series data of a dialogue containing a plurality of topics, or division units obtained by dividing the utterances, are given a first label indicating whether or not they are a change of story; and
      a second model learning unit that trains a second model that estimates a topic of the utterances constituting the series data to be processed, based on second teacher data in which a range over which one topic continues in the series data is given a second label indicating the topic of that range.

  9.  A learning method comprising:
      a first learning step of training a first model that determines whether an utterance constituting series data to be processed is an utterance at a change of story, based on first teacher data in which utterances constituting series data of a dialogue containing a plurality of topics, or division units obtained by dividing the utterances, are given a first label indicating whether or not they are a change of story; and
      a second learning step of training a second model that estimates a topic of the utterances constituting the series data to be processed, based on second teacher data in which a range over which one topic continues in the series data is given a second label indicating the topic of that range.

  10.  A program that causes a computer to operate as the estimation device according to any one of claims 1 to 6.
PCT/JP2021/012692 2020-06-16 2021-03-25 Estimation device, estimation method, learning device, learning method and program WO2021256043A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2022532313A JP7425368B2 (en) 2020-06-16 2021-03-25 Estimation device, estimation method, learning device, learning method and program

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JPPCT/JP2020/023644 2020-06-16
PCT/JP2020/023644 WO2021255840A1 (en) 2020-06-16 2020-06-16 Estimation method, estimation device, and program

Publications (1)

Publication Number Publication Date
WO2021256043A1 true WO2021256043A1 (en) 2021-12-23

Family

ID=79267817

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/JP2020/023644 WO2021255840A1 (en) 2020-06-16 2020-06-16 Estimation method, estimation device, and program
PCT/JP2021/012692 WO2021256043A1 (en) 2020-06-16 2021-03-25 Estimation device, estimation method, learning device, learning method and program

Family Applications Before (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/023644 WO2021255840A1 (en) 2020-06-16 2020-06-16 Estimation method, estimation device, and program

Country Status (2)

Country Link
JP (1) JP7425368B2 (en)
WO (2) WO2021255840A1 (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010100853A1 (en) * 2009-03-04 2010-09-10 日本電気株式会社 Language model adaptation device, speech recognition device, language model adaptation method, and computer-readable recording medium
JP2012247912A (en) * 2011-05-26 2012-12-13 Chubu Electric Power Co Inc Speech signal processing apparatus
JP2018045639A (en) * 2016-09-16 2018-03-22 株式会社東芝 Dialog log analyzer, dialog log analysis method, and program
JP2018128575A (en) * 2017-02-08 2018-08-16 日本電信電話株式会社 End-of-talk determination device, end-of-talk determination method and program
JP2019053126A (en) * 2017-09-13 2019-04-04 株式会社日立製作所 Growth type interactive device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MIMURA, MASATO ET AL.: "Automatic Indexing of Speakers and Topics for Panel Discussion Speech", IPSJ SIG TECHNICAL REPORT, vol. 96, no. 55, 28 May 1996 (1996-05-28), pages 13 - 18 *
TAKAAKI HASEGAWA: "Automatic Knowledge Assistance System Supporting Operator Responses", NTT TECHNICAL REVIEW, vol. 17, no. 9, 1 September 2019 (2019-09-01), pages 15 - 18, XP055874245 *

Also Published As

Publication number Publication date
JPWO2021256043A1 (en) 2021-12-23
JP7425368B2 (en) 2024-01-31
WO2021255840A1 (en) 2021-12-23


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21824906

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022532313

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21824906

Country of ref document: EP

Kind code of ref document: A1