WO2021256043A1 - Estimation device, estimation method, learning device, learning method, and program - Google Patents

Estimation device, estimation method, learning device, learning method, and program

Info

Publication number
WO2021256043A1
Authority
WO
WIPO (PCT)
Prior art keywords
utterance
unit
topic
paragraph
estimation
Application number
PCT/JP2021/012692
Other languages
English (en)
Japanese (ja)
Inventor
隆明 長谷川
節夫 山田
和之 磯
正之 杉崎
Original Assignee
日本電信電話株式会社 (Nippon Telegraph and Telephone Corporation)
Application filed by 日本電信電話株式会社 (Nippon Telegraph and Telephone Corporation)
Priority to JP2022532313A (JP7425368B2)
Publication of WO2021256043A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 - Information retrieval of audio data
    • G06F16/65 - Clustering; Classification
    • G06F16/90 - Details of database functions independent of the retrieved data types
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G10L15/10 - Speech classification or search using distance or distortion measures between unknown speech and reference templates

Definitions

  • It is effective to estimate the range of a paragraph, that is, the span from a change of story (a break) to the utterance immediately before the next change, or from a change of story to the final utterance of the dialogue. If the range of a paragraph can be estimated, the topic can be estimated from only the utterances included in that paragraph, so the topic can be estimated with higher accuracy.
  • The present disclosure relates to estimating, in series data of a dialogue containing multiple topics (such as a dialogue between an operator and a customer), the range of a paragraph from one story change to the utterance immediately before the next change, or from a story change to the end of the dialogue, and to estimating the topic within that paragraph.
  • A binary label (switching label), attached to each utterance constituting the series data or to each division unit obtained by dividing an utterance and indicating whether or not the story changes, is input.
  • The binary label is, for example, "1 (story change)" or "0 (not a story change)", or "True (story change)" or "False (not a story change)". Alternatively, if an utterance or its division unit carries any label indicating a story change, the input unit 11 may treat it as "True (story change)"; if no such label is given, it may be treated as "False (not a story change)".
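A minimal sketch of this normalization rule; the field names and raw label values are invented for illustration:

    def normalize_switch_label(raw_label):
        """Treat any story-change marker as True; no marker (or an explicit
        negative) as False, as described for the input unit 11."""
        return raw_label not in (None, "", "0", "False", False, 0)

    # `label` values are invented examples of raw annotations.
    utterances = [
        {"text": "Thank you for calling.", "label": None},
        {"text": "Now, about the billing issue...", "label": "1"},
    ]
    for u in utterances:
        u["switch"] = normalize_switch_label(u["label"])
    print([u["switch"] for u in utterances])   # [False, True]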
  • The binary labels are attached manually in advance to the utterances constituting the series data or to their division units. As mentioned above, certain words and phrases are often spoken at a transition between stories, and the binary labels are assigned, for example, based on such expressions. Note that a story change does not always coincide with a topic change. Taking a device failure as an example: if one only wants to classify whether or not a topic concerns the device failure, the topic of every utterance about the failure is "device failure" regardless of its cause; if instead one wants to classify topics by the cause of the failure, the topic differs for each cause. Thus, depending on how the topics to be classified are defined, the topic may not change even where the story does.
  • The multi-valued label complementing unit 22 also assigns, to such an utterance or its division unit, a multi-valued label indicating the topic of the range that includes the utterance. This increases the amount of teacher data for each topic and improves the accuracy of topic estimation.
  • The multi-valued label complementing unit 22 outputs each utterance or division unit to which a multi-valued label is attached, together with that label, to the multi-value classification learning unit 23.
  • The multi-value classification learning unit 23 learns the multi-value classification model 2 (second model) using, as teacher data (second teacher data), the utterances or division units output from the multi-valued label complementing unit 22 and the multi-valued labels assigned to them. The multi-value classification model 2 is therefore a model learned in advance from this teacher data over the utterances constituting the series data or their division units.
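A minimal sketch of this supervised learning step follows. The patent trains the models with an LSTM or the like; a bag-of-words linear classifier from scikit-learn is substituted here to keep the example small, and the utterances and topic labels are invented:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Invented (utterance, topic) teacher data; in the patent this would come
    # from the multi-valued label complementing unit 22.
    teacher_utterances = [
        "the router keeps rebooting",
        "my router shows an error light",
        "I want to change my contract",
        "please switch me to the cheaper plan",
    ]
    teacher_topics = ["device failure", "device failure",
                      "contract change", "contract change"]

    model2 = make_pipeline(TfidfVectorizer(), LogisticRegression())
    model2.fit(teacher_utterances, teacher_topics)
    print(model2.predict(["the router will not start"]))   # e.g. ['device failure']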
  • The teacher data used for learning the multi-value classification model 2 is generated from series data in which story-change utterances or their division units carry a binary label indicating a story change, and in which each range over which one topic continues, and the topic of that range, have been specified.
  • the input unit 31 inputs series data including a plurality of topics.
  • the series data input to the input unit 31 is data to be processed that is the target of estimation of the paragraph range and the topic in the paragraph.
  • The series data is, for example, text data obtained by speech recognition of the time-series utterances of an operator and a customer.
  • The input unit 31 may sequentially input the text data obtained by speech recognition of each utterance during the dialogue. Further, when the series data is input offline, the input unit 31 may sort the utterances by their start time or end time and input the text data of each utterance.
  • the input unit 31 outputs the input series data to the determination unit 32.
  • The topic estimation unit 34 uses the multi-value classification model 2 (second model) to estimate the topic of the paragraph whose range was estimated by the paragraph estimation unit 33, or of the utterances contained in it.
  • The multi-value classification model 2 is a model learned in advance from teacher data in which the utterances constituting the series data, or their division units, are given multi-valued labels indicating the topics to which they relate.
  • The teacher data used for learning the multi-value classification model 2 is generated from series data in which the story-change utterances or their division units carry a binary label indicating a story change, and in which each range over which a topic continues and the topic of that range have been identified.
  • Specifically, the teacher data is generated by adding, to each utterance or division unit in that series data that carries the story-change binary label, a multi-valued label indicating the topic of the range that includes that utterance.
  • The output unit 35 outputs, for each paragraph whose range was estimated in the series data, the utterances constituting the paragraph. Further, the output unit 35 may output a multi-valued label indicating the topic of the paragraph, the start time and end time of the paragraph, and the like.
  • FIG. 4 is a diagram showing a configuration example of an estimation device 30a for estimating a topic without using the multi-value classification model 2 according to the present embodiment.
  • the same components as those in FIG. 3 are designated by the same reference numerals, and the description thereof will be omitted.
  • the keyword extraction unit 36 extracts at least one keyword from the utterances included in the paragraph whose range is estimated by the paragraph estimation unit 33. Any method can be used as the method for extracting keywords, and for example, an existing method such as tf-idf (Term Frequency-Inverse Document Frequency) can be used.
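As an illustration of the tf-idf option mentioned above, the following sketch scores one paragraph's terms against other paragraphs using scikit-learn; the paragraphs are invented:

    from sklearn.feature_extraction.text import TfidfVectorizer

    # Invented paragraphs; each string is the concatenated utterances of one paragraph.
    paragraphs = [
        "the bumper was scratched when parking in the garage",
        "were you injured anywhere after the accident",
    ]
    vectorizer = TfidfVectorizer()
    tfidf = vectorizer.fit_transform(paragraphs)
    terms = vectorizer.get_feature_names_out()

    def top_keywords(paragraph_index, k=3):
        """Return the k terms with the highest tf-idf weight in one paragraph."""
        row = tfidf[paragraph_index].toarray().ravel()
        return [terms[i] for i in row.argsort()[::-1][:k]]

    print(top_keywords(0))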
  • the number of keywords extracted by the keyword extraction unit 36 may be limited to a predetermined number in advance, or may be specified by the user.
  • the topic estimation unit 34a estimates the topic in the paragraph or the utterance contained in the paragraph based on the keywords extracted from the utterance included in the paragraph by the keyword extraction unit 36.
  • The topic estimation unit 34a may, for example, take an extracted keyword itself as the topic of the paragraph or of the utterances contained in it. Alternatively, the topic estimation unit 34a may select, from a plurality of predetermined topics, the topic most similar to the extracted keywords and estimate it as the topic of the paragraph or of the utterances contained in it.
  • With the estimation device 30a shown in FIG. 4, the topic of a paragraph or of the utterances contained in it can be estimated without using the multi-value classification model 2, as sketched below. Therefore, topics in series data can be estimated even when it is difficult to prepare a large amount of teacher data in which topic ranges and the topics within them are specified.
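A sketch of the "most similar predetermined topic" option: Jaccard overlap between keyword sets and invented topic descriptions stands in for whatever similarity measure an implementation would use.

    # Invented topic descriptions; an implementation would define its own
    # predetermined topics and similarity measure.
    topic_descriptions = {
        "car damage": {"bumper", "scratch", "garage", "car"},
        "injury check": {"injured", "body", "hospital"},
    }

    def estimate_topic(keywords):
        """Pick the predetermined topic whose description words overlap most
        (Jaccard similarity) with the extracted keywords."""
        def jaccard(a, b):
            return len(a & b) / len(a | b) if a | b else 0.0
        return max(topic_descriptions,
                   key=lambda t: jaccard(set(keywords), topic_descriptions[t]))

    print(estimate_topic({"bumper", "garage"}))   # "car damage"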
  • FIG. 5 is a diagram showing a configuration example of the estimation device 30b according to the present embodiment. Like the estimation device 30a shown in FIG. 4, the estimation device 30b shown in FIG. 5 estimates the topic without using the multi-value classification model 2.
  • the same components as those in FIG. 4 are designated by the same reference numerals, and the description thereof will be omitted.
  • the keyword extraction unit 36b extracts keywords from the utterances included in the representative paragraph determined by the clustering unit 37 among the paragraphs constituting the cluster.
  • The topic estimation unit 34b estimates the topic of the paragraphs constituting a cluster based on the keywords extracted by the keyword extraction unit 36b from the utterances of the cluster's representative paragraph. Specifically, the topic estimation unit 34b takes the topic estimated from the representative paragraph's keywords as the topic of all the paragraphs constituting the cluster.
  • In FIGS. 3 to 5, the description used an example in which series data in which a plurality of utterances are arranged in chronological order is input, but the present disclosure is not limited to this.
  • a function unit for extracting the utterances one by one from the series data may be provided in front of the input unit 31.
  • The multi-valued label complementing unit 22 reads, one by one from the series data input to the input unit 21, the utterances to which a multi-valued label indicating a topic and a binary label indicating a story change are attached (step S11).
  • the multi-valued label is given only to the first utterance in the range indicating the topic, and is not given to other utterances.
  • the binary label indicating the change of talk is given only to the utterance showing the change of talk, and is not given to other utterances.
  • the multi-valued label complementing unit 22 determines whether or not a multi-valued label indicating a topic is attached to the read utterance (step S12).
  • When it is determined that no multi-valued label is attached (step S12: No), or after the multi-valued label attached to the read utterance has been stored as an update, the multi-valued label complementing unit 22 determines whether the read utterance carries a binary label indicating a story change (step S14).
  • When the read utterance carries a binary label indicating a story change, the multi-valued label complementing unit 22 assigns to it the multi-valued label held in the temporary multi-valued label storage (step S15). In this way, an utterance labeled as a story change is given the multi-valued label indicating the topic of the range of the series data that includes it.
  • When it is determined that no binary label indicating a story change is attached (step S14: No), or after a multi-valued label has been assigned to the read utterance, the multi-valued label complementing unit 22 determines whether the read utterance is the last utterance of the dialogue (step S16).
  • When the read utterance is the last utterance of the dialogue (step S16: Yes), the multi-valued label complementing unit 22 ends the process.
  • When the read utterance is not the last utterance of the dialogue (step S16: No), the multi-valued label complementing unit 22 returns to step S11 and reads the next utterance.
  • In the above description, the multi-valued label is attached only to the first utterance of each topic range and not to the other utterances. However, all utterances in a topic range may instead be given that topic's multi-valued label in advance. In that case, by deleting the multi-valued label from every utterance that does not carry the binary story-change label, only the story-change utterances end up carrying a multi-valued label indicating the topic.
  • Any method may be used, as long as a multi-valued label indicating the topic ends up attached to each story-change utterance. A sketch of the complementing loop follows.
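The complementing loop of steps S11 to S16 can be sketched as follows, assuming each utterance is a small record with a topic label (possibly absent) and a story-change flag; the field names are invented:

    def complement_labels(utterances):
        """Steps S11-S16: copy the most recently seen topic label onto each
        story-change utterance that lacks one."""
        stored_topic = None                    # temporary multi-valued label storage
        for u in utterances:                   # S11: read utterances one by one
            if u["topic"] is not None:         # S12: multi-valued label attached?
                stored_topic = u["topic"]      # update and store the label
            elif u["switch"]:                  # S14: story-change label attached?
                u["topic"] = stored_topic      # S15: complement from storage
        return utterances                      # S16: ends when the dialogue ends

    dialogue = [
        {"topic": "topic A", "switch": True},
        {"topic": None, "switch": False},
        {"topic": None, "switch": True},       # receives "topic A" by complementing
    ]
    print(complement_labels(dialogue))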
  • FIG. 7 is a flowchart showing an example of the operation of the estimation device 30, and is a diagram for explaining an estimation method by the estimation device 30.
  • The determination unit 32 reads utterances one by one from the series data to be processed that was input to the input unit 31 (step S21).
  • The determination unit 32 uses the binary classification model 1 to determine whether or not the read utterance is a story-change utterance (step S22).
  • When the read utterance is neither a story-change utterance nor the last utterance of the dialogue (step S23: No), the paragraph estimation unit 33 accumulates it as one of the utterances constituting the current paragraph (step S24). After accumulating the read utterance, the process repeats from step S21.
  • When the read utterance is a story-change utterance or the last utterance of the dialogue (step S23: Yes), the paragraph estimation unit 33 estimates the range of the accumulated utterances to be a paragraph (step S25) and outputs the accumulated utterances, as the utterances constituting that paragraph, to the topic estimation unit 34.
  • the topic estimation unit 34 estimates the topic in the paragraph whose range has been estimated by the paragraph estimation unit 33 using the multi-value classification model 2 (step S26).
  • The topic estimation unit 34 may instead estimate the topic for at least one utterance unit included in the paragraph. In this case, the topic estimation unit 34 may estimate the topic using only the first utterance of the paragraph, or using a predetermined number of utterances from the beginning of the paragraph.
  • In that case, the multi-value classification model 2 is learned from teacher data in which a multi-valued label is attached to each unit used for topic estimation.
  • the topic estimation unit 34 attaches a multi-valued label indicating the estimated topic to the paragraph (step S27).
  • the paragraph estimation unit 33 resets the accumulation of utterances (step S28), and determines whether or not the read utterance is the utterance at the end of the dialogue (step S29).
  • When the read utterance is not the last utterance of the dialogue (step S29: No), the paragraph estimation unit 33 returns to step S24 and accumulates the read utterance, which thereby becomes the first utterance of a new paragraph.
  • When the read utterance is the last utterance of the dialogue (step S29: Yes), the paragraph estimation unit 33 ends the process.
  • the estimation method by the estimation device 30 includes a determination step (step S22) and a paragraph estimation step (steps S23 to S25).
  • In the determination step, it is determined whether the utterances constituting the series data to be processed are story-change utterances, using the binary classification model 1 (first model) learned in advance from teacher data (first teacher data) in which a binary label (first label) indicating whether the story changes is attached to the utterances, or their division units, constituting series data of dialogues containing a plurality of topics.
  • In the paragraph estimation step, based on the result of that determination, the range of each paragraph in the series data to be processed is estimated: from a story change to the utterance immediately before the next change, or from a story change to the last utterance of the dialogue.
  • the estimation method according to the present embodiment may further include a topic estimation step (step S26).
  • In the topic estimation step, the topic of the paragraph, or of the utterances contained in it, is estimated using the multi-value classification model 2 (second model), learned in advance from teacher data in which a multi-valued label (second label) indicating the related topic is attached to the utterances constituting the series data or their division units. Because the range of the paragraph has been estimated, the topic can be estimated from only the utterances included in the paragraph, which improves the accuracy of topic estimation.
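The loop of steps S21 to S29 can be sketched as follows; the two callables stand in for binary classification model 1 and multi-value classification model 2, and the demo data is invented:

    def segment_dialogue(utterances, is_switch, estimate_topic):
        """Steps S21-S29: accumulate utterances until a story change, then close
        the paragraph and estimate its topic."""
        paragraphs, buffer = [], []
        for u in utterances:                            # S21: read one utterance
            if is_switch(u) and buffer:                 # S22-S23: story change found
                paragraphs.append((buffer, estimate_topic(buffer)))  # S25-S27
                buffer = []                             # S28: reset the accumulation
            buffer.append(u)                            # S24: accumulate (start of new paragraph)
        if buffer:                                      # S29: dialogue end closes the last paragraph
            paragraphs.append((buffer, estimate_topic(buffer)))
        return paragraphs

    demo = ["hello", "NEW: billing question", "I was overcharged", "NEW: cancel plan"]
    print(segment_dialogue(demo, lambda u: u.startswith("NEW:"), lambda p: p[0]))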
  • FIG. 8 is a flowchart showing an example of the operation of the estimation device 30a shown in FIG. 4, and is a diagram for explaining an estimation method by the estimation device 30a.
  • the same processing as in FIG. 7 is designated by the same reference numerals, and the description thereof will be omitted.
  • the paragraph estimation unit 33 estimates that the range of the accumulated utterances is a paragraph, and outputs the accumulated utterances to the keyword extraction unit 36.
  • the keyword extraction unit 36 extracts keywords from the utterances included in the paragraph whose range is estimated by the paragraph estimation unit 33 (step S31).
  • the topic estimation unit 34a estimates the topic in the paragraph or the utterance included in the paragraph based on the keyword extracted by the keyword extraction unit 36 from the utterance included in the paragraph (step S32).
  • the estimation method by the estimation device 30a includes a keyword extraction step (step S31) and a topic estimation step (step S32).
  • In the keyword extraction step, keywords are extracted from the utterances contained in the paragraph whose range was estimated.
  • In the topic estimation step, the topic of the paragraph, or of the utterances contained in it, is estimated based on the keywords extracted from the utterances contained in the paragraph.
  • FIG. 9 is a flowchart showing an example of the operation of estimating the range of the paragraph by the estimation device 30b shown in FIG. 5, and is a diagram for explaining the estimation method by the estimation device 30b.
  • the same processing as in FIG. 7 is designated by the same reference numerals, and the description thereof will be omitted.
  • When the read utterance is a story-change utterance or the last utterance of the dialogue, the paragraph estimation unit 33 estimates the range of the accumulated utterances to be a paragraph (step S25). Then, the paragraph estimation unit 33 resets the accumulation of utterances (step S28).
  • FIG. 10 is a flowchart showing an example of the operation of estimating a topic by the estimation device 30b shown in FIG. 5, and is a diagram for explaining an estimation method by the estimation device 30b.
  • the clustering unit 37 reads the paragraph whose range has been estimated by the paragraph estimation unit 33 (step S41).
  • Here, the clustering unit 37 reads a plurality of paragraphs contained in one or more series data sets; that is, the clustering unit 37 repeats the process of step S41 as many times as necessary.
  • the clustering unit 37 clusters a plurality of read paragraphs for each similar paragraph (step S42).
  • the clustering unit 37 determines whether or not there are unprocessed clusters (step S43).
  • An unprocessed cluster is a cluster whose paragraphs have not yet been given multi-valued labels.
  • When there is an unprocessed cluster (step S43: Yes), the clustering unit 37 selects one unprocessed cluster as the cluster to be processed and determines a representative paragraph from among the paragraphs included in it (step S44). For example, the clustering unit 37 determines the paragraph at the center of the cluster to be the representative paragraph.
  • the keyword extraction unit 36b extracts keywords from the utterances included in the representative paragraph of the cluster determined by the clustering unit 37 (step S45).
  • the topic estimation unit 34b estimates the topic in the paragraph representing the cluster based on the keywords extracted by the keyword extraction unit 36b (step S46). Next, the topic estimation unit 34b determines whether or not there is an unprocessed paragraph (step S47).
  • the unprocessed paragraph is a paragraph included in the cluster to be processed that is not given a multi-value label.
  • When an unprocessed paragraph remains (step S47: No), the topic estimation unit 34b assigns to that unprocessed paragraph in the cluster a multi-valued label indicating the topic estimated from the keywords extracted from the cluster's representative paragraph (step S48). Then, the topic estimation unit 34b returns to the process of step S47.
  • When no unprocessed paragraph remains (step S47: Yes), the process is repeated from step S43.
  • the estimation method by the estimation device 30b further includes a clustering step (step S42).
  • In the clustering step, a plurality of paragraphs whose ranges were estimated from one or more series data sets are clustered into groups of similar paragraphs.
  • In the keyword extraction step, keywords are extracted from the utterances of the representative paragraph of each cluster of similar paragraphs.
  • In the topic estimation step, the topic of all the paragraphs constituting the cluster containing the representative paragraph is estimated based on the keywords extracted from the representative paragraph's utterances. A sketch of this pipeline follows.
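A sketch of the clustering variant, using tf-idf vectors and KMeans from scikit-learn as one illustrative choice of representation and clustering method, with the member closest to the cluster centre taken as the representative paragraph; the paragraphs are invented:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.feature_extraction.text import TfidfVectorizer

    # Invented paragraphs, each a concatenation of its utterances.
    paragraphs = [
        "bumper scratched in the garage",
        "rear bumper rubbed against a pole",
        "were you injured, is your body okay",
        "any pain or injury after the accident",
    ]
    X = TfidfVectorizer().fit_transform(paragraphs)
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)   # step S42

    for c in range(km.n_clusters):
        members = np.where(km.labels_ == c)[0]
        # Representative paragraph: the member closest to the cluster centre (step S44).
        dists = np.linalg.norm(X[members].toarray() - km.cluster_centers_[c], axis=1)
        rep = members[dists.argmin()]
        print(f"cluster {c}: representative = {paragraphs[rep]!r}, members = {list(members)}")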
  • Next, the learning of the models (binary classification model 1 and multi-value classification model 2) will be described using the specific example shown in FIG. 11. In the following, it is assumed that the series data includes five topics: "topic A", "topic B", "topic C", "topic D", and "topic E".
  • In the series data, the range over which one topic continues and the topic of that range are specified manually, and for each such range, a multi-valued label indicating its topic is manually attached to the utterances constituting the series data.
  • A binary label indicating a story change is attached only to the story-change utterances. Note that a story change does not always coincide with a topic change; in FIG. 11, for example, an utterance in the middle of the range where utterances about topic A continue may also carry a binary label indicating a story change.
  • The above series data and binary labels are input to the learning device 10, and the binary classification model 1 is trained, using an LSTM or the like, on the input series data and binary labels.
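A minimal PyTorch sketch of such an LSTM-based binary classification model; the vocabulary size, dimensions, and training snippet are illustrative assumptions rather than details from the patent:

    import torch
    import torch.nn as nn

    class SwitchClassifier(nn.Module):
        """LSTM over the tokens of one utterance; one logit: story change or not."""
        def __init__(self, vocab_size, emb_dim=64, hidden=64):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, emb_dim)
            self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
            self.out = nn.Linear(hidden, 1)

        def forward(self, token_ids):                 # token_ids: (batch, seq_len)
            _, (h, _) = self.lstm(self.emb(token_ids))
            return self.out(h[-1]).squeeze(-1)        # (batch,) logits

    model1 = SwitchClassifier(vocab_size=1000)
    logits = model1(torch.randint(0, 1000, (2, 12)))  # two dummy tokenised utterances
    loss = nn.BCEWithLogitsLoss()(logits, torch.tensor([1.0, 0.0]))
    loss.backward()

Replacing the final linear layer with one logit per topic and the loss with cross-entropy would give the corresponding shape of the multi-value classification model 2 described below.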
  • the above-mentioned series data, binary label and multi-value label are input to the learning device 20.
  • In the learning device 20, the multi-valued labels are complemented. That is, as shown in FIG. 11, each utterance carrying a story-change label is given the multi-valued label indicating the topic of the range of the series data that includes it.
  • In this way, teacher data is created in which the utterances constituting the series data carry multi-valued labels indicating their related topics.
  • The multi-valued label indicating the related topic may instead be attached to division units of the utterances constituting the series data.
  • Using this teacher data, the multi-value classification model 2 is learned with an LSTM or the like. The learning may use only the utterances carrying multi-valued labels, or the utterances of the entire paragraph containing a labeled utterance.
  • FIG. 12 is a diagram showing an example of topic estimation by the estimation device 30 shown in FIG. In FIG. 12, it is assumed that the multi-valued classification model 2 is learned in utterance units.
  • When the series data of one dialogue is input to the estimation device 30, as shown in FIG. 12, the binary classification model 1 is used to determine whether each utterance constituting the series data is a story-change utterance. Then, the range from a story-change utterance to the utterance immediately before the next story-change utterance, or from a story-change utterance to the last utterance of the dialogue, is estimated to be one paragraph.
  • FIG. 14 is a diagram showing an example of topic estimation by the estimation device 30a shown in FIG.
  • When the series data of one dialogue is input to the estimation device 30a, as shown in FIG. 14, the binary classification model 1 is used to determine whether each utterance constituting the series data is a story-change utterance. Then, the range from a story-change utterance to the utterance immediately before the next story-change utterance is estimated to be one paragraph.
  • FIG. 14 shows an example in which a different multi-valued label ("Topic 1" to "Topic 10") is assigned to each paragraph, but these labels do not necessarily represent different topics.
  • When the series data of one or more dialogues is input to the estimation device 30b, as shown in FIG. 15, the binary classification model 1 is used to determine whether each utterance constituting the series data is a story-change utterance. Then, the range from a story-change utterance to the utterance immediately before the next story-change utterance is estimated to be one paragraph.
  • As shown in FIG. 15, the paragraphs whose ranges were estimated are clustered into groups of similar paragraphs.
  • a representative paragraph is determined from a cluster of similar paragraphs, and keywords are extracted from the utterances contained in the representative paragraph.
  • In FIG. 15, the paragraphs drawn with a thick line are the representative paragraphs.
  • the topic in the representative paragraph is estimated based on the keywords extracted from the utterances included in the representative paragraph of the cluster, and a multi-valued label indicating the estimated topic is given to the representative paragraph. Further, as shown in FIG. 15, other paragraphs constituting the cluster are also given the same multi-valued label as the representative paragraph of the cluster.
  • In order to show the effectiveness of the estimation method according to this embodiment (hereinafter sometimes referred to as "this method"), it was compared with a conventional method by experiment. In the experiment, 349 calls were used for learning the models and 50 calls for verification. As multi-valued labels indicating topics, eight types of labels, topic A to topic H, and a fixed topic S covering the span from the first utterance of a call to the first story change, were prepared. In the conventional method, the binary classification model is learned from teacher data in which a binary label indicating a story change is attached only to the utterances where the multi-valued label switches, and the multi-value classification model is learned using only those label-switching utterances as teacher data.
  • In this method, utterances that transition from a topic to the same topic are also counted as story-change utterances when estimating paragraph ranges. Therefore, as shown in Table 1, the precision of this method is lower than that of the conventional method. However, this method can detect story-change paragraphs and utterances that the conventional method could not, so the recall of paragraph division increased.
  • In the conventional method, the multi-value classification model is generated by learning from teacher data in which a multi-valued label indicating the utterance's topic is manually attached only to the utterances where the multi-valued label switches.
  • In this method, the multi-value classification model 2 was generated by learning from teacher data in which the multi-valued labels were complemented for the utterances manually labeled as story changes.
  • For each utterance determined to be a story-change utterance by the binary classification model learned with the conventional method and by the one learned with this method, the topic of the utterance was estimated and compared with the correct topic manually assigned to that utterance.
  • the results of the comparison are shown in Table 2.
  • For the 100 calls targeted for evaluation, the classification results (F-measure) over the topics of all utterances were evaluated. This evaluation covers, end to end, both the determination of story-change utterances by the binary classification model and the estimation of topics by the multi-value classification model.
  • In this method, an utterance that transitions from a topic to the same topic may also be determined to be a story-change utterance, but the multi-value classification model 2 classified many of these same-topic transitions into the correct topic. Therefore, as shown in Table 3, the overall evaluation result of this method was higher than that of the conventional method.
  • the estimation device 30 includes a determination unit 32 and a paragraph estimation unit 33.
  • The determination unit 32 determines whether each utterance constituting the series data to be processed is a story-change utterance, using the binary classification model 1 (first model) learned in advance from teacher data (first teacher data) in which a binary label indicating whether the story changes is attached to the utterances, or their division units, constituting series data of dialogues containing a plurality of topics.
  • Based on the determination result, the paragraph estimation unit 33 estimates, in the series data to be processed, the range of each paragraph: from a story change to the utterance immediately before the next change, or from a story change to the end of the dialogue.
  • By learning from teacher data in which a binary label indicating whether the story changes is attached to each utterance or its division unit, a binary classification model 1 can be generated that determines whether an utterance constituting the series data is a story-change utterance. The range of each paragraph in the series data can then be estimated from the model's determinations. Moreover, since estimating the paragraph ranges restricts topic estimation to the utterances within a paragraph, the accuracy of estimating the topic of the paragraph can be improved.
  • As a method of dividing series data into sections of objectively classified topics, a method called TextTiling is known (see, for example, Reference 1).
  • In TextTiling, the text is divided at local minima of a cohesion score computed from the lexical cohesion of neighboring portions of the text.
  • TopicTiling, which divides text using Latent Dirichlet Allocation (LDA), a representative topic model, has also been proposed (see Reference 2).
  • Subjective topics are topics categorized, for example, from the perspective of isolating the cause of a customer's inability to use a particular service, or from the perspective of an operator interviewing a customer about needs and wants on a sales call.
  • In such dialogues, the same keywords, such as service names, product names, and related vocabulary, appear everywhere, so even content one wants to distinguish subjectively is, on a superficial and objective level, indistinguishable, and such topics make up the majority of the dialogue. Therefore, the methods described in References 1 and 2 cannot accurately divide and classify dialogues by subjective topic.
  • Moreover, utterances themselves are short, and for some utterances it is impossible to uniquely determine which topic they belong to; such utterances end up labeled with a topic different from the original one. A model trained on teacher data labeled with topics different from the original ones suffers reduced classification accuracy. Therefore, with the method described in Reference 3, it is difficult to appropriately classify, by subjective topic, utterances, including short ones, input in chronological order.
  • In contrast, the estimation device 30c according to the present embodiment determines whether each utterance constituting the series data, or its division unit, is a topic change, and estimates the range of paragraphs based on the determination result.
  • FIG. 16 is a diagram showing a configuration example of the estimation device 30c according to the present embodiment.
  • the estimation device 30c includes an input unit 41, a determination unit 42, a topic estimation unit 43, a paragraph estimation unit 44, and an output unit 45.
  • the input unit 41 inputs the series data of the dialogue including a plurality of topics.
  • the series data input to the input unit 41 is data to be processed that is the target of estimation of the range of paragraphs and topics in paragraphs.
  • The series data is, for example, text data obtained by speech recognition of the time-series utterances of an operator and a customer.
  • The input unit 41 may sequentially input the text data obtained by speech recognition of each utterance during the dialogue. Further, when the series data is input offline, the input unit 41 may sort the utterances by their start time or end time and input the text data of each utterance.
  • the input unit 41 outputs the input series data to the determination unit 42.
  • the determination unit 42 uses the binary classification model 1a to determine whether or not the utterance constituting the series data output from the input unit 41 is a topic switching utterance.
  • the binary classification model 1a is a model learned in advance so as to determine whether or not the topic is switched with respect to the utterance or the division unit thereof constituting the series data of the dialogue.
  • The binary classification model 1a can be created by learning with the learning device 10 described above, from teacher data in which a binary label (switching label) indicating whether or not the topic changes is attached to the utterances constituting the series data or their division units.
  • the determination unit 42 determines from the determination result using the binary classification model 1a whether or not the utterance constituting the series data or the division unit thereof is to be processed by the topic estimation unit 43 described later. Specifically, the determination unit 42 determines the utterance or the division unit thereof determined to be the switching of the topic as the processing target by the topic estimation unit 43. The determination unit 42 outputs the determination result of whether or not to be processed by the topic estimation unit 43 to the topic estimation unit 43 and the paragraph estimation unit 44.
  • The topic estimation unit 43 uses the multi-value classification model 2a to assign, to each utterance (topic-change utterance) or division unit determined by the determination unit 42 to be a processing target, a multi-valued label indicating the topic of the range that includes the utterance.
  • the multi-value classification model 2a is a model for estimating a topic in a range including the utterance with respect to the utterance or its division unit.
  • The multi-value classification model 2a can be created by learning with the learning device 20 described above, from teacher data in which a multi-valued label (topic label) indicating the topic to which the utterance relates is attached to the utterances constituting the series data or their division units.
  • The learning of topic transitions may be performed only for the topic-change utterances, that is, only for the utterances to which a multi-valued label is attached. In this way, the utterances lying between one topic-change utterance and the next can be excluded from the learning targets, removing noise from the topic classification.
  • the topic estimation unit 43 stores the topic estimation result (multi-valued label corresponding to the estimated topic) in the label information table.
  • the label information table is an area for storing the estimation result of the topic for the data to be processed, and may be a memory on a computer, a database, or a file.
  • The paragraph estimation unit 44 estimates that the range from an utterance determined by the determination unit 42 to be a processing target (a topic-change utterance) to the utterance immediately before the next such utterance is the range of one paragraph.
  • The paragraph estimation unit 44 attaches the multi-valued label stored in the label information table to the utterances included in the paragraph whose range was estimated. Specifically, the paragraph estimation unit 44 gives each utterance, from the topic-change utterance to the utterance immediately before the next topic-change utterance, the multi-valued label that was assigned to that topic-change utterance and stored in the label information table.
  • The output unit 45 outputs, for each paragraph whose range was estimated in the series data, the utterances constituting the paragraph. Further, the output unit 45 may output a multi-valued label indicating the topic of the paragraph, the start time and end time of the paragraph, and the like.
  • A morphological analysis unit that performs morphological analysis may be provided after the input unit 41 for text chat. Further, when the series data to be processed is input offline, the estimation device 30c may be configured to estimate the paragraph ranges using all the results of the topic-change determinations and topic estimations at once. In this case, based on those determination and estimation results, the paragraph estimation unit 44 may attach the multi-valued label estimated by the topic estimation unit 43 to the utterances in the range from a topic change to the utterance immediately before the next topic change.
  • FIG. 17 is a flowchart showing an example of the operation of the estimation device 30c according to the present embodiment.
  • the determination unit 42 determines whether or not the dialogue in the series data of the processing target input to the input unit 41 has been completed (step S51).
  • When the dialogue has ended (step S51: Yes), the estimation device 30c ends the process.
  • When the dialogue has not ended (step S51: No), the determination unit 42 reads the utterance to be processed (step S52).
  • the determination unit 42 uses the binary classification model 1a to determine whether or not the read utterance is a topic-switching utterance (step S53).
  • When the read utterance is determined not to be a topic-change utterance (step S54: No), the process proceeds to step S57, described later.
  • When the read utterance is determined to be a topic-change utterance (step S54: Yes), the topic estimation unit 43 estimates its topic using the multi-value classification model 2a (step S55). The topic estimation unit 43 stores the estimated topic in the label information table, updating the table (step S56). That is, the label information table is updated every time the read utterance is a topic-change utterance.
  • the paragraph estimation unit 44 assigns a multi-valued label stored in the label information table to the read utterance (step S57).
  • the label information table is updated every time the read utterance is a topic switching utterance. Therefore, the same multi-valued label is assigned from the utterance of the topic change to the utterance immediately before the utterance of the next topic change, which constitutes one paragraph.
  • After a multi-valued label has been attached to the read utterance, the determination unit 42 returns to step S51 with the next utterance in the series data as the processing target (step S58).
  • FIG. 18 is a diagram showing an example of topic estimation by the estimation device 30c according to the present embodiment. In FIG. 18, it is assumed that the binary classification model 1a and the multi-value classification model 2a are learned in utterance units.
  • As shown in FIG. 18, the determination unit 42 uses the binary classification model 1a to determine whether each utterance constituting the series data is a topic-change utterance.
  • the topic estimation unit 43 estimates the topic of the utterance determined to be the switching of the topic by using the multi-value classification model 2a, and stores the multi-value label indicating the estimated topic in the label information table.
  • the paragraph estimation unit 44 estimates the range from the utterance of the topic change to the utterance immediately before the utterance of the next topic change as one paragraph. Then, the paragraph estimation unit 44 assigns a multi-valued label indicating the topic of the utterance at the beginning of the paragraph, which is stored in the label information table, to all the utterances constituting the paragraph.
  • As described above, the estimation device 30c uses the binary classification model 1a to determine whether the utterances constituting the series data are topic-change utterances. Further, the estimation device 30c estimates the topic of each topic-change utterance using the multi-value classification model 2a. The estimation device 30c then estimates the range of a paragraph as extending from a topic-change utterance to the utterance immediately before the next topic-change utterance, and the topic estimated for the topic-change utterance is presumed to be the topic of the paragraph containing it.
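The single-pass behaviour of the estimation device 30c can be sketched as follows, with the label information table reduced to one variable and the two callables standing in for binary classification model 1a and multi-value classification model 2a; the demo data is invented:

    def label_stream(utterances, is_topic_switch, estimate_topic):
        """Yield (utterance, topic) pairs, labelling each utterance with the topic
        of the most recent topic-change utterance (the label information table)."""
        label_table = None                       # label information table
        for u in utterances:
            if is_topic_switch(u):               # steps S53-S54: topic change?
                label_table = estimate_topic(u)  # steps S55-S56: update the table
            yield u, label_table                 # step S57: assign the stored label

    demo = [("hello", False), ("about my bill", True), ("it is too high", False)]
    for utt, topic in label_stream(demo, lambda u: u[1], lambda u: u[0]):
        print(topic, "|", utt[0])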
  • The division unit of an utterance is, for example, a word unit obtained by dividing the utterance into words, or, when punctuation has been added to the utterance, a unit delimited by periods or commas. In the first and second embodiments described above, when the topic of an utterance is estimated, the topic is estimated per utterance or per predetermined division unit; that is, the division unit of the utterance was fixed.
  • the topic does not always change in a predetermined unit.
  • the response history may be recorded separately for the scene of confirming the presence or absence of an injury and the scene of confirming the damage of a car.
  • the dialogue between the customer and the person in charge of responding shown in utterances 1 to 4 will be described with an example of dividing the dialogue into a scene for confirming the presence or absence of an injury and a scene for confirming damage to the car.
  • Responsible person "I heard that you had an accident when you put the car in the garage.
  • utterance 1 and utterance 2 are utterances in a scene where damage to the car is confirmed.
  • In the middle of utterance 3, the scene switches from confirming damage to the car to confirming the presence or absence of injury, and the injury-confirmation scene continues through utterance 4.
  • Within utterance 3, the part "That's right, because I rubbed the bumper behind the car with a utility pole when I put it in the garage" belongs to the scene confirming damage to the car, while the part "Was your body okay?" belongs to the scene confirming the presence or absence of injury.
  • It is therefore desirable to give the unit "That's right, I rubbed the bumper behind the car with a utility pole when I put it in the garage" a label indicating the scene of confirming damage to the car, and the unit "Is your body okay?" a label indicating the scene of confirming the presence or absence of injury; however, deciding such units in advance is difficult.
  • Therefore, in the present embodiment, the learning unit is not fixed: positive examples, negative examples, and out-of-scope learning data are created dynamically, in various units, from the teacher data. That is, learning data is created while varying the division unit of the utterance. This makes it possible, even when the story (scene) switches in the middle of an utterance, to create learning data for training a model that can estimate the switching point with high accuracy. Further, by using a model trained on learning data created without a fixed learning unit, each scene within an utterance can be estimated even when the scene switches mid-utterance.
  • FIG. 19 is a diagram showing a configuration example of the learning data creating device 50 according to the present embodiment.
  • the learning data creating device 50 according to the present embodiment dynamically creates positive examples, negative examples, and non-target learning data in various units from the teacher data.
  • the learning data creating device 50 includes an input unit 51, a learning data creating unit 52, and an output unit 53.
  • the dialogue series data is input to the input unit 51.
  • the series data is, for example, voice data of a time-series dialogue between an operator and a customer, or text data in which utterances included in the dialogue are voice-recognized.
  • the input unit 51 outputs the input series data to the learning data creation unit 52.
  • the learning data creation unit 52 inputs the series data output from the input unit 51 and the teacher data.
  • the teacher data is data in which the range of utterances necessary for specifying a scene in the utterances constituting the series data is labeled before the learning data is created. Labels in teacher data are manually assigned.
  • the learning data creation unit 52 creates learning data used for learning a model for estimating a topic (scene) in the utterance in an arbitrary division unit of the utterance based on the input series data and the teacher data.
  • FIG. 20 is a diagram showing a configuration example of the learning data creation unit 52.
  • the learning data creation unit 52 includes a sentence output unit 521, an ID assignment unit 522, a combination generation unit 523, and an assignment unit 524.
  • the sentence output unit 521 outputs the utterance character string constituting the series data input from the input unit 51 as a sentence.
  • the sentence output unit 521 outputs a sentence divided into word units by morphological analysis.
  • the sentence output unit 521 outputs a sentence divided into word units by voice recognition.
  • the ID assignment unit 522 generates an element in which the utterance is divided according to a predetermined rule from the sentence output from the sentence output unit 521.
  • the unit of division (unit of element) by the ID assigning unit 522 may be any unit as long as it can be specified, such as a word unit, a punctuation mark unit, a voice recognition unit, and a speech end unit.
  • the ID assigning unit 522 assigns an ID to each of the elements in which the utterance is divided, and stores the ID assigned to each element in the ID set.
  • the combination generation unit 523 generates a combination of IDs (combination ID string) necessary for learning the model based on the IDs stored in the ID set.
  • FIG. 21 is a diagram showing a configuration example of the combination generation unit 523.
  • the combination generation unit 523 includes an ID extraction unit 5231, a combination target ID storage unit 5232, a combination generation ID storage unit 5233, and a combination ID generation unit 5234.
  • the ID extraction unit 5231 extracts a predetermined longest unit ID from the ID set and stores it in the longest unit ID set.
  • The longest unit may be any unit that is longer than the division unit used when the sentence output unit 521 outputs a sentence, and that can be specified in advance. For example, if the division unit at sentence output is the word unit, the longest unit may be a comma-delimited unit or a sentence unit, both longer than a word. Further, if the division unit at sentence output is the comma-delimited unit, the longest unit may be a sentence unit or a speech recognition unit, both longer than a comma-delimited unit.
  • the combination target ID storage unit 5232 extracts the IDs in the range to be combined from the longest unit ID set and stores them in the combination target ID set.
  • the combination generation ID storage unit 5233 acquires the combination generation ID for generating the combination ID string from the combination target ID set and stores it in the combination generation ID set.
  • the combination ID generation unit 5234 generates a combination ID string based on the set of combination generation IDs, stores it in the set of combination ID columns, and updates the set of combination ID columns.
  • The combination generation unit 523 outputs the generated combination ID strings to the assignment unit 524.
  • The combination ID strings output from the combination generation unit 523 and the teacher data are input to the assignment unit 524.
  • The assignment unit 524 creates learning data by assigning, based on the teacher data, a positive-example label, a negative-example label, or an excluded-from-learning label to each division unit obtained by converting a combination ID string back into a character string.
  • FIG. 22 is a diagram showing a configuration example of the assignment unit 524.
  • The assignment unit 524 includes a positive example assignment unit 5241, a negative example assignment unit 5242, and a non-target assignment unit 5243.
  • The positive example assignment unit 5241 assigns, based on the teacher data, a label indicating a positive example to predetermined ID strings in the set of combination ID strings. In this way, a positive-example label is given to the division units obtained by converting those ID strings back into character strings.
  • The negative example assignment unit 5242 assigns a label indicating a negative example to predetermined ID strings in the set of combination ID strings. In this way, a negative-example label is given to the corresponding division units.
  • The non-target assignment unit 5243 assigns a label indicating exclusion from learning to predetermined ID strings in the set of combination ID strings. In this way, an out-of-scope label is given to the corresponding division units.
  • The non-target assignment unit 5243 then deletes the combination ID strings labeled as excluded from learning, and outputs as learning data the division units corresponding to the combination ID strings labeled as positive or negative examples, together with their positive or negative labels. The details of the operation of the assignment unit 524 will be described later.
  • the output unit 53 outputs the learning data created by the learning data creation unit 52.
  • the operation of the learning data creation unit 52 will be described.
  • a case of creating learning data for learning a model for determining whether or not a scene (story) is switched will be described as an example.
  • Since the above-mentioned utterance 3 includes a scene change, utterance 3 will be used as the example.
  • the label "T” is given to the range determined to be the change of the scene, and the label "F” is given to the range not determined to be the change of the scene.
  • In the following, it is assumed that the division unit of a sentence is the comma-delimited unit and that the longest unit is the sentence (period-delimited) unit.
  • the label "T” is given to the range determined to be the change of scene in utterance 3 ("Is your body okay?").
  • The ID assignment unit 522 divides utterance 3 at punctuation marks and assigns an ID to each resulting element. In the following, it is assumed that the ID assignment unit 522 assigns IDs as follows:
    ID1: Was that so?
    ID2: When you put it in the garage
    ID3: Because I rubbed the bumper behind the car with a utility pole,
    ID4: Your body is
    ID5: Is that okay?
  • the ID assigning unit 522 stores the ID assigned to each element of the utterance in the ID set.
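A sketch of this ID assignment on an English stand-in for utterance 3 (the wording and the punctuation class used for splitting are illustrative):

    import re

    # English stand-in for utterance 3; the original is split at punctuation marks.
    utterance3 = ("Was that so? When you put it in the garage, because I rubbed "
                  "the bumper behind the car with a utility pole, your body, is that okay?")
    # Split after each punctuation mark, keeping the mark with its element.
    elements = [e.strip() for e in re.split(r"(?<=[,?])", utterance3) if e.strip()]
    id_set = {i + 1: elem for i, elem in enumerate(elements)}
    for i, elem in sorted(id_set.items()):
        print(f"ID{i}: {elem}")   # ID1 ... ID5, analogous to the example above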
  • The combination generation unit 523 creates, from the ID set, combinations (ID strings) of the IDs of the punctuation-delimited elements, within the range of the predetermined longest unit.
  • the operation of the combination generation unit 523 will be described with reference to FIG. 23.
  • FIG. 23 is a flowchart showing an example of the operation of the combination generation unit 523.
  • The ID extraction unit 5231 extracts all IDs from the ID set for each longest unit and stores them in the longest-unit ID set (step S61). As described above, the longest unit is the sentence unit, so the range of the longest unit here is ID1 to ID5. The ID extraction unit 5231 extracts IDs 1 to 5 from the ID set and stores (1, 2, 3, 4, 5) in the longest-unit ID set.
  • the combination target ID storage unit 5232 deletes the smallest ID among the IDs stored in the longest unit ID set from the longest unit ID set, and stores the ID in the combination target ID set (step S62).
  • the combination target ID storage unit 5232 takes out ID1 from the ID set of the longest unit and stores it in the combination target ID set. Further, the combination target ID storage unit 5232 deletes ID1 from the ID set of the longest unit. Therefore, (2,3,4,5) is stored in the ID set of the longest unit.
  • the combination generation ID storage unit 5233 arranges all the IDs included in the combination target ID set in ascending order and stores them in the combination generation ID set and the combination ID string set (step S63).
  • the combination sequence in which all the IDs are arranged in ascending order is [1].
  • the combination generation ID storage unit 5233 stores (1) in the set of combination generation IDs, and stores [1] in the set of combination ID columns.
  • the combination ID generation unit 5234 deletes the smallest ID among the IDs stored in the combination generation ID set, arranges the remaining IDs in ascending order, and stores the resulting ID string in the set of combination ID strings (step S64).
  • (1) is stored in the set of combination generation IDs. Therefore, the combination ID generation unit 5234 deletes the smallest ID1.
  • the combination ID generation unit 5234 determines whether or not the set of combination generation IDs is empty (step S65). In the above example, the set of combination generation IDs is empty because ID1 is deleted.
  • if it is determined that the set of combination generation IDs is not empty (step S65: No), the combination ID generation unit 5234 repeats the process of step S64.
  • next, the combination target ID storage unit 5232 determines whether or not the longest unit ID set is empty (step S66). In the above example, since (2, 3, 4, 5) is stored in the longest unit ID set, the longest unit ID set is not empty.
  • if it is determined that the longest unit ID set is not empty (step S66: No), the combination target ID storage unit 5232 returns to the process of step S62.
  • since (2, 3, 4, 5) is stored in the longest unit ID set, the combination target ID storage unit 5232 takes out the smallest ID, ID2, and stores it in the combination target ID set. Further, it deletes ID2 from the longest unit ID set, so (3, 4, 5) remains in the longest unit ID set.
  • steps S63 and S64 are then performed. The ID string in which all the IDs stored in the combination target ID set are arranged in ascending order is stored in the combination generation ID set and the set of combination ID strings. Since (1, 2) is stored in the combination target ID set, that ID string is [1, 2], and (1, 2) is stored in the combination generation ID set. Further, the ID string [1, 2] is added to the set of combination ID strings, which becomes ([1], [1, 2]).
  • the combination ID generation unit 5234 deletes the smallest ID among the ID columns stored in the combination generation ID set, arranges the remaining IDs in ascending order, and stores them in the combination ID column set.
  • since (1, 2) is stored in the combination generation ID set, the combination ID generation unit 5234 deletes the smallest ID, ID1, leaving (2). Because (2) remains in the combination generation ID set, the unit stores [2] in the set of combination ID strings, which becomes ([1], [1, 2], [2]).
  • by repeating the above processing, the combination generation unit 523 generates the following combination ID strings, each composed of one element or a plurality of consecutive elements into which the utterance is divided according to the predetermined rule:
    [1], [1, 2], [2], [1, 2, 3], [2, 3], [3], [1, 2, 3, 4], [2, 3, 4], [3, 4], [4], [1, 2, 3, 4, 5], [2, 3, 4, 5], [3, 4, 5], [4, 5], [5]
  • when it is determined that the longest unit ID set is empty (step S66: Yes), the ID extraction unit 5231 determines whether or not the ID set still contains IDs that have not been stored in a longest unit ID set (step S67).
  • if it is determined that such IDs remain (step S67: Yes), the ID extraction unit 5231 returns to the process of step S61.
  • if it is determined that no such ID remains (step S67: No), the combination generation unit 523 ends the process.
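  • taken together, steps S61 to S67 enumerate every contiguous run of elements inside the longest unit. The following is a minimal Python sketch of that enumeration (the function name is an assumption, not from the source); it reproduces the generation order of the walkthrough above:

        def generate_combination_id_strings(longest_unit_ids):
            """Enumerate combination ID strings as in steps S61-S67: each time
            the smallest remaining ID is moved into the combination target set
            (step S62), every suffix of the target set, in ascending order, is
            appended to the set of combination ID strings (steps S63-S65)."""
            combination_id_strings = []
            target_ids = []
            for next_id in longest_unit_ids:       # step S62
                target_ids.append(next_id)
                generation_ids = list(target_ids)  # step S63
                while generation_ids:
                    combination_id_strings.append(list(generation_ids))
                    generation_ids.pop(0)          # step S64
            return combination_id_strings

        print(generate_combination_id_strings([1, 2, 3, 4, 5]))
        # [[1], [1, 2], [2], [1, 2, 3], [2, 3], [3], [1, 2, 3, 4], ..., [5]]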
  • FIG. 24 is a flowchart showing an example of the operation of the granting unit 524.
  • the positive example assigning unit 5241 assigns a label indicating a positive example to every ID string, among those included in the set of combination ID strings generated by the combination generation unit 523, whose range matches the teacher data (step S71). As described above, it is assumed that the label "T" is attached as teacher data to the range of utterance 3 determined to be a scene change ("Is your body okay?"). Therefore, the positive example assigning unit 5241 assigns the label "T" to the ID string [4, 5], which covers the same range as "Is your body okay?" in utterance 3.
  • the negative example assigning unit 5242 assigns a label indicating a negative example to every combination ID string in the set that does not include any ID contained in an ID string labeled as a positive example (step S72).
  • in the above example, the ID string [4, 5] is labeled as a positive example. Therefore, the negative example assigning unit 5242 assigns the label "F" to all the combination ID strings that include neither ID4 nor ID5: [1], [1, 2], [2], [1, 2, 3], [2, 3], and [3].
  • the non-target granting unit 5243 assigns a label indicating non-target to all the combination ID strings in the set to which neither a positive example label nor a negative example label has been assigned (step S73). In the above example, the non-target granting unit 5243 assigns the non-target label to the following combination ID strings:
    [1, 2, 3, 4], [2, 3, 4], [3, 4], [4], [1, 2, 3, 4, 5], [2, 3, 4, 5], [3, 4, 5], [5]
  • the non-target granting unit 5243 deletes the combination ID strings labeled as non-target from the set of combination ID strings, and stores in the learning data the division units corresponding to the combination ID strings labeled as positive or negative examples. In the above example, the division units corresponding to the following combination ID strings are stored in the learning data:
    [1]: F, [1, 2]: F, [2]: F, [1, 2, 3]: F, [2, 3]: F, [3]: F, [4, 5]: T
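  • the three labeling steps can be condensed into a short Python sketch (the function name and the use of tuples as dictionary keys are assumptions for illustration): an ID string is labeled "T" if it exactly matches the teacher span, "F" if it shares no ID with that span, and is otherwise dropped as non-target:

        def label_training_data(combination_id_strings, teacher_ids):
            """Label combination ID strings per steps S71-S73.
            teacher_ids is the ID string matching the teacher span, e.g. [4, 5]."""
            teacher_set = set(teacher_ids)
            training_data = {}
            for ids in combination_id_strings:
                if ids == teacher_ids:             # step S71: positive example
                    training_data[tuple(ids)] = "T"
                elif teacher_set.isdisjoint(ids):  # step S72: negative example
                    training_data[tuple(ids)] = "F"
                # step S73: partial overlaps are non-target and excluded
            return training_data

        ids = [1, 2, 3, 4, 5]
        combos = [ids[s:e] for e in range(1, len(ids) + 1) for s in range(e)]
        print(label_training_data(combos, [4, 5]))
        # {(1,): 'F', (1, 2): 'F', (2,): 'F', (1, 2, 3): 'F',
        #  (2, 3): 'F', (3,): 'F', (4, 5): 'T'}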
  • in this way, the learning data creation device 50 creates learning data by assigning labels to division units, each composed of one element or a plurality of consecutive elements into which an utterance is divided according to a predetermined rule (for example, at punctuation marks).
  • the learning data includes division units having different numbers of constituent elements.
  • therefore, even when a scene changes in the middle of an utterance, learning data can be created in utterance division units that match the change.
  • by using the learning data created in this way, it is possible to create a model that can estimate a scene change with high accuracy even when the scene (story) changes in the middle of an utterance.
  • the estimation device 30d according to the present embodiment uses a model trained on the learning data created by the learning data creation device 50 to estimate scene (story) changes in utterance division units having different numbers of constituent elements.
  • FIG. 25 is a diagram showing a configuration example of the estimation device 30d according to the present embodiment.
  • the estimation device 30d includes an input unit 61, an estimation unit 62, and an output unit 63.
  • the dialogue series data is input to the input unit 61.
  • the input unit 61 includes a sentence output unit 611. Similar to the sentence output unit 521, the sentence output unit 611 outputs the utterance character string constituting the series data input to the input unit 61 to the estimation unit 62 as a sentence.
  • the sentence output unit 611 outputs a sentence divided into word units by morphological analysis.
  • alternatively, the sentence output unit 611 may output a sentence divided into word units by voice recognition.
  • the estimation unit 62 estimates the change of story from the sentence output from the input unit 61 by using the estimation model 3.
  • the estimation model 3 is a model created by learning the learning data created by the learning data creation device 50.
  • the learning data created by the learning data creation device 50 includes division units having different numbers of constituent elements, and each division unit is labeled as to whether or not it is a change of story. Therefore, the estimation model 3 is a model trained in advance to determine, for each division unit having a different number of constituent elements, whether or not it is a change of story.
  • the estimation unit 62 generates division units having different numbers of constituent elements from the utterances constituting the series data to be processed, and judges, for each of the generated division units, whether or not it is a change of story using the estimation model 3 as the first model.
  • the output unit 63 outputs the estimation result by the estimation unit 62.
  • FIG. 27 is a diagram showing a configuration example of the estimation unit 62.
  • the estimation unit 62 includes an ID assignment unit 621, a combination generation unit 622, and a switching estimation unit 623.
  • the ID assignment unit 621 generates an element in which the utterance is divided according to a predetermined rule from the sentence output from the sentence output unit 611.
  • the unit of division by the ID assigning unit 621 may be any identifiable unit such as a word unit, a punctuation mark unit, a voice recognition unit, and a speech end unit.
  • the ID assigning unit 621 assigns an ID to each of the elements in which the utterance is divided, and stores the ID assigned to each element in the ID set.
  • the combination generation unit 622 generates a combination of IDs (combination ID string) used for estimating the switching of the story based on the IDs stored in the ID set.
  • FIG. 28 is a diagram showing a configuration example of the combination generation unit 622.
  • the combination generation unit 622 includes an ID extraction unit 6221, a combination target ID storage unit 6222, a combination generation ID storage unit 6223, and a combination ID generation unit 6224.
  • the ID extraction unit 6221 extracts a predetermined longest unit ID from the ID set and stores it in the longest unit ID set.
  • the combination target ID storage unit 6222 extracts the IDs in the range to be combined from the longest unit ID set and stores them in the combination target ID set.
  • the combination generation ID storage unit 6223 acquires, from the combination target ID set, the combination generation IDs for generating combination ID strings, and stores them in the combination generation ID set.
  • similar to the combination ID generation unit 5234, the combination ID generation unit 6224 generates combination ID strings based on the combination generation ID set, stores them in the set of combination ID strings, and updates that set.
  • the combination generation unit 622 outputs the set of generated combination ID strings to the switching estimation unit 623.
  • the set of combination ID strings output from the combination generation unit 622 is input to the switching estimation unit 623.
  • the switching estimation unit 623 uses the estimation model 3 to determine for each division unit corresponding to the combination ID string whether or not the division unit is a story change, and outputs the determination result.
  • the operation of the estimation unit 62 will be described focusing on the operation of the switching estimation unit 623. Since the operation of generating the combination ID string by the combination generation unit 622 is the same as the operation of the combination generation unit 523 described with reference to FIG. 23, the description thereof will be omitted.
  • FIG. 29 is a flowchart showing an example of the operation of the switching estimation unit 623.
  • the switching estimation unit 623 extracts, from the set of combination ID strings, one combination ID string consisting only of IDs for which it has not yet been estimated whether or not there is a change of story (step S81).
  • the switching estimation unit 623 replaces the extracted combination ID string with a word string (step S82). That is, the switching estimation unit 623 replaces the ID included in the combination ID string with the utterance element corresponding to the ID.
  • the switching estimation unit 623 uses the estimation model 3 to estimate whether or not the character string (utterance division unit) obtained by the replacement is a change of story (step S83).
  • the switching estimation unit 623 determines whether or not the estimation result is a positive example (whether the story is switched) (step S84).
  • if it is determined that the estimation result is not a positive example (step S84: No), the switching estimation unit 623 determines whether or not the set of combination ID strings is empty (step S85).
  • if it is determined that the set of combination ID strings is not empty (step S85: No), the switching estimation unit 623 returns to the process of step S81.
  • when it is determined that the set of combination ID strings is empty (step S85: Yes), the switching estimation unit 623 outputs the estimation result for each ID via the output unit 63 (step S86), and ends the process.
  • if it is determined that the estimation result is a positive example (step S84: Yes), the switching estimation unit 623 determines whether or not the set of combination ID strings contains a combination ID string consisting only of IDs for which a change of story has not yet been estimated (step S87).
  • when it is determined that such a combination ID string exists (step S87: Yes), the switching estimation unit 623 returns to the process of step S81.
  • when it is determined that no such combination ID string exists (step S87: No), the switching estimation unit 623 outputs the estimation result and the estimation unit for each ID via the output unit 63 (step S88), and ends the process.
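  • to make the control flow of steps S81 to S88 concrete, the following is a minimal Python sketch; model_is_switch is a hypothetical stand-in for estimation model 3, and the function name is an assumption:

        def estimate_switches(elements, model_is_switch):
            """Sketch of steps S81-S88. elements[i - 1] is the element with ID i.
            Returns, for each ID, the estimation result and, for positive
            examples, the estimation unit (the combination ID string)."""
            ids = list(range(1, len(elements) + 1))
            combos = [ids[s:e] for e in range(1, len(ids) + 1) for s in range(e)]
            results = {i: "F" for i in ids}
            estimation_units = {}
            estimated = set()  # IDs already covered by a positive estimate
            for combo in combos:
                if estimated & set(combo):
                    continue  # S81/S87: use only ID strings of unestimated IDs
                text = "".join(elements[i - 1] for i in combo)  # step S82
                if model_is_switch(text):                       # steps S83-S84
                    for i in combo:
                        results[i] = "T"
                        estimation_units[i] = combo
                    estimated |= set(combo)
            return results, estimation_units                    # steps S86/S88

    Under this sketch, a model that fires only on the division unit of IDs 3 and 4 yields a positive result for ID3 and ID4 with estimation unit [3, 4], matching the first specific example below.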
  • the operation of the estimation unit 62 will be further described with reference to a specific example.
  • the ID assigning unit 621 divides the above-mentioned utterance into four elements in units of punctuation marks, and assigns IDs (ID1 to ID4) to each element.
  • the combination generation unit 622 generates a combination ID string by the process described with reference to FIG. 23.
  • in this case, the combination generation unit 622 generates ten combination ID strings: ([1], [1, 2], [2], [1, 2, 3], [2, 3], [3], [1, 2, 3, 4], [2, 3, 4], [3, 4], [4]).
  • the switching estimation unit 623 extracts one combination ID string from the generated set and estimates whether or not the division unit corresponding to it is a change of story. As shown in FIG. 30B, the switching estimation unit 623 estimates the division units corresponding to the combination ID strings in order until one is estimated to be a positive example (a change of story). In the following, it is assumed that the division units corresponding to [1], [1, 2], [2], [1, 2, 3], [2, 3], [3], [1, 2, 3, 4], and [2, 3, 4] are not positive examples, and the division unit corresponding to [3, 4] is estimated to be a positive example.
  • since no combination ID string consisting only of unestimated IDs remains, the switching estimation unit 623 outputs the estimation result and the estimation unit for each ID via the output unit 63. Since the division unit corresponding to [3, 4] was estimated to be a positive example, the switching estimation unit 623 outputs, as shown in FIG. 30B, that the estimation result for ID3 and ID4 is a positive example and that the estimation unit is the combination ID string [3, 4].
  • the operation of the estimation unit 62 will be further described by giving another specific example.
  • the ID assigning unit 621 divides the above-mentioned utterance into four elements in units of punctuation marks, and assigns IDs (ID1 to ID4) to each element.
  • the combination generation unit 622 generates a combination ID string by the process described with reference to FIG. 23.
  • in this case as well, the combination generation unit 622 generates ten combination ID strings: ([1], [1, 2], [2], [1, 2, 3], [2, 3], [3], [1, 2, 3, 4], [2, 3, 4], [3, 4], [4]).
  • the switching estimation unit 623 extracts one combination ID string from the generated set and estimates whether or not the division unit corresponding to it is a change of story. As shown in FIG. 31B, it estimates the division units in order until one is estimated to be a positive example. In the following, it is assumed that the division unit corresponding to [1] is not a positive example and the division unit corresponding to [1, 2] is estimated to be a positive example.
  • since combination ID strings ([3], [3, 4], [4]) consisting only of IDs (ID3 and ID4) for which no estimation has yet been made remain, the switching estimation unit 623 further estimates whether or not the division units corresponding to these ID strings are positive examples. In the following, it is assumed that the division unit corresponding to [3] is not a positive example and the division unit corresponding to [3, 4] is estimated to be a positive example.
  • since no combination ID string consisting only of unestimated IDs remains, the switching estimation unit 623 outputs the estimation result and the estimation unit for each ID via the output unit 63. Since the division units corresponding to [1, 2] and [3, 4] were estimated to be positive examples, the switching estimation unit 623 outputs, as shown in FIG. 31B, that the estimation result for ID1 and ID2 is a positive example with estimation unit [1, 2], and that the estimation result for ID3 and ID4 is a positive example with estimation unit [3, 4].
  • as described above, the estimation device 30d generates, from the utterances constituting the series data to be processed, division units composed of one element or a plurality of consecutive elements divided according to a predetermined rule and having different numbers of constituent elements, and uses the estimation model 3, trained on such learning data, to determine for each division unit whether or not it is a change of story. Therefore, even when the story changes in the middle of an utterance, the switching point can be estimated with high accuracy.
  • in the above description, the binary classification model 1 is created by the learning device 10 and the multi-value classification model 2 is created by the learning device 20; however, the present disclosure is not limited to this.
  • one learning device 70 may create a binary classification model 1 and a multi-value classification model 2.
  • the learning device 70 includes the input unit 11, the binary classification learning unit 12 as a first model learning unit, the input unit 21, the multi-value label complementing unit 22, and the multi-value classification learning unit 23 as a second model learning unit.
  • the operations of the input unit 11 and the binary classification learning unit 12, which handle teacher data (first teacher data) in which a binary label (first label) indicating whether or not it is a change of story is given to the utterances, or to division units of the utterances, constituting series data of a dialogue including a plurality of topics, are the same as those described above.
  • the operations of the input unit 21, the multi-value label complementing unit 22, and the multi-value classification learning unit 23 are likewise the same as those described above.
  • the multi-value classification learning unit 23 learns the multi-value classification model 2 (second model), which estimates the topics in the utterances constituting the series data to be processed, based on teacher data (second teacher data) in which a multi-value label (second label) indicating the topic of a range is given to each range in which one topic continues in the series data.
  • FIG. 33 is a diagram showing an example of the operation of the learning device 70, and is a diagram for explaining a learning method by the learning device 70.
  • the binary classification learning unit 12 learns the binary classification model 1, which determines whether or not an utterance constituting the series data to be processed is an utterance at a change of story, based on teacher data (first teacher data) in which a binary label indicating whether or not it is a change of story is given to the utterances, or to division units of the utterances, constituting series data of a dialogue including a plurality of topics (step S91).
  • the multi-value classification learning unit 23 learns the multi-value classification model 2, which estimates the topics in the utterances constituting the series data to be processed, based on teacher data in which a multi-value label indicating the topic of a range is given to each range in which one topic continues in the series data (step S92).
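  • the source does not fix a particular model family for either model, so the following is a minimal sketch assuming scikit-learn's TF-IDF features and logistic regression purely as placeholders for the binary classification model 1 and the multi-value classification model 2; the texts and labels are invented examples:

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        # Hypothetical first teacher data: division units with binary switch labels.
        switch_texts = ["was that so?", "is your body okay?"]
        switch_labels = ["F", "T"]

        # Hypothetical second teacher data: topic ranges with topic labels.
        topic_texts = ["i rubbed the bumper with a utility pole",
                       "please check the insurance contract"]
        topic_labels = ["accident situation", "contract confirmation"]

        # Step S91: learn binary classification model 1 (first model).
        binary_model_1 = make_pipeline(TfidfVectorizer(), LogisticRegression())
        binary_model_1.fit(switch_texts, switch_labels)

        # Step S92: learn multi-value classification model 2 (second model).
        multivalue_model_2 = make_pipeline(TfidfVectorizer(), LogisticRegression())
        multivalue_model_2.fit(topic_texts, topic_labels)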
  • the hardware configuration of the estimation devices 30 to 30d will be described.
  • the estimation devices 30a to 30d may have the same hardware configuration.
  • the learning devices 10, 20, 70 and the learning data creating device 50 may have the same hardware configuration.
  • FIG. 34 is a block diagram showing a hardware configuration when the estimation device 30 of the present disclosure is a computer capable of executing a program instruction.
  • the computer may be a general-purpose computer, a dedicated computer, a workstation, a PC (Personal Computer), an electronic notepad, or the like.
  • the program instruction may be a program code, a code segment, or the like for executing a necessary task.
  • the estimation device 30 includes a processor 110, a ROM (Read Only Memory) 120, a RAM (Random Access Memory) 130, a storage 140, an input unit 150, a display unit 160, and a communication interface (I / F) 170.
  • the processor 110 is a CPU (Central Processing Unit), an MPU (Micro Processing Unit), a GPU (Graphics Processing Unit), a DSP (Digital Signal Processor), a SoC (System on a Chip), or the like, and may be composed of a plurality of processors of the same type or different types.
  • the processor 110 controls each configuration and executes various arithmetic processes. That is, the processor 110 reads the program from the ROM 120 or the storage 140, and executes the program using the RAM 130 as a work area. The processor 110 controls each of the above configurations of the estimation device 30 and performs various arithmetic processes according to the program stored in the ROM 120 or the storage 140. In the present embodiment, the program according to the present disclosure is stored in the ROM 120 or the storage 140. The processor 110 reads and executes the program.
  • the determination unit 32, the paragraph estimation unit 33, and the topic estimation unit 34 constitute a control unit 38 (FIG. 3).
  • the control unit 38 may be configured by dedicated hardware such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field-Programmable Gate Array), or may be configured by one or more processors as described above.
  • the control unit 61 may be configured by dedicated hardware such as an ASIC or FPGA, or may be configured by one or more processors as described above.
  • the program may be provided in a form stored in a non-transitory storage medium such as a CD-ROM (Compact Disk Read Only Memory), a DVD-ROM (Digital Versatile Disk Read Only Memory), or a USB (Universal Serial Bus) memory. Further, the program may be downloaded from an external device via a network.
  • the ROM 120 stores various programs and various data.
  • the RAM 130 temporarily stores a program or data as a work area.
  • the storage 140 is composed of an HDD (Hard Disk Drive) or an SSD (Solid State Drive), and stores various programs including an operating system and various data.
  • the storage 140 stores the created binary classification models 1, 1a, multi-value classification models 2, 2a, and estimation model 3.
  • the input unit 150 includes a pointing device such as a mouse, and a keyboard, and is used for performing various inputs.
  • the display unit 160 is, for example, a liquid crystal display and displays various information.
  • the display unit 160 may adopt a touch panel method and function as an input unit 150.
  • the communication interface 170 is an interface for communicating with other devices such as an external device (not shown), and for example, standards such as Ethernet (registered trademark), FDDI, and Wi-Fi (registered trademark) are used.
  • (Appendix 1) An estimation device comprising a processor, wherein the processor determines, using a first model learned in advance based on first teacher data for the utterances constituting series data of a dialogue including a plurality of topics or for division units of those utterances, whether or not an utterance constituting the series data to be processed is an utterance at a change of story, and estimates, based on the result of the determination, the range of a paragraph in the series data to be processed from a change of story to the utterance immediately before the next change, or of a paragraph from a change of story to the utterance at the end of the dialogue.
  • (Appendix 2) A learning device comprising a processor, wherein the processor learns, based on first teacher data in which a first label indicating whether or not it is a change of story is given to the utterances constituting series data of a dialogue including a plurality of topics or to division units of those utterances, a first model for determining whether or not an utterance constituting the series data to be processed is an utterance at a change of story, and learns, based on second teacher data in which a second label indicating the topic of a range is given to each range in which one topic continues in the series data, a second model for estimating the topics in the utterances constituting the series data to be processed.
  • (Appendix 3) A non-transitory storage medium storing a program executable by a computer, the program causing the computer to function as the estimation device according to Appendix 1.
  • (Appendix 4) A non-transitory storage medium storing a program executable by a computer, the program causing the computer to function as the learning device according to Appendix 2.
  • a computer can suitably be used to function as each unit of the estimation devices 30, 30a, 30b, 30c, and 30d and the learning device 70 described above.
  • such a computer can be realized by storing, in its storage unit, a program describing the processing contents that realize the functions of these devices, and having its processor read and execute the program. That is, the program can cause the computer to function as the estimation devices 30, 30a, 30b, 30c, and 30d and the learning device 70 described above.
  • this program may be recorded on a computer-readable medium, and can be installed on a computer by using such a medium.
  • the computer-readable medium on which the program is recorded may be a non-transient recording medium.
  • the non-transient recording medium is not particularly limited, but may be, for example, a recording medium such as a CD-ROM or a DVD-ROM. This program can also be provided via a network.
  • each component can be rearranged as long as no logical inconsistency arises, and a plurality of components can be combined into one or divided.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure relates to an estimation device (30) comprising: a determination unit (32) that determines whether an utterance constituting series data to be processed is an utterance indicating a change of story, using a binary classification model (1) learned in advance based on teacher data for the utterances constituting series data of a dialogue including a plurality of topics, or for division units of those utterances; and a paragraph estimation unit (33) that, based on the determination result of the determination unit (32), estimates, in the series data to be processed, the range of a paragraph from a change of story to the utterance immediately before the next change, or of a paragraph from a change of story to the utterance at the end of the dialogue.
PCT/JP2021/012692 2020-06-16 2021-03-25 Dispositif d'estimation, procédé d'estimation, dispositif d'apprentissage, procédé d'apprentissage et programme WO2021256043A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2022532313A JP7425368B2 (ja) 2020-06-16 2021-03-25 推定装置、推定方法、学習装置、学習方法およびプログラム

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JPPCT/JP2020/023644 2020-06-16
PCT/JP2020/023644 WO2021255840A1 (fr) 2020-06-16 2020-06-16 Procédé d'estimation, dispositif d'estimation et programme

Publications (1)

Publication Number Publication Date
WO2021256043A1 true WO2021256043A1 (fr) 2021-12-23

Family

ID=79267817

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/JP2020/023644 WO2021255840A1 (fr) 2020-06-16 2020-06-16 Procédé d'estimation, dispositif d'estimation et programme
PCT/JP2021/012692 WO2021256043A1 (fr) 2020-06-16 2021-03-25 Dispositif d'estimation, procédé d'estimation, dispositif d'apprentissage, procédé d'apprentissage et programme

Family Applications Before (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/023644 WO2021255840A1 (fr) 2020-06-16 2020-06-16 Procédé d'estimation, dispositif d'estimation et programme

Country Status (2)

Country Link
JP (1) JP7425368B2 (fr)
WO (2) WO2021255840A1 (fr)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010100853A1 (fr) * 2009-03-04 2010-09-10 日本電気株式会社 Dispositif d'adaptation de modèle linguistique, dispositif de reconnaissance vocale, procédé d'adaptation de modèle linguistique et support d'enregistrement lisible par ordinateur
JP2012247912A (ja) * 2011-05-26 2012-12-13 Chubu Electric Power Co Inc 音声信号処理装置
JP2018045639A (ja) * 2016-09-16 2018-03-22 株式会社東芝 対話ログ分析装置、対話ログ分析方法およびプログラム
JP2018128575A (ja) * 2017-02-08 2018-08-16 日本電信電話株式会社 話し終わり判定装置、話し終わり判定方法およびプログラム
JP2019053126A (ja) * 2017-09-13 2019-04-04 株式会社日立製作所 成長型対話装置

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MIMURA, MASATO ET AL.: "Automatic Indexing of Speakers and Topics for Panel Discussion Speech", IPSJ SIG TECHNICAL REPORT, vol. 96, no. 55, 28 May 1996 (1996-05-28), pages 13 - 18 *
TAKAAKI HASEGAWA: "Automatic Knowledge Assistance System Supporting Operator Responses", NTT TECHNICAL REVIEW, vol. 17, no. 9, 1 September 2019 (2019-09-01), pages 15 - 18, XP055874245 *

Also Published As

Publication number Publication date
JPWO2021256043A1 (fr) 2021-12-23
WO2021255840A1 (fr) 2021-12-23
JP7425368B2 (ja) 2024-01-31

Similar Documents

Publication Publication Date Title
US10642889B2 (en) Unsupervised automated topic detection, segmentation and labeling of conversations
CN108153800B (zh) 信息处理方法、信息处理装置以及记录介质
US7634406B2 (en) System and method for identifying semantic intent from acoustic information
JP4728972B2 (ja) インデキシング装置、方法及びプログラム
CN104598644B (zh) 喜好标签挖掘方法和装置
CN107305541A (zh) 语音识别文本分段方法及装置
Halibas et al. Application of text classification and clustering of Twitter data for business analytics
US11232266B1 (en) Systems and methods for generating a summary of a multi-speaker conversation
US20210026890A1 (en) Faq consolidation assistance device, faq consolidation assistance method, and program
CN110413998B (zh) 一种面向电力行业的自适应中文分词方法及其系统、介质
KR20150101341A (ko) 분산 퍼지 연관 규칙 마이닝에 기반한 영화 추천 장치 및 방법
CN111462761A (zh) 声纹数据生成方法、装置、计算机装置及存储介质
JP6208794B2 (ja) 会話分析装置、方法及びコンピュータプログラム
CN113342955A (zh) 一种问答语句的处理方法、装置及电子设备
JP4325370B2 (ja) 文書関連語彙獲得装置及びプログラム
CN113988195A (zh) 一种私域流量线索挖掘方法、装置、车辆、可读介质
CN108899016B (zh) 一种语音文本规整方法、装置、设备及可读存储介质
WO2021256043A1 (fr) Dispositif d'estimation, procédé d'estimation, dispositif d'apprentissage, procédé d'apprentissage et programme
CN109241993B (zh) 融合用户和整体评价信息的评价对象情感分类方法及装置
CN116702736A (zh) 保险话术生成方法、装置、电子设备及存储介质
US11580737B1 (en) Search results within segmented communication session content
JP6545633B2 (ja) 単語スコア計算装置、単語スコア計算方法及びプログラム
CN111611394B (zh) 一种文本分类方法、装置、电子设备及可读存储介质
CN114528851A (zh) 回复语句确定方法、装置、电子设备和存储介质
CN113934833A (zh) 训练数据的获取方法、装置、系统及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21824906

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022532313

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21824906

Country of ref document: EP

Kind code of ref document: A1