JP6576968B2

JP6576968B2 - End-of-speech determination device, end-of-speech determination method, and program

Info

Publication number: JP6576968B2
Application number: JP2017021606A
Authority: JP
Inventors: 節夫山田; 伸章廣嶋; 喜昭野田
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2017-02-08
Filing date: 2017-02-08
Publication date: 2019-09-18
Anticipated expiration: 2037-02-08
Also published as: JP2018128575A

Description

The present invention relates to an end-of-speech determination device, an end-of-speech determination method, and a program for determining whether an utterance in a dialogue among a plurality of speakers is an end-of-speech utterance of a speaker.

Detecting the point at which a speaker finishes speaking (an end-of-speech utterance) in a dialogue between a customer and an agent at a call center, service counter, or the like makes it possible, for example, to wait until the customer has finished speaking and then have a system analyze the customer's utterances together.

One method of determining whether an utterance in a dialogue among a plurality of speakers, such as a dialogue between a customer and an agent, is an end-of-speech utterance uses learning data in which each utterance in a dialogue is annotated with information indicating whether it is an end-of-speech utterance (end-of-speech learning data) (see Non-Patent Document 1). In this method, machine learning on that learning data generates an end-of-speech determination model that determines whether an utterance in a dialogue is an end-of-speech utterance.

R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research 9 (2008), 1871-1874.

In general, the expressions used in end-of-speech utterances differ from field to field. Consequently, with the method disclosed in Non-Patent Document 1, an end-of-speech determination model generated from end-of-speech learning data for one field may fail to determine with high accuracy whether an utterance is an end-of-speech utterance when it is applied to another field. End-of-speech learning data could be prepared for each field in which determination is to be performed, but doing so increases cost.

The present invention was made in view of the above problems, and its object is to provide an end-of-speech determination device, an end-of-speech determination method, and a program that can determine whether an utterance in a dialogue is an end-of-speech utterance while suppressing an increase in cost.

To solve the above problem, an end-of-speech determination device according to the present invention determines whether an utterance in a dialogue among a plurality of speakers is an end-of-speech utterance of a speaker, and includes a determination unit that determines whether the utterance in the dialogue is an end-of-speech utterance based on whether a change of speaker has occurred in the dialogue.

Further, to solve the above problem, an end-of-speech determination method according to the present invention determines whether an utterance in a dialogue among a plurality of speakers is an end-of-speech utterance of a speaker, and includes a step of determining whether the utterance in the dialogue is an end-of-speech utterance based on whether the speaker of the utterances in the dialogue has changed.

Further, to solve the above problem, a program according to the present invention causes a computer to function as the end-of-speech determination device described above.

According to the end-of-speech determination device, the end-of-speech determination method, and the program of the present invention, it is possible to determine whether an utterance in a dialogue is an end-of-speech utterance while suppressing an increase in cost.

FIG. 1 is a block diagram showing a configuration example of an end-of-speech determination device according to a first embodiment of the present invention.
FIG. 2 is a block diagram showing a configuration example of an end-of-speech determination device according to a second embodiment of the present invention.
FIG. 3 is a diagram for explaining generation of an end-of-speech determination model.
FIG. 4 is a diagram showing an example of a dialogue between a customer and an agent.
FIG. 5 is a diagram showing a configuration example of end-of-speech learning data.
FIG. 6 is a diagram showing another configuration example of end-of-speech learning data.
FIG. 7 is a block diagram showing a configuration example of an end-of-speech determination device according to a third embodiment of the present invention.
FIG. 8 is a diagram showing an example of a dialogue between a customer and an agent.
FIG. 9 is a diagram showing an example of determination results produced by the end-of-speech determination model shown in FIG. 7.
FIG. 10 is a block diagram showing a configuration example of an end-of-speech determination device according to a fourth embodiment of the present invention.
FIG. 11 is a diagram for explaining determination of end-of-speech utterances by the determination unit shown in FIG. 10.
FIG. 12 is a diagram for explaining the operation of the learning data generation unit shown in FIG. 10.
FIG. 13 is a diagram conceptually showing machine learning that takes natural language as input.
FIG. 14 is a diagram showing a specific example of bag-of-words.
FIG. 15 is a diagram for explaining a problem with the conventional method of machine learning that takes natural language as input.
FIG. 16 is a diagram for explaining a method of machine learning that takes natural language as input according to the present invention.

Hereinafter, embodiments for carrying out the present invention will be described with reference to the drawings.

(First embodiment)
FIG. 1 is a block diagram showing a configuration example of an end-of-speech determination device 10 according to a first embodiment of the present invention. The end-of-speech determination device 10 according to this embodiment determines whether an utterance in a dialogue among a plurality of speakers, such as a dialogue between a customer and an agent, is an end-of-speech utterance, that is, an utterance with which the speaker has finished saying what he or she wanted to convey.

Note that people cannot always organize what they want to convey and speak without hesitation; they stop to think or falter partway through. As a result, the audio may break off (a silent interval may occur) while the speaker is still talking. An utterance is a unit obtained by dividing a speaker's speech at such breaks in the audio.

The end-of-speech determination device 10 shown in FIG. 1 includes a determination unit 11.

The determination unit 11 determines whether an utterance in a dialogue is an end-of-speech utterance based on the dialogue structure of the dialogue among the plurality of speakers (the dialogue between the customer and the agent). Specifically, the determination unit 11 detects whether a speaker change, in which the speaker in the dialogue switches, has occurred, and determines that the utterance immediately before a speaker change is an end-of-speech utterance.

In general, dialogues between a customer and an agent often follow a structure in which, for example, the customer finishes stating an inquiry, the agent then answers it, and after the agent finishes answering, the customer makes a further inquiry. In other words, when a speaker change occurs, the utterance immediately before it tends to be the end-of-speech utterance of the speaker who was talking before the change. The determination unit 11 uses this tendency to determine whether an utterance in the dialogue is an end-of-speech utterance.

Before determining whether a speaker change has occurred, the determination unit 11 removes from the utterances in the dialogue those that consist only of fillers unrelated to the content of the dialogue, such as "ah", "um", and "yes". A filler-only utterance is likely to be, for example, a backchannel response by the agent while the customer is still talking. If such utterances were included in the determination, a speaker change could be detected even though the speaker has not finished speaking. In this embodiment, therefore, filler-only utterances are removed before determining whether a speaker change has occurred.

As described above, in this embodiment the end-of-speech determination device 10 includes the determination unit 11, which determines whether an utterance in a dialogue is an end-of-speech utterance of a speaker based on whether a change of speaker has occurred in the dialogue.

The dialogue structure in which the utterance immediately before a speaker change is an end-of-speech utterance is common regardless of the field in which the dialogue takes place. Using this structure makes it possible to determine whether an utterance in a dialogue is an end-of-speech utterance without incurring the added cost of preparing end-of-speech learning data for each field in which determination is to be performed.
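The speaker-change rule of the first embodiment can be sketched roughly as follows. This is only an illustrative sketch, not the implementation of the determination unit 11: the `Utterance` type, the `FILLERS` set, the word-level filler test, and the choice to leave the final utterance undetermined (no following utterance exists to reveal a speaker change) are all assumptions introduced here. The sample dialogue paraphrases the one described for FIG. 4.

```python
from dataclasses import dataclass

# Hypothetical filler vocabulary; the text names only "ah", "um", and "yes" as examples.
FILLERS = {"ah", "um", "yes"}


@dataclass
class Utterance:
    uid: int       # utterance number, e.g. 1 for utterance #1
    speaker: str   # "customer" or "agent"
    text: str


def is_filler_only(utt: Utterance) -> bool:
    """True if every word of the utterance is a filler (assumed word-level test)."""
    words = [w.strip(",.?!").lower() for w in utt.text.split()]
    words = [w for w in words if w]
    return bool(words) and all(w in FILLERS for w in words)


def judge_by_speaker_change(dialogue: list[Utterance]) -> dict[int, bool]:
    """Map utterance number to True (end-of-speech) or False.

    Filler-only utterances are removed first; an utterance is judged
    end-of-speech when the next remaining utterance has a different speaker.
    The last utterance is left out because nothing follows it.
    """
    kept = [u for u in dialogue if not is_filler_only(u)]
    return {cur.uid: cur.speaker != nxt.speaker for cur, nxt in zip(kept, kept[1:])}


dialogue = [
    Utterance(1, "customer", "Um, I'd like to ask you something"),
    Utterance(2, "customer", "It's about a purchase over the Internet"),
    Utterance(3, "agent", "Yes"),
    Utterance(4, "customer", "What about the delivery fee?"),
    Utterance(5, "agent", "Delivery is currently free for Internet purchases"),
]
print(judge_by_speaker_change(dialogue))  # {1: False, 2: False, 4: True}
```

Because utterance #3 is dropped before the comparison, the switch from the customer to the agent's backchannel "Yes" is not mistaken for a speaker change.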

(Second embodiment)
FIG. 2 is a diagram showing a configuration example of an end-of-speech determination device 10A according to a second embodiment of the present invention. The end-of-speech determination device 10A according to this embodiment generates end-of-speech learning data, which is used to generate an end-of-speech determination model that determines whether an utterance in a dialogue among a plurality of speakers, such as a dialogue between a customer and an agent, is an end-of-speech utterance. In FIG. 2, components identical to those in FIG. 1 are given the same reference numerals, and their description is omitted.

The end-of-speech determination device 10A shown in FIG. 2 differs from the end-of-speech determination device 10 shown in FIG. 1 in that a learning data generation unit 12 is added.

Based on the determination result of the determination unit 11, the learning data generation unit 12 generates end-of-speech learning data 13, which is used to generate, by machine learning, an end-of-speech determination model that determines whether an utterance in a dialogue is an end-of-speech utterance. As shown in FIG. 3, for example, the generated end-of-speech learning data 13 is input to a determination model generation unit 14, and machine learning by the determination model generation unit 14 generates an end-of-speech determination model 15 that determines whether an utterance in a dialogue is an end-of-speech utterance. The end-of-speech determination model 15 is used to determine, for example, whether an utterance in a dialogue between a customer and an agent at a call center is an end-of-speech utterance. The end-of-speech determination model 15 and the determination model generation unit 14 may be included in the end-of-speech determination device 10A or in an external device separate from the end-of-speech determination device 10A.

FIG. 4 is a diagram showing an example of a dialogue between a customer and an agent.

In the example shown in FIG. 4, the customer makes utterance #1, "Um, I'd like to ask you something", followed by utterance #2, "It's about a purchase over the Internet". In response to the customer's utterances #1 and #2, the agent makes utterance #3, "Yes", as a backchannel response to the customer.

Following the agent's utterance #3, the customer makes utterance #4, "What about the delivery fee?" The customer's utterances #1, #2, and #4 together form an inquiry about the delivery fee for an Internet purchase, and the inquiry ends with utterance #4. The customer's utterance #4 is therefore an end-of-speech utterance.

In response to the customer's utterance #4, the agent makes utterance #5, "Delivery is currently free for Internet purchases", which answers the customer's inquiry. The agent's utterance #5 completes the answer to the inquiry. The agent's utterance #5 is therefore an end-of-speech utterance.

Taking the dialogue between the customer and the agent shown in FIG. 4 as an example, the end-of-speech learning data 13 for generating the end-of-speech determination model 15 will be described with reference to FIG. 5.

As described above, when a dialogue contains filler-only utterances, a speaker change may be detected even though the speaker has not actually finished speaking. Filler-only utterances are therefore removed from the end-of-speech learning data 13. As shown in FIG. 5, utterances #1, #2, #4, and #5 are extracted as the end-of-speech learning data 13, excluding utterance #3, which consists only of the agent's filler ("Yes"). Each utterance is then given information indicating whether it is an end-of-speech utterance (an end-of-speech flag). In the example shown in FIG. 5, an end-of-speech flag of "0" indicates that the utterance is not an end-of-speech utterance, and a flag of "1" indicates that it is. Accordingly, the end-of-speech flags of utterances #4 and #5, which are end-of-speech utterances, are set to "1", and those of the other utterances #1 and #2 are set to "0". The end-of-speech learning data 13 is thus data in which each utterance of the customer or the agent is associated with information indicating whether that utterance is an end-of-speech utterance.
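The per-utterance layout described for FIG. 5 can be illustrated with a short sketch. The record shape (a dict with `uid`, `text`, and `end_flag` keys), the `FILLERS` set, and the function names are assumptions made for illustration; the document does not specify a storage format.

```python
# Hypothetical filler vocabulary, based on the examples given in the text.
FILLERS = {"ah", "um", "yes"}


def is_filler_only(text: str) -> bool:
    """True if the utterance text consists only of filler words (assumed test)."""
    words = [w.strip(",.?!").lower() for w in text.split()]
    words = [w for w in words if w]
    return bool(words) and all(w in FILLERS for w in words)


def build_learning_data(dialogue, end_uids):
    """Pair each non-filler utterance with its end-of-speech flag.

    dialogue: list of (uid, speaker, text) tuples in dialogue order.
    end_uids: set of utterance numbers regarded as end-of-speech utterances.
    """
    rows = []
    for uid, _speaker, text in dialogue:
        if is_filler_only(text):
            continue  # filler-only utterances are excluded from the data
        rows.append({"uid": uid, "text": text, "end_flag": 1 if uid in end_uids else 0})
    return rows


dialogue = [
    (1, "customer", "Um, I'd like to ask you something"),
    (2, "customer", "It's about a purchase over the Internet"),
    (3, "agent", "Yes"),
    (4, "customer", "What about the delivery fee?"),
    (5, "agent", "Delivery is currently free for Internet purchases"),
]
for row in build_learning_data(dialogue, end_uids={4, 5}):
    print(row["uid"], row["end_flag"], row["text"])
```

With the flags taken from the description of FIG. 5, the result holds utterances #1, #2, #4, and #5 with flags 0, 0, 1, and 1.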

Although FIG. 5 illustrates an example in which filler-only utterances are removed, the handling of fillers is not limited to this. For example, when an utterance that is not filler-only contains a filler, that filler may be either removed or left in place.

Also, although FIG. 5 illustrates an example in which an end-of-speech flag is set for each utterance, the invention is not limited to this; an end-of-speech flag may instead be set for utterances formed by successively concatenating the utterances up to the end of speech.

For example, as shown in FIG. 6, utterance #1 is not an end-of-speech utterance, so its end-of-speech flag is set to "0". Next, an utterance formed by concatenating the customer's utterance #1 with the customer's following utterance #2 is added to the end-of-speech learning data 13. Because utterance #2 is not an end-of-speech utterance, the concatenation of utterances #1 and #2 is not an end-of-speech utterance either, and its end-of-speech flag is set to "0".

Next, an utterance formed by concatenating the customer's utterance #1, the following utterance #2, and the customer's utterance #4 that follows utterance #2 (skipping the filler-only utterance #3) is added to the end-of-speech learning data 13. Because utterance #4 is an end-of-speech utterance, the concatenation of utterances #1, #2, and #4 is an end-of-speech utterance, and its end-of-speech flag is set to "1". In this way, utterances formed by successively concatenating the utterances up to the end of speech, together with their end-of-speech flags, may be added to the end-of-speech learning data 13.
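The cumulative layout described for FIG. 6 can be sketched as below. The function name, the use of a space as the joining character, and the choice to restart the concatenation after each end-of-speech utterance are assumptions; the text only describes the customer's run of utterances #1, #2, and #4.

```python
def build_cumulative_data(rows):
    """Build successively concatenated utterances with end-of-speech flags.

    rows: list of (text, end_flag) in dialogue order, with filler-only
    utterances already removed. Each output entry is the concatenation of
    all utterances since the previous end of speech, labelled with the flag
    of the most recent utterance.
    """
    out = []
    buffer = []
    for text, flag in rows:
        buffer.append(text)
        out.append((" ".join(buffer), flag))
        if flag == 1:
            buffer = []  # assumed: a new run starts after an end-of-speech utterance
    return out


rows = [
    ("Um, I'd like to ask you something", 0),             # utterance #1
    ("It's about a purchase over the Internet", 0),       # utterance #2
    ("What about the delivery fee?", 1),                  # utterance #4
    ("Delivery is currently free for Internet purchases", 1),  # utterance #5
]
for text, flag in build_cumulative_data(rows):
    print(flag, text)
```

One consequence of this layout is that the model sees the full context accumulated so far, which may help when a single short utterance is ambiguous on its own.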

End-of-speech learning data 13 such as that shown in FIGS. 5 and 6 can be created manually from dialogues between customers and agents. However, creating such end-of-speech learning data 13 for every field in which end-of-speech determination is desired is costly.

In this embodiment, therefore, whether an utterance in a dialogue is an end-of-speech utterance is determined from the dialogue structure, and the determination result is used as the end-of-speech learning data 13. This allows the end-of-speech learning data 13 to be generated automatically from dialogues in the field in which end-of-speech determination is desired. Generating the end-of-speech determination model 15 from the generated end-of-speech learning data 13 then makes it possible to determine whether an utterance in a dialogue is an end-of-speech utterance while suppressing an increase in cost.

(Third embodiment)
FIG. 7 is a diagram showing a configuration example of an end-of-speech determination device 10B according to a third embodiment of the present invention. In FIG. 7, components identical to those in FIGS. 2 and 3 are given the same reference numerals, and their description is omitted.

The end-of-speech determination device 10B shown in FIG. 7 differs from the end-of-speech determination device 10A shown in FIG. 2 in that the determination model generation unit 14 and the end-of-speech determination model 15 are added. That is, in this embodiment the end-of-speech determination device 10B generates the end-of-speech learning data 13 from the results of end-of-speech determination based on the dialogue structure, and generates the end-of-speech determination model 15 from the generated end-of-speech learning data 13. It then outputs the result of the end-of-speech determination model 15 determining whether an utterance in a dialogue is an end-of-speech utterance.

Next, the operation of the end-of-speech determination device 10B according to this embodiment will be described in more detail.

As pre-processing, the end-of-speech determination model 15, which determines whether an utterance in a dialogue is an end-of-speech utterance, is generated by machine learning using end-of-speech learning data 13 in which utterances in dialogues are given end-of-speech flags. The end-of-speech learning data 13 used in the pre-processing may be, for example, data to which end-of-speech flags were assigned manually, or data to which end-of-speech flags were assigned by the dialogue-structure-based determination described above.

The machine learning technique is not particularly limited as long as it can generate an appropriate model from the learning data; various techniques such as deep learning and support vector machines can be used. The information (features) used to determine whether an utterance is an end-of-speech utterance is likewise not particularly limited, and various features can be used so that correct determination is possible.
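Since the text leaves the learning technique and the features open, the following is only one possible illustration of generating a model from flagged utterances. It swaps in a plain perceptron over bag-of-words counts, chosen here because it needs no external library; it is not the deep learning or support vector machine approach named in the text, and the function names, the whitespace tokenization, and the tiny English training set are all assumptions.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count lower-cased, whitespace-separated tokens (assumed tokenization)."""
    return Counter(text.lower().split())


def score(model, text: str) -> float:
    weights, bias = model
    return bias + sum(weights[w] * c for w, c in bag_of_words(text).items())


def predict(model, text: str) -> int:
    """Return 1 (end-of-speech) or 0 (not end-of-speech)."""
    return 1 if score(model, text) > 0 else 0


def train_perceptron(data, epochs: int = 10):
    """Train on a list of (text, end_flag) pairs with the perceptron update rule."""
    weights = Counter()
    bias = 0.0
    for _ in range(epochs):
        for text, flag in data:
            if predict((weights, bias), text) != flag:
                step = 1.0 if flag == 1 else -1.0
                for w, c in bag_of_words(text).items():
                    weights[w] += step * c
                bias += step
    return weights, bias


train = [
    ("i would like to ask something", 0),
    ("it is about an internet purchase", 0),
    ("what about the delivery fee", 1),
    ("delivery is currently free", 1),
]
model = train_perceptron(train)
print([predict(model, text) for text, _ in train])
```

A real system would need far more data and a more careful feature design; the point of the sketch is only that the flagged utterances serve directly as supervised training pairs.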

Next, as online processing (real-time processing in response to utterances), audio data of a dialogue between a customer and an agent (dialogue data) is input to the end-of-speech determination device 10A, and the end-of-speech determination model 15 is used to determine whether each utterance in the dialogue represented by the dialogue data is an end-of-speech utterance.

In the following, assume that dialogue data for a dialogue between a customer and an agent such as that shown in FIG. 8 is input. The customer's utterances and the agent's utterances are input to the end-of-speech determination device 10A on different channels (two channels).

In the example shown in FIG. 8, the customer makes utterance #11, "Um, I'm not very familiar with this, so", and then utterance #12, "I'm not sure which one is good".

After the customer's utterance #12, the agent makes the filler-only utterance #13, "Yes", as a backchannel response to the customer. After the agent's utterance #13, the customer makes the utterance "Which product do you recommend?" (utterance #14). Because the customer has finished asking about a recommended product, the agent makes utterance #15, which names a recommended product.

Suppose that, for this dialogue, the end-of-speech determination model 15 determines, as shown in FIG. 9, that the customer's utterance #11 is not an end-of-speech utterance and that the customer's utterances #12 and #14 are end-of-speech utterances. The end-of-speech determination device 10B outputs this determination result of the end-of-speech determination model 15.

Although this embodiment has been described using an example in which the end-of-speech determination device 10B includes the learning data generation unit 12 and the determination model generation unit 14, the invention is not limited to this; an external device separate from the end-of-speech determination device 10B may include the learning data generation unit 12 and the determination model generation unit 14. In that case, the end-of-speech determination device 10B acquires the end-of-speech determination model 15 generated by the external device and outputs the determination result of the acquired end-of-speech determination model 15.

(Fourth embodiment)
FIG. 10 is a diagram showing a configuration example of an end-of-speech determination device 10C according to a fourth embodiment of the present invention. In FIG. 10, components identical to those in FIG. 7 are given the same reference numerals, and their description is omitted.

The end-of-speech determination device 10C shown in FIG. 10 differs from the end-of-speech determination device 10B shown in FIG. 7 in that the learning data generation unit 12 is replaced with a learning data generation unit 12C.

The learning data generation unit 12C receives the determination result of the determination unit 11 and the determination result of the end-of-speech determination model 15, and generates the end-of-speech learning data 13 based on these determination results.

Next, the operation of the end-of-speech determination device 10C according to this embodiment will be described. In the end-of-speech determination device 10C, pre-processing and online processing are performed in the same way as in the end-of-speech determination device 10B according to the third embodiment.

Next, as post-processing, the determination unit 11 determines, based on the dialogue structure, whether each utterance in the dialogue represented by the input dialogue data is an end-of-speech utterance.

First, the determination unit 11 removes the filler-only utterance (utterance #13) from the utterances in the dialogue. Then, for each utterance, the determination unit 11 determines whether a speaker change has occurred between that utterance and the utterance that follows it. As described above, the customer's utterances and the agent's utterances are input on different channels, so the determination unit 11 can determine whether a speaker change has occurred by monitoring the input on each channel. When the determination unit 11 determines that a speaker change has occurred, it determines that the utterance immediately before the speaker change is an end-of-speech utterance; when it determines that no speaker change has occurred, it determines that the immediately preceding utterance is not an end-of-speech utterance.

As shown in FIG. 11, between utterance #11 and the following utterance #12 the speaker remains the customer, and no speaker change has occurred. Likewise, between utterance #12 and the following utterance #14 (skipping the filler-only utterance #13) the speaker remains the customer, and no speaker change has occurred. Between utterance #14 and the following utterance #15, the speaker switches from the customer to the agent, so a speaker change has occurred. Accordingly, as shown in FIG. 8, the determination unit 11 determines that the customer's utterances #11 and #12 are not end-of-speech utterances and that the customer's utterance #14 is an end-of-speech utterance.

The learning data generation unit 12C compares the determination result of the end-of-speech determination model 15 with the determination result of the determination unit 11. Then, as shown in FIG. 12, the learning data generation unit 12C adds to the end-of-speech learning data 13 those utterances for which the determination result of the end-of-speech determination model 15 matches the determination result of the determination unit 11.

In the example shown in FIG. 12, the end-of-speech determination model 15 and the determination unit 11 agree on the determination result for utterance #11 (not an end-of-speech utterance) and on the determination result for utterance #14 (an end-of-speech utterance). The learning data generation unit 12C adds utterances #11 and #14, together with their determination results, to the end-of-speech learning data 13. When the end-of-speech determination model 15 and the determination unit 11 reach the same determination result, that result is considered highly reliable. Therefore, by using the determination results on which the end-of-speech determination model 15 and the determination unit 11 agree as end-of-speech learning data 13, the reliability of the end-of-speech determination model 15 can be improved through machine learning using that end-of-speech learning data 13.
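The agreement filter can be sketched as below, under the assumption that both judgments are available as mappings from utterance number to a Boolean label. The function name and data layout are hypothetical. The model's result for utterance #12 is inferred from the statement that only #11 and #14 agree.

```python
def agreed_examples(rule_labels: dict[int, bool],
                    model_labels: dict[int, bool]) -> dict[int, bool]:
    """Keep only the utterances on which both judgments agree."""
    return {
        uid: label
        for uid, label in rule_labels.items()
        if uid in model_labels and model_labels[uid] == label
    }


# Judgments of the determination unit 11 (speaker-change rule).
rule_labels = {11: False, 12: False, 14: True}
# Judgments of the end-of-speech determination model 15; the value for #12
# is inferred here, since the two judgments are said to agree only on #11 and #14.
model_labels = {11: False, 12: True, 14: True}

new_training_data = agreed_examples(rule_labels, model_labels)
```

Utterance #12 is left out because the two judgments disagree on it, so only labels supported by both sources reach the learning data.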

Note that the learning data generation unit 12C may add the determination result of the determination unit 11 to the end-of-speech learning data 13 without comparing it with the determination result of the end-of-speech determination model 15.

As described above, in the present embodiment, the end-of-speech determination device 10C includes the learning data generation unit 12C, which adds to the end-of-speech learning data 13 those determination results on which the determination unit 11 and the end-of-speech determination model 15 agree as to whether an utterance in the dialogue is an end-of-speech utterance.

By using the determination results on which the end-of-speech determination model 15 and the determination unit 11 agree as end-of-speech learning data 13, the reliability of the end-of-speech determination model 15 can be improved through machine learning using that end-of-speech learning data 13.

As described above, the end-of-speech learning data 13 may be generated not per utterance but in units in which a plurality of consecutive utterances are combined (accumulated). For example, if an utterance is not an end-of-speech utterance, an end-of-speech flag may be assigned to that utterance, and an end-of-speech flag may also be assigned to the utterance formed by joining that utterance with the next utterance, to generate the end-of-speech learning data 13. In this case, utterances are joined one after another until an utterance is determined to be an end-of-speech utterance. When an utterance is determined to be an end-of-speech utterance, the accumulated utterances are reset, and the determination of whether an utterance is an end-of-speech utterance resumes from the utterance that follows.
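One possible reading of this accumulation scheme is sketched below. The function name, the use of a space as the joining character, and the representation of the end-of-speech flag as a Boolean are assumptions for illustration.

```python
def accumulate_examples(utterances: list[tuple[str, bool]]) -> list[tuple[str, bool]]:
    """Build learning examples from (text, is_end) pairs given in dialogue order.

    Each example pairs the text accumulated so far with the flag of the
    latest utterance. The buffer is reset after an end-of-speech utterance.
    """
    examples = []
    buffer: list[str] = []
    for text, is_end in utterances:
        buffer.append(text)
        examples.append((" ".join(buffer), is_end))
        if is_end:
            buffer = []  # start accumulating again from the next utterance
    return examples


examples = accumulate_examples([
    ("a", False),
    ("b", False),
    ("c", True),
    ("d", True),
])
```

With this input, the examples are "a", "a b", and "a b c" for the first turn, and "d" alone after the reset.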

The dialogue between the customer and the agent is converted into text by speech recognition processing before being processed by the end-of-speech determination model 15. Errors may occur in the speech recognition processing. Therefore, processing using the N-best method, which prepares a plurality of candidates as the speech recognition result for the words contained in an utterance, may be performed.

FIG. 13 is a diagram conceptually illustrating machine learning that takes natural language as input.

In machine learning that takes natural language as input, feature calculation is performed on the input text in order to convert it into a numerical vector suited to the input of the machine learning (support vector machine (SVM)). That is, at both training time and determination time, the input text must be converted into a numerical vector that matches the input of the machine learning. A common choice for such feature calculation is the bag-of-words model, which considers only whether each word is contained in the sentence and ignores word order.

FIG. 14 is a diagram illustrating a specific example of bag-of-words.

In bag-of-words, if a word is contained in the sentence, the value corresponding to that word is set to 1, and an input vector expressing whether each word is contained in the sentence is calculated. For example, if the input text is "Can I cancel my time deposit on the Internet?", morphological analysis is first performed on the input text.

Specifically, as shown in FIG. 14, a word list is generated in advance by morphological analysis of a large amount of text, so that the list covers the words that appear in text, and each word is assigned a word number. Then, for each word in the word list that appears in the input text, the element of the input vector corresponding to its word number is set to "1", and for each word that does not appear in the input text, the corresponding element of the input vector is set to "0".
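The vector construction can be illustrated with a short sketch. The word list and tokens below are English stand-ins chosen for the example. In practice the tokens would come from a morphological analyzer, which is not shown here, and the word list would be far larger.

```python
def bag_of_words(tokens: list[str], word_list: list[str]) -> list[int]:
    """Return a 0/1 vector with one element per entry in word_list."""
    # The position in word_list plays the role of the word number.
    index = {word: i for i, word in enumerate(word_list)}
    vector = [0] * len(word_list)
    for token in tokens:
        if token in index:
            vector[index[token]] = 1
    return vector


# Hypothetical word list prepared in advance from a large text collection.
word_list = ["internet", "deposit", "cancel", "transfer", "loan"]
# Hypothetical analyzer output for "Can I cancel my time deposit on the Internet?".
tokens = ["internet", "time", "deposit", "cancel", "can"]

vector = bag_of_words(tokens, word_list)
```

Tokens that are missing from the word list ("time" and "can" here) are ignored, and word order does not affect the result.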

Note that, instead of morphological analysis, a feature calculation method such as bag-of-pos, which uses parts of speech, or a feature calculation that combines bag-of-words and bag-of-pos may also be used.

For processing by the end-of-speech determination model 15, one conceivable approach is to perform morphological analysis on the speech recognition result text, which is obtained by converting the speech of the dialogue into text through speech recognition processing, and then to calculate features from the result of the morphological analysis using bag-of-words or a similar method.

Suppose now that an error occurs in the speech recognition processing. For example, suppose that, as shown in FIG. 15, the speech 「インターネットで定期預金の解約はできますか」 ("Can I cancel my time deposit on the Internet?") is misrecognized as 「インターネットで敵よ金の害はできますか」, a similar-sounding but nonsensical string (roughly, "Can the enemy do harm of money on the Internet?"). If morphological analysis is performed on such erroneous speech recognition result text and the input vector is calculated from it, the errors accumulate, and features that reflect the correct input speech cannot be calculated.

Thus, the conventional method, in which the speech recognition result is used as the input for machine learning, has the problem that erroneous morpheme units tend to be produced. One cause is that the speech recognition dictionary used for speech recognition processing and the morphological analysis dictionary used for morphological analysis often differ, and differences in the words registered in these dictionaries can cause inconsistencies. Another cause is that morphological analysis is designed for normal, human-readable sentences, so errors in the speech recognition result text can lead to erroneous morphological analysis.

In addition, in the conventional method in which the speech recognition result is used as the input for machine learning, when an error occurs in the speech recognition processing, feature calculation is performed with the error still included, so the resulting features do not reflect the correct input speech.

Therefore, in the present invention, as shown in FIG. 16, the word sequences of up to the N-th candidate (the N-best result) obtained as a result of the speech recognition processing are used for feature calculation (bag-of-words and the like) in machine learning.

In speech recognition processing, a search is performed for the word string closest to the input speech among the combinations of entries (including part-of-speech information and the like) registered in the speech recognition dictionary. Therefore, a sequence of words (including part-of-speech information and the like) is obtained as the result of the speech recognition processing. In addition, candidates other than the first candidate, up to the N-th candidate, can be obtained in order of closeness to the input speech. Therefore, even if the first candidate is erroneous, it becomes more likely that the correct words are contained somewhere among the N candidates.

As described above, the present invention does not perform morphological analysis and instead uses the word sequences of up to the N-th candidate obtained as a result of the speech recognition processing. Consequently, no errors are introduced by morphological analysis, and the result of the speech recognition processing, errors included, is reflected in the features as is. Because morphological analysis is not performed, the amount of processing can be reduced, and there is no need to prepare a morphological analysis dictionary. Furthermore, because the speech recognition results up to the N-th candidate are reflected in the features, even if a speech recognition error occurs in the first candidate, the correct words are likely to be contained among the N candidates and can be reflected in the feature calculation.
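A minimal sketch of N-best feature calculation follows. The description does not state how the N candidates are combined into one vector, so taking the union of the words over all candidates is an assumption of this example, as are the function name and the English stand-in words.

```python
def nbest_bag_of_words(nbest: list[list[str]], word_list: list[str]) -> list[int]:
    """Return a 0/1 vector marking words that occur in any N-best hypothesis.

    nbest holds the recognizer's word sequences, best candidate first.
    The words are used as output by the recognizer, with no morphological analysis.
    """
    index = {word: i for i, word in enumerate(word_list)}
    vector = [0] * len(word_list)
    for hypothesis in nbest:
        for word in hypothesis:
            i = index.get(word)
            if i is not None:
                vector[i] = 1
    return vector


word_list = ["internet", "deposit", "cancel", "harm", "loan"]
# Hypothetical 2-best output: the first candidate is a misrecognition,
# the second candidate contains the correct words.
nbest = [
    ["internet", "enemy", "money", "harm"],
    ["internet", "time", "deposit", "cancel"],
]

top1_vector = nbest_bag_of_words(nbest[:1], word_list)
nbest_vector = nbest_bag_of_words(nbest, word_list)
```

Comparing `top1_vector` with `nbest_vector` shows the intended effect: the words "deposit" and "cancel" are absent from the first candidate but still reach the feature vector through the second.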

Although not specifically mentioned in the embodiments, a program for executing each process performed by a computer functioning as the end-of-speech determination device 10 may be provided. The program may be recorded on a computer-readable medium. Using a computer-readable medium, the program can be installed on a computer. The computer-readable medium on which the program is recorded may be a non-transitory recording medium. The non-transitory recording medium is not particularly limited and may be, for example, a recording medium such as a CD-ROM or a DVD-ROM.

Although the above embodiments have been described as representative examples, it will be apparent to those skilled in the art that many modifications and substitutions are possible within the spirit and scope of the present invention. Accordingly, the present invention should not be construed as being limited by the above embodiments, and various modifications and changes are possible without departing from the scope of the claims. For example, a plurality of constituent blocks shown in the configuration diagrams of the embodiments may be combined into one, or one constituent block may be divided.

10, 10A, 10B, 10C End-of-speech determination device
11 Determination unit
12, 12C Learning data generation unit
13 End-of-speech learning data
14 Determination model generation unit
15 End-of-speech determination model

Claims

An end-of-speech determination device that determines whether an utterance in a dialogue between a plurality of speakers is an end-of-speech utterance of a speaker, the device comprising:
a determination unit that determines whether the utterance in the dialogue is an end-of-speech utterance based on the presence or absence of a speaker change in the dialogue; and
a learning data generation unit that generates, based on a determination result of the determination unit, learning data used for machine learning of an end-of-speech determination model for determining whether an utterance in a dialogue is an end-of-speech utterance.

The end-of-speech determination device according to claim 1,
wherein the learning data generation unit adds, to the learning data, a determination result on which the determination unit and the end-of-speech determination model agree as to whether the utterance in the dialogue is an end-of-speech utterance.

The end-of-speech determination device according to claim 1 or 2,
wherein the device determines, using the end-of-speech determination model, whether the utterance in the dialogue is an end-of-speech utterance of a speaker, and outputs a determination result.

An end-of-speech determination device that determines whether an utterance in a dialogue between a plurality of speakers is an end-of-speech utterance of a speaker, the device comprising:
a determination unit that determines whether the utterance in the dialogue is an end-of-speech utterance based on the presence or absence of a speaker change in the dialogue,
wherein the determination unit takes, as the target of the determination, the utterances that remain after filler-only utterances are removed from the utterances in the dialogue.

An end-of-speech determination device that determines whether an utterance in a dialogue between a plurality of speakers is an end-of-speech utterance of a speaker, the device comprising:
a determination unit that determines whether the utterance in the dialogue is an end-of-speech utterance based on the presence or absence of a speaker change in the dialogue,
wherein the device determines whether the utterance in the dialogue is an end-of-speech utterance of a speaker by using an end-of-speech determination model for determining whether an utterance in a dialogue is an end-of-speech utterance, the model being generated by machine learning of learning data generated based on a determination result of the determination unit, and outputs a determination result.

An end-of-speech determination method in an end-of-speech determination device that determines whether an utterance in a dialogue between a plurality of speakers is an end-of-speech utterance of a speaker, the method comprising:
determining whether the utterance in the dialogue is an end-of-speech utterance based on the presence or absence of a speaker change in the dialogue; and
generating learning data used for machine learning of an end-of-speech determination model for determining whether an utterance in a dialogue is an end-of-speech utterance.

The end-of-speech determination method according to claim 6,
wherein a determination result on which the determination based on the presence or absence of a speaker change and the end-of-speech determination model agree as to whether the utterance in the dialogue is an end-of-speech utterance is added to the learning data.

The end-of-speech determination method according to claim 7,
further comprising determining, using the end-of-speech determination model, whether the utterance in the dialogue is an end-of-speech utterance of a speaker, and outputting a determination result.

An end-of-speech determination method in an end-of-speech determination device that determines whether an utterance in a dialogue between a plurality of speakers is an end-of-speech utterance of a speaker, the method comprising:
determining whether the utterance in the dialogue is an end-of-speech utterance based on the presence or absence of a speaker change in the dialogue,
wherein the utterances that remain after filler-only utterances are removed from the utterances in the dialogue are taken as the target of the determination.

An end-of-speech determination method in an end-of-speech determination device that determines whether an utterance in a dialogue between a plurality of speakers is an end-of-speech utterance of a speaker, the method comprising:
determining whether the utterance in the dialogue is an end-of-speech utterance based on the presence or absence of a speaker change in the dialogue; and
determining whether the utterance in the dialogue is an end-of-speech utterance of a speaker by using an end-of-speech determination model for determining whether an utterance in a dialogue is an end-of-speech utterance, the model being generated by machine learning of learning data generated based on a result of the determination, and outputting a determination result.

A program for causing a computer to function as the end-of-speech determination device according to any one of claims 1 to 5.