JP2013029868A

JP2013029868A - Conversation screening program, conversation screening device, and conversation screening method

Info

Publication number: JP2013029868A
Application number: JP2012244014A
Authority: JP
Inventors: Sachiko Onodera; 佐知子小野寺
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2012-11-05
Filing date: 2012-11-05
Publication date: 2013-02-07

Abstract

PROBLEM TO BE SOLVED: To improve the efficiency of monitoring work by screening problem conversations efficiently and with high accuracy. SOLUTION: Recorded voice data is acquired, and the extraction unit 401 performs prosody information extraction. Next, the detection unit 402 performs utterance section detection. The acquisition unit 405 acquires speech recognition result information. Then the basic dialogue analysis unit 403 performs basic dialogue analysis, the dialogue structure analysis unit 404 performs dialogue structure analysis, and the utterance content analysis unit 406 performs utterance content analysis. Finally, the determination unit 407 performs determination processing and the output unit 408 performs output processing.

Description

The present invention relates to a dialogue screening program, a dialogue screening device, and a dialogue screening method for classifying a recorded dialogue between speakers as either a problem dialogue or a normal dialogue.

In a call center, monitoring is performed to check how agents (also called operators) handle customers. Monitoring means actually listening to the dialogue between an agent and a customer and checking its content. Typically, several dialogues of each agent are selected at random, and each selected dialogue is listened to from the beginning and checked.

Monitoring has two main purposes. The first is to evaluate an agent's handling skills (manner of speaking, wording, and so on) and use the evaluation for coaching. The second is to investigate the cause of problem calls and respond to them. When monitoring for the second purpose, listening is more efficient if calls that may be problem calls are screened in advance. Problem calls are described more specifically below; they fall roughly into the following three types, (1) to (3).

(1) Calls whose handling takes more time than necessary
(2) Calls in which the customer becomes angry
(3) Calls in which the dialogue with the customer does not mesh (the two parties talk past each other)

In a problem call of type (1), the customer is kept waiting for a long time, for example while the agent searches for an answer.
In a problem call of type (2), the customer was already angry before the dialogue began, or became angry because of an inappropriate response or a misunderstanding.
In a problem call of type (3), for reasons such as "the agent cannot answer or explain well" or "the agent does not understand the customer's question", either the dialogue drags on while the parties try to get on the same page or, in most cases, the customer gives up and ends the dialogue because it never meshes.

Problem calls of type (1) can be extracted using the technique of Patent Document 1 below, which detects whether the agent is keeping the customer waiting, for example during an answer search. Problem calls of type (2) can be extracted using the technique of Patent Document 2 below, which examines the emotional state of the customer's utterances to find calls in which the customer may be angry.

In practice, many call centers also screen and check calls with long handling times. This check can extract those type (3) problem calls in which the dialogue goes on at length until the parties finally get on the same page.

Patent Document 1: JP 2007-33754 A
Patent Document 2: JP 2003-508805 A (published Japanese translation of a PCT application)

However, type (3) problem calls in which the dialogue ends without the parties ever getting on the same page are not prolonged, so such calls cannot be screened out by handling time.

To solve the above problem of the conventional technology, an object of the present invention is to provide a dialogue screening program, a dialogue screening device, and a dialogue screening method that improve the efficiency of monitoring work by screening dialogues that become problem calls (problem dialogues) efficiently and with high accuracy.

To solve the above problem and achieve the object, a first dialogue screening program, dialogue screening device, and dialogue screening method are required to: detect, for each speaker, a series of utterance sections from that speaker's phonological information in voice data of a dialogue between speakers; identify the initiative speaker (the speaker holding the initiative) in each specific dialogue section of the two speakers, based on the section lengths of each speaker's detected series of utterance sections; define as the question phase the section at the start of the recorded voice data in which the initiative speaker is the party asking the other party questions, and as the answer phase the section after the question phase in which the initiative speaker is the party being asked questions by the other party; calculate, from the answer phase onward, the initiative holding time during which a designated one of the two speakers is the initiative speaker; analyze the bias of the dialogue toward the designated speaker based on the length of that initiative holding time; determine, based on the analysis result, that the dialogue is a problem dialogue; and output the determination result.
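The initiative-holding-time step of the first method can be illustrated with a minimal Python sketch. It assumes the initiative speaker information is a chronological list of (speaker, start, end) records, that the customer is the party asking questions in the question phase, and that "bias" is measured as the fraction of the answer-phase span held by the designated speaker. The `ratio_threshold` of 0.8 is a hypothetical value; the source only states that the analysis is based on the length of the holding time.

```python
from dataclasses import dataclass


@dataclass
class InitiativeRecord:
    speaker: str   # "customer" or "agent"
    start: float   # dialogue section start time in seconds
    end: float     # dialogue section end time in seconds


def answer_phase_start(records, asker="customer"):
    """Index of the first section after the opening question phase.

    The question phase is taken to be the leading run of sections in which
    the asking party (assumed here to be the customer) holds the initiative.
    """
    i = 0
    while i < len(records) and records[i].speaker == asker:
        i += 1
    return i


def initiative_holding_time(records, designated, asker="customer"):
    """Total time the designated speaker holds the initiative from the answer phase on."""
    start = answer_phase_start(records, asker)
    return sum(r.end - r.start for r in records[start:] if r.speaker == designated)


def is_biased(records, designated, ratio_threshold=0.8, asker="customer"):
    """Hypothetical bias test: holding time as a fraction of the answer-phase span."""
    start = answer_phase_start(records, asker)
    tail = records[start:]
    if not tail:
        return False  # no answer phase was found
    span = sum(r.end - r.start for r in tail)
    held = initiative_holding_time(records, designated, asker)
    return span > 0 and held / span >= ratio_threshold
```

For example, if the customer holds the initiative for the first 30 seconds and the agent holds it for every section after that, `is_biased(records, "agent")` returns `True`, which would mark the dialogue as one-sided under this assumed criterion.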

With this dialogue screening program, device, and method, a dialogue in which one of the speakers talked one-sidedly can be screened out as a problem dialogue that ended without the dialogue meshing.

A second dialogue screening program, dialogue screening device, and dialogue screening method are required to: detect, for each speaker, a series of utterance sections from that speaker's phonological information in recorded voice data of a dialogue between speakers; identify the initiative speaker in each specific dialogue section of the two speakers, based on the section lengths of each speaker's detected series of utterance sections; define as the question phase the section at the start of the recorded voice data in which the initiative speaker is the party asking the other party questions, and as the answer phase the section after the question phase in which the initiative speaker is the party being asked questions by the other party; acquire, for each speaker, a recognition result for the recorded voice data that includes keywords matching or related to predetermined recognition keywords, together with their appearance times; analyze how smoothly the dialogue progresses from the answer phase onward, based on how keywords selected from the designated speaker's recognition result appear in the designated speaker's series of utterance sections detected by the detection means; determine, based on the analysis result, that the dialogue is a problem dialogue; and output the determination result.
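The keyword-based smoothness analysis of the second method can be sketched as follows. This minimal Python example assumes the recognition result is a list of (keyword, time) pairs for the designated speaker and that "appearance status" means the number of distinct utterance sections in which the same keyword occurs from the answer phase onward. The `min_sections` value of 3 and the function names are hypothetical; the source does not specify the exact criterion.

```python
from collections import defaultdict


def section_index(t, sections):
    """Return the index of the utterance section containing time t, or None."""
    for i, (start, end) in enumerate(sections):
        if start <= t <= end:
            return i
    return None


def repeated_keywords(hits, sections, answer_start, min_sections=3):
    """Keywords that the designated speaker repeats across many utterance sections.

    hits: (keyword, time) pairs from the speech recognition result.
    sections: (start, end) utterance sections of the same speaker.
    Only hits at or after answer_start (start of the answer phase) are counted.
    """
    seen = defaultdict(set)
    for word, t in hits:
        if t < answer_start:
            continue
        idx = section_index(t, sections)
        if idx is not None:
            seen[word].add(idx)
    return sorted(w for w, idxs in seen.items() if len(idxs) >= min_sections)


def progress_is_smooth(hits, sections, answer_start, min_sections=3):
    """Assumed rule: progress is smooth when no keyword is repeated that often."""
    return not repeated_keywords(hits, sections, answer_start, min_sections)
```

Counting distinct sections rather than raw hits keeps a keyword that is said twice within a single utterance from being treated as a repetition across the dialogue.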

With this dialogue screening program, device, and method, a dialogue in which the same utterance content was repeated can be screened out as a problem dialogue that ended without the dialogue meshing.

These dialogue screening programs, devices, and methods improve the efficiency of monitoring work by screening problem dialogues efficiently and with high accuracy.

FIG. 1 is an explanatory diagram showing an example of a problem call to be extracted in the present embodiment.
FIG. 2 is a block diagram showing the hardware configuration of the dialogue screening device according to the present embodiment.
FIG. 3 is an explanatory diagram showing the stored contents of the call information table.
FIG. 4 is a block diagram showing the functional configuration of the dialogue screening device according to the present embodiment.
FIG. 5 is an explanatory diagram showing the input and output of the extraction unit.
FIG. 6 is an explanatory diagram showing the input and output of the detection unit.
FIG. 7 is an explanatory diagram showing an example of dynamic identification of the initiative speaker.
FIG. 8 is an explanatory diagram showing the input and output of the basic dialogue analysis unit.
FIG. 9 is an explanatory diagram showing the input and output of the dialogue structure analysis unit.
FIG. 10 is an explanatory diagram showing how the acquisition unit acquires speech recognition result information.
FIG. 11 is an explanatory diagram showing the input and output of the utterance content analysis unit.
FIG. 12 is an explanatory diagram showing the stored contents of the similarity calculation table.
FIG. 13 is an explanatory diagram showing a problem list file.
FIG. 14 is a flowchart showing the dialogue screening procedure according to the present embodiment.
FIG. 15 is a flowchart showing the detailed procedure of the utterance section detection processing (step S1403) by the detection unit.
FIG. 16 is a flowchart showing the detailed procedure of the basic dialogue analysis processing (step S1406) by the basic dialogue analysis unit.
FIG. 17 is a flowchart (part 1) showing the detailed procedure of the initiative speaker identification processing (step S1601) by the initiative speaker identification unit.
FIG. 18 is a flowchart (part 2) showing the detailed procedure of the initiative speaker identification processing (step S1601) by the initiative speaker identification unit.
FIG. 19 is a flowchart showing the detailed procedure of the dialogue structure analysis processing (step S1407) by the dialogue structure analysis unit.
FIG. 20 is a flowchart showing the detailed procedure of the utterance content analysis processing (step S1408) by the utterance content analysis unit.
FIG. 21 is a flowchart showing the detailed procedure of the determination processing (step S1409) by the determination unit.

Preferred embodiments of the dialogue screening program, dialogue screening device, and dialogue screening method are described in detail below with reference to the accompanying drawings. To screen dialogues, it is first necessary to define what normal dialogues and abnormal dialogues (problem dialogues) look like, so that a computer can screen them automatically. A dialogue at a call center or service counter follows the flow "question from the customer ⇒ answer from the agent ⇒ additional (plus-α) utterances". The "question from the customer ⇒ answer from the agent" part is nearly the same in normal and problem dialogues alike; whether a dialogue is normal or abnormal is judged from the additional (plus-α) utterances.

For example, additional questions or the provision of reference information are said precisely because the dialogue is meshing, so such a dialogue is normal. Likewise, when the utterance time is not skewed between the customer and the agent, communication is presumed to be taking place, so the dialogue is normal. Furthermore, when the same utterances or words are not repeated, the dialogue is progressing smoothly and can be regarded as normal.

Conversely, when one speaker talks the whole time, or the same utterances or words are repeated, the dialogue is presumed not to be meshing. Therefore, a dialogue that is skewed toward one speaker, or in which the same utterances or words are repeated, can be regarded as a problem dialogue.

In the present embodiment, the monitoring targets are narrowed down by automatically screening out such problem dialogues that ended without meshing. This makes monitoring more efficient and reduces the workload.

(Example of a problem call)
FIG. 1 is an explanatory diagram showing an example of a problem call to be extracted in the present embodiment. In FIG. 1, the symbol Ta# (where # is a number) denotes an agent utterance, and the symbol Tc# denotes a customer utterance.

In customer utterance Tc1, the customer is concerned that the driver has not been installed and asks the agent about it. In utterance Ta1, the agent explains that this is why the floppy disk cannot be read and tells the customer how to read it.

In utterance Tc3, the customer repeats the same point, because they want advice or an answer about why the driver failed to install even though they followed the procedure in the manual.

In utterance Ta5, the agent cannot address the question and evades it, saying that the floppy disk can be read with the method already explained anyway. In utterance Tc6, the customer, apparently realizing that continuing is pointless, says "I understand" and ends the dialogue.

In this dialogue, the customer's question, "The driver could not be installed; is that a problem?", never receives a proper answer. That this call is a problem call becomes clear when one actually listens to it.

The above example concerns recorded voice data of telephone dialogues (calls) between customers and agents, as at a call center. The invention is not limited to calls, however; recorded voice data of face-to-face dialogues with customers at a store counter may also be used.

(Hardware configuration of the dialogue screening device)
FIG. 2 is a block diagram showing the hardware configuration of the dialogue screening device according to the present embodiment. In FIG. 2, the dialogue screening device includes a CPU (Central Processing Unit) 201, a ROM (Read-Only Memory) 202, a RAM (Random Access Memory) 203, a magnetic disk drive 204, a magnetic disk 205, an optical disk drive 206, an optical disk 207, a display 208, an I/F (Interface) 209, a keyboard 210, a mouse 211, a scanner 212, and a printer 213. These components are connected to one another by a bus 200.

The CPU 201 controls the dialogue screening device as a whole. The ROM 202 stores programs such as a boot program. The RAM 203 is used as a work area of the CPU 201. The magnetic disk drive 204 controls reading and writing of data on the magnetic disk 205 under the control of the CPU 201. The magnetic disk 205 stores data written under the control of the magnetic disk drive 204.

The optical disk drive 206 controls reading and writing of data on the optical disk 207 under the control of the CPU 201. The optical disk 207 stores data written under the control of the optical disk drive 206 and allows a computer to read the data stored on it.

The display 208 displays a cursor, icons, and tool boxes, as well as data such as documents, images, and function information. A CRT, a TFT liquid crystal display, a plasma display, or the like can be used as the display 208.

The interface (hereinafter "I/F") 209 is connected through a communication line to a network 214 such as a LAN (Local Area Network), a WAN (Wide Area Network), or the Internet, and is connected to other devices via the network 214. The I/F 209 serves as the interface between the network 214 and the inside of the device, and controls the input and output of data from external devices. A modem, a LAN adapter, or the like can be used as the I/F 209.

The keyboard 210 has keys for entering characters, numbers, various instructions, and the like, and is used to input data. A touch-panel input pad, a numeric keypad, or the like may be used instead. The mouse 211 is used to move the cursor, select ranges, move windows, change their size, and so on. A trackball, a joystick, or similar device may be used instead, as long as it provides the same functions as a pointing device.

The scanner 212 optically reads images and loads the image data into the dialogue screening device. The scanner 212 may also have an OCR (Optical Character Reader) function. The printer 213 prints image data and document data. A laser printer or an ink-jet printer, for example, can be used as the printer 213.

(Call information table)
FIG. 3 is an explanatory diagram showing the stored contents of the call information table. The call information table 300 is implemented by a storage device such as the ROM 202, the RAM 203, or the magnetic disk 205 shown in FIG. 2. The call information table 300 has the fields call ID, dialogue time, dialogue structure analysis result flag Fa, utterance content analysis result flag Fb, and AND result flag Fc; each record is the call information for one dialogue. The call ID is identification information that identifies a dialogue (or its call information), and it also serves as a pointer to where the recorded voice data of that dialogue is stored.

The recorded voice data is audio data of a recorded dialogue and is stored in a database (not shown). If the recorded voice data is stereo, one channel contains the customer's voice and the other channel contains the agent's voice. If it is monaural, the speakers are assumed to have already been separated.

The dialogue time is the time from the start to the end of the recording, and is derived from the data length of the recorded voice data. The dialogue structure analysis result flag Fa is a binary value indicating the result of the dialogue structure analysis of the dialogue; its default value is Fa = 0. Dialogue structure analysis is described later. When problem dialogues are screened using Fa alone, the dialogue specified by the recorded voice data of the call information is a normal dialogue if Fa = 0 and a problem dialogue if Fa = 1.

The utterance content analysis result flag Fb is a binary value indicating the result of the utterance content analysis of the dialogue; its default value is Fb = 0. Utterance content analysis is described later. When problem dialogues are screened using Fb alone, the dialogue specified by the recorded voice data of the call information is a normal dialogue if Fb = 0 and a problem dialogue if Fb = 1.

The AND result flag Fc is a binary value indicating the logical AND of the dialogue structure analysis result flag Fa and the utterance content analysis result flag Fb; its default value is Fc = 0, and Fc = 1 when Fa = 1 and Fb = 1. When problem dialogues are screened using both Fa and Fb, the dialogue specified by the recorded voice data of the call information is a normal dialogue if Fc = 0 and a problem dialogue if Fc = 1.
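The flag logic of a call information record can be summarized with a minimal Python sketch. The class and function names and the call ID format are hypothetical. Here Fc is derived on the fly from Fa and Fb rather than stored as a separate field as in the table, which keeps it from ever going stale. The `mode` argument selects one of the three screening options described above.

```python
from dataclasses import dataclass


@dataclass
class CallInfo:
    call_id: str          # identifies the dialogue; points to its recorded voice data
    dialogue_time: float  # seconds, derived from the recording length
    fa: int = 0           # dialogue structure analysis result flag (default 0)
    fb: int = 0           # utterance content analysis result flag (default 0)

    @property
    def fc(self) -> int:
        """AND result flag: 1 only when both Fa and Fb are 1."""
        return int(self.fa == 1 and self.fb == 1)


def is_problem(call: CallInfo, mode: str = "and") -> bool:
    """Screen one call using Fa alone ("fa"), Fb alone ("fb"), or Fc ("and")."""
    flag = {"fa": call.fa, "fb": call.fb, "and": call.fc}[mode]
    return flag == 1
```

For example, a call with Fa = 1 and Fb = 0 is a problem dialogue when screened by Fa alone, but a normal dialogue when screened by the AND result.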

(Functional configuration of the dialogue screening device)
FIG. 4 is a block diagram showing the functional configuration of the dialogue screening device according to the present embodiment. The dialogue screening device 400 includes an extraction unit 401, a detection unit 402, a basic dialogue analysis unit 403, a dialogue structure analysis unit 404, an acquisition unit 405, an utterance content analysis unit 406, a determination unit 407, and an output unit 408. The basic dialogue analysis unit 403 includes an initiative speaker identification unit 431 and an opening phase identification unit 432. The utterance content analysis unit 406 includes a calculation unit 461 and a judgment unit 462.

Specifically, each of the functions 401 to 408 is implemented, for example, by having the CPU 201 execute a program stored in a storage device such as the ROM 202, the RAM 203, or the magnetic disk 205 shown in FIG. 2, or by the I/F 209. Each function is described individually below.

<Extraction unit 401>
FIG. 5 is an explanatory diagram showing the input and output of the extraction unit 401. The extraction unit 401 extracts phonological information for each speaker from recorded voice data D of a dialogue between speakers. Specifically, as shown in FIG. 5, the extraction unit 401 receives the recorded voice data D and extracts agent prosody data Ma and customer prosody data Mc from it. The prosody data Ma and Mc record the power value of each channel (that is, each speaker) of the recorded voice data D at fixed time intervals. Each power value is assigned a prosody ID, numbered in ascending chronological order. Because Ma and Mc can be extracted with known techniques, the details are omitted in this specification.

<Detection unit 402>
FIG. 6 is an explanatory diagram showing the input and output of the detection unit 402. The detection unit 402 detects, for each speaker, a series of utterance sections from the per-speaker phonological information extracted by the extraction unit 401. Specifically, as shown in FIG. 6, the detection unit 402 generates agent utterance section information Sa, which lists the agent's series of utterance sections in chronological order, from the agent prosody data Ma. Similarly, it generates customer utterance section information Sc, which lists the customer's series of utterance sections in chronological order, from the customer prosody data Mc.

Specifically, the detection unit 402 reads the power values of the prosody data Ma and Mc in chronological order and detects as an utterance section any interval in which consecutive power values are at or above a predetermined threshold and that run lasts at least a predetermined minimum utterance section length. In the utterance section information Sa and Sc shown in FIG. 6, each utterance section has an utterance ID, a start time, and an end time. The utterance ID identifies the utterance section; here it is the call ID with a branch number appended, so that it can be associated with the call information.
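The thresholding rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming power values arrive as a list at a fixed frame period `frame_sec`. The threshold, the minimum section length, and the "call ID + hyphen + branch number" format of the utterance ID are hypothetical values chosen for the example.

```python
def detect_utterances(powers, frame_sec, threshold, min_len_sec, call_id):
    """Detect utterance sections from a chronological list of per-frame power values.

    A section is a run of consecutive frames with power >= threshold that lasts
    at least min_len_sec. Returns (utterance_id, start_sec, end_sec) tuples.
    """
    sections = []
    run_start = None
    for i, p in enumerate(powers + [float("-inf")]):  # sentinel closes a trailing run
        if p >= threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            start, end = run_start * frame_sec, i * frame_sec
            if end - start >= min_len_sec:
                uid = f"{call_id}-{len(sections) + 1}"  # call ID plus branch number
                sections.append((uid, start, end))
            run_start = None
    return sections
```

The minimum-length check discards short bursts such as coughs or line noise that exceed the power threshold only briefly.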

<Basic dialogue analysis unit 403>
As shown in FIG. 4, the basic dialogue analysis unit 403 includes the initiative speaker identification unit 431 and the opening phase identification unit 432. The initiative speaker identification unit 431 identifies the initiative speaker for each specific dialogue section of the two speakers, based on the section lengths of each speaker's series of utterance sections detected by the detection unit 402. Specifically, for example, it determines from the lengths and frequency of the utterance sections in Sa and Sc which speaker holds the initiative in a given dialogue section.

A dialogue section is a section in which the two speakers speak alternately. Dialogue sections may be set dynamically during the initiative speaker identification processing, or set statically as predetermined fixed sections. When they are set dynamically, a dialogue section runs from the start time of its first utterance to the end time of the utterance at which the utterance count of both speakers, or of either one, reaches a predetermined number. The speaker with the longer total utterance time within the dialogue section is taken as the initiative speaker.

図７は、主導権話者の動的な特定例を示す説明図である。ここでは、図１に示した対話を用いて説明する。なお、図１では対話は発話Ｔｃ６で終了しているが、ここでは、それ以降も継続するものとする。また、しきい値となる所定の発話回数は３回とし、いずれか一方の話者の発話回数が３になった時点で主導権話者特定をおこなう。 FIG. 7 is an explanatory diagram illustrating an example of dynamically identifying initiative speakers, using the dialogue shown in FIG. 1. Although the dialogue in FIG. 1 ends with utterance Tc6, it is assumed here to continue beyond that point. The predetermined utterance count serving as the threshold is three, and the initiative speaker is identified as soon as either speaker's utterance count reaches three.

発話Ｔｃ１から計数すると、まず、顧客の発話Ｔｃ１の開始時刻から顧客の発話Ｔｃ３の終了時刻までの区間が対話区間Ｒ１となる。対話区間Ｒ１での顧客の発話時間は、発話Ｔｃ１〜Ｔｃ３の総区間長であり、対話区間Ｒ１でのエージェントの発話時間は、発話Ｔａ１〜Ｔａ２の総区間長である。この場合、発話Ｔｃ１〜Ｔｃ３の総区間長の方が大きいため、対話区間Ｒ１の主導権話者は顧客となる。同様に、対話区間Ｒ２では主導権話者はエージェントとなる。 Counting from utterance Tc1, the first dialogue section R1 runs from the start time of the customer's utterance Tc1 to the end time of the customer's utterance Tc3. The customer's utterance time in R1 is the total length of utterances Tc1 to Tc3, and the agent's utterance time in R1 is the total length of utterances Ta1 and Ta2. Since the total length of Tc1 to Tc3 is greater, the customer is the initiative speaker in R1. Likewise, the agent is the initiative speaker in dialogue section R2.
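
A sketch of the dynamic dialogue-section rule, assuming the utterances of both speakers are available as one list labeled "customer" or "agent". The names are hypothetical; ties in total utterance time are assigned to the agent, and trailing utterances that never reach the count threshold are dropped, both of which are assumptions the text does not settle.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Utterance:
    speaker: str  # "customer" or "agent"
    start: float
    end: float


@dataclass
class DialogueSection:
    initiative: str
    start: float
    end: float


def dynamic_sections(utterances: Sequence[Utterance], n: int = 3) -> List[DialogueSection]:
    """Close a dialogue section when either speaker's count reaches n;
    the speaker with the larger total utterance time holds the initiative."""
    sections: List[DialogueSection] = []
    current: List[Utterance] = []
    counts: Dict[str, int] = {"customer": 0, "agent": 0}
    for u in sorted(utterances, key=lambda x: x.start):
        current.append(u)
        counts[u.speaker] += 1
        if counts[u.speaker] == n:
            totals = {"customer": 0.0, "agent": 0.0}
            for v in current:
                totals[v.speaker] += v.end - v.start
            # Assumption: a tie goes to the agent.
            leader = "customer" if totals["customer"] > totals["agent"] else "agent"
            sections.append(DialogueSection(leader, current[0].start, u.end))
            current, counts = [], {"customer": 0, "agent": 0}
    return sections
```

With made-up times in which three long customer utterances are interleaved with two short agent replies, followed by the reverse pattern, the function yields a customer-led R1 and then an agent-led R2, mirroring FIG. 7.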

図８は、基本対話分析部４０３の入出力を示す説明図である。図８に示すように、基本対話分析部４０３は、発話区間情報Ｓａ，Ｓｃを入力し、図７に示したように主導権話者を対話区間ごとに特定することで、主導権話者情報Ｑを出力する。主導権話者情報Ｑは、コールＩＤ、主導権話者、対話区間（開始時刻と終了時刻）をフィールド項目とする。各レコードは、コールＩＤによって特定される対話において、その対話区間における主導権話者が誰であるかを示している。 FIG. 8 is an explanatory diagram showing the input and output of the basic dialogue analysis unit 403. As shown in FIG. 8, the basic dialogue analysis unit 403 takes the utterance section information Sa and Sc as input, identifies the initiative speaker for each dialogue section as shown in FIG. 7, and outputs initiative speaker information Q. The initiative speaker information Q has the fields call ID, initiative speaker, and dialogue section (start time and end time). Each record indicates which speaker holds the initiative in a given dialogue section of the dialogue identified by the call ID.

また、図４において、冒頭フェーズ特定部４３２は、録音音声データＤの開始冒頭において主導権話者が相手方に質問をする立場の話者（本例では顧客）である区間を質問フェーズとし、当該質問フェーズ後において主導権話者が相手方から質問を受ける立場の話者（本例ではエージェント）である区間を回答フェーズとする機能を有する。具体的には、図８に示したように、総対話区間のうち顧客が最初に主導権話者となる対話区間Ｒ１を質問フェーズとする。また、質問フェーズ後にエージェントが最初に主導権話者となる対話区間Ｒ２を回答フェーズとする。 In FIG. 4, the opening phase identification unit 432 sets as the question phase the section at the beginning of the recorded voice data D in which the initiative speaker is the party who asks questions (the customer in this example), and sets as the answer phase the section after the question phase in which the initiative speaker is the party who receives questions (the agent in this example). Specifically, as shown in FIG. 8, dialogue section R1, the first of all dialogue sections in which the customer is the initiative speaker, is set as the question phase, and dialogue section R2, the first section after the question phase in which the agent is the initiative speaker, is set as the answer phase.

冒頭フェーズ特定部４３２は、図８に示したように、冒頭フェーズ情報Ｐｈを生成して出力する。冒頭フェーズ情報Ｐｈは、コールＩＤ、フェーズ種、フェーズ区間（開始時刻、終了時刻）をフィールド項目とする。各レコードは、コールＩＤによって特定される対話の冒頭において、質問フェーズと回答フェーズの区間を示している。 As shown in FIG. 8, the opening phase identification unit 432 generates and outputs opening phase information Ph. The opening phase information Ph has the fields call ID, phase type, and phase section (start time and end time). Each record gives the section of the question phase or of the answer phase at the beginning of the dialogue identified by the call ID.
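
The opening-phase rule can be sketched as a scan over the initiative speaker records Q. This assumes the records are sorted by start time; the returned tuples loosely mirror the fields of Ph, and all names are hypothetical.

```python
from typing import List, NamedTuple, Sequence, Tuple


class DialogueSection(NamedTuple):
    initiative: str  # initiative speaker of this dialogue section
    start: float
    end: float


def opening_phase_info(call_id: str, sections: Sequence[DialogueSection],
                       asker: str = "customer",
                       answerer: str = "agent") -> List[Tuple[str, str, float, float]]:
    """Question phase: first section led by the asker.
    Answer phase: first later section led by the answerer."""
    records: List[Tuple[str, str, float, float]] = []
    question = next((s for s in sections if s.initiative == asker), None)
    if question is None:
        return records
    records.append((call_id, "question", question.start, question.end))
    answer = next((s for s in sections
                   if s.start >= question.end and s.initiative == answerer), None)
    if answer is not None:
        records.append((call_id, "answer", answer.start, answer.end))
    return records
```

Any section that precedes the first asker-led section, such as an agent greeting, is simply skipped.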

冒頭フェーズ特定部４３２により質問フェーズおよび回答フェーズを特定するのは、上述したように、対話の冒頭では、正常な対話であろうが問題対話であろうが、質問と回答は少なくとも１回はあるからであり、対話が正常か否かはそれ以降の『プラスαの対話』の内容に依存するからである。 The opening phase identification unit 432 identifies the question and answer phases because, as noted above, the beginning of any dialogue, normal or problematic, contains at least one question and answer, so whether a dialogue is normal depends on the content of the "plus-α dialogue" that follows.

＜対話構造分析部４０４＞ <Dialogue structure analysis unit 404>
図９は、対話構造分析部４０４の入出力を示す説明図である。対話構造分析部４０４は、エージェント発話区間情報Ｓａ，顧客発話区間情報Ｓｃ，主導権話者情報Ｑ，冒頭フェーズ情報Ｐｈを入力し、対話構造分析結果を出力する。具体的には、コール情報テーブル３００の対話構造分析結果フラグＦａを更新する。 FIG. 9 is an explanatory diagram showing the input and output of the dialogue structure analysis unit 404. The dialogue structure analysis unit 404 takes as input the agent utterance section information Sa, the customer utterance section information Sc, the initiative speaker information Q, and the opening phase information Ph, and outputs a dialogue structure analysis result. Specifically, it updates the dialogue structure analysis result flag Fa in the call information table 300.

図４において、対話構造分析部４０４は、録音音声データＤにより特定される回答フェーズ以降の対話を構造分析する機能を有する。具体的には、冒頭フェーズ特定部４３２によって特定された回答フェーズ以降において両話者のうち指定話者が主導権話者である主導権保持時間を算出する。そして、指定話者の主導権保持時間の時間長に基づいて、対話の指定話者への偏りを分析する。 In FIG. 4, the dialogue structure analysis unit 404 analyzes the structure of the dialogue in the recorded voice data D from the answer phase onward. Specifically, it calculates the initiative holding time, that is, the time during which the designated speaker (one of the two speakers) is the initiative speaker from the answer phase identified by the opening phase identification unit 432 onward. Based on the length of the designated speaker's initiative holding time, it then analyzes whether the dialogue is biased toward the designated speaker.

ここで、指定話者とは、あらかじめ設定された話者である。指定話者を「顧客」に設定しておくと、回答フェーズ以降の顧客の発話の偏りを分析し、指定話者を「エージェント」に設定しておくと、回答フェーズ以降のエージェントの発話の偏りを分析することとなる。 Here, the designated speaker is a speaker set in advance. If the designated speaker is set to "customer", the bias of the customer's utterances from the answer phase onward is analyzed; if it is set to "agent", the bias of the agent's utterances from the answer phase onward is analyzed.

対話構造分析部４０４は、回答フェーズ以降における対話時間の時間長がしきい値となる所定の対話時間長以上であり、指定話者の平均話者時間長が、不図示のデータベースに蓄積されている全顧客発話の平均発話時間長以上であり、指定話者の主導権保持時間の時間長がしきい値となる所定の保持時間長以上（両話者の主導権保持時間に対する比率が所定比率以上である場合でもよい）である場合、指定話者に偏りがある対話であると決定する。 The dialogue structure analysis unit 404 determines that the dialogue is biased toward the designated speaker when all of the following hold: the dialogue time from the answer phase onward is at least a predetermined threshold dialogue length; the designated speaker's average utterance length is at least the average utterance length of all customer utterances stored in a database (not shown); and the designated speaker's initiative holding time is at least a predetermined threshold holding time (alternatively, the condition may be that its ratio to the initiative holding time of both speakers is at least a predetermined ratio).

回答フェーズ以降における対話時間の時間長がしきい値となる所定の対話時間長以上とするのは、回答フェーズ以降における対話時間が短い場合には、コール終了の挨拶だけのようなケースを排除するためである。また、平均発話時間長が平均発話時間長以上とするのは、「はい」、「ええ」などの相槌のような発話が連続しているケースを除くためである。 The dialogue time from the answer phase onward must reach the threshold in order to exclude short post-answer segments, such as those containing only the closing greeting. The designated speaker's average utterance length must reach the overall average in order to exclude cases consisting of a run of back-channel responses such as "hai" and "ee" (both roughly "yes").
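
The three conditions for a biased dialogue can be combined as below. This is a sketch under stated assumptions: the designated speaker's average utterance length is taken over utterances starting at or after the answer phase, "from the answer phase onward" is treated as inclusive, the ratio-based alternative is not implemented, and all names and thresholds are hypothetical.

```python
from typing import Sequence, Tuple

Span = Tuple[str, float, float]  # (speaker or initiative speaker, start, end)


def is_biased(designated: str, answer_start: float, call_end: float,
              utterances: Sequence[Span], sections: Sequence[Span],
              min_dialogue: float, global_avg_len: float, min_hold: float) -> bool:
    # 1. Enough dialogue after the answer phase (rules out a bare closing greeting).
    if call_end - answer_start < min_dialogue:
        return False
    # 2. Designated speaker's average utterance length is at least the stored
    #    average (rules out strings of short back-channel replies).
    own = [end - start for spk, start, end in utterances
           if spk == designated and start >= answer_start]
    if not own or sum(own) / len(own) < global_avg_len:
        return False
    # 3. Initiative holding time: total length of the dialogue sections led by
    #    the designated speaker from the answer phase onward.
    hold = sum(end - start for spk, start, end in sections
               if spk == designated and start >= answer_start)
    return hold >= min_hold
```

Flag Fa would then be `int(is_biased(...))`. The cheap length check runs first so the other two conditions are evaluated only for calls with a substantial post-answer dialogue.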

＜取得部４０５＞ <Acquisition unit 405>
図４において、取得部４０５は、所定の認識キーワードと一致または関連するキーワードおよびその出現時刻を含む録音音声データＤに関する話者ごとの認識結果を取得する機能を有する。この取得に先立って録音音声データＤの音声認識が必要である。 In FIG. 4, the acquisition unit 405 acquires, for each speaker, the recognition result for the recorded voice data D; the result includes keywords that match or relate to predetermined recognition keywords, together with their appearance times. Speech recognition of the recorded voice data D must be performed before this acquisition.

図１０は、取得部４０５による音声認識結果情報の取得過程を示す説明図である。音声認識処理は、録音音声データＤと音声認識キーワードリストＬを読み込んで録音音声データＤの音声認識処理を実行し、音声認識結果情報Ｗを出力する。取得部４０５は、この音声認識結果情報Ｗを取得することとなる。 FIG. 10 is an explanatory diagram illustrating how the acquisition unit 405 acquires speech recognition result information. The speech recognition process reads the recorded voice data D and the speech recognition keyword list L, performs speech recognition on the recorded voice data D, and outputs speech recognition result information W. The acquisition unit 405 acquires this speech recognition result information W.

音声認識処理は、ワードスポッティングやサブワード法など既存の手法により録音音声データＤで発話された言葉をテキストデータの単語に変換する。音声認識キーワードリストＬは、対象となる製品や業務のマニュアル、エージェントが利用するＦＡＱ（ＦｒｅｑｕｅｎｔｌｙＡｓｋｅｄＱｕｅｓｔｉｏｎｓ）から作成されたテキストデータである。 In the speech recognition process, words uttered in the recorded voice data D are converted into text words by an existing technique such as word spotting or a subword-based method. The speech recognition keyword list L is text data compiled from manuals for the target products or operations and from the FAQs (Frequently Asked Questions) used by agents.

音声認識結果情報Ｗは、認識結果ＩＤ、認識キーワード、話者種、出現時刻をフィールド項目とする。認識結果ＩＤには、音声認識キーワードリストＬ内の認識キーワードと一致または関連する都度採番される固有の番号が書き込まれる。認識キーワードには、音声認識された単語と一致または関連した音声認識キーワードリストＬ内の認識キーワードが書き込まれる。話者種には、認識キーワードに一致または関連する単語を発した話者名が書き込まれる。出現時刻には、認識キーワードに一致または関連する単語を発した時刻（録音音声データＤ上での位置）が書き込まれる。 The speech recognition result information W has the fields recognition result ID, recognition keyword, speaker type, and appearance time. The recognition result ID is a unique number assigned each time a recognized word matches or relates to a recognition keyword in the speech recognition keyword list L. The recognition keyword field holds the keyword in list L that the recognized word matches or relates to. The speaker type field holds the name of the speaker who uttered that word, and the appearance time field holds the time (position in the recorded voice data D) at which it was uttered.

この取得部４０５は、内部において音声認識処理を実行することとしてもよく、対話選別装置４００外から受信することとしてもよい。いずれにしても、少なくとも音声認識結果情報Ｗが対話選別装置４００内の記憶装置に保持されていればよい。 The acquisition unit 405 may perform speech recognition internally or may receive the recognition results from outside the dialogue selection device 400. Either way, it is sufficient that at least the speech recognition result information W is held in a storage device within the dialogue selection device 400.

＜発話内容分析部４０６＞ <Utterance content analysis unit 406>
図４において、発話内容分析部４０６は、回答フェーズ以降の対話の内容を分析する機能を有する。図１１は、発話内容分析部４０６の入出力を示す説明図である。発話内容分析部４０６は、指定発話区間情報（ここでは、指定話者を顧客としているため、顧客発話区間情報Ｓｃ）および音声認識結果情報Ｗを入力し、発話内容分析結果を出力する。具体的には、コール情報テーブル３００の発話内容分析結果フラグＦｂを更新する。 In FIG. 4, the utterance content analysis unit 406 analyzes the content of the dialogue from the answer phase onward. FIG. 11 is an explanatory diagram showing the input and output of the utterance content analysis unit 406. The utterance content analysis unit 406 takes as input the designated speaker's utterance section information (here the customer utterance section information Sc, since the designated speaker is the customer) and the speech recognition result information W, and outputs an utterance content analysis result. Specifically, it updates the utterance content analysis result flag Fb in the call information table 300.

発話内容分析部４０６は、認識結果リストテーブルの中から選ばれた指定話者の認識キーワードの、指定話者の一連の発話区間での出現状況に基づいて、対話の進行の順調性を分析する。対話の進行の順調性は、同じ発話内容が繰り返されているか否かで判断することとなる。繰り返されていると判断された対話は、問題対話となる。 The utterance content analysis unit 406 analyzes how smoothly the dialogue is progressing, based on how the designated speaker's recognition keywords, selected from the recognition result list table, appear across the designated speaker's series of utterance sections. Smoothness is judged by whether the same utterance content is being repeated; a dialogue judged to contain such repetition is a problem dialogue.

また、認識キーワードの使用状況については、各発話区間での語句の変化（同一認識キーワードの出現状況）をみて、変化のないものを選別する。この場合、「はい」などの短い認識キーワードではなく、ある発話長以上の発話となる認識キーワードを対象とする。認識キーワードは発話区間ごとに選別され、類似度算出テーブルに書き込まれる。 For keyword usage, the change in wording across utterance sections (that is, whether the same recognition keywords reappear) is examined, and sections showing no change are picked out. Short recognition keywords such as "hai" (yes) are excluded; only recognition keywords whose utterance is at least a certain length are considered. The recognition keywords are selected for each utterance section and written into the similarity calculation table.

図１２は、類似度算出テーブルの記憶内容を示す説明図である。類似度算出テーブル１２００は、発話ＩＤ、認識キーワード、類似度をフィールド項目とする。発話内容分析部４０６では、指定話者の発話区間情報から各発話の発話区間を抽出する。そして、音声認識結果情報Ｗからその抽出発話時間中に抽出された認識キーワードを、出現時刻を手掛かりにして読み出し、類似度算出テーブル１２００の抽出発話区間のレコードに書き込む。 FIG. 12 is an explanatory diagram showing the contents of the similarity calculation table. The similarity calculation table 1200 has the fields utterance ID, recognition keyword, and similarity. The utterance content analysis unit 406 extracts each utterance section from the designated speaker's utterance section information. It then uses the appearance times in the speech recognition result information W to read out the recognition keywords that appeared within each extracted utterance section, and writes them into that section's record in the similarity calculation table 1200.

たとえば、発話ＩＤ：ｃａｌｌ００１−１０に着目すると、その発話区間内に出現時刻がある認識キーワードとして、認識結果ＩＤ：１〜３の顧客の認識キーワード「フロッピ」，「ドライバ」，「インストール」が音声認識結果情報Ｗから抽出される。そして、類似度算出テーブル１２００の発話ＩＤ：ｃａｌｌ００１−１０のレコードに、認識キーワード「フロッピ」，「ドライバ」，「インストール」が書き込まれる。 For example, for utterance ID call001-10, the customer's recognition keywords "floppy", "driver", and "install" (recognition result IDs 1 to 3), whose appearance times fall within that utterance section, are extracted from the speech recognition result information W. These keywords are then written into the record for utterance ID call001-10 in the similarity calculation table 1200.

また、認識結果ＩＤ４，５の認識キーワード「インストール」，「フロッピ」は、発話ＩＤ：ｃａｌｌ００１−１１の発話区間内に出現するため、類似度算出テーブル１２００の発話ＩＤ：ｃａｌｌ００１−１１のレコードに書き込まれる。なお、認識結果ＩＤ：６の認識キーワード「操作」は、発話ＩＤ：ｃａｌｌ００１−１１の発話区間内に出現するが指定話者（顧客）ではないため、類似度算出テーブル１２００には書き込まれない。 Similarly, the recognition keywords "install" and "floppy" with recognition result IDs 4 and 5 appear within the utterance section of utterance ID call001-11, so they are written into the record for call001-11 in the similarity calculation table 1200. The recognition keyword "operation" with recognition result ID 6 also appears within that utterance section, but because it was not uttered by the designated speaker (the customer), it is not written into the similarity calculation table 1200.
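
Filling the similarity calculation table amounts to bucketing the designated speaker's recognition results W by appearance time. A minimal sketch, assuming appearance times and utterance sections share one clock and that section boundaries are inclusive; the filter for short keywords such as "hai" is omitted, and the type and function names are hypothetical.

```python
from typing import Dict, List, NamedTuple, Sequence


class Recognition(NamedTuple):
    result_id: int
    keyword: str
    speaker: str
    time: float  # appearance time in the recorded voice data D


class Utterance(NamedTuple):
    utterance_id: str
    start: float
    end: float


def build_similarity_table(designated_utts: Sequence[Utterance],
                           results: Sequence[Recognition],
                           designated: str = "customer") -> Dict[str, List[str]]:
    """Map each designated-speaker utterance ID to the keywords spoken in it."""
    by_time = sorted(results, key=lambda r: r.time)
    table: Dict[str, List[str]] = {}
    for u in designated_utts:
        # Keep only the designated speaker's keywords that fall inside this section.
        table[u.utterance_id] = [r.keyword for r in by_time
                                 if r.speaker == designated and u.start <= r.time <= u.end]
    return table
```

The speaker filter is what drops the agent's "operation" keyword even though it falls inside the time range of call001-11.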

また、図４に示したように、発話内容分析部４０６は、算出部４６１と判断部４６２を有する。算出部４６１は、図１２に示した類似度を算出する機能を有する。具体的には、指定話者の連続発話区間における指定話者の認識結果の中から選ばれた認識キーワードと同一または類似のキーワードの出現回数（語句数）と指定話者の連続発話区間における指定話者の認識結果内の全認識キーワードの出現回数（語句数）とに基づいて、連続発話区間の類似度を算出する。 As shown in FIG. 4, the utterance content analysis unit 406 includes a calculation unit 461 and a determination unit 462. The calculation unit 461 calculates the similarity shown in FIG. 12. Specifically, for a continuous utterance section of the designated speaker, it calculates the similarity from two counts: the number of occurrences (number of phrases) of keywords that are the same as or similar to recognition keywords selected from the designated speaker's recognition results, and the number of occurrences (number of phrases) of all recognition keywords in the designated speaker's recognition results for that continuous utterance section.

ここで、類似度は、Ａ／Ｂで算出される。
Ａ：指定話者の連続発話区間における音声認識結果情報Ｗの中から選ばれた認識キーワードと同一または類似のキーワード数（語句数）
Ｂ：指定話者の連続発話区間における指定話者の音声認識結果情報Ｗ内の全認識キーワード数（語句数）
Here, the similarity is calculated as A / B, where:
A: Number of keywords (number of phrases) that are the same as or similar to the recognition keywords selected from the speech recognition result information W in the designated speaker's continuous utterance section
B: Number of all recognition keywords (number of phrases) in the designated speaker's speech recognition result information W in the designated speaker's continuous utterance section

類似度の分子Ａでは、類似度の算出対象となる発話区間は、連続発話区間である。連続発話区間とは、時系列的に連続する指定話者（ここでは顧客）の２つの発話区間である。この２つの発話区間は文字通り連続していてもよいが、間に指定外話者（ここではエージェント）の発話区間が存在していてもよい。また、連続発話区間では、似たようなキーワードが発せられること、また、キーワードの同一性は音声認識処理の精度に依存することがあるため、類似のキーワードも計数することとしてもよい。類似か否かは、不図示の同義語データベースを参照することで決定することとしてもよい。 For the numerator A, the utterance sections over which the similarity is calculated form a continuous utterance section: two chronologically consecutive utterance sections of the designated speaker (here, the customer). The two utterance sections may be literally adjacent, or an utterance section of the non-designated speaker (here, the agent) may lie between them. Because similar rather than identical keywords may be uttered across a continuous utterance section, and because whether two keywords are identical can depend on the accuracy of speech recognition, similar keywords may also be counted. Whether two keywords are similar may be determined by consulting a synonym database (not shown).

以下、図１２を例に挙げて類似度算出手法を説明する。発話ＩＤ：ｃａｌｌ００１−１０の発話区間は先頭の発話区間であるため類似度は算出できない。発話ＩＤ：ｃａｌｌ００１−１１の発話区間については、先行する発話区間（発話ＩＤ：ｃａｌｌ００１−１０）との間で類似度を算出する。 The similarity calculation method is described below using FIG. 12 as an example. Because utterance ID call001-10 is the first utterance section, no similarity can be calculated for it. For the utterance section with utterance ID call001-11, the similarity is calculated against the preceding utterance section (utterance ID call001-10).

まず、発話区間（発話ＩＤ：ｃａｌｌ００１−１１）の認識キーワード「インストール」，「フロッピ」と同一または類似のキーワードを、発話区間（発話ＩＤ：ｃａｌｌ００１−１０）の認識キーワード群から探す。発話区間（発話ＩＤ：ｃａｌｌ００１−１０）の認識キーワード群にも認識キーワード「インストール」，「フロッピ」が存在するため、分子Ａ＝２となる。一方、分母Ｂは、連続発話区間で出現する認識キーワードが「インストール」，「フロッピ」，「ドライバ」であるため、分母Ｂ＝３となる。したがって、発話区間（発話ＩＤ：ｃａｌｌ００１−１０）と発話区間（発話ＩＤ：ｃａｌｌ００１−１１）との間の類似度は、Ａ／Ｂ＝０．６６となる。 First, keywords that are the same as or similar to the recognition keywords "install" and "floppy" of utterance section call001-11 are searched for among the recognition keywords of utterance section call001-10. Since "install" and "floppy" both appear among the recognition keywords of call001-10, the numerator A = 2. The recognition keywords appearing in the continuous utterance section are "install", "floppy", and "driver", so the denominator B = 3. The similarity between utterance sections call001-10 and call001-11 is therefore A / B = 2/3, i.e. 0.66 when truncated to two decimal places.
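
One reading of the A / B definition that reproduces this worked example: A counts the distinct keywords of the current section (after mapping synonyms to a canonical form) that also appear in the preceding section, and B counts the distinct keywords across both sections. Under that reading the measure equals the intersection-over-union (Jaccard) ratio of the two keyword sets. The synonym dictionary stands in for the unspecified synonym database, and the function names are hypothetical.

```python
from typing import Dict, Iterable, Optional


def canonical(word: str, synonyms: Dict[str, str]) -> str:
    # Map a keyword to its canonical form; unknown words map to themselves.
    return synonyms.get(word, word)


def similarity(prev_keywords: Iterable[str], cur_keywords: Iterable[str],
               synonyms: Optional[Dict[str, str]] = None) -> float:
    synonyms = synonyms or {}
    prev = {canonical(w, synonyms) for w in prev_keywords}
    cur = {canonical(w, synonyms) for w in cur_keywords}
    union = prev | cur
    if not union:
        return 0.0  # no keywords at all: treat as not similar
    a = len(cur & prev)  # numerator A: shared (same or similar) keywords
    b = len(union)       # denominator B: all keywords in the section pair
    return a / b
```

For the FIG. 12 example, `similarity(["floppy", "driver", "install"], ["install", "floppy"])` evaluates to 2/3.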

また、図４において、判断部４６２は、算出部４６１によって算出された類似度が所定の類似度以上であるか否かを判断する機能を有する。具体的には、たとえば、しきい値となる所定の類似度を０．５とした場合、上述した発話区間（発話ＩＤ：ｃａｌｌ００１−１０）と発話区間（発話ＩＤ：ｃａｌｌ００１−１１）との間の類似度（＝０．６６）は所定のしきい値以上となる。したがって、当該連続発話区間では、同じ発話が繰り返されていると推定することができる。よって、このような連続発話区間が所定数以上となる場合、進行が順調でない対話であると分析する。 In FIG. 4, the determination unit 462 determines whether the similarity calculated by the calculation unit 461 is at least a predetermined similarity. For example, if the threshold similarity is 0.5, the similarity between utterance sections call001-10 and call001-11 described above (0.66) meets the threshold, so it can be inferred that the same utterance is being repeated in this continuous utterance section. Accordingly, when the number of such continuous utterance sections reaches a predetermined number, the dialogue is analyzed as not progressing smoothly.

また、図１２の例では、１組の連続発話区間についてのみ説明したが、算出部４６１では連続発話区間ごとに類似度を算出し、判断部４６２では連続発話区間ごとに判断することとしてもよい。すなわち、複数組の連続発話区間について分析することとしてもよい。この場合、算出部４６１では、さらに所定の類似度以上となった連続発話区間の組数を計数し、その計数された組数がしきい値となる所定数以上であるか否かを判断することとしてもよい。この場合、計数された組数が所定数以上となった場合、進行が順調でない対話であると分析する。したがって、１組の連続発話区間を分析する場合に比べて、分析の信頼度が向上することとなる。 The example of FIG. 12 covers only one pair of continuous utterance sections, but the calculation unit 461 may calculate the similarity for every continuous utterance section and the determination unit 462 may make the determination for each of them; that is, multiple pairs of continuous utterance sections may be analyzed. In this case, the calculation unit 461 may further count the pairs whose similarity is at least the predetermined similarity and determine whether that count is at least a predetermined threshold number. When the count reaches that number, the dialogue is analyzed as not progressing smoothly. This makes the analysis more reliable than analyzing a single pair of continuous utterance sections.
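
Counting similar pairs across all continuous utterance sections might look as follows. The sketch treats each pair of adjacent designated-speaker sections as one continuous utterance section, uses an exact-match, set-based A / B similarity, and returns 1 or 0 in the spirit of flag Fb; the default thresholds and all names are illustrative assumptions.

```python
from typing import List, Sequence


def count_similar_pairs(section_keywords: Sequence[List[str]],
                        sim_threshold: float = 0.5) -> int:
    """section_keywords: keyword lists of the designated speaker's utterance
    sections in time order. Counts adjacent pairs at or above sim_threshold."""
    count = 0
    for prev, cur in zip(section_keywords, section_keywords[1:]):
        prev_set, cur_set = set(prev), set(cur)
        union = prev_set | cur_set
        if union and len(prev_set & cur_set) / len(union) >= sim_threshold:
            count += 1
    return count


def utterance_content_flag(section_keywords: Sequence[List[str]],
                           sim_threshold: float = 0.5, min_pairs: int = 2) -> int:
    """Return 1 (not progressing smoothly) when enough pairs repeat content."""
    return 1 if count_similar_pairs(section_keywords, sim_threshold) >= min_pairs else 0
```

Raising `min_pairs` trades sensitivity for the extra reliability the text attributes to analyzing several pairs.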

＜決定部４０７＞ <Determination unit 407>
また、図４において、決定部４０７は、対話構造分析部４０４または／および発話内容分析部４０６によって分析された分析結果に基づいて、対話を問題対話に決定する機能を有する。具体的には、コール情報テーブル３００の対話構造分析結果フラグＦａ，発話内容分析結果フラグＦｂ，ＡＮＤ結果フラグＦｃの値により決定する。そして、問題対話となったコール情報のコールＩＤを抽出する。 In FIG. 4, the determination unit 407 determines whether a dialogue is a problem dialogue based on the results of the dialogue structure analysis unit 404 and/or the utterance content analysis unit 406. Specifically, the decision is made from the values of the dialogue structure analysis result flag Fa, the utterance content analysis result flag Fb, and the AND result flag Fc in the call information table 300. The call IDs of the calls determined to be problem dialogues are then extracted.

たとえば、回答フェーズ以降の指定話者（顧客）の発話の偏りにより問題対話を決定する場合、対話構造分析結果フラグＦａがＦａ＝１のときは、問題対話に決定する。Ｆａ＝０のときは対話に偏りがないため、問題対話に決定せず、正常な対話として扱う。同様に、回答フェーズ以降の指定話者（顧客）の対話の進行の順調性により問題対話を決定する場合、発話内容分析結果フラグＦｂがＦｂ＝１のときは、問題対話に決定する。Ｆｂ＝０のときは対話が順調であるため、問題対話に決定せず、正常な対話として扱う。 For example, when problem dialogues are determined by the bias of the designated speaker's (customer's) utterances from the answer phase onward, a dialogue is determined to be a problem dialogue when the dialogue structure analysis result flag Fa = 1. When Fa = 0, the dialogue is not biased, so it is not determined to be a problem dialogue and is treated as normal. Likewise, when problem dialogues are determined by how smoothly the designated speaker's (customer's) dialogue progresses from the answer phase onward, a dialogue is determined to be a problem dialogue when the utterance content analysis result flag Fb = 1. When Fb = 0, the dialogue is progressing smoothly, so it is treated as normal.

また、回答フェーズ以降の指定話者（顧客）の発話の偏りおよび対話の進行の順調性により問題対話を決定する場合、ＡＮＤ結果フラグＦｃがＦｃ＝１のときは、問題対話に決定する。Ｆｃ＝０のときは、正常な対話として扱う。 When problem dialogues are determined by both the bias of the designated speaker's (customer's) utterances and the smoothness of the dialogue from the answer phase onward, a dialogue is determined to be a problem dialogue when the AND result flag Fc = 1, and is treated as normal when Fc = 0.
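
The three decision modes reduce to choosing which flag to test. The sketch below assumes the call information table is available as (call ID, Fa, Fb) rows, with Fc computed as Fa AND Fb; the mode names are hypothetical.

```python
from typing import Iterable, List, Tuple


def problem_call_ids(call_table: Iterable[Tuple[str, int, int]],
                     mode: str = "and") -> List[str]:
    """Return the call IDs judged to be problem dialogues.

    mode: "structure" tests Fa, "content" tests Fb, "and" tests Fc = Fa AND Fb.
    """
    ids: List[str] = []
    for call_id, fa, fb in call_table:
        fc = fa & fb  # AND result flag Fc
        flag = {"structure": fa, "content": fb, "and": fc}[mode]
        if flag == 1:
            ids.append(call_id)
    return ids
```

The returned list corresponds to the call IDs that would go into the problem list file.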

＜出力部４０８＞ <Output unit 408>
また、出力部４０８は、決定部４０７によって決定された決定結果を出力する機能を有する。具体的には、たとえば、問題対話として抽出されたコールＩＤをリスト化した問題リストファイルを出力する。図１３は、問題リストファイルを示す説明図である。問題リストファイル１３００は、ディスプレイ２０８やプリンタ２１２、Ｉ／Ｆ２０９などの出力装置に渡されて出力される。また、問題リストファイル１３００に記述されているコールＩＤをその録音音声データＤにリンクさせることとしてもよい。これにより、問題リストファイル１３００がディスプレイ２０８に表示された場合、コールＩＤを指定することで、その録音音声データＤを再生することができる。 The output unit 408 outputs the result determined by the determination unit 407. Specifically, for example, it outputs a problem list file listing the call IDs extracted as problem dialogues. FIG. 13 is an explanatory diagram showing the problem list file. The problem list file 1300 is passed to and output by an output device such as the display 208, the printer 212, or the I/F 209. The call IDs in the problem list file 1300 may also be linked to the corresponding recorded voice data D, so that when the problem list file 1300 is shown on the display 208, the recorded voice data D of a call can be played back by selecting its call ID.

（対話選別処理手順） (Dialogue selection processing procedure)
図１４は、本実施の形態にかかる対話選別処理手順を示すフローチャートである。まず、対象となる録音音声データＤを取得し（ステップＳ１４０１）、抽出部４０１により韻律情報抽出処理を実行する（ステップＳ１４０２）。つぎに、検出部４０２により発話区間検出処理を実行する（ステップＳ１４０３）。そして、取得部４０５により音声認識結果情報Ｗを取得する（ステップＳ１４０４）。 FIG. 14 is a flowchart showing the dialogue selection processing procedure according to the present embodiment. First, the target recorded voice data D is acquired (step S1401), and the extraction unit 401 executes prosodic information extraction processing (step S1402). Next, the detection unit 402 executes utterance section detection processing (step S1403). The acquisition unit 405 then acquires the speech recognition result information W (step S1404).

このあと、設定処理を実行する（ステップＳ１４０５）。この設定処理では、指定話者の設定、対話構造分析および／または発話内容分析の使用有無の設定、発話内容分析における連続発話区間の対象組数の設定などをユーザ入力によりおこなう。この設定処理は、ステップＳ１４０１〜Ｓ１４０４に先立って行ってもよい。 Next, setting processing is executed (step S1405). In this processing, the user specifies the designated speaker, whether dialogue structure analysis and/or utterance content analysis is used, the number of continuous-utterance-section pairs to be analyzed in utterance content analysis, and so on. This setting processing may also be performed before steps S1401 to S1404.

このあと、基本対話分析部４０３による基本対話分析処理（ステップＳ１４０６）、対話構造分析部４０４による対話構造分析処理（ステップＳ１４０７）、発話内容分析部４０６による発話内容分析処理（ステップＳ１４０８）を実行する。対話構造分析処理（ステップＳ１４０７）および発話内容分析処理（ステップＳ１４０８）は、設定処理の設定にしたがって実行する。このあと、決定部４０７により決定処理を実行し（ステップＳ１４０９）、出力部４０８により出力処理を実行する（ステップＳ１４１０）。これにより、一連の対話選別処理を終了する。 Next, the basic dialogue analysis unit 403 executes basic dialogue analysis processing (step S1406), the dialogue structure analysis unit 404 executes dialogue structure analysis processing (step S1407), and the utterance content analysis unit 406 executes utterance content analysis processing (step S1408). Steps S1407 and S1408 are executed according to the settings made in the setting processing. The determination unit 407 then executes determination processing (step S1409), and the output unit 408 executes output processing (step S1410), completing the series of dialogue selection processing.

＜発話区間検出処理手順＞ <Utterance section detection processing procedure>
図１５は、検出部４０２による発話区間検出処理（ステップＳ１４０３）の詳細な処理手順を示すフローチャートである。まず、韻律データＭａ，Ｍｃのうち未処理の韻律データがあるか否かを判断し（ステップＳ１５０１）、未処理の韻律データがある場合（ステップＳ１５０１：Ｙｅｓ）、未処理の韻律データを選択して読み込む（ステップＳ１５０２）。そして、ｓ＝１，ｒ＝０とする（ステップＳ１５０３）。ここでｓはパワー値を特定する韻律ＩＤである。ｒは、韻律ＩＤ：ｓをインクリメントさせるカウンタである。 FIG. 15 is a flowchart showing the detailed procedure of the utterance section detection processing (step S1403) by the detection unit 402. First, it is determined whether any of the prosodic data Ma and Mc remains unprocessed (step S1501). If so (step S1501: Yes), unprocessed prosodic data is selected and read (step S1502). Then s = 1 and r = 0 are set (step S1503). Here, s is the prosodic ID identifying a power value, and r is a counter used to advance from prosodic ID s.

そして、ｓ≦Ｓであるか否かを判断する（ステップＳ１５０４）。Ｓは韻律ＩＤであるｓの最大値である。ｓ≦Ｓでない場合（ステップＳ１５０４：Ｎｏ）、ステップＳ１５０１に戻る。一方、ｓ≦Ｓである場合（ステップＳ１５０４：Ｙｅｓ）、韻律ＩＤ：ｓのパワー値Ｐｓがしきい値となる最低パワー値Ｐｔに対して、Ｐ（ｓ＋ｒ）≧Ｐｔであるか否かを判断する（ステップＳ１５０５）。 Next, it is determined whether s ≤ S (step S1504), where S is the maximum value of the prosodic ID s. If s ≤ S does not hold (step S1504: No), the process returns to step S1501. If s ≤ S (step S1504: Yes), it is determined whether the power value P(s+r) is at least the minimum power value Pt serving as the threshold, that is, whether P(s+r) ≥ Pt (step S1505).

Ｐ（ｓ＋ｒ）≧Ｐｔである場合（ステップＳ１５０５：Ｙｅｓ）、ｒをインクリメントし（ステップＳ１５０６）、ｓ＋ｒ≦Ｓであるか否かを判断する（ステップＳ１５０７）。ｓ＋ｒ≦Ｓである場合（ステップＳ１５０７：Ｙｅｓ）、連続時間ｔ（ｓ，ｓ＋ｒ）を算出し（ステップＳ１５０８）、ステップＳ１５０５に戻る。連続時間ｔ（ｓ，ｓ＋ｒ）とは、韻律ＩＤ：ｓから韻律ＩＤ：（ｓ＋ｒ）までの時間の合計である。 If P (s + r) ≧ Pt (step S1505: Yes), r is incremented (step S1506), and it is determined whether s + r ≦ S is satisfied (step S1507). If s + r ≦ S (step S1507: Yes), the continuous time t (s, s + r) is calculated (step S1508), and the process returns to step S1505. The continuous time t (s, s + r) is the total time from prosodic ID: s to prosodic ID: (s + r).

一方、ｓ＋ｒ≦Ｓでない場合（ステップＳ１５０７：Ｎｏ）、連続時間ｔ（ｓ，ｓ＋ｒ−１）が算出されたか否かを判断する（ステップＳ１５１４）。算出されていない場合（ステップＳ１５１４：Ｎｏ）、ステップＳ１５０１に移行する。一方、算出された場合（ステップＳ１５１４：Ｙｅｓ）、連続時間ｔ（ｓ，ｓ＋ｒ−１）がしきい値となる所定連続時間Ｔに対して、ｔ（ｓ，ｓ＋ｒ−１）≧Ｔであるか否かを判断する（ステップＳ１５１５）。ｔ（ｓ，ｓ＋ｒ−１）≧Ｔでない場合（ステップＳ１５１５：Ｎｏ）、ステップＳ１５０１に移行する。一方、ｔ（ｓ，ｓ＋ｒ−１）≧Ｔである場合（ステップＳ１５１５：Ｙｅｓ）、連続時間ｔ（ｓ，ｓ＋ｒ−１）を発話区間として保存する（ステップＳ１５１６）。 If s + r ≤ S does not hold (step S1507: No), it is determined whether the continuous time t(s, s+r−1) has been calculated (step S1514). If not (step S1514: No), the process proceeds to step S1501. If it has (step S1514: Yes), it is determined whether t(s, s+r−1) ≥ T, where T is the predetermined continuous time serving as the threshold (step S1515). If t(s, s+r−1) ≥ T does not hold (step S1515: No), the process proceeds to step S1501. If t(s, s+r−1) ≥ T (step S1515: Yes), the continuous time t(s, s+r−1) is stored as an utterance section (step S1516).

また、ステップＳ１５０５において、Ｐ（ｓ＋ｒ）≧Ｐｔでない場合（ステップＳ１５０５：Ｎｏ）、連続時間ｔ（ｓ，ｓ＋ｒ）が算出されたか否かを判断する（ステップＳ１５０９）。算出されていない場合（ステップＳ１５０９：Ｎｏ）、ステップＳ１５１２に移行する。一方、算出された場合（ステップＳ１５０９：Ｙｅｓ）、連続時間ｔ（ｓ，ｓ＋ｒ）がしきい値となる所定連続時間Ｔに対して、ｔ（ｓ，ｓ＋ｒ）≧Ｔであるか否かを判断する（ステップＳ１５１０）。 If P (s + r) ≧ Pt is not satisfied in step S1505 (step S1505: No), it is determined whether the continuous time t (s, s + r) has been calculated (step S1509). If not calculated (step S1509: No), the process proceeds to step S1512. On the other hand, if calculated (step S1509: Yes), it is determined whether or not t (s, s + r) ≧ T with respect to the predetermined continuous time T at which the continuous time t (s, s + r) is a threshold value. (Step S1510).

ｔ（ｓ，ｓ＋ｒ）≧Ｔでない場合（ステップＳ１５１０：Ｎｏ）、ステップＳ１５１２に移行する。一方、ｔ（ｓ，ｓ＋ｒ）≧Ｔである場合（ステップＳ１５１０：Ｙｅｓ）、連続時間ｔ（ｓ，ｓ＋ｒ）を発話区間として保存する（ステップＳ１５１１）。このあと、韻律ＩＤ：ｓをｓ＝ｓ＋ｒ＋１とするとともに（ステップＳ１５１２）、カウンタｒをリセット（ｒ＝０）して（ステップＳ１５１３）、ステップＳ１５０５に戻る。一方、ステップＳ１５０１において、未処理の韻律データがない場合（ステップＳ１５０１：Ｎｏ）、ステップＳ１４０４に移行する。 If t(s, s+r) ≥ T does not hold (step S1510: No), the process proceeds to step S1512. If t(s, s+r) ≥ T (step S1510: Yes), the continuous time t(s, s+r) is stored as an utterance section (step S1511). The prosodic ID s is then set to s = s + r + 1 (step S1512), the counter r is reset to r = 0 (step S1513), and the process returns to step S1505. If, in step S1501, there is no unprocessed prosodic data (step S1501: No), the process proceeds to step S1404.

＜基本対話分析処理手順＞ <Basic dialogue analysis processing procedure>
図１６は、基本対話分析部４０３による基本対話分析処理（ステップＳ１４０６）の詳細な処理手順を示すフローチャートである。まず、主導権話者特定部４３１により主導権話者特定処理を実行し（ステップＳ１６０１）、冒頭フェーズ特定部４３２により冒頭フェーズ特定処理を実行する（ステップＳ１６０２）。そして、ステップＳ１４０７に移行する。 FIG. 16 is a flowchart showing the detailed procedure of the basic dialogue analysis processing (step S1406) by the basic dialogue analysis unit 403. First, the initiative speaker identification unit 431 executes initiative speaker identification processing (step S1601), and the opening phase identification unit 432 executes opening phase identification processing (step S1602). The process then proceeds to step S1407.

＜主導権話者特定処理手順＞
図１７は、主導権話者特定部４３１による主導権話者特定処理（ステップＳ１６０１）の詳細な処理手順を示すフローチャート（その１）である。まず、発話区間の区間ＩＤ：ｘをｘ＝１とし（ステップＳ１７０１）、区間ＩＤカウンタ：ｉをｉ＝０、顧客発話数カウンタ：ｊをｊ＝０、エージェント発話数カウンタ：ｋをｋ＝０とする（ステップＳ１７０２）。 <Initiative speaker identification processing procedure>
FIG. 17 is a flowchart (part 1) showing the detailed procedure of the initiative speaker identification process (step S1601) performed by the initiative speaker identification unit 431. First, the section ID x of the utterance section is set to x = 1 (step S1701), and the section ID counter i, the customer utterance counter j, and the agent utterance counter k are set to i = 0, j = 0, and k = 0 (step S1702).

つぎに、発話区間の総数Ｘに対してｘ≦Ｘであるか否かを判断する（ステップＳ１７０３）。ｘ≦Ｘでない場合（ステップＳ１７０３：Ｎｏ）、ステップＳ１６０２に移行する。一方、ｘ≦Ｘである場合（ステップＳ１７０３：Ｙｅｓ）、Ｌａ（ｘ＋ｉ）＜Ｌｃ（ｘ＋ｉ）であるか否かを判断する（ステップＳ１７０４）。Ｌａ（ｘ＋ｉ）は、区間ＩＤ：ｘ＋ｉのエージェントの発話区間の区間長であり、Ｌｃ（ｘ＋ｉ）は、区間ＩＤ：ｘ＋ｉの顧客の発話区間の区間長である。すなわち、同一区間ＩＤでどちらの話者の発話が長いかを判断する。 Next, it is determined whether x ≦ X, where X is the total number of utterance sections (step S1703). If x ≦ X does not hold (step S1703: No), the process proceeds to step S1602. If x ≦ X (step S1703: Yes), it is determined whether La(x + i) < Lc(x + i) (step S1704). La(x + i) is the length of the agent's utterance section with section ID x + i, and Lc(x + i) is the length of the customer's utterance section with section ID x + i. That is, this step determines which speaker's utterance is longer within the same section ID.

Ｌａ（ｘ＋ｉ）＜Ｌｃ（ｘ＋ｉ）である場合（ステップＳ１７０４：Ｙｅｓ）、顧客の方が長いため、顧客発話数カウンタ：ｊをインクリメントして（ステップＳ１７０５）、ステップＳ１７０７に移行する。 If La(x + i) < Lc(x + i) (step S1704: Yes), the customer's utterance is longer, so the customer utterance counter j is incremented (step S1705), and the process proceeds to step S1707.

一方、Ｌａ（ｘ＋ｉ）＜Ｌｃ（ｘ＋ｉ）でない場合（ステップＳ１７０４：Ｎｏ）、エージェントの方が長いため、エージェント発話数カウンタ：ｋをインクリメントして（ステップＳ１７０６）、ステップＳ１７０７に移行する。ステップＳ１７０７では、区間ＩＤカウンタ：ｉをインクリメントし（ステップＳ１７０７）、ｉ≦ｎであるか否かを判断する（ステップＳ１７０８）。 If La(x + i) < Lc(x + i) does not hold (step S1704: No), the agent's utterance is at least as long, so the agent utterance counter k is incremented (step S1706), and the process proceeds to step S1707. In step S1707, the section ID counter i is incremented, and it is then determined whether i ≦ n (step S1708).

ここで、ｎは主導権話者を特定するための所定の区間数である。ｉ≦ｎである場合（ステップＳ１７０８：Ｙｅｓ）、ステップＳ１７０４に戻る。一方、ｉ≦ｎでない場合（ステップＳ１７０８：Ｎｏ）、図１８のステップＳ１８０１に移行する。 Here, n is a predetermined number of sections used to identify the initiative speaker. If i ≦ n (step S1708: Yes), the process returns to step S1704. If i ≦ n does not hold (step S1708: No), the process proceeds to step S1801 in FIG. 18.

図１８は、主導権話者特定部４３１による主導権話者特定処理（ステップＳ１６０１）の詳細な処理手順を示すフローチャート（その２）である。ステップＳ１８０１において、ｊ＞ｋ（ｊ≧ｋでもよい）である場合（ステップＳ１８０１：Ｙｅｓ）、区間［ｘ，ｘ＋ｎ−１］の主導権話者を顧客に決定し（ステップＳ１８０２）、ステップＳ１８０４に移行する。 FIG. 18 is a flowchart (part 2) showing the detailed procedure of the initiative speaker identification process (step S1601) performed by the initiative speaker identification unit 431. If j > k (alternatively, j ≧ k) in step S1801 (step S1801: Yes), the initiative speaker of section [x, x + n − 1] is determined to be the customer (step S1802), and the process proceeds to step S1804.

一方、ｊ＞ｋ（ｊ≧ｋでもよい）でない場合（ステップＳ１８０１：Ｎｏ）、区間［ｘ，ｘ＋ｎ−１］の主導権話者をエージェントに決定し（ステップＳ１８０２）、ステップＳ１８０４に移行する。 If j > k (alternatively, j ≧ k) does not hold (step S1801: No), the initiative speaker of section [x, x + n − 1] is determined to be the agent (step S1802), and the process proceeds to step S1804.

ステップＳ１８０４において、ｘ＞１であるか否かを判断し（ステップＳ１８０４）、ｘ＞１でない場合（ステップＳ１８０４：Ｎｏ）、ステップＳ１８０８に移行する。一方、ｘ＞１である場合（ステップＳ１８０４：Ｙｅｓ）、主導権話者に決定された話者（決定話者）が直前区間の決定話者と同一話者であるか否かを判断する（ステップＳ１８０５）。 In step S1804, it is determined whether x > 1. If x > 1 does not hold (step S1804: No), the process proceeds to step S1808. If x > 1 (step S1804: Yes), it is determined whether the speaker determined to be the initiative speaker (the determined speaker) is the same as the determined speaker of the immediately preceding section (step S1805).

同一である場合（ステップＳ１８０５：Ｙｅｓ）、主導権話者情報Ｑの直前の区間の終点を区間ＩＤ：ｘ＋ｎ−１の終了時刻に修正し（ステップＳ１８０８）、ステップＳ１８０７に移行する。一方、同一でない場合（ステップＳ１８０５：Ｎｏ）、開始点を区間ｘの開始時刻、終点を区間ｘ＋ｎ−１の終了時刻として主導権話者情報Ｑの新規レコードとして書き込む（ステップＳ１８０６）。そして、ステップＳ１８０７において、ｘをインクリメントして、ステップＳ１７０２に戻る。 If they are the same (step S1805: Yes), the end point of the immediately preceding record in the initiative speaker information Q is updated to the end time of section ID x + n − 1 (step S1808), and the process proceeds to step S1807. If they are not the same (step S1805: No), a new record is written to the initiative speaker information Q, with the start time of section x as its start point and the end time of section x + n − 1 as its end point (step S1806). In step S1807, x is incremented, and the process returns to step S1702.
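The windowed majority vote of FIGS. 17 and 18 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patented code: each section ID carries both an agent length La and a customer length Lc, the window for section x covers the n sections [x, x + n − 1] and is clipped at the last section, ties (j = k) go to the agent as in the j > k variant, and the list of `InitiativeRecord` objects stands in for the initiative speaker information Q. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SectionInfo:
    start: float         # start time of the section (seconds)
    end: float           # end time of the section (seconds)
    agent_len: float     # La: agent utterance length in this section
    customer_len: float  # Lc: customer utterance length in this section


@dataclass
class InitiativeRecord:
    speaker: str  # "agent" or "customer"
    start: float
    end: float


def identify_initiative(sections: List[SectionInfo], n: int) -> List[InitiativeRecord]:
    q: List[InitiativeRecord] = []  # stands in for initiative speaker information Q
    for x in range(len(sections)):
        window = sections[x:x + n]  # sections [x, x + n - 1], clipped at the end
        j = sum(1 for sec in window if sec.agent_len < sec.customer_len)  # customer longer
        k = len(window) - j  # agent at least as long
        speaker = "customer" if j > k else "agent"
        end = window[-1].end  # end time of the last section in the window
        if q and q[-1].speaker == speaker:
            q[-1].end = end  # same speaker as before: extend the previous record
        else:
            q.append(InitiativeRecord(speaker, window[0].start, end))  # new record
    return q
```

For example, with n = 2 and four one-second sections in which the agent speaks longer in the first two and the customer in the last two, the sketch returns an agent record from 0 s to 3 s and a customer record from 2 s to 4 s; the overlap comes from the sliding windows.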

＜対話構造分析処理手順＞
図１９は、対話構造分析部４０４による対話構造分析処理（ステップＳ１４０７）の詳細な処理手順を示すフローチャートである。まず、対象コール情報があるか否かを判断する（ステップＳ１９０１）。どのコール情報を対象コール情報とするかは、設定処理（ステップＳ１４０５）において設定しておく。デフォルトでは、未処理のコール情報を順次対象とすることとしてもよい。 <Dialogue structure analysis processing procedure>
FIG. 19 is a flowchart showing the detailed procedure of the dialogue structure analysis process (step S1407) performed by the dialogue structure analysis unit 404. First, it is determined whether there is target call information (step S1901). Which call information is targeted is set in the setting process (step S1405). By default, unprocessed call information may be targeted one call at a time.

対象コール情報がある場合（ステップＳ１９０１：Ｙｅｓ）、対象コール情報を選択し（ステップＳ１９０２）、回答フェーズ以降の対話時間を算出する（ステップＳ１９０３）。そして、回答フェーズ以降の対話時間がしきい値となる所定時間以上であるか否かを判断する（ステップＳ１９０４）。所定時間以上でない場合（ステップＳ１９０４：Ｎｏ）、ステップＳ１９０１に戻る。 If there is target call information (step S1901: Yes), the target call information is selected (step S1902), and the dialogue time from the answer phase onward is calculated (step S1903). It is then determined whether this dialogue time is equal to or longer than a predetermined time serving as a threshold (step S1904). If it is shorter (step S1904: No), the process returns to step S1901.

一方、所定時間以上である場合（ステップＳ１９０４：Ｙｅｓ）、回答フェーズ以降の各話者の主導権保持時間を算出する（ステップＳ１９０５）。主導権保持時間とは、主導権話者となった対話区間の話者ごとの合計区間長である。そして、指定話者の主導権保持率がしきい値となる所定保持率以上であるか否かを判断する（ステップＳ１９０６）。主導権保持率とは、指定話者（たとえば顧客）の主導権保持時間を、両話者の総主導権保持時間で割った値である。所定保持率以上でない場合（ステップＳ１９０６：Ｎｏ）、ステップＳ１９０１に戻る。 If it is equal to or longer than the predetermined time (step S1904: Yes), the initiative holding time of each speaker from the answer phase onward is calculated (step S1905). A speaker's initiative holding time is the total length of the dialogue sections in which that speaker is the initiative speaker. It is then determined whether the initiative holding rate of the designated speaker is equal to or higher than a predetermined holding rate serving as a threshold (step S1906). The initiative holding rate is the initiative holding time of the designated speaker (for example, the customer) divided by the total initiative holding time of both speakers. If it is lower (step S1906: No), the process returns to step S1901.

一方、所定保持率以上である場合（ステップＳ１９０６：Ｙｅｓ）、回答フェーズ以降の指定話者の発話の平均発話時間長を算出する（ステップＳ１９０７）。そして、全指定話者の平均発話時間長以上であるか否かを判断する（ステップＳ１９０８）。平均発話時間長以上でない場合（ステップＳ１９０８：Ｎｏ）、ステップＳ１９０１に戻る。 If it is equal to or higher than the predetermined holding rate (step S1906: Yes), the average utterance length of the designated speaker from the answer phase onward is calculated (step S1907). It is then determined whether this is equal to or longer than the average utterance length over all designated speakers (step S1908). If it is shorter (step S1908: No), the process returns to step S1901.

一方、平均発話時間長以上である場合（ステップＳ１９０８：Ｙｅｓ）、対話構造分析結果フラグＦａをＦａ＝１とし（ステップＳ１９０９）、ステップＳ１９０１に戻る。一方、ステップＳ１９０１において、対象コール情報がない場合（ステップＳ１９０１：Ｎｏ）、ステップＳ１４０８に移行する。 On the other hand, if it is equal to or longer than the average utterance time length (step S1908: Yes), the dialog structure analysis result flag Fa is set to Fa = 1 (step S1909) and the process returns to step S1901. On the other hand, in step S1901, if there is no target call information (step S1901: No), the process proceeds to step S1408.
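Steps S1903 to S1909 reduce to three threshold tests per call. The Python sketch below is one possible reading, not the patented code: it assumes initiative records are `(speaker, start, end)` tuples, that the answer phase runs from `answer_start` to the end of the call, and that `global_mean` is the average utterance length of the designated speaker across all calls. The function and parameter names are hypothetical.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, float, float]  # (speaker, start, end) of one initiative record


def structure_flag(records: List[Record], answer_start: float, call_end: float,
                   designated: str, utterance_lens: List[float], global_mean: float,
                   min_dialogue_sec: float, min_ratio: float) -> int:
    """Return the dialogue structure analysis result flag Fa (0 or 1)."""
    # S1903-S1904: dialogue time from the answer phase onward must be long enough
    if call_end - answer_start < min_dialogue_sec:
        return 0
    # S1905: initiative holding time per speaker, counted only after answer_start
    held: Dict[str, float] = {}
    for speaker, start, end in records:
        overlap = min(end, call_end) - max(start, answer_start)
        if overlap > 0:
            held[speaker] = held.get(speaker, 0.0) + overlap
    total = sum(held.values())
    # S1906: initiative holding rate of the designated speaker
    if total == 0 or held.get(designated, 0.0) / total < min_ratio:
        return 0
    # S1907-S1908: designated speaker's mean utterance length vs. the overall mean
    if not utterance_lens or sum(utterance_lens) / len(utterance_lens) < global_mean:
        return 0
    return 1  # S1909: Fa = 1
```

For example, a customer who holds the initiative for 70 of the 80 seconds after the answer phase (a rate of 0.875) and whose utterances average 10 s against an overall mean of 6 s would be flagged with a 60 s minimum and a 0.6 holding-rate threshold.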

＜発話内容分析処理手順＞
図２０は、発話内容分析部４０６による発話内容分析処理（ステップＳ１４０８）の詳細な処理手順を示すフローチャートである。まず、対象コール情報があるか否かを判断する（ステップＳ２００１）。どのコール情報を対象コール情報とするかは、設定処理（ステップＳ１４０５）において設定しておく。デフォルトでは、未処理のコール情報を順次対象とすることとしてもよい。 <Speech content analysis processing procedure>
FIG. 20 is a flowchart showing the detailed procedure of the utterance content analysis process (step S1408) performed by the utterance content analysis unit 406. First, it is determined whether there is target call information (step S2001). Which call information is targeted is set in the setting process (step S1405). By default, unprocessed call information may be targeted one call at a time.

対象コール情報がある場合（ステップＳ２００１：Ｙｅｓ）、対象コール情報を選択し（ステップＳ２００２）、回答フェーズ以降の未処理の指定話者の発話があるか否かを判断する（ステップＳ２００３）。未処理の所定話者の発話がある場合（ステップＳ２００３：Ｙｅｓ）、未処理の発話を選択し（ステップＳ２００４）、選択発話の発話時間長がしきい値となる所定時間長以上か否かを判断する（ステップＳ２００５）。所定時間長以上でない場合（ステップＳ２００５：Ｎｏ）、ステップＳ２００３に戻る。 If there is target call information (step S2001: Yes), the target call information is selected (step S2002), and it is determined whether there is an unprocessed utterance of the designated speaker from the answer phase onward (step S2003). If there is (step S2003: Yes), an unprocessed utterance is selected (step S2004), and it is determined whether the length of the selected utterance is equal to or longer than a predetermined length serving as a threshold (step S2005). If it is shorter (step S2005: No), the process returns to step S2003.

一方、所定時間長以上である場合（ステップＳ２００５：Ｙｅｓ）、当該発話区間中の認識キーワードを音声認識結果情報Ｗから抽出する（ステップＳ２００６）。そして、抽出キーワードを類似度算出テーブル１２００に書込み（ステップＳ２００７）、ステップＳ２００３に戻る。 If it is equal to or longer than the predetermined length (step S2005: Yes), the keywords recognized within that utterance section are extracted from the speech recognition result information W (step S2006). The extracted keywords are then written to the similarity calculation table 1200 (step S2007), and the process returns to step S2003.

一方、ステップＳ２００３において、回答フェーズ以降の未処理の指定話者の発話がない場合（ステップＳ２００３：Ｎｏ）、算出部４６１により、指定話者の連続する発話間の類似度を算出する（ステップＳ２００８）。そして、各連続発話区間について、その類似度が判断部４６２によりしきい値となる所定類似度以上かを判断し、算出部４６１により、所定類似度以上の連続発話区間の数を計数する（ステップＳ２００９）。 If, in step S2003, there is no unprocessed utterance of the designated speaker from the answer phase onward (step S2003: No), the calculation unit 461 calculates the similarity between consecutive utterances of the designated speaker (step S2008). For each pair of consecutive utterance sections, the determination unit 462 then determines whether the similarity is equal to or higher than a predetermined similarity serving as a threshold, and the calculation unit 461 counts the pairs that reach it (step S2009).

そして、判断部４６２により、計数値が所定数以上であるか否かを判断する（ステップＳ２０１０）。この所定数はしきい値となるため、設定処理（ステップＳ１４０５）において設定しておく。所定数以上である場合（ステップＳ２０１０：Ｙｅｓ）、発話内容分析結果フラグＦｂをＦｂ＝１とする（ステップＳ２０１１）。一方、所定数以上でない場合（ステップＳ２０１０：Ｎｏ）、ステップＳ２００１に戻る。 Then, the determination unit 462 determines whether or not the count value is a predetermined number or more (step S2010). Since this predetermined number is a threshold value, it is set in the setting process (step S1405). If the number is greater than or equal to the predetermined number (step S2010: Yes), the utterance content analysis result flag Fb is set to Fb = 1 (step S2011). On the other hand, when it is not more than the predetermined number (step S2010: No), the process returns to step S2001.
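The repetition test of steps S2005 to S2011 can be sketched as follows. This passage does not define the similarity formula, so the sketch assumes a simple one: twice the number of keyword occurrences shared by two consecutive utterances, divided by the total number of keyword occurrences in both (1.0 for identical keyword lists). It matches only identical keywords, not merely similar ones, and the function names and thresholds are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def keyword_similarity(prev_kw: List[str], cur_kw: List[str]) -> float:
    """Assumed similarity: shared keyword occurrences over all keyword occurrences."""
    a, b = Counter(prev_kw), Counter(cur_kw)
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    shared = sum(min(count, b[word]) for word, count in a.items())
    return 2 * shared / total  # a shared keyword occurs once in each utterance


def content_flag(utterances: List[Tuple[float, List[str]]], min_len: float,
                 sim_threshold: float, min_count: int) -> int:
    """Return the utterance content analysis result flag Fb (0 or 1).

    utterances: (duration, recognized keywords) for each utterance of the
    designated speaker from the answer phase onward, in time order.
    """
    # S2005: skip utterances shorter than the predetermined length
    kept = [kws for duration, kws in utterances if duration >= min_len]
    # S2008-S2009: count consecutive pairs whose similarity reaches the threshold
    count = sum(1 for prev, cur in zip(kept, kept[1:])
                if keyword_similarity(prev, cur) >= sim_threshold)
    # S2010-S2011: Fb = 1 when enough repetitive pairs are found
    return 1 if count >= min_count else 0
```

A set-based measure such as Jaccard similarity would also fit the description; the occurrence-count form was chosen here only because the appendices speak of numbers of appearances.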

＜決定処理手順＞
図２１は、決定部４０７による決定処理（ステップＳ１４０９）の詳細な処理手順を示すフローチャートである。まず、対象コール情報があるか否かを判断する（ステップＳ２１０１）。どのコール情報を対象コール情報とするかは、設定処理（ステップＳ１４０５）において設定しておく。デフォルトでは、未処理のコール情報を順次対象とすることとしてもよい。 <Decision processing procedure>
FIG. 21 is a flowchart showing the detailed procedure of the determination process (step S1409) performed by the determination unit 407. First, it is determined whether there is target call information (step S2101). Which call information is targeted is set in the setting process (step S1405). By default, unprocessed call information may be targeted one call at a time.

対象コール情報がある場合（ステップＳ２１０１：Ｙｅｓ）、対象コール情報を選択し（ステップＳ２１０２）、フラグの値が１であるか否かを判断する（ステップＳ２１０３）。ここで、使用するフラグが対話構造分析結果フラグＦａであるか、発話内容分析結果フラグＦｂであるか、ＡＮＤ結果フラグＦｃであるかは、設定処理（ステップＳ１４０５）において設定されているため、設定されたフラグの値を参照する。 If there is target call information (step S2101: Yes), the target call information is selected (step S2102), and it is determined whether the flag value is 1 (step S2103). Whether the flag used is the dialogue structure analysis result flag Fa, the utterance content analysis result flag Fb, or the AND result flag Fc is set in the setting process (step S1405), so the value of the flag selected there is referenced.

フラグの値が「１」でない場合（ステップＳ２１０３：Ｎｏ）、ステップＳ２１０１に戻る。一方、「１」である場合（ステップＳ２１０３：Ｙｅｓ）、対象コール情報を問題対話に決定する（ステップＳ２１０４）。具体的には、そのコールＩＤを問題リストファイル１３００に書き込む。そして、ステップＳ２１０１に戻る。一方、ステップＳ２１０１において、対象コール情報がない場合（ステップＳ２１０１：Ｎｏ）、出力処理（ステップＳ１４１０）に移行する。 When the value of the flag is not “1” (step S2103: No), the process returns to step S2101. On the other hand, if it is “1” (step S2103: Yes), the target call information is determined to be a problem dialogue (step S2104). Specifically, the call ID is written in the problem list file 1300. Then, the process returns to step S2101. On the other hand, when there is no target call information in step S2101 (step S2101: No), the process proceeds to an output process (step S1410).
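The decision step amounts to a filter over per-call flags. This hypothetical Python sketch assumes Fa and Fb have already been computed for each call ID, computes Fc as their logical AND, and returns the problem call IDs as a list instead of writing them to the problem list file 1300; the function name is illustrative only.

```python
from typing import Dict, List, Tuple


def decide_problem_calls(flags: Dict[str, Tuple[int, int]], mode: str) -> List[str]:
    """flags maps call ID -> (Fa, Fb); mode selects "Fa", "Fb" or "Fc"."""
    problem_ids: List[str] = []
    for call_id, (fa, fb) in flags.items():
        value = {"Fa": fa, "Fb": fb, "Fc": fa & fb}[mode]  # Fc = Fa AND Fb
        if value == 1:
            problem_ids.append(call_id)  # stands in for writing to the problem list
    return problem_ids
```

Selecting Fc lists only the calls flagged by both analyses, so it yields a shorter and more selective list than Fa or Fb alone.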

このように、本実施の形態では、回答フェーズ以降の対話構造を分析することで、対話が指定話者に偏っている問題対話（候補）を抽出することができる。このような問題対話では、話者間で話がかみ合っていないと推定される。したがって、問題対話の選別を効率的かつ高精度におこなうことができ、モニタリングの作業効率の向上を図ることができる。 As described above, in this embodiment, analyzing the dialogue structure from the answer phase onward makes it possible to extract problem dialogues (candidates) in which the dialogue is biased toward the designated speaker. In such dialogues, the speakers are presumed to be talking past each other. Problem dialogues can therefore be screened efficiently and with high accuracy, improving the efficiency of monitoring work.

また、指定話者の発話内容を分析することにより、同じ発話が繰り返されている問題対話（候補）を抽出することができる。このような問題対話では、話者間で話がかみ合っていないと推定される。したがって、問題対話の選別を効率的かつ高精度におこなうことができ、モニタリングの作業効率の向上を図ることができる。 Further, analyzing the utterance content of the designated speaker makes it possible to extract problem dialogues (candidates) in which the same utterance is repeated. In such dialogues, the speakers are presumed to be talking past each other. Problem dialogues can therefore be screened efficiently and with high accuracy, improving the efficiency of monitoring work.

また、対話構造分析と発話内容分析の両分析をおこなうことで、対話が指定話者に偏り、かつ、同じ発話が繰り返されている問題対話（候補）を抽出することができる。このような問題対話では、話者間で話がかみ合っていない確度がいずれか一方の分析の場合に比べて高いと推定される。したがって、問題対話の選別を効率的かつ高精度におこなうことができ、モニタリングの作業効率の向上を図ることができる。 Further, performing both the dialogue structure analysis and the utterance content analysis makes it possible to extract problem dialogues (candidates) in which the dialogue is biased toward the designated speaker and the same utterance is repeated. For such dialogues, the likelihood that the speakers are talking past each other is presumed to be higher than with either analysis alone. Problem dialogues can therefore be screened efficiently and with high accuracy, improving the efficiency of monitoring work.

なお、本実施の形態で説明した対話選別方法は、予め用意されたプログラムをパーソナル・コンピュータやワークステーション等のコンピュータで実行することにより実現することができる。このプログラムは、ハードディスク、フレキシブルディスク、ＣＤ−ＲＯＭ、ＭＯ、ＤＶＤ等のコンピュータで読み取り可能な記録媒体に記録され、コンピュータによって記録媒体から読み出されることによって実行される。またこのプログラムは、インターネット等のネットワークを介して配布することが可能な媒体であってもよい。 The dialogue selection method described in this embodiment can be implemented by executing a prepared program on a computer such as a personal computer or a workstation. The program is recorded on a computer-readable recording medium such as a hard disk, flexible disk, CD-ROM, MO, or DVD, and is executed by being read from the recording medium by the computer. The program may also be a transmission medium that can be distributed over a network such as the Internet.

上述した実施の形態に関し、さらに以下の付記を開示する。 The following additional notes are disclosed with respect to the embodiment described above.

（付記１）コンピュータを、
話者間の対話に関する音声データから得られる前記話者ごとの音韻情報から一連の発話区間を前記話者ごとに検出する検出手段、
前記検出手段によって検出された前記話者ごとの一連の発話区間の区間長に基づいて、前記両話者の特定の対話区間ごとに主導権話者を特定する主導権話者特定手段、
前記音声データの開始冒頭において前記主導権話者特定手段によって特定された主導権話者が相手方に質問をする立場の話者である区間を質問フェーズとし、当該質問フェーズ後において前記主導権話者が相手方から質問を受ける立場の話者である区間を回答フェーズとする冒頭フェーズ特定手段、
前記冒頭フェーズ特定手段によって特定された回答フェーズ以降において前記両話者のうち指定話者が前記主導権話者である主導権保持時間を算出し、前記指定話者の主導権保持時間の時間長に基づいて、前記対話の前記指定話者への偏りを分析する対話構造分析手段、
前記対話構造分析手段によって分析された分析結果に基づいて、前記対話を問題対話に決定する決定手段、
前記決定手段によって決定された決定結果を出力する出力手段、
として機能させることを特徴とする対話選別プログラム。 (Appendix 1) A dialogue screening program causing a computer to function as:
detection means for detecting, for each speaker, a series of utterance sections from the phonological information of that speaker obtained from voice data of a dialogue between the speakers;
initiative speaker identification means for identifying an initiative speaker for each specific dialogue section of the two speakers, based on the section lengths of the series of utterance sections detected for each speaker by the detection means;
opening phase identification means for setting, as a question phase, a section at the beginning of the voice data in which the initiative speaker identified by the initiative speaker identification means is the speaker in the position of asking the other party questions, and for setting, as an answer phase, a section after the question phase in which the initiative speaker is the speaker in the position of being asked questions by the other party;
dialogue structure analysis means for calculating an initiative holding time during which a designated speaker of the two speakers is the initiative speaker from the answer phase identified by the opening phase identification means onward, and for analyzing the bias of the dialogue toward the designated speaker based on the length of the designated speaker's initiative holding time;
determination means for determining the dialogue to be a problem dialogue based on the analysis result of the dialogue structure analysis means; and
output means for outputting the determination result of the determination means.
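The opening phase identification recited above can be sketched as a scan over the initiative records. In this hypothetical Python sketch, the records are `(speaker, start, end)` tuples sorted by start time, and which speaker asks the questions at the start of a call (`asker`) is an assumed input; the question phase is the leading run of records held by the asker, and the answer phase begins at the first record after that run.

```python
from typing import List, Optional, Tuple

Record = Tuple[str, float, float]  # (speaker, start, end), sorted by start


def answer_phase_start(records: List[Record], asker: str) -> Optional[float]:
    """Return the start time of the answer phase, or None if none is found."""
    i = 0
    # Question phase: leading records in which the asker holds the initiative
    while i < len(records) and records[i][0] == asker:
        i += 1
    if i == 0 or i == len(records):
        return None  # no question phase, or no answer phase after it
    # Answer phase: the initiative passes to the speaker being asked
    return records[i][1]
```

The returned time would then serve as the `answer_start` from which initiative holding time and utterance repetition are measured.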

（付記２）前記コンピュータを、
所定の認識キーワードと一致または関連するキーワードおよびその出現時刻を含む前記音声データに関する前記話者ごとの認識結果を取得する取得手段、
前記取得手段によって取得された前記指定話者の認識結果の中から選ばれたキーワードの、前記指定話者の前記一連の発話区間での出現状況に基づいて、前記対話の進行の順調性を分析する発話内容分析手段、として機能させ、
前記決定手段は、
前記対話構造分析手段および前記発話内容分析手段によって分析された分析結果に基づいて、前記対話を問題対話に決定することを特徴とする付記１に記載の対話選別プログラム。 (Appendix 2) The dialogue screening program according to appendix 1, further causing the computer to function as:
acquisition means for acquiring, for each speaker, a recognition result for the voice data that includes keywords matching or related to predetermined recognition keywords together with their appearance times; and
utterance content analysis means for analyzing how smoothly the dialogue progresses, based on how keywords selected from the designated speaker's recognition result acquired by the acquisition means appear in the designated speaker's series of utterance sections;
wherein the determination means
determines the dialogue to be a problem dialogue based on the analysis results of the dialogue structure analysis means and the utterance content analysis means.

（付記３）前記コンピュータを、
前記指定話者の連続発話区間における前記指定話者の認識結果の中から選ばれたキーワードと同一または類似のキーワードの出現回数と前記指定話者の連続発話区間における前記指定話者の認識結果内の全キーワードの出現回数とに基づいて、前記連続発話区間の類似度を算出する算出手段、
前記算出手段によって算出された類似度が所定の類似度以上であるか否かを判断する判断手段、として機能させ、
前記発話内容分析手段は、
前記判断手段によって前記所定の類似度以上であると判断された場合、前記対話を進行が順調でない対話であると分析することを特徴とする付記２に記載の対話選別プログラム。 (Appendix 3) The dialogue screening program according to appendix 2, further causing the computer to function as:
calculation means for calculating the similarity between consecutive utterance sections of the designated speaker, based on the number of appearances of keywords identical or similar to keywords selected from the designated speaker's recognition result in those sections, and on the number of appearances of all keywords in the designated speaker's recognition result for those sections; and
judgment means for judging whether the similarity calculated by the calculation means is equal to or greater than a predetermined similarity;
wherein the utterance content analysis means
analyzes the dialogue as one whose progress is not smooth when the judgment means judges that the similarity is equal to or greater than the predetermined similarity.

（付記４）前記算出手段は、
前記判断手段によって前記所定の類似度以上であると判断された連続発話区間の個数を計数し、
前記判断手段は、
前記算出手段によって算出された個数が所定数以上であるか否かを判断し、前記対話を進行が順調でない対話であると分析することを特徴とする付記３に記載の対話選別プログラム。 (Appendix 4) The dialogue screening program according to appendix 3, wherein the calculation means
counts the number of consecutive utterance sections judged by the judgment means to be equal to or greater than the predetermined similarity, and
the judgment means
judges whether the number counted by the calculation means is equal to or greater than a predetermined number and, if so, the dialogue is analyzed as one whose progress is not smooth.

（付記５）前記決定手段は、
前記対話構造分析手段によって前記対話が前記指定話者への偏りがある対話であると分析された場合、前記対話を問題対話に決定することを特徴とする付記１〜４のいずれか一つに記載の対話選別プログラム。 (Appendix 5) The dialogue screening program according to any one of appendices 1 to 4, wherein the determination means
determines the dialogue to be a problem dialogue when the dialogue structure analysis means analyzes the dialogue as biased toward the designated speaker.

（付記６）前記決定手段は、
前記発話内容分析手段によって前記対話が、進行が順調でない対話であると分析された場合、前記対話を問題対話に決定することを特徴とする付記２〜４のいずれか一つに記載の対話選別プログラム。 (Appendix 6) The dialogue screening program according to any one of appendices 2 to 4, wherein the determination means
determines the dialogue to be a problem dialogue when the utterance content analysis means analyzes the dialogue as one whose progress is not smooth.

（付記７）前記決定手段は、
前記対話構造分析手段によって前記対話が前記指定話者への偏りがある対話であると分析され、かつ、前記発話内容分析手段によって前記対話が、進行が順調でない対話であると分析された場合、前記対話を問題対話に決定することを特徴とする付記２〜４のいずれか一つに記載の対話選別プログラム。 (Appendix 7) The dialogue screening program according to any one of appendices 2 to 4, wherein the determination means
determines the dialogue to be a problem dialogue when the dialogue structure analysis means analyzes the dialogue as biased toward the designated speaker and the utterance content analysis means analyzes the dialogue as one whose progress is not smooth.

（付記８）コンピュータを、
話者間の対話に関する音声データから得られる前記話者ごとの音韻情報から一連の発話区間を前記話者ごとに検出する検出手段、
前記検出手段によって検出された前記話者ごとの一連の発話区間の区間長に基づいて、前記両話者の特定の対話区間ごとに主導権話者を特定する主導権話者特定手段、
前記音声データの開始冒頭において前記主導権話者特定手段によって特定された主導権話者が相手方に質問をする立場の話者である区間を質問フェーズとし、当該質問フェーズ後において前記主導権話者が相手方から質問を受ける立場の話者である区間を回答フェーズとする冒頭フェーズ特定手段、
所定の認識キーワードと一致または関連するキーワードおよびその出現時刻を含む前記音声データに関する前記話者ごとの認識結果を取得する取得手段、
前記取得手段によって取得された前記指定話者の認識結果の中から選ばれたキーワードの、前記指定話者の前記一連の発話区間での出現状況に基づいて、前記対話の前記冒頭フェーズ特定手段によって特定された回答フェーズ以降における進行の順調性を分析する発話内容分析手段、
前記発話内容分析手段によって分析された分析結果に基づいて、前記対話を問題対話に決定する決定手段、
前記決定手段によって決定された決定結果を出力する出力手段、
として機能させることを特徴とする対話選別プログラム。 (Appendix 8) A dialogue screening program causing a computer to function as:
detection means for detecting, for each speaker, a series of utterance sections from the phonological information of that speaker obtained from voice data of a dialogue between the speakers;
initiative speaker identification means for identifying an initiative speaker for each specific dialogue section of the two speakers, based on the section lengths of the series of utterance sections detected for each speaker by the detection means;
opening phase identification means for setting, as a question phase, a section at the beginning of the voice data in which the initiative speaker identified by the initiative speaker identification means is the speaker in the position of asking the other party questions, and for setting, as an answer phase, a section after the question phase in which the initiative speaker is the speaker in the position of being asked questions by the other party;
acquisition means for acquiring, for each speaker, a recognition result for the voice data that includes keywords matching or related to predetermined recognition keywords together with their appearance times;
utterance content analysis means for analyzing how smoothly the dialogue progresses from the answer phase identified by the opening phase identification means onward, based on how keywords selected from the designated speaker's recognition result acquired by the acquisition means appear in the designated speaker's series of utterance sections;
determination means for determining the dialogue to be a problem dialogue based on the analysis result of the utterance content analysis means; and
output means for outputting the determination result of the determination means.

（付記９）前記コンピュータを、
前記指定話者の連続発話区間における前記指定話者の認識結果の中から選ばれたキーワードと同一または類似のキーワードの出現回数と前記指定話者の連続発話区間における前記指定話者の認識結果内の全キーワードの出現回数とに基づいて、前記連続発話区間の類似度を算出する算出手段、
前記算出手段によって算出された類似度が所定の類似度以上であるか否かを判断する判断手段、として機能させ、
前記発話内容分析手段は、
前記判断手段によって前記所定の類似度以上であると判断された場合、前記対話を進行が順調でない対話であると分析することを特徴とする付記８に記載の対話選別プログラム。 (Appendix 9) The dialogue screening program according to appendix 8, further causing the computer to function as:
calculation means for calculating the similarity between consecutive utterance sections of the designated speaker, based on the number of appearances of keywords identical or similar to keywords selected from the designated speaker's recognition result in those sections, and on the number of appearances of all keywords in the designated speaker's recognition result for those sections; and
judgment means for judging whether the similarity calculated by the calculation means is equal to or greater than a predetermined similarity;
wherein the utterance content analysis means
analyzes the dialogue as one whose progress is not smooth when the judgment means judges that the similarity is equal to or greater than the predetermined similarity.

（付記１０）前記算出手段は、
前記判断手段によって前記所定の類似度以上であると判断された連続発話区間の個数を計数し、
前記判断手段は、
前記算出手段によって算出された個数が所定数以上であるか否かを判断し、前記対話を進行が順調でない対話であると分析することを特徴とする付記９に記載の対話選別プログラム。 (Appendix 10) The dialogue screening program according to appendix 9, wherein the calculation means
counts the number of consecutive utterance sections judged by the judgment means to be equal to or greater than the predetermined similarity, and
the judgment means
judges whether the number counted by the calculation means is equal to or greater than a predetermined number and, if so, the dialogue is analyzed as one whose progress is not smooth.

（付記１１）前記決定手段は、
前記発話内容分析手段によって前記対話が、進行が順調でない対話であると分析された場合、前記対話を問題対話に決定することを特徴とする付記８〜１０のいずれか一つに記載の対話選別プログラム。 (Appendix 11) The dialogue screening program according to any one of appendices 8 to 10, wherein the determination means
determines the dialogue to be a problem dialogue when the utterance content analysis means analyzes the dialogue as one whose progress is not smooth.

（付記１２）話者間の対話に関する音声データから得られる前記話者ごとの音韻情報から一連の発話区間を前記話者ごとに検出する検出手段と、
前記検出手段によって検出された前記話者ごとの一連の発話区間の区間長に基づいて、前記両話者の特定の対話区間ごとに主導権話者を特定する主導権話者特定手段と、
前記音声データの開始冒頭において前記主導権話者特定手段によって特定された主導権話者が相手方に質問をする立場の話者である区間を質問フェーズとし、当該質問フェーズ後において前記主導権話者が相手方から質問を受ける立場の話者である区間を回答フェーズとする冒頭フェーズ特定手段と、
前記冒頭フェーズ特定手段によって特定された回答フェーズ以降において前記両話者のうち指定話者が前記主導権話者である主導権保持時間を算出し、前記指定話者の主導権保持時間の時間長に基づいて、前記対話の前記指定話者への偏りを分析する対話構造分析手段と、
前記対話構造分析手段によって分析された分析結果に基づいて、前記対話を問題対話に決定する決定手段と、
前記決定手段によって決定された決定結果を出力する出力手段と、
を備えることを特徴とする対話選別装置。 (Appendix 12) A dialogue screening device comprising:
detection means for detecting, for each speaker, a series of utterance sections from the phonological information of that speaker obtained from voice data of a dialogue between the speakers;
initiative speaker identification means for identifying an initiative speaker for each specific dialogue section of the two speakers, based on the section lengths of the series of utterance sections detected for each speaker by the detection means;
opening phase identification means for setting, as a question phase, a section at the beginning of the voice data in which the initiative speaker identified by the initiative speaker identification means is the speaker in the position of asking the other party questions, and for setting, as an answer phase, a section after the question phase in which the initiative speaker is the speaker in the position of being asked questions by the other party;
dialogue structure analysis means for calculating an initiative holding time during which a designated speaker of the two speakers is the initiative speaker from the answer phase identified by the opening phase identification means onward, and for analyzing the bias of the dialogue toward the designated speaker based on the length of the designated speaker's initiative holding time;
determination means for determining the dialogue to be a problem dialogue based on the analysis result of the dialogue structure analysis means; and
output means for outputting the determination result of the determination means.

（付記１３）話者間の対話に関する音声データから得られる前記話者ごとの音韻情報から一連の発話区間を前記話者ごとに検出する検出手段と、
前記検出手段によって検出された前記話者ごとの一連の発話区間の区間長に基づいて、前記両話者の特定の対話区間ごとに主導権話者を特定する主導権話者特定手段と、
前記音声データの開始冒頭において前記主導権話者特定手段によって特定された主導権話者が相手方に質問をする立場の話者である区間を質問フェーズとし、当該質問フェーズ後において前記主導権話者が相手方から質問を受ける立場の話者である区間を回答フェーズとする冒頭フェーズ特定手段と、
所定の認識キーワードと一致または関連するキーワードおよびその出現時刻を含む前記音声データに関する前記話者ごとの認識結果を取得する取得手段と、
前記取得手段によって取得された前記指定話者の認識結果の中から選ばれたキーワードの、前記指定話者の前記一連の発話区間での出現状況に基づいて、前記対話の前記冒頭フェーズ特定手段によって特定された回答フェーズ以降における進行の順調性を分析する発話内容分析手段と、
前記発話内容分析手段によって分析された分析結果に基づいて、前記対話を問題対話に決定する決定手段と、
前記決定手段によって決定された決定結果を出力する出力手段と、
を備えることを特徴とする対話選別装置。 (Appendix 13) A dialogue screening device comprising:
detection means for detecting, for each speaker, a series of utterance sections from the phonological information of that speaker obtained from voice data of a dialogue between the speakers;
initiative speaker identification means for identifying an initiative speaker for each specific dialogue section of the two speakers, based on the section lengths of the series of utterance sections detected for each speaker by the detection means;
opening phase identification means for setting, as a question phase, a section at the beginning of the voice data in which the initiative speaker identified by the initiative speaker identification means is the speaker in the position of asking the other party questions, and for setting, as an answer phase, a section after the question phase in which the initiative speaker is the speaker in the position of being asked questions by the other party;
acquisition means for acquiring, for each speaker, a recognition result for the voice data that includes keywords matching or related to predetermined recognition keywords together with their appearance times;
utterance content analysis means for analyzing how smoothly the dialogue progresses from the answer phase identified by the opening phase identification means onward, based on how keywords selected from the designated speaker's recognition result acquired by the acquisition means appear in the designated speaker's series of utterance sections;
determination means for determining the dialogue to be a problem dialogue based on the analysis result of the utterance content analysis means; and
output means for outputting the determination result of the determination means.

(Supplementary note 14) A dialogue screening method in which a computer executes:
a detection step of detecting, for each speaker, a series of utterance sections from the phonological information of that speaker obtained from voice data of a dialogue between speakers;
an initiative speaker identification step of identifying, based on the section lengths of the series of utterance sections detected for each speaker in the detection step, the initiative speaker in each specific dialogue section of the two speakers;
an opening phase identification step of defining, at the beginning of the voice data, a question phase as a section in which the initiative speaker identified in the initiative speaker identification step is the speaker in the position of asking the other party questions, and an answer phase as a section following the question phase in which the initiative speaker is the speaker in the position of receiving questions from the other party;
a dialogue structure analysis step of calculating, after the answer phase identified in the opening phase identification step, the initiative holding time during which a designated one of the two speakers is the initiative speaker, and analyzing, based on the length of the designated speaker's initiative holding time, the bias of the dialogue toward the designated speaker;
a determination step of determining the dialogue to be a problem dialogue based on the analysis result of the dialogue structure analysis step; and
an output step of outputting the determination result of the determination step.
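The initiative speaker is identified from the lengths of the utterance sections, and the initiative holding time of the designated speaker is then measured from the answer phase onward. The sketch below is illustrative only: it assumes fixed-length windows as the "specific dialogue sections", a longest-talker rule for who holds the initiative, and a share-of-time threshold for bias. The names `window_sec` and `max_ratio` and their default values are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Section = Tuple[float, float]  # (start_sec, end_sec) of one utterance section


def talk_time(sections: List[Section], lo: float, hi: float) -> float:
    """Seconds of speech from `sections` that fall inside the window [lo, hi)."""
    return sum(max(0.0, min(end, hi) - max(start, lo)) for start, end in sections)


def initiative_per_window(
    sections: Dict[str, List[Section]], total_sec: float, window_sec: float = 5.0
) -> List[Optional[str]]:
    """Assumed rule: the speaker who talks longest in a window holds the initiative.

    Returns one entry per window; None marks a window in which nobody spoke.
    Ties go to the speaker listed first in `sections`.
    """
    windows: List[Optional[str]] = []
    for k in range(math.ceil(total_sec / window_sec)):
        lo, hi = k * window_sec, (k + 1) * window_sec
        talk = {spk: talk_time(secs, lo, hi) for spk, secs in sections.items()}
        leader = max(talk, key=talk.get)
        windows.append(leader if talk[leader] > 0 else None)
    return windows


def initiative_bias(
    windows: List[Optional[str]],
    designated: str,
    answer_start: int,
    window_sec: float = 5.0,
    max_ratio: float = 0.8,
) -> Tuple[float, bool]:
    """Initiative holding time of `designated` from the answer phase onward,
    and whether its share of that period reaches the hypothetical `max_ratio`."""
    after = windows[answer_start:]
    held_sec = sum(1 for w in after if w == designated) * window_sec
    period_sec = len(after) * window_sec
    biased = period_sec > 0 and held_sec / period_sec >= max_ratio
    return held_sec, biased
```

For example, if the designated speaker leads every window after the answer phase begins, the holding time equals the whole remaining period and the dialogue is reported as biased toward that speaker.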

(Supplementary note 15) A dialogue screening method comprising:
a detection step of detecting, for each speaker, a series of utterance sections from the phonological information of that speaker obtained from voice data of a dialogue between speakers;
an initiative speaker identification step of identifying, based on the section lengths of the series of utterance sections detected for each speaker in the detection step, the initiative speaker in each specific dialogue section of the two speakers;
an opening phase identification step of defining, at the beginning of the voice data, a question phase as a section in which the initiative speaker identified in the initiative speaker identification step is the speaker in the position of asking the other party questions, and an answer phase as a section following the question phase in which the initiative speaker is the speaker in the position of receiving questions from the other party;
an acquisition step of acquiring, for each speaker, a recognition result for the voice data that includes keywords matching or related to predetermined recognition keywords, together with their appearance times;
an utterance content analysis step of analyzing the smoothness of the progress of the dialogue after the answer phase identified in the opening phase identification step, based on how a keyword selected from the designated speaker's recognition result acquired in the acquisition step appears in the designated speaker's series of utterance sections;
a determination step of determining the dialogue to be a problem dialogue based on the analysis result of the utterance content analysis step; and
an output step of outputting the determination result of the determination step.
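Given a per-window sequence of initiative speakers, the opening phases could be located as sketched below. This is a sketch under stated assumptions: the two roles are passed in as plain labels (for instance `"customer"` as the asking side and `"agent"` as the answering side, an illustrative choice), silent windows are represented by `None`, and the question phase is taken to last until the answering side first takes the initiative.

```python
from typing import List, Optional, Tuple


def find_opening_phases(
    windows: List[Optional[str]], asker: str, answerer: str
) -> Optional[Tuple[int, int]]:
    """Return (question_start, answer_start) window indices, or None.

    Assumed rule: the question phase starts at the first window with speech,
    which must be led by `asker`, and runs up to (but excluding) the first
    window led by `answerer`; the answer phase starts at that window.
    """
    first = next((i for i, w in enumerate(windows) if w is not None), None)
    if first is None or windows[first] != asker:
        return None  # the dialogue does not open with a question phase
    for i in range(first + 1, len(windows)):
        if windows[i] == answerer:
            return first, i
    return None  # the answering side never takes the initiative
```

The returned `answer_start` index is the point from which the later structure and content analyses would be applied.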

400 Dialogue screening device
401 Extraction unit
402 Detection unit
403 Basic dialogue analysis unit
404 Dialogue structure analysis unit
405 Acquisition unit
406 Utterance content analysis unit
407 Determination unit
408 Output unit
431 Initiative speaker identification unit
432 Opening phase identification unit
461 Calculation unit
462 Judgment unit

Claims

1. A dialogue screening program causing a computer to function as:
detection means for detecting, for each speaker, a series of utterance sections from the phonological information of that speaker obtained from voice data of a dialogue between speakers;
initiative speaker identification means for identifying, based on the section lengths of the series of utterance sections detected for each speaker by the detection means, the initiative speaker in each specific dialogue section of the two speakers;
opening phase identification means for defining, at the beginning of the voice data, a question phase as a section in which the initiative speaker identified by the initiative speaker identification means is the speaker in the position of asking the other party questions, and an answer phase as a section following the question phase in which the initiative speaker is the speaker in the position of receiving questions from the other party;
dialogue structure analysis means for calculating, after the answer phase identified by the opening phase identification means, the initiative holding time during which a designated one of the two speakers is the initiative speaker, and analyzing, based on the length of the designated speaker's initiative holding time, the bias of the dialogue toward the designated speaker;
determination means for determining the dialogue to be a problem dialogue based on the analysis result of the dialogue structure analysis means; and
output means for outputting the determination result of the determination means.

2. The dialogue screening program according to claim 1, further causing the computer to function as:
acquisition means for acquiring, for each speaker, a recognition result for the voice data that includes keywords matching or related to predetermined recognition keywords, together with their appearance times; and
utterance content analysis means for analyzing the smoothness of the progress of the dialogue based on how a keyword selected from the designated speaker's recognition result acquired by the acquisition means appears in the designated speaker's series of utterance sections,
wherein the determination means determines the dialogue to be a problem dialogue based on the analysis results of the dialogue structure analysis means and the utterance content analysis means.
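Claim 2 bases the determination on both analysis results but does not state how they are combined. One hypothetical rule, shown only for illustration, flags a dialogue when either analysis finds a problem; the output step then lists the flagged calls for monitoring. The `DialogueAnalysis` record, the call IDs, and the OR rule are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DialogueAnalysis:
    biased_toward_designated: bool  # result of the dialogue structure analysis
    progress_smooth: bool           # result of the utterance content analysis


def is_problem_dialogue(a: DialogueAnalysis) -> bool:
    """Hypothetical combination rule: either finding alone flags the dialogue."""
    return a.biased_toward_designated or not a.progress_smooth


def screen(calls: Dict[str, DialogueAnalysis]) -> List[str]:
    """IDs of the calls determined to be problem dialogues, sorted."""
    return sorted(cid for cid, a in calls.items() if is_problem_dialogue(a))
```

Requiring both findings, or weighting them into a score, would be equally consistent with the claim wording; the OR rule here simply favors not missing a problem call.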

3. The dialogue screening program according to claim 2, further causing the computer to function as:
calculation means for calculating the similarity of consecutive utterance sections of the designated speaker based on the number of appearances, in those consecutive utterance sections, of keywords identical or similar to a keyword selected from the designated speaker's recognition result, and the number of appearances of all keywords of the designated speaker's recognition result in those consecutive utterance sections; and
judgment means for judging whether the similarity calculated by the calculation means is equal to or greater than a predetermined similarity,
wherein, when the judgment means judges that the similarity is equal to or greater than the predetermined similarity, the utterance content analysis means analyzes the dialogue as a dialogue whose progress is not smooth.
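Claim 3 computes a similarity between consecutive utterance sections of the designated speaker from how often the same or similar keywords recur and from the total number of keyword appearances. The exact formula is not given here, so the sketch below assumes a simple ratio: occurrences of keywords found in both sections, divided by all keyword occurrences in the pair. It matches keywords exactly (no expansion to similar words), and the 0.5 threshold is a hypothetical stand-in for the predetermined similarity.

```python
from collections import Counter
from typing import List, Sequence


def section_similarity(prev_kw: Sequence[str], next_kw: Sequence[str]) -> float:
    """Share of all keyword occurrences in two consecutive sections that
    belong to keywords occurring in both sections (an assumed formula)."""
    prev_c, next_c = Counter(prev_kw), Counter(next_kw)
    total = sum(prev_c.values()) + sum(next_c.values())
    if total == 0:
        return 0.0
    shared = prev_c.keys() & next_c.keys()  # keywords repeated across the pair
    repeated = sum(prev_c[k] + next_c[k] for k in shared)
    return repeated / total


def progress_not_smooth(
    sections_kw: List[Sequence[str]], threshold: float = 0.5
) -> bool:
    """True if any two consecutive sections reach the hypothetical threshold."""
    return any(
        section_similarity(a, b) >= threshold
        for a, b in zip(sections_kw, sections_kw[1:])
    )
```

A high value means the designated speaker keeps returning to the same keywords in back-to-back utterances, which claim 3 treats as a sign that the dialogue is not progressing smoothly.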

4. A dialogue screening program causing a computer to function as:
detection means for detecting, for each speaker, a series of utterance sections from the phonological information of that speaker obtained from voice data of a dialogue between speakers;
initiative speaker identification means for identifying, based on the section lengths of the series of utterance sections detected for each speaker by the detection means, the initiative speaker in each specific dialogue section of the two speakers;
opening phase identification means for defining, at the beginning of the voice data, a question phase as a section in which the initiative speaker identified by the initiative speaker identification means is the speaker in the position of asking the other party questions, and an answer phase as a section following the question phase in which the initiative speaker is the speaker in the position of receiving questions from the other party;
acquisition means for acquiring, for each speaker, a recognition result for the voice data that includes keywords matching or related to predetermined recognition keywords, together with their appearance times;
utterance content analysis means for analyzing the smoothness of the progress of the dialogue after the answer phase identified by the opening phase identification means, based on how a keyword selected from the designated speaker's recognition result acquired by the acquisition means appears in the designated speaker's series of utterance sections;
determination means for determining the dialogue to be a problem dialogue based on the analysis result of the utterance content analysis means; and
output means for outputting the determination result of the determination means.
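The utterance content analysis depends on where the recognized keywords fall among the designated speaker's utterance sections. Assuming the recognition result arrives as (keyword, appearance time in seconds) pairs and the sections as sorted, non-overlapping (start, end) pairs (both formats are assumptions), the keyword hits can be grouped per section with a binary search over the section start times.

```python
from bisect import bisect_right
from typing import List, Tuple


def keywords_by_section(
    hits: List[Tuple[str, float]], sections: List[Tuple[float, float]]
) -> List[List[str]]:
    """Assign (keyword, appearance_time) hits to the utterance section containing them.

    `sections` must be sorted and non-overlapping; hits outside every
    section are dropped.
    """
    starts = [start for start, _ in sections]
    grouped: List[List[str]] = [[] for _ in sections]
    for word, t in sorted(hits, key=lambda h: h[1]):
        idx = bisect_right(starts, t) - 1  # last section starting at or before t
        if idx >= 0 and t < sections[idx][1]:
            grouped[idx].append(word)
    return grouped
```

Each inner list preserves appearance order, so the result can be fed directly into a per-section comparison of consecutive utterances.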

5. A dialogue screening device comprising:
detection means for detecting, for each speaker, a series of utterance sections from the phonological information of that speaker obtained from voice data of a dialogue between speakers;
initiative speaker identification means for identifying, based on the section lengths of the series of utterance sections detected for each speaker by the detection means, the initiative speaker in each specific dialogue section of the two speakers;
opening phase identification means for defining, at the beginning of the voice data, a question phase as a section in which the initiative speaker identified by the initiative speaker identification means is the speaker in the position of asking the other party questions, and an answer phase as a section following the question phase in which the initiative speaker is the speaker in the position of receiving questions from the other party;
dialogue structure analysis means for calculating, after the answer phase identified by the opening phase identification means, the initiative holding time during which a designated one of the two speakers is the initiative speaker, and analyzing, based on the length of the designated speaker's initiative holding time, the bias of the dialogue toward the designated speaker;
determination means for determining the dialogue to be a problem dialogue based on the analysis result of the dialogue structure analysis means; and
output means for outputting the determination result of the determination means.

6. A dialogue screening method in which a computer executes:
a detection step of detecting, for each speaker, a series of utterance sections from the phonological information of that speaker obtained from voice data of a dialogue between speakers;
an initiative speaker identification step of identifying, based on the section lengths of the series of utterance sections detected for each speaker in the detection step, the initiative speaker in each specific dialogue section of the two speakers;
an opening phase identification step of defining, at the beginning of the voice data, a question phase as a section in which the initiative speaker identified in the initiative speaker identification step is the speaker in the position of asking the other party questions, and an answer phase as a section following the question phase in which the initiative speaker is the speaker in the position of receiving questions from the other party;
a dialogue structure analysis step of calculating, after the answer phase identified in the opening phase identification step, the initiative holding time during which a designated one of the two speakers is the initiative speaker, and analyzing, based on the length of the designated speaker's initiative holding time, the bias of the dialogue toward the designated speaker;
a determination step of determining the dialogue to be a problem dialogue based on the analysis result of the dialogue structure analysis step; and
an output step of outputting the determination result of the determination step.