JP7169031B1

JP7169031B1 - Program, information processing device, information processing system, information processing method, information processing terminal

Info

Publication number: JP7169031B1
Application number: JP2022079947A
Authority: JP
Inventors: 健二山内; 泰一橋本
Original assignee: Revcomm
Current assignee: Revcomm
Priority date: 2022-05-16
Filing date: 2022-05-16
Publication date: 2022-11-10
Anticipated expiration: 2042-05-16
Also published as: JP2023168692A; JP2023169093A

Abstract

[Problem] To provide a program, an information processing device, an information processing system, an information processing method, and an information processing terminal for confirming what topics the speakers communicated about in a dialogue. [Solution] In a system comprising a server, a plurality of user terminals, a CRM system, and a voice server connected via a network, the control unit of the server includes a topic processing unit that executes topic presentation processing. In this processing, the topic processing unit receives audio data relating to a dialogue; extracts a plurality of segment audio data, one for each utterance segment, from the received audio data; identifies, among the plurality of segment audio data, one or more segment audio data related to a first topic concerning a predetermined subject; and, based on the identified one or more segment audio data and the first topic, generates summary text summarizing the text information contained in the one or more segment audio data. [Selected drawing] FIG. 19

Description

The present disclosure relates to a program, an information processing device, an information processing system, an information processing method, and an information processing terminal.

Online dialogue services conducted among a plurality of users are known.
Patent Literature 1 discloses a technique that assists in realizing more efficient sales activities while taking objective indicators into account.

Japanese Re-publication of PCT International Publication No. 2020/184631

There is a problem in that it has not been possible to confirm what topics the speakers communicated about in a dialogue.
The present disclosure has been made to solve this problem, and its object is to provide a technique for confirming what topics the speakers communicated about in a dialogue.

A program that causes a computer comprising a processor and a storage unit to process information relating to a dialogue between a first user and a second user, the program causing the processor to execute: a receiving step of receiving audio data relating to the dialogue; an audio extraction step of extracting a plurality of segment audio data, one for each utterance segment, from the audio data received in the receiving step; a segment identification step of identifying, among the plurality of segment audio data, one or more segment audio data related to a first topic concerning a predetermined subject; and a summarization step of generating, based on the one or more segment audio data identified in the segment identification step and on the first topic, summary text summarizing the text information contained in the one or more segment audio data.
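The four steps of the program (receive, extract, identify, summarize) can be illustrated with a minimal sketch. It is not the disclosed implementation: the audio data is stood in for by already transcribed (speaker, utterance) pairs, topic matching is a plain keyword test, and the summarizer is a truncating placeholder for the summary model. All names (`Segment`, `extract_segments`, `identify_segments`, `summarize`, `topic_presentation`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str
    text: str  # transcript of one utterance segment (stand-in for segment audio data)


def extract_segments(dialogue):
    """Audio extraction step (assumed form): one Segment per (speaker, utterance) pair."""
    return [Segment(speaker, text) for speaker, text in dialogue]


def identify_segments(segments, topic_keywords):
    """Segment identification step (assumed form): keep segments mentioning any topic keyword."""
    return [seg for seg in segments if any(k in seg.text for k in topic_keywords)]


def summarize(segments, max_chars=80):
    """Summarization step placeholder: join the selected texts and truncate."""
    joined = " ".join(seg.text for seg in segments)
    return joined[:max_chars]


def topic_presentation(dialogue, topic_keywords):
    """Run the receive -> extract -> identify -> summarize steps in order."""
    segments = extract_segments(dialogue)
    related = identify_segments(segments, topic_keywords)
    return summarize(related)
```

For example, calling `topic_presentation` with the keyword list `["price"]` keeps only the utterances that mention the price and returns them joined as the summary text.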

According to the present disclosure, it is possible to identify what topics the speakers communicated about in a dialogue service.

FIG. 1 is a block diagram showing the functional configuration of the system 1.
FIG. 2 is a block diagram showing the functional configuration of the server 10.
FIG. 3 is a block diagram showing the functional configuration of the first user terminal 20.
FIG. 4 is a block diagram showing the functional configuration of the second user terminal 30.
FIG. 5 is a block diagram showing the functional configuration of the CRM system 50.
FIG. 6 is a diagram showing the data structure of the user table 1012.
FIG. 7 is a diagram showing the data structure of the organization table 1013.
FIG. 8 is a diagram showing the data structure of the dialogue table 1014.
FIG. 9 is a diagram showing the data structure of the label table 1015.
FIG. 10 is a diagram showing the data structure of the speech segment table 1016.
FIG. 11 is a diagram showing the data structure of the topic relevance table 1017.
FIG. 12 is a diagram showing the data structure of the emotional condition master 1021.
FIG. 13 is a diagram showing the data structure of the speaker type master 1022.
FIG. 14 is a diagram showing the data structure of the topic master 1023.
FIG. 15 is a diagram showing the data structure of the customer table 5012.
FIG. 16 is a flowchart showing the operation of the emotion analysis processing.
FIG. 17 is a flowchart showing the operation of the impression analysis processing.
FIG. 18 is a flowchart showing the operation of the topic analysis processing.
FIG. 19 is a flowchart showing the operation of the topic presentation processing.
FIG. 20 is an example screen showing the operation of the topic presentation processing.
FIG. 21 is a block diagram showing the basic hardware configuration of the computer 90.

Embodiments of the present disclosure are described below with reference to the drawings. In all the drawings used to describe the embodiments, common components are given the same reference numerals, and repeated descriptions are omitted. The following embodiments do not unduly limit the content of the present disclosure as set forth in the claims. Not all the components shown in the embodiments are necessarily essential components of the present disclosure. Each figure is a schematic diagram and is not necessarily drawn precisely.

<Configuration of System 1>
The system 1 according to the present disclosure is an information processing system that provides a dialogue service conducted online (an online dialogue service) between a first user, who is an operator, and a second user, who is a customer. The system 1 may also be able to provide a dialogue service conducted online among three or more users, including one or more other users in addition to the first user and the second user.
The system 1 comprises the following information processing devices, connected via a network N: a server 10, a first user terminal 20, a second user terminal 30, a CRM system 50, and a voice server (PBX) 60.
FIG. 1 is a block diagram showing the functional configuration of the system 1.
FIG. 2 is a block diagram showing the functional configuration of the server 10.
FIG. 3 is a block diagram showing the functional configuration of the first user terminal 20.
FIG. 4 is a block diagram showing the functional configuration of the second user terminal 30.
FIG. 5 is a block diagram showing the functional configuration of the CRM system 50.

Each information processing device is a computer comprising an arithmetic device and a storage device. The basic hardware configuration of the computer, and the basic functional configuration of the computer realized by that hardware configuration, are described later. For the server 10, the first user terminal 20, the second user terminal 30, the CRM system 50, and the voice server (PBX) 60, descriptions that would duplicate the basic hardware configuration and basic functional configuration of the computer described later are omitted.

<Configuration of Server 10>
The server 10 is an information processing device that provides a service for storing and managing data related to dialogues between the first user and the second user (dialogue data).
The server 10 comprises a storage unit 101 and a control unit 104.

<Configuration of Storage Unit 101 of Server 10>
The storage unit 101 of the server 10 holds an application program 1011, an emotion evaluation model 1031, an impression evaluation model 1032, a first impression evaluation model 1033, a second impression evaluation model 1034, a summary model 1035, a user table 1012, an organization table 1013, a dialogue table 1014, a label table 1015, a speech segment table 1016, a topic relevance table 1017, an emotional condition master 1021, a speaker type master 1022, and a topic master 1023.

The application program 1011 is a program for causing the control unit 104 of the server 10 to function as each functional unit.
The application program 1011 includes applications such as a web browser application.

The emotion evaluation model 1031 is a model that takes as input data audio data, video data, and text data of the user's utterances in the audio data or video data, and outputs a numerical intensity for each of a plurality of emotional states.

The impression evaluation model 1032 is a model that takes as input data audio data, video data, and text data of the user's utterances in the audio data or video data, and outputs a numerical intensity for each of a plurality of impressions.

The first impression evaluation model 1033 is a model that takes as input data audio data, video data, and text data of the user's utterances in the audio data or video data, and outputs dialogue features relating to the speaker's manner of speaking. A dialogue feature is a feature relating to at least one of the following aspects of the manner of speaking: the speaker's speaking rate, intonation, number of polite expressions, number of fillers, and number of grammatical utterances.

The second impression evaluation model 1034 is a model that takes the dialogue features as input data and outputs a numerical intensity for each of a plurality of impressions.
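The two-stage arrangement (first model: utterance to dialogue features; second model: dialogue features to impression intensities) can be sketched as follows. This is only an illustration under strong assumptions: the disclosure does not specify how either model computes its output, so the word lists `FILLERS` and `POLITE_EXPRESSIONS`, the feature names, and the linear weighting in `impression_scores` are all hypothetical stand-ins for trained models.

```python
# Hypothetical word lists; a real system would use trained models instead.
FILLERS = ("um", "uh", "er")
POLITE_EXPRESSIONS = ("please", "thank you")


def dialogue_features(text, duration_sec):
    """Stage 1 (assumed form): derive simple speaking-style features from a transcript."""
    lowered = text.lower()
    words = lowered.replace(",", " ").replace(".", " ").split()
    return {
        # words per second as a rough speaking rate
        "speaking_rate": len(words) / duration_sec if duration_sec > 0 else 0.0,
        "filler_count": sum(1 for w in words if w in FILLERS),
        "polite_count": sum(lowered.count(p) for p in POLITE_EXPRESSIONS),
    }


def impression_scores(features, weights):
    """Stage 2 (assumed form): weighted sum of features for each impression."""
    return {
        impression: sum(w.get(name, 0.0) * value for name, value in features.items())
        for impression, w in weights.items()
    }
```

Splitting the work this way keeps the second stage independent of the raw audio or text: it only sees the numeric features, which is the property the two-model description relies on.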

The user table 1012 is a table that stores and manages information on member users who use the service (hereinafter, users). When a user registers to use the service, that user's information is stored in a new record of the user table 1012. The user can then use the service according to the present disclosure.
The user table 1012 has the user ID as its primary key and has the columns user ID, CRM ID, organization ID, user name, and user attributes.
FIG. 6 is a diagram showing the data structure of the user table 1012.

The user ID is an item that stores user identification information for identifying a user. The user identification information is set to a unique value for each user.
The CRM ID is an item that stores user identification information for identifying the user in the CRM system 50. By logging in to the CRM system 50 with the CRM ID, the user can receive the CRM service. A user ID in the server 10 is associated with a CRM ID in the CRM system 50.
The organization ID is an item that stores organization identification information for identifying an organization.
The user name is an item that stores the user's name. An arbitrary character string, such as a nickname, may be set instead of the user's actual name.
The user attributes item stores information on attributes of the user, such as age, gender, place of origin, dialect, and job type (sales, customer support, and so on). In addition to information on the user's personal attributes, the user attributes may include information on corporate attributes of the organization, company, or group to which the user belongs, such as industry, business size, and sales volume.

The organization table 1013 is a table that stores and manages information on the organizations to which users belong (organization information). An organization may be any organization or group, such as a company, a corporation, a corporate group, a club, or an association of any kind. Organizations may also be defined at the level of finer subgroups, such as company departments (sales department, general affairs department, customer support department).
The organization table 1013 has the organization ID as its primary key and has the columns organization ID, organization name, and organization attributes.
FIG. 7 is a diagram showing the data structure of the organization table 1013.

The organization ID is an item that stores organization identification information for identifying an organization. The organization identification information is set to a unique value for each piece of organization information.
The organization name is an item that stores the name of the organization. Any character string can be set as the organization name.
The organization attributes item stores information on attributes of the organization, such as organization type (company, corporate group, other association, and so on) and industry (real estate, finance, and so on).

The dialogue table 1014 is a table for storing and managing information related to dialogues between a user and a customer (dialogue information).
The dialogue table 1014 has the dialogue ID as its primary key and has the columns dialogue ID, user ID, customer ID, dialogue category, call direction type, audio data, and video data.
FIG. 8 is a diagram showing the data structure of the dialogue table 1014.

The dialogue ID is an item that stores dialogue identification information for identifying a dialogue. The dialogue identification information is set to a unique value for each piece of dialogue information.
The user ID is an item that stores user identification information for identifying the user in a dialogue between the user and a customer. A plurality of user IDs may be associated with each piece of dialogue information.
The customer ID is an item that stores user identification information for identifying the customer in a dialogue between the user and the customer. The user IDs of a plurality of customers may be associated with each piece of dialogue information.
The dialogue category is an item that stores the type (category) of the dialogue between the user and the customer. Dialogue data is classified by dialogue category. Depending on the purpose of the dialogue, the dialogue category stores a value such as telephone operator, telemarketing, customer support, or technical support.
The call direction type is an item that stores information distinguishing whether the dialogue between the user and the customer was initiated by the user (outbound) or received by the user (inbound). For a dialogue among three or more users, the type "room" is stored.
The audio data item stores audio data picked up by a microphone. It may instead store reference information (a path) to an audio data file located elsewhere. The audio data may be in any data format, such as AAC, ATRAC, mp3, or mp4.
The audio data may be in a format in which identifiers are set so that the user's voice and the customer's voice can each be identified independently. In that case, the control unit 104 of the server 10 can run independent analysis processing on the user's voice and on the customer's voice. The user IDs of the user and the customer can also be identified from their audio data.
In the present disclosure, video data containing audio information may be used in place of audio data. Audio data in the present disclosure also includes audio data contained in video data.
The video data item stores video data captured by a camera or the like. It may instead store reference information (a path) to a video data file located elsewhere. The video data may be in any data format, such as MP4, MOV, WMV, AVI, or AVCHD.
The video data may be in a format in which identifiers are set so that the user's video and the customer's video can each be identified independently. In that case, the control unit 104 of the server 10 can run independent analysis processing on the user's video and on the customer's video. The user IDs of the user and the customer can also be identified from their video data.
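The three values described for the call direction type (outbound, inbound, and room for three or more participants) can be expressed as a small classification rule. The sketch below is illustrative only; the names `CallType` and `classify_call_type` and the way participants are counted are assumptions, not part of the disclosure.

```python
from enum import Enum


class CallType(Enum):
    OUTBOUND = "outbound"  # the user placed the call
    INBOUND = "inbound"    # the user received the call
    ROOM = "room"          # three or more users took part


def classify_call_type(participant_count, user_initiated):
    """Return the call direction type for one dialogue record (assumed rule)."""
    if participant_count >= 3:
        return CallType.ROOM
    return CallType.OUTBOUND if user_initiated else CallType.INBOUND
```

Checking the participant count first means a multi-party dialogue is stored as "room" regardless of who initiated it, which matches the order in which the text introduces the rule.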

The label table 1015 is a table for storing and managing information on labels (label information).
The label table 1015 has the columns dialogue ID and label data.
FIG. 9 is a diagram showing the data structure of the label table 1015.

The dialogue ID is an item that stores dialogue identification information for identifying a dialogue.
The label data item stores label information for managing dialogues. Label information is supplementary information for managing dialogue information, such as a classification name, a label, a classification label, or a tag.
The label data may be a character string giving the name of the label information, or a label ID or the like that refers to the name of label information stored in another table.
The label data includes classification information according to the speaker's emotional state in a particular dialogue. The classification data includes classification information for classifying how well or poorly the speaker handled a particular dialogue.

The speech segment table 1016 is a table for storing and managing information on the plurality of speech segments contained in dialogue information (speech segment information).
The speech segment table 1016 has the segment ID as its primary key and has the columns segment ID, dialogue ID, speaker ID, start date and time, end date and time, segment audio data, segment video data, segment transcript text, emotion data, impression data, and topic ID.
FIG. 10 is a diagram showing the data structure of the speech segment table 1016.

The segment ID is an item that stores segment identification information for identifying a speech segment. The segment identification information is set to a unique value for each piece of speech segment information.
The dialogue ID is an item that stores dialogue identification information for identifying the dialogue with which the speech segment information is associated.
The speaker ID is an item that stores speaker identification information for identifying the speaker with whom the speech segment information is associated. Specifically, the speaker ID stores the user IDs of the plurality of users who took part in the dialogue.
The start date and time is an item that stores the start date and time of the speech segment or video segment.
The end date and time is an item that stores the end date and time of the speech segment or video segment.
The segment audio data item stores the audio data contained in the speech segment. It may instead store reference information (a path) to an audio data file located elsewhere. It may also store, based on the start and end dates and times, a reference to the portion of the audio data in the dialogue table 1014 that runs from the start date and time to the end date and time. The segment audio data may include audio data contained in the segment video data.
The audio data may be in any data format, such as AAC, ATRAC, mp3, or mp4.
The segment video data item stores the video data contained in the speech segment. It may instead store reference information (a path) to a video data file located elsewhere. It may also store, based on the start and end dates and times, a reference to the portion of the video data in the dialogue table 1014 that runs from the start date and time to the end date and time.
The video data may be in any data format, such as MP4, MOV, WMV, AVI, or AVCHD.
The segment transcript text is an item that stores text information of what the speaker said in the segment audio data contained in the speech segment. Specifically, the segment transcript text may be produced from the segment audio data or segment video data either manually or by using any learning model, such as a machine learning or deep learning model.
The emotion data item stores the speaker's emotional state in the speech segment. Emotion data is a multidimensional measure (an emotion vector) over a plurality of the speaker's emotional states, such as interest/excitement, joy, surprise, anxiety, anger, disgust, contempt, fear, shame, and guilt. It expresses quantitatively, as a numerical intensity for each emotional state (dimension), what emotional state the speaker is in during the segment. An emotion scalar indicating a one-dimensional emotional intensity may also be calculated from the emotion vector and stored as the emotion data.
The impression data item stores the impression the speaker gives in the speech segment. Impression data is a multidimensional measure (a vector) over a plurality of different impressions the speaker gives: likable, dislikable, noisy, hard to hear, polite, hard to understand, timid, nervous, intimidating, violent, and sexual. It expresses quantitatively, as a numerical intensity for each impression (dimension), what impression the speaker gives during the segment.
The topic ID is an item that stores the topic identification information associated with the speech segment.
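The text mentions that an emotion scalar may be calculated from the emotion vector but does not say how. As one possible reading, the sketch below uses the Euclidean norm of the vector; this choice, and the name `emotion_scalar`, are assumptions made for illustration.

```python
import math


def emotion_scalar(emotion_vector):
    """Collapse an emotion vector (emotion name -> intensity) into one intensity.

    Assumption: the Euclidean norm is used; the source does not specify the formula.
    """
    return math.sqrt(sum(v * v for v in emotion_vector.values()))
```

Other reductions, such as taking the maximum component or a weighted sum, would fit the description equally well; the norm is used here only because it treats all dimensions symmetrically.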

The topic relevance table 1017 is a table for storing and managing information on topic relevance for each speech segment (topic relevance information).
The topic relevance table 1017 has the columns segment ID, topic ID, and relevance.
FIG. 11 is a diagram showing the data structure of the topic relevance table 1017.

The segment ID is an item that stores the segment identification information of the target speech segment.
The topic ID is an item that stores topic identification information for identifying a topic.
The relevance is an item that stores, for a speech segment contained in the dialogue information, information on its relevance to each topic identified by a topic ID. For one speech segment, it stores a numerical value indicating the degree of relevance to the topic identified by the topic ID. The larger the relevance, the stronger the relationship between the dialogue information and the topic.
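A table with segment ID, topic ID, and relevance columns makes it straightforward to pick out the segments related to a given topic. The following sketch assumes rows are held in memory and that "related" means a relevance at or above a threshold; the threshold value and the names `TopicRelevance` and `segments_for_topic` are hypothetical.

```python
from typing import NamedTuple


class TopicRelevance(NamedTuple):
    segment_id: str
    topic_id: str
    relevance: float


def segments_for_topic(rows, topic_id, threshold=0.5):
    """Return segment IDs related to topic_id, most relevant first (assumed rule)."""
    hits = [r for r in rows if r.topic_id == topic_id and r.relevance >= threshold]
    hits.sort(key=lambda r: r.relevance, reverse=True)
    return [r.segment_id for r in hits]
```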

The emotional condition master 1021 is a table for storing and managing information on emotional conditions (emotional condition information).
The emotional condition master 1021 has the columns emotional condition and label data.
FIG. 12 is a diagram showing the data structure of the emotional condition master 1021.

The emotional condition is an item that stores a condition on emotion data. Specifically, it stores conditions on quantities such as a threshold value of the emotion data, its average value, or a regression coefficient obtained by regression analysis.
The label data is an item that stores the label information associated with the emotional condition.
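One way to read the emotional condition master is as a list of rules, each pairing a condition on emotion data with a label. The sketch below covers only the average-value case and assumes a condition of the form "mean intensity of one emotion across segments is at least a threshold". The rule format, the example labels, and the name `labels_for_dialogue` are hypothetical.

```python
from statistics import mean

# Hypothetical master rows: (emotional condition, label data).
EMOTION_CONDITIONS = [
    {"emotion": "anger", "min_mean": 0.6, "label": "complaint"},
    {"emotion": "joy", "min_mean": 0.5, "label": "satisfied"},
]


def labels_for_dialogue(segment_emotions, conditions=EMOTION_CONDITIONS):
    """Return the labels whose condition holds for a dialogue's per-segment emotion data."""
    labels = []
    for cond in conditions:
        values = [e.get(cond["emotion"], 0.0) for e in segment_emotions]
        # Skip dialogues with no segments; otherwise compare the mean to the threshold.
        if values and mean(values) >= cond["min_mean"]:
            labels.append(cond["label"])
    return labels
```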

The speaker type master 1022 is a table for storing and managing information on impression conditions (impression condition information).
The speaker type master 1022 has the columns impression condition and speaker type.
FIG. 13 is a diagram showing the data structure of the speaker type master 1022.

The impression condition is an item that stores a condition on impression data. Specifically, it stores conditions on quantities such as a threshold value of the impression data, its average value, or a regression coefficient obtained by regression analysis.
The speaker type is an item that stores the speaker type associated with the impression condition. Speaker types classify the impression a speaker gives to the other party, such as pushy, reserved, dignified, friendly, proactive, or emotional.

トピックマスタ１０２３は、トピックに関する情報（トピック情報）を記憶し管理するためのテーブルである。
トピックマスタ１０２３は、トピックＩＤを主キーとして、トピックＩＤ、キーワードのカラムを有するテーブルである。
図１４は、トピックマスタ１０２３のデータ構造を示す図である。 The topic master 1023 is a table for storing and managing information on topics (topic information).
The topic master 1023 is a table having topic ID and keyword columns with topic ID as a primary key.
FIG. 14 is a diagram showing the data structure of the topic master 1023.

トピックＩＤは、トピックを識別するためのトピック識別情報を記憶する項目である。トピック識別情報は、トピック情報ごとにユニークな値が設定されている項目である。
キーワードは、トピックが関連づけられる複数のキーワードを記憶する項目である。具体的に、１のトピックに対して複数のキーワードが関連づけられる。 Topic ID is an item that stores topic identification information for identifying a topic. Topic identification information is an item in which a unique value is set for each topic information.
A keyword is an item that stores a plurality of keywords with which a topic is associated. Specifically, a plurality of keywords are associated with one topic.
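The one-to-many relation between a topic ID and its keywords can be pictured as a simple mapping. The sketch below is only an assumption about how such a table might be used: the topic IDs, the keywords, and the case-insensitive substring matching in `topics_in` are hypothetical, and the disclosure's own topic analysis processing is described elsewhere.

```python
from typing import Dict, List

# Hypothetical contents of the topic master: topic ID -> associated keywords.
TOPIC_MASTER: Dict[str, List[str]] = {
    "T001": ["price", "discount", "quote"],
    "T002": ["schedule", "delivery", "deadline"],
}


def topics_in(text: str, master: Dict[str, List[str]] = TOPIC_MASTER) -> List[str]:
    """Return the IDs of topics having at least one keyword in the text."""
    lowered = text.lower()
    return [
        topic_id
        for topic_id, keywords in master.items()
        if any(keyword in lowered for keyword in keywords)
    ]
```

For example, the text "Could you send a quote before the deadline?" contains a keyword of each topic, so both topic IDs are returned in table order.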

＜サーバ１０の制御部１０４の構成＞
サーバ１０の制御部１０４は、ユーザ登録制御部１０４１、感情解析部１０４２、印象解析部１０４３、トピック処理部１０４４、学習部１０５１を備える。制御部１０４は、記憶部１０１に記憶されたアプリケーションプログラム１０１１を実行することにより、各機能ユニットが実現される。 <Configuration of Control Unit 104 of Server 10>
The control unit 104 of the server 10 includes a user registration control unit 1041, an emotion analysis unit 1042, an impression analysis unit 1043, a topic processing unit 1044, and a learning unit 1051. The control unit 104 implements each functional unit by executing the application program 1011 stored in the storage unit 101.

ユーザ登録制御部１０４１は、本開示に係るサービスの利用を希望するユーザの情報をユーザテーブル１０１２に記憶する処理を行う。
ユーザテーブル１０１２に記憶される情報は、ユーザが任意の情報処理端末からサービス提供者が運営するウェブページなどを開き、所定の入力フォームに情報を入力しサーバ１０へ送信する。ユーザ登録制御部１０４１は、受信した情報をユーザテーブル１０１２の新しいレコードに記憶し、ユーザ登録が完了する。これにより、ユーザテーブル１０１２に記憶されたユーザはサービスを利用できるようになる。
ユーザ登録制御部１０４１によるユーザ情報のユーザテーブル１０１２への登録に先立ち、サービス提供者は所定の審査を行いユーザによるサービス利用可否を制限しても良い。
ユーザＩＤは、ユーザを識別できる任意の文字列または数字で良く、ユーザが希望する任意の文字列または数字、もしくはユーザ登録制御部１０４１が自動的に任意の文字列または数字を設定しても良い。 The user registration control unit 1041 performs processing for storing information of users who wish to use the service according to the present disclosure in the user table 1012 .
The information stored in the user table 1012 is entered by the user, who opens a web page or the like operated by the service provider from any information processing terminal, fills in a predetermined input form, and transmits the information to the server 10. The user registration control unit 1041 stores the received information in a new record of the user table 1012, which completes user registration. As a result, the user stored in the user table 1012 can use the service.
Prior to registration of the user information in the user table 1012 by the user registration control unit 1041, the service provider may conduct a predetermined screening and restrict whether the user may use the service.
The user ID may be any character string or number that can identify the user; it may be a character string or number chosen by the user, or one set automatically by the user registration control unit 1041.
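The registration flow above can be sketched as follows. This is a minimal in-memory illustration, not the disclosed implementation: the class `UserTable`, its method names, and the use of a random UUID for automatically assigned IDs are all assumptions.

```python
import uuid
from typing import Dict, Optional


class UserTable:
    """Hypothetical in-memory stand-in for the user table 1012."""

    def __init__(self) -> None:
        self._records: Dict[str, dict] = {}

    def register(self, form: dict, user_id: Optional[str] = None) -> str:
        """Store the submitted form as a new record and return its user ID.

        If the user did not choose an ID, one is generated automatically
        (a random UUID here, which is an assumption).
        """
        if user_id is None:
            user_id = uuid.uuid4().hex
        if user_id in self._records:
            raise ValueError(f"user ID already registered: {user_id}")
        self._records[user_id] = dict(form)
        return user_id

    def can_use_service(self, user_id: str) -> bool:
        """A user can use the service once a record exists in the table."""
        return user_id in self._records
```

A screening step by the service provider, if any, would sit in front of `register`; it is left out of this sketch.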

感情解析部１０４２は、感情解析処理を実行する。詳細は後述する。 Emotion analysis unit 1042 executes emotion analysis processing. Details will be described later.

印象解析部１０４３は、印象解析処理を実行する。詳細は後述する。 The impression analysis unit 1043 executes impression analysis processing. Details will be described later.

トピック処理部１０４４は、トピック定義処理、トピック解析処理、トピック提示処理を実行する。詳細は後述する。 The topic processing unit 1044 executes topic definition processing, topic analysis processing, and topic presentation processing. Details will be described later.

学習部１０５１は、学習処理を実行する。 The learning unit 1051 executes learning processing.

＜第１ユーザ端末２０の構成＞
第１ユーザ端末２０は、サービスを利用する第１ユーザが操作する情報処理装置である。第１ユーザ端末２０は、例えば、据え置き型のＰＣ（ＰｅｒｓｏｎａｌＣｏｍｐｕｔｅｒ）、ラップトップＰＣであってもよいし、スマートフォン、タブレット等の携帯端末でもよい。また、ＨＭＤ（ＨｅａｄＭｏｕｎｔＤｉｓｐｌａｙ）、腕時計型端末等のウェアラブル端末であってもよい。
第１ユーザ端末２０は、記憶部２０１、制御部２０４、入力装置２０６、出力装置２０８を備える。 <Configuration of first user terminal 20>
The first user terminal 20 is an information processing device operated by a first user who uses the service. The first user terminal 20 may be, for example, a stationary PC (Personal Computer) or a laptop PC, or a mobile terminal such as a smartphone or a tablet. It may also be a wearable terminal such as an HMD (Head Mounted Display) or a wristwatch-type terminal.
The first user terminal 20 includes a storage unit 201, a control unit 204, an input device 206, and an output device 208.

＜第１ユーザ端末２０の記憶部２０１の構成＞
第１ユーザ端末２０の記憶部２０１は、第１ユーザＩＤ２０１１、アプリケーションプログラム２０１２を備える。 <Configuration of Storage Unit 201 of First User Terminal 20>
The storage unit 201 of the first user terminal 20 has a first user ID 2011 and an application program 2012.

第１ユーザＩＤ２０１１は、第１ユーザのユーザ識別情報を記憶する。ユーザは、第１ユーザ端末２０から第１ユーザＩＤ２０１１を、音声サーバ（ＰＢＸ）６０へ送信する。音声サーバ（ＰＢＸ）６０は、第１ユーザＩＤ２０１１に基づき第１ユーザを識別し、本開示にかかるサービスを第１ユーザに対して提供する。なお、第１ユーザＩＤ２０１１には、第１ユーザ端末２０を利用しているユーザを識別するにあたり音声サーバ（ＰＢＸ）６０から一時的に付与されるセッションＩＤなどの情報を含む。 The first user ID 2011 stores the user identification information of the first user. The user transmits the first user ID 2011 from the first user terminal 20 to the voice server (PBX) 60 . The voice server (PBX) 60 identifies the first user based on the first user ID 2011 and provides the service according to the present disclosure to the first user. The first user ID 2011 includes information such as a session ID temporarily assigned by the voice server (PBX) 60 for identifying the user using the first user terminal 20 .

アプリケーションプログラム２０１２は、記憶部２０１に予め記憶されていても良いし、通信ＩＦを介してサービス提供者が運営するウェブサーバ等からダウンロードする構成としても良い。
アプリケーションプログラム２０１２は、ウェブブラウザアプリケーションなどのアプリケーションを含む。
アプリケーションプログラム２０１２は、第１ユーザ端末２０に記憶されているウェブブラウザアプリケーション上で実行されるＪａｖａＳｃｒｉｐｔ（登録商標）などのインタープリター型プログラミング言語を含む。 The application program 2012 may be stored in the storage unit 201 in advance, or may be downloaded from a web server or the like operated by the service provider via the communication IF.
The application program 2012 includes applications such as a web browser application.
The application program 2012 includes programs written in an interpreted programming language such as JavaScript (registered trademark) that run on the web browser application stored in the first user terminal 20.

＜第１ユーザ端末２０の制御部２０４の構成＞
第１ユーザ端末２０の制御部２０４は、入力制御部２０４１、出力制御部２０４２を備える。制御部２０４は、記憶部２０１に記憶されたアプリケーションプログラム２０１２を実行することにより、各機能ユニットが実現される。 <Configuration of the control unit 204 of the first user terminal 20>
The control unit 204 of the first user terminal 20 has an input control unit 2041 and an output control unit 2042. The control unit 204 implements each functional unit by executing the application program 2012 stored in the storage unit 201.

＜第１ユーザ端末２０の入力装置２０６の構成＞
第１ユーザ端末２０の入力装置２０６は、カメラ２０６１、マイク２０６２、位置情報センサ２０６３、モーションセンサ２０６４、キーボード２０６５を備える。 <Configuration of Input Device 206 of First User Terminal 20>
The input device 206 of the first user terminal 20 has a camera 2061, a microphone 2062, a position information sensor 2063, a motion sensor 2064, and a keyboard 2065.

＜第１ユーザ端末２０の出力装置２０８の構成＞
第１ユーザ端末２０の出力装置２０８は、ディスプレイ２０８１、スピーカ２０８２を備える。 <Configuration of output device 208 of first user terminal 20>
The output device 208 of the first user terminal 20 has a display 2081 and a speaker 2082.

＜第２ユーザ端末３０の構成＞
第２ユーザ端末３０は、サービスを利用する第２ユーザが操作する情報処理装置である。第２ユーザ端末３０は、例えば、スマートフォン、タブレット等の携帯端末でもよいし、据え置き型のＰＣ（ＰｅｒｓｏｎａｌＣｏｍｐｕｔｅｒ）、ラップトップＰＣであってもよい。また、ＨＭＤ（ＨｅａｄＭｏｕｎｔＤｉｓｐｌａｙ）、腕時計型端末等のウェアラブル端末であってもよい。
第２ユーザ端末３０は、記憶部３０１、制御部３０４、入力装置３０６、出力装置３０８を備える。 <Configuration of Second User Terminal 30>
The second user terminal 30 is an information processing device operated by a second user who uses the service. The second user terminal 30 may be, for example, a mobile terminal such as a smartphone or a tablet, or a stationary PC (Personal Computer) or a laptop PC. It may also be a wearable terminal such as an HMD (Head Mounted Display) or a wristwatch-type terminal.
The second user terminal 30 includes a storage unit 301, a control unit 304, an input device 306, and an output device 308.

＜第２ユーザ端末３０の記憶部３０１の構成＞
第２ユーザ端末３０の記憶部３０１は、アプリケーションプログラム３０１２、電話番号３０１３を備える。 <Configuration of Storage Unit 301 of Second User Terminal 30>
The storage unit 301 of the second user terminal 30 has an application program 3012 and a telephone number 3013.

アプリケーションプログラム３０１２は、記憶部３０１に予め記憶されていても良いし、通信ＩＦを介してサービス提供者が運営するウェブサーバ等からダウンロードする構成としても良い。
アプリケーションプログラム３０１２は、ウェブブラウザアプリケーションなどのアプリケーションを含む。
アプリケーションプログラム３０１２は、第２ユーザ端末３０に記憶されているウェブブラウザアプリケーション上で実行されるＪａｖａＳｃｒｉｐｔ（登録商標）などのインタープリター型プログラミング言語を含む。 The application program 3012 may be stored in advance in the storage unit 301, or may be downloaded from a web server or the like operated by the service provider via the communication IF.
The application program 3012 includes applications such as a web browser application.
The application program 3012 includes programs written in an interpreted programming language such as JavaScript (registered trademark) that run on the web browser application stored in the second user terminal 30.

＜第２ユーザ端末３０の制御部３０４の構成＞
第２ユーザ端末３０の制御部３０４は、入力制御部３０４１、出力制御部３０４２を備える。制御部３０４は、記憶部３０１に記憶されたアプリケーションプログラム３０１２を実行することにより、各機能ユニットが実現される。 <Configuration of the control unit 304 of the second user terminal 30>
The control unit 304 of the second user terminal 30 has an input control unit 3041 and an output control unit 3042. The control unit 304 implements each functional unit by executing the application program 3012 stored in the storage unit 301.

＜第２ユーザ端末３０の入力装置３０６の構成＞
第２ユーザ端末３０の入力装置３０６は、カメラ３０６１、マイク３０６２、位置情報センサ３０６３、モーションセンサ３０６４、タッチデバイス３０６５を備える。 <Configuration of Input Device 306 of Second User Terminal 30>
The input device 306 of the second user terminal 30 has a camera 3061, a microphone 3062, a position information sensor 3063, a motion sensor 3064, and a touch device 3065.

＜第２ユーザ端末３０の出力装置３０８の構成＞
第２ユーザ端末３０の出力装置３０８は、ディスプレイ３０８１、スピーカ３０８２を備える。 <Configuration of output device 308 of second user terminal 30>
The output device 308 of the second user terminal 30 has a display 3081 and a speaker 3082.

＜ＣＲＭシステム５０の構成＞
ＣＲＭシステム５０は、ＣＲＭ（ＣｕｓｔｏｍｅｒＲｅｌａｔｉｏｎｓｈｉｐＭａｎａｇｅｍｅｎｔ、第２ユーザ関係管理）サービスを提供する事業者（ＣＲＭ事業者）が管理、運営する情報処理装置である。ＣＲＭサービスとしては、ＳａｌｅｓＦｏｒｃｅ、ＨｕｂＳｐｏｔ、ＺｏｈｏＣＲＭ、ｋｉｎｔｏｎｅなどがある。
ＣＲＭシステム５０は、記憶部５０１、制御部５０４を備える。 <Configuration of CRM System 50>
The CRM system 50 is an information processing device managed and operated by a company (CRM company) that provides CRM (Customer Relationship Management, second user relationship management) services. CRM services include SalesForce, HubSpot, Zoho CRM, Kintone, and the like.
The CRM system 50 has a storage unit 501 and a control unit 504.

＜ＣＲＭシステム５０の記憶部５０１の構成＞
ＣＲＭシステム５０の記憶部５０１は、アプリケーションプログラム５０１１、顧客テーブル５０１２を備える。 <Configuration of Storage Unit 501 of CRM System 50>
The storage unit 501 of the CRM system 50 has an application program 5011 and a customer table 5012.

アプリケーションプログラム５０１１は、ＣＲＭシステム５０の制御部５０４を各機能ユニットとして機能させるためのプログラムである。
アプリケーションプログラム５０１１は、ウェブブラウザアプリケーションなどのアプリケーションを含む。 The application program 5011 is a program for causing the control unit 504 of the CRM system 50 to function as each functional unit.
The application program 5011 includes applications such as a web browser application.

顧客テーブル５０１２は、顧客にかかるユーザ情報（顧客情報）を記憶し管理するためのテーブルである。
顧客テーブル５０１２は、顧客ＩＤを主キーとして、顧客ＩＤ、ユーザＩＤ、氏名、電話番号、話者タイプのカラムを有するテーブルである。
図１５は、顧客テーブル５０１２のデータ構造を示す図である。 The customer table 5012 is a table for storing and managing user information (customer information) on customers.
The customer table 5012 is a table having customer ID, user ID, name, telephone number, and speaker type columns with customer ID as a primary key.
FIG. 15 is a diagram showing the data structure of the customer table 5012.

顧客ＩＤは、顧客のユーザ識別情報を記憶する項目である。ユーザ識別情報は、顧客ごとにユニークな値が設定されている項目である。
ユーザＩＤは、顧客を管理するユーザのユーザ識別情報を記憶する項目である。
氏名は、顧客の氏名を記憶する項目である。
電話番号は、顧客の電話番号を記憶する項目である。
ユーザは、ＣＲＭシステムが提供するウェブサイトにアクセスし、電話を発信したい顧客を選択し「発信」などの所定の操作を行なうことにより、第１ユーザ端末２０から顧客の電話番号に対して電話を発信できる。
話者タイプは、顧客ＩＤにより特定されるユーザの話者タイプを記憶する項目である。 The customer ID is an item that stores customer user identification information. User identification information is an item in which a unique value is set for each customer.
User ID is an item for storing user identification information of a user who manages a customer.
The name is an item for storing the customer's name.
The phone number is an item that stores the customer's phone number.
By accessing the website provided by the CRM system, selecting the customer to be called, and performing a predetermined operation such as "call", the user can place a call from the first user terminal 20 to the customer's telephone number.
The speaker type is an item that stores the speaker type of the user identified by the customer ID.
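The columns of the customer table 5012 map naturally onto a record type keyed by customer ID. The sketch below is an assumption about one possible in-memory representation; `CustomerRecord`, `CustomerTable`, and their methods are hypothetical names. The lookup by telephone number mirrors how a customer is identified from a call, and the lookup by user ID mirrors listing the customers a user manages.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CustomerRecord:
    customer_id: str                    # primary key, unique per customer
    user_id: str                        # user who manages this customer
    name: str
    phone_number: str
    speaker_type: Optional[str] = None  # e.g. "friendly"; unset until analyzed


class CustomerTable:
    """Hypothetical in-memory stand-in for the customer table 5012."""

    def __init__(self) -> None:
        self._rows: Dict[str, CustomerRecord] = {}

    def add(self, record: CustomerRecord) -> None:
        """Insert a record, enforcing uniqueness of the primary key."""
        if record.customer_id in self._rows:
            raise ValueError(f"duplicate customer ID: {record.customer_id}")
        self._rows[record.customer_id] = record

    def customers_of(self, user_id: str) -> List[CustomerRecord]:
        """List the customers managed by the given user."""
        return [r for r in self._rows.values() if r.user_id == user_id]

    def find_by_phone(self, phone_number: str) -> Optional[CustomerRecord]:
        """Return the first customer with the given telephone number, if any."""
        for record in self._rows.values():
            if record.phone_number == phone_number:
                return record
        return None
```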

＜ＣＲＭシステム５０の制御部５０４の構成＞
ＣＲＭシステム５０の制御部５０４は、ユーザ登録制御部５０４１を備える。制御部５０４は、記憶部５０１に記憶されたアプリケーションプログラム５０１１を実行することにより、各機能ユニットが実現される。 <Configuration of Control Unit 504 of CRM System 50>
The control unit 504 of the CRM system 50 has a user registration control unit 5041. The control unit 504 implements each functional unit by executing the application program 5011 stored in the storage unit 501.

ユーザ登録制御部５０４１は、本開示に係るサービスにおいて顧客情報を顧客テーブル５０１２に記憶する処理を行う。
顧客テーブル５０１２に記憶される情報は、ユーザが任意の情報処理端末からサービス提供者が運営するウェブページなどを開き、所定の入力フォームに情報を入力しＣＲＭシステム５０へ送信する。ユーザ登録制御部５０４１は、受信した情報を顧客テーブル５０１２の新しいレコードに記憶し、顧客の登録が完了する。これにより、顧客情報が顧客の管理を行うユーザのユーザＩＤと関連づけて記憶される。
顧客ＩＤは、ユーザを識別できる任意の文字列または数字で良く、ユーザが希望する任意の文字列または数字、もしくはユーザ登録制御部５０４１が自動的に任意の文字列または数字を設定しても良い。 The user registration control unit 5041 performs processing for storing customer information in the customer table 5012 in the service according to the present disclosure.
The information stored in the customer table 5012 is entered by the user, who opens a web page or the like operated by the service provider from any information processing terminal, fills in a predetermined input form, and transmits the information to the CRM system 50. The user registration control unit 5041 stores the received information in a new record of the customer table 5012, which completes customer registration. As a result, the customer information is stored in association with the user ID of the user who manages the customer.
The customer ID may be any character string or number that can identify the user; it may be a character string or number chosen by the user, or one set automatically by the user registration control unit 5041.

＜音声サーバ（ＰＢＸ）６０の構成＞
音声サーバ（ＰＢＸ）６０は、ネットワークＮと電話網Ｔとを互いに接続することで第１ユーザ端末２０と第２ユーザ端末３０との間における対話を可能とする交換機として機能する情報処理装置である。
音声サーバ（ＰＢＸ）６０は、記憶部６０１を備える。 <Configuration of voice server (PBX) 60>
The voice server (PBX) 60 is an information processing device that functions as an exchange, connecting the network N and the telephone network T to each other and thereby enabling dialogue between the first user terminal 20 and the second user terminal 30.
The voice server (PBX) 60 has a storage unit 601.

＜音声サーバ（ＰＢＸ）６０の記憶部６０１の構成＞
音声サーバ（ＰＢＸ）６０の記憶部６０１は、アプリケーションプログラム６０１１を備える。 <Configuration of storage unit 601 of voice server (PBX) 60>
The storage unit 601 of the voice server (PBX) 60 has an application program 6011.

アプリケーションプログラム６０１１は、音声サーバ（ＰＢＸ）６０の制御部６０４を各機能ユニットとして機能させるためのプログラムである。
アプリケーションプログラム６０１１は、ウェブブラウザアプリケーションなどのアプリケーションを含む。 The application program 6011 is a program for causing the control unit 604 of the voice server (PBX) 60 to function as each functional unit.
Application programs 6011 include applications such as web browser applications.

＜システム１の動作＞
以下、システム１の各処理について説明する。
図１６は、感情解析処理の動作を示すフローチャートである。
図１７は、印象解析処理の動作を示すフローチャートである。
図１８は、トピック解析処理の動作を示すフローチャートである。
図１９は、トピック提示処理の動作を示すフローチャートである。
図２０は、トピック提示処理の動作を示す画面例である。 <Operation of System 1>
Each process of the system 1 will be described below.
FIG. 16 is a flow chart showing the operation of emotion analysis processing.
FIG. 17 is a flowchart showing the operation of impression analysis processing.
FIG. 18 is a flowchart showing the operation of topic analysis processing.
FIG. 19 is a flowchart showing the operation of topic presentation processing.
FIG. 20 is a screen example showing the operation of topic presentation processing.

＜発信処理＞
発信処理は、ユーザ（第１ユーザ）から顧客（第２ユーザ）に対し発信（架電）する処理である。 <Outgoing process>
The outgoing call process is a process in which a user (first user) places a call to a customer (second user).

＜発信処理の概要＞
発信処理は、ユーザは第１ユーザ端末２０の画面に表示された複数の顧客のうち発信を希望する顧客を選択し、発信操作を行うことにより、顧客に対して発信を行なう一連の処理である。本開示においては、顧客として第２ユーザを選択する場合を一例として説明する。 <Outline of call processing>
The outgoing call process is a series of processes in which the user selects, from among a plurality of customers displayed on the screen of the first user terminal 20, the customer the user wishes to call and performs a calling operation, thereby placing a call to that customer. In the present disclosure, a case of selecting the second user as the customer will be described as an example.

＜発信処理の詳細＞
ユーザから顧客に発信する場合におけるシステム１の発信処理について説明する。 <Details of outgoing processing>
The outgoing call processing of the system 1 when a user calls a customer will be described.

ユーザが顧客に発信する場合、システム１において以下の処理が実行される。 When a user calls a customer, the system 1 performs the following processes.

ユーザは第１ユーザ端末２０を操作することにより、ウェブブラウザを起動し、ＣＲＭシステム５０が提供するＣＲＭサービスのウェブサイトへアクセスする。ユーザは、ＣＲＭサービスが提供する顧客管理画面を開くことにより自身の顧客を第１ユーザ端末２０のディスプレイ２０８１へ一覧表示できる。
具体的に、第１ユーザ端末２０は、ＣＲＭＩＤ２０１３および顧客を一覧表示する旨のリクエストをＣＲＭシステム５０へ送信する。ＣＲＭシステム５０は、リクエストを受信すると、顧客テーブル５０１２を検索し、顧客ＩＤ、氏名、電話番号、顧客属性、顧客組織名、顧客組織属性などのユーザの顧客に関する情報を第１ユーザ端末２０に送信する。第１ユーザ端末２０は、受信した顧客に関する情報を第１ユーザ端末２０のディスプレイ２０８１に表示する。 By operating the first user terminal 20 , the user activates the web browser and accesses the CRM service website provided by the CRM system 50 . The user can display a list of his/her own customers on the display 2081 of the first user terminal 20 by opening the customer management screen provided by the CRM service.
Specifically, the first user terminal 20 transmits the CRM ID 2013 and a request to list customers to the CRM system 50. Upon receiving the request, the CRM system 50 searches the customer table 5012 and transmits information about the user's customers, such as customer ID, name, telephone number, customer attributes, customer organization name, and customer organization attributes, to the first user terminal 20. The first user terminal 20 displays the received customer information on its display 2081.

ユーザは、第１ユーザ端末２０のディスプレイ２０８１に一覧表示された顧客から発信を希望する顧客（第２ユーザ）を押下し選択する。顧客が選択された状態で、第１ユーザ端末２０のディスプレイ２０８１に表示された「発信」ボタンまたは、電話番号ボタンを押下することにより、ＣＲＭシステム５０に対し電話番号を含むリクエストを送信する。リクエストを受信したＣＲＭシステム５０は、電話番号を含むリクエストをサーバ１０へ送信する。リクエストを受信したサーバ１０は、音声サーバ（ＰＢＸ）６０に対し、発信リクエストを送信する。音声サーバ（ＰＢＸ）６０は、発信リクエストを受信すると、受信した電話番号に基づき第２ユーザ端末３０に対し発信（呼出し）を行う。 The user presses and selects a customer (second user) to whom a call is desired from the customers listed on the display 2081 of the first user terminal 20 . With the customer selected, pressing the "call" button or phone number button displayed on the display 2081 of the first user terminal 20 sends a request including the phone number to the CRM system 50 . The CRM system 50 that has received the request transmits the request including the telephone number to the server 10 . The server 10 that has received the request transmits a call origination request to the voice server (PBX) 60 . Upon receiving the call request, the voice server (PBX) 60 makes a call (call) to the second user terminal 30 based on the received telephone number.

これに伴い、第１ユーザ端末２０は、スピーカ２０８２などを制御し音声サーバ（ＰＢＸ）６０により発信（呼出し）が行われている旨を示す鳴動を行う。また、第１ユーザ端末２０のディスプレイ２０８１は、音声サーバ（ＰＢＸ）６０により顧客に対して発信（呼出し）が行われている旨を示す情報を表示する。例えば、第１ユーザ端末２０のディスプレイ２０８１は、「呼出中」という文字を表示してもよい。 Along with this, the first user terminal 20 controls the speaker 2082 and the like to ring to indicate that the voice server (PBX) 60 is making a call (calling). Also, the display 2081 of the first user terminal 20 displays information indicating that the voice server (PBX) 60 is making a call to the customer. For example, the display 2081 of the first user terminal 20 may display the words "calling".

顧客は、第２ユーザ端末３０において不図示の受話器を持ち上げたり、第２ユーザ端末３０の入力装置３０６に着信時に表示される「受信」ボタンなどを押下することにより、第２ユーザ端末３０は対話可能状態となる。これに伴い、音声サーバ（ＰＢＸ）６０は、第２ユーザ端末３０による応答がなされたことを示す情報（以下、「応答イベント」と呼ぶ）を、サーバ１０、ＣＲＭシステム５０などを介して第１ユーザ端末２０に送信する。
これにより、ユーザと顧客は、それぞれ第１ユーザ端末２０、第２ユーザ端末３０を用いて対話可能状態となり、ユーザと顧客との間で対話できるようになる。具体的には、第１ユーザ端末２０のマイク２０６２により集音されたユーザの音声は、第２ユーザ端末３０のスピーカ３０８２から出力される。同様に、第２ユーザ端末３０のマイク３０６２から集音された顧客の音声は、第１ユーザ端末２０のスピーカ２０８２から出力される。 When the customer picks up the receiver (not shown) of the second user terminal 30, or presses a "receive" button or the like displayed on the input device 306 of the second user terminal 30 when the call arrives, the second user terminal 30 becomes ready for dialogue. Along with this, the voice server (PBX) 60 transmits information indicating that the second user terminal 30 has answered (hereinafter referred to as a "response event") to the first user terminal 20 via the server 10, the CRM system 50, and the like.
As a result, the user and the customer become able to talk with each other using the first user terminal 20 and the second user terminal 30, respectively. Specifically, the user's voice collected by the microphone 2062 of the first user terminal 20 is output from the speaker 3082 of the second user terminal 30. Similarly, the customer's voice collected by the microphone 3062 of the second user terminal 30 is output from the speaker 2082 of the first user terminal 20.

第１ユーザ端末２０のディスプレイ２０８１は、対話可能状態になると、応答イベントを受信し、対話が行われていることを示す情報を表示する。例えば、第１ユーザ端末２０のディスプレイ２０８１は、「応答中」という文字を表示してもよい。 When the display 2081 of the first user terminal 20 becomes ready for interaction, it receives the response event and displays information indicating that interaction is taking place. For example, the display 2081 of the first user terminal 20 may display the characters "answering".
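The outgoing call process described above relays a request along a fixed chain of devices and changes what the display 2081 shows when the response event arrives. The following sketch models only that ordering and the two displayed states. The class `OutgoingCall`, the `CallState` enum, and the string labels are hypothetical; no real telephony or network API is involved.

```python
from enum import Enum
from typing import List, Tuple


class CallState(Enum):
    IDLE = "idle"
    CALLING = "calling"      # shown while the voice server is ringing the customer
    ANSWERING = "answering"  # shown after the response event is received


class OutgoingCall:
    """Hypothetical model of one outgoing call placed from the first user terminal."""

    # Order in which the request travels, as described in the text.
    ROUTE = [
        "first user terminal 20",
        "CRM system 50",
        "server 10",
        "voice server (PBX) 60",
        "second user terminal 30",
    ]

    def __init__(self, phone_number: str) -> None:
        self.phone_number = phone_number
        self.state = CallState.IDLE
        self.hops: List[Tuple[str, str]] = []

    def place(self) -> None:
        """Record each sender/receiver hop and move to the calling state."""
        if self.state is not CallState.IDLE:
            raise RuntimeError("call already placed")
        self.hops = list(zip(self.ROUTE, self.ROUTE[1:]))
        self.state = CallState.CALLING

    def on_response_event(self) -> None:
        """Handle the response event sent when the customer answers."""
        if self.state is not CallState.CALLING:
            raise RuntimeError("no call is ringing")
        self.state = CallState.ANSWERING

    def display_text(self) -> str:
        """Text a display might show for the current state."""
        return self.state.value
```

Keeping the route as data makes it easy to see that the response event travels the same chain in the opposite direction.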

＜着信処理＞
着信処理は、ユーザが顧客から着信（受電）する処理である。 <Incoming processing>
The incoming call process is a process in which the user receives a call from a customer.

＜着信処理の概要＞
着信処理は、ユーザが第１ユーザ端末２０においてアプリケーションを立ち上げている場合に、顧客がユーザに対して発信した場合に、ユーザが着信する一連の処理である。 <Overview of Incoming Call Processing>
The incoming call process is a series of processes in which the user receives an incoming call when the customer calls the user while the user has launched an application on the first user terminal 20 .

＜着信処理の詳細＞
ユーザが顧客から着信（受電）する場合におけるシステム１の着信処理について説明する。 <Details of incoming call processing>
The incoming call processing of the system 1 when the user receives a call from a customer will be described.

ユーザが顧客から着信する場合、システム１において以下の処理が実行される。 When a user receives an incoming call from a customer, the system 1 performs the following processing.

ユーザは第１ユーザ端末２０を操作することにより、ウェブブラウザを起動し、ＣＲＭシステム５０が提供するＣＲＭサービスのウェブサイトへアクセスする。このとき、ユーザはウェブブラウザにおいて、自身のアカウントにてＣＲＭシステム５０にログインし待機しているものとする。なお、ユーザはＣＲＭシステム５０にログインしていれば良く、ＣＲＭサービスにかかる他の作業などを行っていても良い。 By operating the first user terminal 20 , the user activates the web browser and accesses the CRM service website provided by the CRM system 50 . At this time, it is assumed that the user logs in to the CRM system 50 with his own account on the web browser and waits. It is sufficient for the user to be logged in to the CRM system 50, and the user may be performing other work related to the CRM service.

顧客は、第２ユーザ端末３０を操作し、音声サーバ（ＰＢＸ）６０に割り当てられた所定の電話番号を入力し、音声サーバ（ＰＢＸ）６０に対して発信する。音声サーバ（ＰＢＸ）６０は、第２ユーザ端末３０の発信を着信イベントとして受信する。 The customer operates the second user terminal 30 , inputs a predetermined telephone number assigned to the voice server (PBX) 60 , and makes a call to the voice server (PBX) 60 . The voice server (PBX) 60 receives the outgoing call from the second user terminal 30 as an incoming call event.

音声サーバ（ＰＢＸ）６０は、サーバ１０に対し、着信イベントを送信する。具体的には、音声サーバ（ＰＢＸ）６０は、サーバ１０に対して顧客の電話番号３０１１を含む着信リクエストを送信する。サーバ１０は、ＣＲＭシステム５０を介して第１ユーザ端末２０に対して着信リクエストを送信する。
これに伴い、第１ユーザ端末２０は、スピーカ２０８２などを制御し音声サーバ（ＰＢＸ）６０により着信が行われている旨を示す鳴動を行う。第１ユーザ端末２０のディスプレイ２０８１は、音声サーバ（ＰＢＸ）６０により顧客から着信があること旨を示す情報を表示する。例えば、第１ユーザ端末２０のディスプレイ２０８１は、「着信中」という文字を表示してもよい。 A voice server (PBX) 60 sends an incoming call event to the server 10 . Specifically, the voice server (PBX) 60 transmits an incoming call request including the customer's telephone number 3011 to the server 10 . The server 10 transmits an incoming call request to the first user terminal 20 via the CRM system 50 .
Along with this, the first user terminal 20 controls the speaker 2082 and the like to ring to indicate that the voice server (PBX) 60 is receiving an incoming call. The display 2081 of the first user terminal 20 displays information indicating that the voice server (PBX) 60 has received an incoming call from a customer. For example, the display 2081 of the first user terminal 20 may display the characters "Incoming Call".

第１ユーザ端末２０は、ユーザによる応答操作を受付ける。応答操作は、例えば、第１ユーザ端末２０において不図示の受話器を持ち上げたり、第１ユーザ端末２０のディスプレイ２０８１に「電話に出る」と表示されたボタンを、ユーザがマウス２０６６を操作して押下する操作などにより実現される。
第１ユーザ端末２０は、応答操作を受付けると、音声サーバ（ＰＢＸ）６０に対し、ＣＲＭシステム５０、サーバ１０を介して応答リクエストを送信する。音声サーバ（ＰＢＸ）６０は、送信されてきた応答リクエストを受信し、音声通信を確立する。これにより、第１ユーザ端末２０は、第２ユーザ端末３０と対話可能状態となる。
第１ユーザ端末２０のディスプレイ２０８１は、対話が行われていることを示す情報を表示する。例えば、第１ユーザ端末２０のディスプレイ２０８１は、「対話中」という文字を表示してもよい。 The first user terminal 20 accepts a response operation by the user. The response operation is realized, for example, by lifting the handset (not shown) of the first user terminal 20, or by the user operating the mouse 2066 to press a button labeled "answer the call" displayed on the display 2081 of the first user terminal 20.
Upon accepting the response operation, the first user terminal 20 transmits a response request to the voice server (PBX) 60 via the CRM system 50 and the server 10. The voice server (PBX) 60 receives the transmitted response request and establishes voice communication. This allows the first user terminal 20 to enter a state in which dialogue with the second user terminal 30 is possible.
The display 2081 of the first user terminal 20 displays information indicating that a dialogue is taking place. For example, the display 2081 of the first user terminal 20 may display the text "in dialogue".

＜発信処理、着信処理の変形例＞
第１ユーザが第２ユーザとの間で対話可能状態となる方法は、発信処理、着信処理に限られず、第１ユーザと第２ユーザとの間で対話を実現するための任意の方法を用いても構わない。例えば、サーバ１０上に、第１ユーザと第２ユーザとの間で対話を行うためのルームとよばれる仮想的な対話空間を作成し、第１ユーザおよび第２ユーザが当該ルームへ第１ユーザ端末２０、第２ユーザ端末３０に記憶されたウェブブラウザまたはアプリケーションプログラムを介してアクセスすることにより対話可能状態となる方法でも構わない。この場合、音声サーバ（ＰＢＸ）５０は不要となる。
具体的には、対話の主催者となる第１ユーザが第１ユーザ端末２０の入力装置２０６を操作し、サーバ１０へ対話開催に関するリクエストを送信する。サーバ１０の制御部１０４は、リクエストを受信するとユニークなルームＩＤなどのルーム識別情報を発行し、第１ユーザ端末２０へレスポンスを送信する。第１ユーザは、受信したルーム識別情報を、対話相手である第２ユーザへメール、チャットなど任意の通信手段により送信する。第１ユーザは、第１ユーザ端末２０の入力装置２０６を操作し、ウェブブラウザなどでサーバ１０のルームに関するサービスを提供するＵＲＬへアクセスし、ルーム識別情報を入力することによりルームに入室できる。同様に、第２ユーザは第２ユーザ端末３０の入力装置３０６を操作し、ウェブブラウザなどでサーバ１０のルームに関するサービスを提供するＵＲＬへアクセスし、ルーム識別情報を入力することによりルームに入室できる。これにより、第１ユーザと第２ユーザとはルーム識別情報により関連付けられたルームとよばれる仮想的な対話空間内で、それぞれ第１ユーザ端末２０、第２ユーザ端末３０を介して対話を行うことができる。
ルーム識別情報を入力することにより、第１ユーザ、第２ユーザに加えて、他の１または複数のユーザが１つのルームに入室できる。これにより、三者以上の複数のユーザは、ルーム識別情報により関連付けられたルームとよばれる仮想的な対話空間内で、それぞれのユーザ端末を介して対話を行うことができる。 <Modified example of outgoing call processing and incoming call processing>
The method by which the first user becomes able to talk with the second user is not limited to the outgoing call process and the incoming call process; any method for realizing dialogue between the first user and the second user may be used. For example, a virtual dialogue space called a room, in which the first user and the second user hold a dialogue, may be created on the server 10, and the first user and the second user may become able to talk by accessing the room via a web browser or an application program stored in the first user terminal 20 and the second user terminal 30. In this case, the voice server (PBX) 60 becomes unnecessary.
Specifically, the first user, who is the organizer of the dialogue, operates the input device 206 of the first user terminal 20 to transmit a request for holding a dialogue to the server 10. Upon receiving the request, the control unit 104 of the server 10 issues room identification information such as a unique room ID and transmits a response to the first user terminal 20. The first user transmits the received room identification information to the second user, who is the dialogue partner, by any means of communication such as e-mail or chat. The first user can enter the room by operating the input device 206 of the first user terminal 20, accessing, with a web browser or the like, the URL at which the server 10 provides the room service, and entering the room identification information. Similarly, the second user can enter the room by operating the input device 306 of the second user terminal 30, accessing the same URL with a web browser or the like, and entering the room identification information. As a result, the first user and the second user can hold a dialogue via the first user terminal 20 and the second user terminal 30, respectively, in the virtual dialogue space, called a room, associated with the room identification information.
By inputting room identification information, in addition to the first and second users, one or more other users can enter one room. As a result, three or more users can interact via their respective user terminals in a virtual interaction space called a room associated with the room identification information.
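The room-based variant can be sketched as a small service that issues room identification information and admits any user who presents it. This is an illustrative assumption only: `RoomService` and its methods are hypothetical names, and a random UUID stands in for the unique room ID.

```python
import uuid
from typing import Dict, Set


class RoomService:
    """Hypothetical stand-in for the room handling on the server 10."""

    def __init__(self) -> None:
        self._rooms: Dict[str, Set[str]] = {}

    def create_room(self) -> str:
        """Issue unique room identification information for a new room."""
        room_id = uuid.uuid4().hex
        self._rooms[room_id] = set()
        return room_id

    def enter(self, room_id: str, user: str) -> Set[str]:
        """Admit a user who presents a valid room ID; return current members."""
        if room_id not in self._rooms:
            raise KeyError(f"unknown room: {room_id}")
        self._rooms[room_id].add(user)
        return set(self._rooms[room_id])
```

Because admission depends only on the room ID, the same call admits a third or later participant, which matches the multi-party case described above.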

＜動画対話＞
本開示におけるシステム１は、動画データを含むオンライン対話サービス（ビデオ対話サービス）を提供しても良い。例えば、第１ユーザ端末２０の制御部２０４、第２ユーザ端末３０の制御部３０４は、それぞれ、第１ユーザ端末２０のカメラ２０６１、第２ユーザ端末３０のカメラ３０６１により撮影された動画データをサーバ１０へ送信する。
サーバ１０は、受信した動画データに基づき、第１ユーザ端末２０のカメラ２０６１により撮影された動画データを第２ユーザ端末３０へ、第２ユーザ端末３０のカメラ３０６１により撮影された動画データを第１ユーザ端末２０へ送信する。第１ユーザ端末２０の制御部２０４は、受信した第２ユーザ端末３０のカメラ３０６１により撮影された動画データをディスプレイ２０８１に表示する。第２ユーザ端末３０の制御部３０４は、受信した第１ユーザ端末２０のカメラ２０６１により撮影された動画データをディスプレイ３０８１に表示する。
サーバ１０は、オンライン対話に参加している一部またはすべての複数のユーザの動画データを第１ユーザ端末２０、第２ユーザ端末３０へ送信しても良い。この場合、第１ユーザ端末２０の制御部２０４は、受信したオンライン対話に参加している一部またはすべての複数のユーザの動画データを一画面に並べて第１ユーザ端末２０のディスプレイ２０８１に表示する。これにより、オンライン対話に参加している複数のユーザの対話状況を確認できる。第２ユーザ端末３０においても同様の処理を実行しても良い。 <Video dialogue>
The system 1 in the present disclosure may provide an online dialogue service that includes moving image data (a video dialogue service). For example, the control unit 204 of the first user terminal 20 and the control unit 304 of the second user terminal 30 transmit the moving image data captured by the camera 2061 of the first user terminal 20 and by the camera 3061 of the second user terminal 30, respectively, to the server 10.
Based on the received moving image data, the server 10 transmits the moving image data captured by the camera 2061 of the first user terminal 20 to the second user terminal 30, and transmits the moving image data captured by the camera 3061 of the second user terminal 30 to the first user terminal 20. The control unit 204 of the first user terminal 20 displays the received moving image data captured by the camera 3061 of the second user terminal 30 on the display 2081. The control unit 304 of the second user terminal 30 displays the received moving image data captured by the camera 2061 of the first user terminal 20 on the display 3081.
The server 10 may transmit the moving image data of some or all of the users participating in the online dialogue to the first user terminal 20 and the second user terminal 30. In this case, the control unit 204 of the first user terminal 20 arranges the received moving image data of some or all of the participating users on one screen and displays it on the display 2081 of the first user terminal 20. This makes it possible to check the dialogue status of the users participating in the online dialogue. Similar processing may be executed in the second user terminal 30.

＜対話記憶処理＞
対話記憶処理は、ユーザと顧客との間で行われる対話に関するデータを記憶する処理である。 <Dialogue storage processing>
The dialogue storage process is a process of storing data relating to a dialogue between a user and a customer.

＜対話記憶処理の概要＞
対話記憶処理は、ユーザと顧客との間で対話が開始された場合に、対話に関するデータを対話テーブル１０１４に記憶する一連の処理である。 <Outline of dialogue storage processing>
The dialogue storage process is a series of processes for storing data related to dialogue in the dialogue table 1014 when dialogue is started between the user and the customer.

＜対話記憶処理の詳細＞
ユーザと顧客との間で対話が開始されると、音声サーバ（ＰＢＸ）６０は、ユーザと顧客との間で行われる対話に関する音声データを録音し、サーバ１０へ送信する。サーバ１０の制御部１０４は、音声データを受信すると、対話テーブル１０１４に新たなレコードを作成し、ユーザと顧客との間で行われる対話に関するデータを記憶する。具体的に、サーバ１０の制御部１０４は、ユーザＩＤ、顧客ＩＤ、対話カテゴリ、受発信種別、音声データの内容を対話テーブル１０１４の新たなレコードに記憶する。 <Details of dialogue storage processing>
When a dialogue is started between the user and the customer, the voice server (PBX) 60 records voice data relating to the dialogue between the user and the customer and transmits the data to the server 10. Upon receiving the voice data, the control unit 104 of the server 10 creates a new record in the dialogue table 1014 and stores data relating to the dialogue between the user and the customer. Specifically, the control unit 104 of the server 10 stores the user ID, the customer ID, the dialogue category, the reception/transmission type, and the contents of the voice data in the new record of the dialogue table 1014.

The control unit 104 of the server 10 acquires the first user ID 2011 of the first user from the first user terminal 20 during the outgoing call processing or the incoming call processing, and stores it in the user ID field of the new record in the dialogue table 1014.
The control unit 104 of the server 10 queries the CRM system 50 based on the telephone number during the outgoing call processing or the incoming call processing. The CRM system 50 obtains the customer ID by searching the customer table 5012 by telephone number and transmits it to the server 10. The control unit 104 of the server 10 stores the acquired customer ID in the customer ID field of the new record in the dialogue table 1014.
The control unit 104 of the server 10 stores the dialogue category value set in advance for each user or customer in the dialogue category field of the new record in the dialogue table 1014. Alternatively, the dialogue category may be stored by having the user select or enter a value for each dialogue.
The control unit 104 of the server 10 identifies whether the ongoing dialogue was originated by the user or by the customer, and stores either outbound (originated by the user) or inbound (originated by the customer) in the incoming/outgoing type field of the new record in the dialogue table 1014.

The control unit 104 of the server 10 stores the voice data received from the voice server (PBX) 60 in the voice data field of the new record in the dialogue table 1014. Alternatively, the voice data may be stored elsewhere as a voice data file, and reference information (a path) to that file may be stored after the dialogue ends. The control unit 104 of the server 10 may also be configured to store the voice data after the dialogue ends.

In the video dialogue service, the control unit 104 of the server 10 stores the video data received from the first user terminal 20 and the second user terminal 30 in the video data field of the new record in the dialogue table 1014. Alternatively, the video data may be stored elsewhere as a video data file, and reference information (a path) to that file may be stored after the dialogue ends. The control unit 104 of the server 10 may also be configured to store the video data after the dialogue ends.
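As a purely illustrative, non-limiting sketch, the record creation described above for the dialogue table 1014 might look as follows. The field names, the in-memory ID counter, and the example file path are assumptions made for illustration and do not appear in the disclosure.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional

_next_dialogue_id = count(1)  # hypothetical in-memory ID generator


@dataclass
class DialogueRecord:
    """One row shaped like the dialogue table 1014 (field names are assumed)."""
    dialogue_id: int
    user_id: str
    customer_id: str
    category: str
    direction: str  # "outbound" (originated by the user) or "inbound" (by the customer)
    voice_data_ref: Optional[str] = None  # path to a voice data file, set when known
    video_data_ref: Optional[str] = None  # path to a video data file, set when known


def create_dialogue_record(user_id, customer_id, category, originated_by_user):
    """Create a new record when a dialogue starts."""
    direction = "outbound" if originated_by_user else "inbound"
    return DialogueRecord(next(_next_dialogue_id), user_id, customer_id, category, direction)


record = create_dialogue_record("U001", "C042", "sales", originated_by_user=True)
# Hypothetical path, stored as reference information after the dialogue ends.
record.voice_data_ref = "recordings/dialogue-1.wav"
```

Storing only a reference in the record corresponds to the variant in which the media file itself is kept in another location.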

<Emotion analysis processing>
The emotion analysis process analyzes dialogue information, such as the voice and video of an online dialogue conducted by multiple users, identifies the emotional states of the users participating in the dialogue, identifies label information based on those emotional states, and stores the label information in association with the dialogue information.

<Overview of emotion analysis processing>
The emotion analysis process is a series of processes that, upon detecting an online dialogue between users, stores dialogue information relating to the dialogue, divides the voice data and video data included in the dialogue information into segment data (segment voice data, segment video data, and the like) for each utterance segment, calculates an emotion feature amount for each piece of segment data, identifies label information based on the emotion feature amounts, and stores the label information in association with the dialogue information.

<Details of emotion analysis processing>
Details of the emotion analysis process are described below.

In step S101, an online dialogue between the user and the customer is started via the outgoing call processing, incoming call processing, room, or the like already described.

In step S102, the emotion analysis unit 1042 of the server 10 executes a receiving step of receiving voice data relating to the dialogue.
Specifically, through the dialogue storage process, the first user terminal 20 transmits the first user ID 2011, the voice data collected by the microphone 2062, and the video data captured by the camera 2061 to the server 10. The control unit 104 of the server 10 stores the received first user ID 2011, voice data, and video data in the user ID, voice data, and video data fields, respectively, of a new record in the dialogue table 1014.
Similarly, the second user terminal 30 transmits the second user ID 3011, the voice data collected by the microphone 3062, and the video data captured by the camera 3061 to the server 10. The control unit 104 of the server 10 stores the received second user ID 3011, voice data, and video data in the user ID, voice data, and video data fields, respectively, of a new record in the dialogue table 1014.
At this time, a new dialogue ID is assigned and stored in the dialogue ID field of the new record in the dialogue table 1014.

In step S103, the emotion analysis unit 1042 of the server 10 executes a voice extraction step of extracting a plurality of pieces of segment voice data, one for each utterance segment, from the voice data received in the receiving step.
Specifically, the emotion analysis unit 1042 of the server 10 acquires (receives) the dialogue ID, voice data, and video data stored in the dialogue table 1014 in step S102. The emotion analysis unit 1042 of the server 10 detects the segments in which speech is present (utterance segments) in the acquired voice data and video data, and extracts the voice data and video data of each utterance segment as segment voice data and segment video data, respectively. The segment voice data and segment video data of each utterance segment are associated with the user ID of the speaker, the start date and time of the utterance segment, and the end date and time of the utterance segment.
The emotion analysis unit 1042 of the server 10 performs text recognition on the utterance content of the extracted segment voice data and segment video data, thereby transcribing the segment voice data and segment video data into segment reading text, which is character (text) data. The specific method of text recognition is not particularly limited. For example, the conversion may be performed by signal processing technology, or by machine learning or deep learning using AI (artificial intelligence).

The emotion analysis unit 1042 of the server 10 stores the dialogue ID being processed, the user ID of the speaker (the first user ID 2011 or the second user ID 3011), the start date and time, the end date and time, the segment voice data, the segment video data, and the segment reading text in the dialogue ID, speaker ID, start date and time, end date and time, segment voice data, segment video data, and segment reading text fields, respectively, of a new record in the voice segment table 1016.
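The disclosure does not limit how utterance segments are detected. As one hedged illustration only, a simple energy-threshold detector over per-frame energies could be sketched as below. The function name, the threshold, and the `min_frames` parameter are hypothetical, and a practical system would more likely use a dedicated voice activity detector.

```python
def detect_utterance_segments(frame_energies, threshold, min_frames=1):
    """Return half-open (start, end) frame ranges whose energy stays at or above threshold.

    Runs shorter than min_frames are discarded as noise.
    """
    segments = []
    start = None
    for i, energy in enumerate(frame_energies):
        if energy >= threshold:
            if start is None:
                start = i  # a new utterance segment begins
        elif start is not None:
            if i - start >= min_frames:
                segments.append((start, i))
            start = None
    # Close a segment that is still open at the end of the data.
    if start is not None and len(frame_energies) - start >= min_frames:
        segments.append((start, len(frame_energies)))
    return segments


energies = [0.0, 0.5, 0.6, 0.0, 0.0, 0.7, 0.0]
segments = detect_utterance_segments(energies, threshold=0.3)
```

Each returned range would then be converted to a start and end date and time and stored together with the speaker ID as one record of the voice segment table 1016.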

In the voice segment table 1016, the segment reading text of each utterance segment of the voice data is stored as continuous time-series data in association with its start date and time and its speaker. By checking the segment reading text stored in the voice segment table 1016, the user can review the content of the dialogue as text without listening to the voice data.

During the text recognition processing, information that is meaningless for understanding the dialogue between the user and the customer, such as fillers contained in the text, may be removed from the text before the speech recognition information is stored in the voice segment table 1016.
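The filler removal mentioned above can be illustrated with a minimal sketch. The filler list and the whitespace tokenization are assumptions made for illustration; a Japanese transcript, for example, would need a morphological tokenizer instead of `str.split`.

```python
import string

# Hypothetical filler list; a real system would tune this per language and domain.
FILLERS = {"um", "uh", "er", "hmm"}


def remove_fillers(text, fillers=FILLERS):
    """Drop whitespace-separated tokens that are fillers, ignoring case and punctuation."""
    kept = []
    for token in text.split():
        normalized = token.strip(string.punctuation).lower()
        if normalized not in fillers:
            kept.append(token)
    return " ".join(kept)


cleaned = remove_fillers("Um, I think, uh, the price is fine.")
```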

In step S104, the emotion analysis unit 1042 of the server 10 executes an emotion calculation step of calculating a plurality of emotion feature amounts, each corresponding to one of the pieces of segment voice data extracted in the voice extraction step and each relating to the emotional state of the speaker in that segment voice data. The emotion calculation step calculates the emotion feature amount as output data by applying the segment voice data extracted in the voice extraction step to a learning model as input data.
Specifically, the emotion analysis unit 1042 of the server 10 acquires the segment voice data, segment video data, and segment reading text stored in the voice segment table 1016 in S103, and applies them to the emotion evaluation model 1031 as input data. The emotion evaluation model 1031 outputs an emotion feature amount corresponding to the input data as output data.

In step S104, the emotion calculation step executes a step of calculating, for each of the pieces of segment voice data extracted in the voice extraction step, an emotion vector indicating the intensities of multidimensional emotions.
Specifically, the emotion analysis unit 1042 of the server 10 acquires the segment voice data, segment video data, and segment reading text stored in the voice segment table 1016 in S103, and applies them to the emotion evaluation model 1031 as input data. The emotion evaluation model 1031 outputs, as output data, an emotion vector that quantitatively expresses, as numerical values, the intensity of each of a plurality of emotional states (dimensions) according to the input data.

Based on the calculated emotion vectors, the emotion calculation step executes a step of calculating, for each of the pieces of segment voice data extracted in the voice extraction step, an emotion scalar indicating the intensity of a one-dimensional emotion.
The emotion analysis unit 1042 of the server 10 calculates the emotion scalar by applying principal component analysis, a learning model such as a deep learning model, a calculation on the individual components of the emotion vector, or the like to the emotion vector. For example, the emotion scalar is an index that quantitatively expresses how positive or negative the speaker's emotional state is in the voice segment information, and may be numerical data normalized to the range from +1 (positive) to -1 (negative).
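One possible instance of a "calculation on the individual components of the emotion vector" is sketched below. The dimension names, their grouping into positive and negative, and the normalization formula are all assumptions for illustration; the disclosure equally allows principal component analysis or a learned model.

```python
POSITIVE = ("joy", "calm")       # hypothetical positive dimensions
NEGATIVE = ("anger", "sadness")  # hypothetical negative dimensions


def emotion_scalar(vector):
    """Map {dimension: intensity >= 0} to a scalar in the range [-1, +1]."""
    pos = sum(vector.get(name, 0.0) for name in POSITIVE)
    neg = sum(vector.get(name, 0.0) for name in NEGATIVE)
    total = pos + neg
    if total == 0:
        return 0.0  # no emotional signal: treat the segment as neutral
    return (pos - neg) / total


scalar = emotion_scalar({"joy": 0.75, "anger": 0.25})
```

Because the difference is divided by the sum of the same non-negative intensities, the result cannot leave the range from -1 to +1.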

The emotion analysis unit 1042 of the server 10 stores the calculated emotion feature amounts, namely the emotion vector and the emotion scalar, in the emotion data field of the record being analyzed in the voice segment table 1016. The emotion data field may be configured to store only one of the emotion vector and the emotion scalar.

In step S104, the emotion analysis unit 1042 of the server 10 searches the user table 1012 for the user ID matching the speaker ID of the record being analyzed in the voice segment table 1016, and acquires the user attributes.

In step S105, the emotion analysis unit 1042 of the server 10 executes a label identification step of identifying label information for the dialogue based on the plurality of emotion feature amounts calculated in the emotion calculation step.
Specifically, the emotion analysis unit 1042 of the server 10 searches the voice segment table 1016 by dialogue ID and acquires the emotion data fields of the matching records. Based on the emotion data, the emotion analysis unit 1042 of the server 10 searches the emotion condition master 1021 for a record whose emotion condition is satisfied, and acquires the label data field of that record.
In the present disclosure, the emotion analysis unit 1042 of the server 10 may identify and acquire label data by using, as the emotion condition, the plurality of emotion feature amounts corresponding to the plurality of pieces of emotion data calculated and stored for each of the plurality of pieces of voice segment information extracted from one piece of dialogue information.

In step S105, the label identification step executes a step of identifying label information for the dialogue based on the plurality of emotion scalars calculated in the emotion calculation step.
Specifically, the emotion analysis unit 1042 of the server 10 may identify label data by using, as the emotion condition, the emotion scalars included in the plurality of pieces of emotion data calculated and stored for each of the plurality of pieces of voice segment information extracted from one piece of dialogue information.

In step S105, the label identification step executes a step of identifying label information for the dialogue based on the plurality of emotion vectors calculated in the emotion calculation step.
Specifically, the emotion analysis unit 1042 of the server 10 may identify label data by using, as the emotion condition, the emotion vectors included in the plurality of pieces of emotion data calculated and stored for each of the plurality of pieces of voice segment information extracted from one piece of dialogue information. For example, the emotion condition may be specified by a range or the like for each component of the emotion vector.
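An emotion condition expressed as a range for each vector component could be checked as in the following sketch. The disclosure does not state how the vectors of multiple segments are combined, so the component-wise averaging here is an assumption, as are the dimension names and the range values.

```python
def mean_vector(vectors):
    """Component-wise mean of a list of {dimension: intensity} dicts."""
    names = sorted({name for v in vectors for name in v})
    return {n: sum(v.get(n, 0.0) for v in vectors) / len(vectors) for n in names}


def matches_condition(vector, ranges):
    """True when every component named in ranges lies within its [low, high] range."""
    return all(low <= vector.get(name, 0.0) <= high
               for name, (low, high) in ranges.items())


segment_vectors = [{"joy": 0.5, "anger": 0.25}, {"joy": 1.0, "anger": 0.25}]
condition = {"joy": (0.5, 1.0), "anger": (0.0, 0.3)}  # hypothetical emotion condition
is_match = matches_condition(mean_vector(segment_vectors), condition)
```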

In step S105, the label identification step executes a step of identifying label information for the dialogue based on the number of emotion feature amounts that are at or above, or at or below, a predetermined threshold among the plurality of emotion feature amounts calculated in the emotion calculation step.
Specifically, assume that the emotion condition field of the emotion condition master 1021 stores a predetermined threshold and a required number of items at or above that threshold (a predetermined number). The emotion analysis unit 1042 of the server 10 compares the emotion scalar value corresponding to each of the plurality of pieces of voice segment information extracted from one piece of dialogue information with the predetermined threshold, and counts the pieces of voice segment information (emotion scalars) that are at or above the threshold. The pieces at or below the threshold may be counted instead.
When the counted number of pieces of voice segment information is greater than the predetermined number, the emotion analysis unit 1042 of the server 10 determines that the emotion condition is satisfied, and acquires and identifies the label data field associated with that emotion condition in the emotion condition master 1021.
For example, when the number of pieces of voice segment information (emotion scalars) at or above the predetermined threshold is greater than the predetermined number, label information indicating that the emotional state in the dialogue is positive is identified. Similarly, when the number of pieces of voice segment information (emotion scalars) at or below the predetermined threshold is greater than the predetermined number, label information indicating that the emotional state in the dialogue is negative is identified.
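The count-based determination described above can be sketched as follows. The function names and the example values are hypothetical; the comparison operators follow the text (at or above, or at or below, the threshold, and strictly greater than the predetermined number).

```python
def count_matching(scalars, threshold, at_or_above=True):
    """Count emotion scalars at or above (or at or below) the threshold."""
    if at_or_above:
        return sum(1 for s in scalars if s >= threshold)
    return sum(1 for s in scalars if s <= threshold)


def label_by_count(scalars, threshold, required_count, label, at_or_above=True):
    """Return label when more than required_count segments satisfy the threshold."""
    if count_matching(scalars, threshold, at_or_above) > required_count:
        return label
    return None


scalars = [0.8, 0.6, -0.2, 0.7, 0.1]  # one emotion scalar per voice segment
positive = label_by_count(scalars, threshold=0.5, required_count=2, label="positive")
```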

In step S105, the label identification step executes a step of identifying label information for the dialogue based on the proportion of emotion feature amounts that are at or above, or at or below, a predetermined threshold among the plurality of emotion feature amounts calculated in the emotion calculation step.
Specifically, assume that the emotion condition field of the emotion condition master 1021 stores a predetermined threshold and a required proportion of items at or above that threshold (a predetermined proportion). The emotion analysis unit 1042 of the server 10 compares the emotion scalar value corresponding to each of the plurality of pieces of voice segment information extracted from one piece of dialogue information with the predetermined threshold, and counts the pieces of voice segment information (emotion scalars) that are at or above the threshold. The pieces at or below the threshold may be counted instead.
When the ratio of the counted number of pieces of voice segment information to the total number of pieces of voice segment information extracted from the one piece of dialogue information is greater than the predetermined proportion, the emotion analysis unit 1042 of the server 10 determines that the emotion condition is satisfied, and acquires and identifies the label data field associated with that emotion condition in the emotion condition master 1021.
For example, when the proportion of voice segment information (emotion scalars) at or above the predetermined threshold is greater than the predetermined proportion, label information indicating that the emotional state in the dialogue is positive is identified. Similarly, when the proportion of voice segment information (emotion scalars) at or below the predetermined threshold is greater than the predetermined proportion, label information indicating that the emotional state in the dialogue is negative is identified.
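The proportion-based determination differs from the count-based one only in that the count is divided by the total number of segments. A minimal sketch, with hypothetical names and values, and with an empty dialogue treated as matching no condition (an assumption):

```python
def label_by_ratio(scalars, threshold, required_ratio, label, at_or_above=True):
    """Return label when the share of segments satisfying the threshold exceeds required_ratio."""
    if not scalars:
        return None  # no segments: no condition can be satisfied
    if at_or_above:
        matched = sum(1 for s in scalars if s >= threshold)
    else:
        matched = sum(1 for s in scalars if s <= threshold)
    if matched / len(scalars) > required_ratio:
        return label
    return None


scalars = [0.8, 0.6, -0.2, 0.7, 0.1]  # 3 of 5 segments are at or above 0.5
positive = label_by_ratio(scalars, threshold=0.5, required_ratio=0.5, label="positive")
```

A proportion makes the condition independent of dialogue length, whereas a fixed count favors long dialogues.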

Instead of the emotion scalar, a single component of the emotion vector, or an index calculated from one or more components of the emotion vector, may be treated as the emotion feature amount, and the same processing may be executed.

In step S105, the label identification step executes a step of identifying label information for the dialogue based on a statistical value of the plurality of emotion feature amounts calculated in the emotion calculation step.
Specifically, assume that the emotion condition field of the emotion condition master 1021 stores a predetermined threshold. The emotion analysis unit 1042 of the server 10 calculates a statistical value of the emotion scalar values corresponding to the plurality of pieces of voice segment information extracted from one piece of dialogue information, such as an average (for example, the mean, median, or mode), the maximum, or the minimum, and compares it with the predetermined threshold. When the statistical value is at or above the predetermined threshold, the emotion analysis unit 1042 determines that the emotion condition is satisfied, and acquires and identifies the label data field associated with that emotion condition in the emotion condition master 1021. The condition may instead be that the statistical value is at or below the predetermined threshold.
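The statistic-based determination can be sketched with Python's standard `statistics` module. The mapping keys, function name, and example values are assumptions chosen for illustration.

```python
import statistics

# Statistic names are hypothetical keys; the functions are from the standard library.
STATISTICS = {
    "mean": statistics.mean,
    "median": statistics.median,
    "mode": statistics.mode,
    "max": max,
    "min": min,
}


def label_by_statistic(scalars, statistic, threshold, label, at_or_above=True):
    """Return label when the chosen statistic of the scalars satisfies the threshold."""
    if not scalars:
        return None
    value = STATISTICS[statistic](scalars)
    matched = value >= threshold if at_or_above else value <= threshold
    return label if matched else None


scalars = [0.5, 0.25, 0.75]
positive = label_by_statistic(scalars, "mean", threshold=0.5, label="positive")
```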

In step S105, the label identification step executes a step of identifying label information for the dialogue based on time-series changes in the plurality of emotion feature amounts calculated in the emotion calculation step.
The label identification step includes a step of performing regression analysis on the time-series changes in the plurality of emotion feature amounts calculated in the emotion calculation step, and a step of identifying label information for the dialogue based on the regression coefficients obtained from the regression analysis.
Specifically, assume that the emotion condition field of the emotion condition master 1021 stores a range of regression coefficients. For each of the plurality of pieces of voice segment information associated with the target dialogue data, the X axis takes the start date and time of the voice segment information, its end date and time, or any date and time between the two, and the Y axis takes the emotion scalar value included in the emotion data of that voice segment information; a regression analysis of Y = f(X) is then performed. Any regression analysis, such as linear (first-order) regression or quadratic (second-order) regression, may be applied. The regression coefficients obtained from the regression analysis are compared with the stored range of regression coefficients; when they fall within the range, the emotion condition is determined to be satisfied, and the label data field associated with that emotion condition in the emotion condition master 1021 is acquired and identified.
For example, in the case of linear regression (first-order regression), when the intercept is negative and the slope is positive, label information indicating that the emotional state in the dialogue is improving is identified.
Instead of the emotion scalar, a single component of the emotion vector, or an index calculated from one or more components of the emotion vector, may be treated as the emotion feature amount, and the same processing may be executed.
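For the linear (first-order) case, the regression-based determination can be sketched with an ordinary least-squares fit. The use of seconds since the start of the dialogue as the X value, the function names, and the label string "improving" are assumptions; the rule applied is the example from the text (negative intercept and positive slope).

```python
def linear_fit(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def label_by_trend(start_seconds, scalars):
    """Return "improving" when the fitted line starts negative and rises."""
    slope, intercept = linear_fit(start_seconds, scalars)
    if intercept < 0 and slope > 0:
        return "improving"
    return None


start_seconds = [0, 60, 120, 180]  # segment start times, seconds from dialogue start
scalars = [-0.6, -0.2, 0.2, 0.6]   # emotion scalar of each segment
trend = label_by_trend(start_seconds, scalars)
```

A stored range of regression coefficients, as in the emotion condition master 1021, would generalize the fixed sign test used here.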

In step S105, the emotion analysis unit 1042 of the server 10 executes a step of identifying a first emotion group, which is a set of emotion feature amounts corresponding to a plurality of time-series-consecutive pieces of segment voice data extracted in the voice extraction step. The emotion analysis unit 1042 of the server 10 likewise executes a step of identifying a second emotion group, which is a set of emotion feature amounts corresponding to a plurality of time-series-consecutive pieces of segment voice data extracted in the voice extraction step.
Specifically, the emotion analysis unit 1042 of the server 10 may divide the plurality of pieces of voice segment information extracted from one piece of dialogue information into segment groups, each consisting of a plurality of pieces of voice segment information, and execute the label identification step already described on each segment group. Label information corresponding to each of the segment groups is thereby identified.
For example, the emotion analysis unit 1042 of the server 10 calculates an emotion scalar for each of the pieces of voice segment information included in a segment group and stores it in the emotion data. Label data may then be identified by using the emotion scalars included in the stored emotion data as the emotion condition.
As another example, the emotion analysis unit 1042 of the server 10 calculates an emotion vector for each of the pieces of voice segment information included in a segment group and stores it in the emotion data. Label data may then be identified by using the emotion vectors included in the stored emotion data as the emotion condition.

In step S105, the label identification step includes a step of identifying first label information for the dialogue based on the plurality of emotion feature amounts included in the first emotion group, and a step of identifying second label information for the dialogue based on the plurality of emotion feature amounts included in the second emotion group.
Specifically, the emotion analysis unit 1042 of the server 10 divides the plurality of pieces of voice segment information extracted from one piece of dialogue information into segment groups, each consisting of a plurality of pieces of voice segment information, and executes the label identification step already described on each segment group, thereby identifying label information corresponding to each of the segment groups.
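The division into segment groups and the per-group labeling can be sketched as below. Splitting into contiguous groups of near-equal size and labeling each group by the sign of its mean emotion scalar are both assumptions; any of the label identification methods described earlier could be applied to each group instead.

```python
import statistics


def split_into_groups(scalars, group_count=2):
    """Split a time-ordered list into contiguous groups of near-equal size."""
    size, extra = divmod(len(scalars), group_count)
    groups, start = [], 0
    for i in range(group_count):
        end = start + size + (1 if i < extra else 0)
        groups.append(scalars[start:end])
        start = end
    return groups


def label_group(scalars, threshold=0.0):
    """Label one group by comparing its mean emotion scalar with the threshold."""
    if not scalars:
        return None
    return "positive" if statistics.mean(scalars) >= threshold else "negative"


def label_groups(scalars, group_count=2):
    """Return one label per segment group, in time order."""
    return [label_group(g) for g in split_into_groups(scalars, group_count)]


first_label, second_label = label_groups([-0.5, -0.3, -0.4, 0.2, 0.6, 0.7])
```

Here the first and second labels correspond to the first and second emotion groups, so a dialogue that starts badly and ends well yields two different labels rather than one averaged label.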

In step S105, the emotion analysis unit 1042 of the server 10 executes a label presentation step of presenting the first label information and the second label information to the first user.
Specifically, the emotion analysis unit 1042 of the server 10 transmits the identified first label information and second label information to the first user terminal 20. The control unit 204 of the first user terminal 20 displays the received first label information and second label information on the display 2081 of the first user terminal 20, thereby presenting them to the first user. The first label information and the second label information may also be presented to any other user, such as the second user, an administrator, or another user.

In step S105, the emotion analysis unit 1042 of the server 10 executes a selection receiving step of receiving, from the first user, a selection instruction that selects at least one of the first label information and the second label information presented in the label presentation step.
Specifically, the first user operates the input device 206 or the like of the first user terminal 20 to select one of the first label information and the second label information presented on the display 2081 of the first user terminal 20. The first user may also select neither. The control unit 204 of the first user terminal 20 transmits the selected label information to the server 10. The emotion analysis unit 1042 of the server 10 identifies the received label information as the label information for the dialogue.

In step S105, the label specifying step executes a step of specifying label information for the dialogue based on the plurality of emotion feature amounts calculated in the emotion calculation step and on the user attributes of the first user or the second user who uttered the segment voice data corresponding to those emotion feature amounts.
Specifically, when specifying the label information, the emotion analysis unit 1042 of the server 10 may take into account the user attributes of the first user and the second user specified in step S104. For example, the emotion conditions in the emotion condition master 1021 may include the user attributes of the first user and the second user as conditions.

In step S105, the label specifying step executes a step of specifying label information for the dialogue based on the plurality of emotion feature amounts, calculated in the emotion calculation step, that correspond to the segment voice data of the second user's utterances, without considering the plurality of emotion feature amounts that correspond to the segment voice data of the first user's utterances.
Specifically, from among the plurality of pieces of voice segment information extracted for one piece of dialogue information, the emotion analysis unit 1042 of the server 10 may exclude the voice segment information whose speaker ID is the first user ID 2011 and execute the already-described label specifying step based only on the voice segment information whose speaker ID is the second user ID 3011.
This makes it possible to specify label information that considers only the customer's emotional state. The first user, who typically corresponds to an operator or the like, is generally interested in the customer's emotional state rather than in his or her own. With this configuration, label information that gives particular weight to the customer's emotional state can be specified.
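The speaker-based exclusion described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the `VoiceSegment` layout, the `customer_only` and `specify_label` names, the averaging rule, and the 0.5 threshold are all assumptions introduced here, standing in for the label specifying step and the emotion condition master 1021.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class VoiceSegment:
    speaker_id: str   # e.g. first user ID or second user ID
    emotion: dict     # emotion name -> strength (hypothetical layout)


def customer_only(segments, first_user_id):
    """Exclude segments uttered by the first user (the operator)."""
    return [s for s in segments if s.speaker_id != first_user_id]


def specify_label(segments, threshold=0.5):
    """Hypothetical stand-in for the label specifying step.

    Averages each emotion over the given segments and returns the
    strongest emotion if its average reaches the threshold.
    """
    if not segments:
        return None
    names = {name for s in segments for name in s.emotion}
    averages = {
        name: mean(s.emotion.get(name, 0.0) for s in segments)
        for name in names
    }
    name, value = max(averages.items(), key=lambda kv: kv[1])
    return name if value >= threshold else None


segments = [
    VoiceSegment("U1", {"anger": 0.9}),
    VoiceSegment("C1", {"joy": 0.8}),
    VoiceSegment("C1", {"joy": 0.6, "anger": 0.1}),
]
label_all = specify_label(segments)
label_customer = specify_label(customer_only(segments, "U1"))
```

In this toy data the operator's segments dilute the customer's averages, so restricting the input to the customer's segments changes the outcome, which is the effect the configuration above aims for.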

From among the plurality of pieces of voice segment information extracted for one piece of dialogue information, the emotion analysis unit 1042 of the server 10 may also exclude the voice segment information whose speaker ID is the second user ID 3011 and execute the already-described label specifying step based only on the voice segment information whose speaker ID is the first user ID 2011.

The emotion analysis unit 1042 of the server 10 may execute the already-described label specifying step separately on the voice segment information whose speaker ID is the first user ID 2011 and on the voice segment information whose speaker ID is the second user ID 3011, thereby specifying a plurality of pieces of label information, namely first label information and second label information, respectively.

In addition, from among the plurality of pieces of voice segment information extracted for one piece of dialogue information, the emotion analysis unit 1042 of the server 10 may exclude the voice segment information for which the user identified by the speaker ID is the host user, that is, the organizer of the dialogue, and execute the already-described label specifying step based only on the voice segment information for which the user identified by the speaker ID is not the host user.
This allows label information to be specified without considering the emotional state of the organizer of the dialogue. The organizer is generally interested in the emotional state of the dialogue partner rather than in his or her own. With this configuration, label information that considers the emotional state of the dialogue partner can be specified.

In step S106, the emotion analysis unit 1042 of the server 10 executes a storage step of storing the label information specified in the label specifying step in association with the dialogue.
Specifically, the emotion analysis unit 1042 of the server 10 stores the label information specified in step S105 in the label data item of the label table 1015, in association with the dialogue ID assigned in step S101.
Note that, in step S105, the specified label information may be presented to the first user, and the label information for which a selection instruction is received from the first user may be stored as the label data of the label table 1015.

In step S106, the storage step executes a step of storing the first label information or the second label information specified in the label specifying step in association with the dialogue. The storage step executes a step of storing at least one of the first label information and the second label information in association with the dialogue, based on the selection instruction received from the first user in the selection receiving step.
Specifically, the label information for which a selection instruction is received from the first user may be stored as the label data of the label table 1015.
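The storage step with an optional user selection might look like the sketch below. The dictionary used for the label table 1015, the `store_label` name, and the treatment of a missing selection (storing every specified label) are assumptions made for illustration; the disclosure does not fix these details.

```python
def store_label(label_table, dialogue_id, candidates, selection=None):
    """Store label data for a dialogue, keyed by its dialogue ID.

    candidates: labels specified in the label specifying step.
    selection:  labels chosen by the first user, or None when no
                selection step is used (assumed behaviour: store all).
    """
    if selection is None:
        chosen = list(candidates)
    else:
        # Keep only candidates that the first user actually selected.
        chosen = [label for label in candidates if label in selection]
    label_table[dialogue_id] = chosen
    return chosen


label_table = {}
stored_selected = store_label(label_table, "D1", ["calm", "angry"], ["angry"])
stored_default = store_label(label_table, "D2", ["calm"])
stored_none = store_label(label_table, "D3", ["calm", "angry"], [])
```

An empty selection models the case in which the first user selects neither label, so nothing is recorded for that dialogue.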

Further, by operating the input device 206 of the first user terminal 20, the first user can retrieve the label information stored in the label table 1015 from the server 10 and display it on the display 2081 of the first user terminal 20.

<Execution timing of the emotion analysis processing>
Steps S103 to S106 of the emotion analysis processing may be executed after the online dialogue among the plurality of users has ended. In that case, once the online dialogue has ended and its content is final, label information corresponding to the users' emotional states in the dialogue is specified and stored in association with the dialogue information.

The emotion analysis processing may also be executed after the online dialogue among the plurality of users has started and before it ends.
In other words, it may be executed at any time during the online dialogue. Steps S103 to S106 may also be executed periodically, in real time, during the online dialogue. In that case, even partway through the online dialogue, label information corresponding to the users' emotional states in the dialogue up to that point may be specified and stored in association with the dialogue information.
As a result, a user can check, in real time during the online dialogue, the emotional states of the users participating in it, and can organize and manage the dialogue information based on the latest emotional states.

<Impression analysis processing>
The impression analysis processing analyzes dialogue information, such as the voice and video of an online dialogue conducted by a plurality of users, specifies the impression state of a user participating in the dialogue, and presents the impression state and the speaker type to a user.

<Overview of the impression analysis processing>
The impression analysis processing is a series of processes that, upon detecting an online dialogue between users, stores dialogue information on the dialogue; divides the voice data and video data included in the dialogue information into segment data for each utterance segment, such as segment voice data and segment video data; calculates an impression feature amount for each piece of segment data; specifies a speaker type based on the impression feature amounts; and presents the specified speaker type to a user.

<Details of the impression analysis processing>
The details of the impression analysis processing are described below.

In step S301, an online dialogue between the user and the customer is started via the already-described outgoing call processing, incoming call processing, a room, or the like.

In step S302, the impression analysis unit 1043 of the server 10 executes a dialogue acquisition step of acquiring, from the second user, dialogue information on the dialogue with the first user.
Step S302 is the same as step S102 in the emotion analysis processing, so its description is omitted.

In step S303, the impression analysis unit 1043 of the server 10 executes a voice extraction step of extracting a plurality of pieces of segment voice data, one for each utterance segment, from the voice data of the second user received in step S302.
Step S303 is the same as step S103 in the emotion analysis processing, so its description is omitted.

In step S304, the impression analysis unit 1043 of the server 10 executes an impression calculation step of calculating, based on the dialogue information of the second user acquired in the dialogue acquisition step, an impression feature amount relating to the impression that the second user gives to other users in the dialogue. The impression calculation step executes a step of calculating, based on the dialogue information acquired from the second user in the dialogue acquisition step, an impression feature amount indicating the strength of at least one of the following impressions: likable, dislikable, loud, hard to hear, polite, hard to understand, timid, nervous, intimidating, violent, and sexual.
The impression calculation step executes a step of applying the dialogue information acquired from the second user in the dialogue acquisition step to a learning model as input data, thereby calculating, as output data, the impression feature amount relating to the impression that the second user gives to other users in the dialogue.
Specifically, the impression analysis unit 1043 of the server 10 acquires the segment voice data, segment video data, and segment reading text stored in the voice segment table 1016 in S303, excludes the voice segment information whose speaker ID is the first user ID 2011, and applies only the voice segment information whose speaker ID is the second user ID 3011 to the impression evaluation model 1032 as input data. The impression evaluation model 1032 outputs an impression feature amount corresponding to the input data as output data. The impression given by the second user can thus be evaluated by means of the impression feature amount.
Note that the input data applied to the impression evaluation model 1032 may instead be the voice segment information whose speaker ID is the first user ID 2011, with the voice segment information whose speaker ID is the second user ID 3011 excluded. In this case, the impression given by the first user can be evaluated by means of the impression feature amount.

In step S304, the impression calculation step includes a step of calculating, based on the dialogue information of the second user acquired in the dialogue acquisition step, a dialogue feature amount relating to the second user's manner of speaking in the dialogue, and a step of calculating the impression feature amount based on the calculated dialogue feature amount.
The impression calculation step includes a step of applying the dialogue information of the second user acquired in the dialogue acquisition step to a first learning model as input data, thereby calculating, as output data, the dialogue feature amount relating to the second user's manner of speaking in the dialogue, and a step of applying the calculated dialogue feature amount to a second learning model as input data, thereby calculating the impression feature amount.
The impression calculation step includes a step of calculating, based on the dialogue information of the second user acquired in the dialogue acquisition step, a dialogue feature amount relating to at least one of the following aspects of the second user's manner of speaking in the dialogue: speech rate, intonation, number of polite expressions, number of fillers, and number of grammatical utterances.

Specifically, the impression analysis unit 1043 of the server 10 acquires the segment voice data, segment video data, and segment reading text stored in the voice segment table 1016 in S303, excludes the voice segment information whose speaker ID is the first user ID 2011, and applies only the voice segment information whose speaker ID is the second user ID 3011 to the first impression evaluation model 1033 as input data. The first impression evaluation model 1033 outputs a dialogue feature amount corresponding to the input data as output data.
The impression analysis unit 1043 of the server 10 applies the dialogue feature amount to the second impression evaluation model 1034 as input data, and the second impression evaluation model 1034 outputs an impression feature amount corresponding to the input data as output data. The impression given by the second user can thus be evaluated by means of the impression feature amount.
Note that the input data applied to the impression evaluation model 1032 may instead be the voice segment information whose speaker ID is the first user ID 2011, with the voice segment information whose speaker ID is the second user ID 3011 excluded. In this case, the impression given by the first user can be evaluated by means of the impression feature amount.
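The two-stage calculation described above (dialogue information to dialogue feature amounts, then dialogue feature amounts to impression feature amounts) can be sketched as below. Both stages here are hand-written stand-ins, not the learned models 1033 and 1034: the filler list, the `(text, duration)` segment layout, the feature names, and the weights are all hypothetical values chosen only to show the data flow.

```python
FILLERS = {"um", "uh", "er"}  # hypothetical filler vocabulary

# Hypothetical linear weights standing in for the second learning model.
WEIGHTS = {
    "hard_to_understand": {"speech_rate": 0.1, "filler_count": 0.2},
}


def dialogue_features(segments):
    """Stand-in for the first model.

    segments: list of (transcribed text, duration in seconds) for one speaker.
    Returns speech rate in words per second and the number of fillers.
    """
    words = [w.lower().strip(",.") for text, _ in segments for w in text.split()]
    total_seconds = sum(duration for _, duration in segments)
    return {
        "speech_rate": len(words) / total_seconds if total_seconds else 0.0,
        "filler_count": float(sum(w in FILLERS for w in words)),
    }


def impression_features(features, weights=WEIGHTS):
    """Stand-in for the second model: a weighted sum per impression."""
    return {
        impression: sum(w * features.get(name, 0.0) for name, w in ws.items())
        for impression, ws in weights.items()
    }


second_user_segments = [("um I think so", 2.0), ("uh yes", 2.0)]
features = dialogue_features(second_user_segments)
impressions = impression_features(features)
```

Keeping the intermediate dialogue feature amounts as a separate value is what later allows the system to report which aspects of the manner of speaking drive a given impression.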

In step S304, the impression analysis unit 1043 of the server 10 executes a storage step of storing the impression feature amount calculated in the impression calculation step in association with the second user.
Specifically, the impression analysis unit 1043 of the server 10 stores the calculated impression feature amount in the impression data item of the record under analysis in the voice segment table 1016. The impression feature amount is thereby stored in association with the second user via the speaker ID (second user ID) of the voice segment table 1016. Note that the impression feature amount may instead be stored in association with the second user ID by providing the customer table 5012 of the CRM system 50 with a column (not shown) for storing impression data. The impression feature amount may also be stored in association with the second user ID by providing the user table 1012 of the server 10 with a column (not shown) for storing impression data.
Storing the impression feature amount in the customer table 5012 of the CRM system 50 allows the impression feature amount of the user specified in the dialogue concerned to be shared with, for example, members of other departments in the company. For example, work can be carried out efficiently in accordance with the impression of the dialogue partner indicated by the impression feature amount.

In step S305, the impression analysis unit 1043 of the server 10 executes a specifying step of specifying, based on the impression feature amount calculated in the impression calculation step, a speaker type that labels the impression the second user gives to other users.
Specifically, the impression analysis unit 1043 of the server 10 searches the dialogue ID item of the voice segment table 1016 using the dialogue ID and acquires the impression data item. Based on the impression data, the impression analysis unit 1043 of the server 10 searches the speaker type master 1022 for a record whose impression condition is satisfied and acquires the speaker type item of the matching record.
In the present disclosure, the impression analysis unit 1043 of the server 10 may specify and acquire the speaker type by evaluating, against the impression conditions, the impression feature amounts of the plurality of pieces of impression data calculated and stored for each of the plurality of pieces of voice segment information extracted for one piece of dialogue information.
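One way to picture the lookup against the speaker type master 1022 is the sketch below. The master records, the representation of an impression condition as a closed range per impression, the speaker type names, and the choice to average the per-segment impression feature amounts before matching are all assumptions made for illustration.

```python
from statistics import mean

# Hypothetical contents of a speaker type master: each record pairs a
# speaker type with an impression condition given as (low, high) ranges.
SPEAKER_TYPE_MASTER = [
    {"speaker_type": "courteous", "condition": {"polite": (0.7, 1.0)}},
    {"speaker_type": "overbearing", "condition": {"intimidating": (0.6, 1.0)}},
]


def specify_speaker_type(segment_impressions, master=SPEAKER_TYPE_MASTER):
    """Return the first speaker type whose impression condition is met.

    segment_impressions: one dict of impression name -> strength per
    voice segment of a single dialogue.
    """
    names = {name for imp in segment_impressions for name in imp}
    averages = {
        name: mean(imp.get(name, 0.0) for imp in segment_impressions)
        for name in names
    }
    for record in master:
        condition = record["condition"]
        if all(low <= averages.get(name, 0.0) <= high
               for name, (low, high) in condition.items()):
            return record["speaker_type"]
    return None


polite_speaker = specify_speaker_type(
    [{"polite": 0.8, "intimidating": 0.1}, {"polite": 0.9}]
)
pushy_speaker = specify_speaker_type([{"intimidating": 0.7}])
unknown_speaker = specify_speaker_type([])
```

Returning the first matching record keeps the sketch short; a real master would need a rule for dialogues that satisfy several conditions at once.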

In step S305, the impression analysis unit 1043 of the server 10 executes a storage step of storing the speaker type specified in the specifying step in association with the second user.
Specifically, the impression analysis unit 1043 of the server 10 transmits the specified speaker type and the second user ID to the CRM system 50. The control unit 504 of the CRM system 50 stores the received speaker type and second user ID in the speaker type item and the user ID item of the customer table 5012, respectively. That is, the specified speaker type is stored in association with the user ID of the user who spoke in the dialogue.
Storing the speaker type in the customer table 5012 of the CRM system 50 allows the speaker type of the user specified in the dialogue concerned to be shared with, for example, members of other departments in the company. For example, customer-facing work can be carried out efficiently in accordance with the speaker type of the dialogue partner.
In the present disclosure, the user's speaker type is stored in the customer table 5012 of the CRM system 50, but it may instead be stored in the user table 1012 of the server 10 in association with the second user.

In step S306, the impression analysis unit 1043 of the server 10 executes a presentation step of presenting, to the first user, the impression feature amount stored in association with the second user in the storage step.
Specifically, the impression analysis unit 1043 of the server 10 transmits the impression feature amount specified in step S305 to the first user terminal 20. The control unit 204 of the first user terminal 20 displays the received impression feature amount on the display 2081 of the first user terminal 20, thereby presenting it to the first user. Note that the impression feature amount may be presented to any user, such as the second user, an administrator, or another user.

In step S306, the impression analysis unit 1043 of the server 10 executes a presentation step of presenting, to the first user and prior to a dialogue between the first user and the second user, the impression feature amount stored in association with the second user in the storage step.
For example, when the first user or another user starts an online dialogue with the second user via outgoing call processing, incoming call processing, a room, or the like, the impression feature amount of the second user stored in association with the second user in step S305 may be displayed on a screen shown on the display 2081 of the first user terminal 20, such as an outgoing call screen for calling the second user, an incoming call screen for receiving a call from the second user, or a room screen shown before the dialogue starts, and thereby presented to the first user.
The first user can thus prepare, before the dialogue starts, a response suited to the impression the second user gives.

Note that the impression analysis unit 1043 of the server 10 may execute a presentation step of presenting, to the first user and prior to a dialogue between the first user and the second user, the speaker type stored in association with the second user in the storage step.
For example, when the first user or another user starts an online dialogue with the second user via outgoing call processing, incoming call processing, a room, or the like, the speaker type of the second user stored in association with the second user in step S305 may be displayed on a screen shown on the display 2081 of the first user terminal 20, such as an outgoing call screen for calling the second user, an incoming call screen for receiving a call from the second user, or a room screen shown before the dialogue starts, and thereby presented to the first user.
The first user can thus prepare, before the dialogue starts, a response suited to the second user's speaker type.

The impression analysis unit 1043 of the server 10 may execute a presentation step of presenting, to the first user and before a dialogue between the first user and the second user ends, the impression feature amount stored in association with the second user in the storage step.
For example, while the first user or another user is conducting an online dialogue with the second user, the impression feature amount of the second user stored in association with the second user in step S305 may be displayed on a dialogue screen, a room screen, or the like shown on the display 2081 of the first user terminal 20, and thereby presented to the first user. Note that the impression feature amount may be presented to any user, such as the second user, an administrator, or another user.
The first user can thus prepare, during the dialogue, a response suited to the impression the second user gives.

The impression analysis unit 1043 of the server 10 may execute a presentation step of presenting, to the first user and before a dialogue between the first user and the second user ends, the speaker type stored in association with the second user in the storage step.
For example, while the first user or another user is conducting an online dialogue with the second user, the speaker type of the second user stored in association with the second user in step S305 may be displayed on a dialogue screen, a room screen, or the like shown on the display 2081 of the first user terminal 20, and thereby presented to the first user. Note that the impression feature amount may be presented to any user, such as the second user, an administrator, or another user.
The first user can thus prepare, during the dialogue, a response suited to the second user's speaker type.

In the impression calculation step, the impression analysis unit 1043 of the server 10 may execute a presentation step of presenting, from among the plurality of dialogue feature amounts, the one or more dialogue feature amounts that have a large influence on the impression feature amount.
Specifically, when the impression analysis unit 1043 of the server 10 applies the plurality of dialogue feature amounts to the second impression evaluation model 1034 as input data and the second impression evaluation model 1034 outputs an impression feature amount corresponding to the input data as output data, the impression analysis unit 1043 may specify the one or more dialogue feature amounts that strongly influence the output impression feature amount, transmit them to the first user terminal 20, the second user terminal 30, another user terminal, or the like, and present them to the user.
For example, the second impression evaluation model 1034 may itself output, as output data, the one or more dialogue feature amounts that strongly influence the output impression feature amount. The dialogue feature amounts that strongly influence the impression feature amount can thereby be obtained quickly.
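The disclosure does not state how the degree of influence is measured. As one possible reading, if the second model were a simple linear model, the contribution of each dialogue feature amount could be taken as its weight multiplied by its value, and the features ranked by absolute contribution. The sketch below follows that assumption; the `top_influences` name, the feature names, and the weights are hypothetical.

```python
def top_influences(features, weights, k=1):
    """Rank dialogue features by absolute contribution to one impression.

    features: dialogue feature name -> value
    weights:  dialogue feature name -> weight of an assumed linear model
    Returns the names of the k features with the largest |weight * value|.
    """
    contributions = {
        name: weights.get(name, 0.0) * value
        for name, value in features.items()
    }
    ranked = sorted(contributions, key=lambda n: abs(contributions[n]),
                    reverse=True)
    return ranked[:k]


features = {"speech_rate": 1.5, "filler_count": 2.0, "intonation": 0.2}
weights = {"speech_rate": 0.1, "filler_count": 0.2, "intonation": -0.5}
most_influential = top_influences(features, weights, k=1)
two_most_influential = top_influences(features, weights, k=2)
```

For a non-linear learned model, a different attribution method would be needed; the ranking step itself would stay the same.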

<Modification of the impression analysis processing>
The impression analysis processing may be configured to specify the impression state of the first user, who is the operator, rather than that of the second user, who is the customer.
In addition, a target impression feature amount and a target speaker type that the first user wishes to convey to other users may be received, and the dialogue feature amounts that the first user should improve may be calculated and presented to the first user. That is, a step of proposing a preferable manner of speaking to the first user may be included.
In this case, steps S301 to S305 of the impression analysis processing are the same as described above except that "second user" is read as "first user", so their description is omitted.

In step S306, the impression analysis unit 1043 of the server 10 executes a target receiving step of receiving a target speaker type, that is, the speaker type the first user should aim to present to other users in a dialogue.
Specifically, by operating the input device 206 or the like of the first user terminal 20, the first user accesses a predetermined web page provided by the server 10 and selects a target speaker type from a displayed list of speaker types. The control unit 204 of the first user terminal 20 identifies the selected target speaker type and transmits it to the server 10. The server 10 receives and accepts the target speaker type. The target speaker type is the speaker type corresponding to the impression state that the first user should desirably give to other users; it may be selected by the first user, or by an administrator or the like of the first user in accordance with the first user's duties or the like.

ステップＳ３０６において、サーバ１０の印象解析部１０４３は、対話において第１ユーザが他のユーザに対して与えるべき目標となる目標印象特徴量を受け付ける目標受付ステップを実行する。
具体的に、サーバ１０の印象解析部１０４３は、受信した目標話者タイプに基づき、話者タイプマスタ１０２２の話者タイプの項目を検索し、印象条件を取得する。サーバ１０の印象解析部１０４３は、取得した印象条件に基づいて、当該印象条件の範囲に含まれる印象特徴量を目標印象特徴量として特定し、受け付ける。サーバ１０の印象解析部１０４３は、目標話者タイプを入力データとして、不図示の学習モデル等に適用することにより出力された目標印象特徴量を取得し、受け付ける構成としても良い。また、第１ユーザから、第１ユーザ端末２０の入力装置２０６などを介して目標印象特徴量を受け付ける構成としても良い。 In step S306, the impression analysis unit 1043 of the server 10 executes a target receiving step of receiving a target impression feature amount that the first user should give to other users in the dialogue.
Specifically, the impression analysis unit 1043 of the server 10 searches the speaker type item of the speaker type master 1022 based on the received target speaker type, and acquires the impression condition. Based on the acquired impression condition, the impression analysis unit 1043 of the server 10 identifies and accepts an impression feature amount that falls within the range of the impression condition as the target impression feature amount. The impression analysis unit 1043 of the server 10 may instead be configured to acquire and accept the target impression feature amount output by applying the target speaker type as input data to a learning model (not shown) or the like. Alternatively, the target impression feature amount may be accepted from the first user via the input device 206 of the first user terminal 20 or the like.

ステップＳ３０６において、サーバ１０の印象解析部１０４３は、印象算定ステップにおいて算定された印象特徴量と、目標受付ステップにおいて受け付けた目標印象特徴量とに基づき、第１ユーザが改善すべき対話特徴量を算定する改善ステップを実行する。
具体的に、サーバ１０の印象解析部１０４３は、特定した目標印象特徴量に基づいて、当該目標印象特徴量を得るための対話特徴量を目標対話特徴量として特定し、受け付ける。サーバ１０の印象解析部１０４３は、目標印象特徴量を入力データとして、不図示の学習モデル等に適用することにより目標対話特徴量を取得し、受け付ける構成としても良い。
第１ユーザが改善すべき対話特徴量としては、例えば、「話速をより速く」、「話速をより遅く」、「抑揚をより大きく」、「抑揚をより小さく」といったものである。また、第１ユーザが改善すべき対話特徴量は、目標となる対話特徴量（目標対話特徴量）としても良い。 In step S306, the impression analysis unit 1043 of the server 10 executes an improvement step of calculating the dialogue feature amount to be improved by the first user, based on the impression feature amount calculated in the impression calculation step and the target impression feature amount received in the target reception step.
Specifically, the impression analysis unit 1043 of the server 10 identifies and accepts the dialogue feature amount for obtaining the target impression feature amount as the target dialogue feature amount based on the identified target impression feature amount. The impression analysis unit 1043 of the server 10 may be configured to acquire and receive a target dialogue feature amount by applying the target impression feature amount as input data to a learning model (not shown) or the like.
Dialogue feature amounts to be improved by the first user are expressed, for example, as "speak faster", "speak more slowly", "use more intonation", or "use less intonation". The dialogue feature amount to be improved by the first user may also be given as the dialogue feature amount to aim for (the target dialogue feature amount).

サーバ１０の印象解析部１０４３は、ステップＳ３０４において算定した対話特徴量と目標対話特徴量とを比較する。サーバ１０の印象解析部１０４３は、対話特徴量の目標対話特徴量に対する差分を第１ユーザが改善すべき対話特徴量として算定する。また、サーバ１０の印象解析部１０４３は、対話特徴量と目標対話特徴量とを比較し、乖離度が大きい対話特徴量を第１ユーザが改善すべき対話特徴量として特定する。
サーバ１０の印象解析部１０４３は、第１ユーザが改善すべき対話特徴量を第１ユーザ端末２０へ送信する。第１ユーザ端末２０の制御部２０４は、受信した改善すべき対話特徴量を第１ユーザ端末２０のディスプレイ２０８１に表示し、第１ユーザに提示する。
例えば、対話における第１ユーザの話速、抑揚、丁寧な表現の数、フィラーの数および文法的な発話の数等の対話特徴量のうち、第１ユーザが改善すべき対話特徴量を特定し、話速、抑揚、丁寧な表現の数、フィラーの数等をどの程度改善すべきか第１ユーザに対して提示する。これにより、オペレータ等が、具体的に話し方を改善することにより他者に与える印象を改善できる。
なお、対話特徴量は、第２ユーザ、それ以外の他のユーザに提示しても良い。 The impression analysis unit 1043 of the server 10 compares the dialogue feature amount calculated in step S304 with the target dialogue feature amount. The impression analysis unit 1043 of the server 10 calculates the difference between the dialogue feature amount and the target dialogue feature amount as the dialogue feature amount to be improved by the first user. In addition, the impression analysis unit 1043 of the server 10 compares the dialogue feature amount and the target dialogue feature amount, and identifies the dialogue feature amount with a large divergence as the dialogue feature amount to be improved by the first user.
The impression analysis unit 1043 of the server 10 transmits to the first user terminal 20 the dialogue feature amount to be improved by the first user. The control unit 204 of the first user terminal 20 displays the received dialogue feature amount to be improved on the display 2081 of the first user terminal 20 to present it to the first user.
For example, among dialogue feature amounts such as the first user's speaking speed, intonation, number of polite expressions, number of fillers, and number of grammatical utterances in the dialogue, the dialogue feature amounts that the first user should improve are identified, and the first user is shown how much the speaking speed, intonation, number of polite expressions, number of fillers, and so on should be improved. As a result, an operator or the like can improve the impression given to others by making concrete changes to the manner of speaking.
Note that the dialogue feature amount may be presented to the second user or other users.
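
The comparison between the calculated dialogue feature amounts and the target dialogue feature amounts can be sketched as follows. This is only an illustrative sketch: the feature names, the relative-gap measure, and the `tolerance` threshold are assumptions made for the example and are not specified in the disclosure.

```python
def features_to_improve(current, target, tolerance=0.1):
    """Return {feature: signed relative gap} for features whose divergence
    from the target exceeds `tolerance`, largest divergence first.

    A negative gap means the current value should be reduced,
    a positive gap means it should be increased.
    """
    gaps = {}
    for name, goal in target.items():
        if name not in current:
            continue  # no measurement for this feature
        denom = abs(goal) if goal != 0 else 1.0
        gap = (goal - current[name]) / denom
        if abs(gap) > tolerance:
            gaps[name] = gap
    # Sort by absolute divergence so the most important item comes first.
    return dict(sorted(gaps.items(), key=lambda kv: abs(kv[1]), reverse=True))


# Hypothetical feature values for one operator (first user).
current = {"speech_rate": 6.0, "intonation": 0.40, "fillers": 12}
target = {"speech_rate": 5.0, "intonation": 0.42, "fillers": 4}
result = features_to_improve(current, target)
```

Here `result` would list the filler count first and the speaking speed second, while the intonation, which is already close to its target, is left out.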

これにより、サーバ１０の印象解析部１０４３は、印象算定ステップにおいて算定された話者タイプと、目標受付ステップにおいて受け付けた目標話者タイプとに基づき、第１ユーザが改善すべき対話特徴量を算定する改善ステップを実行できる。
つまり、ユーザは受け付けた目標話者タイプに応じて改善すべき対話特徴量を把握できるとともに、改善すべき対話特徴量に基づき話し方を改善することにより自身が他者に与える印象を目標話者タイプに近づけることができる。 As a result, the impression analysis unit 1043 of the server 10 can execute an improvement step of calculating the dialogue feature amount to be improved by the first user, based on the speaker type calculated in the impression calculation step and the target speaker type received in the target reception step.
In other words, the user can grasp the dialogue feature amount to be improved according to the received target speaker type, and, by improving the manner of speaking based on that dialogue feature amount, can bring the impression the user gives to others closer to the target speaker type.

＜トピック定義処理＞
トピック定義処理は、ユーザが、複数のキーワードと関連づけられ、所定の話題に関するトピックを登録し記憶する処理である。 <Topic definition processing>
The topic definition process is a process in which a user registers and stores a topic that is associated with a plurality of keywords and concerns a predetermined subject.

＜トピック定義処理の概要＞
ユーザは、複数の単語、名詞、形容詞等のキーワードに基づき、新たなトピックを定義し、記憶できる。また、既に記憶されたトピックに対して、過去に記憶された対話情報に基づいて、当該トピックと関連性が高いキーワードの提示を受け、当該キーワードをトピックに関連づけられたキーワードに追加し、記憶することにより、トピックに関連づけられたキーワードを拡張する一連の処理である。 <Overview of topic definition processing>
Users can define and store new topics based on keywords such as words, nouns, and adjectives. In addition, for an already stored topic, the user is presented with keywords highly related to that topic based on dialogue information stored in the past, and those keywords are added to the keywords associated with the topic and stored. The topic definition process is thus a series of processes that expands the keywords associated with a topic.

＜トピック定義処理の詳細＞
以下に、トピック定義処理の詳細を説明する。 <Details of topic definition processing>
Details of the topic definition process will be described below.

サーバ１０のトピック処理部１０４４は、音声記憶ステップにおいて記憶された音声データと、キーワード受付ステップにおいて受け付けた複数のキーワードに基づき、第１トピックに新たに関連づける１または複数の新たなキーワードを第１ユーザに対して提示するキーワード提示ステップを実行する。
具体的に、第１ユーザは、第１ユーザ端末２０の入力装置２０６などを操作することにより、アプリケーションプログラム２０１２を実行しブラウザアプリケーションを実行する。第１ユーザは、ブラウザアプリケーションにおいて、サーバ１０が提供する所定のウェブサーバを指定する所定のＵＲＬ（ＵｎｉｆｏｒｍＲｅｓｏｕｒｃｅＬｏｃａｔｏｒ）を入力することにより、サーバ１０へトピックを定義するためのページを要求するリクエストを送信する。 Based on the voice data stored in the voice storage step and the plurality of keywords accepted in the keyword acceptance step, the topic processing unit 1044 of the server 10 executes a keyword presentation step of presenting to the first user one or more new keywords to be newly associated with the first topic.
Specifically, the first user operates the input device 206 or the like of the first user terminal 20 to execute the application program 2012 and launch a browser application. In the browser application, the first user inputs a predetermined URL (Uniform Resource Locator) designating a predetermined web server provided by the server 10, thereby sending the server 10 a request for a page for defining a topic.

サーバ１０のトピック処理部１０４４は、受信したリクエストに含まれる第１ユーザＩＤ２０１１に基づき、音声区間テーブル１０１６の話者ＩＤの項目を検索し、区間読上テキストを取得する。
サーバ１０のトピック処理部１０４４は、区間読上テキストに対して形態素解析等の処理を実行することにより、区間読上テキストに含まれる名詞、形容詞、キーワード等の文字列を抽出する。このとき、対話情報、音声区間情報ごとの文字列の出現頻度等に基づき、文字列に対する重要度の算定を行っても良い。重要度の算定手法としては、ｔｆ－ｉｄｆ等がある。サーバ１０のトピック処理部１０４４は、重要度が高い所定個数の文字列をキーワード候補として特定する。 Based on the first user ID 2011 included in the received request, the topic processing unit 1044 of the server 10 searches for the speaker ID item in the speech segment table 1016 and acquires segment reading text.
The topic processing unit 1044 of the server 10 extracts character strings such as nouns, adjectives, and keywords included in the segment reading text by executing processing such as morphological analysis on the segment reading text. At this time, the degree of importance of the character string may be calculated based on the appearance frequency of the character string for each piece of dialogue information and voice section information. tf-idf and the like are available as methods for calculating the degree of importance. The topic processing unit 1044 of the server 10 identifies a predetermined number of character strings with high importance as keyword candidates.
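
As a rough illustration of ranking character strings by importance, the sketch below scores tokens with a simple tf-idf variant and returns the highest-scoring ones as keyword candidates. It assumes the segment reading text has already been tokenized by morphological analysis; the function name `keyword_candidates`, the summed-score aggregation, and the sample tokens are assumptions for the example, not part of the disclosure.

```python
import math
from collections import Counter


def keyword_candidates(segments, top_n=3):
    """Rank tokens by summed tf-idf over all segments.

    segments: list of token lists, one list per utterance segment
              (already tokenized, e.g. by morphological analysis).
    """
    n = len(segments)
    # Document frequency: in how many segments each token appears.
    df = Counter()
    for tokens in segments:
        df.update(set(tokens))

    scores = Counter()
    for tokens in segments:
        tf = Counter(tokens)
        for tok, count in tf.items():
            idf = math.log(n / df[tok])  # 0 for tokens present in every segment
            scores[tok] += (count / len(tokens)) * idf
    return [tok for tok, _ in scores.most_common(top_n)]


# Hypothetical tokenized segments from one operator's dialogues.
segments = [
    ["price", "plan", "monthly"],
    ["price", "contract", "term"],
    ["price", "plan", "discount"],
]
candidates = keyword_candidates(segments, top_n=5)
```

Because "price" occurs in every segment, its idf is zero and it is ranked below all other tokens, which matches the intent of treating ubiquitous strings as less important.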

サーバ１０のトピック処理部１０４４は、トピックマスタ１０２３からトピックＩＤ、キーワードを取得し、複数のトピックＩＤのそれぞれに関連づけられた複数のキーワードと、１または複数の対話情報または音声区間情報において共起関係にあり、トピックＩＤとは関連づけられていない文字列をキーワード候補として特定しても良い。なお、共起関係の算定にあたり、キーワード、文字列ごとの重要度を考慮しても良い。キーワード候補の特定にあたり、出現頻度等に基づき算定された重要度を考慮し、所定個数の文字列をキーワード候補として特定しても良い。 The topic processing unit 1044 of the server 10 may acquire topic IDs and keywords from the topic master 1023 and identify, as keyword candidates, character strings that co-occur in one or more pieces of dialogue information or speech segment information with the keywords associated with each topic ID but are not associated with that topic ID. In calculating the co-occurrence relationship, the importance of each keyword and character string may be considered. When identifying keyword candidates, a predetermined number of character strings may be identified as keyword candidates in consideration of the importance calculated based on appearance frequency or the like.
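
The co-occurrence approach can be sketched as follows. This is a minimal sketch under the assumption that co-occurrence simply means "appears in the same segment as at least one topic keyword"; the function name and the plain count used for ranking are illustrative choices, and a weighting by importance could be substituted.

```python
from collections import Counter


def cooccurring_candidates(segments, topic_keywords, top_n=3):
    """Return strings that share a segment with a topic keyword
    but are not yet associated with the topic, most frequent first."""
    topic_keywords = set(topic_keywords)
    counts = Counter()
    for tokens in segments:
        present = set(tokens)
        if present & topic_keywords:                 # segment mentions the topic
            counts.update(present - topic_keywords)  # count only new strings
    return [tok for tok, _ in counts.most_common(top_n)]


# Hypothetical tokenized segments and an existing topic with one keyword.
segments = [
    ["price", "discount", "campaign"],
    ["price", "discount"],
    ["weather", "campaign"],
]
candidates = cooccurring_candidates(segments, {"price"}, top_n=2)
```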

サーバ１０のトピック処理部１０４４は、特定したキーワード候補を第１ユーザ端末２０へ送信する。第１ユーザ端末２０の制御部２０４は、受信したキーワード候補を第１ユーザ端末２０のディスプレイ２０８１に表示し、第１ユーザに提示する。 The topic processing unit 1044 of the server 10 transmits the identified keyword candidates to the first user terminal 20 . The control unit 204 of the first user terminal 20 displays the received keyword candidate on the display 2081 of the first user terminal 20 and presents it to the first user.

サーバ１０のトピック処理部１０４４は、第１ユーザから１または複数のキーワードを受け付けるキーワード受付ステップを実行する。
具体的に、第１ユーザは、第１ユーザ端末２０の入力装置２０６などを操作することにより、第１ユーザ端末２０のディスプレイ２０８１に表示されたキーワード候補から新たにトピックと関連づけるためのキーワードを選択する。
第１ユーザ端末２０の制御部２０４は、第１ユーザにより選択された１または複数のキーワード候補をサーバ１０へ送信する。 The topic processing unit 1044 of the server 10 executes a keyword acceptance step of accepting one or more keywords from the first user.
Specifically, the first user operates the input device 206 or the like of the first user terminal 20 to select, from the keyword candidates displayed on the display 2081 of the first user terminal 20, the keywords to be newly associated with the topic.
The control unit 204 of the first user terminal 20 transmits the one or more keyword candidates selected by the first user to the server 10 .

キーワード受付ステップは、キーワード提示ステップにおいて第１ユーザに対して提示された複数の新たなキーワードのうち、第１ユーザにより選択された１または複数のキーワードを受け付けるステップを実行する。
具体的に、サーバ１０のトピック処理部１０４４は、第１ユーザ端末２０から１または複数のキーワード候補を受信し、受け付ける。 The keyword acceptance step executes a step of accepting one or more keywords selected by the first user from among the plurality of new keywords presented to the first user in the keyword presentation step.
Specifically, the topic processing unit 1044 of the server 10 receives and accepts one or more keyword candidates from the first user terminal 20 .

サーバ１０のトピック処理部１０４４は、キーワード受付ステップにおいて受け付けた１または複数のキーワードを、所定の話題に関する第１トピックと関連づけて記憶するトピック記憶ステップを実行する。
具体的に、サーバ１０のトピック処理部１０４４は、受け付けた複数のキーワード候補を、トピックＩＤと関連づけてトピックマスタ１０２３に記憶する。なお、第１ユーザにより選択された１または複数のキーワード候補は、既にトピックマスタ１０２３に記憶されているトピックＩＤと関連づけても良いし、新たなトピックＩＤを生成し、当該新たに生成されたトピックＩＤと関連づける構成としても良い。
既にトピックマスタ１０２３に記憶されているトピックＩＤと関連づけて記憶する場合は、第１ユーザは、第１ユーザ端末２０の入力装置２０６などを操作することにより、関連づける対象となるトピックＩＤを選択する選択操作を実行する。 The topic processing unit 1044 of the server 10 executes a topic storage step of storing one or more keywords accepted in the keyword acceptance step in association with a first topic related to a predetermined topic.
Specifically, the topic processing unit 1044 of the server 10 stores the accepted keyword candidates in the topic master 1023 in association with a topic ID. The one or more keyword candidates selected by the first user may be associated with a topic ID already stored in the topic master 1023, or a new topic ID may be generated and the keyword candidates associated with that newly generated topic ID.
When the keyword candidates are stored in association with a topic ID already stored in the topic master 1023, the first user operates the input device 206 or the like of the first user terminal 20 to perform a selection operation that selects the topic ID to be associated.
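
The two storage paths (adding to an existing topic ID or generating a new one) can be sketched with an in-memory stand-in for the topic master. The class name `TopicMaster`, the `T<number>` ID format, and the use of a dictionary instead of a database table are assumptions for illustration only.

```python
import itertools


class TopicMaster:
    """In-memory stand-in for the topic master: topic ID -> set of keywords."""

    def __init__(self):
        self._keywords = {}
        self._counter = itertools.count(1)

    def _new_id(self):
        # Generate an ID that is not already in use.
        while True:
            topic_id = f"T{next(self._counter)}"
            if topic_id not in self._keywords:
                return topic_id

    def store(self, keywords, topic_id=None):
        """Associate keywords with topic_id, or with a new ID if none is given."""
        if topic_id is None:
            topic_id = self._new_id()
        self._keywords.setdefault(topic_id, set()).update(keywords)
        return topic_id

    def keywords(self, topic_id):
        return sorted(self._keywords.get(topic_id, ()))


master = TopicMaster()
pricing = master.store(["price", "fee"])      # new topic ID is generated
master.store(["discount"], topic_id=pricing)  # keywords added to existing topic
```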

＜トピック解析処理＞
トピック解析処理は、複数のユーザにより行われたオンライン対話の音声、動画等の対話情報を解析し、対話情報と１または複数のトピックとの関連度を算定し、関連度に基づき、対話情報にトピックを関連づけ、記憶する処理である。 <Topic analysis processing>
The topic analysis process analyzes dialogue information such as audio and video of an online dialogue conducted by a plurality of users, calculates the degree of relevance between the dialogue information and one or more topics, and, based on the degree of relevance, associates topics with the dialogue information and stores them.

＜トピック解析処理の概要＞
トピック解析処理は、ユーザ間のオンライン対話を検知すると、対話に関する対話情報を記憶し、対話情報に含まれる音声データ、動画データをそれぞれ発話区間ごとの区間音声データ、区間動画データ等の区間データへ分割し、区間データごとに複数のトピックとの関連度を算定し、区間データごとのトピックを特定し、代表的なトピックを対話情報のラベル情報として記憶する一連の処理である。 <Overview of topic analysis processing>
The topic analysis process is a series of processes that, upon detecting an online dialogue between users, stores dialogue information about the dialogue, divides the audio data and video data included in the dialogue information into segment data (segment audio data, segment video data, and so on) for each utterance segment, calculates the degree of relevance to a plurality of topics for each piece of segment data, identifies the topic of each piece of segment data, and stores representative topics as label information of the dialogue information.

＜トピック解析処理の詳細＞
以下に、トピック解析処理の詳細を説明する。 <Details of topic analysis processing>
Details of the topic analysis process will be described below.

ステップＳ５１１において、既に説明した発信処理、着信処理、ルーム等を介してユーザと顧客との間でのオンライン対話が開始される。 In step S511, an online dialogue between the user and the customer is started through the already explained outgoing call processing, incoming call processing, room, and the like.

ステップＳ５１２において、サーバ１０のトピック処理部１０４４は、対話に関する音声データを受け付ける受付ステップを実行する。サーバ１０のトピック処理部１０４４は、受付ステップにおいて受け付けた音声データを記憶する音声記憶ステップを実行する。
ステップＳ５１２は、感情解析処理におけるステップＳ１０２と同様であるため説明を省略する。 In step S512, the topic processing unit 1044 of the server 10 executes a receiving step of receiving voice data regarding dialogue. The topic processing unit 1044 of the server 10 executes a voice storage step of storing the voice data received in the reception step.
Since step S512 is the same as step S102 in the emotion analysis process, the description is omitted.

ステップＳ５１３において、サーバ１０のトピック処理部１０４４は、受付ステップにおいて受け付けた音声データから、発話区間ごとに複数の区間音声データを抽出する音声抽出ステップを実行する。
ステップＳ５１３は、感情解析処理におけるステップＳ１０３と同様であるため説明を省略する。 In step S513, the topic processing unit 1044 of the server 10 executes a voice extraction step of extracting a plurality of segment voice data for each utterance segment from the voice data received in the receiving step.
Since step S513 is the same as step S103 in the emotion analysis process, the explanation is omitted.

ステップＳ５１３において、音声抽出ステップは、対話が終了する前に、受付ステップにおいて受け付けた音声データから、発話区間ごとに複数の区間音声データを抽出するステップを実行しても良い。
つまり、音声抽出ステップは、複数のユーザによるオンライン対話の対話中の任意のタイミングに実行する構成としても良い。 In step S513, the voice extracting step may execute a step of extracting a plurality of segmental voice data for each utterance segment from the voice data received in the receiving step before the dialogue ends.
In other words, the speech extraction step may be configured to be executed at arbitrary timing during online dialogue between a plurality of users.

ステップＳ５１４において、サーバ１０のトピック処理部１０４４は、複数のキーワードと関連づけられ、所定の話題に関する第１トピックを特定するトピック特定ステップを実行する。
具体的に、サーバ１０のトピック処理部１０４４は、トピックマスタ１０２３を参照して、トピック定義処理により予め登録されたトピックＩＤ、トピックＩＤに関連づけられた１または複数のキーワードを取得し、特定する。 In step S514, the topic processing unit 1044 of the server 10 performs a topic identification step of identifying a first topic associated with a plurality of keywords and related to a predetermined topic.
Specifically, the topic processing unit 1044 of the server 10 refers to the topic master 1023 to obtain and specify a topic ID registered in advance by the topic definition process and one or more keywords associated with the topic ID.

関連度算定ステップは、複数の区間音声データごとに、トピック特定ステップにおいて特定した複数のトピックごとの関連度を算定するステップを実行する。
本開示においては、主に簡単のため１の第１トピックと、第１トピックに関連づけられた１または複数のキーワードについて説明するが、トピックは１つに限られず複数のトピック（第２トピック、第３トピック・・・）に対して同様の処理を実行しても構わない。 The degree-of-relevance calculation step executes a step of calculating the degrees of relevance for each of the plurality of topics identified in the topic identification step for each of the plurality of segmental speech data.
In the present disclosure, mainly for simplicity, a single first topic and one or more keywords associated with the first topic are described; however, the number of topics is not limited to one, and the same processing may be executed for a plurality of topics (a second topic, a third topic, and so on).

ステップＳ５１４において、サーバ１０のトピック処理部１０４４は、複数の区間音声データごとに、トピック特定ステップにおいて特定した第１トピックとの関連度を示す第１関連度を算定する関連度算定ステップを実行する。
具体的に、サーバ１０のトピック処理部１０４４は、Ｓ５１３において取得した音声区間情報と、第１トピックに関連づけられたキーワードとの関連性に応じて、第１トピックとの関連度を示す第１関連度を算定する。 In step S514, the topic processing unit 1044 of the server 10 executes a degree-of-relevance calculation step of calculating a first degree of relevance indicating the degree of relevance to the first topic identified in the topic identification step for each of the plurality of segmental audio data. .
Specifically, the topic processing unit 1044 of the server 10 calculates a first degree of relevance, indicating the degree of relevance to the first topic, according to the relationship between the speech segment information acquired in S513 and the keywords associated with the first topic.

第１関連度の算定方法の一例を以下の通り説明する。サーバ１０のトピック処理部１０４４は、第１トピックに関連づけられたキーワードに基づき分散表現（埋め込み表現）として高次元ベクトル（トピックベクトル）を作成する。また、サーバ１０のトピック処理部１０４４は、複数の音声区間情報に含まれる区間読上テキストに対して形態素解析等の処理を実行することにより、区間読上テキストに含まれる名詞、形容詞、キーワード等の文字列を抽出し、抽出された文字列に基づき分散表現として高次元ベクトル（音声区間ベクトル）を作成する。なお、分散表現の作成方法としては、Ｗｏｒｄ２ｖｅｃと呼ばれる手法が知られている。サーバ１０のトピック処理部１０４４は、第１関連度を、トピックベクトルと音声区間ベクトルとのコサイン類似度を計算することにより算定する。なお、第１関連度は、ユークリッド距離、マハラノビス距離、マンハッタン距離、チェビシェフ距離、ミンコフスキー距離等、任意の多次元ベクトル間の距離を算定するアルゴリズムを適用しても構わない。
このように計算された第１関連度は、第１トピックに関連づけられた複数のキーワードと、複数の音声区間情報に含まれる文字列との全体的な類似傾向を反映したものとなる。これにより、音声区間情報に含まれる文字列が、トピックに含まれるキーワードの言い換え表現や表記の違いにより同じ意味の単語が異なる単語と判定されずに、第１トピックに含まれるキーワードと意味内容の関連性が高い音声区間情報について、より高い関連度が得られる。
本開示においては、第１トピックとの関連度を示す第１関連度の算定について説明したが、任意のトピックと、当該トピックと音声区間情報との関連度の算定も同様である。 An example of the method for calculating the first degree of association will be described below. The topic processing unit 1044 of the server 10 creates a high-dimensional vector (topic vector) as distributed representation (embedded representation) based on the keyword associated with the first topic. In addition, the topic processing unit 1044 of the server 10 performs processing such as morphological analysis on the segmented reading text included in the plurality of pieces of speech segment information, thereby identifying nouns, adjectives, keywords, etc. included in the segmented reading text. is extracted, and a high-dimensional vector (speech segment vector) is created as a distributed representation based on the extracted character string. Note that a technique called Word2vec is known as a method for creating a distributed representation. The topic processing unit 1044 of the server 10 calculates the first relevance by calculating the cosine similarity between the topic vector and the speech segment vector. For the first degree of association, any algorithm for calculating the distance between arbitrary multidimensional vectors, such as Euclidean distance, Mahalanobis distance, Manhattan distance, Chebyshev distance, Minkowski distance, etc., may be applied.
The first degree of relevance calculated in this way reflects the overall tendency of similarity between the plurality of keywords associated with the first topic and the character strings included in the speech segment information. As a result, words with the same meaning are not judged to be different words merely because a character string in the speech segment information is a paraphrase or a different notation of a keyword of the topic, and a higher degree of relevance is obtained for speech segment information whose semantic content is closely related to the keywords of the first topic.
In the present disclosure, the calculation of the first degree of relevance indicating the degree of relevance to the first topic has been described, but the calculation of the degree of relevance between an arbitrary topic and the speech section information is the same.
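
The vector-based calculation can be illustrated with a small sketch. The three-dimensional `EMBEDDINGS` table is a hypothetical stand-in for vectors that would in practice come from a trained model such as Word2vec, and averaging the word vectors to obtain the topic vector and the speech segment vector is an assumption made for this example; the disclosure does not specify how the vectors are combined.

```python
import math

# Hypothetical toy embeddings; real vectors would come from a trained model.
EMBEDDINGS = {
    "price": [0.9, 0.1, 0.0],
    "fee": [0.8, 0.2, 0.0],
    "cost": [0.85, 0.15, 0.0],
    "schedule": [0.0, 0.2, 0.9],
    "deadline": [0.1, 0.1, 0.95],
}


def mean_vector(words):
    """Average the vectors of the known words; None if no word is known."""
    vecs = [EMBEDDINGS[w] for w in words if w in EMBEDDINGS]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def relevance(topic_keywords, segment_words):
    """Cosine similarity between the topic vector and the segment vector."""
    topic_vec = mean_vector(topic_keywords)
    segment_vec = mean_vector(segment_words)
    if topic_vec is None or segment_vec is None:
        return 0.0
    return cosine(topic_vec, segment_vec)


# "cost" is not a topic keyword, yet it scores high because its vector is
# close to those of "price" and "fee"; "deadline" scores low.
high = relevance(["price", "fee"], ["cost"])
low = relevance(["price", "fee"], ["deadline"])
```

This shows the property described above: a paraphrase that is not literally a topic keyword can still yield a high degree of relevance.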

関連度算定ステップは、対話が終了する前に、複数の区間音声データに含まれる区間音声データごとに、トピック特定ステップにおいて特定した第１トピックとの関連度を示す第１関連度を算定するステップを実行しても良い。
つまり、複数のユーザによるオンライン対話の対話中の任意のタイミングに実行する構成としても良い。これにより、オンライン対話の対話途中においても、それまでの対話における音声区間情報に対して、それぞれのトピックとの関連度を算定できる。 The degree-of-relevance calculating step is a step of calculating a first degree of relevance indicating the degree of relevance to the first topic identified in the topic identifying step for each segmental audio data included in the plurality of segmental audio data before the dialogue ends. may be executed.
In other words, it may be configured to be executed at an arbitrary timing during online dialogue between a plurality of users. As a result, even in the middle of an online dialogue, it is possible to calculate the degree of relevance to each topic for speech segment information in the dialogue up to that point.

関連度算定ステップは、第１トピックに関連づけられた複数のキーワードのうち、音声抽出ステップにおいて抽出された複数の区間音声データに多く含まれるキーワードほど関連度へ与える重みが小さくなるようにし、複数の区間音声データごとに第１トピックに関連づけられた複数のキーワードの重み付けを考慮した一致度を、第１トピックとの関連度を示す第１関連度として算定しても良い。
具体的に、関連度算定の際に第１トピックに関連づけられた複数のキーワードごとの重要性について、異なる重み付けを行っても良い。例えば、１の対話情報に対して抽出された複数の音声区間情報に対して、多くの音声区間情報に頻出するキーワードの、関連度へ与える影響度合いが小さくなるように、重要性、重みを他のキーワードに比べて小さい値としても良い。これにより、多くの音声区間情報に頻出するありふれたキーワードに関連付いたトピックとの関連度が過大に評価されることを防止できる。
本開示においては、第１トピックとの関連度を示す第１関連度の算定について説明したが、任意のトピックと、当該トピックと音声区間情報との関連度の算定も同様としても良い。 In the degree-of-relevance calculation step, among the plurality of keywords associated with the first topic, the more keywords included in the plurality of segmental speech data extracted in the speech extraction step, the smaller the weight given to the degree of relevance. A degree of matching that takes into consideration the weighting of a plurality of keywords associated with the first topic for each piece of segmental audio data may be calculated as the first degree of relevance indicating the degree of relevance with the first topic.
Specifically, when calculating the degree of relevance, different weights may be applied to the importance of each of the plurality of keywords associated with the first topic. For example, among the plurality of pieces of speech segment information extracted from one piece of dialogue information, a keyword that appears frequently in many pieces of speech segment information may be given a smaller importance or weight than other keywords, so that its influence on the degree of relevance is reduced. This prevents overestimating the degree of relevance to topics associated with commonplace keywords that appear frequently in many pieces of speech segment information.
In the present disclosure, the calculation of the first degree of relevance, indicating the degree of relevance to the first topic, has been described; the degree of relevance between any topic and the speech segment information may be calculated in the same way.
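
One way to realize such weighting is sketched below. The weight formula `1 / (1 + segment count)` and the normalized weighted match are assumptions chosen for simplicity; any scheme that lowers the weight of keywords appearing in many segments would serve the same purpose.

```python
def keyword_weights(segments, topic_keywords):
    """Give a smaller weight to keywords that appear in more segments."""
    segment_sets = [set(seg) for seg in segments]
    return {
        kw: 1.0 / (1 + sum(kw in seg for seg in segment_sets))
        for kw in topic_keywords
    }


def weighted_match(segment, weights):
    """Weighted share of topic keywords found in the segment (0.0 to 1.0)."""
    total = sum(weights.values())
    if total == 0:
        return 0.0
    present = set(segment)
    return sum(w for kw, w in weights.items() if kw in present) / total


# Hypothetical dialogue in which "price" is mentioned in every segment.
segments = [["price", "hello"], ["price", "contract"], ["price", "thanks"]]
weights = keyword_weights(segments, ["price", "contract"])
scores = [weighted_match(seg, weights) for seg in segments]
```

In this example the ubiquitous keyword "price" contributes less than "contract", so the second segment scores clearly higher than the others.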

関連度算定ステップは、第１トピックに関連づけられた複数のキーワードのうち、第１関連度の算定対象となる対象区間音声データから時系列的に所定個数前までの複数の区間音声データに多く含まれるキーワードほど関連度へ与える重みが小さくなるようにし、複数の区間音声データごとに第１トピックに関連づけられた複数のキーワードとの重み付けを考慮した一致度を、第１トピックとの関連度を示す第１関連度として算定しても良い。
例えば、１の対話情報に対して抽出された複数の音声区間情報のすべてではなく、算定対象となる対象区間音声情報から時系列的に所定個数前までの複数の音声区間情報に対して、多くの音声区間情報に頻出するキーワードの、関連度へ与える影響度合いが小さくなるように、重要性、重みを他のキーワードに比べて小さい値としても良い。これにより、対話が終了する前の対話中の任意のタイミングにおいても、直近の音声区間情報とトピックとの関連度をより正確に算定できる。
本開示においては、第１トピックとの関連度を示す第１関連度の算定について説明したが、任意のトピックと、当該トピックと音声区間情報との関連度の算定も同様としても良い。 In the degree-of-relevance calculation step, among the plurality of keywords associated with the first topic, a keyword that appears more often in the plurality of segmental audio data extending back a predetermined number in chronological order from the target segmental audio data, for which the first degree of relevance is to be calculated, may be given a smaller weight on the degree of relevance, and, for each of the plurality of segmental audio data, a degree of matching with the plurality of keywords associated with the first topic that takes this weighting into account may be calculated as the first degree of relevance indicating the degree of relevance to the first topic.
For example, the importance or weight of a keyword may be set smaller than that of other keywords, so that its influence on the degree of relevance is reduced, when the keyword appears frequently not across all of the pieces of speech segment information extracted from one piece of dialogue information but across the pieces of speech segment information extending back a predetermined number in chronological order from the target speech segment information. This makes it possible to calculate the degree of relevance between the most recent speech segment information and a topic more accurately, even at an arbitrary timing during the dialogue before it ends.
In the present disclosure, the calculation of the first degree of relevance, indicating the degree of relevance to the first topic, has been described; the degree of relevance between any topic and the speech segment information may be calculated in the same way.
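
The windowed variant can be sketched by restricting the frequency count to recent segments. In this sketch the window is assumed to cover the `window` segments immediately preceding the target segment and to exclude the target itself; the disclosure does not state whether the target is included, and the weight formula is likewise an assumption.

```python
def windowed_weights(segments, topic_keywords, index, window=3):
    """Weights for the segment at `index`, based only on the `window`
    segments immediately before it (the target itself is excluded)."""
    start = max(0, index - window)
    recent = [set(seg) for seg in segments[start:index]]
    return {
        kw: 1.0 / (1 + sum(kw in seg for seg in recent))
        for kw in topic_keywords
    }


# Hypothetical in-progress dialogue; the last segment is the target.
segments = [["price"], ["price"], ["greeting"], ["price", "contract"]]
weights = windowed_weights(segments, ["price", "contract"], index=3, window=2)
```

Because only earlier segments are needed, the weights can be computed while the dialogue is still in progress.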

サーバ１０のトピック処理部１０４４は、１の対話情報に対して抽出された複数の音声区間情報に対して、複数のトピックごとに算定された関連度を、音声区間情報を特定する区間ＩＤ、トピックを特定するトピックＩＤ、算定された関連度を、それぞれ、トピック関連度テーブル１０１７の新たなレコードの区間ＩＤ、トピックＩＤ、関連度の項目に記憶する。 For the plurality of pieces of speech segment information extracted from one piece of dialogue information, the topic processing unit 1044 of the server 10 stores the degree of relevance calculated for each of the plurality of topics: the segment ID identifying the speech segment information, the topic ID identifying the topic, and the calculated degree of relevance are stored in the segment ID, topic ID, and degree of relevance items, respectively, of a new record in the topic relevance table 1017.

ステップＳ５１５において、それぞれの音声区間情報において所定値以上の関連度を有する１または複数のトピックのうち、もっとも関連度が高いトピックを音声区間情報が言及している所定の話題に関するトピックとして特定する。なお、トピックは必ずしも特定される必要はない。サーバ１０のトピック処理部１０４４は、特定したトピックのトピックＩＤを、音声区間テーブル１０１６において関連度の算定対象となる音声区間情報の区間ＩＤにより特定されるレコードのトピックＩＤの項目に記憶する。これにより、音声区間情報が、関連度が高いトピックと関連づけて記憶される。 In step S515, among one or a plurality of topics having a degree of relevance equal to or greater than a predetermined value in each speech segment information, the topic with the highest degree of relevance is specified as a topic related to the predetermined topic referred to by the speech segment information. Note that the topic does not necessarily have to be specified. The topic processing unit 1044 of the server 10 stores the topic ID of the identified topic in the topic ID field of the record identified by the section ID of the speech section information for which the degree of association is to be calculated in the speech section table 1016 . As a result, the speech section information is stored in association with the topic with a high degree of relevance.
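
The selection in step S515 can be sketched as a threshold filter followed by a maximum. The function name `pick_topic` and the threshold value of 0.5 are assumptions for illustration; returning `None` corresponds to the case in which no topic is identified for the segment.

```python
def pick_topic(relevance_by_topic, threshold=0.5):
    """Return the topic ID with the highest relevance among those at or
    above the threshold, or None if no topic qualifies."""
    eligible = {t: r for t, r in relevance_by_topic.items() if r >= threshold}
    if not eligible:
        return None
    return max(eligible, key=eligible.get)


# Hypothetical relevance values for one speech segment.
topic_id = pick_topic({"T1": 0.82, "T2": 0.64, "T3": 0.10})
no_topic = pick_topic({"T1": 0.20, "T2": 0.05})
```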

ステップＳ５１６において、サーバ１０のトピック処理部１０４４は、関連度算定ステップにおいて算定された複数のトピックごとの関連度に基づき、対話に対するラベル情報を特定するラベル特定ステップを実行する。サーバ１０のトピック処理部１０４４は、ラベル特定ステップにおいて特定されたラベル情報を、対話と関連づけて記憶する記憶ステップを実行する。
具体的に、サーバ１０のトピック処理部１０４４は、ステップＳ５１５において、１の対話情報に対して抽出された複数の音声区間情報のそれぞれに対して記憶されたトピックＩＤを集計し、集計されたトピックＩＤが多い順番に１または複数のトピックＩＤを、当該１の対話情報を特徴付けるトピックとして特定する。なお、集計されたトピックＩＤの個数が所定数以上の１または複数のトピックＩＤを、当該１の対話情報を特徴付けるトピックとして特定しても良い。
サーバ１０のトピック処理部１０４４は、当該特定したトピックＩＤのトピック名、ラベル等のトピックの名称をラベル情報として特定する。なお、不図示のテーブル等を参照して、特定したトピックＩＤに基づき、任意のラベル情報を特定する構成としても良い。
特定したラベル情報、当該１の対話情報の対話ＩＤを、ラベルテーブル１０１５の新たなレコードのラベルデータ、対話ＩＤの項目に記憶する。これにより、対話情報と、対話情報を特徴付けるトピックがラベル情報として関連づけられ記憶され、対話情報を検索する際などに利便性よく利用できる。 In step S516, the topic processing unit 1044 of the server 10 executes a label identification step of identifying label information for the dialogue based on the degree of relevance for each of the plurality of topics calculated in the degree of relevance calculation step. The topic processing unit 1044 of the server 10 executes a storage step of storing the label information identified in the label identification step in association with the dialogue.
Specifically, the topic processing unit 1044 of the server 10 tallies the topic IDs stored in step S515 for each of the plurality of pieces of speech segment information extracted from one piece of dialogue information, and identifies one or more topic IDs, in descending order of tally, as the topics that characterize that piece of dialogue information. Alternatively, one or more topic IDs whose tally is equal to or greater than a predetermined number may be identified as the topics that characterize that piece of dialogue information.
The topic processing unit 1044 of the server 10 identifies the name of the topic, such as the topic name or label of the identified topic ID, as label information. Alternatively, arbitrary label information may be identified based on the identified topic ID by referring to a table (not shown) or the like.
The specified label information and the dialogue ID of the one piece of dialogue information are stored in the label data and dialogue ID fields of a new record in the label table 1015 . As a result, the dialogue information and the topic characterizing the dialogue information are associated and stored as label information, which can be conveniently used when retrieving the dialogue information.
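
The label identification can be sketched as a tally over the per-segment topic IDs. The function name `dialogue_labels`, the `top_n` and `min_count` parameters, and the sample topic names are assumptions for the example; segments with no identified topic are represented by `None` and skipped.

```python
from collections import Counter


def dialogue_labels(segment_topic_ids, topic_names, top_n=2, min_count=1):
    """Tally per-segment topic IDs and return the names of the most
    frequent topics as label information for the dialogue."""
    counts = Counter(t for t in segment_topic_ids if t is not None)
    return [
        topic_names[topic_id]
        for topic_id, count in counts.most_common(top_n)
        if count >= min_count
    ]


# Hypothetical topic IDs stored for the segments of one dialogue.
segment_topic_ids = ["T1", "T2", "T1", None, "T3", "T1", "T2"]
topic_names = {"T1": "Pricing", "T2": "Schedule", "T3": "Support"}
labels = dialogue_labels(segment_topic_ids, topic_names)
```

Setting `min_count` above 1 corresponds to the variant in which only topics tallied a predetermined number of times or more are used as labels.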

＜トピック解析処理の実行タイミングについて＞
トピック解析処理のステップＳ５１３～Ｓ５１６は複数のユーザによるオンライン対話の終了後に実行する構成としても良い。これにより、オンライン対話が終了した後、対話内容が確定した後に、対話に関連するトピックが特定され、対話情報と関連づけられて記憶される。 <Regarding the execution timing of topic analysis processing>
Steps S513 to S516 of the topic analysis process may be configured to be executed after the online dialogue by a plurality of users is finished. As a result, after the online dialogue ends and the contents of the dialogue are determined, the topic related to the dialogue is identified and stored in association with the dialogue information.

また、トピック解析処理は複数のユーザによるオンライン対話の開始後、対話の終了前までに実行する構成としても良い。
つまり、複数のユーザによるオンライン対話の対話中の任意のタイミングに実行する構成としても良い。また、ステップＳ５１３～ステップＳ５１６は、オンライン対話の対話中に定期的にリアルタイムに実行する構成としても良い。これにより、オンライン対話の対話途中においても、それまでの対話に応じたトピックが特定され、対話情報と関連づけられて記憶される構成としても良い。
これにより、ユーザは、オンライン対話の対話中にリアルタイムに、オンライン対話に参加しているユーザが言及している話題を確認できるとともに、対話情報を最新のトピックに基づき整理、管理できる。 Also, the topic analysis processing may be configured to be executed after the start of online dialogue by a plurality of users and before the end of the dialogue.
In other words, it may be configured to be executed at an arbitrary timing during online dialogue between a plurality of users. Also, steps S513 to S516 may be configured to be periodically executed in real time during online dialogue. As a result, even in the middle of an online dialogue, a topic corresponding to the dialogue up to that point may be identified and stored in association with the dialogue information.
As a result, the user can confirm the topics mentioned by the users participating in the online dialogue in real time during the online dialogue, and can organize and manage the dialogue information based on the latest topics.

<Topic presentation process>
The topic presentation process visualizes dialogue information, such as the audio and video of an online dialogue conducted by a plurality of users, presents it to the user, and also presents the topics associated with the dialogue information. The user can see the dialogue information and its related topics at a glance and can intuitively grasp an outline of the dialogue.

<Overview of the topic presentation process>
The topic presentation process is a series of steps that: accepts from the user a designation of the dialogue information to be presented; acquires that dialogue information; acquires the segment data and the topic for each piece of segment data; analyzes the dialogue information and presents to the user a speech graph in which the utterance status of each speaker can be checked visually; and presents the topic of each utterance segment superimposed on the speech graph.

<Details of the topic presentation process>
The topic presentation process is described in detail below.

In step S521, the first user selects the dialogue information whose topics the user wants to check.
Specifically, the first user operates the input device 206 or the like of the first user terminal 20 to run the application program 2012 and launch a browser application. In the browser application, the first user enters a predetermined URL (Uniform Resource Locator) that designates a predetermined web server provided by the server 10, thereby sending the server 10 a request for a page for presenting topics.
The topic processing unit 1044 of the server 10 searches the user ID field of the dialogue table 1014 using the first user ID 2011 included in the received request and acquires the dialogue IDs. The topic processing unit 1044 sends the one or more acquired dialogue IDs to the first user terminal 20. The control unit 204 of the first user terminal 20 presents the received dialogue IDs to the first user by displaying them on the display 2081 of the first user terminal 20.
The first user selects a dialogue ID from those presented by operating the input device 206 or the like of the first user terminal 20. The control unit 204 of the first user terminal 20 sends the selected dialogue ID to the server 10. The server 10 receives and accepts the dialogue ID.

If the first user is currently in a dialogue using the online dialogue service of the present disclosure, the dialogue information of that ongoing dialogue may be treated as selected. In other words, the topic presentation process may be executed on the dialogue screen displayed on the display 2081 of the first user terminal 20 during the dialogue.

In step S522, the topic processing unit 1044 of the server 10 searches the dialogue ID field of the dialogue table 1014 using the received dialogue ID and acquires dialogue information such as the user ID, customer ID, dialogue category, transmission/reception type, audio data, and video data.

In step S523, the topic processing unit 1044 of the server 10 searches the dialogue ID field of the speech segment table 1016 using the received dialogue ID and acquires the segment ID, start date and time, end date and time, and topic ID fields. The topic processing unit 1044 then searches the segment ID field of the topic relevance table 1017 using the acquired segment IDs and acquires the topic IDs and degrees of relevance.
That is, the topic processing unit 1044 of the server 10 acquires the plurality of pieces of speech segment information associated with the dialogue ID, together with the topic ID and degree of relevance for each piece of speech segment information.
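
The two lookups in step S523 amount to a join between the speech segment table and the topic relevance table. The following is a minimal sketch under stated assumptions: the tables are modeled as lists of dictionaries, the field names (`segment_id`, `dialogue_id`, `start`, `topic_id`, `relevance`) and the function name are hypothetical, and start times are plain numbers rather than dates. The disclosure does not specify a storage format or query method.

```python
from collections import defaultdict


def segments_with_topics(dialogue_id, segment_rows, relevance_rows):
    """Return [(segment_id, [(topic_id, relevance), ...]), ...] for one dialogue.

    segment_rows stands in for the speech segment table and relevance_rows for
    the topic relevance table; both layouts are assumptions for illustration.
    """
    # Index the relevance rows by segment ID so each segment is looked up once.
    scores = defaultdict(list)
    for row in relevance_rows:
        scores[row["segment_id"]].append((row["topic_id"], row["relevance"]))

    # Select the segments of the requested dialogue in chronological order.
    selected = sorted(
        (row for row in segment_rows if row["dialogue_id"] == dialogue_id),
        key=lambda row: row["start"],
    )
    return [(row["segment_id"], scores.get(row["segment_id"], [])) for row in selected]
```

A segment with no entry in the relevance table is returned with an empty list, so later steps can treat it as having no topic.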

In step S524, the topic processing unit 1044 of the server 10 outputs, based on the dialogue information acquired in step S522, a speech graph showing how the speakers' utterance status changes over time, and sends it to the first user terminal 20. The control unit 204 of the first user terminal 20 displays the received speech graph on the display 2081 of the first user terminal 20 and presents it to the first user. FIG. 20 shows an example screen 70 containing the speech graph presented to the first user.
The speech graph may also be presented to any other user, such as the second user, an administrator, or another user.

The speech graph plots dialogue time on the horizontal axis, the first user's speech output level on the upper vertical axis, and the second user's speech output level on the lower vertical axis. The solid line L1 shows the first user's speech, and the dashed line L2 shows the second user's speech.
The solid line L1 and the dashed line L2 show that, in general, the second user is silent (listening) while the first user is speaking, and the first user is silent (listening) while the second user is speaking. The location marked Z3 is where both are speaking at the same time (overlapping), which suggests that the first user may have started speaking before the second user finished. The locations marked Z1 and Z2 are periods in which neither is speaking (silence). The locations marked P1 and P2 are where predetermined keywords appeared.
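
Overlap regions such as Z3 and silent regions such as Z1 and Z2 can be derived from each speaker's speaking intervals. The sketch below assumes those intervals are already available as `(start, end)` pairs in seconds. The function names and the interval representation are hypothetical, since the disclosure does not state how the graph regions are computed.

```python
def _active(intervals, t):
    """True if time t falls inside any (start, end) speaking interval."""
    return any(start <= t < end for start, end in intervals)


def _merge(spans):
    """Join spans that touch end to start into single spans."""
    merged = []
    for start, end in spans:
        if merged and merged[-1][1] == start:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


def classify_timeline(first, second, total):
    """Return (overlaps, silences) for two speakers over [0, total] seconds."""
    # Every interval boundary is a point where the speaking state may change.
    points = sorted({0.0, float(total), *(float(p) for iv in first + second for p in iv)})
    overlaps, silences = [], []
    for a, b in zip(points, points[1:]):
        mid = (a + b) / 2  # the state is constant between adjacent boundaries
        f, s = _active(first, mid), _active(second, mid)
        if f and s:
            overlaps.append((a, b))   # both speaking, like Z3
        elif not f and not s:
            silences.append((a, b))   # neither speaking, like Z1 and Z2
    return _merge(overlaps), _merge(silences)
```

For example, with `first = [(0, 10), (18, 30)]`, `second = [(12, 20), (35, 40)]`, and `total = 45`, the overlap is 18 to 20 seconds and the silences are 10 to 12, 30 to 35, and 40 to 45 seconds.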

In step S525, the topic processing unit 1044 of the server 10 executes a segment group identification step of identifying a first segment group that includes, among the plurality of pieces of segment audio data, one or more pieces whose first degree of relevance calculated in the relevance calculation step is equal to or greater than a predetermined value.
Specifically, when the topic processing unit 1044 of the server 10 determines that one or more pieces of speech segment information, among those extracted from one piece of dialogue information in the topic analysis process, have a first degree of relevance equal to or greater than the predetermined value and therefore refer to the first topic, it identifies one or more pieces of speech segment information including them as the first segment group. For example, suppose that chronologically consecutive pieces of speech segment information are associated with topics as follows: segment 1: topic A, segment 2: topic A, segment 3: no topic, segment 4: topic A, segment 5: no topic, segment 6: topic B, segment 7: topic B, segment 8: topic B. In this case, segments 1 to 4 are identified as the segment group for topic A, and segments 6 to 8 as the segment group for topic B. Even when the span for topic A contains a speech segment that is associated with another topic or, like segment 3, with no topic, segments 1 to 4 may be identified together as the segment group for topic A if, taken as a whole, they can be regarded as referring to topic A.
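
The eight-segment example can be reproduced with a simple grouping rule. The sketch below is one possible reading, not the method of the disclosure. It assumes each segment already carries a single topic label or `None`, and it bridges a gap only when the gap consists of at most `max_gap` unlabeled segments. The `max_gap` parameter and the function name are hypothetical, and gaps containing a different topic are not bridged here.

```python
def group_segments(labels, max_gap=1):
    """Group chronologically ordered topic labels into (topic, first, last) spans.

    labels holds one topic label (or None) per segment; positions are 1-based
    to match the numbering used in the example.
    """
    groups = []
    current = None  # [topic, first_position, last_position]
    for position, label in enumerate(labels, start=1):
        if label is None:
            continue  # an unlabeled segment neither extends nor ends a group
        unlabeled_between = position - current[2] - 1 if current else 0
        if current and current[0] == label and unlabeled_between <= max_gap:
            current[2] = position  # same topic and a short gap: extend the group
        else:
            if current:
                groups.append(tuple(current))
            current = [label, position, position]
    if current:
        groups.append(tuple(current))
    return groups
```

Calling `group_segments(["A", "A", None, "A", None, "B", "B", "B"])` returns `[("A", 1, 4), ("B", 6, 8)]`, matching the grouping described above.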

Although the present disclosure describes identifying the first segment group, one or more pieces of segment audio data related to a first topic concerning a predetermined subject may instead be identified from among the plurality of pieces of segment audio data. The one or more pieces of segment audio data, or the first segment group, may also be identified through selection by an input operation of the first user or the second user.

In step S525, the topic processing unit 1044 of the server 10 executes a presentation step of presenting the first segment group identified in the segment group identification step to the first user or the second user in association with the first topic. In the presentation step, on a speech graph that shows how the speakers' utterance status changes over time and is obtained by analyzing the audio data accepted in the acceptance step, the first segment group is presented on the same time axis as the speech graph, and the first topic is presented to the first user or the second user in association with the first segment group.
Specifically, in the speech graph of FIG. 20, the topic processing unit 1044 of the server 10 presents the first segment group T1 associated with the first topic, the second segment group T2 associated with the second topic, and the third segment group T3 associated with the third topic as drawing objects superimposed on the speech graph. For example, the segment groups T1, T2, and T3 may be drawn as objects in different colors assigned per topic. The first user can thus view each segment group, linked to its topic, overlaid on the speech graph, and can see at a glance which part of the speech graph concerns which topic.
The topic processing unit 1044 of the server 10 may also present the first segment group identified in the segment group identification step to any user other than the first and second users, such as an administrator or another user.

In step S525, the segment group identification step may include a step of calculating a moving average based on the first degrees of relevance calculated for the pieces of segment audio data arranged in chronological order, and a step of identifying the pieces of segment audio data whose calculated moving average is equal to or greater than a predetermined value as the first segment group.
Specifically, when identifying a segment group, the topic processing unit 1044 of the server 10 arranges the speech segment information acquired from the topic relevance table in chronological order based on, for example, the start date and time of each piece. For the degree of relevance of a given piece of speech segment information, the topic processing unit 1044 calculates, as the moving average, the mean of the most recent N degrees of relevance relative to that piece. N is an arbitrary integer. The calculated moving average is treated as the new degree of relevance for that piece, and the pieces whose new degree of relevance is equal to or greater than the predetermined value are identified as the first segment group associated with the first topic.
For simplicity, the present disclosure mainly describes the moving average for the degree of relevance of a single first topic, but the number of topics is not limited to one, and the same processing may be performed for a plurality of topics.
As a result, even when the most relevant topic switches at short intervals from one utterance segment to the next, smoothing the degrees of relevance makes it possible to identify together the group of segments that refer to a topic. In the online dialogue service, this makes it easier for the user to see what subjects the speakers talked about.
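
The smoothing step can be sketched as a trailing moving average followed by a threshold test. The sketch assumes the window covers the current segment and the N - 1 segments before it and is shortened at the start of the dialogue. The text says only "the most recent N" values, so the exact window is an assumption, as are the function names.

```python
def moving_average(scores, n):
    """Trailing moving average over up to n values ending at each position."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    averaged = []
    for i in range(len(scores)):
        window = scores[max(0, i - n + 1): i + 1]  # shorter near the start
        averaged.append(sum(window) / len(window))
    return averaged


def segments_above(scores, n, threshold):
    """Indices of segments whose smoothed relevance reaches the threshold."""
    return [i for i, value in enumerate(moving_average(scores, n)) if value >= threshold]
```

With `scores = [0.8, 0.8, 0.2, 0.8, 0.0, 0.0]`, `n = 2`, and a threshold of 0.45, the raw scores would exclude index 2, whereas the smoothed scores select indices 0 through 3 as one contiguous group. A trailing window also lags behind topic changes, which may or may not be acceptable depending on the display.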

In step S525, the segment group identification step may instead identify, as the first segment group, a plurality of consecutive pieces of segment audio data, among those arranged in chronological order, whose calculated first degree of relevance is equal to or greater than a predetermined value.
Specifically, when identifying a segment group, the topic processing unit 1044 of the server 10 arranges the speech segment information acquired from the topic relevance table in chronological order based on, for example, the start date and time of each piece. The topic processing unit 1044 identifies a plurality of consecutive pieces of speech segment information whose degree of relevance is equal to or greater than the predetermined value as the first segment group associated with the first topic.
For simplicity, the present disclosure mainly describes the moving average for the degree of relevance of a single first topic, but the number of topics is not limited to one, and the same processing may be performed for a plurality of topics.
As a result, pieces of segment audio data that are consecutively highly relevant to a specific topic can be identified together as a segment group referring to that topic. In the online dialogue service, this makes it easier for the user to see what subjects the speakers talked about.
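
Identifying consecutive high-relevance segments is a run-detection problem. The sketch below returns every run of at least `min_length` consecutive scores at or above the threshold. The `min_length` parameter, which defaults to 2 to reflect "a plurality of consecutive" segments, and the function name are assumptions made for illustration.

```python
def consecutive_runs(scores, threshold, min_length=2):
    """Return (first_index, last_index) for each run of scores >= threshold."""
    runs = []
    start = None  # index where the current run began, or None outside a run
    for i, value in enumerate(scores):
        if value >= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the final segment.
    if start is not None and len(scores) - start >= min_length:
        runs.append((start, len(scores) - 1))
    return runs
```

Unlike the moving-average variant, a single low-relevance segment splits a run here, so the two variants can produce different groups from the same scores.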

In step S525, the topic processing unit 1044 of the server 10 executes a summarization step of generating, based on one or more of the plurality of pieces of segment audio data and the first topic identified in the topic identification step, a summary text that summarizes the text information contained in those one or more pieces of segment audio data. The summarization step generates the summary text by extracting, from the text information contained in the one or more pieces of segment audio data, only the portions that are highly relevant to the first topic identified in the topic identification step.
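
Extracting only the topic-relevant portions can be illustrated with a simple keyword filter. This sketch is a stand-in: it keeps the sentences that mention at least `min_hits` topic keywords, and it is not the learned summary model 1035 that the disclosure describes. The sentence splitting rule, the `min_hits` parameter, and the function name are all assumptions.

```python
import re


def extract_summary(text, keywords, min_hits=1):
    """Keep the sentences that mention at least min_hits of the topic keywords."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    lowered = [keyword.lower() for keyword in keywords]
    kept = [
        sentence
        for sentence in sentences
        if sum(keyword in sentence.lower() for keyword in lowered) >= min_hits
    ]
    return " ".join(kept)
```

Substring matching is crude, and languages written without spaces, such as Japanese, would need a different sentence splitter and tokenizer. The sketch shows only the shape of the extraction.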

In step S525, the summarization step generates the summary text by applying, as input data to a learning model, the text information contained in the one or more pieces of segment audio data and a plurality of keywords associated with the first topic.
Specifically, segment data including at least one of segment audio data, segment video data, and segment reading text, together with a plurality of keywords associated with the topic of that segment data, is applied as input data to the summary model 1035, and a summary text, that is, text information summarizing the text information contained in the segment data, is obtained as output data. This makes it possible to extract only the portions of the text information in the segment data that are particularly relevant to the topic, and to obtain a summary text that summarizes that text information.

In step S525, the summarization step generates, based on the one or more pieces of segment audio data included in the first segment group identified in the segment group identification step and the first topic identified in the topic identification step, a summary text that summarizes the text information contained in those pieces of segment audio data.
Specifically, the one or more pieces of segment data included in a segment group, together with a plurality of keywords associated with the topic of that segment group, are applied as input data to the summary model 1035, and a summary text summarizing the text information contained in the segment group is obtained as output data. This makes it possible to extract the portions of the text information in the segment data that are particularly relevant to the topic, and to obtain a summary text that summarizes that text information.

In step S525, the topic processing unit 1044 of the server 10 executes a presentation step of presenting the summary text generated in the summarization step in association with the one or more pieces of segment audio data.
In step S525, the topic processing unit 1044 of the server 10 executes a presentation step of presenting the summary text generated in the summarization step in association with the first segment group identified in the segment group identification step.
Specifically, in the speech graph of FIG. 20, the topic processing unit 1044 of the server 10 presents the summary text 701 for the first topic of the first segment group T1 in association with the first segment group T1. The topic processing unit 1044 may also present the summary text 701 in association with any one or more speech segments rather than with a segment group.
The topic processing unit 1044 of the server 10 may present the first segment group identified in the segment group identification step to any user, such as the first user, the second user, an administrator, or another user.

<Learning process>
The learning processes of the emotion evaluation model 1031, the impression evaluation model 1032, the first impression evaluation model 1033, and the second impression evaluation model 1034 are described below.

<Learning process of the emotion evaluation model 1031>
The learning process of the emotion evaluation model 1031 trains, by deep learning, the learning parameters of the deep neural network included in the emotion evaluation model 1031.

<Overview of the learning process of the emotion evaluation model 1031>
The learning process of the emotion evaluation model 1031 trains, by deep learning, the learning parameters of the deep neural network included in the emotion evaluation model 1031 so that, given segment audio data, segment video data, and segment reading text as input data (input vectors), it produces an emotion vector or emotion scalar, which is an emotion feature, as output data (teacher data).
Any of the segment audio data, segment video data, and segment reading text may be omitted from the input data of the emotion evaluation model 1031.

<Details of the learning process of the emotion evaluation model 1031>
The learning unit 1051 of the server 10 creates learning data in which segment audio data, segment video data, segment reading text, and the like are the input data (input vectors) and a predetermined emotion feature is the output data (teacher data).
Based on the learning data, the learning unit 1051 of the server 10 creates data sets, such as training data, test data, and validation data, for training the deep neural network of the emotion evaluation model 1031.
Based on the created data sets, the learning unit 1051 of the server 10 trains, by deep learning, the learning parameters of the deep neural network included in the emotion evaluation model 1031.
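
Creating training, validation, and test data sets from the learning data is commonly done with a shuffled split. The sketch below is a generic illustration that applies equally to the other models described in this section. The 80/10/10 ratios, the fixed seed, and the function name are assumptions, as the disclosure does not specify how the learning data is divided.

```python
import random


def split_dataset(examples, train=0.8, validation=0.1, seed=0):
    """Shuffle (input, teacher) pairs and split them into three data sets."""
    items = list(examples)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split reproducible
    n_train = int(len(items) * train)
    n_validation = int(len(items) * validation)
    training = items[:n_train]
    validating = items[n_train:n_train + n_validation]
    testing = items[n_train + n_validation:]  # whatever remains is test data
    return training, validating, testing
```

Each example would pair the inputs (for instance, features derived from segment audio, video, or text) with the corresponding teacher data. The split keeps every example in exactly one of the three sets.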

<Learning process of the impression evaluation model 1032>
The learning process of the impression evaluation model 1032 trains, by deep learning, the learning parameters of the deep neural network included in the impression evaluation model 1032.

<Overview of the learning process of the impression evaluation model 1032>
The learning process of the impression evaluation model 1032 trains, by deep learning, the learning parameters of the deep neural network included in the impression evaluation model 1032 so that, given segment audio data, segment video data, and segment reading text as input data (input vectors), it produces an impression feature as output data (teacher data).
Any of the segment audio data, segment video data, and segment reading text may be omitted from the input data of the impression evaluation model 1032.

<Details of the learning process of the impression evaluation model 1032>
The learning unit 1051 of the server 10 creates learning data in which segment audio data, segment video data, segment reading text, and the like are the input data (input vectors) and a predetermined impression feature is the output data (teacher data).
Based on the learning data, the learning unit 1051 of the server 10 creates data sets, such as training data, test data, and validation data, for training the deep neural network of the impression evaluation model 1032.
Based on the created data sets, the learning unit 1051 of the server 10 trains, by deep learning, the learning parameters of the deep neural network included in the impression evaluation model 1032.

<Learning process of the first impression evaluation model 1033>
The learning process of the first impression evaluation model 1033 trains, by deep learning, the learning parameters of the deep neural network included in the first impression evaluation model 1033.

<Overview of the learning process of the first impression evaluation model 1033>
The learning process of the first impression evaluation model 1033 trains, by deep learning, the learning parameters of the deep neural network included in the first impression evaluation model 1033 so that, given segment audio data, segment video data, and segment reading text as input data (input vectors), it produces a dialogue feature as output data (teacher data).
Any of the segment audio data, segment video data, and segment reading text may be omitted from the input data of the first impression evaluation model 1033.

<Details of the learning process of the first impression evaluation model 1033>
The learning unit 1051 of the server 10 creates learning data in which segment audio data, segment video data, segment reading text, and the like are the input data (input vectors) and a predetermined dialogue feature is the output data (teacher data).
Based on the learning data, the learning unit 1051 of the server 10 creates data sets, such as training data, test data, and validation data, for training the deep neural network of the first impression evaluation model 1033.
Based on the created data sets, the learning unit 1051 of the server 10 trains, by deep learning, the learning parameters of the deep neural network included in the first impression evaluation model 1033.

<Learning process of the second impression evaluation model 1034>
The learning process of the second impression evaluation model 1034 trains, by deep learning, the learning parameters of the deep neural network included in the second impression evaluation model 1034.

<Overview of the learning process of the second impression evaluation model 1034>
The learning process of the second impression evaluation model 1034 trains, by deep learning, the learning parameters of the deep neural network included in the second impression evaluation model 1034 so that, given a dialogue feature as input data (input vector), it produces an impression feature as output data (teacher data).

<Details of the learning process of the second impression evaluation model 1034>
The learning unit 1051 of the server 10 creates learning data in which dialogue features and the like are the input data (input vectors) and a predetermined impression feature is the output data (teacher data).
Based on the learning data, the learning unit 1051 of the server 10 creates data sets, such as training data, test data, and validation data, for training the deep neural network of the second impression evaluation model 1034.
Based on the created data sets, the learning unit 1051 of the server 10 trains, by deep learning, the learning parameters of the deep neural network included in the second impression evaluation model 1034.

＜要約モデル１０３５の学習処理の詳細＞
サーバ１０の学習部１０５１は、区間音声データ、区間動画データおよび区間読上テキストの少なくともいずれか１つを含む区間データと、所定の話題に関するトピックに関連づけられた複数のキーワードと、を入力データ（入力ベクトル）として、当該区間データに含まれるテキスト情報を要約したテキスト情報である要約テキストを出力データ（教師データ）となるよう、学習データを作成する。
サーバ１０の学習部１０５１は、学習データに基づき、要約モデル１０３５のディープニューラルネットワークを学習させるための訓練データ、テストデータ、検証データなどのデータセットを作成する。
サーバ１０の学習部１０５１は、作成したデータセットに基づき要約モデル１０３５に含まれるディープニューラルネットワークの学習パラメータを深層学習により学習させる。 <Details of Learning Processing of Summary Model 1035>
The learning unit 1051 of the server 10 receives the input data ( Learning data is generated so that output data (teacher data) is summarized text, which is text information that summarizes the text information included in the section data, as an input vector).
The learning unit 1051 of the server 10 creates data sets such as training data, test data, and verification data for learning the deep neural network of the summary model 1035 based on the learning data.
The learning unit 1051 of the server 10 learns the learning parameters of the deep neural network included in the summary model 1035 by deep learning based on the created data set.
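
As an illustration of the data set creation described above, the following Python sketch splits learning data into training, test, and validation subsets. The function name split_dataset, the 80/10/10 ratios, the fixed random seed, and the fields of each example record are assumptions made for illustration; the disclosure does not specify them.

```python
import random
from typing import Dict, List, Sequence, Tuple

Example = Dict[str, object]


def split_dataset(
    examples: Sequence[Example],
    train_ratio: float = 0.8,
    test_ratio: float = 0.1,
    seed: int = 0,
) -> Tuple[List[Example], List[Example], List[Example]]:
    """Shuffle the learning data and split it into training, test and validation subsets."""
    if train_ratio <= 0 or test_ratio < 0 or train_ratio + test_ratio >= 1:
        raise ValueError("ratios must leave room for a validation subset")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # deterministic shuffle for reproducibility
    n_train = int(len(shuffled) * train_ratio)
    n_test = int(len(shuffled) * test_ratio)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # whatever remains
    return train, test, validation


# Hypothetical learning data: each record pairs the model input
# (segment text plus topic keywords) with the teacher summary.
examples = [
    {"text": f"utterance {i}", "keywords": ["price", "discount"], "summary": f"summary {i}"}
    for i in range(10)
]
train, test, validation = split_dataset(examples)
```

The three subsets are disjoint and together cover all records, so the model is never evaluated on data it was trained on.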

＜コンピュータの基本ハードウェア構成＞
図２１は、コンピュータ９０の基本的なハードウェア構成を示すブロック図である。コンピュータ９０は、プロセッサ９０１、主記憶装置９０２、補助記憶装置９０３、通信ＩＦ９９１（インタフェース、Interface）を少なくとも備える。これらは通信バス９２１により相互に電気的に接続される。 <Basic computer hardware configuration>
FIG. 21 is a block diagram showing the basic hardware configuration of the computer 90. The computer 90 includes at least a processor 901, a main storage device 902, an auxiliary storage device 903, and a communication IF 991 (interface). These are electrically connected to each other by a communication bus 921.

プロセッサ９０１とは、プログラムに記述された命令セットを実行するためのハードウェアである。プロセッサ９０１は、演算装置、レジスタ、周辺回路等から構成される。 The processor 901 is hardware for executing an instruction set described in a program. The processor 901 is composed of an arithmetic unit, registers, peripheral circuits, and the like.

主記憶装置９０２とは、プログラム、及びプログラム等で処理されるデータ等を一時的に記憶するためのものである。例えば、ＤＲＡＭ（Dynamic Random Access Memory）等の揮発性のメモリである。 The main storage device 902 is for temporarily storing programs and data processed by the programs. For example, it is a volatile memory such as a DRAM (Dynamic Random Access Memory).

補助記憶装置９０３とは、データ及びプログラムを保存するための記憶装置である。例えば、フラッシュメモリ、ＨＤＤ（Hard Disc Drive）、光磁気ディスク、ＣＤ－ＲＯＭ、ＤＶＤ－ＲＯＭ、半導体メモリ等である。 Auxiliary storage device 903 is a storage device for storing data and programs. For example, flash memory, HDD (Hard Disc Drive), magneto-optical disk, CD-ROM, DVD-ROM, semiconductor memory, and the like.

通信ＩＦ９９１とは、有線又は無線の通信規格を用いて、他のコンピュータとネットワークを介して通信するための信号を入出力するためのインタフェースである。
ネットワークは、インターネット、ＬＡＮ、無線基地局等によって構築される各種移動通信システム等で構成される。例えば、ネットワークには、３Ｇ、４Ｇ、５Ｇ移動通信システム、ＬＴＥ（Long Term Evolution）、所定のアクセスポイントによってインターネットに接続可能な無線ネットワーク（例えばWi-Fi（登録商標））等が含まれる。無線で接続する場合、通信プロトコルとして例えば、Ｚ－Ｗａｖｅ（登録商標）、ＺｉｇＢｅｅ（登録商標）、Ｂｌｕｅｔｏｏｔｈ（登録商標）等が含まれる。有線で接続する場合は、ネットワークには、ＵＳＢ（Universal Serial Bus）ケーブル等により直接接続するものも含む。 The communication IF 991 is an interface for inputting and outputting signals for communicating with other computers via a network using a wired or wireless communication standard.
The network is composed of the Internet, LANs, various mobile communication systems built on wireless base stations, and the like. For example, networks include 3G, 4G, and 5G mobile communication systems, LTE (Long Term Evolution), and wireless networks (for example, Wi-Fi (registered trademark)) that can be connected to the Internet through predetermined access points. For wireless connections, the communication protocols include, for example, Z-Wave (registered trademark), ZigBee (registered trademark), and Bluetooth (registered trademark). For wired connections, the network also includes direct connections using a USB (Universal Serial Bus) cable or the like.

なお、各ハードウェア構成の全部または一部を複数のコンピュータ９０に分散して設け、ネットワークを介して相互に接続することによりコンピュータ９０を仮想的に実現することができる。このように、コンピュータ９０は、単一の筐体、ケースに収納されたコンピュータ９０だけでなく、仮想化されたコンピュータシステムも含む概念である。 It should be noted that the computer 90 can be virtually realized by distributing all or part of each hardware configuration to a plurality of computers 90 and connecting them to each other via a network. Thus, the computer 90 is a concept that includes not only the computer 90 housed in a single housing or case, but also a virtualized computer system.

＜コンピュータ９０の基本機能構成＞
コンピュータ９０の基本ハードウェア構成（図２１）により実現されるコンピュータの機能構成を説明する。コンピュータは、制御部、記憶部、通信部の機能ユニットを少なくとも備える。 <Basic Functional Configuration of Computer 90>
A functional configuration of the computer realized by the basic hardware configuration of the computer 90 (FIG. 21) will be described. The computer includes at least functional units of a control section, a storage section, and a communication section.

なお、コンピュータ９０が備える機能ユニットは、それぞれの機能ユニットの全部または一部を、ネットワークで相互に接続された複数のコンピュータ９０に分散して設けても実現することができる。コンピュータ９０は、単一のコンピュータ９０だけでなく、仮想化されたコンピュータシステムも含む概念である。 Note that the functional units included in the computer 90 can also be implemented by distributing all or part of each functional unit to a plurality of computers 90 interconnected via a network. The computer 90 is a concept that includes not only a single computer 90 but also a virtualized computer system.

制御部は、プロセッサ９０１が補助記憶装置９０３に記憶された各種プログラムを読み出して主記憶装置９０２に展開し、当該プログラムに従って処理を実行することにより実現される。制御部は、プログラムの種類に応じて様々な情報処理を行う機能ユニットを実現することができる。これにより、コンピュータは情報処理を行う情報処理装置として実現される。 The control unit is implemented by the processor 901 reading out various programs stored in the auxiliary storage device 903, developing them in the main storage device 902, and executing processing according to the programs. The control unit can implement functional units that perform various information processing according to the type of program. Thereby, the computer is implemented as an information processing device that performs information processing.

記憶部は、主記憶装置９０２、補助記憶装置９０３により実現される。記憶部は、データ、各種プログラム、各種データベースを記憶する。また、プロセッサ９０１は、プログラムに従って記憶部に対応する記憶領域を主記憶装置９０２または補助記憶装置９０３に確保することができる。また、制御部は、各種プログラムに従ってプロセッサ９０１に、記憶部に記憶されたデータの追加、更新、削除処理を実行させることができる。 A storage unit is realized by the main storage device 902 and the auxiliary storage device 903 . The storage unit stores data, various programs, and various databases. Also, the processor 901 can secure a storage area corresponding to the storage unit in the main storage device 902 or the auxiliary storage device 903 according to a program. In addition, the control unit can cause the processor 901 to execute addition, update, and deletion processing of data stored in the storage unit according to various programs.

データベースは、リレーショナルデータベースを指し、行と列によって構造的に規定された表形式のテーブル、マスタと呼ばれるデータ集合を、互いに関連づけて管理するためのものである。データベースでは、表をテーブル、マスタ、表の列をカラム、表の行をレコードと呼ぶ。リレーショナルデータベースでは、テーブル、マスタ同士の関係を設定し、関連づけることができる。
通常、各テーブル、各マスタにはレコードを一意に特定するための主キーとなるカラムが設定されるが、カラムへの主キーの設定は必須ではない。制御部は、各種プログラムに従ってプロセッサ９０１に、記憶部に記憶された特定のテーブル、マスタにレコードを追加、削除、更新を実行させることができる。 The database here is a relational database: it manages data sets, called tables and masters, that are laid out in tabular form with rows and columns, in association with one another. In database terms, a table is called a table or a master, a column of a table is called a column, and a row of a table is called a record. In a relational database, relationships can be defined between tables and masters to associate them with one another.
Normally, each table and each master has a primary key column for uniquely identifying a record, but setting a primary key to a column is not essential. The control unit can cause the processor 901 to add, delete, and update records in specific tables and masters stored in the storage unit according to various programs.
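
The relationship between a master, a table, primary keys, and the add, update, and delete operations described above can be sketched with Python's standard sqlite3 module. The table names topic_master and keyword_table, their columns, and the sample values are hypothetical and chosen only for illustration; the disclosure does not prescribe a particular database engine or schema.

```python
import sqlite3

# In-memory relational database used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# A master and a table, each with a primary-key column that uniquely identifies a record.
conn.execute("CREATE TABLE topic_master (topic_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """CREATE TABLE keyword_table (
           keyword_id INTEGER PRIMARY KEY,
           topic_id   INTEGER NOT NULL REFERENCES topic_master(topic_id),
           keyword    TEXT NOT NULL)"""
)

# Add records.
conn.execute("INSERT INTO topic_master (topic_id, name) VALUES (1, 'pricing')")
conn.executemany(
    "INSERT INTO keyword_table (topic_id, keyword) VALUES (?, ?)",
    [(1, "price"), (1, "discount"), (1, "quote")],
)

# Update a record.
conn.execute("UPDATE keyword_table SET keyword = 'estimate' WHERE keyword = 'quote'")

# Delete a record.
conn.execute("DELETE FROM keyword_table WHERE keyword = 'discount'")

# The relationship between the master and the table is followed with a join.
rows = conn.execute(
    """SELECT t.name, k.keyword
         FROM topic_master AS t
         JOIN keyword_table AS k ON k.topic_id = t.topic_id
        ORDER BY k.keyword"""
).fetchall()
```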

なお、本開示におけるデータベース、マスタは、情報が構造的に規定された任意のデータ構造体（リスト、辞書、連想配列、オブジェクトなど）を含み得る。データ構造体には、データと、任意のプログラミング言語により記述された関数、クラス、メソッドなどを組み合わせることにより、データ構造体と見なし得るデータも含むものとする。 Note that the database and master in the present disclosure may include any data structure (list, dictionary, associative array, object, etc.) in which information is structurally defined. The data structure also includes data that can be regarded as a data structure by combining data with functions, classes, methods, etc. written in any programming language.

通信部は、通信ＩＦ９９１により実現される。通信部は、ネットワークを介して他のコンピュータ９０と通信を行う機能を実現する。通信部は、他のコンピュータ９０から送信された情報を受信し、制御部へ入力することができる。制御部は、各種プログラムに従ってプロセッサ９０１に、受信した情報に対する情報処理を実行させることができる。また、通信部は、制御部から出力された情報を他のコンピュータ９０へ送信することができる。 A communication unit is implemented by the communication IF 991 . The communication unit implements a function of communicating with another computer 90 via a network. The communication section can receive information transmitted from another computer 90 and input it to the control section. The control unit can cause the processor 901 to execute information processing on the received information according to various programs. Also, the communication section can transmit information output from the control section to another computer 90 .

＜付記＞
以上の各実施形態で説明した事項を以下に付記する。 <Appendix>
The matters described in the above embodiments are supplemented below.

（付記１）
プロセッサと、記憶部とを備え、第１ユーザと第２ユーザとの間の対話に関する情報をコンピュータに処理させるプログラムであって、プログラムは、プロセッサに、対話に関する音声データを受け付ける受付ステップ（Ｓ５１２）と、受付ステップにおいて受け付けた音声データから、発話区間ごとに複数の区間音声データを抽出する音声抽出ステップ（Ｓ５１３）と、複数の区間音声データのうち、所定の話題に関する第１トピックと関連する１または複数の区間音声データを特定する区間特定ステップ（Ｓ５２５）と、複数の区間音声データのうち、区間特定ステップにおいて特定された１または複数の区間音声データと、第１トピックと、に基づき、１または複数の区間音声データに含まれるテキスト情報を要約した要約テキストを生成する要約ステップ（Ｓ５２５）と、を実行させるプログラム。
これにより、対話において話者がどのような話題（トピック）に関してコミュニケーションを行ったのか、ユーザは一目で確認することができる。 (Appendix 1)
A program for causing a computer comprising a processor and a storage unit to process information relating to a dialogue between a first user and a second user, the program causing the processor to execute: a receiving step (S512) of receiving audio data relating to the dialogue; a voice extracting step (S513) of extracting a plurality of segmental audio data, one for each utterance segment, from the audio data received in the receiving step; a segment identifying step (S525) of identifying, among the plurality of segmental audio data, one or more segmental audio data related to a first topic relating to a predetermined subject; and a summarizing step (S525) of generating, based on the one or more segmental audio data identified in the segment identifying step and the first topic, summary text summarizing the text information contained in the one or more segmental audio data.
As a result, the user can confirm at a glance what topic the speaker has communicated about in the dialogue.

（付記２）
要約ステップ（Ｓ５２５）は、１または複数の区間音声データに含まれるテキスト情報のうち、第１トピックと関連性が高い箇所を抽出することにより、１または複数の区間音声データに含まれるテキスト情報を要約した要約テキストを生成するステップである、付記１記載のプログラム。
これにより、対話において話者がどのような話題（トピック）に関してコミュニケーションを行ったのか、ユーザは一目で確認することができる。 (Appendix 2)
The program according to Appendix 1, wherein the summarizing step (S525) is a step of generating the summary text summarizing the text information contained in the one or more segmental audio data by extracting, from that text information, the portions highly relevant to the first topic.
As a result, the user can confirm at a glance what topic the speaker has communicated about in the dialogue.
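
One possible reading of the extraction-based summarization in Appendix 2 is sketched below in Python. Splitting on English sentence punctuation, scoring each sentence by the number of keyword occurrences, and keeping the top max_sentences sentences are all assumptions for illustration; the disclosure does not state how the highly relevant portions are selected, and the function name extractive_summary is hypothetical.

```python
import re
from typing import List


def extractive_summary(
    segment_texts: List[str],
    topic_keywords: List[str],
    max_sentences: int = 2,
) -> str:
    """Keep the sentences that mention the topic keywords most often, in original order."""
    sentences: List[str] = []
    for text in segment_texts:
        # Assumption: sentences end with '.', '!' or '?' followed by whitespace.
        sentences.extend(s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip())

    keywords = [k.lower() for k in topic_keywords]
    scored = []
    for index, sentence in enumerate(sentences):
        lowered = sentence.lower()
        score = sum(lowered.count(k) for k in keywords)
        if score > 0:  # sentences with no keyword are treated as unrelated to the topic
            scored.append((score, index, sentence))

    # Highest score first; ties are broken by earlier position in the dialogue.
    top = sorted(scored, key=lambda item: (-item[0], item[1]))[:max_sentences]
    # Restore chronological order for readability.
    return " ".join(sentence for _, _, sentence in sorted(top, key=lambda item: item[1]))


segments = [
    "Thanks for joining today. The price is 100 dollars per seat.",
    "We can offer a discount on the price for annual plans. See you next week.",
]
summary = extractive_summary(segments, ["price", "discount"])
```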

（付記３）
要約ステップ（Ｓ５２５）は、１または複数の区間音声データに含まれるテキスト情報と、第１トピックに関連づけられた複数のキーワードを入力データとして、学習モデルに適用することにより、要約テキストを生成するステップである、付記１記載のプログラム。
これにより、対話において話者がどのような話題（トピック）に関してコミュニケーションを行ったのか、ユーザは一目で確認することができる。 (Appendix 3)
The program according to Appendix 1, wherein the summarizing step (S525) is a step of generating the summary text by applying, as input data to a learning model, the text information contained in the one or more segmental audio data and a plurality of keywords associated with the first topic.
As a result, the user can confirm at a glance what topic the speaker has communicated about in the dialogue.

（付記４）
プログラムは、プロセッサに、複数の区間音声データごとに、第１トピックとの関連度を示す第１関連度を算定する関連度算定ステップ（Ｓ５１４）と、を実行させ、区間特定ステップ（Ｓ５２５）は、複数の区間音声データのうち、関連度算定ステップにおいて算定された第１関連度が所定値以上の１または複数の区間音声データを含む、第１区間群を特定するステップであり、要約ステップ（Ｓ５２５）は、区間特定ステップにおいて特定された第１区間群に含まれる１または複数の区間音声データと、第１トピックと、に基づき、１または複数の区間音声データに含まれるテキスト情報を要約した要約テキストを生成するステップである、付記１記載のプログラム。
これにより、対話において話者がどのような話題（トピック）に関してコミュニケーションを行ったのか、ユーザは一目で確認することができる。 (Appendix 4)
The program according to Appendix 1, wherein the program causes the processor to execute a relevance calculating step (S514) of calculating, for each of the plurality of segmental audio data, a first relevance indicating the degree of relevance to the first topic; the segment identifying step (S525) is a step of identifying a first segment group including, among the plurality of segmental audio data, one or more segmental audio data whose first relevance calculated in the relevance calculating step is equal to or greater than a predetermined value; and the summarizing step (S525) is a step of generating, based on the one or more segmental audio data included in the first segment group identified in the segment identifying step and the first topic, summary text summarizing the text information contained in the one or more segmental audio data.
As a result, the user can confirm at a glance what topic the speaker has communicated about in the dialogue.

（付記５）
プログラムは、プロセッサに、要約ステップにおいて生成された要約テキストを、１または複数の区間音声データと関連づけて提示する提示ステップ（Ｓ５２５）と、を実行させる付記１記載のプログラム。
これにより、対話において話者がどのような話題（トピック）に関してコミュニケーションを行ったのか、ユーザは一目で確認することができる。 (Appendix 5)
The program according to Appendix 1, wherein the program causes the processor to execute a presenting step (S525) of presenting the summary text generated in the summarizing step in association with the one or more segmental audio data.
As a result, the user can confirm at a glance what topic the speaker has communicated about in the dialogue.

（付記６）
プログラムは、プロセッサに、要約ステップにおいて生成された要約テキストを、区間特定ステップにおいて特定されただい１区間群と関連づけて提示する提示ステップ（Ｓ５２５）と、を実行させる付記４記載のプログラム。
これにより、対話において話者がどのような話題（トピック）に関してコミュニケーションを行ったのか、ユーザは一目で確認することができる。 (Appendix 6)
The program according to Appendix 4, wherein the program causes the processor to execute a presenting step (S525) of presenting the summary text generated in the summarizing step in association with the first segment group identified in the segment identifying step.
As a result, the user can confirm at a glance what topic the speaker has communicated about in the dialogue.

（付記７）
プログラムは、プロセッサに、区間特定ステップにおいて特定された第１区間群を、第１トピックと関連づけて提示する提示ステップ（Ｓ５２５）と、を実行させる付記４記載のプログラム。
これにより、対話において話者がどのような話題（トピック）に関してコミュニケーションを行ったのか、ユーザは一目で確認することができる。 (Appendix 7)
The program according to Appendix 4, wherein the program causes the processor to execute a presenting step (S525) of presenting the first segment group identified in the segment identifying step in association with the first topic.
As a result, the user can confirm at a glance what topic the speaker has communicated about in the dialogue.

（付記８）
提示ステップ（Ｓ５２５）は、受付ステップにおいて受け付けた音声データを解析することにより得られる、話者による発話状況の時系列推移を示す音声グラフにおいて、区間特定ステップにおいて特定された第１区間群を音声グラフと同じ時系列軸上に提示するとともに、第１トピックを第１区間群に関連づけて提示するステップである、付記７記載のプログラム。
これにより、対話において話者が行った発話状況を時系列的に示す音声グラフと重ねて、話者がどのような話題について発話を行ったのか、ユーザは一目で確認することができる。 (Appendix 8)
The program according to Appendix 7, wherein the presenting step (S525) is a step of presenting the first segment group identified in the segment identifying step on the same time axis as a speech graph, the speech graph being obtained by analyzing the audio data received in the receiving step and showing the time-series transition of the speakers' utterance status, and of presenting the first topic in association with the first segment group.
As a result, the user can confirm at a glance what topics the speaker spoke about, overlaid on the speech graph that shows the speakers' utterance status in the dialogue in chronological order.

（付記９）
プログラムは、プロセッサに、第１ユーザから１または複数のキーワードを受け付けるキーワード受付ステップ（Ｓ５０２）と、キーワード受付ステップにおいて受け付けた１または複数のキーワードを、所定の話題に関する第１トピックと関連づけて記憶するトピック記憶ステップ（Ｓ５０３）と、を実行させる、付記１記載のプログラム。
これにより、ユーザが自身で予めキーワードと関連づけて記憶させたトピックに基づき、対話において話者がどのような話題に関してコミュニケーションを行ったのか、一目で確認することができる。 (Appendix 9)
The program according to Appendix 1, wherein the program causes the processor to execute: a keyword receiving step (S502) of receiving one or more keywords from the first user; and a topic storing step (S503) of storing the one or more keywords received in the keyword receiving step in association with the first topic relating to a predetermined subject.
With this, it is possible to confirm at a glance what topic the speaker has communicated about in the dialogue based on the topic that the user has stored in advance in association with the keyword.

（付記１０）
プログラムは、プロセッサに、受付ステップにおいて受け付けた音声データを記憶する音声記憶ステップ（Ｓ５１２）と、音声記憶ステップにおいて記憶された音声データに基づき、第１トピックに新たに関連づける１または複数の新たなキーワードを第１ユーザに対して提示するキーワード提示ステップ（Ｓ５０１）と、を実行させ、キーワード受付ステップ（Ｓ５０２）は、キーワード提示ステップにおいて第１ユーザに対して提示された複数の新たなキーワードのうち、第１ユーザにより選択された１または複数のキーワードを受け付けるステップである、付記９記載のプログラム。
これにより、ユーザは過去の対話情報において用いられたキーワードに基づき、トピックに新たに関連づけるのが好ましい１または複数の新たなキーワードの提示を受けることができる。ユーザは、簡単にトピックを定義し、記憶することができる。 (Appendix 10)
The program according to Appendix 9, wherein the program causes the processor to execute: a voice storing step (S512) of storing the audio data received in the receiving step; and a keyword presenting step (S501) of presenting to the first user, based on the audio data stored in the voice storing step, one or more new keywords to be newly associated with the first topic; and the keyword receiving step (S502) is a step of receiving one or more keywords selected by the first user from among the plurality of new keywords presented to the first user in the keyword presenting step.
Thereby, the user is presented with one or more new keywords that are suitable for newly associating with the topic, based on the keywords used in past dialogue information. The user can easily define and store topics.

（付記１１）
音声抽出ステップ（Ｓ５１３）は、対話が終了する前に、受付ステップにおいて受け付けた音声データから、発話区間ごとに複数の区間音声データを抽出するステップであり、関連度算定ステップ（Ｓ５１４）は、対話が終了する前に、複数の区間音声データに含まれる区間音声データごとに、第１トピックとの関連度を示す第１関連度を算定するステップである、付記４記載のプログラム。
これにより、区間音声データとトピックとの関連度の算定を対話中にリアルタイムに実行することができる。例えば、商談中に、話者がどのような話題に関してコミュニケーションを行っているのか確認することができる。 (Appendix 11)
The program according to Appendix 4, wherein the voice extracting step (S513) is a step of extracting, before the dialogue ends, a plurality of segmental audio data, one for each utterance segment, from the audio data received in the receiving step; and the relevance calculating step (S514) is a step of calculating, before the dialogue ends, the first relevance indicating the degree of relevance to the first topic for each segmental audio data included in the plurality of segmental audio data.
As a result, it is possible to calculate the degree of relevance between the segmental audio data and the topic in real time during the dialogue. For example, during business negotiations, it is possible to confirm what topic the speaker is communicating about.

（付記１２）
関連度算定ステップ（Ｓ５１４）は、複数の区間音声データごとに、それぞれ複数のキーワードと関連づけられた複数のトピックごとの関連度を算定するステップであり、プログラムは、プロセッサに、関連度算定ステップにおいて算定された複数のトピックごとの関連度に基づき、対話に対する応対メモを特定するメモ特定ステップ（Ｓ５１６）と、メモ特定ステップにおいて特定された応対メモを、対話と関連づけて記憶する記憶ステップ（Ｓ５１６）と、を実行させる付記４記載のプログラム。
これにより、対話全体を特徴づけるトピックを特定し、当該トピックに関する応対メモを対話に対して付与することにより、対話情報を管理することができる。 (Appendix 12)
The program according to Appendix 4, wherein the relevance calculating step (S514) is a step of calculating, for each of the plurality of segmental audio data, the relevance to each of a plurality of topics, each topic being associated with a plurality of keywords; and the program causes the processor to execute: a memo identifying step (S516) of identifying a response memo for the dialogue based on the relevance for each of the plurality of topics calculated in the relevance calculating step; and a storing step (S516) of storing the response memo identified in the memo identifying step in association with the dialogue.
Accordingly, dialogue information can be managed by specifying a topic that characterizes the entire dialogue and adding a response memo on the topic to the dialogue.
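
The memo identifying step of Appendix 12 could, for example, pick the topic whose relevance is highest on average over the whole dialogue and look up a memo registered for that topic. The Python sketch below follows that reading. Averaging over segments, the function name identify_response_memo, and the topic-to-memo mapping are assumptions for illustration; the disclosure only states that the memo is identified from the per-topic relevance.

```python
from typing import Dict, List


def identify_response_memo(
    relevance_by_topic: Dict[str, List[float]],
    memo_by_topic: Dict[str, str],
) -> str:
    """Return the memo of the topic with the highest mean relevance over all segments."""
    if not relevance_by_topic:
        raise ValueError("at least one topic is required")
    mean_relevance = {
        topic: (sum(values) / len(values) if values else 0.0)
        for topic, values in relevance_by_topic.items()
    }
    # The topic that best characterizes the dialogue as a whole.
    best_topic = max(mean_relevance, key=lambda topic: mean_relevance[topic])
    return memo_by_topic[best_topic]


# Hypothetical per-segment relevance values for two topics.
relevance_by_topic = {
    "pricing": [0.8, 0.6, 0.1],
    "schedule": [0.1, 0.2, 0.3],
}
memo_by_topic = {
    "pricing": "Discussed pricing",
    "schedule": "Discussed schedule",
}
memo = identify_response_memo(relevance_by_topic, memo_by_topic)
```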

（付記１３）
関連度算定ステップ（Ｓ５１４）は、第１トピックに関連づけられた複数のキーワードのうち、音声抽出ステップにおいて抽出された複数の区間音声データに多く含まれるキーワードほど関連度へ与える重みが小さくなるようにし、複数の区間音声データごとに第１トピックに関連づけられた複数のキーワードの重み付けを考慮した一致度を、第１トピックとの関連度を示す第１関連度として算定する、付記４記載のプログラム。
これにより、トピックに関連づけられたキーワードのうち、多くの区間音声データに含まれるありふれたキーワードの重みを小さくすることができる。特定の区間音声データに出現するキーワードの重要度が高まることにより、区間音声データとトピックとの関連度をより正確に算定することができる。 (Appendix 13)
The program according to Appendix 4, wherein the relevance calculating step (S514) sets the weights so that, among the plurality of keywords associated with the first topic, the more frequently a keyword is contained in the plurality of segmental audio data extracted in the voice extracting step, the smaller the weight that keyword contributes to the relevance, and calculates, for each of the plurality of segmental audio data, the degree of match with the plurality of keywords associated with the first topic, taking these weights into account, as the first relevance indicating the degree of relevance to the first topic.
As a result, among the keywords associated with the topic, the weight of common keywords contained in many segmental speech data can be reduced. By increasing the importance of keywords appearing in specific segmental audio data, it is possible to more accurately calculate the degree of relevance between the segmental audio data and the topic.
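
The weighting described in Appendix 13 resembles inverse document frequency weighting. The following Python sketch uses a smoothed inverse document frequency, ln((1 + N) / (1 + df)) + 1, as the keyword weight and the weighted fraction of matched keywords as the first relevance. This particular formula, the substring matching, and the function names keyword_weights and first_relevance are assumptions for illustration; the disclosure requires only that keywords contained in more segments receive smaller weights.

```python
import math
from typing import Dict, List


def keyword_weights(segment_texts: List[str], keywords: List[str]) -> Dict[str, float]:
    """Give smaller weights to keywords that appear in more segments."""
    n_segments = len(segment_texts)
    lowered = [text.lower() for text in segment_texts]
    weights: Dict[str, float] = {}
    for keyword in keywords:
        # Number of segments that contain the keyword (document frequency).
        df = sum(1 for text in lowered if keyword.lower() in text)
        weights[keyword] = math.log((1 + n_segments) / (1 + df)) + 1.0
    return weights


def first_relevance(segment_text: str, keywords: List[str], weights: Dict[str, float]) -> float:
    """Weighted fraction of the topic keywords found in one segment (0.0 to 1.0)."""
    total = sum(weights[k] for k in keywords)
    if total == 0:
        return 0.0
    lowered = segment_text.lower()
    matched = sum(weights[k] for k in keywords if k.lower() in lowered)
    return matched / total


segments = ["the price is fine", "price and discount", "price again", "hello"]
topic_keywords = ["price", "discount"]
weights = keyword_weights(segments, topic_keywords)
relevances = [first_relevance(s, topic_keywords, weights) for s in segments]
```

In this example "price" appears in three of the four segments and "discount" in one, so a segment that mentions only "price" scores below 0.5 while a segment that mentions both scores 1.0.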

（付記１４）
関連度算定ステップ（Ｓ５１４）は、第１トピックに関連づけられた複数のキーワードのうち、第１関連度の算定対象となる対象区間音声データから時系列的に所定個数前までの複数の区間音声データに多く含まれるキーワードほど関連度へ与える重みが小さくなるようにし、複数の区間音声データごとに第１トピックに関連づけられた複数のキーワードとの重み付けを考慮した一致度を、第１トピックとの関連度を示す第１関連度として算定する、付記１３記載のプログラム。
これにより、トピックに関連づけられたキーワードのうち、対象となる区間音声データ近傍の複数の過去の区間音声データのみを考慮してより少ない計算量で、区間音声データとトピックとの関連度をより正確に算定することができる。また、トピックとの関連度をリアルタイムで計算することができる。 (Appendix 14)
The program according to Appendix 13, wherein the relevance calculating step (S514) sets the weights so that, among the plurality of keywords associated with the first topic, the more frequently a keyword is contained in the plurality of segmental audio data extending back a predetermined number of segments in chronological order from the target segmental audio data for which the first relevance is to be calculated, the smaller the weight that keyword contributes to the relevance, and calculates, for each of the plurality of segmental audio data, the degree of match with the plurality of keywords associated with the first topic, taking these weights into account, as the first relevance indicating the degree of relevance to the first topic.
As a result, the relevance between the segmental audio data and the topic can be calculated more accurately and with less computation, because only a plurality of past segmental audio data near the target segmental audio data are considered for the keywords associated with the topic. The relevance to the topic can also be calculated in real time.
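
The windowed variant of Appendix 14 can be sketched by computing the keyword weights from only the segments that precede the target segment within a fixed window, which also makes the calculation usable while the dialogue is still in progress. In the Python sketch below, the window size, the exclusion of the target segment itself from the window, the smoothed inverse document frequency formula, and the function name windowed_relevance are assumptions for illustration.

```python
import math
from typing import Dict, List


def windowed_relevance(segment_texts: List[str], keywords: List[str], window: int = 3) -> List[float]:
    """First relevance per segment, weighting keywords by the preceding `window` segments only."""
    lowered = [text.lower() for text in segment_texts]
    lowered_keywords = [k.lower() for k in keywords]
    scores: List[float] = []
    for i, text in enumerate(lowered):
        recent = lowered[max(0, i - window):i]  # past segments near the target segment
        weights: Dict[str, float] = {}
        for keyword in lowered_keywords:
            df = sum(1 for past in recent if keyword in past)
            # Keywords that were frequent in the recent past get smaller weights.
            weights[keyword] = math.log((1 + len(recent)) / (1 + df)) + 1.0
        total = sum(weights.values())
        matched = sum(weight for keyword, weight in weights.items() if keyword in text)
        scores.append(matched / total if total else 0.0)
    return scores


scores = windowed_relevance(
    ["price", "price", "price discount", "discount"],
    ["price", "discount"],
    window=2,
)
```

Because each score depends only on the current segment and a bounded number of earlier ones, the cost per segment does not grow with the length of the dialogue.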

（付記１５）
区間特定ステップ（Ｓ５２５）は、時系列的に並べられた複数の区間音声データのそれぞれに対して算定された第１関連度に基づき移動平均を算定するステップと、算定された移動平均が所定値以上の区間音声データを、第１区間群として特定するステップと、を含む、付記４記載のプログラム。
これにより、発話区間ごとに関連度が高いトピックが短期間で切り替わる場合においても、トピックの関連度を平滑化することにより、トピックについて言及している区間群をまとめて特定することができる。対話において、話者がどのような話題について発話を行ったのか、ユーザはより確認しやすくなる。 (Appendix 15)
The program according to Appendix 4, wherein the segment identifying step (S525) includes: a step of calculating a moving average based on the first relevance calculated for each of the plurality of segmental audio data arranged in chronological order; and a step of identifying, as the first segment group, the segmental audio data for which the calculated moving average is equal to or greater than a predetermined value.
As a result, even when topics with high relevance for each utterance segment change in a short period of time, by smoothing the relevance of the topic, it is possible to collectively identify a segment group referring to the topic. It becomes easier for the user to confirm what topic the speaker spoke about in the dialogue.
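
The smoothing effect described for Appendix 15 can be illustrated with a short Python sketch. A centered moving average with a window of three segments, truncated at both ends of the dialogue, and the threshold of 0.45 are assumptions for illustration; the disclosure does not specify the window shape, the window size, or the predetermined value.

```python
from typing import List


def moving_average(values: List[float], window: int = 3) -> List[float]:
    """Centered moving average; the window is truncated at both ends of the sequence."""
    half = window // 2
    averaged: List[float] = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        averaged.append(sum(values[lo:hi]) / (hi - lo))
    return averaged


def segments_at_or_above(values: List[float], threshold: float) -> List[int]:
    """Indices of the segments whose value is equal to or greater than the threshold."""
    return [i for i, value in enumerate(values) if value >= threshold]


# Hypothetical first relevance per segment, in chronological order.
raw = [0.9, 0.1, 0.8, 0.9, 0.0, 0.0]
smoothed = moving_average(raw, window=3)

raw_group = segments_at_or_above(raw, 0.45)            # segment 1 is missing
first_segment_group = segments_at_or_above(smoothed, 0.45)  # the dip at segment 1 is bridged
```

Thresholding the raw values would split the topic into two fragments around segment 1, whereas thresholding the moving average yields one contiguous group.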

（付記１６）
区間特定ステップ（Ｓ５２５）は、時系列的に並べられた複数の区間音声データのうち、算定された第１関連度が所定値以上の連続する複数の区間音声データを、第１区間群として特定するステップである、付記４記載のプログラム。
これにより、特定のトピックについて連続して関連度が高い区間音声データを、トピックについて言及している区間群としてまとめて特定することができる。対話において、話者がどのような話題について発話を行ったのか、ユーザはより確認しやすくなる。 (Appendix 16)
The program according to Appendix 4, wherein the segment identifying step (S525) is a step of identifying, as the first segment group, a plurality of consecutive segmental audio data whose calculated first relevance is equal to or greater than a predetermined value, among the plurality of segmental audio data arranged in chronological order.
As a result, it is possible to collectively identify the segment audio data continuously having a high degree of relevance to a specific topic as a segment group referring to the topic. It becomes easier for the user to confirm what topic the speaker spoke about in the dialogue.
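
Identifying runs of consecutive segments at or above the threshold, as in Appendix 16, can be sketched as follows in Python. The minimum run length of two segments, the (start, end) index representation of a group, and the function name consecutive_groups are assumptions for illustration.

```python
from typing import List, Optional, Tuple


def consecutive_groups(
    relevances: List[float],
    threshold: float,
    min_length: int = 2,
) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs of runs of consecutive segments at or above the threshold."""
    groups: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, relevance in enumerate(relevances):
        if relevance >= threshold:
            if start is None:
                start = i  # a new run begins here
        else:
            if start is not None and i - start >= min_length:
                groups.append((start, i - 1))
            start = None
    # Close a run that lasts until the final segment.
    if start is not None and len(relevances) - start >= min_length:
        groups.append((start, len(relevances) - 1))
    return groups


# Hypothetical first relevance per segment, in chronological order.
groups = consecutive_groups([0.2, 0.7, 0.8, 0.9, 0.1, 0.6, 0.3, 0.7, 0.9], threshold=0.5)
```

Here segments 1 to 3 and 7 to 8 each form a group, while the isolated segment 5 is ignored because it is not part of a run of consecutive segments.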

（付記１７）
プロセッサと、記憶部とを備える情報処理装置であって、プロセッサは、付記１から１６のいずれか記載のプログラムを実行する、情報処理装置。
これにより、対話において話者がどのような話題（トピック）に関してコミュニケーションを行ったのか、ユーザは一目で確認することができる。 (Appendix 17)
An information processing apparatus comprising a processor and a storage unit, wherein the processor executes the program according to any one of Appendixes 1 to 16.
As a result, the user can confirm at a glance what topic the speaker has communicated about in the dialogue.

（付記１８）
プロセッサと、記憶部とを備える情報処理装置を含む情報処理システムであって、プロセッサは、付記１から１６のいずれか記載のプログラムを実行する、情報処理システム。
これにより、対話において話者がどのような話題（トピック）に関してコミュニケーションを行ったのか、ユーザは一目で確認することができる。 (Appendix 18)
An information processing system including an information processing device comprising a processor and a storage unit, wherein the processor executes the program according to any one of Appendices 1 to 16.
As a result, the user can confirm at a glance what topic the speaker has communicated about in the dialogue.

（付記１９）
プロセッサと、記憶部とを備えるコンピュータにより実行される情報処理方法であって、プロセッサに、付記１から１６のいずれか記載のプログラムを実行させる、情報処理方法。
これにより、対話において話者がどのような話題（トピック）に関してコミュニケーションを行ったのか、ユーザは一目で確認することができる。 (Appendix 19)
An information processing method executed by a computer comprising a processor and a storage unit, the information processing method comprising causing the processor to execute the program according to any one of Appendices 1 to 16.
As a result, the user can confirm at a glance what topic the speaker has communicated about in the dialogue.

（付記２０）
プロセッサと、表示装置とを備える情報処理端末であって、プロセッサは、付記５から８のいずれか記載のプログラムを実行可能な情報処理装置において実行される提示ステップにより提示された情報を表示装置に表示可能である、情報処理端末。
これにより、対話において話者がどのような話題（トピック）に関してコミュニケーションを行ったのか、ユーザは一目で確認することができる。 (Appendix 20)
An information processing terminal comprising a processor and a display device, wherein the processor can display, on the display device, information presented by the presenting step executed in an information processing device capable of executing the program according to any one of Appendices 5 to 8.
As a result, the user can confirm at a glance what topic the speaker has communicated about in the dialogue.

１システム、１０サーバ、１０１記憶部、１０４制御部、１０６入力装置、１０８出力装置、２０第１ユーザ端末、２０１記憶部、２０４制御部、２０６入力装置、２０８出力装置、３０第２ユーザ端末、３０１記憶部、３０４制御部、３０６入力装置、３０８出力装置、５０ＣＲＭシステム、５０１記憶部、５０４制御部、５０６入力装置、５０８出力装置、６０音声サーバ（ＰＢＸ）、６０１記憶部、６０４制御部、６０６入力装置、６０８出力装置
1 system, 10 server, 101 storage unit, 104 control unit, 106 input device, 108 output device, 20 first user terminal, 201 storage unit, 204 control unit, 206 input device, 208 output device, 30 second user terminal, 301 storage unit, 304 control unit, 306 input device, 308 output device, 50 CRM system, 501 storage unit, 504 control unit, 506 input device, 508 output device, 60 voice server (PBX), 601 storage unit, 604 control unit , 606 input device, 608 output device

Claims

A program for causing a computer comprising a processor and a storage unit to process information relating to a dialogue between a first user and a second user,
the program causing the processor to execute:
a receiving step of receiving audio data relating to the dialogue;
a voice extracting step of extracting a plurality of segmental audio data, one for each utterance segment, from the audio data received in the receiving step;
a segment identifying step of identifying, among the plurality of segmental audio data, one or more segmental audio data related to a first topic relating to a predetermined subject;
a summarizing step of generating, based on the one or more segmental audio data identified in the segment identifying step and the first topic, summary text summarizing the text information contained in the one or more segmental audio data;
a keyword receiving step of receiving one or more keywords from the first user; and
a topic storing step of storing the one or more keywords received in the keyword receiving step in association with the first topic relating to a predetermined subject.

The program according to claim 1,
wherein the summarizing step is a step of generating the summary text summarizing the text information contained in the one or more segmental audio data by extracting, from that text information, the portions highly relevant to the first topic.

The program according to claim 1,
wherein the summarizing step is a step of generating the summary text by applying, as input data to a learning model, the text information contained in the one or more segmental audio data and the plurality of keywords associated with the first topic.

The program according to claim 1,
wherein the program causes the processor to execute a relevance calculating step of calculating, for each of the plurality of segmental audio data, a first relevance indicating the degree of relevance to the first topic,
the segment identifying step is a step of identifying a first segment group including, among the plurality of segmental audio data, one or more segmental audio data whose first relevance calculated in the relevance calculating step is equal to or greater than a predetermined value, and
the summarizing step is a step of generating, based on the one or more segmental audio data included in the first segment group identified in the segment identifying step and the first topic, summary text summarizing the text information contained in the one or more segmental audio data.

The program according to claim 1,
wherein the program causes the processor to execute a presenting step of presenting the summary text generated in the summarizing step in association with the one or more segmental audio data.

The program according to claim 4,
wherein the program causes the processor to execute a presenting step of presenting the summary text generated in the summarizing step in association with the first segment group identified in the segment identifying step.

The program according to claim 4,
wherein the program causes the processor to execute a presenting step of presenting the first segment group identified in the segment identifying step in association with the first topic.

The program according to claim 7,
wherein the presenting step is a step of presenting the first segment group identified in the segment identifying step on the same time axis as a speech graph, the speech graph being obtained by analyzing the audio data received in the receiving step and showing the time-series transition of the speakers' utterance status, and of presenting the first topic in association with the first segment group.

The program according to claim 1,
wherein the program causes the processor to execute:
a voice storing step of storing the audio data received in the receiving step; and
a keyword presenting step of presenting to the first user, based on the audio data stored in the voice storing step, one or more new keywords to be newly associated with the first topic, and
the keyword receiving step is a step of receiving one or more keywords selected by the first user from among the plurality of new keywords presented to the first user in the keyword presenting step.

The program according to claim 4,
wherein the voice extracting step is a step of extracting, before the dialogue ends, a plurality of segmental audio data, one for each utterance segment, from the audio data received in the receiving step, and
the relevance calculating step is a step of calculating, before the dialogue ends, the first relevance indicating the degree of relevance to the first topic for each segmental audio data included in the plurality of segmental audio data.

The program according to claim 4,
wherein the relevance calculating step is a step of calculating, for each of the plurality of segmental audio data, the relevance to each of a plurality of topics, each topic being associated with a plurality of keywords, and
the program causes the processor to execute:
a memo identifying step of identifying a response memo for the dialogue based on the relevance for each of the plurality of topics calculated in the relevance calculating step; and
a storing step of storing the response memo identified in the memo identifying step in association with the dialogue.

The relevance calculation step includes:
weighting the plurality of keywords associated with the first topic such that the more often a keyword is included in the plurality of segmental audio data extracted in the voice extracting step, the smaller the weight that keyword is given in the degree of relevance; and
calculating, for each of the plurality of segmental audio data, a degree of matching with the plurality of keywords associated with the first topic, taking the weighting into account, as the first degree of relevance indicating the degree of relevance to the first topic,
The program according to claim 4.
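
The frequency-dependent weighting recited above can be illustrated with a minimal sketch. It assumes each segmental audio data has already been transcribed to text, that keyword matching is plain substring matching, and that the weight follows a smoothed inverse-document-frequency formula; the claim fixes none of these, and the function names `keyword_weights` and `first_relevance` are hypothetical.

```python
import math

def keyword_weights(segments, keywords):
    """Give smaller weights to keywords that occur in more segments.

    Assumption: a smoothed inverse-document-frequency formula; the claim only
    requires that the weight shrink as the keyword becomes more common.
    """
    n = len(segments)
    weights = {}
    for kw in keywords:
        # Number of segment transcripts that contain the keyword.
        df = sum(1 for text in segments if kw in text)
        weights[kw] = math.log((n + 1) / (df + 1)) + 1.0
    return weights

def first_relevance(segments, keywords):
    """Weighted keyword-matching score for each segment transcript."""
    weights = keyword_weights(segments, keywords)
    return [sum(w for kw, w in weights.items() if kw in text) for text in segments]

segments = ["price plan price", "the price is fine", "contract renewal", "hello"]
keywords = ["price", "contract"]
weights = keyword_weights(segments, keywords)
scores = first_relevance(segments, keywords)
```

In this example "price" appears in two segments and "contract" in one, so "contract" receives the larger weight and the third segment scores higher than the first two.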

The relevance calculation step includes:
weighting the plurality of keywords associated with the first topic such that the more often a keyword is included in the segmental audio data, up to a predetermined number, counted in chronological order from the target segmental audio data for which the first degree of relevance is to be calculated, the smaller the weight that keyword is given in the degree of relevance; and
calculating, for each of the plurality of segmental audio data, a degree of matching with the plurality of keywords associated with the first topic, taking the weighting into account, as the first degree of relevance indicating the degree of relevance to the first topic,
The program according to claim 12.
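
The windowed variant above can be sketched as follows. This is only an illustration under stated assumptions: the window is taken to be the target segment plus the segments immediately preceding it (the claim does not say in which direction the predetermined number is counted), segments are already transcribed, matching is by substring, and the weight is the reciprocal of the keyword's count within the window. The name `windowed_relevance` and the parameter `window` are hypothetical.

```python
def windowed_relevance(segments, keywords, window=3):
    """Score each segment, down-weighting keywords that are frequent nearby.

    Assumption: the window is the target segment and up to window - 1
    segments immediately before it.
    """
    scores = []
    for i, text in enumerate(segments):
        recent = segments[max(0, i - window + 1): i + 1]
        score = 0.0
        for kw in keywords:
            if kw in text:
                # Count of window segments containing the keyword (at least 1,
                # because the target segment itself contains it).
                df = sum(1 for s in recent if kw in s)
                score += 1.0 / df
        scores.append(score)
    return scores

scores = windowed_relevance(
    ["price", "price", "price", "contract", "hello"],
    ["price", "contract"],
    window=3,
)
```

Here the repeated keyword "price" contributes less with each consecutive segment, while "contract", which is new within its window, contributes its full weight.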

The section identification step includes:
calculating a moving average based on the first relevance calculated for each of the plurality of segment audio data arranged in time series;
identifying, as the first segment group, the segmental audio data for which the calculated moving average is equal to or greater than a predetermined value,
The program according to claim 4.
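
A minimal sketch of the moving-average selection above, assuming a centered window that shrinks at the edges of the sequence and illustrative values for the window size and threshold; the claim specifies neither. The names `moving_average` and `first_segment_group` are hypothetical.

```python
def moving_average(values, window=3):
    """Centered moving average; near the edges, average the neighbors that exist."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def first_segment_group(relevances, window=3, threshold=0.5):
    """Indices of segments whose smoothed relevance is at or above the threshold."""
    averaged = moving_average(relevances, window)
    return [i for i, avg in enumerate(averaged) if avg >= threshold]

# First degree of relevance per segment, in time order (illustrative values).
relevances = [0.0, 0.9, 0.0, 0.9, 0.9, 0.0, 0.0]
group = first_segment_group(relevances)  # [2, 3, 4]
```

Segment 2 has a relevance of 0.0 on its own but is still selected because its neighbors are relevant, whereas the isolated spike at segment 1 is smoothed away. That is the practical effect of thresholding the moving average rather than the raw values.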

The section identification step includes:
identifying, as the first segment group, a plurality of consecutive segmental audio data, among the plurality of segmental audio data arranged in time series, for which the calculated first degree of relevance is equal to or greater than a predetermined value,
The program according to claim 4.
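
The consecutive-run selection above might look like the following sketch. It assumes that "a plurality of consecutive" means a run of at least two segments and returns every such run; the claim names only one first segment group, so choosing among the runs is left open. The name `consecutive_groups` and its default parameters are hypothetical.

```python
def consecutive_groups(relevances, threshold=0.5, min_length=2):
    """Find runs of consecutive segments whose relevance meets the threshold.

    Assumption: a run must contain at least min_length segments to qualify.
    """
    groups, run = [], []
    for i, r in enumerate(relevances):
        if r >= threshold:
            run.append(i)
        else:
            if len(run) >= min_length:
                groups.append(run)
            run = []
    # Close a run that extends to the final segment.
    if len(run) >= min_length:
        groups.append(run)
    return groups

groups = consecutive_groups([0.9, 0.1, 0.8, 0.7, 0.6, 0.2, 0.9, 0.9])
# groups == [[2, 3, 4], [6, 7]]
```

Unlike the moving-average approach, a single low-relevance segment splits a run here, and an isolated high-relevance segment (index 0) is discarded.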

An information processing device comprising a processor and a storage unit,
The processor executes the program according to any one of claims 1 to 15,
An information processing device.

An information processing system including an information processing device comprising a processor and a storage unit,
The processor executes the program according to any one of claims 1 to 15,
An information processing system.

An information processing method executed by a computer comprising a processor and a storage unit,
causing the processor to execute the program according to any one of claims 1 to 15,
An information processing method.