JP6524242B2

JP6524242B2 - Speech recognition result display device, speech recognition result display method, speech recognition result display program

Info

Publication number: JP6524242B2
Application number: JP2017538034A
Authority: JP
Inventors: 孝彦中野
Original assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Current assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Priority date: 2015-08-31
Filing date: 2016-08-30
Publication date: 2019-06-05
Anticipated expiration: 2036-08-30
Also published as: WO2017038794A1; JPWO2017038794A1

Description

Embodiments of the present invention relate to a speech recognition result display device, a speech recognition result display method, and a speech recognition result display program.

Conventionally, techniques have been provided for performing speech recognition on speech data input from a client terminal (a smartphone, a PC, or the like), converting it into text, and displaying the text as a recognition result on the display screen of the client terminal. Techniques have also been provided for, for example, displaying utterances in a meeting or the like in chronological order along a time axis, and for displaying speech sections and silent sections so that they can be distinguished.

Japanese Patent No. 5685702

The problem to be solved by the present invention is to provide a speech recognition result display device that allows the theme, agenda, and the like of the conversation in each time period to be checked in a simple manner, based on automatically extracted keywords, for a series of utterances made in the past.

The speech recognition result display device of the embodiment includes: keyword extraction means for extracting, at a predetermined timing, character strings contained in speech text data that is the result of speech recognition processing on speech data; a storage unit that records the speech text data and the character strings extracted by the keyword extraction means; and keyword search means for searching the storage unit, based on a display request for speech recognition results from a client terminal, for character strings extracted from the speech text data in a predetermined time specified by the client terminal. The character strings found by the keyword search means are displayed as keywords on the screen of the client terminal.

Brief Description of the Drawings
FIG. 1 is a block diagram showing the overall configuration of the speech recognition result display system according to the first embodiment.
FIG. 2 is a diagram showing an example of the screen display of the user terminal according to the first embodiment.
FIG. 3 is a diagram showing an example of speech data and related information according to the first embodiment.
FIG. 4 is a diagram showing an example of keywords extracted from speech recognition results according to the first embodiment.
FIG. 5 is a diagram showing an example of the keyword display shown on the screen of the user terminal according to the first embodiment.
FIG. 6 is a diagram showing an example of keywords extracted from speech recognition results according to the first embodiment.
FIG. 7 is a diagram showing an example of the keyword display shown on the screen of the user terminal according to the first embodiment.
FIG. 8 is a diagram showing an example of keywords to be excluded from display among the keywords extracted from speech recognition results according to the first embodiment.
FIG. 9 is a diagram showing an example of the keyword display shown on the screen of the user terminal according to the first embodiment.
FIG. 10 is a diagram showing the processing flow of the speech recognition result display system according to the first embodiment.
FIG. 11 is a diagram showing an example of the keyword display shown on the screen of the user terminal according to the first embodiment.
FIG. 12 is a diagram showing an example of the speech recognition results shown on the screen of the user terminal according to the first embodiment.
FIG. 13 is a diagram showing an example of the speech recognition results shown on the screen of the user terminal according to the first embodiment.
FIG. 14 is a diagram showing an example of the speech recognition results shown on the screen of the user terminal according to the first embodiment.
FIG. 15 is a diagram showing the flow of the speech text data display processing for speech recognition results according to the first embodiment.
FIG. 16 is a diagram showing an example of the speech recognition results shown on the screen of the user terminal according to the first embodiment.

(First Embodiment)
A first embodiment of the present invention is described below with reference to the drawings.

FIG. 1 is a block diagram showing the overall configuration of the speech recognition result display system according to the first embodiment. As shown in FIG. 1, the speech recognition result display system comprises a user terminal (client terminal) 100, a speech recognition result display device 200, and a speech recognition server 300, which are connected to one another via a network 400. The speech recognition server 300 includes an input unit 301, a speech recognition processing unit 302, and an output unit 303. It receives speech data input from the user terminal 100 via the speech recognition result display device 200, and transmits the speech recognition result (speech text data) produced by the speech recognition processing unit 302 to the speech recognition result display device 200.

The user terminal 100 is, for example, a smartphone, a tablet, or a PC, and includes an input/output unit 101, a display control unit 102, and a display unit 103. The input/output unit 101 transmits the following to the speech recognition result display device 200: speech from a microphone (not shown), a request to send the speech recognition results to be displayed on the display unit 103, and requests such as display switching for the speech recognition results displayed on the screen in response to that request. The display unit 103 displays the recognition results for the speech transmitted from the input/output unit 101 to the speech recognition result display device 200.

The speech data transmitted from the input/output unit 101 to the speech recognition result display device 200 is sent together with identification information for identifying the speaker and the utterance time. The request information, sent from the input/output unit 101, for the speech recognition results to be displayed on the display unit 103 includes the date and time indicating the display request range. This date and time is acquired based on a user (client) operation on the screen displayed when the user terminal 100 is started, as shown in FIG. 2. For example, a time axis whose origin is the time of startup, taken from the clock built into the user terminal 100, can be displayed, and the date and time indicating the display request range can be acquired based on a user operation on the displayed time axis.

FIG. 2 is an example of the screen displayed when the user terminal 100 is started. In the display example shown in FIG. 2, the time axis 501 displayed on the screen 500 runs from 13:00 to 15:00 on July 31, 2015, and the time specified by the user by operating the pointer 502 is 14:00. When the user operates the pointer 502 to specify 14:00, the user terminal 100 determines the display request range to be requested from the speech recognition result display device 200 from the time axis 501 displayed on the screen 500 at the time of the operation, and transmits time information specifying the display request range from the input/output unit 101 to the speech recognition result display device 200. In the example of FIG. 2, the time information specifying the display range is 13:00 to 15:00 on July 31, 2015. The display range of the time axis 501 shown in FIG. 2 can be switched to a predetermined range, such as one day, half a day, or one hour, by an operation on the user terminal 100. FIG. 2 shows an example in which the display range has been set to two hours.
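The derivation of the display request range from the visible time axis can be sketched as follows. This is a minimal illustration only: the names `TimeAxis` and `display_request_range` are hypothetical, and the sketch assumes that the requested range is simply the whole visible axis, as in the FIG. 2 example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class TimeAxis:
    """Time axis currently visible on the terminal screen (hypothetical model)."""
    start: datetime
    span: timedelta  # e.g. one day, half a day, two hours, one hour

    @property
    def end(self) -> datetime:
        return self.start + self.span


def display_request_range(axis: TimeAxis, pointer: datetime) -> tuple:
    """Return the (start, end) range to request when the pointer is operated.

    Assumption: the requested range is the whole visible axis, and the
    pointer must lie on that axis.
    """
    if not (axis.start <= pointer <= axis.end):
        raise ValueError("pointer must lie on the visible time axis")
    return (axis.start, axis.end)


# FIG. 2 example: a two-hour axis starting at 13:00, pointer at 14:00.
axis = TimeAxis(start=datetime(2015, 7, 31, 13, 0), span=timedelta(hours=2))
requested = display_request_range(axis, pointer=datetime(2015, 7, 31, 14, 0))
print(requested[0].isoformat(), requested[1].isoformat())
```

Switching the display range to one day or one hour would, under this assumption, only change the `span` value.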

The display control unit 102 performs control so that the speech recognition results requested from the speech recognition result display device 200 via the input/output unit 101 are displayed on the display unit 103. In the present embodiment, the display control unit 102 adjusts the display content and display position based on the keywords and the information indicating display positions received as display target information from the speech recognition result display device 200, and displays the result on the display unit 103.

The speech recognition result display device 200 includes a speech data input unit 201, a speech data output unit 202, a keyword extraction unit 203, a keyword importance calculation unit 204, a storage unit 205, and a keyword search unit 206.

The speech data input unit 201 receives the speech data transmitted from the input/output unit 101 of the user terminal 100, registers it in the storage unit 205, and transmits it to the speech recognition server 300 for speech recognition processing. It also receives the result (speech text data) processed by the speech recognition processing unit 302 of the speech recognition server 300 and registers it in the storage unit 205.

FIG. 3 is a diagram showing an example of the speech data registered in the storage unit 205. As shown in FIG. 3, speech data input from the user terminal 100 is registered together with identification information for identifying the speaker and the utterance time. In addition, the result of the recognition processing by the speech recognition server 300 is registered in association with the speech data.
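The registration described for FIG. 3 can be pictured with a small in-memory model. The sketch below is illustrative only: `UtteranceRecord`, `Storage`, and the `utterance_id` key are hypothetical names, and the source does not specify how records are keyed or stored.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional


@dataclass
class UtteranceRecord:
    speaker_id: str              # identification information for the speaker
    spoken_at: datetime          # utterance time
    audio: bytes                 # raw speech data
    text: Optional[str] = None   # recognition result, attached later


class Storage:
    """Hypothetical stand-in for the storage unit 205."""

    def __init__(self) -> None:
        self._records: Dict[str, UtteranceRecord] = {}

    def register_audio(self, utterance_id: str, speaker_id: str,
                       spoken_at: datetime, audio: bytes) -> None:
        # Registered when speech data arrives from the terminal.
        self._records[utterance_id] = UtteranceRecord(speaker_id, spoken_at, audio)

    def attach_text(self, utterance_id: str, text: str) -> None:
        # Registered when the recognition result comes back from the server.
        self._records[utterance_id].text = text

    def get(self, utterance_id: str) -> UtteranceRecord:
        return self._records[utterance_id]


storage = Storage()
storage.register_audio("u1", "userA", datetime(2015, 7, 31, 13, 5), b"\x00\x01")
storage.attach_text("u1", "Let's talk about Tokyo.")
```

Keeping the text optional reflects that the speech data is registered before its recognition result is available.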

The speech data output unit 202 searches for and retrieves the speech data registered in the storage unit 205 via the speech data input unit 201 in response to a request from the user terminal 100, and transmits it to the user terminal 100. By receiving the speech data, the user terminal 100 can play back the speech.

The keyword extraction unit 203 extracts character strings contained in the speech recognition results registered in the storage unit 205 and identifies the character strings that will serve as keywords to be displayed on the user terminal 100. The part of speech of an extracted character string may be any part of speech that can serve as a keyword, for example a noun or a verb.

FIG. 4 is a diagram showing the result of keyword extraction. The keyword extraction unit 203 extracts candidate keyword strings at a preset time interval (once a day, once every half day, once an hour, and so on). As shown in FIG. 4, each extracted character string is recorded in the storage unit 205 as a keyword together with the time at which it was extracted. FIG. 4 is an example recording the keyword strings extracted between 13:00 and 15:00 on July 31, 2015. It shows that the character string "Tokyo" was extracted at 13:05 on July 31, 2015, "homework" at 14:40, "setting" at 13:15, and "education" at 14:00 on the same day. If, for example, the character string "Tokyo" is extracted multiple times in a single run of the keyword extraction unit 203, it is not recorded each time; instead, the number of occurrences is counted and reflected in the weight by the keyword importance calculation unit 204 described later.
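A single extraction run with occurrence counting might look like the following sketch. It assumes the recognized text has already been split into (surface, part-of-speech) pairs by some morphological analyzer, which is not shown, and it assumes that the recorded time is the time of the first utterance containing the string. The function name `extract_keywords` and the part-of-speech labels are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Parts of speech treated as keyword candidates (nouns and verbs, per the text).
KEYWORD_POS = {"noun", "verb"}


def extract_keywords(utterances):
    """Run one extraction pass.

    utterances: iterable of (spoken_at, [(surface, pos), ...]).
    Returns {surface: {"time": first occurrence, "count": occurrences}}.
    """
    counts = Counter()
    first_seen = {}
    for spoken_at, tokens in sorted(utterances, key=lambda u: u[0]):
        for surface, pos in tokens:
            if pos not in KEYWORD_POS:
                continue
            counts[surface] += 1                        # count, do not re-record
            first_seen.setdefault(surface, spoken_at)   # keep the earliest time
    return {word: {"time": first_seen[word], "count": counts[word]}
            for word in counts}


utterances = [
    (datetime(2015, 7, 31, 13, 20), [("Tokyo", "noun"), ("go", "verb")]),
    (datetime(2015, 7, 31, 13, 5), [("Tokyo", "noun"), ("in", "particle")]),
]
keywords = extract_keywords(utterances)
```

A repeated string such as "Tokyo" yields one entry with a count of 2 rather than two entries, which is the behavior the paragraph describes.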

The range searched during keyword extraction varies with the execution timing of the keyword extraction unit 203 described above: the search target may be one day, half a day, one hour, and so on.

The method described here is one in which the keyword extraction unit 203 automatically extracts character strings from the recognition results of the speech data registered in the storage unit 205 and identifies keywords. Alternatively, the user may register specific keywords in advance; when the keyword extraction unit 203 runs, it then searches for whether the user-registered keywords are contained in the recognition results and counts their occurrences in the same way as above.

The keyword importance calculation unit 204 calculates a weight based on the number of occurrences of each character string extracted by the keyword extraction unit 203, and sets the calculated result as the weight of the extracted character string, as shown in FIG. 4. This weighting may take into account not only the number of occurrences but also where the character string appears in the utterances, for example, whether the same character string appears multiple times within a single utterance in the speech data for the predetermined time, whether it appears in the utterances of multiple people rather than only one user, or whether it appears throughout all of the speech data for the predetermined time. Weighting based on frequency of occurrence makes it possible to pick out the important keywords from among the extracted character strings. As with keyword extraction, the weights may also be set by the user.
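The source does not give a formula for the weight. One simple possibility, shown here purely as an assumption, is to divide each occurrence count by the largest count so that weights fall between 0 and 1. The function name `keyword_weights` and the counts below are hypothetical; the counts were chosen only so that the resulting weights match the values quoted for FIG. 4.

```python
from typing import Dict


def keyword_weights(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale occurrence counts into weights in (0, 1].

    Assumption: weight = count / largest count. The patent only states that
    the weight is based on the number of occurrences.
    """
    if not counts:
        return {}
    top = max(counts.values())
    return {word: count / top for word, count in counts.items()}


# Hypothetical occurrence counts for one extraction run.
counts = {"homework": 100, "Tokyo": 95, "education": 32, "setting": 30}
weights = keyword_weights(counts)
for word, weight in weights.items():
    print(f"{word}: {weight:.2f}")
```

Factors such as the number of distinct speakers could be folded in as additional multipliers, but that goes beyond what this sketch covers.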

The storage unit 205 records the speech data input from the speech data input unit 201, the text data of the speech recognition results processed by the speech recognition server 300, the keywords extracted by the keyword extraction unit 203, the weights calculated by the keyword importance calculation unit 204, and the like.

The keyword search unit 206 searches the keywords extracted by the keyword extraction unit 203 in response to a display request for speech recognition results from the input/output unit 101 of the user terminal 100, and causes the display unit 103 of the user terminal 100 to display keywords based on the result. On receiving the date and time indicating the display request range from the input/output unit 101 of the user terminal 100, the keyword search unit 206 refers to the keyword extraction result shown in FIG. 4, compares the date and time of the display request range sent from the user terminal 100 with the times at which the keywords were extracted, and identifies the keywords that fall within the display request range. As an example, suppose that 13:00 to 15:00 on July 31, 2015 is received as the display request range from the input/output unit 101 of the user terminal 100. The keyword search unit 206 refers to the extraction times in the keyword extraction result shown in FIG. 4 and identifies "Tokyo", "homework", "setting", and "education" as the keywords falling within 13:00 to 15:00 on July 31, 2015. It then transmits the keywords, their extraction times, and their weights to the user terminal 100 as the response to the display request. The input/output unit 101 of the user terminal 100 receives this information, and the display control unit 102 displays it on the display unit 103.
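The range comparison performed by the keyword search can be sketched as a simple filter. The names `KeywordEntry` and `search_keywords` are hypothetical, the "meeting" entry is invented to show an out-of-range case, and treating both ends of the range as inclusive is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class KeywordEntry:
    keyword: str
    extracted_at: datetime
    weight: float


def search_keywords(entries: List[KeywordEntry],
                    start: datetime, end: datetime) -> List[KeywordEntry]:
    """Return entries whose extraction time falls inside [start, end]."""
    return [e for e in entries if start <= e.extracted_at <= end]


day = (2015, 7, 31)
entries = [
    KeywordEntry("Tokyo", datetime(*day, 13, 5), 0.95),
    KeywordEntry("homework", datetime(*day, 14, 40), 1.0),
    KeywordEntry("setting", datetime(*day, 13, 15), 0.3),
    KeywordEntry("education", datetime(*day, 14, 0), 0.32),
    KeywordEntry("meeting", datetime(*day, 16, 30), 0.5),  # hypothetical, out of range
]
found = search_keywords(entries, datetime(*day, 13, 0), datetime(*day, 15, 0))
print([e.keyword for e in found])
```

Each returned entry carries the keyword, its extraction time, and its weight, which are the three items sent back to the terminal.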

Next, a specific method of displaying the speech recognition results on the display unit 103 of the user terminal 100 is described.

FIG. 5 is a diagram showing a display example of the results found by the keyword search unit 206. FIG. 5 displays the four keywords (Tokyo, homework, setting, education) found when the keyword search unit 206 searched with 13:00 to 15:00 on July 31, 2015 as the search range. Each keyword is displayed with a size and form that vary according to the weight value set for it, shown in FIG. 4. As shown in FIG. 4, the weights of the four keywords are 0.95 for "Tokyo", 1.0 for "homework", 0.3 for "setting", and 0.32 for "education". Here the weight is a number between 0 and 1, and a value closer to 1 indicates a more important keyword. "Homework", which has the largest weight, is displayed in a star shape to distinguish its form from the other keywords. The other three keywords are all displayed as circles, and the size of each circle is varied according to its weight so that the display reflects its importance. These shapes are only an example; any display method, such as changing the color, may be used as long as it allows the more important keywords to be identified among multiple keywords.

The display position of each keyword is determined based on the time at which the keyword was extracted, shown in FIG. 4. In the example of FIG. 4, "Tokyo" was extracted at 13:05 on July 31, 2015, so it is displayed near 13:05 as shown in FIG. 5. Similarly, "homework" is displayed near 14:40, "setting" near 13:15, and "education" near 14:00 on July 31, 2015. From this display, it can be seen that many of the utterances between 13:00 and 15:00 on July 31, 2015 contain the keyword "homework", and that such utterances are concentrated around 14:40. By displaying keywords along the time axis in this way, with a display form that varies according to importance, the user can easily get an overview of which keywords the utterances in each time period contained.
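The mapping from extraction time and weight to on-screen position, size, and shape can be sketched as below. Everything numeric here is an assumption made for illustration: the axis width of 600 units, the radius range, and the linear scaling are not taken from the source, and `layout_keywords` is a hypothetical name.

```python
from datetime import datetime


def layout_keywords(entries, axis_start, axis_end,
                    axis_width=600.0, min_radius=20.0, max_radius=60.0):
    """Place (keyword, extracted_at, weight) tuples along a horizontal time axis.

    Assumptions: x grows linearly with time, the radius grows linearly with
    weight, and the highest-weighted keyword is drawn as a star.
    """
    if not entries:
        return []
    span = (axis_end - axis_start).total_seconds()
    top = max(weight for _, _, weight in entries)
    placed = []
    for keyword, extracted_at, weight in entries:
        offset = (extracted_at - axis_start).total_seconds()
        placed.append({
            "keyword": keyword,
            "x": offset * axis_width / span,
            "radius": min_radius + weight * (max_radius - min_radius),
            "shape": "star" if weight == top else "circle",
        })
    return placed


day = (2015, 7, 31)
entries = [
    ("Tokyo", datetime(*day, 13, 5), 0.95),
    ("homework", datetime(*day, 14, 40), 1.0),
    ("setting", datetime(*day, 13, 15), 0.3),
    ("education", datetime(*day, 14, 0), 0.32),
]
placed = layout_keywords(entries, datetime(*day, 13, 0), datetime(*day, 15, 0))
```

With these values, "homework" is the only star and sits toward the right end of the axis, while "Tokyo" is a large circle near the left end.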

In the present embodiment, an example has been described in which the keyword extraction unit 203 of the speech recognition result display device 200 automatically extracts keywords from the speech data and the extracted keywords are displayed on the user terminal 100. When there are many keywords, however, the number that can be shown on the display screen of the user terminal 100 is limited, so keywords with low weights may not appear on the screen. The user may therefore be allowed to configure keywords that the user does not need so that they are excluded from display. This method is described briefly with reference to FIGS. 6 to 9.

FIG. 6 is a diagram showing a keyword extraction result; it is the same kind of result as FIG. 4, although the registered keywords differ. As shown in FIG. 6, five extracted keywords are registered, but because of the display area of the display unit 103 of the user terminal 100, only four keywords can be displayed, as shown in FIG. 7. As a result, "setting", the keyword with the lowest weight, is not displayed. Here, by setting keywords to be excluded from display as shown in FIG. 8, a keyword registered as an excluded keyword is removed from the display targets even if its weight is high, and the highest-weighted keyword that could not previously be displayed is shown in its place. FIG. 9 is an example in which "play" is registered as an excluded keyword; based on this exclusion setting, "setting" is displayed in place of "play".
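The effect of the exclusion setting can be sketched as filtering followed by a top-N selection. The function name `select_for_display` is hypothetical, and the weights below are illustrative values only (the source does not state the weights in FIG. 6, in particular the one for "play").

```python
from typing import Dict, List, Set


def select_for_display(weights: Dict[str, float], excluded: Set[str],
                       capacity: int) -> List[str]:
    """Pick up to `capacity` keywords by descending weight, skipping excluded ones."""
    candidates = [(word, w) for word, w in weights.items() if word not in excluded]
    candidates.sort(key=lambda item: item[1], reverse=True)
    return [word for word, _ in candidates[:capacity]]


# Illustrative weights for five registered keywords.
weights = {"Tokyo": 0.95, "homework": 1.0, "setting": 0.3,
           "education": 0.32, "play": 0.8}

without_exclusion = select_for_display(weights, excluded=set(), capacity=4)
with_exclusion = select_for_display(weights, excluded={"play"}, capacity=4)
print(without_exclusion)
print(with_exclusion)
```

With room for four keywords, "setting" is dropped when nothing is excluded and reappears once "play" is excluded, mirroring the FIG. 7 and FIG. 9 examples.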

The processing flow of the speech recognition result display system according to the present embodiment is now described with reference to FIG. 10. FIG. 10 is a diagram showing the processing flow of the speech recognition result display system according to the first embodiment.

The user terminal 100 transmits the user's utterance (speech data) from a microphone or the like, together with identification information identifying the speaker and the utterance time, from the input/output unit 101 to the speech recognition result display device 200 (step S1). The speech recognition result display device 200 transmits the speech data received by the speech data input unit 201 to the speech recognition server 300, and records the speech data, the identification information identifying the speaker, and the utterance time in the storage unit 205 (step S2). The speech recognition server 300 performs recognition processing on the speech data received by the input unit 301 in the speech recognition processing unit 302, and transmits the text data of the recognition result to the speech recognition result display device 200 (step S3). The speech recognition result display device 200 records the speech text data of the recognition result received from the speech recognition server 300 in association with the speech data recorded in step S2 (step S4). In the speech recognition result display device 200, the keyword extraction unit 203 searches, at a preset time interval, for character strings contained in the speech text data recorded in the storage unit 205. Each character string extracted as a search result is recorded in the storage unit 205 together with the time at which it was extracted. If a character string that has already been extracted and recorded is found again during the search, its number of occurrences is counted and recorded (step S5). The keyword importance calculation unit 204 of the speech recognition result display device 200 sets a weight for each character string extracted in step S5 based on its frequency of occurrence (step S6).

Based on the user's operation requesting keyword display of speech recognition results, the user terminal 100 transmits a keyword search request, including date and time information indicating the display request range, from the input/output unit 101 to the speech recognition result display device 200 (step S7). Based on the date and time indicating the display request range in the keyword search request received from the user terminal 100, the speech recognition result display device 200 compares the requested date and time with the times at which the character strings (keywords) recorded in the storage unit 205 in step S5 were extracted, and identifies the keywords extracted within the requested range. It then transmits the identified keywords, their extraction times, and the weights set in step S6 to the user terminal 100 (step S8). The user terminal 100 receives the keywords from the speech recognition result display device 200 at the input/output unit 101; the display control unit 102 adjusts the display position of each keyword from its extraction time and its display size from its weight, and displays the keywords on the display unit 103 (step S9).

Next, display switching and other display methods for the speech recognition results displayed on the display unit 103 of the user terminal 100 by the processing described with the flowchart of FIG. 10 are described with reference to FIGS. 11 to 15.

FIG. 11 is a diagram showing an example of the keyword display of speech recognition results. FIG. 12 is a diagram showing an example of a display in which the speech recognition result corresponding to a keyword has been added by a user operation on the display of FIG. 11.

FIG. 11 displays the results (keywords) of the search performed by the keyword search unit 206 of the speech recognition result display device 200 in response to a request from the user terminal 100. Specifically, it shows the display screen 600 of the user terminal 100 on which the keywords contained in the utterances between 10:00 and 12:00 on July 31, 2015 are displayed. "Friday" and "business trip" are displayed as keywords, and the pointer 602 operated by the user indicates 11:00. On the display screen 600 shown in FIG. 11, when the user moves the pointer 602 along the time axis 601, the position of the pointer 602 on the time axis is transmitted from the user terminal 100 to the speech recognition result display device 200 as needed. The speech recognition result display device 200 refers to the utterance times recorded in the storage unit 205 together with the speech data shown in FIG. 3. If there is an utterance for the time indicated by the pointer received from the user terminal 100, information on the user who made the utterance and the content of the utterance are transmitted to the user terminal 100 and displayed.
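The lookup of utterances for the time indicated by the pointer can be sketched as follows. The source only says that an utterance time is recorded; modeling each utterance as a start time plus a duration is an assumption made here so that a pointer time can be matched against it. The names `Utterance` and `utterances_at`, and the sample utterance text, are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class Utterance:
    speaker_id: str
    start: datetime
    duration: timedelta  # assumption: the recorded utterance time includes a length
    text: str


def utterances_at(utterances: List[Utterance], pointer: datetime) -> List[Utterance]:
    """Return the utterances in progress at the time indicated by the pointer."""
    return [u for u in utterances if u.start <= pointer < u.start + u.duration]


utterances = [
    Utterance("userA", datetime(2015, 7, 31, 11, 0, 0), timedelta(seconds=10),
              "The business trip is on Friday."),
    Utterance("userB", datetime(2015, 7, 31, 11, 20, 0), timedelta(seconds=5),
              "Understood."),
]
hits = utterances_at(utterances, datetime(2015, 7, 31, 11, 0, 5))
for u in hits:
    print(u.speaker_id, u.start.time(), u.text)
```

An empty result corresponds to the case where no utterance exists for the pointer time and nothing is added to the display.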

FIG. 12 shows a display example of this search result.

As shown in FIG. 12, the display control unit 102 performs control based on the search result information that the user terminal 100 received from the speech recognition result display device 200, and displays, on the display unit 103, the utterance content 603 containing the keyword ("Friday") at the position corresponding to the time of the utterance. The utterance content shows the speaker, the time of the utterance, and what was said. This display content is only an example; a summary may be displayed instead of the utterance content, or other information may be displayed. In the above description, a time is selected with the pointer and information on the utterance made at that time is displayed. Alternatively, when the user selects a keyword on the screen, for example by touching it, the user terminal 100 can transmit information on the selected keyword to the speech recognition result display device 200. The speech recognition result display device 200 then searches the speech recognition results (speech text data) recorded in the storage unit 205 together with the speech data shown in FIG. 3 for utterances containing that keyword, and transmits information on the retrieved utterances, including their utterance times, to the user terminal 100, so that the utterance information may be displayed on the time axis of the screen of the user terminal 100 in a form such as a speech balloon.

Only one utterance is displayed here, but if the search finds a plurality of utterances, the content of all of them is displayed. The selected keyword and the unselected keywords may be displayed so that they can be distinguished, for example by graying out the keywords not selected by the user's operation or by highlighting the selected keyword.

Next, other display examples will be described with reference to FIGS. 13 and 14. FIGS. 13 and 14 are examples in which the amount of speech in the time period shown on the screen is displayed together with the keyword display of the speech recognition result.

FIGS. 13 and 14 show the results (keywords) of a search performed by the keyword search unit 206 of the speech recognition result display device 200 in response to a request from the user terminal 100. Specifically, they show the display screen 600 of the user terminal 100, on which keywords contained in utterances made between 13:00 and 15:00 on July 31, 2015 are displayed. While FIG. 5 illustrated a display example showing only keywords, in FIG. 13 the number of utterances within the range of the time axis 601 displayed on the user terminal 100 is represented by varying the height of triangles, which visualizes the time periods with many utterances (utterance count information 604). FIG. 14 uses a line graph (utterance count information 605) instead of triangles to represent the number of utterances in each time period. The number of utterances is obtained as follows. In the processing that the speech recognition result display device 200 performs in response to a keyword search request from the user terminal 100, the keyword search unit 206, in parallel with the keyword search described above, searches the utterance times recorded in the storage unit 205 together with the speech data shown in FIG. 3 using the date and time of the display request range received from the user terminal 100, counts the number of speech data records included in the display request range, and transmits the number of records per predetermined time unit (for example, per 30 minutes) to the user terminal 100. The user terminal 100 displays, on the display unit 103, the result adjusted by the display control unit 102 based on the information received from the speech recognition result display device 200.
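The per-unit record count described above could be computed along the following lines. This is only a sketch under stated assumptions: utterance times are taken to be plain timestamps, the display request range is treated as half-open, and the function name `count_utterances` and its parameters are hypothetical. The 30-minute unit is the example given in the text.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, List


def count_utterances(times: List[datetime],
                     range_start: datetime,
                     range_end: datetime,
                     unit: timedelta = timedelta(minutes=30)) -> Dict[datetime, int]:
    """Count records per time unit inside [range_start, range_end)."""
    counts: Counter = Counter()
    for t in times:
        if range_start <= t < range_end:
            # Integer index of the unit that contains this utterance time.
            index = (t - range_start) // unit
            counts[range_start + index * unit] += 1

    # Emit every unit in order, including units with no utterances,
    # so the terminal can draw a complete triangle row or line graph.
    result: Dict[datetime, int] = {}
    bucket = range_start
    while bucket < range_end:
        result[bucket] = counts.get(bucket, 0)
        bucket += unit
    return result


day = datetime(2015, 7, 31)
times = [day.replace(hour=13, minute=5), day.replace(hour=13, minute=20),
         day.replace(hour=13, minute=40), day.replace(hour=14, minute=55),
         day.replace(hour=15, minute=0)]  # 15:00 lies outside the range

per_unit = count_utterances(times, day.replace(hour=13), day.replace(hour=15))
```

For the 13:00 to 15:00 range of FIGS. 13 and 14 this yields four values, one per 30-minute unit, which the display control unit could map to triangle heights or to points of a line graph.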

The processing flow for displaying the speech text of a speech recognition result according to this embodiment will now be described with reference to FIG. 15. FIG. 15 is a diagram showing the processing flow of speech recognition result display according to the first embodiment.

When the user designates one of the keywords that were acquired from the speech recognition result display device 200 and displayed on the display unit 103, the user terminal 100 transmits the designated keyword and the date and time corresponding to its display position to the speech recognition result display device 200 (step S11). In the speech recognition result display device 200, the keyword search unit 206 uses the designated keyword and the date and time received from the user terminal 100 to search the speech text data corresponding to the speech data uttered at that date and time, as recorded in the storage unit 205, for speech text data containing the designated keyword (step S12). The speech text data containing the designated keyword is transmitted to the user terminal 100 together with accompanying information such as the speaker and the utterance time (step S13). In the user terminal 100, the display control unit 102 treats the speech text data received from the speech recognition result display device 200 as data containing the designated keyword, adjusts its display position, and displays it near the designated keyword shown on the display unit 103.
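Steps S11 to S13 can be sketched as a single server-side search. The sketch assumes that the date and time sent in step S11 identifies the start of a fixed-length display slot and that a keyword match is a simple substring test; the `SpeechRecord` type, the `search_by_keyword` function, and the 30-minute slot length are hypothetical and are not specified in the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class SpeechRecord:
    # Hypothetical layout of one entry of speech text data with its
    # accompanying information (speaker and utterance time).
    speaker: str
    spoken_at: datetime
    text: str


def search_by_keyword(store: List[SpeechRecord],
                      keyword: str,
                      slot_start: datetime,
                      slot_length: timedelta = timedelta(minutes=30)) -> List[SpeechRecord]:
    """Step S12: find speech text in the designated slot that contains the keyword."""
    slot_end = slot_start + slot_length
    return [r for r in store
            if slot_start <= r.spoken_at < slot_end and keyword in r.text]


store = [
    SpeechRecord("Sato", datetime(2015, 7, 31, 10, 40), "The business trip is on Friday."),
    SpeechRecord("Suzuki", datetime(2015, 7, 31, 10, 45), "Please book the hotel."),
    SpeechRecord("Sato", datetime(2015, 7, 31, 11, 20), "Friday is fine for me."),
]

# Step S11 delivers the keyword and the slot; step S13 would return `found`
# (text plus speaker and utterance time) to the user terminal.
found = search_by_keyword(store, "Friday", datetime(2015, 7, 31, 10, 30))
```

Only the 10:40 utterance is returned here, because the 11:20 utterance contains the keyword but lies outside the designated slot.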

Next, another display example will be described with reference to FIG. 16. FIG. 16 is an example of a keyword search screen 700 for searching past keywords for a target keyword.

In the example of FIG. 16, the screen has an input area 701 at the top for the keyword to be searched, and a display area 702 with a time axis 702a in units of months as the vertical axis and a time axis 702b in units of hours as the horizontal axis. Scrolling the screen vertically selects the month to display. Pinching in or out on the screen switches between 12-hour and 24-hour display, and during 12-hour display, scrolling horizontally switches the displayed time period (0:00 to 12:00 / 12:00 to 24:00).

When a keyword is entered in the keyword input area 701 at the top of the screen and the magnifying glass icon is touched, a display request for speech recognition results, containing the entered keyword and the month and time that indicate the selected display request range, is transmitted to the speech recognition result display device 200. Based on the received display request, the keyword search unit 206 searches the keywords extracted by the keyword extraction unit 203 for matching keywords within the display request range and transmits the search result to the user terminal 100. The input/output unit 101 of the user terminal 100 receives the search result, and the display control unit 102 displays a screen such as that shown in FIG. 16 on the display unit 103.

In the example of FIG. 16, a circle is displayed at each month/time position at which the entered keyword ("business trip" in this example) was uttered. The size of the circle represents the number of utterances containing the searched keyword: the display is controlled so that the circle is larger when there are more utterances and smaller when there are fewer. The device can also be configured so that touching a circle displayed in the display area 702 brings up a list of keywords for that time period, as in FIG. 2.
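One way to derive the circles of FIG. 16 is to count matching utterances per month/hour cell and scale a radius by that count. The following is a minimal sketch, assuming substring matching and a linear radius scale with a lower bound; the names `keyword_grid` and `circle_radius` and the pixel values are hypothetical, since the specification only states that more utterances give a larger circle.

```python
from collections import Counter
from datetime import datetime
from typing import List, Tuple


def keyword_grid(records: List[Tuple[datetime, str]], keyword: str) -> Counter:
    """Count utterances containing the keyword per (year, month, hour) cell."""
    grid: Counter = Counter()
    for spoken_at, text in records:
        if keyword in text:
            grid[(spoken_at.year, spoken_at.month, spoken_at.hour)] += 1
    return grid


def circle_radius(count: int, max_count: int,
                  min_radius: float = 4.0, max_radius: float = 20.0) -> float:
    """Scale the radius linearly with the count; no circle for an empty cell."""
    if count <= 0 or max_count <= 0:
        return 0.0
    return max(min_radius, max_radius * count / max_count)


records = [
    (datetime(2015, 6, 12, 9, 15), "The business trip was approved."),
    (datetime(2015, 7, 31, 10, 40), "The business trip is on Friday."),
    (datetime(2015, 7, 31, 10, 50), "Who else joins the business trip?"),
    (datetime(2015, 7, 31, 13, 5), "Let us break for lunch."),
]

grid = keyword_grid(records, "business trip")
largest = max(grid.values())
radii = {cell: circle_radius(n, largest) for cell, n in grid.items()}
```

Each key of `radii` corresponds to one position in the display area 702 (month on the vertical axis, hour on the horizontal axis), and cells without a matching utterance simply have no entry.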

As described above, in this embodiment, character strings contained in the utterances can be displayed as keywords for each time period of the speech recognition result, which makes it easy to grasp what themes and topics were discussed in which time period.

Also, in this embodiment, weighting keywords according to the number of times they appear in the utterances (appearance frequency) allows them to be displayed with different sizes and shapes, which makes it easy to grasp the time periods in which utterances about a displayed keyword were concentrated.
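The frequency-based weighting can be illustrated as follows. The sketch assumes that the weight is the appearance count normalized by the count of the most frequent keyword and that the weight is mapped linearly to a font size; both choices, as well as the names `keyword_weights` and `font_size` and the point sizes, are hypothetical, because the specification does not give a formula for the weight.

```python
from collections import Counter
from typing import Dict, List


def keyword_weights(extracted: List[str]) -> Dict[str, float]:
    """Weight each keyword by its appearance count relative to the most frequent one."""
    counts = Counter(extracted)
    if not counts:
        return {}
    top = max(counts.values())
    return {keyword: n / top for keyword, n in counts.items()}


def font_size(weight: float, smallest: int = 12, largest: int = 32) -> int:
    """Map a weight in the range 0 to 1 to a display size in points."""
    return round(smallest + (largest - smallest) * weight)


# Keywords extracted from the utterances of one time period.
extracted = ["business trip", "Friday", "business trip"]
weights = keyword_weights(extracted)
sizes = {keyword: font_size(w) for keyword, w in weights.items()}
```

With this mapping the keyword that appears most often in the time period is drawn largest, so a period in which utterances about that keyword were concentrated stands out on the time axis.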

Also, in this embodiment, the utterance content containing a displayed keyword can be displayed by an operation on the screen, so the specific content of the utterance can be grasped easily.

Also, in this embodiment, in addition to displaying the keywords contained in the utterances for each time period of the speech recognition result, the number of utterances in each time period can be displayed, which makes it easy to grasp the time periods in which discussion was active, for example in a meeting.

Also, in this embodiment, the user can set which keywords are to be displayed, so that only the necessary keywords are displayed even when the size of the display area limits the number of keywords that can be shown.

In this embodiment, the display control of the speech recognition result has been described as being executed by the user terminal 100, but it may instead be executed by the speech recognition result display device 200.

The methods described in the above embodiment can also be stored on and distributed via storage media such as magnetic disks (floppy (registered trademark) disks, hard disks, etc.), optical discs (CD-ROM, DVD, etc.), magneto-optical disks (MO), and semiconductor memories, as a program that a computer can execute.

The storage medium may use any storage format as long as it can store the program and is readable by a computer.

In addition, an OS (operating system) running on the computer, or MW (middleware) such as database management software or network software, may execute part of each process for realizing this embodiment, based on instructions from the program installed on the computer from the storage medium.

Furthermore, the storage medium in this embodiment is not limited to a medium independent of the computer; it also includes a storage medium that stores, or temporarily stores, a downloaded program transmitted over a LAN, the Internet, or the like.

The number of storage media is not limited to one; the case where the processing of this embodiment is executed from a plurality of media is also covered by the storage medium of this embodiment, and the media may have any configuration.

The computer in this embodiment executes each process of this embodiment based on the program stored in the storage medium, and may have any configuration, such as a single device like a personal computer or a system in which a plurality of devices are connected via a network.

Each storage device in this embodiment may be realized by a single storage device or by a plurality of storage devices.

The computer in this embodiment is not limited to a personal computer; it also includes arithmetic processing units, microcomputers, and the like contained in information processing equipment, and is a generic term for equipment and devices capable of realizing the functions of this embodiment by means of a program.

Although embodiments of the present invention have been described, these embodiments are presented as examples and are not intended to limit the scope of the invention. These novel embodiments can be implemented in various other forms, and various omissions, substitutions, and changes can be made without departing from the gist of the invention. These embodiments and their modifications are included in the scope and gist of the invention, and in the invention described in the claims and its equivalents.

100: User terminal
101: Input/output unit
102: Display control unit
103: Display unit
200: Speech recognition result display device
201: Speech data input unit
202: Speech data output unit
203: Keyword extraction unit
204: Keyword importance calculation unit
205: Storage unit
206: Keyword search unit
300: Speech recognition server
301: Input unit
302: Speech recognition processing unit
303: Output unit
400: Network
500, 600, 700: Speech recognition result display screen
501, 601: Time axis
502, 602: Operation pointer
603: Utterance information
604, 605: Utterance count information

Claims

1. A speech recognition result display device that displays, on a client terminal screen, a result of speech recognition processing performed on speech data, comprising:
keyword extraction means for extracting, at a predetermined timing, a character string contained in speech text data that is the result of the speech recognition processing;
a storage unit that records the speech text data, the character string, and the time at which the character string was extracted by the keyword extraction means; and
keyword search means for comparing, based on a display request for the speech recognition result from the client terminal, a predetermined time designated by the client terminal with the time at which the character string was extracted, and searching the storage unit for the character string whose extraction time is included in the designated predetermined time,
wherein the character string found by the keyword search means is displayed as a keyword on the client terminal screen.

2. The speech recognition result display device according to claim 1, further comprising keyword importance calculation means for calculating a weight for the character string based on the appearance frequency of the character string extracted by the keyword extraction means,
wherein the storage unit records the weight calculated by the keyword importance calculation means in association with the character string, and
wherein, based on the display request for the speech recognition result from the client terminal, the character string found by the keyword search means for the predetermined time designated by the client terminal is displayed on the client terminal screen in a format that is changed according to the weight.

3. The speech recognition result display device according to claim 2, wherein the format that is changed according to the weight and displayed on the client terminal screen is a size or a color.

4. The speech recognition result display device according to any one of claims 1 to 3, wherein the storage unit records the time at which the keyword extraction means extracted the character string from the speech text data, and
wherein, based on the display request for the speech recognition result from the client terminal, the character string found by the keyword search means for the predetermined time designated by the client terminal is displayed as a keyword at a position corresponding to the time at which the character string was extracted.

5. The speech recognition result display device according to claim 1, wherein, based on a display request for utterance information of the speech recognition result issued by an operation designating a keyword displayed on the client terminal screen, the keyword search means searches the storage unit for speech text data containing the keyword designated from the client terminal, and the speech text data is displayed near the designated keyword on the client terminal screen.

6. A speech recognition result display method performed by a speech recognition result display device that displays, on a client terminal screen, a result of speech recognition processing performed on speech data, the method comprising:
extracting, at a predetermined timing, a character string contained in speech text data that is the result of the speech recognition processing;
recording, in a storage unit, the speech text data, the character string extracted from the speech text data, and the time at which the character string was extracted in the extracting step;
comparing, based on a display request for the speech recognition result from the client terminal, a predetermined time designated by the client terminal with the time at which the character string was extracted, and retrieving from the storage unit the character string whose extraction time is included in the designated predetermined time; and
displaying the character string retrieved from the storage unit as a keyword on the client terminal screen.

7. A speech recognition result display program executed by a speech recognition result display device that displays, on a client terminal screen, a result of speech recognition processing performed on speech data, the program realizing:
a keyword extraction function of extracting, at a predetermined timing, a character string contained in speech text data that is the result of the speech recognition processing;
a recording function of recording the speech text data, the character string, and the time at which the character string was extracted by the keyword extraction unit; and
a keyword search function of comparing, based on a display request for the speech recognition result from the client terminal, a predetermined time designated by the client terminal with the time at which the character string was extracted, using the information recorded by the recording function, and searching for the character string whose extraction time is included in the designated predetermined time,
wherein the character string found by the keyword search function is displayed as a keyword on the client terminal screen.