JP6401488B2

JP6401488B2 - Foreign language conversation understanding support device, foreign language conversation understanding support method and program

Info

Publication number: JP6401488B2
Application number: JP2014088115A
Authority: JP
Inventors: 石井　宏; 松下　正樹; 洋志中原; 和田　義毅; 明弘塩田; 優樹栗原; 圭史金田
Original assignee: NTT Data Corp
Current assignee: NTT Data Corp
Priority date: 2014-04-22
Filing date: 2014-04-22
Publication date: 2018-10-10
Anticipated expiration: 2034-04-22
Also published as: JP2015207191A

Description

The present invention relates to a technique for supporting the understanding of conversation in a foreign language.

Conventionally, techniques are known that perform speech recognition on a conversation in a foreign language and display on a screen the character string obtained by the recognition and its translation. For example, Patent Document 1 describes a technique in which, when the user presses a Speak-in button displayed on a touch panel, the user's voice is captured until the button is pressed again, and the speech recognition result for the captured voice and a translation of that result are displayed on the screen.

Patent Document 1: JP 2009-205579 A

The technique described in Patent Document 1 assumes a flow in which the user, before speaking, performs an operation to capture his or her own speech and, after speaking, performs an operation instructing the start of speech recognition and translation. Such a technique is effective when the capture operation can be performed in advance, timed to the speaker's utterance. However, when the user cannot control the timing of the conversation partner's utterances, as in ordinary conversation, the capture operation cannot be performed in advance to match that timing. As a result, the conversation partner's speech cannot be recognized and translated from its beginning.

The present invention has been made in view of these circumstances, and its object is to support the understanding of conversation in a foreign language.

To solve the above problem, the present invention provides a foreign language conversation understanding support device comprising: a storage unit that stores a plurality of sets, each associating utterance content data indicating content uttered by a conversation partner with time information indicating a time; an input unit that receives from a user an instruction to process the utterance content data stored in the storage unit; an extraction unit that extracts utterance content data associated in the storage unit with time information indicating a time at or after the time that is a first time period earlier than the time at which the instruction was received by the input unit; and an output unit that outputs information related to a result of speech recognition processing or translation processing performed on the utterance content data extracted by the extraction unit.

In a preferred aspect, the extraction unit may extract utterance content data that is associated in the storage unit with time information indicating a time at or after the time that is a predetermined time period earlier than the time at which the instruction was received by the input unit, and that also satisfies a condition input by the user or determined in advance.

In a more preferred aspect, the input unit may further receive input of a character from the user, and the utterance content data satisfying the condition may be utterance content data representing a word that contains the character whose input was received by the input unit.

In another more preferred aspect, the input unit may further receive from the user input of information for specifying a phonetic symbol, and the utterance content data satisfying the condition may be utterance content data representing a word whose phonetic transcription contains the phonetic symbol specified by the received information.

In another more preferred aspect, the utterance content data satisfying the condition may be utterance content data representing a word whose part of speech is a predetermined part of speech.

The present invention also provides a foreign language conversation understanding support method executed by a foreign language conversation understanding support device that includes a storage unit storing a plurality of sets, each associating utterance content data indicating content uttered by a conversation partner with time information indicating a time, the method comprising: a step of receiving from a user an instruction to process the utterance content data stored in the storage unit; a step of extracting utterance content data associated in the storage unit with time information indicating a time at or after the time that is a first time period earlier than the time at which the instruction was received; and a step of outputting information related to a result of speech recognition processing or translation processing performed on the extracted utterance content data.

The present invention also provides a program for causing a computer that includes a storage unit storing a plurality of sets, each associating utterance content data indicating content uttered by a conversation partner with time information indicating a time, to execute: a step of receiving from a user an instruction to process the utterance content data stored in the storage unit; a step of extracting utterance content data associated in the storage unit with time information indicating a time at or after the time that is a first time period earlier than the time at which the instruction was received; and a step of outputting information related to a result of speech recognition processing or translation processing performed on the extracted utterance content data.

According to the present invention, it is possible to support the understanding of conversation in a foreign language.

FIG. 1 is a functional block diagram of the foreign language conversation understanding support device 1.
FIG. 2 is a diagram showing an example of the data structure of the word DB 122.
FIG. 3 is a flowchart showing an example of the word storage process.
FIG. 4 is a flowchart showing an example of the word display process.
FIG. 5 is a flowchart showing an example of the extraction process.
FIG. 6 is a diagram showing an example of a screen displayed on the touch panel 13.
FIG. 7 is a functional block diagram of the foreign language conversation understanding support device 1A.
FIG. 8 is a diagram showing an example of the data structure of the word DB 122A.
FIG. 9 is a diagram showing an example of the data structure of the phonetic symbol dictionary DB 124.
FIG. 10 is a diagram showing an example of the data structure of the reading dictionary DB 125.
FIG. 11 is a diagram showing an example of a screen displayed on the touch panel 13.

1. Embodiment
1-1. Configuration
FIG. 1 is a functional block diagram of a foreign language conversation understanding support device 1 according to an embodiment of the present invention. The foreign language conversation understanding support device 1 is a computer that enables conversation using audio and video with users of other devices connected via a network. In particular, it is a computer that supports understanding of the conversation when the conversation is conducted in a language that is foreign to the user of the device.

Specifically, the foreign language conversation understanding support device 1 is, for example, a personal computer. More specifically, it is a portable terminal such as a smartphone or tablet terminal, or a stationary computer. Broadly speaking, as shown in FIG. 1, the foreign language conversation understanding support device 1 includes a processing unit 11, a storage unit 12, a touch panel 13, a camera 14, a microphone 15, a speaker 16, and a communication unit 17.

The processing unit 11 is an arithmetic processing device such as a CPU (Central Processing Unit). As shown in FIG. 1, the processing unit 11 has the functions of an encoder 111, a decoder 112, a word recognition unit 113, a stored information update unit 114, a clock unit 115, an extraction unit 116, a translation unit 117, and a display control unit 118. These functions are realized when the processing unit 11 executes a program stored in the storage unit 12. They are described later.

The storage unit 12 is a storage device such as an EEPROM (Electrically Erasable Programmable ROM) or a flash memory. The storage unit 12 stores the program executed by the processing unit 11. As shown in FIG. 1, the storage unit 12 also stores a learning model DB (Database) 121, a word DB 122, and a translation dictionary DB 123.

The learning model DB 121 is a database that stores the learning models required for the speech recognition processing executed by the word recognition unit 113. Specifically, the learning model DB 121 stores, for example, an acoustic model, a language model, and a dictionary. The learning model DB 121 may store a plurality of learning models in order to support a plurality of languages.

The word DB 122 is a database that stores word data generated as a result of the speech recognition processing executed by the word recognition unit 113, in association with time information. FIG. 2 shows an example of the data structure of the word DB 122. As shown in FIG. 2, each record in the word DB 122 consists of a "time" field and a "word" field. The word field stores word data generated by the speech recognition processing executed by the word recognition unit 113, and the time field stores time information indicating the time at which that word data was stored. This word data is an example of the "utterance content data" according to the present invention.
In the present embodiment, articles, prepositions, and personal pronouns are not registered in the word DB 122, but they may also be registered.
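The record layout described above can be sketched as a small data structure. This is only an illustrative sketch in Python: the names `WordRecord`, `stored_at`, and `word_db` are hypothetical, the date is arbitrary, and the single row reuses the "hello" / 10:05:30 example that appears later in the description of step S14.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class WordRecord:
    """One record of the word DB: a recognized word and the time it was stored."""
    stored_at: datetime  # the "time" field
    word: str            # the "word" field


# Hypothetical in-memory stand-in for the word DB 122.
word_db = [
    WordRecord(datetime(2014, 4, 22, 10, 5, 30), "hello"),
]

print(word_db[0])
```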

The translation dictionary DB 123 is a database that stores the translation dictionary data required for the translation processing executed by the translation unit 117. The translation dictionary DB 123 may store a plurality of sets of translation dictionary data in order to support a plurality of languages.

The touch panel 13 includes an input unit 131, which is an input device such as a touch sensor, and a display unit 132, which is a display device such as a liquid crystal display. When the input unit 131 receives a translation instruction from the user, it notifies the extraction unit 116 accordingly. The display unit 132 displays the image represented by the image data acquired from the decoder 112. The display unit 132 is an example of the "output unit" according to the present invention.

The camera 14 is, for example, a digital camera including a lens and an imaging element such as a CCD (Charge Coupled Device). The camera 14 outputs the generated image data to the processing unit 11. The camera 14 may capture moving images as well as still images.

The microphone 15 picks up the voice of the user of the foreign language conversation understanding support device 1 and outputs the resulting audio data to the processing unit 11.
The speaker 16 outputs the sound represented by the audio data input from the processing unit 11 (specifically, from the decoder 112).

The communication unit 17 is, for example, a network card. The communication unit 17 performs data communication with external devices via a network. Specifically, when the communication unit 17 receives, via the network, a group of packets containing encoded audio data or image data, it reassembles the packets and passes the encoded audio data or image data to the processing unit 11. When the communication unit 17 acquires encoded audio data or image data from the processing unit 11, it packetizes the data and transmits it to the external device via the network.
Next, each function of the processing unit 11 is described.

When the encoder 111 acquires image data from the camera 14 or audio data from the microphone 15, it encodes the data and passes it to the communication unit 17.
The decoder 112 decodes the encoded audio data acquired from the communication unit 17 and passes the result to the word recognition unit 113 and the speaker 16. The decoder 112 also decodes the encoded image data acquired from the communication unit 17 and passes the result to the display control unit 118.

The word recognition unit 113 performs speech recognition processing on the audio data acquired from the decoder 112, referring to the learning model DB 121 stored in the storage unit 12, and passes the word data generated as a result to the stored information update unit 114. Any well-known method may be used for the speech recognition processing.
When the stored information update unit 114 acquires word data from the word recognition unit 113, it acquires time information from the clock unit 115 and stores the word data and the time information in the word DB 122 in association with each other.

The clock unit 115 responds to a request from the stored information update unit 114 or the extraction unit 116 by returning time information indicating the current time.
When the input unit 131 of the touch panel 13 receives a translation instruction from the user and the extraction unit 116 is notified accordingly, the extraction unit 116 executes an extraction process that extracts word data satisfying a certain condition from the word DB 122. In this extraction process, the extraction unit 116 performs the following processing.

First, the extraction unit 116 acquires time information from the clock unit 115. Next, a retroactive time creation unit 1161 of the extraction unit 116 creates retroactive time information indicating the time that is a predetermined time period earlier than the time indicated by the acquired time information. This predetermined time period is an example of the "first time" according to the present invention. It is set, for example, by the user.
Next, the extraction unit 116 acquires the word data that is associated in the word DB 122 with time information indicating a time at or after the time indicated by the created retroactive time information and at or before the time indicated by the time information acquired from the clock unit 115. The extraction unit 116 then passes data of a list of the words indicated by the acquired word data to the translation unit 117.
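The extraction step just described amounts to a closed time-window filter. The following is a minimal sketch in Python under stated assumptions: the word DB is modeled as a list of `(stored_at, word)` tuples, and the function name `extract_words` and its parameters are hypothetical, not taken from the patent.

```python
from datetime import datetime, timedelta


def extract_words(word_db, now, lookback):
    """Return the words stored in [now - lookback, now], both ends inclusive.

    word_db  -- iterable of (stored_at, word) tuples (hypothetical layout)
    now      -- time at which the translation instruction was received
    lookback -- the predetermined period (the "first time"), as a timedelta
    """
    retro_time = now - lookback  # the "retroactive time"
    return [word for stored_at, word in word_db if retro_time <= stored_at <= now]


if __name__ == "__main__":
    now = datetime(2014, 4, 22, 12, 0, 10)
    db = [
        (now - timedelta(seconds=8), "earlier"),
        (now - timedelta(seconds=2), "recent"),
    ]
    print(extract_words(db, now, timedelta(seconds=5)))
```

Both bounds are inclusive here because the text speaks of times "at or after" the retroactive time and "at or before" the current time.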

When the translation unit 117 acquires the word list data from the extraction unit 116, it searches the translation dictionary DB 123 for each word in the list to obtain the target-language word, and passes to the display control unit 118 data of a list of word pairs, each associating a source word (that is, a word in the list) with its target-language word.
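The word-by-word lookup performed by the translation unit can be sketched as a dictionary lookup that yields source/target pairs. In this Python sketch the function name `translate_words`, the plain `dict` standing in for the translation dictionary DB 123, and the sample entries are all hypothetical. Returning `None` for a word with no entry is an assumption; the description does not say how missing entries are handled.

```python
def translate_words(words, translation_dict):
    """Pair each source word with its dictionary translation.

    A word without a dictionary entry is paired with None (an assumption).
    """
    return [(word, translation_dict.get(word)) for word in words]


if __name__ == "__main__":
    # Hypothetical English-to-Japanese entries for illustration only.
    sample_dict = {"live": "住む", "work": "働く", "company": "会社"}
    for source, target in translate_words(["live", "work", "company"], sample_dict):
        print(source, target)
```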

The display control unit 118 causes the list of word pairs indicated by the data acquired from the translation unit 117 to be displayed at a predetermined position on the screen displayed by the display unit 132 of the touch panel 13.
This concludes the description of the functions of the processing unit 11.

1-2. Operation
Next, the operation of the foreign language conversation understanding support device 1 is described. Specifically, two processes are described: a word storage process, in which the content uttered by the conversation partner is speech-recognized and the resulting word data is stored in the word DB 122; and a word display process, in which word data is extracted from the word DB 122 in response to an instruction from the user and the words indicated by that word data are displayed on the touch panel 13 together with their translations.

When the foreign language conversation understanding support device 1 is started, prior to execution of these processes, a learning model and a translation dictionary are selected as the learning model to use and the translation dictionary to use, according to the language targeted for speech recognition and the target language for translation. The learning model and translation dictionary selected here may be designated in advance by the user via the input unit 131 of the touch panel 13. In addition, once the device is started, the input unit 131 of the touch panel 13 becomes ready to accept a translation instruction from the user.

The following description uses an example in which the source language is English and the target language is Japanese, but the combination of languages is not limited to this example and may be any other pair of languages.

1-2-1. Word Storage Process
FIG. 3 is a flowchart showing an example of the word storage process. The foreign language conversation understanding support device 1 continues this word storage process until the input unit 131 of the touch panel 13 receives an instruction from the user to end the process, or until the device is powered off.

First, in step S11, upon receiving encoded audio data, the communication unit 17 passes the audio data to the decoder 112. The decoder 112 decodes the audio data and passes the decoded audio data to the word recognition unit 113 (step S12).

The word recognition unit 113 performs speech recognition processing on the received audio data, referring to the learning model DB 121 stored in the storage unit 12, and passes the word data generated as a result to the stored information update unit 114 (step S13). Upon receiving the word data, the stored information update unit 114 acquires time information from the clock unit 115 and stores the word data and the time information in the word DB 122 in association with each other (step S14). For example, when word data indicating the word "hello" and time information indicating the time "10:05:30" are acquired, these are stored in the word DB 122 in association with each other, as shown in FIG. 2.
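Step S14 can be sketched as appending timestamped records while skipping the word classes that the embodiment does not register. This Python sketch is illustrative only: `store_words`, the injectable `clock` parameter, and the `EXCLUDED` set are hypothetical, and the set is a small hand-picked sample of English articles, prepositions, and personal pronouns rather than a complete list. Speech recognition itself is outside the sketch; recognized words are passed in as strings.

```python
from datetime import datetime

# Small illustrative sample of words not registered in the embodiment
# (articles, prepositions, personal pronouns). Not a complete list.
EXCLUDED = {"a", "an", "the", "in", "at", "on", "for",
            "i", "you", "he", "she", "it", "we", "they"}


def store_words(word_db, recognized_words, clock=datetime.now):
    """Append (time, word) for each recognized word, skipping excluded words.

    clock stands in for the clock unit 115 and is called once per word.
    """
    for word in recognized_words:
        if word.lower() in EXCLUDED:
            continue
        word_db.append((clock(), word))
    return word_db


if __name__ == "__main__":
    fixed = datetime(2014, 4, 22, 10, 5, 30)  # arbitrary date, time from the example
    print(store_words([], ["hello", "the", "I"], clock=lambda: fixed))
```

Passing the clock in as a parameter keeps the sketch testable with a fixed time, mirroring the way the stored information update unit asks the clock unit for the current time.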

1-2-2. Word Display Process
FIG. 4 is a flowchart showing an example of the word display process.
First, when the input unit 131 of the touch panel 13 receives a translation instruction from the user, it notifies the extraction unit 116 accordingly (step S21). Upon receiving the notification from the input unit 131, the extraction unit 116 executes the extraction process, which extracts word data satisfying a certain condition from the word DB 122 (step S22).

FIG. 5 is a flowchart showing an example of this extraction process.
In this extraction process, upon receiving the notification from the input unit 131, the extraction unit 116 first issues a request for time information to the clock unit 115 and acquires the time information from the clock unit 115 (step S31). Next, the retroactive time creation unit 1161 of the extraction unit 116 creates retroactive time information indicating the time that is a predetermined time period earlier than the time indicated by the time information acquired from the clock unit 115 (step S32). For example, when the time information acquired from the clock unit 115 indicates the time "10:05:34" and the predetermined time period is set to "3 seconds", retroactive time information indicating the time "10:05:31" is created.

Next, the extraction unit 116 acquires the word data that is associated in the word DB 122 with time information indicating a time at or after the time indicated by the created retroactive time information and at or before the time indicated by the time information acquired from the clock unit 115 (step S33). For example, when the created retroactive time information indicates the time "10:05:31" and the time information acquired from the clock unit 115 indicates the time "10:05:34", then in the example of the word DB 122 shown in FIG. 2, the word data indicating the words "live", "Tokyo", "work", "computer", and "company" is acquired.
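The numbers in this example can be checked with a few lines of Python. The retroactive time (10:05:34 minus 3 seconds) and the "hello" entry at 10:05:30 come from the description; the per-word timestamps for the other five words are assumptions made for illustration, since the contents of FIG. 2 are not reproduced in the text, and the date is arbitrary.

```python
from datetime import datetime, timedelta

DAY = datetime(2014, 4, 22)  # arbitrary date; only the time of day matters


def at(hour, minute, second):
    """Build a timestamp on the arbitrary example date."""
    return DAY.replace(hour=hour, minute=minute, second=second)


now = at(10, 5, 34)                      # time of the translation instruction
retro_time = now - timedelta(seconds=3)  # step S32

word_db = [
    (at(10, 5, 30), "hello"),     # from the description of step S14
    (at(10, 5, 31), "live"),      # assumed timestamp
    (at(10, 5, 32), "Tokyo"),     # assumed timestamp
    (at(10, 5, 33), "work"),      # assumed timestamp
    (at(10, 5, 33), "computer"),  # assumed timestamp
    (at(10, 5, 34), "company"),   # assumed timestamp
]

# Step S33: keep records whose time lies in [retro_time, now].
extracted = [word for stored_at, word in word_db if retro_time <= stored_at <= now]

print(retro_time.time())
print(extracted)
```

Under these assumed timestamps, "hello" falls one second before the window and is left out, while the other five words are kept.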

Next, the extraction unit 116 passes data of a list of the words indicated by the acquired word data to the translation unit 117 (step S34).
This concludes the description of the extraction process.

When the translation unit 117 acquires the word list data from the extraction unit 116, it searches the translation dictionary DB 123 for each word in the list to obtain the target-language word, and passes to the display control unit 118 data of a list of word pairs, each associating a source word (that is, a word in the list) with its target-language word (step S23). The display control unit 118 causes the list of word pairs indicated by the received data to be displayed at a predetermined position on the screen displayed by the display unit 132 of the touch panel 13 (step S24).

FIG. 6 shows an example of the screen displayed on the display unit 132 of the touch panel 13 as a result of step S24. As shown in FIG. 6, the screen has a video display area A1 that displays video of the conversation partner and a list display area A2 that displays the word pair list.
The word pair list displayed in the list display area A2 may be hidden in response to a user operation on the input unit 131 of the touch panel 13, or may be hidden automatically after a certain period of time.

The position on the screen at which the word pair list is displayed is not limited to the illustrated example, and may be any position that does not hinder the progress of the conversation. For example, when the application being used to conduct the conversation is a video conferencing application, as in the illustrated example, the list may be displayed in any area where the video of the conversation partner is not displayed.

According to the foreign language conversation understanding support device 1 of the embodiment described above, the content uttered by the conversation partner is continuously speech-recognized and the recognized words are stored. When a translation instruction from the user is received, the words associated with times at or after the time a predetermined period before that moment are identified, and those words are displayed on the screen together with their translations. Therefore, when there is a part of the conversation that the user could not understand, the user can understand that part by issuing a translation instruction after it has been spoken.

Further, according to the foreign language conversation understanding support device 1, the user can issue a translation instruction by an operation on the screen such as a tap. Such an operation is relatively simple and does not use voice, so it does not interfere with the spoken conversation. Such an operation is also relatively inconspicuous, so the conversation partner is unlikely to notice that the user is looking up translations.
Further, according to the foreign language conversation understanding support device 1, the recognized words and their translations are displayed on the screen as text rather than output as audio. They therefore do not interfere with the spoken conversation.

2. Modifications
The above embodiment may be modified as follows. The following modifications may also be combined with one another.

2-1. Modification 1
In the extraction process according to the above embodiment, the extraction unit 116 may extract word data based on further information in addition to the time information, thereby helping the user identify the information that he or she needs. The following two methods are conceivable.
(1) In addition to the time information, word data is extracted based on other information received from the user at the time of the translation instruction.
(2) In addition to the time information, word data is extracted based on other information set in advance by the user.

2-1-1. Narrowing down based on other information received from the user
Specifically, the following two methods are conceivable for method (1) above.
(1-1) Narrowing down is performed based on search characters input by the user.
(1-2) Narrowing down is performed based on phonetic symbols (or information for specifying phonetic symbols) input by the user.

2-1-1-1. Narrowing down based on search characters
In this example, in step S21 of the word display process described above, the input unit 131 of the touch panel 13 may accept input of search characters in addition to the user's translation instruction, notify the extraction unit 116 that the instruction has been accepted, and pass character data indicating the input search characters to the extraction unit 116. Here, the search characters may be a single character or a plurality of characters.

Upon receiving the notification and acquiring the character data, the extraction unit 116 may, in step S33 of the extraction process described above, acquire word data that is associated in the word DB 122 with time information indicating a time at or after the time indicated by the retroactive time information and at or before the time indicated by the time information acquired from the clock unit 115, and that represents a word beginning with (or containing) the search characters indicated by the acquired character data.
For example, when the created retroactive time information indicates the time “10:05:31”, the time information acquired from the clock unit 115 indicates the time “10:05:34”, and the acquired character data indicates the characters “li”, word data indicating the word “live” is acquired in the example of the word DB 122 shown in FIG. 2.
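As a non-limiting illustration, the extraction described above (a time window combined with a search-character match) might be sketched as follows. The record layout, the function name `extract_words`, and the sample rows are hypothetical and stand in for the word DB 122; the actual contents of FIG. 2 are not reproduced here.

```python
from datetime import time

# Hypothetical stand-in for word DB 122: (time information, word data) pairs.
WORD_DB = [
    (time(10, 5, 29), "where"),
    (time(10, 5, 31), "do"),
    (time(10, 5, 32), "you"),
    (time(10, 5, 33), "live"),
    (time(10, 5, 36), "now"),
]


def extract_words(word_db, retro_time, now, search="", match_anywhere=False):
    """Return words stored between retro_time and now (both inclusive)
    that start with (or, optionally, contain) the search characters."""
    hits = []
    for stamp, word in word_db:
        if not (retro_time <= stamp <= now):
            continue  # outside the retroactive time window
        if match_anywhere:
            matched = search in word
        else:
            matched = word.startswith(search)
        if matched:
            hits.append(word)
    return hits


# Window 10:05:31 to 10:05:34 with the search characters "li".
result = extract_words(WORD_DB, time(10, 5, 31), time(10, 5, 34), search="li")
print(result)  # -> ['live']
```

With an empty search string the function falls back to the time-only extraction of the base embodiment, since every word starts with the empty string.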

When search characters are input to the input unit 131, the display unit 132 of the touch panel 13 may display a software keyboard on the screen, and the search characters may be input using this software keyboard. Alternatively, the display unit 132 of the touch panel 13 may display a handwriting input area on the screen, and characters written in that area may be recognized as the search characters.

As another example, the search characters may be input using a physical keyboard (not shown) connected to the foreign language conversation understanding support device 1. Alternatively, speech recognition processing may be performed on data of the user's voice picked up by the microphone 15, and the characters identified as a result may be used as the search characters.

2-1-1-2. Narrowing down based on phonetic symbols
FIG. 7 is a functional block diagram of a foreign language conversation understanding support device 1A according to this example. The foreign language conversation understanding support device 1A shown in the figure differs from the foreign language conversation understanding support device 1 according to the above embodiment in that the storage unit 12 stores a word DB 122A, a phonetic symbol dictionary DB 124, and a reading dictionary DB 125.

FIG. 8 is a diagram showing an example of the data structure of the word DB 122A. As shown in FIG. 8, each record of the word DB 122A differs from a record of the word DB 122 according to the above embodiment in that it additionally has a “phonetic symbol” field.

FIG. 9 is a diagram showing an example of the data structure of the phonetic symbol dictionary DB 124. The phonetic symbol dictionary DB 124 is a database that stores word data of the translation source language in association with the corresponding phonetic symbol data. As shown in FIG. 9, each record of the phonetic symbol dictionary DB 124 consists of the fields “translation source language word” and “phonetic symbol”.

FIG. 10 is a diagram showing an example of the data structure of the reading dictionary DB 125. The reading dictionary DB 125 is a database that stores character data (specifically, phonetic characters) of the translation destination language in association with phonetic symbol data of the translation source language. As shown in FIG. 10, each record of the reading dictionary DB 125 consists of the fields “translation destination language character” and “translation source language phonetic symbol”. A plurality of pieces of phonetic symbol data may be associated with one piece of character data.

In the foreign language conversation understanding support device 1A, upon receiving word data from the word recognition unit 113 in step S14 of the word storage process described above, the stored information updating unit 114 may acquire the phonetic symbol data associated with that word data in the phonetic symbol dictionary DB 124. The stored information updating unit 114 may then acquire time information from the clock unit 115 and store the word data, the phonetic symbol data, and the time information in the word DB 122A in association with one another.

In step S21 of the word display process described above, the input unit 131 of the touch panel 13 may accept, in addition to the user's translation instruction, input of a character (specifically, a phonetic character) for specifying a phonetic symbol, notify the extraction unit 116 that the instruction has been accepted, and pass character data indicating the input character to the extraction unit 116.

Upon receiving the notification and acquiring the character data, the extraction unit 116 may, in step S33 of the extraction process described above, first acquire the phonetic symbol data associated with that character data in the reading dictionary DB 125.
For example, when the acquired character data indicates the character “り” (ri), the pieces of phonetic symbol data indicating the phonetic symbols “l” and “r” are acquired in the example of the reading dictionary DB 125 shown in FIG. 10.

The extraction unit 116 may then acquire word data that is associated in the word DB 122A with time information indicating a time at or after the time indicated by the retroactive time information and at or before the time indicated by the time information acquired from the clock unit 115, and that is also associated in the word DB 122A with phonetic symbol data indicating a phonetic symbol sequence beginning with a phonetic symbol indicated by the acquired phonetic symbol data.
For example, when the created retroactive time information indicates the time “10:05:31”, the time information acquired from the clock unit 115 indicates the time “10:05:34”, and the acquired pieces of phonetic symbol data indicate the phonetic symbols “l” and “r”, respectively, word data indicating the word “live” is acquired in the example of the word DB 122A shown in FIG. 8.
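As a non-limiting illustration, the two-stage lookup described above (phonetic character to phonetic symbols, then phonetic symbols to words in the time window) might be sketched as follows. The dictionary contents, the phonetic transcriptions, and the name `extract_by_reading` are hypothetical stand-ins for the reading dictionary DB 125 and the word DB 122A.

```python
from datetime import time

# Hypothetical stand-in for reading dictionary DB 125:
# target-language phonetic character -> source-language phonetic symbols.
READING_DICT = {"り": ["l", "r"]}

# Hypothetical stand-in for word DB 122A: (time, word, phonetic symbols).
WORD_DB_A = [
    (time(10, 5, 30), "really", "rɪəli"),
    (time(10, 5, 32), "you", "juː"),
    (time(10, 5, 33), "live", "lɪv"),
]


def extract_by_reading(word_db, reading_dict, retro_time, now, char):
    """Return words in the time window whose phonetic symbol sequence
    starts with any phonetic symbol mapped from the input character."""
    symbols = tuple(reading_dict.get(char, ()))
    if not symbols:
        return []  # the character has no entry in the reading dictionary
    return [
        word
        for stamp, word, phonetic in word_db
        # str.startswith accepts a tuple and matches any of its members
        if retro_time <= stamp <= now and phonetic.startswith(symbols)
    ]


result = extract_by_reading(
    WORD_DB_A, READING_DICT, time(10, 5, 31), time(10, 5, 34), "り"
)
print(result)  # -> ['live']
```

In this sample data “really” also begins with one of the mapped symbols, but it lies outside the time window and is therefore not returned.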

The word data acquired by the extraction unit 116 is not limited to word data associated with phonetic symbol data indicating a phonetic symbol sequence that begins with a phonetic symbol indicated by the acquired phonetic symbol data, as in the above example. It may instead be word data associated with phonetic symbol data indicating a phonetic symbol sequence that contains a phonetic symbol indicated by the acquired phonetic symbol data.

According to this example, even if the user does not know the spelling of a word whose meaning he or she could not understand, the user can have the translation of that word displayed by relying on the sound that was heard.

In this example, the character for specifying a phonetic symbol may be input using a software keyboard or a handwriting input area displayed on the display unit 132 of the touch panel 13. Alternatively, it may be input using a physical keyboard (not shown) connected to the foreign language conversation understanding support device 1A.
FIG. 11 is a diagram showing an example of a screen displayed on the display unit 132 of the touch panel 13 when the character for specifying a phonetic symbol is input using a handwriting input area. As shown in FIG. 11, this screen additionally has a handwriting input area A3 compared with the screen shown in FIG. 6.

As another example, the character for specifying a phonetic symbol may be input using the microphone 15. In this case, the microphone 15 may pass data of the user's voice that it has picked up to a phonetic symbol conversion unit (not shown). The phonetic symbol conversion unit may perform speech recognition processing on the acquired voice data with reference to a learning model (not shown) for outputting phonetic symbol data from voice data, and pass the resulting phonetic symbol data to the extraction unit 116. The extraction unit 116 may narrow down the word data using the acquired phonetic symbol data.

2-1-2. Narrowing down based on other information set in advance by the user
Specifically, the following two methods are conceivable for method (2) above.
(2-1) Narrowing down is performed based on the part of speech of a word.
(2-2) Narrowing down is performed based on the level (specifically, the difficulty) of a word.

2-1-2-1. Narrowing down based on the part of speech of a word
In this example, in step S13 of the word storage process described above, upon acquiring voice data from the decoder 112, the word recognition unit 113 may perform speech recognition processing on the voice data with reference to the learning model DB 121 stored in the storage unit 12, and pass to the stored information updating unit 114 the word data generated as a result together with part-of-speech data indicating the part of speech of the word represented by that word data.

The stored information updating unit 114, which acquires the part-of-speech data in addition to the word data, may, in step S14 of the word storage process described above, first identify the importance of the part of speech indicated by the part-of-speech data by referring to evaluation criterion information (not shown) that is stored in the storage unit 12 and associates types of parts of speech with importance levels. The stored information updating unit 114 may then store the acquired word data in the word DB 122 in association with the time information acquired from the clock unit 115 only when the identified importance is equal to or greater than a predetermined threshold. Here, the importance of each part of speech and the threshold may be set in advance by the user.

According to this example, only word data whose part of speech has an importance equal to or greater than the predetermined threshold is stored in the word DB 122, so only such word data is extracted by the extraction unit 116 in the extraction process.

In this example, instead of the evaluation criterion information described above, the stored information updating unit 114 may refer to information on a list of parts of speech to be stored (not shown), and store in the word DB 122, in association with the time information, only word data whose part of speech is included in that list.
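As a non-limiting illustration, the threshold-based storage decision of this example might be sketched as follows. The importance values, the part-of-speech labels, and the function names are hypothetical; treating an unknown part of speech as importance 0 is an assumption made only for this sketch.

```python
# Hypothetical evaluation criterion information: part of speech -> importance.
POS_IMPORTANCE = {
    "noun": 3,
    "verb": 3,
    "adjective": 2,
    "preposition": 1,
    "article": 0,
}


def should_store(pos, threshold, importance=POS_IMPORTANCE):
    """True if the part of speech is at least as important as the threshold.
    Assumption: a part of speech missing from the table has importance 0."""
    return importance.get(pos, 0) >= threshold


def store_recognized(word_db, recognized, stamp, threshold):
    """Append (stamp, word) for each recognized (word, pos) pair that passes."""
    for word, pos in recognized:
        if should_store(pos, threshold):
            word_db.append((stamp, word))
    return word_db


recognized = [("the", "article"), ("station", "noun"), ("near", "preposition")]
db = store_recognized([], recognized, "10:05:33", threshold=2)
print(db)  # -> [('10:05:33', 'station')]
```

The list-based variant mentioned above corresponds to replacing `should_store` with a simple membership test against the set of parts of speech to be stored.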

2-1-2-2. Narrowing down based on the level of a word
In this example, in step S14 of the word storage process described above, upon acquiring word data from the word recognition unit 113, the stored information updating unit 114 may first identify the level of the word represented by the word data by referring to a level dictionary (not shown) that is stored in the storage unit 12 and associates word data with level information. The stored information updating unit 114 may then store the acquired word data in the word DB 122 in association with the time information acquired from the clock unit 115 only when the identified level is equal to or higher than the user's language level. Here, the user's language level may be input in advance using the input unit 131 of the touch panel 13.

According to this example, only word data of a level equal to or higher than the user's language level is stored in the word DB 122, so only such word data is extracted by the extraction unit 116 in the extraction process.
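As a non-limiting illustration, the level-based storage decision might be sketched as follows. The level values and the name `is_worth_storing` are hypothetical. The description does not state how a word missing from the level dictionary is handled; keeping such a word is an assumption made only for this sketch.

```python
# Hypothetical level dictionary: word -> difficulty level (higher is harder).
LEVEL_DICT = {"live": 1, "station": 2, "reimbursement": 5}


def is_worth_storing(word, user_level, level_dict=LEVEL_DICT):
    """True if the word's level is at or above the user's language level."""
    level = level_dict.get(word)
    if level is None:
        return True  # assumption: words without a known level are kept
    return level >= user_level


heard = ["live", "station", "reimbursement"]
stored = [w for w in heard if is_worth_storing(w, user_level=2)]
print(stored)  # -> ['station', 'reimbursement']
```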

2-2. Modification 2
In the word storage process according to the above embodiment, the encoded voice data received by the communication unit 17 is decoded by the decoder 112 and passed to the word recognition unit 113 (see steps S11 and S12). However, when the user's conversation partner is present in front of the foreign language conversation understanding support device 1, data of the conversation partner's voice picked up by the microphone 15 may instead be passed to the word recognition unit 113 as the target of the speech recognition processing.

2-3. Modification 3
In the word storage process according to the above embodiment, different speech recognition software may be used depending on the language to be recognized. Likewise, in the word display process according to the above embodiment, different translation software may be used depending on the translation source language and the translation destination language. The speech recognition software and the translation software to be used may be set in advance by the user.

2-4. Modification 4
In the word storage process according to the above embodiment, word data and time information are stored in the word DB 122 in association with each other (see step S14). Instead of word data, however, data of a sentence consisting of a plurality of words may be stored in the word DB 122 in association with time information. In that case, in the word display process according to the above embodiment, the extraction unit 116 may extract sentence data instead of word data and pass data of a list of the sentences represented by the extracted sentence data to the translation unit 117 (see step S22). The translation unit 117 may search the translation dictionary DB 123 for each sentence in the acquired list data to obtain a translated sentence, and pass to the display control unit 118 data of a list of sentence pairs, each of which associates a source sentence (that is, a sentence in the list) with its translated sentence (see step S23).

2-5. Modification 5
In the above embodiment, the pairs of word data and time information stored in the word DB 122 may be deleted in order, starting with the oldest. For example, in step S14 of the word storage process described above, the stored information updating unit 114 may store each pair of word data and time information in a ring buffer allocated in the storage unit 12. In this case, the stored information updating unit 114 may keep track of the address of the buffer slot that was written most recently, and perform the next write to the slot at the address following it.
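As a non-limiting illustration, a ring buffer that tracks the most recently written slot and overwrites the oldest pair once full might be sketched as follows. The class name `WordRingBuffer` and its methods are hypothetical.

```python
class WordRingBuffer:
    """Fixed-size store for (time information, word data) pairs.
    Once full, each new write overwrites the oldest pair."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._slots = [None] * capacity
        self._last = -1   # index of the slot written most recently
        self._count = 0   # number of slots currently holding data

    def store(self, stamp, word):
        # Write to the slot following the most recently written one,
        # wrapping around to the start at the end of the buffer.
        self._last = (self._last + 1) % len(self._slots)
        self._slots[self._last] = (stamp, word)
        self._count = min(self._count + 1, len(self._slots))

    def entries(self):
        """Return the stored pairs ordered from oldest to newest."""
        size = len(self._slots)
        start = (self._last - self._count + 1) % size
        return [self._slots[(start + i) % size] for i in range(self._count)]


buf = WordRingBuffer(capacity=3)
for stamp, word in [(1, "where"), (2, "do"), (3, "you"), (4, "live")]:
    buf.store(stamp, word)
print(buf.entries())  # -> [(2, 'do'), (3, 'you'), (4, 'live')]
```

In Python, `collections.deque` with a `maxlen` argument gives the same drop-oldest behavior without explicit index handling; the explicit version is shown because the text describes managing the last written address.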

As another example, when newly storing a pair of word data and time information in the word DB 122, the stored information updating unit 114 may delete from the word DB 122 any word data, together with its time information, that is associated with time information indicating a time earlier than the time reached by going back a predetermined length of time from the time indicated by the new time information.
Alternatively, the stored information updating unit 114 may acquire time information from the clock unit 115 at a fixed interval, and delete from the word DB 122 any word data, together with its time information, that is associated with time information indicating a time earlier than the time reached by going back a predetermined length of time from the time indicated by the acquired time information.

2-6. Modification 6
In the word storage process according to the above embodiment, the voice data is decoded by the decoder 112, and the word data generated by the word recognition unit 113 performing speech recognition processing on that voice data is stored in the word DB 122 in association with time information (see steps S12 to S14). Instead of this word data, however, the voice data decoded by the decoder 112 may be stored in the word DB 122 together with the time information. Here, the voice data associated with the time information may be voice data cut out from the decoded voice data at each silent period of a predetermined length (see, for example, JP 2001-154691 A). The voice data according to this modification is an example of the “utterance content data” according to the present invention.

In this case, in the word display process according to the above embodiment, the extraction unit 116 may extract voice data instead of word data from the word DB 122 (see step S22) and then pass this voice data to the word recognition unit 113. The word recognition unit 113 may perform speech recognition processing on the acquired voice data with reference to the learning model DB 121 stored in the storage unit 12, and pass data of a list of the words indicated by the resulting word data to the translation unit 117. The translation unit 117 may search the translation dictionary DB 123 for each word in the acquired word list to obtain the translated word, and pass to the display control unit 118 data of a list of word pairs, each of which associates a source word with its translated word (see step S23).

According to this modification, speech recognition processing needs to be performed only on the voice data representing the words to be displayed on the screen, so the processing load on the processing unit 11 can be reduced.
In the above example, the decoded voice data and the time information are stored in the word DB 122. However, the voice data may instead be stored in the word DB 122, still encoded, together with the time information, and may be decoded after it is extracted (see step S22), after which the speech recognition processing is performed. According to this example, unnecessary decoding is omitted, which further reduces the processing load on the processing unit 11.
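As a non-limiting illustration, deferring recognition until extraction might be sketched as follows. The audio segments, the placeholder recognizer `fake_recognize`, and the name `recognize_on_demand` are hypothetical; a real recognizer would consult a learned model rather than decode a label.

```python
from datetime import time

# Hypothetical stand-in for the word DB holding audio instead of words:
# (time information, audio segment cut at a silent period).
AUDIO_DB = [
    (time(10, 5, 29), b"<audio: where>"),
    (time(10, 5, 32), b"<audio: you>"),
    (time(10, 5, 33), b"<audio: live>"),
]

calls = []  # records which segments were actually recognized


def fake_recognize(segment):
    """Placeholder recognizer that reads the word out of the dummy label."""
    calls.append(segment)
    return segment.decode()[len("<audio: "):-1]


def recognize_on_demand(audio_db, retro_time, now, recognize):
    """Run recognition only on segments inside the retroactive window."""
    return [
        recognize(segment)
        for stamp, segment in audio_db
        if retro_time <= stamp <= now
    ]


words = recognize_on_demand(
    AUDIO_DB, time(10, 5, 31), time(10, 5, 34), fake_recognize
)
print(words, len(calls))  # -> ['you', 'live'] 2
```

Only two of the three stored segments are passed to the recognizer, which is the source of the load reduction described above.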

2-7. Modification 7
In the above embodiment, the speech recognition processing is performed within the foreign language conversation understanding support device 1, but it may instead be performed in another device (for example, a speech recognition server) connected to this device via a network. In this case, in the word storage process described above, the communication unit 17 may forward the received encoded voice data to a speech recognition server (not shown), receive the word data generated by the speech recognition processing in the speech recognition server, and pass this word data to the stored information updating unit 114. Here, because the communication unit 17 only has to forward the already encoded voice data to the speech recognition server as it is, no new encoding is required, and therefore no additional processing load is placed on the processing unit 11.

2-8. Modification 8
In the word display process according to the above embodiment, the word data extracted by the extraction unit 116 is translated by the translation unit 117, and as a result both the source words and their translations are displayed on the screen (see steps S23 and S24). However, only the source words may be displayed on the screen. That is, step S23 may be omitted from the word display process according to the above embodiment. This is because some users can understand the content of the conversation without being shown the words in the translation destination language, as long as they can identify the words in the translation source language that appeared in the conversation.

2-9. Modification 9
In the extraction process according to the above embodiment, retroactive time information is created that indicates the time reached by going back a predetermined length of time from the time indicated by the time information acquired from the clock unit 115 (step S32). This predetermined length of time may instead be a variable time that changes according to conditions. For example, the variable time may change according to the pressing force or the pressing duration of the user's finger when the user inputs the translation instruction. Specifically, the variable time may be controlled so that it becomes longer as the pressing force becomes stronger or as the pressing duration becomes longer. This variable time is an example of the “first time” according to the present invention.
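As a non-limiting illustration, a lookback time that grows with the pressing force or the pressing duration might be sketched as follows. The linear form, every coefficient, the upper limit, and the name `lookback_seconds` are hypothetical; the description only requires that the time become longer as the press becomes stronger or longer.

```python
def lookback_seconds(press_force=0.0, press_duration=0.0,
                     base=3.0, per_force=2.0, per_second=1.0, maximum=30.0):
    """Return how far back (in seconds) to extract words.
    The value never decreases when the force or the duration increases,
    and it is capped at `maximum` (an assumed safeguard)."""
    extra = per_force * max(press_force, 0.0)
    extra += per_second * max(press_duration, 0.0)
    return min(base + extra, maximum)


print(lookback_seconds())                      # -> 3.0
print(lookback_seconds(press_duration=2.0))    # -> 5.0
print(lookback_seconds(press_duration=100.0))  # -> 30.0
```

A device that senses only one of the two quantities can leave the other argument at its default of zero.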

2-10. Modification 10
In the word display process according to the above embodiment, the display control unit 118 may cause the screen to display, in addition to or instead of the list of word pairs, information related to the source words or translated words included in the list. Here, the information related to a source word or translated word is, for example, information returned as a hit when a search is performed on the Internet using that word as the search key.

2-11. Modification 11
A program that implements the functions of the foreign language conversation understanding support device 1 or 1A according to the above embodiment and modifications may be provided via a recording medium readable by a computer device. Here, the recording medium is, for example, a magnetic recording medium such as a magnetic tape or a magnetic disk, an optical recording medium such as an optical disc, a magneto-optical recording medium, or a semiconductor memory. The program may also be provided via a network such as the Internet.

DESCRIPTION OF SYMBOLS: 1 ... foreign language conversation understanding support device, 11 ... processing unit, 12 ... storage unit, 13 ... touch panel, 14 ... camera, 15 ... microphone, 16 ... speaker, 17 ... communication unit, 111 ... encoder, 112 ... decoder, 113 ... word recognition unit, 114 ... stored information updating unit, 115 ... clock unit, 116 ... extraction unit, 117 ... translation unit, 118 ... display control unit, 121 ... learning model DB, 122 ... word DB, 123 ... translation dictionary DB, 124 ... phonetic symbol dictionary DB, 125 ... reading dictionary DB, 131 ... input unit, 132 ... display unit, 1161 ... retroactive time creation unit

Claims

A foreign language conversation understanding support device comprising:
a storage unit that stores, among utterance content data indicating content uttered by a conversation partner, only utterance content data representing words of a difficulty level input by a user or determined in advance, in association with time information indicating a time;
an input unit that receives, from the user, an instruction to process the utterance content data stored in the storage unit;
an extraction unit that extracts utterance content data associated in the storage unit with time information indicating a time at or after a time that precedes, by a first time, the time at which the instruction was received by the input unit; and
an output unit that outputs information related to a result of speech recognition processing or translation processing on the utterance content data extracted by the extraction unit,
wherein the input unit is a touch sensor, and
the first time changes according to the force with which, or the length of time for which, the input unit is pressed by the user.

The foreign language conversation understanding support device according to claim 1, further comprising:
a word recognition unit that generates word data by performing speech recognition processing on voice data indicating content uttered by the conversation partner; and
a stored information updating unit that stores, in the storage unit as the utterance content data, only the word data representing words of the difficulty level from among the generated word data, in association with time information indicating a time,
wherein the output unit outputs information related to the utterance content data extracted by the extraction unit or information related to a result of translation processing on the utterance content data.

A foreign language conversation understanding support method executed by a foreign language conversation understanding support device including a storage unit that stores, from among utterance content data indicating content uttered by a conversation partner, only utterance content data representing words of a difficulty level that is input by a user or predetermined, in association with time information indicating a time, the method comprising:
a step of receiving, from the user, an instruction to process the utterance content data stored in the storage unit;
a step of extracting utterance content data associated, in the storage unit, with time information indicating a time after a time that precedes, by a first time, the time at which the instruction was received; and
a step of outputting information related to a result of speech recognition processing or translation processing on the extracted utterance content data,
wherein the foreign language conversation understanding support device includes a touch sensor, and
the first time changes according to the force with which, or the length of time for which, the touch sensor is pressed by the user.

A program for causing a computer having a storage unit that stores, from among utterance content data indicating content uttered by a conversation partner, only utterance content data representing words of a difficulty level that is input by a user or predetermined, in association with time information indicating a time, to execute:
a step of receiving, from the user, an instruction to process the utterance content data stored in the storage unit;
a step of extracting utterance content data associated, in the storage unit, with time information indicating a time after a time that precedes, by a first time, the time at which the instruction was received; and
a step of outputting information related to a result of speech recognition processing or translation processing on the extracted utterance content data,
wherein the computer includes a touch sensor, and
the first time changes according to the force with which, or the length of time for which, the touch sensor is pressed by the user.
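
As an informal illustration only, and not part of the claims, the flow recited above can be sketched in Python. The sketch assumes that the first time is a look-back window counted backward from the moment of the press, which is consistent with the retroactive time creation unit 1161 in the list of symbols. All names (`UtteranceStore`, `lookback_seconds`, `on_press`), the numeric difficulty scale, the "at or above a threshold" reading of the difficulty condition, the inclusive window boundary, and the linear mapping from press duration to look-back length are hypothetical choices made for the example.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class UtteranceStore:
    """Stores only sufficiently difficult words, each with a timestamp."""

    min_difficulty: int  # assumed numeric scale; higher means harder
    times: list[float] = field(default_factory=list)
    words: list[str] = field(default_factory=list)

    def add(self, time_s: float, word: str, difficulty: int) -> None:
        # Words below the threshold are never stored (assumed reading of
        # "only words of the input or predetermined difficulty level").
        if difficulty >= self.min_difficulty:
            self.times.append(time_s)
            self.words.append(word)

    def extract_since(self, start_s: float) -> list[str]:
        # Entries are assumed to arrive in time order, so a binary search
        # finds the first entry at or after start_s (inclusive boundary
        # is an assumption of this sketch).
        return self.words[bisect_left(self.times, start_s):]


def lookback_seconds(press_duration_s: float,
                     gain: float = 5.0,
                     max_lookback_s: float = 30.0) -> float:
    """Map press duration to the look-back window (assumed linear, capped)."""
    return min(max_lookback_s, press_duration_s * gain)


def on_press(store: UtteranceStore, press_time_s: float,
             press_duration_s: float) -> list[str]:
    """Return the stored words uttered within the look-back window."""
    start_s = press_time_s - lookback_seconds(press_duration_s)
    return store.extract_since(start_s)
```

With these assumptions, a longer press reaches further back into the stored words. Press force could be substituted for press duration by passing a force reading to the same kind of mapping. The extracted words would then be handed to a translation step, which is omitted here.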