
JP2020091786A - Speech recognition input device

Info

Publication number: JP2020091786A
Application number: JP2018229961A
Authority: JP
Inventors: 宏平元木; Kohei Motoki
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2018-12-07
Filing date: 2018-12-07
Publication date: 2020-06-11
Anticipated expiration: 2038-12-07
Also published as: JP7218163B2

Abstract

To provide a voice recognition input device capable of improving responsiveness to voice input instructions and reducing the number of utterances required for operation. SOLUTION: A voice recognition input device includes: a data acquisition unit that acquires predetermined data and temporarily stores the data in a storage unit with the acquisition time of the data attached; a voice input processing unit that receives a voice input, temporarily stores in the storage unit the voice input time at which the voice input was received, and performs voice recognition processing on the voice input; and a voice input determination unit that, when the voice recognition processing shows that the voice input matches a specific operation instruction, outputs a command for performing processing according to the operation instruction on the data stored in association with the acquisition time that is the same as the voice input time. SELECTED DRAWING: Figure 1

Description

The present invention relates to a voice recognition input device, and more particularly to a voice recognition input device that is connected to an image capturing device or the like and outputs commands to the image capturing device.

When a medical X-ray image capturing device is used in an examination, surgery, or the like, it is preferable to keep the time for which the device is used as short as possible, in consideration of the patient's health condition and radiation exposure. For this reason, for example when finely adjusting the X-ray irradiation range, the operator must be able to give input instructions smoothly.

Likewise, during X-ray fluoroscopy, in which continuous X-ray irradiation is performed to observe the inside of the patient's body as a moving image, the operator performs various operations, so it is preferable that input instructions can be given smoothly. Examples of such operations include acquiring or saving an image or a moving image for reference, operating the X-ray aperture to adjust the X-ray irradiation range, and continuously displaying images related to the current examination from among previously captured images, then selecting the image to be viewed and displaying it again.
Conventionally, an operator gives such input instructions by physically operating an input device such as a mouse, a keyboard, a switch, or a lever.

However, during an examination or surgery, the hands of the operator of the X-ray image capturing device are often occupied, and there is a demand for input devices that do not require physical operation, such as input by gaze or by voice. As an example of such an input device, Patent Document 1 proposes image display by voice input instruction using voice recognition technology. Specifically, in Patent Document 1, a medical image is stored together with a subject name and a subject ID, and when at least one of the subject name and the subject ID is recognized by voice, the corresponding medical image is retrieved from the data file using the recognition result as a key and displayed. As a result, even when the operator's hands are occupied, a desired image is displayed without the cumbersome operation of typing in the subject's name or the like.

JP 2003-339695 A (Japanese Unexamined Patent Application Publication No. 2003-339695)

However, because voice recognition processing normally takes a certain amount of time, there is a delay between the input instruction and the resulting operation, and the instructed operation may not be executed at the timing the operator intended. The operator therefore has to speak repeatedly to get the desired operation performed. Fine operations such as fine adjustment also require many utterances. In other words, responsiveness to the operator's input instructions is low, operations take time, and the operation itself becomes cumbersome.
Therefore, an object of the present invention is to provide a voice recognition input device capable of improving responsiveness to voice input instructions and reducing the number of utterances required for operation.

The present invention has been made in view of the above circumstances, and an object thereof is to improve responsiveness to voice input instructions and to reduce the number of utterances required for operation.

In order to solve the above problems, the present invention provides the following means.
One aspect of the present invention provides a voice recognition input device including: a data acquisition unit that acquires predetermined data and temporarily stores the data in a storage unit with the acquisition time of the data attached; a voice input processing unit that receives a voice input, temporarily stores in the storage unit the voice input time at which the voice input was received, and performs voice recognition processing on the voice input; and a voice input determination unit that, when the voice recognition processing shows that the voice input matches a specific operation instruction, outputs a command for performing processing according to the operation instruction on the data stored in association with the acquisition time that is the same as the voice input time.

According to the present invention, it is possible to improve responsiveness to voice input instructions and to reduce the number of utterances required for operation.

FIG. 1 is a block diagram showing a schematic configuration of a voice recognition input device according to a first embodiment of the present invention.
FIG. 2 is a flowchart showing the flow of voice input processing, by the voice recognition input device of FIG. 1, for an instruction to save an image captured by an image capturing device.
FIG. 3 is a reference diagram showing an example of image data and the data acquisition times stored with it in the voice recognition input device of FIG. 1.
FIG. 4 is a reference diagram showing an example of image data read out in a voice recognition input device according to a second embodiment of the present invention and the display times stored with the image data.
FIG. 5 is a flowchart showing the flow of voice input processing, by a voice recognition input device according to a third embodiment of the present invention, for adjusting the position of the aperture frame of an image capturing device.
FIG. 6 is a reference diagram showing an example of aperture frame position information and the position information acquisition times stored with it in the voice recognition input device of FIG. 5.

The voice recognition input device according to the embodiments of the present invention gives input instructions to an external image capturing device or the like connected to it. The inputs instruct desired processing, such as displaying a desired image from among a plurality of still images captured by the image capturing device, extracting and saving a desired image from a moving image, or adjusting the aperture position used when the image capturing device captures an image.

(First Embodiment)
Hereinafter, the voice recognition input device according to the first embodiment of the present invention will be described in more detail with reference to the drawings. FIG. 1 shows a schematic configuration diagram of the voice recognition input device according to the present embodiment.
The voice recognition input device 10 includes a central processing unit (CPU) 11 that controls the entire voice recognition input device 10, a microphone 12 that receives voice input, a memory 13, and a magnetic disk 14. These components are connected to one another via a system bus.

The voice recognition input device 10 is communicably connected to the image capturing device 20, gives various input instructions to the image capturing device 20, and acquires image data and the like from the image capturing device 20. The voice recognition input device 10 is connected to the display 30 via the image capturing device 20 and instructs the display 30 to show images and the like acquired by the image capturing device 20.
As the image capturing device 20, hardware for acquiring medical images, such as an X-ray device, an MRI device, a CT device, or a PET device, can be used.

In order for the voice recognition input device 10 to give voice input instructions to the image capturing device or the like, the CPU 11 implements the functions of a voice input processing unit 111, a data acquisition unit 112, and a voice input determination unit 113, as shown in FIG. 1. These functions can be implemented as software by the CPU 11 reading and executing a program stored in advance in storage such as the magnetic disk 14. Some or all of the operations performed by the units included in the CPU 11 can also be implemented by an ASIC (application-specific integrated circuit) or an FPGA (field-programmable gate array).

For voice input through the microphone 12, the voice input processing unit 111 temporarily stores in the memory 13 the voice input time at which the voice input was received, performs known voice recognition processing on the voice input, and outputs the recognition result to the voice input determination unit 113.

The data acquisition unit 112 acquires predetermined data from the image capturing device 20 connected to the voice recognition input device 10 and temporarily stores the acquired data in the memory 13 with its acquisition time attached. In the present embodiment, it acquires image data obtained by continuous image capture in the image capturing device 20, such as X-ray fluoroscopy. When temporarily storing the image data in the memory 13, the data acquisition unit 112 simultaneously displays the acquired image data on the display 30. This helps the operator give a save instruction for the desired image data.
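As a non-limiting illustration, the temporary storage of data together with its acquisition time could be sketched as follows. The names `TimedRecord` and `TemporaryStore`, the use of milliseconds, and the bounded capacity are assumptions introduced for this sketch; the specification only states that data is stored temporarily with its acquisition time attached.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Deque, List


@dataclass(frozen=True)
class TimedRecord:
    """One piece of acquired data with its acquisition time attached."""
    acquired_ms: int  # acquisition time in milliseconds (assumed unit)
    payload: Any      # e.g. an image frame or an aperture frame position


class TemporaryStore:
    """Hypothetical stand-in for the memory 13.

    The capacity limit is an assumption of this sketch: once full,
    the oldest record is dropped when a new one is stored.
    """

    def __init__(self, max_records: int = 1000) -> None:
        self._records: Deque[TimedRecord] = deque(maxlen=max_records)

    def store(self, payload: Any, acquired_ms: int) -> None:
        """Store the data with its acquisition time attached."""
        self._records.append(TimedRecord(acquired_ms, payload))

    def records(self) -> List[TimedRecord]:
        """Return the stored records, oldest first."""
        return list(self._records)

    def clear(self) -> None:
        """Erase all temporarily stored data and acquisition times."""
        self._records.clear()
```

Storing the time next to the payload is what later allows a record to be looked up by the time of an utterance rather than by the time at which recognition finishes.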

When the result of the voice recognition processing for the voice input, obtained from the voice input processing unit 111, matches a predetermined operation instruction, the voice input determination unit 113 outputs a command for performing processing according to that operation instruction on the data stored in association with the data acquisition time that is the same as the voice input time stored in the memory 13.
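The determination described above might be sketched as follows. The function names, the dictionary used as a "command", and in particular the matching rule are assumptions: the specification speaks of the acquisition time that is "the same" as the voice input time, whereas this sketch picks the latest record acquired at or before the voice input time, since sampled timestamps rarely coincide exactly.

```python
from bisect import bisect_right
from typing import Any, Dict, Optional, Sequence, Tuple

Record = Tuple[int, Any]  # (acquisition time in ms, data); sorted by time


def find_record_at(records: Sequence[Record],
                   voice_input_ms: int) -> Optional[Record]:
    """Return the latest record acquired at or before the voice input time."""
    times = [t for t, _ in records]
    i = bisect_right(times, voice_input_ms)
    return records[i - 1] if i > 0 else None


def build_command(recognized_text: str,
                  voice_input_ms: int,
                  records: Sequence[Record],
                  instructions: Sequence[str] = ("save",)
                  ) -> Optional[Dict[str, Any]]:
    """Build a command only when the text matches a known instruction."""
    if recognized_text not in instructions:
        return None  # the utterance is not an operation instruction
    record = find_record_at(records, voice_input_ms)
    if record is None:
        return None  # nothing had been acquired yet at that time
    acquired_ms, payload = record
    return {"instruction": recognized_text,
            "acquired_ms": acquired_ms,
            "payload": payload}
```

For example, with records acquired at 0, 100, and 200 ms, a "save" utterance received at 150 ms yields a command that targets the record acquired at 100 ms.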

In the present embodiment, for example, "save" is defined in advance as a predetermined operation instruction for cases where, while images are being captured continuously in X-ray fluoroscopy, the operator wants to save specific image data as a still image. Accordingly, when the voice input is "save", the voice input determination unit 113 outputs a command for saving to the magnetic disk 14 the image data stored in the memory 13 in association with the same time as the voice input time.

The microphone 12 receives the operator's utterance as voice input, converts it into an electric signal, and outputs the signal to the voice input processing unit 111.
The memory 13 stores the programs executed by the CPU 11 and the intermediate results of arithmetic processing. In accordance with instructions from the CPU, it also temporarily stores various data such as image data read from the image capturing device 20, the times at which the various data were acquired, and the input instruction times at which voice input instructions were received.

The magnetic disk 14 stores the programs executed by the CPU 11 and the data necessary for executing them. The magnetic disk 14 also saves image data and the like read from the image capturing device 20. As the magnetic disk 14, for example, a hard disk or the like, or a device capable of exchanging data with a portable recording medium such as a CD/DVD, a USB memory, or an SD card, can be used.

The flow of the voice input processing performed by the voice recognition input device 10 configured as described above, for an instruction to save an image captured by the image capturing device 20, will be described below with reference to the flowchart of FIG. 2.

As shown in FIG. 2, when continuous image capture by X-ray fluoroscopy or the like starts in the image capturing device 20, the data acquisition unit 112 starts acquiring image data from the image capturing device 20 in step S101. At this time, as shown in FIG. 3, the data acquisition unit 112 temporarily stores the image data in the memory 13 with the data acquisition time of that image data attached. The data acquisition unit 112 also displays the acquired image data on the display 30 at the same time as it is acquired. This allows the operator to give an image save instruction for a desired image while viewing the display 30.

In the next step S102, the voice input processing unit 111 determines whether there has been voice input through the microphone 12. If it determines that there has been no input instruction, the processing of step S102 is repeated for as long as image data acquisition continues. If the voice input processing unit 111 determines that there has been voice input, the process proceeds to step S103, where the time of the voice input is temporarily stored in the memory 13 as the voice input time, and voice recognition processing is then performed (step S104).
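The ordering of steps S103 and S104 is the key point: the voice input time is recorded before the comparatively slow recognition runs. A minimal sketch is shown below; the function name `receive_voice_input`, the injected `recognize` callable, and the millisecond clock are assumptions made for illustration, and no particular recognition engine is implied.

```python
import time
from typing import Callable, Tuple


def _monotonic_ms() -> int:
    """Default clock: monotonic time in whole milliseconds."""
    return time.monotonic_ns() // 1_000_000


def receive_voice_input(audio: bytes,
                        recognize: Callable[[bytes], str],
                        clock: Callable[[], int] = _monotonic_ms
                        ) -> Tuple[int, str]:
    """Record the voice input time first, then run recognition.

    `recognize` is a hypothetical stand-in for any known voice
    recognition routine; it may take a noticeable time to return.
    """
    voice_input_ms = clock()   # S103: remember when the input arrived
    text = recognize(audio)    # S104: recognition, possibly slow
    return voice_input_ms, text
```

Because the time is read before `recognize` is called, the returned time does not include the recognition delay, however long that delay turns out to be.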

In the next step S105, the voice input determination unit 113 determines, from the result of the voice recognition processing performed by the voice input processing unit 111, whether the voice input was an image save instruction. If the voice input was an image save instruction, the process proceeds to step S106; if not, the process proceeds to step S108.

In step S106, the voice input determination unit 113 outputs a command for saving to the magnetic disk 14 the image data stored in association with the data acquisition time that matches the voice input time. In the present embodiment, since the voice recognition input device includes the magnetic disk 14, the CPU 11 stores in the magnetic disk 14, from among the image data that the data acquisition unit 112 acquired from the image capturing device 20 and temporarily stored in the memory 13, the image data stored in association with the data acquisition time that matches the voice input time. The process then proceeds to step S107, where the voice input time temporarily stored in the memory 13 in step S103 is erased, and then to step S108.

In step S108, it is determined whether acquisition of the image data has ended, that is, whether continuous image capture by X-ray fluoroscopy in the image capturing device 20 has ended. When image capture has ended, the image data temporarily stored in the memory 13 and the data acquisition times stored with it are erased, and the voice input processing ends.
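The flow of steps S101 to S108 can be illustrated with a small simulation. Everything here is a sketch under stated assumptions: frames and utterances are supplied as precomputed lists, times are integers in milliseconds, each utterance carries the time at which its recognition finishes, and "the same time" is approximated by the latest frame acquired at or before the voice input time. The function name `run_save_flow` is hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple

Frame = Tuple[int, str]           # (data acquisition time in ms, frame id)
Utterance = Tuple[int, int, str]  # (voice input time, recognition-finished
                                  #  time, recognized text), all times in ms


def run_save_flow(frames: List[Frame], utterances: List[Utterance],
                  save_word: str = "save") -> List[str]:
    """Return the ids of the frames that end up saved."""
    memory: List[Frame] = []  # temporary storage (role of the memory 13)
    disk: List[str] = []      # saved frames (role of the magnetic disk 14)
    pending = sorted(utterances, key=lambda u: u[1])

    def handle_finished(now_ms: int) -> None:
        # Process every utterance whose recognition has finished by now.
        while pending and pending[0][1] <= now_ms:
            voice_ms, _finished_ms, text = pending.pop(0)  # S102 to S104
            if text == save_word:                          # S105
                times = [t for t, _ in memory]
                i = bisect_right(times, voice_ms)
                if i > 0:
                    disk.append(memory[i - 1][1])          # S106
            # S107: the voice input time is discarded after use

    for acquired_ms, frame_id in frames:                   # S101
        memory.append((acquired_ms, frame_id))
        handle_finished(acquired_ms)
    if pending:
        handle_finished(pending[-1][1])  # results arriving after last frame
    memory.clear()                                         # S108
    return disk
```

With frames every 100 ms named `f0` to `f9` and a "save" utterance received at 320 ms whose recognition finishes at 850 ms, the sketch saves `f3`, the frame on screen when the operator spoke, rather than `f8`, the frame on screen when recognition finished.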

As described above, according to the present embodiment, the voice input time at which the operator's voice input for an image save instruction was made is stored and, in parallel, the data acquisition unit stores the data acquisition time together with the image data when acquiring it. By collating the voice input time with the data acquisition times, the image data for which the operator gave the save instruction can therefore be extracted immediately. The image the operator wants to save can thus be saved accurately, at the instructed timing, and with few utterances. In other words, the operation the operator wants can be executed quickly and accurately, realizing a highly responsive voice recognition input device that reduces the number of utterances required for operation.

(Modification)
For the voice recognition input device 10 described above, an example was given in which specific image data is saved in response to a voice input instruction spoken by the operator. The invention is not limited to this. For example, two instructions, "start saving" and "end saving", can be given for image saving, and all image data in the period from the start of saving to the end of saving can be saved.
In this way, all of the image data the operator wants to save can be saved accurately, at the instructed timing, and with few utterances.
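A sketch of this modification is given below, assuming that the voice input times of the two instructions have already been recorded and that records are sorted by acquisition time. The function name `select_range` and the choice to include records acquired exactly at either boundary are assumptions; the specification does not define the boundary handling.

```python
from bisect import bisect_left, bisect_right
from typing import Any, List, Sequence, Tuple

Record = Tuple[int, Any]  # (acquisition time in ms, image data)


def select_range(records: Sequence[Record],
                 start_ms: int, end_ms: int) -> List[Any]:
    """Return the data acquired between the two voice input times.

    start_ms is the voice input time of "start saving" and end_ms that
    of "end saving"; both boundaries are treated as inclusive.
    """
    times = [t for t, _ in records]
    lo = bisect_left(times, start_ms)   # first record at or after start
    hi = bisect_right(times, end_ms)    # one past last record at or before end
    return [payload for _, payload in records[lo:hi]]
```

For frames acquired at 0, 100, 200, and 300 ms, a start at 100 ms and an end at 250 ms select the frames acquired at 100 and 200 ms.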

(Second Embodiment)
In the present embodiment, when image data is displayed on the display 30 while being continuously acquired from the image capturing device 20, or when image data already saved in the memory 13 or the magnetic disk 14 is displayed continuously on the display 30, the operator can perform desired operations by voice input, such as advancing to the next image, returning to the previous image, or speeding up or slowing down the rate at which images are paged through.

In this case, the voice input processing unit 111 stores the voice input time in the memory 13 along with the operator's voice input, and the data acquisition unit 112, when reading out (acquiring) and displaying image data, temporarily stores the image data being displayed and its display time in the memory 13 (see FIG. 4). Thus, for example, when the operator gives a display stop instruction, the image data corresponding to the display time that matches the voice input time of the display stop instruction can be displayed on the display 30.
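The display stop case could be sketched as a lookup in a log of display times. The function name `image_shown_at`, the log layout, and the rule of taking the last image whose display started at or before the voice input time are assumptions made for this illustration.

```python
from typing import List, Optional, Tuple


def image_shown_at(display_log: List[Tuple[int, str]],
                   voice_input_ms: int) -> Optional[str]:
    """Return the image that was on screen at the voice input time.

    display_log holds (display start time in ms, image id) pairs in
    display order, standing in for the display times kept in memory.
    """
    shown = None
    for display_ms, image_id in display_log:
        if display_ms > voice_input_ms:
            break  # this image appeared only after the operator spoke
        shown = image_id
    return shown


# Images are paged every 500 ms; the operator says "stop" at 700 ms.
log = [(0, "img-001"), (500, "img-002"), (1000, "img-003"), (1500, "img-004")]
stopped_on = image_shown_at(log, 700)  # "img-002"
```

Even if recognition of "stop" only finishes after `img-004` has appeared, the display can be returned to `img-002`, the image the operator was looking at when speaking.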

(Third Embodiment)
The present embodiment describes the voice input processing for the case where the voice recognition input device is connected to an image capturing device 20 equipped with an X-ray aperture and the X-ray aperture position used when capturing images in the image capturing device 20 is adjusted. The voice recognition input device 10 in the present embodiment has the same configuration as the voice recognition input device according to the first embodiment described above, so the same components are given the same reference numerals and their description is omitted.

The position of the X-ray aperture frame can be adjusted according to voice input. For example, any one side of the X-ray aperture frame can be moved, or both the upper and lower sides, both the left and right sides, or all four sides can be moved simultaneously. The direction of movement, whether widening or narrowing the X-ray aperture frame, can also be combined freely, and the frame adjusted as appropriate. When the operator inputs a voice instruction to start movement, the designated side starts moving in the designated direction, and the X-ray aperture frame stops when the operator subsequently inputs a voice instruction to stop.

The flow of the voice input processing performed by the voice recognition input device 10 for adjusting the position of the aperture frame of the image capturing device 20 will be described below with reference to the flowchart of FIG. 5.
When the operator inputs a voice instruction to start moving the aperture frame, the designated side starts moving in the designated direction after voice recognition processing; that is, the aperture frame adjustment operation starts.

As shown in FIG. 5, when the adjustment operation of the aperture frame of the image capturing device 20 starts, in step S201 the data acquisition unit 112 displays the position of the aperture frame of the image capturing device 20 on the display 30 and starts acquiring position information (data) of the aperture frame. Displaying the position of the aperture frame on the display 30 allows the position of the aperture frame to be adjusted on the display. When acquiring the aperture position information, the data acquisition unit 112 temporarily stores the position information of the aperture frame in the memory 13 with the position information acquisition time attached, as shown in FIG. 6.

In the next step S202, the voice input processing unit 111 determines whether there has been voice input through the microphone 12. If it determines that there has been no input instruction, the processing of step S202 is repeated for as long as image data acquisition continues. If the voice input processing unit 111 determines that there has been voice input, the process proceeds to step S203, where the time of the voice input is temporarily stored in the memory 13 as the voice input time, and voice recognition processing is then performed (step S204).

In the next step S205, the voice input determination unit 113 determines, from the result of the voice recognition processing performed by the voice input processing unit 111, whether the voice input was an instruction to stop the aperture frame movement. If it was, the process proceeds to step S206; if not, the process proceeds to step S208.

In step S206, the voice input determination unit 113 outputs to the image capturing device 20 the position information stored in association with the position information acquisition time that matches the voice input time. In the present embodiment, from among the position information that the data acquisition unit 112 temporarily stored in the memory 13, the position information stored in association with the position information acquisition time that matches the voice input time is output to the image capturing device 20. The process then proceeds to step S207, where the voice input time temporarily stored in the memory 13 in step S203 is erased, and then to step S208.

In step S208, it is determined whether acquisition of the image data has ended, that is, whether continuous image capture by X-ray fluoroscopy in the image capturing device 20 has ended. When image capture has ended, the image data temporarily stored in the memory 13 and the data acquisition times stored with it are erased, and the voice input processing ends.
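The effect of steps S201 to S206 can be illustrated numerically. In the sketch below, the constant edge speed, the 50 ms sampling interval, the millimetre units, and all function names are assumptions chosen only to make the arithmetic concrete; the specification does not give any of these values.

```python
from typing import List, Optional, Tuple

Sample = Tuple[int, float]  # (position acquisition time in ms, position in mm)


def sample_edge_positions(start_mm: float, speed_mm_per_s: float,
                          sample_ms: int, duration_ms: int) -> List[Sample]:
    """Sample a frame edge that moves at constant speed (assumed motion)."""
    return [(t, start_mm + speed_mm_per_s * t / 1000)
            for t in range(0, duration_ms + 1, sample_ms)]


def position_at(samples: List[Sample], time_ms: int) -> Optional[float]:
    """Return the latest sampled position at or before time_ms."""
    found = None
    for t, pos in samples:
        if t > time_ms:
            break
        found = pos
    return found


def stop_target_and_overshoot(samples: List[Sample], voice_input_ms: int,
                              recognized_ms: int
                              ) -> Optional[Tuple[float, float]]:
    """Return (position to output in S206, distance moved past it)."""
    target = position_at(samples, voice_input_ms)
    current = position_at(samples, recognized_ms)
    if target is None or current is None:
        return None
    return target, current - target
```

With an edge starting at 100.0 mm and moving at 10.0 mm/s, a "stop" utterance received at 430 ms and recognized at 900 ms gives a target of 104.0 mm, while the edge has by then travelled 5.0 mm further. Outputting the stored position lets the device settle at the intended position without additional corrective utterances.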

As described above, according to the present embodiment, the voice input time at which the operator's voice input for adjusting the position of the aperture frame was made is stored and, in parallel, the data acquisition unit stores the position information acquisition time together with the position information when acquiring the position information of the aperture frame. By collating the voice input time with the position information acquisition times, the position information at the moment the operator gave the aperture frame stop instruction can therefore be extracted immediately. The position information of the aperture frame that the operator wants to fix can thus be output accurately to the image capturing device 20 with few utterances. In other words, the operation the operator wants can be executed quickly and accurately, realizing a highly responsive voice recognition input device that reduces the number of utterances required for operation.
The voice recognition input device described above can be used not only for inputs related to operations such as saving or displaying a desired image or adjusting the position of the aperture frame, but also for inputs related to operations such as moving a bed provided in the image capturing device.

10: voice recognition input device, 11: CPU, 12: microphone, 13: memory, 14: magnetic disk, 111: voice input processing unit, 112: data acquisition unit, 113: voice input determination unit, 20: image capturing device, 30: display

Claims

A voice recognition input device comprising:
a data acquisition unit that acquires predetermined data and temporarily stores the data in a storage unit with the acquisition time of the data attached;
a voice input processing unit that receives a voice input, temporarily stores in the storage unit the voice input time at which the voice input was received, and performs voice recognition processing on the voice input; and
a voice input determination unit that, when the voice recognition processing shows that the voice input matches a specific operation instruction, outputs a command for performing processing according to the operation instruction on the data stored in association with the acquisition time that is the same as the voice input time.

The voice recognition input device according to claim 1, wherein
the data is image data continuously acquired by an external image pickup device,
the data acquisition unit temporarily stores the image data in the storage unit with the acquisition time of the image data attached, and
the voice input determination unit, when the operation instruction related to the voice input is an image storage instruction, outputs a command for storing the image data stored in association with the acquisition time that is the same as the input instruction time.

The voice recognition input device according to claim 1, wherein
the data is image data continuously acquired by an external image pickup device,
the data acquisition unit displays the image data on a display device while acquiring the image data, and temporarily stores the image data in the storage unit with the acquisition time of the image data attached, and
the voice input determination unit, when the operation instruction related to the voice input is an image display instruction, outputs a command for displaying on the display device the image data stored in association with the acquisition time that is the same as the voice input time.

The voice recognition input device according to claim 1, wherein
the data is position information of an aperture frame in an external image pickup device,
the data acquisition unit continuously acquires the position information and temporarily stores the position information in the storage unit with the acquisition time of the position information attached, and
the voice input determination unit, when the operation instruction related to the voice input is an instruction to stop movement of the aperture frame, outputs to the image pickup device the position information stored in association with the acquisition time that is the same as the voice input time.