JP7141938B2

JP7141938B2 - Voice recognition input device, voice recognition input program and medical imaging system

Info

Publication number: JP7141938B2
Application number: JP2018229984A
Authority: JP
Inventors: 宏之助天明
Original assignee: 富士フイルムヘルスケア株式会社
Priority date: 2018-12-07
Filing date: 2018-12-07
Publication date: 2022-09-26
Anticipated expiration: 2038-12-07
Also published as: JP2020089641A

Description

The present invention relates to a voice recognition input device, a voice recognition input program, and a medical imaging system, and more particularly to a voice recognition input device that is connected to a medical device such as a medical imaging apparatus and outputs commands to that medical device, as well as to a corresponding voice recognition input program and medical imaging system.

In recent years, as interventional treatment has become more sophisticated, a growing number of examinations and treatments of blood vessels and the gastrointestinal tract are performed under X-ray fluoroscopy and with flexible endoscopes. These procedures require the operator to remain sterile, and operating equipment by hand compromises that sterility.
In addition, in an examination using an X-ray fluoroscopy apparatus, for example, the equipment is often operated by a medical worker who assists the operator, following verbal instructions from the operator performing the procedure. In such cases, conveying the verbal instructions from the operator to the assisting medical worker can take time, so the equipment may not be operated as the operator intends until some time has passed. It is therefore desirable for the operator to be able to operate the medical device directly by verbal instruction, without assistance from a medical worker.
Accordingly, operation by voice recognition is desired for medical devices used in surgery, examinations, and the like, both to maintain sterility and to improve operability.

Meanwhile, the accuracy of voice recognition technology has been improving: in addition to the long-established methods based on hidden Markov models, methods based on deep learning have emerged, making it possible to recognize not only individual words but also whole sentences. Some voice recognition systems improve their performance incrementally through large-scale machine learning on servers or in the cloud. Medical devices, however, must be designed with confidentiality in mind, so a voice recognition input device applied to a medical device needs to perform voice recognition processing in a non-networked environment, without connecting to the cloud or to a server.

As an example of a medical device operated by voice recognition, Patent Document 1 discloses an X-ray diagnostic imaging apparatus in which, to reduce the burden on the operator, functions whose malfunction could harm the subject are controlled by the operator's manual operation, whereas functions whose malfunction poses no risk of harm to the subject are controlled by recognizing the operator's voice.

Patent Document 1: JP 2006-149909 A

However, if the database used for voice recognition processing contains, for example, several voice operation commands with mutually similar phonemes, the voice recognition processing may produce false detections. That is, it may be impossible to determine which of several voice operation commands made up of similar phonemes the operator's utterance corresponds to, resulting in a false detection. In that case the operation cannot be performed by voice recognition, the operator ends up instructing the supporting medical worker to operate the equipment, and the time needed to operate the equipment as the operator intends cannot be shortened. The recognition rate of voice operation commands would then need to be improved, but no means of doing so is disclosed.

The present invention has been made in view of the above circumstances, and its object is to accurately recognize an operator's spoken operation instructions and reduce false detections in voice recognition processing.

To solve the above problem, the present invention provides the following means.
One aspect of the present invention provides a voice recognition input device that inputs operation commands to an external device, comprising: a storage unit storing a voice recognition data table in which a plurality of voice commands are recorded in association with one operation command and in which, for each voice command, a weighting factor corresponding to the frequency of use of that voice command is recorded; a voice recognition unit that accepts a voice input, performs voice recognition processing with the voice input as the recognition target, and outputs the voice command corresponding to the voice input as the result of the voice recognition processing; a command conversion unit that refers to the voice recognition data table and converts the voice command into the operation command recorded in association with that voice command; and an operation determination unit that outputs the operation command to the external device, wherein the voice recognition unit selects a plurality of voice command candidates that may correspond to the voice input, multiplies each of the candidates by its weighting factor, and outputs the most probable voice command as the result of the voice recognition processing for the voice input.
According to the present invention, voice recognition processing is performed using a data table that records, for each voice command, a weighting factor corresponding to its frequency of use, so the accuracy of voice recognition processing for spoken operation instructions can be improved.

According to the present invention, false detections in voice recognition processing can be reduced and the operator's spoken operation instructions can be recognized accurately.

FIG. 1 is a block diagram showing the schematic configuration of a voice recognition input device according to a first embodiment of the present invention.
FIG. 2 is an example of the voice recognition data table stored in the voice recognition DB of the voice recognition input device of FIG. 1.
FIG. 3 is a graph showing the relationship among the frequency of use of operation commands, operation command codes, and the offset coefficient T, which is referred to when the weighting factors are updated in the voice recognition input device according to the first embodiment of the present invention.
FIG. 4 is a flowchart showing the flow of voice recognition input processing by the voice recognition input device according to the first embodiment of the present invention.
FIG. 5 is a graph showing the relationship among the cumulative count of voice commands, voice command codes, and the offset coefficient V, which is referred to when the weighting factors are updated in the voice recognition input device according to a modification of the first embodiment of the present invention.
FIG. 6 shows examples of data tables stored in the voice recognition DB of a voice recognition input device according to a second embodiment of the present invention: (A) is a data table showing the state of examination start information, (B) is a data table showing the state of the examination type, (C) is a data table showing the state of X-ray irradiation information, and (D) is a data table showing the device operating status and the classification of the voice recognition DB.
FIG. 7 is a flowchart showing the flow of the process for switching voice recognition data tables in the voice recognition input device according to the second embodiment of the present invention.

A voice recognition input device according to an embodiment of the present invention provides input to an external device, such as a medical imaging apparatus, connected to the voice recognition input device.

(First embodiment)
A voice recognition input device according to a first embodiment of the present invention is described in more detail below with reference to the drawings. FIG. 1 shows a schematic configuration diagram of the voice recognition input device according to this embodiment.
The voice recognition input device 10 includes a central processing unit (CPU) 11 that controls the voice recognition input device 10 as a whole; a voice input I/F (interface) 12, such as a microphone, that accepts voice input; a manual input I/F (interface) 13, such as a mouse or keyboard, that accepts manual input; a memory 14; a voice recognition DB 15 that stores the voice recognition algorithm and the data needed for voice recognition processing; and a log collection DB 16 that collects and records logs related to voice input. These components are connected to one another via a system bus.

In this embodiment, the voice recognition input device 10 is communicably connected to an imaging device 20 and issues various input instructions to the imaging device 20. The voice recognition input device 10 is also connected to a display 30 via the imaging device 20; it causes images and other data acquired by the imaging device 20 to be shown on the display 30 and issues desired operation instructions, such as enlargement or reduction, for the displayed image. Hardware for acquiring medical images, such as an X-ray apparatus, an MRI apparatus, a CT apparatus, or a PET apparatus, can be used as the imaging device 20.

So that the voice recognition input device 10 can issue spoken input instructions to the imaging device and the like, the CPU 11 implements the functions of a voice operation processing unit 120, a manual operation processing unit 130, and a system operation determination unit 140, as shown in FIG. 1. In particular, the voice operation processing unit 120 implements the functions of a voice recognition unit 111, a command conversion unit 112, and a command analysis unit 113.

The functions of these units can be implemented in software by having the CPU 11 load and execute a program stored in advance in a memory such as a magnetic disk (not shown). Some or all of the operations performed by the units included in the CPU 11 can also be implemented with an ASIC (application-specific integrated circuit) or an FPGA (field-programmable gate array).

The voice operation processing unit 120 recognizes a spoken operation instruction (voice input) entered via the voice input I/F 12, such as a microphone, and issues the operation instruction to the imaging device 20. It implements the functions of the voice recognition unit 111, the command conversion unit 112, and the command analysis unit 113.

The voice recognition unit 111 performs voice recognition processing on a spoken operation instruction entered via the voice input I/F 12, following a voice recognition algorithm stored in advance in the voice recognition DB 15 or the like, and outputs the resulting voice command to the command conversion unit 112. In doing so, the voice recognition unit 111 uses the voice recognition data table (see FIG. 2) stored in the voice recognition DB 15, described later, performs the voice recognition processing according to the predetermined voice recognition algorithm, and selects a voice command as the recognition result. The voice recognition processing performed by the voice recognition unit 111 is described in detail later.

The command conversion unit 112 converts the voice command selected through the voice recognition processing in the voice recognition unit 111 into the corresponding operation command, and outputs that operation command, as the operator's operation instruction, to the system operation determination unit 140 and the command analysis unit 113.

The command analysis unit 113 obtains from the command conversion unit 112 information about the operation command for the operator's operation instruction, generates an operation history and records it in the log collection DB 16, and analyzes the operation history. The information about the operation command can include not only the operation command itself but also, for example, the voice command from which it was converted. The command analysis unit 113 also updates the weighting factors in the voice recognition data table based on the results of analyzing the operation history. The updating of the weighting factors is described in detail later.

When an operation is performed manually, the manual operation processing unit 130 generates an operation command from the operation instruction entered via the manual input I/F 13, such as a mouse or keyboard, and outputs it to the system operation determination unit 140.
The system operation determination unit 140 outputs the operation command received from the voice operation processing unit 120 or the manual operation processing unit 130 to the imaging device 20 and also to the command analysis unit 113.

The voice input I/F 12 accepts an utterance by the operator or another user as a spoken operation instruction (voice input), converts it into voice data in the form of an electric signal, and outputs the voice data to the voice operation processing unit 120. A microphone, for example, can be used.
The manual input I/F 13 accepts a manual operation instruction from the operator or another user, converts the instruction into an electric signal, and outputs it to the manual operation processing unit 130. An input device such as a mouse, keyboard, or touch panel can be used.

The memory 14 stores the programs executed by the CPU 11 and the intermediate results of arithmetic processing, and temporarily stores spoken and manual operation instructions.
The voice recognition DB 15 stores a predetermined voice recognition algorithm and records the voice recognition data table used for voice recognition processing. The voice recognition data table is described in detail later.
The log collection DB 16 acquires and records information about operation instructions, such as the operation history generated by the command analysis unit 113.

(Voice recognition processing and the voice recognition data table)
As shown in FIG. 2, the voice recognition data table consists of voice command data db1, which holds data about voice commands; operation command data db2, which holds data about operation commands; and weighting factor data db3, which holds the weighting factor defined for each voice command.

As shown in FIG. 2, the voice recognition data table records a plurality of voice commands db11 in association with the same operation command db21. As a result, even when different operators give the same operation instruction using different habitual phrases or utterances, the same operation can be carried out through voice recognition processing. For each voice command in the voice command data db1, a command reading db12 and a voice command code db13 are recorded in association with it. Likewise, for each operation command db21, an operation command code db22 is stored in association with it.
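The table layout described above can be sketched as a small data structure. This is a minimal illustration, not the layout of FIG. 2 itself: the field names follow db11 to db22 and db3, but the code values, the weights, the romanized readings, and the "make it smaller" row are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceCommandEntry:
    voice_command: str           # db11: voice command
    reading: str                 # db12: command reading (romanized here)
    voice_command_code: str      # db13: hypothetical code value
    operation_command: str       # db21: operation command
    operation_command_code: str  # db22: hypothetical code value
    weight: float                # db3: weighting factor (made-up value)


# All codes, weights, and the "make it smaller" row are illustrative assumptions.
TABLE = [
    VoiceCommandEntry("image reduction", "gazoshukusho", "V01",
                      "image reduction", "C01", 1.2),
    VoiceCommandEntry("make it smaller", "chiisaku", "V02",
                      "image reduction", "C01", 1.0),
    VoiceCommandEntry("image acquisition", "gazoshushu", "V03",
                      "fluoroscopy recording", "C02", 0.9),
]


def to_operation_command(table, voice_command):
    """Return the operation command recorded for a voice command, or None."""
    for entry in table:
        if entry.voice_command == voice_command:
            return entry.operation_command
    return None
```

Two different voice commands map to the same operation command here, which is the many-to-one association the table is meant to capture.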

The voice recognition unit 111 performs voice recognition processing as follows. It first converts the voice data entered via the voice input I/F 12 into a sound waveform and, from the waveform, breaks the voice data down character by character into phonemes and identifies them. Next, using a voice recognition algorithm such as statistical machine learning based on a hidden Markov model or machine learning with a deep learning model, it matches the phonemes of the voice data against the phonemes of the command readings db12. Through this matching, the voice recognition unit 111 selects the command readings db12 that resemble the voice data and outputs the candidate voice commands db11 corresponding to the selected readings, each with a score that indicates how probable it is.
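The matching step can be illustrated with a deliberately simplified stand-in. The sketch below does not use a hidden Markov model or a deep learning model; it substitutes plain string similarity from Python's `difflib` for acoustic scoring, purely to show how a heard reading yields several scored candidates. The romanized readings, the "teishi"/"stop" entry, and the `min_score` cutoff are assumptions.

```python
from difflib import SequenceMatcher

# Command reading (db12, romanized) -> voice command (db11).
# The "teishi" entry is a hypothetical extra row for contrast.
READINGS = {
    "gazoshukusho": "image reduction",
    "gazoshushu": "image acquisition",
    "teishi": "stop",
}


def select_candidates(heard, readings, min_score=0.5):
    """Score each registered reading against the heard reading and
    return (voice_command, score) pairs at or above min_score, best first."""
    candidates = []
    for reading, voice_command in readings.items():
        # String similarity stands in for the acoustic likelihood score.
        score = SequenceMatcher(None, heard, reading).ratio()
        if score >= min_score:
            candidates.append((voice_command, score))
    return sorted(candidates, key=lambda c: c[1], reverse=True)


candidates = select_candidates("gazoshukusho", READINGS)
```

Because the two "gazo..." readings share a long prefix, both survive as candidates, which is the situation in which a noisy input could be resolved the wrong way.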

As an example of voice recognition processing, consider the case where the operator gives an operation instruction by saying "gazō shukushō" (image reduction). In the command readings db12 of the voice recognition data table, the phonemes of "gazō shukushō" (image reduction) and "gazō shūshū" (image acquisition) are identical partway through. Depending on the quality of the input voice data, the voice recognition unit 111 may therefore misrecognize "gazō shukushō" as "gazō shūshū". In that case, although the operator instructed the operation command db21 "image reduction", the misrecognition causes the operation command db21 "fluoroscopy recording", which corresponds to the voice command db11 "gazō shūshū", to be selected, and an operation the operator did not intend is performed.

To avoid such misrecognition, the voice recognition data table records weighting factor data db3 in association with each voice command. The weighting factor data db3 holds the weighting factors by which the scores attached to each of the one or more voice command candidates output during voice recognition processing are multiplied. Each weighting factor recorded in the weighting factor data db3 corresponds to one voice command and is a value set according to, among other things, how frequently that voice command is used.
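A small worked example may make the effect of the weighting factor concrete. The scores and weights below are invented for illustration; the only point is that multiplying each candidate's score by its weight can reverse a near-tie between similar-sounding commands.

```python
def apply_weights(scored_candidates, weights):
    """Multiply each candidate's recognition score by its weighting factor.
    Candidates without a recorded weight keep their raw score (assumption)."""
    return {cmd: score * weights.get(cmd, 1.0)
            for cmd, score in scored_candidates.items()}


# Hypothetical raw scores: the recognizer slightly prefers the wrong command.
raw = {"image reduction": 0.80, "image acquisition": 0.82}
# Hypothetical weights: "image reduction" is used more often.
weights = {"image reduction": 1.2, "image acquisition": 0.9}

weighted = apply_weights(raw, weights)
best = max(weighted, key=weighted.get)
```

With these made-up numbers the raw scores favor "image acquisition", while the weighted scores favor "image reduction".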

The weighting factors can also be updated based on the results of the command analysis unit 113 analyzing the operation history recorded in the log collection DB 16. Specifically, by analyzing the operation history, the command analysis unit 113 calculates the number of times each operation command was used within a fixed period and, from that number, calculates an offset coefficient T for updating the weighting factors.

The offset coefficient T can be determined, for example, from the cumulative number of times the operation command code corresponding to each operation command has been issued, according to the graph shown in FIG. 3. Alternatively, the offset coefficient T can be calculated from the share of each operation command in the total number of operation command codes issued during a predetermined period. The command analysis unit 113 updates the weighting factors by multiplying each weighting factor recorded in association with the operation command by the calculated offset coefficient T.
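The update rule can be sketched as follows. The actual curve of FIG. 3 is not reproduced in the text, so the function below assumes a linear ramp that saturates at a fixed count; the parameter names and the values 100, 1.0, and 1.5 are placeholders, not values from the source.

```python
def offset_from_count(count, saturation=100, t_min=1.0, t_max=1.5):
    """Map a cumulative usage count to an offset coefficient T.
    Assumed shape: linear from t_min to t_max, flat beyond `saturation`."""
    clipped = max(0, min(count, saturation))
    return t_min + (t_max - t_min) * clipped / saturation


def update_weights(weights, operation_of, usage_counts):
    """Multiply each voice command's weight by the T of its operation command.

    weights:      voice command -> current weighting factor
    operation_of: voice command -> operation command
    usage_counts: operation command -> usage count in the analysis period
    """
    return {
        voice: w * offset_from_count(usage_counts.get(operation_of[voice], 0))
        for voice, w in weights.items()
    }


new_weights = update_weights(
    {"image reduction": 1.0, "image acquisition": 1.0},
    {"image reduction": "image reduction",
     "image acquisition": "fluoroscopy recording"},
    {"image reduction": 100},
)
```

Since the update is multiplicative, repeated updates compound; a real system would presumably bound or renormalize the weights, which the text does not specify.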

The command analysis unit 113 can analyze the operation history automatically, or at any other time, for example in response to an operation instruction from the operator or the device provider. When a false detection occurs in the voice recognition processing of the voice recognition unit 111, the command analysis unit 113 can update the weighting factor accordingly, for example by subtracting from the usage count of the operation command.

The flow of voice input processing by the voice recognition input device 10 configured as described above is explained below with reference to the flowchart of FIG. 4.
As shown in FIG. 4, once the voice recognition input device 10 starts operating, it waits for voice input. When a voice input arrives at the voice input I/F 12 (step S101), the process proceeds to step S102: the voice recognition unit 111 receives the voice data from the voice input I/F 12, matches the phonemes of the voice data against the phonemes of each command reading registered in the voice recognition DB 15, and selects candidates from the voice commands recorded in association with the command readings. The candidates can be selected based on the score that indicates the probability of the recognition.

In the next step, S103, it is determined whether step S102 selected at least one voice command candidate. If there is no candidate, the process proceeds to step S104, treats the input as containing no voice command, and returns to step S101. If there is at least one candidate, the process proceeds to step S105, where the score assigned to each candidate is multiplied by the weighting factor recorded for that voice command in the voice recognition data table.

In step S106, the voice command candidate with the highest score after the multiplication in step S105 is selected. In the next step, S107, it is determined whether that highest score, that is, the score of the selected candidate, is greater than a predetermined threshold. If the highest score is below the threshold, the input is treated as containing no voice command: the process passes through step S104 and returns to step S101 without performing any voice operation, and the voice recognition input device 10 again waits for voice input. At this point the operator can be notified that the voice operation will not be performed, for example by synthesized speech, an alarm, or the display.

If the highest score is greater than the threshold, the process proceeds to step S108, where the voice command with the highest score is adopted as the result of the voice recognition processing. The adopted voice command is output to the command conversion unit 112, which uses the voice recognition data table to convert it into the corresponding operation command (S109).

In the next step, S110, the command conversion unit 112 outputs the converted operation command to the command analysis unit 113 and the system operation determination unit 140. The command analysis unit 113 updates the operation history to include the received operation command and records it in the log collection DB 16. The system operation determination unit 140 outputs the received operation command to the imaging device 20, and the imaging device 20 performs the operation corresponding to that command.
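Steps S103 to S109 can be summarized in a short sketch. It takes already-scored candidates as input, so the recognizer itself is out of scope; the table contents, scores, and threshold are invented for the example, and treating a score exactly equal to the threshold as a rejection is an assumption, since the text only distinguishes "greater" from "smaller".

```python
# Voice command -> (operation command, weighting factor). Values are made up.
TABLE = {
    "image reduction": ("image reduction", 1.2),
    "image acquisition": ("fluoroscopy recording", 0.9),
}


def recognize(scored_candidates, table, threshold):
    """Return the operation command for the best weighted candidate,
    or None when no voice command is accepted."""
    if not scored_candidates:                      # S103 -> S104: no candidate
        return None
    weighted = {voice: score * table[voice][1]     # S105: score x weight
                for voice, score in scored_candidates.items()}
    best = max(weighted, key=weighted.get)         # S106: highest score
    if weighted[best] <= threshold:                # S107: threshold check
        return None                                # S104: no voice operation
    return table[best][0]                          # S108-S109: convert command


result = recognize({"image reduction": 0.80, "image acquisition": 0.82},
                   TABLE, threshold=0.5)
```

A caller would forward a non-None result to the imaging device and the operation log (S110), and return to waiting for input otherwise.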

Thus, according to this embodiment, a plurality of voice commands are recorded in association with each operation command, higher weighting factors are given to operation commands in descending order of frequency of use, and those factors are used in voice recognition processing. Voice recognition can therefore be performed accurately regardless of the speaking habits and preferences that differ from operator to operator. Furthermore, by recording how often each operation command is used and updating the weighting factors accordingly, the accuracy of voice recognition for the operation commands the operator uses most often improves over time, and the operator's spoken operation instructions can be recognized accurately.

(Modification)
The first embodiment described above explained an example in which the command analysis unit 113 updates the weighting factors based on the frequency of use of operation commands. This modification describes an example in which the weighting factors are updated based on the voice commands that were input.

By analyzing the operation history, the command analysis unit 113 calculates, for the voice commands from which the operation commands received from the command conversion unit 112 were converted, how often each voice command was detected within a fixed period. From the calculated detection frequency it then calculates an offset coefficient V for updating the weighting factors.

The offset coefficient V can be determined, for example, from the cumulative number of times the voice command code corresponding to each voice command has been issued, according to the graph shown in FIG. 5. Alternatively, the offset coefficient V may be determined from the share of each voice command in the total number of voice command codes issued during a given period.

The command analysis unit 113 updates the weighting factors by multiplying each weighting factor recorded for an operation command by the calculated offset coefficient V. In this case as well, when an erroneous detection occurs in the voice recognition processing of the voice recognition unit 111, the command analysis unit 113 preferably takes it into account when updating the weighting factors, for example by subtracting it from the cumulative count of the voice command.
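The update by the offset coefficient V described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the curve of FIG. 5 is not reproduced in the text, so `offset_coefficient_v` uses a hypothetical linear ramp that saturates, and the names `update_weights`, `v_min`, `v_max`, and `saturation` are invented for the example.

```python
from collections import Counter


def offset_coefficient_v(count, v_min=1.0, v_max=1.5, saturation=100):
    """Hypothetical stand-in for the FIG. 5 curve (assumption, not from the text):
    V rises linearly with the cumulative issue count and saturates at v_max."""
    clipped = max(0, min(count, saturation))
    return v_min + (v_max - v_min) * clipped / saturation


def update_weights(weights, history, misdetections=()):
    """Return new weighting factors, each multiplied by V for its voice command.

    weights       -- {voice_command_code: weighting factor}
    history       -- voice command codes detected within the period
    misdetections -- codes of erroneous detections; each is subtracted from
                     the cumulative count before V is computed
    """
    counts = Counter(history)
    counts.subtract(Counter(misdetections))
    return {code: w * offset_coefficient_v(counts[code])
            for code, w in weights.items()}


# Illustrative codes only: "v001" and "v002" are hypothetical voice command codes.
weights = {"v001": 1.0, "v002": 1.0}
history = ["v001"] * 50 + ["v002"] * 10
updated = update_weights(weights, history, misdetections=["v002"] * 10)
```

With these assumed parameters, a voice command detected 50 times has its weighting factor raised by a quarter, while a command whose detections were all erroneous keeps its original weighting factor.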

As described above, in this modification, the voice commands based on the operator's utterances are given weighting factors that are higher the more frequently they are detected, and these weighting factors are used in the voice recognition processing. This allows voice recognition to be performed accurately regardless of the speaking habits and preferences that differ from operator to operator. In addition, by recording the number of times each voice command is detected and updating the weighting factors according to that number, the accuracy of voice recognition for the voice commands frequently detected for an operator improves over time. The accuracy of voice recognition can be improved further than in the first embodiment described above, and the operator's spoken operation instructions can be recognized accurately.

The weighting factors can also be updated by multiplying them by the product of the offset coefficient T, which is based on the frequency of use of the operation commands described above, and the offset coefficient V, which is based on the detection frequency of the voice commands. In this case, the voice recognition accuracy improves further for voice commands that are both frequently detected and associated with frequently used operation commands. In this case as well, when an erroneous detection occurs in the voice recognition processing of the voice recognition unit 111, the command analysis unit 113 preferably takes it into account when updating the weighting factors, for example by subtracting it from the number of uses of the operation command and from the cumulative count of the voice command.
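The effect of the combined update on recognition can be illustrated with a short sketch. It assumes, as recited in the claims, that the recognizer multiplies each candidate's likelihood index by its weighting factor and picks the largest product. The function names, the voice command codes, and the numeric values of T, V, and the likelihood indices are all hypothetical.

```python
def combined_weight(weight, t, v):
    """Update one weighting factor with both offsets: weight * (T * V)."""
    return weight * (t * v)


def select_voice_command(candidates, weights):
    """Pick the candidate whose likelihood index times weighting factor is largest.

    candidates -- {voice_command_code: likelihood index from the recognizer}
    weights    -- {voice_command_code: weighting factor}
    """
    return max(candidates, key=lambda code: candidates[code] * weights[code])


# Two acoustically similar candidates with nearly equal likelihood indices.
candidates = {"v_zoom_in": 0.48, "v_zoom_out": 0.52}
weights = {"v_zoom_in": 1.0, "v_zoom_out": 1.0}

before = select_voice_command(candidates, weights)   # raw likelihood decides

# Assume "v_zoom_in" is frequently detected and maps to a frequently used operation.
weights["v_zoom_in"] = combined_weight(weights["v_zoom_in"], t=1.2, v=1.1)
after = select_voice_command(candidates, weights)    # weighted score decides
```

In this example the unweighted comparison favors `v_zoom_out`, whereas after the combined update the product for `v_zoom_in` is larger, so the frequently used command wins a close call.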

(Second embodiment)
In the first embodiment and its modification described above, one voice recognition data table is stored in the voice recognition DB 15. In the present embodiment, the voice recognition DB 15 stores a plurality of voice recognition data tables such as the one shown in FIG. 2, each corresponding to an operational status of the external device to which the voice recognition input device 10 is applied, and the data table used for voice recognition processing is switched according to the operational status of the external device.

Examples of the operational status of the apparatus include whether the examination has started, the examination type, and the presence or absence of X-ray output. A plurality of voice recognition data tables tg001 to tg*** corresponding to these statuses are stored in the voice recognition DB in advance. An example of a voice recognition data table is as shown in FIG. 2. The voice recognition DB also stores data tables indicating the operational status of the apparatus, as shown in FIG. 6, and the appropriate voice recognition data table is selected by referring to these data tables. FIG. 6(A) is a data table showing the states of the examination start information, FIG. 6(B) is a data table showing the states of the examination type, FIG. 6(C) is a data table showing the states of the X-ray irradiation information, and FIG. 6(D) is a data table showing the classification of apparatus operation statuses and voice recognition databases.

The flow of switching among the voice recognition data tables tg001 to tg*** is described below with reference to the flowchart of FIG. 7.
When the voice recognition input device 10 is activated, the system operation determination unit 14 successively acquires information on the apparatus operation status from the imaging device 20 (step S201). In the present embodiment, the system operation determination unit 14 acquires, for example, examination start information, examination type information, and X-ray irradiation information. Having acquired this information, the system operation determination unit 14 determines, for each item, whether it has changed from its previous state (step S202).

If it is determined in step S202 that at least one of the examination start information, the examination type information, and the X-ray irradiation information has changed, the process proceeds to step S203, and an apparatus operation status command St is generated. As shown in FIG. 6(D), the apparatus operation status command St consists of three pieces of information (examination start information, examination type, and X-ray irradiation information), and the voice recognition data table to be applied is determined by their combination.

The system operation determination unit 14 outputs the generated apparatus operation status command St to the voice recognition unit 111 (step S205), and the voice recognition unit 111 selects and switches to a voice recognition data table according to the input apparatus operation status command St. For example, when the apparatus operation status command St contains the examination start information ao1 indicating "after examination start", the examination type 1001 indicating "Abdomen", and the X-ray irradiation information c01 indicating "irradiating", the voice recognition data table tg003 is selected.
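The switching flow of steps S201 to S205 can be sketched as follows. This is an illustration under stated assumptions rather than the implementation of the embodiment: the class and field names are hypothetical, only the combination (ao1, 1001, c01) and its table tg003 come from the text, and both the initial table tg001 and the behavior for an unlisted combination (keep the current table) are assumptions.

```python
from typing import NamedTuple, Optional


class StatusCommand(NamedTuple):
    """Apparatus operation status command St: three fields, as in FIG. 6(D)."""
    exam_start: str
    exam_type: str
    xray: str


# Only this entry is taken from the text; a real table would list every combination.
TABLE_FOR_STATUS = {StatusCommand("ao1", "1001", "c01"): "tg003"}


class TableSwitcher:
    def __init__(self, initial_table="tg001"):  # initial table is an assumption
        self.previous: Optional[StatusCommand] = None
        self.active_table = initial_table

    def on_status(self, exam_start, exam_type, xray):
        """Handle one status reading acquired from the imaging device (S201)."""
        current = StatusCommand(exam_start, exam_type, xray)
        if current == self.previous:
            # S202: no item changed, so no St is generated.
            return self.active_table
        # S203: at least one item changed, so St is generated.
        self.previous = current
        # S205: St is passed on and the matching table is selected.
        # Unknown combinations keep the current table (assumption).
        self.active_table = TABLE_FOR_STATUS.get(current, self.active_table)
        return self.active_table


switcher = TableSwitcher()
selected = switcher.on_status("ao1", "1001", "c01")
```

Keying the lookup on the whole three-field tuple mirrors the statement that the combination of the three pieces of information determines the table.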

Even when a plurality of voice recognition data tables are stored in the voice recognition DB as described above, the weighting factors can be updated in the same manner as in the first embodiment and its modification.

When generating the operation history, the command analysis unit 113 records, together with the operation command code indicating the operation command and the voice command code indicating the voice command, the command codes relating to whether the examination has started and to the examination type (see FIG. 6). In this way, only the weighting factors of the voice recognition data tables that need updating are updated.
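One way to picture this is an operation history whose records carry the status codes, so that counts can be grouped per status and only the affected tables touched. The sketch below is hypothetical: the record layout, the function name, and all codes other than ao1 and 1001 are invented for illustration.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class HistoryRecord:
    operation_code: str   # operation command code
    voice_code: str       # voice command code
    exam_start: str       # examination start code (FIG. 6(A))
    exam_type: str        # examination type code (FIG. 6(B))


def voice_counts_by_status(history):
    """Group voice command counts by (exam_start, exam_type).

    Only statuses that actually appear in the history get an entry, so only
    the corresponding tables need their weighting factors updated.
    """
    grouped = defaultdict(Counter)
    for rec in history:
        grouped[(rec.exam_start, rec.exam_type)][rec.voice_code] += 1
    return dict(grouped)


# "op01", "v001", "v002", and "1002" are hypothetical codes.
history = [
    HistoryRecord("op01", "v001", "ao1", "1001"),
    HistoryRecord("op01", "v002", "ao1", "1001"),
    HistoryRecord("op01", "v001", "ao1", "1001"),
]
counts = voice_counts_by_status(history)
```

Here only the status (ao1, 1001) appears in the history, so a table associated with, say, examination type 1002 would be left unchanged.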

As described above, according to the present embodiment, the voice recognition data table used for voice recognition processing can be switched according to the operational status of the apparatus to which the voice recognition input device is applied. Because each voice recognition data table can define, for each apparatus operation status, weighting factors that emphasize the frequently used operations and frequently used voice commands, the accuracy of voice recognition processing can be improved.

DESCRIPTION OF SYMBOLS
10... voice recognition input device, 11... CPU, 12... voice input I/F, 13... manual input I/F, 14... memory, 15... voice recognition DB, 16... log collection DB, 20... imaging device, 30... display, 111... voice recognition unit, 112... command conversion unit, 113... command analysis unit, 120... voice operation processing unit, 130... manual operation processing unit, 140... system operation determination unit

Claims

1. A voice recognition input device for inputting an operation command to an external device, comprising:
a storage unit storing a voice recognition data table in which a plurality of voice commands are recorded in association with one operation command and in which, for each voice command, a weighting factor corresponding to the frequency of use of that voice command is recorded;
a voice recognition unit that receives a voice input, performs voice recognition processing on the voice input as a recognition target, and outputs a voice command corresponding to the voice input as a result of the voice recognition processing;
a command conversion unit that refers to the voice recognition data table and converts the voice command into the operation command recorded in correspondence with the voice command; and
an operation determination unit that outputs the operation command to the external device,
wherein the voice recognition unit refers to the voice recognition data table, selects a plurality of voice command candidates that can correspond to the voice input and calculates an index of the likelihood of voice recognition, selects the most probable voice command by multiplying, for each of the plurality of voice command candidates, the index by the weighting factor, and outputs the selected voice command as the result of the voice recognition processing for the voice input.

2. The voice recognition input device according to claim 1, further comprising a command analysis unit that generates an operation history recording at least one of the operation command and the voice command corresponding to the operation command, and updates the weighting factor based on a result of analyzing the operation history.

3. The voice recognition input device according to claim 2, wherein the command analysis unit updates the weighting factor based on at least one of the frequency of use of the operation command within a certain period and the cumulative number of the voice commands.

4. The voice recognition input device according to claim 1, wherein
the storage unit stores a plurality of voice recognition data tables corresponding to the operational status of the external device, and
the voice recognition unit switches the voice recognition data table used for voice recognition processing according to the operational status of the external device.

5. A voice recognition input program that causes a computer to input an operation command by voice to a medical imaging device, the program causing the computer to execute:
a voice recognition step of referring to a voice recognition data table in which a plurality of voice commands are recorded in association with one operation command and in which a weighting factor corresponding to the frequency of use of each voice command is recorded, performing voice recognition processing on a voice input from a user as a recognition target, and outputting a voice command corresponding to the voice input as a result of the voice recognition processing;
a command conversion step of referring to the voice recognition data table and converting the voice command into the operation command recorded in correspondence with the voice command; and
an operation determination step of outputting the operation command to the medical imaging device,
wherein, in the voice recognition step, the voice recognition data table is referred to, a plurality of voice command candidates that can correspond to the voice input are selected and an index of the likelihood of voice recognition is calculated, the most probable voice command is selected by multiplying, for each of the plurality of voice command candidates, the index by the weighting factor, and the selected voice command is output as the result of the voice recognition processing for the voice input.

6. A medical imaging system comprising:
the voice recognition input device according to any one of claims 1 to 4; and
a medical imaging device as the external device,
wherein the voice recognition input device issues operation instructions to the medical imaging device by voice recognition input.