JP5049930B2

JP5049930B2 - Distributed speech recognition system

Info

Publication number: JP5049930B2
Application number: JP2008230693A
Authority: JP
Inventors: 智一佐野; 克典永井; 良雄丸山; 浩次郎岡本; 太一野村
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2008-09-09
Filing date: 2008-09-09
Publication date: 2012-10-17
Anticipated expiration: 2028-09-09
Also published as: JP2010066360A

Description

The present invention relates to a technique for issuing commands by voice to a plurality of control devices that control the operation of a plant, mainly a power generation plant.

At the site where a plant is installed, control devices (also called "control panels") that control the operation of the plant are installed, usually several of them side by side. In performing its control, each control device exchanges signals with the other control devices.

In recent years, signal exchange between control devices has begun to shift from wired communication over cables to wireless communication. Wireless communication needs neither the interfaces nor the space that cabling requires; in principle it is enough to fit each control device with a transmitting device and a receiving device. However, wireless communication requires the control devices and related equipment to conform to a common standard, so a system is difficult to build when it combines devices from different manufacturers that follow different standards.

One way around this is to use voice as the wireless medium. A speaker installed at the site outputs commands for the control devices as voice, and the signal exchange between control devices, including the processing of the voice signal, is built on that. With this method the transmitting device becomes unnecessary and only the receiving side has to be adapted, which makes it easier to standardize the system.

Such a system is particularly useful for inspection work on control devices. In this work, an inspector brings test equipment to the site and inspects the control device, for example by checking the signal exchange and by checking the operation of the auxiliary machines that the control device controls, that is, the pumps, valves, and other equipment that make up the plant. During the inspection, the inspector at the site and a supervisor monitoring from a factory away from the site stay in contact by transceiver or similar means. The inspector carries out the inspection according to the supervisor's instructions and reports the results to the supervisor. In some cases the supervisor also supports the inspection by performing command operations that instruct the control device directly. If these command operations can be performed by voice, the inspection work becomes easier.

When voice is used, however, the control device must recognize the voice correctly, and noise mixed into the voice may cause misrecognition. This is all the more true at sites where other work generates a great deal of noise.

As disclosed in Patent Document 1, misrecognition can be reduced to some extent by detecting the voice at several locations and recognizing it while comparing the detected signals. However, because the configuration that processes recognition is simplified relative to the detection of the voice, the recognition accuracy required for the inspection is difficult to achieve unless a very high-accuracy speech recognition technique is used.
Japanese Patent No. 3725566 (paragraph 0025, FIG. 4, etc.)

In view of the above, an object of the present invention is to improve the accuracy with which voice input to the control devices of a plant, such as a power generation plant, is recognized.

To achieve this object, when speech recognition is performed in the control devices of a plant such as a power generation plant, the present invention determines whether the voice signals of the input voice match between the control devices. Details are given later.

The present invention improves the accuracy with which voice input to the control devices of a plant, such as a power generation plant, is recognized.

≪Configuration≫
An embodiment of the present invention (hereinafter, "the embodiment") is described below with reference to the drawings.
FIG. 1 is a block diagram showing the configuration of the network in which the distributed speech recognition system according to the embodiment is used.

This distributed speech recognition system is installed in control devices 2, which control the operation of a plant such as a power generation plant and are arranged side by side at the site (eight of them in FIG. 1). The distributed speech recognition system itself is described later. In addition to the control devices 2, the on-site network connects a voice output unit 1, which outputs voice sent from the factory toward at least the control devices 2, and an on-site server 4, which communicates over an Internet line with the computers at the factory (the factory server 8 and others). At the site, an inspector 3 inspects the control devices 2. One example of such an inspection is a static characteristic test of an AI/O (Analog Input/Output) unit or a DI/O (Digital Input/Output) unit, each implemented as a PI/O (Process Input/Output) interface installed in the control device 2.

At the factory, the network connects a factory terminal 6, which has a microphone 7 for voice input, and a factory server 8, which communicates over an Internet line with the computers at the site (the on-site server 4 and others). A supervisor 5, who controls the operating state of the control devices 2 by voice, is stationed at the factory. The supervisor 5 and the inspector 3 stay in contact, for example by transceiver, mainly about the inspection of the control devices 2.

The voice output unit 1, the control devices 2, the on-site server 4, the factory terminal 6, and the factory server 8 are computers. As hardware, each has an input unit with an input port; an output unit with an output port; a control unit (including the first control unit and the second control unit) implemented as a CPU (Central Processing Unit) or the like; a storage unit (including the first storage unit and the second storage unit) implemented as an HDD (Hard Disk Drive) or other external storage device; and a memory implemented as a RAM (Random Access Memory) or the like, which provides the storage area into which data being read and written is loaded. When one of these devices executes processing related to the present invention, the program stored in its storage unit is loaded into memory and executed by its CPU (control unit), and the processing is carried out by the processing units thus realized on each device of the network. Each program may be stored in the storage unit in advance, or may be installed when needed via another storage medium or a communication medium (a network, or a carrier wave propagating through a network).

FIG. 2 is a block diagram showing an example of the configuration of the distributed speech recognition system according to the embodiment. As the functional units that make up the distributed speech recognition system, each control device 2 has a voice input unit (A1 or B1), a voice recognition unit (A2 or B2), a time application unit (A3 or B3), a voice normality determination unit (A4 or B4), a message processing unit (A5 or B5), an important voice table (A6 or B6), a recognized voice table (A7 or B7), and a command message table (A8 or B8). The two control devices 2 shown in FIG. 2 (the first control device and the second control device) are selected by a predetermined method from the control devices 2 that received the voice output from the voice output unit 1 (the eight control devices shown in FIG. 1). One selection method, for example, is to measure the volume of the voice input to each control device 2 and select the control device 2 that received the loudest input. In this case, each control device 2 is equipped with a device that measures the volume.
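As an illustration only, the volume-based selection mentioned above might be sketched in Python as follows. The function name `select_loudest`, the device IDs, and the readings are hypothetical, and picking the two loudest devices is an assumption made here so that a pair is available for the comparison in FIG. 2; the embodiment itself only says that the device with the loudest input is chosen.

```python
from typing import Dict, List


def select_loudest(volumes: Dict[str, float], count: int = 2) -> List[str]:
    """Return the IDs of the `count` control devices with the highest input volume."""
    if count > len(volumes):
        raise ValueError("not enough control devices to select from")
    # Sort loudest first; ties are broken by device ID so the result is deterministic.
    ranked = sorted(volumes.items(), key=lambda item: (-item[1], item[0]))
    return [device_id for device_id, _ in ranked[:count]]


# Hypothetical volume readings (arbitrary units) for the eight devices of FIG. 1.
readings = {
    "panel-1": 41.0, "panel-2": 55.5, "panel-3": 62.3, "panel-4": 58.1,
    "panel-5": 47.9, "panel-6": 39.2, "panel-7": 33.0, "panel-8": 30.4,
}
first_device, second_device = select_loudest(readings)
print(first_device, second_device)
```

With these readings the sketch selects `panel-3` and `panel-4` as the first and second control devices.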

The voice input unit (A1, B1) detects the voice of the supervisor 5 output from the voice output unit 1, together with any noise and other sounds. The detected voice is converted into, for example, a digital voice signal, which is output to the voice recognition unit (A2, B2).

The voice recognition unit (A2, B2) performs frequency analysis on the voice signal output from the voice input unit (A1, B1), using for example an FFT (Fast Fourier Transform), and extracts the feature values of the voice. If no feature values can be extracted, the signal is treated as consisting of noise only, ignored, and discarded. Recognition is performed by collating the feature values against a dictionary (not shown; a systematically managed, stored set of keywords) held in each control device 2, using a technique such as the composite similarity method, an HMM (Hidden Markov Model), or DP (Dynamic Programming) matching.
If a voice signal from which feature values were extracted contains a part that matches a voice registered in the important voice table (A6, B6) described later, the voice is treated as containing an important voice, and the voice signal is output to the time application unit (A3, B3). Important voices are described later.
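Of the matching techniques named above, DP matching is the simplest to show. The following is a minimal Python sketch of DP matching in the form of dynamic time warping over one-dimensional feature sequences. The dictionary entries and feature values are invented for illustration; a real recognizer would compare multi-dimensional spectral features, and the embodiment does not specify which technique or dictionary format is used.

```python
from typing import Dict, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic-time-warping distance between two feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances alone
                                     cost[i][j - 1],      # b advances alone
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def recognize(features: Sequence[float],
              dictionary: Dict[str, Sequence[float]]) -> str:
    """Return the dictionary keyword whose template is closest to `features`."""
    return min(dictionary, key=lambda word: dtw_distance(features, dictionary[word]))


# Hypothetical templates; the input is the "kido" template spoken more slowly.
templates = {"kido": [1.0, 3.0, 4.0, 2.0], "teishi": [4.0, 1.0, 1.0, 3.0]}
print(recognize([1.0, 1.0, 3.0, 4.0, 4.0, 2.0], templates))
```

Because the warping path may repeat template frames, the stretched input aligns with the `kido` template at zero cost, which is the property that makes DP matching tolerant of differences in speaking rate.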

The time application unit (A3, B3) attaches, to the voice signal output from the voice recognition unit (A2, B2), time data indicating when the voice was input to the control device 2. The time data is generated by a timer (not shown) installed in each control device 2. For example, when the voice input unit (A1, B1) requests time data, the timer performs its timekeeping process and generates the time at which it received the request. The voice signal with the time data attached is temporarily registered in the recognized voice table (A7, B7) in a predetermined data structure.

The voice normality determination unit (A4, B4) determines whether a voice signal obtained from the recognized voice table (A7, B7) is normal, that is, whether the noise mixed into it is at or below a predetermined threshold. To make this determination, the unit receives the corresponding voice signal from the voice normality determination unit of another control device 2 and compares voice signals whose time data indicate nearly the same time, for example times within ±0.2 seconds of each other. A voice signal judged normal by this comparison is output to the message processing unit (A5, B5); any other signal is ignored. The comparison determines, for example, whether the proportion of the two waveforms that match is at or above a threshold.
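The determination described above can be sketched roughly as follows in Python. Only the ±0.2 second window comes from the text. The sample-by-sample comparison, the per-sample tolerance, the 0.7 match threshold, and all names are assumptions chosen for illustration, and the sketch presumes that the two waveforms are already aligned in time.

```python
from typing import Sequence

TIME_WINDOW_S = 0.2     # the "within ±0.2 seconds" example from the text
MATCH_THRESHOLD = 0.7   # hypothetical threshold on the matching proportion


def same_moment(t_own: float, t_other: float, window: float = TIME_WINDOW_S) -> bool:
    """True when the two time stamps are close enough to be the same utterance."""
    return abs(t_own - t_other) <= window


def match_ratio(own: Sequence[float], other: Sequence[float],
                tolerance: float = 0.05) -> float:
    """Proportion of sample positions whose values agree within `tolerance`."""
    longest = max(len(own), len(other))
    if longest == 0:
        return 0.0
    agreeing = sum(1 for a, b in zip(own, other) if abs(a - b) <= tolerance)
    return agreeing / longest


def is_normal(t_own: float, own: Sequence[float],
              t_other: float, other: Sequence[float]) -> bool:
    """Judge a signal normal when a near-simultaneous signal matches it well enough."""
    return same_moment(t_own, t_other) and match_ratio(own, other) >= MATCH_THRESHOLD


own_wave = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
other_wave = [0.0, 0.52, 1.0, 0.1, 0.0, -0.5, -0.98, -0.5]  # one sample hit by noise
print(is_normal(10.00, own_wave, 10.15, other_wave))  # close in time, 7 of 8 samples agree
print(is_normal(10.00, own_wave, 10.50, other_wave))  # outside the time window
```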

The message processing unit (A5, B5) determines whether the voice signal output from the voice normality determination unit (A4, B4) is one used in processing executed by the control device 2, for example one used to control the operation of an auxiliary machine 9. It does so by determining whether the voice signal contains command content (described later) registered in the command message table (A8, B8). If it does, the unit outputs data representing that command content (control data) to the auxiliary machine 9 to control it; otherwise the voice signal is ignored.

The important voice table (A6 or B6) is a database in which the voices judged important for controlling the auxiliary machines 9 are registered as important voices in a predetermined data structure (for example, as audio files). What counts as important is a design decision for whoever controls the plant, but normally the voices that specify the operation of an auxiliary machine 9 are treated as important, for example "start" (kidō, キドウ) and "stop" (teishi, テイシ).
FIG. 3 shows the data structure of the important voice table (A6, B6). The table has an identification number (No) field 301, which holds the number identifying the record, and an important voice field 302, which holds a voice defined as important. When part of an input voice signal matches a voice registered in the important voice field 302, that part is flagged as an important voice.
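The flagging step can be pictured with the following Python sketch. It works on recognized katakana text rather than on audio, which is a simplification made here; the embodiment registers the important voices as audio data. Record No. 1 holding キドウ follows the example given for FIG. 3, while the No. 2 entry for テイシ and the function name `flag_important` are assumptions.

```python
from typing import Dict, Optional, Tuple

# Important voice table: identification number (field 301) -> important voice (field 302).
IMPORTANT_VOICE_TABLE: Dict[int, str] = {1: "キドウ", 2: "テイシ"}


def flag_important(recognized: str,
                   table: Dict[int, str] = IMPORTANT_VOICE_TABLE
                   ) -> Tuple[str, Optional[str]]:
    """Split recognized text into (non-important part, important voice or None)."""
    for no in sorted(table):
        word = table[no]
        index = recognized.find(word)
        if index != -1:
            # Everything except the matched word is the non-important part.
            plain = recognized[:index] + recognized[index + len(word):]
            return plain, word
    return recognized, None


plain_part, important_part = flag_important("イチゴウキュウスイポンプエイキドウ")
print(plain_part, important_part)
```

For the utterance used in the embodiment, the sketch separates イチゴウキュウスイポンプエイ (the command target) from キドウ (the important voice).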

The recognized voice table (A7, B7) is a database in which voice signals with time data attached are registered in a predetermined data structure (for example, by associating the voice signal, stored as an audio file, with its time data).
FIG. 4 shows the data structure of the recognized voice table (A7, B7). The table has an identification number (No) field 401, which holds the number identifying the record; a time field 402, which holds the time indicated by the time data; a voice field 403, which holds the part of the voice that is not an important voice; and an important voice field 404, which holds the part of the voice that is an important voice. Once the auxiliary machine 9 has been controlled in response to the voice signal, the corresponding record in the recognized voice table (A7, B7) is discarded.
For the part that is not an important voice, the voice normality determination unit (A4, B4) judges the signal normal even if a moderate amount of noise (for example, about 30%) has been mixed in and has disturbed the signal. For the part that is an important voice, it judges the signal normal only when the mixed-in noise is at or below a very small amount (for example, about 1%). The two parts are thus held to different levels of recognition accuracy.
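One possible in-memory shape for the recognized voice table, with time stamping on registration and discarding after use, is sketched below in Python. The class and method names are hypothetical, text stands in for the stored audio, and the record is stamped at registration time rather than at the moment the voice input unit requests the time, which is a simplification.

```python
import itertools
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class RecognizedVoiceRecord:
    no: int               # identification number field 401
    time: float           # time field 402 (seconds; the unit is an assumption)
    voice: str            # voice field 403: the part that is not an important voice
    important_voice: str  # important voice field 404


class RecognizedVoiceTable:
    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock                   # stands in for the device's timer
        self._next_no = itertools.count(1)
        self._records: Dict[int, RecognizedVoiceRecord] = {}

    def register(self, voice: str, important_voice: str) -> RecognizedVoiceRecord:
        """Attach the current time to a recognized voice and store it temporarily."""
        record = RecognizedVoiceRecord(next(self._next_no), self._clock(),
                                       voice, important_voice)
        self._records[record.no] = record
        return record

    def discard(self, no: int) -> None:
        """Drop a record once the auxiliary machine has been controlled."""
        self._records.pop(no, None)

    def __len__(self) -> int:
        return len(self._records)


# A fixed clock (10:10:00 as seconds since midnight) keeps the example repeatable.
table = RecognizedVoiceTable(clock=lambda: 36600.0)
record = table.register("イチゴウキュウスイポンプエイ", "キドウ")
print(record.no, record.time, len(table))
```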

The command message table (A8, B8) is a database in which the command content that each control device 2 issues to its command targets (mainly the auxiliary machines 9) is registered in a predetermined data structure (for example, as audio files).
FIG. 5 shows the data structure of the command message table (A8, B8). The table has the following fields: an identification number (No) field 501, which holds the number identifying the record; a target facility number field 502, which holds a value identifying the facility to be commanded, that is, the plant; a command target field 503, which holds a value identifying the auxiliary machine 9 of the plant that the control device 2 commands; a command field 504, which holds a value identifying the command content issued by the control device 2; a logic sheet field 505, which holds the logic sheet (algorithm) that expresses the command content as a logic circuit; and a voice command number field 506, which holds the number identifying the voice command that is input to the logic sheet to carry out the control corresponding to the command content. The number in the voice command number field 506 corresponds one-to-one with the value in the command field 504.
The logic sheets are stored in the storage unit of each control device 2. When a record matching the input voice signal is extracted, the logic sheet registered in that record is read out. Each logic sheet has an area, acting as part of the logic circuit, into which the number registered in the voice command number field 506 is input.
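The fields of the command message table map naturally onto a record type. The Python sketch below shows one way to hold the table and look up a record from the recognized command target and command. Record No. 1 follows the example used in the embodiment (logic sheet CS001, voice command number 1); the facility value, the second record, and the names `CommandRecord` and `find_command` are assumptions, and text again stands in for audio.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CommandRecord:
    no: int                # identification number field 501
    facility: str          # target facility number field 502
    target: str            # command target field 503
    command: str           # command field 504
    logic_sheet: str       # logic sheet field 505
    voice_command_no: int  # voice command number field 506


COMMAND_MESSAGE_TABLE: List[CommandRecord] = [
    CommandRecord(1, "1", "イチゴウキュウスイポンプエイ", "キドウ", "CS001", 1),
    # Hypothetical second record for the matching stop command.
    CommandRecord(2, "1", "イチゴウキュウスイポンプエイ", "テイシ", "CS002", 2),
]


def find_command(target: str, command: str,
                 table: List[CommandRecord] = COMMAND_MESSAGE_TABLE
                 ) -> Optional[CommandRecord]:
    """Return the record whose target and command both match, or None."""
    return next((r for r in table if r.target == target and r.command == command),
                None)


hit = find_command("イチゴウキュウスイポンプエイ", "キドウ")
print(hit.logic_sheet if hit else "no matching record")
```

Returning `None` when nothing matches corresponds to the case in which the voice signal is ignored.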

≪Processing≫
Next, the processing performed by the distributed speech recognition system according to the embodiment is described. FIG. 6 is a flowchart of this processing. The processing is carried out by the control unit of each control device 2.
The description assumes that the processing runs during an inspection of the control devices 2 by the inspector 3 at the site and the supervisor 5 monitoring at the factory (mainly an inspection that checks whether the control device 2 controls the auxiliary machines 9 correctly, including the signal exchange). The inspector 3 and the supervisor 5 exchange inspection-related messages by transceiver. For example, the inspector 3 reports to the supervisor 5 the work being done to confirm a given signal input for the inspection of the control device 2, such as
"Panel No. 5, terminal block No. X11: jumpering terminals 1 and 2"
or
"Panel No. 5, terminal block No. X11, channel No. 1: applying 2.5 V".
The supervisor 5, in turn, changes the operating state of the auxiliary machines 9 in step with the inspector's work by issuing the commands needed to test them as voice from the voice output unit 1, such as
"Starting No. 1 feed water pump A".
This voice reaches every control device 2, although at varying volume.

First, in step S01, the control unit of the control device 2 detects the voice of the supervisor 5, which has been sent from the factory over the network and output from the on-site voice output unit 1, together with any noise and other sounds. The detected voice is input as a voice signal. The processing then proceeds to step S02.

Next, in step S02, the control unit of the control device 2 determines whether the input voice signal can be recorded in memory. If it can (Yes in S02), the processing proceeds to step S03; otherwise (No in S02), it proceeds to step S09. A voice signal is recordable when the voice recognition unit (A2, B2) can perform frequency analysis on it and extract the feature values of the voice. In addition, the control unit refers to the important voice table (A6, B6), and if the voice signal contains feature values corresponding to an important voice such as "kidō" (キドウ, "start"), it flags that important voice in the signal. For example, when the voice "Starting No. 1 feed water pump A" (ichigō kyūsui ponpu ei kidō shimasu) is input, speech recognition analyzes it as "イチゴウキュウスイポンプエイキドウ", and the "キドウ" part is flagged as an important signal by reference to the record numbered No. 1 in the important voice table (A6, B6) (see FIG. 3). The trailing "shimasu" (します) is ignored (see step S09).

Next, in step S03, the control unit of the control device 2 attaches the time to the voice signal judged recordable, producing a voice signal with time data. This signal is temporarily registered in the recognized voice table (A7, B7). The processing then proceeds to step S04. If the voice "Starting No. 1 feed water pump A" is input at 10:10, the record numbered No. 1 in the recognized voice table (A7, B7) is created (see FIG. 4).

Next, in step S04, the control unit of the control device 2 compares the voice signal with the voice signal input to another control device 2 at around the same time. To do so, it obtains the voice signals with time data from the other control device 2, refers to the times registered in the time field 402 of the recognized voice table (A7, B7), extracts the voice signal that can be judged to have nearly the same time, and compares the two. The processing then proceeds to step S05.

Next, in step S05, the control unit of the control device 2 determines whether the voice signals being compared match. If they match (Yes in step S05), the voice signal is judged normal and the processing proceeds to step S06. Otherwise (No in step S05), the voice signal is judged abnormal and the processing proceeds to step S10. The voice signals are considered to match when both of the following hold. First, the voice other than the important voice in the time-stamped voice signal obtained from the other control device 2 matches the voice registered in the voice field 403 of the corresponding record in the recognized voice table (A7, B7), even if some noise is present (the proportion of the compared signals that matches is at or above the second threshold, although it may be below the first threshold). Second, the important voice in the time-stamped voice signal obtained from the other control device 2 matches the important voice registered in the important voice field 404 of the corresponding record almost exactly, with almost no noise (the proportion that matches is at or above the first threshold). For example, when the voice "Starting No. 1 feed water pump A" is input and one device recognizes "キュウスイポンプ" (kyūsui ponpu) while the other recognizes "キュウス××ンプ" (where ×× is noise), the two are judged to match because this part is not an important voice. If, however, one device recognizes "キドウ" (kidō) and the other recognizes "×ドウ" (where × is noise), the two are judged not to match because this part is an important voice.
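The two-threshold rule of step S05 can be tried out on the example strings above. In the Python sketch below, the threshold values 0.99 and 0.70 are assumptions derived loosely from the "about 1%" and "about 30%" noise figures given for the voice normality determination unit, the character-by-character comparison stands in for a comparison of voice signals, and the function names are hypothetical.

```python
NOISE = "×"              # marks a position lost to noise, as in the example above
FIRST_THRESHOLD = 0.99   # important voice: must match almost exactly
SECOND_THRESHOLD = 0.70  # other voice: some noise is tolerated


def match_ratio(own: str, other: str) -> float:
    """Proportion of positions at which both strings carry the same non-noise character."""
    if not own or len(own) != len(other):
        return 0.0
    same = sum(1 for a, b in zip(own, other) if a == b and a != NOISE)
    return same / len(own)


def signals_match(own_plain: str, other_plain: str,
                  own_important: str, other_important: str) -> bool:
    """Apply the lenient threshold to the ordinary part and the strict one to the important part."""
    return (match_ratio(own_plain, other_plain) >= SECOND_THRESHOLD
            and match_ratio(own_important, other_important) >= FIRST_THRESHOLD)


# Noise in the non-important part only: 6 of 8 characters agree, so the signals match.
print(signals_match("キュウスイポンプ", "キュウス××ンプ", "キドウ", "キドウ"))
# Noise in the important part: 2 of 3 characters agree, so the signals do not match.
print(signals_match("キュウスイポンプ", "キュウスイポンプ", "キドウ", "×ドウ"))
```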

Next, in step S06, the control unit of the control device 2 compares the voice signal judged normal with the data in the command message table (A8, B8): it searches the table using the feature values of the voice contained in that signal as the search key. The processing then proceeds to step S07.

Next, in step S07, the control unit of the control device 2 determines whether a matching record exists in the command message table (A8, B8). Specifically, it determines whether the feature values of the voice used as the search key match the values registered in the target facility number field 502, the command target field 503, and the command field 504 of the table. If such a record exists (Yes in step S07), the processing proceeds to step S08; otherwise (No in step S07), it proceeds to step S11.
In the embodiment, the value registered in the command target field 503 is a voice that is not an important voice, and the value registered in the command field 504 is an important voice. When the voice "Starting No. 1 feed water pump A" is input, the part "イチゴウキュウスイポンプエイ" (ichigō kyūsui ponpu ei) matches the value of the command target field 503 of the record numbered No. 1, and the part "キドウ" (kidō) matches the value of the command field 504.

Next, in step S08, the control unit of the control device 2 outputs the data held in the command message table (A8, B8) to the auxiliary machine 9. Specifically, for the record extracted from the command message table (A8, B8) with the voice signal as the search key, it reads from the storage unit the logic sheet registered in the logic sheet field 505 and inputs the voice command number registered in the voice command number field 506 into the designated area of that logic sheet. The logic circuit described by the logic sheet then computes the control data. The control unit outputs the control data to the auxiliary machine 9, and the processing ends. When the voice "Starting No. 1 feed water pump A" is input, the record numbered No. 1 is referenced, the logic sheet "CS001" and the voice command number "1" are extracted, and voice command number 1 is input into the designated area of logic sheet CS001. As a result, the control device 2 outputs control data that starts feed water pump A of plant No. 1.
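Step S08 amounts to looking up a logic sheet by its identifier and feeding it the voice command number. The embodiment does not disclose what a logic sheet contains, so the Python sketch below models each sheet as an ordinary function; the body of `cs001`, the shape of the control data, and every name other than the identifier CS001 and voice command number 1 are assumptions for illustration.

```python
from typing import Callable, Dict

ControlData = Dict[str, str]


def cs001(voice_command_no: int) -> ControlData:
    """Hypothetical logic sheet CS001 for feed water pump A of plant No. 1."""
    if voice_command_no == 1:
        return {"target": "plant-1/feed-water-pump-A", "action": "start"}
    raise ValueError(f"voice command {voice_command_no} is not wired into CS001")


# Logic sheets held in the storage unit, keyed by the value of field 505.
LOGIC_SHEETS: Dict[str, Callable[[int], ControlData]] = {"CS001": cs001}


def build_control_data(logic_sheet_id: str, voice_command_no: int) -> ControlData:
    """Read the named logic sheet and evaluate it with the voice command number."""
    sheet = LOGIC_SHEETS[logic_sheet_id]
    return sheet(voice_command_no)


# Record No. 1: logic sheet "CS001", voice command number 1.
print(build_control_data("CS001", 1))
```

In a real control device the returned control data would then be written to the output that drives the auxiliary machine 9.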

Next, in step S09, the control unit of the control device 2 ignores the voice signal that was determined to be unrecordable, discards it, and ends the process.

Next, in step S10, the control unit of the control device 2 ignores the voice signal that was determined not to match when compared with that of the other control device 2, discards it, and ends the process.

Next, in step S11, the control unit of the control device 2 ignores the input voice signal for which it was determined that no matching record exists in the command message table (A8, B8), discards it, and ends the process.

This concludes the description of the processing operation of the distributed speech recognition system.

≪Summary≫
The present embodiment provides the following effects. When speech recognition is performed in the control devices of a plant such as a power plant, it is determined whether the voice signals of the input voice match between the control devices, so the accuracy with which the control devices recognize input voice can be increased. Efforts to raise speech recognition accuracy usually focus on how accurately a single system can recognize speech, but in plant control devices this approach runs into the difficulties already described. The present embodiment instead achieves the desired recognition accuracy by performing speech recognition at a plurality of locations. Because the hardware configuration already includes a plurality of control devices, providing a plurality of speech recognition units imposes little burden in terms of consumed resources, added interfaces, and the like.

In addition, controlling the control devices by speech recognition eliminates the need for cable wiring to connect them to other control devices, which widens the range of applications of the distributed speech recognition system of the present embodiment.

Furthermore, because signals are exchanged between the control devices by voice, there is no limit on the number of exchanged signals and no wiring is required. It is therefore unnecessary to examine the current flowing through wiring or to perform connection tests, and the overall apparatus can be made much smaller.

≪Others≫
The embodiment described above is the best mode for carrying out the present invention, but the invention is not limited to it. The embodiment can be modified in various ways without changing the gist of the present invention.

For example, in the present embodiment, the voice with which the monitor 5 commands the control device 2 is spoken into the microphone 7 and output from the voice output unit 1. Alternatively, the voice for each command may be recorded in advance on the factory terminal 6, read out at the desired time, and output from the voice output unit 1.

In the present embodiment, two control devices are selected from two or more control devices arranged in parallel, and speech recognition is performed in each of the two. However, the number of control devices that perform speech recognition is not limited as long as it is more than one. Taking the processing load into account, speech recognition may be performed in all of the control devices to determine whether the voice signal is normal. Furthermore, instead of being mounted on the control devices, the distributed speech recognition system of the present embodiment may be configured as a plurality of independent devices that are connected to the control devices and the like over a network so that they can communicate.

In addition, the specific configurations of the hardware, software, flowcharts, data structures, and the like may be changed as appropriate without departing from the spirit of the present invention.

The distributed speech recognition system of the present invention can be applied to any device that executes processing in response to voice. It may be connected externally to the device or mounted inside it.

A block diagram showing the configuration of a network in which the distributed speech recognition system according to the present embodiment is used.
A block diagram showing an example of the configuration of the distributed speech recognition system according to the present embodiment.
A diagram showing the data structure of the important voice table (A6, B6).
A diagram showing the data structure of the recognized voice table (A7, B7).
A diagram showing the data structure of the command message table (A8, B8).
A flowchart showing the processing operation of the distributed speech recognition system according to the present embodiment.

Explanation of symbols

1 Voice output unit
2 Control device (including the first control device and the second control device)
3 Inspector
4 On-site server
5 Monitor
6 Factory terminal
7 Microphone
8 Factory server
9 Auxiliary machine
A1, B1 Voice input unit
A2, B2 Voice recognition unit
A3, B3 Time application unit
A4, B4 Voice normality determination unit
A5, B5 Message processing unit
A6, B6 Important voice table
A7, B7 Recognized voice table
A8, B8 Command message table

Claims

A distributed speech recognition system comprising: a first control device that controls a first device; and a second control device that controls a second device, wherein
the first control device includes:
a first control unit that converts a voice input from outside into a first voice signal, recognizes the voice by extracting a feature amount of the voice from the first voice signal, and, when it determines that a command inclusion condition that the recognized voice includes a predetermined command is satisfied, that a normal condition that the first voice signal is normal is satisfied, and that the first voice signal includes information identifying the first device, performs control to output control data according to the command to the first device; and
a first storage unit that stores a command for the first device and an algorithm for obtaining the control data in association with each other,
the second control device includes:
a second control unit that converts a voice input from outside into a second voice signal, recognizes the voice by extracting a feature amount of the voice from the second voice signal, and, when it determines that the command inclusion condition that the recognized voice includes a predetermined command is satisfied, that a normal condition that the second voice signal is normal is satisfied, and that the second voice signal includes information identifying the second device, performs control to output control data according to the command to the second device; and
a second storage unit that stores a command for the second device and an algorithm for obtaining the control data in association with each other,
the first controller and the second controller are communicably connected,
when the first control unit determines that the command inclusion condition is satisfied, the first control unit adds to the first voice signal first time data indicating the time at which the voice was input,
when the second control unit determines that the command inclusion condition is satisfied, the second control unit adds to the second voice signal second time data indicating the time at which the voice was input, and
the first controller
compares the first voice signal to which the first time data is added with, among the second voice signals to which the second time data is added and which are acquired from the second control device, the second voice signal whose second time data indicates a time that substantially matches the time indicated by the first time data, and thereby determines whether the normal condition is satisfied for the first voice signal.
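The time-based pairing recited above can be sketched as follows. This is only an illustration under stated assumptions: the claim does not say how close two times must be to "substantially match", so the tolerance `tolerance_s` is an assumed parameter, and the names `StampedSignal` and `find_counterpart` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class StampedSignal:
    features: Tuple[str, ...]  # recognized voice features (simplified)
    input_time: float          # time the voice was input, in seconds


def find_counterpart(first: StampedSignal,
                     candidates: Iterable[StampedSignal],
                     tolerance_s: float = 0.5) -> Optional[StampedSignal]:
    """Return the candidate whose input time is closest to that of `first`
    and within `tolerance_s` seconds, or None if there is no such candidate."""
    best: Optional[StampedSignal] = None
    best_gap = tolerance_s
    for cand in candidates:
        gap = abs(cand.input_time - first.input_time)
        if gap <= best_gap:
            best, best_gap = cand, gap
    return best


first = StampedSignal(("キドウ",), 100.0)
from_second_device = [
    StampedSignal(("テイシ",), 90.0),   # an earlier, unrelated utterance
    StampedSignal(("キドウ",), 100.2),  # the same utterance heard elsewhere
]
match = find_counterpart(first, from_second_device)
no_match = find_counterpart(first, from_second_device[:1])
```

Only the signal returned by `find_counterpart` would then be compared with the first voice signal to decide whether the normal condition holds.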

The distributed speech recognition system according to claim 1, wherein
the first controller determines that the normal condition is satisfied for the first voice signal when, in the comparison,
the ratio at which a first important voice signal, which is included in the first voice signal and corresponds to the command, matches a second important voice signal, which is included in the second voice signal and corresponds to the command, is equal to or greater than a first threshold, and
the ratio at which a first non-important voice signal included in the first voice signal matches a second non-important voice signal included in the second voice signal is equal to or greater than a second threshold that is equal to or less than the first threshold.
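A two-threshold check of this kind might look like the sketch below. It assumes the reading that the important portions must match at a ratio of at least a first threshold and the non-important portions at a ratio of at least a second, lower-or-equal threshold. The threshold values, the position-wise definition of the match ratio, and the names `match_ratio` and `is_normal` are assumptions for illustration only.

```python
from typing import Sequence


def match_ratio(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of positions at which two unit sequences agree, measured
    against the longer sequence so that missing units count as mismatches."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    hits = sum(1 for x, y in zip(a, b) if x == y)
    return hits / longest


def is_normal(first_important: Sequence[str], second_important: Sequence[str],
              first_other: Sequence[str], second_other: Sequence[str],
              first_threshold: float = 1.0,
              second_threshold: float = 0.8) -> bool:
    """True when both match ratios reach their (assumed) thresholds."""
    if second_threshold > first_threshold:
        raise ValueError("second threshold must not exceed the first")
    return (match_ratio(first_important, second_important) >= first_threshold
            and match_ratio(first_other, second_other) >= second_threshold)


command = ["キ", "ド", "ウ"]
target_a = list("イチゴウキュウスイポ")  # 10 units heard by the first device
target_b = list("イチゴウキュウスイボ")  # the second device misheard the last unit

ok = is_normal(command, command, target_a, target_b)
bad = is_normal(command, ["キ", "ト", "ウ"], target_a, target_a)
```

With these assumed thresholds, a single misheard unit in the non-important portion is tolerated (`ok`), while any mismatch in the command itself is not (`bad`).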

The distributed speech recognition system according to claim 1 or 2, wherein
there are a plurality of control devices that are candidates for the second control device,
each of the control devices includes volume measuring means for measuring the volume of the voice input to that control device, and
the control device whose volume measuring means measures the greatest volume is used as the second control device.
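Selecting the second control device by measured volume amounts to taking a maximum over the candidates. In the minimal sketch below, the device identifiers, the volume values, and the function name `select_second_device` are hypothetical, and how ties are broken is not addressed by the claim.

```python
from typing import Mapping


def select_second_device(volumes: Mapping[str, float]) -> str:
    """Return the identifier of the candidate that measured the greatest
    input volume. `volumes` maps device identifier to measured volume."""
    if not volumes:
        raise ValueError("no candidate control devices")
    return max(volumes, key=lambda device_id: volumes[device_id])


# Hypothetical measurements reported by three candidate devices.
measured = {"B": 52.0, "C": 61.5, "D": 47.3}
second_device = select_second_device(measured)
```

Choosing the candidate that heard the voice loudest is a plausible proxy for the one with the best input quality, which is presumably why it is preferred as the comparison partner.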