JP2005338286A - Object sound processor and transport equipment system using same, and object sound processing method

Info

Publication number: JP2005338286A
Application number: JP2004155027A
Authority: JP
Inventors: Takashi Akasaka (赤坂貴志); Hirobumi Mochizuki (望月博文); Masamitsu Mizuno (水野正光); Akihiro Yamazaki (山崎章弘); Nobuyasu Arimune (有宗伸泰); Takao Kawai (河合隆男); Kyoji Mukumoto (椋本恭司)
Original assignee: Yamaha Marine Co Ltd; Yamaha Motor Co Ltd
Current assignee: Yamaha Marine Co Ltd; Yamaha Motor Co Ltd
Priority date: 2004-05-25
Filing date: 2004-05-25
Publication date: 2005-12-08

Abstract

PROBLEM TO BE SOLVED: To provide an object sound processor capable of processing an object sound adaptively to situations that differ depending on the position within transport equipment, and a transport equipment system using the same. SOLUTION: A speaker who is a crew member of a ship operates the ship by speaking into microphones 41 and 42 of a speech instruction device 10. Speech signals from the microphones 41 and 42 undergo noise removal processing followed by speech recognition processing, and ship operation data are generated from the recognition results. IC tags 61 to 6M, each recording position identification information, are arranged at a plurality of places on the ship. An IC tag reader 46 reads the position identification information of the IC tags 61 to 6M, and the position of the speaker is thereby specified. Attribute information and environment information of the ship are received by a communication device 44 and a communication processing section 76. A noise removal algorithm and the parameters of the speech recognition processing are selected according to the position identification information, the attribute information, and the environment information. COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to a target sound processing apparatus suitable for use in relatively large transport equipment such as a ship, to a transport equipment system using the apparatus, and to a target sound processing method. A typical target sound is speech uttered by a person, and a typical target sound process is speech recognition.

Speech recognition devices are used to operate in-vehicle navigation systems (see Patent Document 1 below), to operate ships (see Patent Document 2 below), and for similar purposes.
Such a speech recognition device includes a microphone that picks up the voice uttered by a speaker, an analog/digital conversion circuit that converts the analog speech signal output by the microphone into a digital speech signal, and a speech processing circuit that processes the digital speech signal generated by the analog/digital conversion circuit.

Functionally, this speech processing circuit includes a noise removal unit, an utterance section detection unit, an acoustic analysis unit, and a matching unit. The noise removal unit removes from the speech signal the components that correspond to ambient noise other than speech. The utterance section detection unit extracts from the digital speech signal the signal of the section in which the speaker spoke. The acoustic analysis unit generates feature vectors for speech recognition from the extracted utterance-section signal. The matching unit matches the generated feature vectors against a recognition dictionary and an acoustic model prepared in advance, and generates speech recognition data representing the recognition result.
Patent Document 1: JP 2003-108191 A
Patent Document 2: Japanese Patent No. 2668450
Patent Document 3: Japanese Utility Model Laid-Open No. H8-1552
Non-Patent Document 1: Kiyoshi Tatara et al., "Study on Construction of a Sound Reception System Using Multiple Wearable Microphones," Proceedings of the Acoustical Society of Japan, March 2003, pp. 177-178
Non-Patent Document 2: Nobukatsu Takai, "Introduction to MATLAB for 'Signal Processing' and 'Image Processing'," Kogakusha, 2000
Non-Patent Document 3: M. Omologo, P. Svaizer et al., "Acoustic event localization using a crosspower-spectrum phase based technique," Proc. ICASSP94, 1994, pp. 274-276
Non-Patent Document 4: S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. ASSP, vol. 27, no. 2, April 1979, pp. 113-120

In large transport equipment such as a ship, occupants move around on board. It would therefore be convenient if an occupant could carry a portable device capable of speech recognition and issue ship operation instructions by wireless communication from that device (see, for example, Patent Document 3).
In large transport equipment, however, the noise environment changes greatly with the position of the occupant who gives the voice instruction. On a ship, for example, the noise caused by engine sound, wind noise, and the like differs completely depending on whether the occupant is in the cabin, on the flying deck, or on the out deck.

No speech recognition process has been proposed that anticipates such large changes in the noise environment, so conventional speech recognition devices are not necessarily suited to use in large transport equipment. That is, the speech recognition rate varies greatly with the user's location in the transport equipment, and at some locations the voice of the speaker (user) could not always be recognized with high probability.

The same problem arises not only in speech recognition but whenever a sound signal containing a predetermined target sound is processed.
Accordingly, an object of the present invention is to provide a target sound processing apparatus that can process a target sound while adapting to situations that differ depending on the position within the transport equipment, and a transport equipment system using the apparatus.

Another object of the present invention is to provide a target sound processing method that can process a target sound while adapting to situations that differ depending on the position within the transport equipment.

To achieve the above objects, the invention according to claim 1 is a target sound processing apparatus for processing a predetermined target sound generated in transport equipment, the apparatus comprising: generation position specifying means for specifying the position in the transport equipment at which the target sound is generated; sound signal processing means that receives a sound signal, which is an electrical signal of sound including the target sound, executes sound signal processing on that signal, and allows the parameters of the sound signal processing to be changed; and parameter switching means for switching the parameters of the sound signal processing in the sound signal processing means according to the generation position specified by the generation position specifying means.

With this configuration, the position at which the target sound is generated in the transport equipment is specified, and the sound signal processing parameters are switched according to that position. The way the target sound is processed can thus be adapted to conditions (for example, noise conditions) that vary with the position in the transport equipment.
Moreover, because switching the parameters requires no complicated processing, highly accurate sound signal processing can be achieved with a light processing load.
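The low processing load described above follows from the fact that parameter switching can be a plain table lookup keyed by position. The following Python sketch illustrates that idea only; the position names, the `SoundParams` fields, and every value in the table are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass
from enum import Enum


class Position(Enum):
    """Hypothetical set of distinguishable sound-generation positions on a ship."""
    COCKPIT = "cockpit"
    CABIN = "cabin"
    FLYING_DECK = "flying_deck"
    OUT_DECK = "out_deck"


@dataclass(frozen=True)
class SoundParams:
    """Illustrative parameter set; the field names are assumptions."""
    noise_algorithm: str      # name of the noise removal algorithm to use
    detection_factor: float   # coefficient for target sound section extraction
    acoustic_model: str       # name of the acoustic model used for matching


# Assumed table: one pre-tuned parameter set per position (values are made up).
PARAMS_BY_POSITION = {
    Position.COCKPIT: SoundParams("pre_emphasis", 2.0, "model_cockpit"),
    Position.CABIN: SoundParams("pre_emphasis", 1.5, "model_cabin"),
    Position.FLYING_DECK: SoundParams("spectral_subtraction", 3.0, "model_flying_deck"),
    Position.OUT_DECK: SoundParams("spectral_subtraction", 3.5, "model_out_deck"),
}


def switch_params(position: Position) -> SoundParams:
    """Parameter switching: a constant-time lookup by specified position."""
    return PARAMS_BY_POSITION[position]
```

A caller would invoke `switch_params` whenever the generation position specifying means reports a new position, and hand the result to the sound signal processing stage.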

The transport equipment is equipment large enough that switching the sound signal processing parameters according to the position at which the target sound is generated has a substantial effect on the accuracy of the sound signal processing. More specifically, when the target sound is emitted by the user (for example, the user's voice), it is transport equipment inside which the user (occupant) can move about. Examples include marine transport equipment typified by ships, ground transport equipment typified by trains and large vehicles (for example, buses), air transport equipment typified by airplanes, and space transport equipment typified by spacecraft.

Examples of sounds to be processed include sounds that a person produces with the body, such as the voice, clapping, or footsteps; sounds emitted by sound generating devices such as musical instruments and other sound sources (whistles, horns, electronic sound sources, and the like); and mechanical sounds emitted by machinery installed in the transport equipment (an engine, a steering device, and the like).
Examples of sound signal processing include target sound recognition (for example, recognition of human speech), detection of a specific sound, and noise removal.

Examples of input sources for the target sound include a microphone that picks up sound including the target sound and converts it into a sound signal, and a playback device that reproduces previously recorded sound.
The generation position specifying means need only be able to specify a plurality of separate positions whose sound environments differ enough that the sound signal processing parameters need to be switched. For example, when the transport equipment is a ship, the generation position specifying means may identify the position at which the target sound is generated as one of the cockpit, the cabin, the flying deck, and the out deck.

The generation position specifying means may also distinguish between a plurality of positions at which the influence of a main noise source on the sound signal processing differs substantially. More specifically, it may distinguish between positions at different distances from a main noise source (for example, the prime mover). When the main noise source is the wind noise produced as the transport equipment moves, it may distinguish between positions with different exposure to the wind. It may further distinguish between the inside of an enclosed space (for example, a ship's cabin) and the outside of an enclosed space (an open space such as a ship's deck).

When the target sound processing apparatus processes sound emitted by the user or generated near the user, it is preferably configured so that the user can carry or wear it. For example, the target sound processing apparatus may be applied to a voice instruction device that recognizes the user's voice and, based on the recognition result, operates equipment installed in the transport equipment (main equipment for maneuvering the transport equipment, or auxiliary equipment for purposes other than maneuvering).

The invention according to claim 2 is the target sound processing apparatus according to claim 1, further comprising transport equipment information acquisition means for acquiring transport equipment information that includes at least one of attribute information of the transport equipment and environment information of the transport equipment, wherein the parameter switching means switches the parameters of the sound signal processing in the sound signal processing means based on the transport equipment information acquired by the transport equipment information acquisition means in addition to the generation position specified by the generation position specifying means.

With this configuration, the sound signal processing parameters are switched taking the attribute information and/or environment information of the transport equipment into account as well. This further improves the accuracy of the sound signal processing.
The attribute information of the transport equipment is information about its internal properties (particularly those related to the noise environment). It can be divided into static attribute information that is inherent to the transport equipment and dynamic attribute information that varies while the transport equipment is operating. Examples of static attribute information include the size of the transport equipment, the type of transport equipment, and the type of prime mover (built into the transport equipment, externally mounted, and so on). Examples of dynamic attribute information include the rotational speed and the state of the prime mover. The prime mover that provides the propulsive force of the transport equipment may be a heat engine that uses thermal energy or an electric motor that uses electrical energy. Acquiring such attribute information makes it possible to handle conditions that differ from one type of transport equipment to another.

The environment information of the transport equipment is information about external environmental factors surrounding it (particularly external factors related to the noise environment). Examples include the wind strength, the wind direction, and the wave height around the transport equipment. For space transport equipment, an example of environment information is whether the equipment is inside or outside the atmosphere.
The transport equipment information acquisition means may acquire all of the attribute information and environment information exemplified above, or only part of it.
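One way to picture how attribute and environment information might refine a position-based choice is sketched below in Python. It is a minimal illustration under stated assumptions: the `TransportInfo` fields, the thresholds (3000 rpm, 8 m/s), and the coefficient values are all invented for the example, and every field is optional because the acquisition means may supply only part of the information.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TransportInfo:
    """Hypothetical container for transport equipment information."""
    engine_type: Optional[str] = None       # static attribute, e.g. "inboard"
    engine_rpm: Optional[float] = None      # dynamic attribute
    wind_speed_mps: Optional[float] = None  # environment information
    wave_height_m: Optional[float] = None   # environment information


# Assumed base coefficient per position (illustrative values only).
BASE_COEFF = {"cabin": 1.0, "cockpit": 1.25, "flying_deck": 1.5, "out_deck": 1.75}

# Positions assumed to be exposed to wind.
OPEN_POSITIONS = ("flying_deck", "out_deck")


def noise_coefficient(position: str, info: TransportInfo) -> float:
    """Start from the position-based value, then adjust with whatever
    attribute/environment information is actually available."""
    coeff = BASE_COEFF[position]
    # Assumed rule: a fast-running prime mover calls for stronger suppression.
    if info.engine_rpm is not None and info.engine_rpm > 3000:
        coeff += 0.25
    # Assumed rule: strong wind matters only at wind-exposed positions.
    if (position in OPEN_POSITIONS
            and info.wind_speed_mps is not None
            and info.wind_speed_mps > 8):
        coeff += 0.5
    return coeff
```

With no information available, `noise_coefficient` falls back to the position-only value, which mirrors the relationship between claim 1 and claim 2.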

The invention according to claim 3 is the target sound processing apparatus according to claim 2, wherein the transport equipment includes a database storing attribute information (particularly static attribute information) of the transport equipment, and the transport equipment information acquisition means includes attribute information acquisition means for acquiring the attribute information of the transport equipment from the database.
With this configuration, the attribute information can be acquired from a database that is part of the transport equipment's own facilities, so accurate attribute information can be obtained with a simple configuration.

For example, the transport equipment may have an internal LAN (local area network) to which the database is connected. In that case, the target sound processing apparatus can acquire the attribute information from the database if it is provided with communication means for communicating with the LAN.
The attribute information of the transport equipment may instead be entered into the target sound processing apparatus through an input device such as a keyboard.

The invention according to claim 4 is the target sound processing apparatus according to claim 2 or 3, wherein the transport equipment includes transport equipment information detection means for detecting the transport equipment information, and the transport equipment information acquisition means acquires from the transport equipment the transport equipment information detected by the transport equipment information detection means.
With this configuration, the attribute information and/or environment information can be acquired using detection means that are part of the transport equipment's own facilities. The transport equipment information can therefore be acquired without complicating the configuration of the target sound processing apparatus itself.

The transport equipment information detection means may include attribute information detection means for detecting the attribute information (particularly dynamic attribute information). Examples of such attribute information detection means include sensors that detect the operating state of equipment installed in the transport equipment, such as a prime mover rotational speed sensor.
The transport equipment information detection means may also include environment information detection means for detecting environment information around the transport equipment, and the transport equipment information acquisition means may include environment information acquisition means for acquiring from the transport equipment the environment information detected by the environment information detection means. Examples of the environment information detection means include an anemometer, a wind vane, and a wave height meter. The environment information detection means may also obtain environment information about the surroundings of the transport equipment through information communication.

For example, when the transport equipment has internal communication means such as the LAN described above, the target sound processing apparatus preferably includes means (a communication device) for acquiring the transport equipment information via the LAN.
The invention according to claim 5 is the target sound processing apparatus according to any one of claims 1 to 4, wherein the sound signal processing means includes noise removal processing means for executing noise removal processing that removes noise components from the input sound signal, and the parameter switching means includes noise removal parameter switching means for switching the noise removal parameters (the algorithm and/or the coefficients of the noise removal processing) applied to the noise removal processing executed by the noise removal processing means.

With this configuration, the noise removal parameters are switched according to the position at which the target sound is generated in the transport equipment. Noise removal suited to the noise conditions at each of a plurality of different positions in the transport equipment can thus be performed, and the target sound component can be extracted from the sound signal with high accuracy.
Furthermore, when combined with the feature of claim 2, the noise removal parameters are selected taking the attribute information and/or environment information of the transport equipment into account as well, so still more appropriate noise removal is performed.

Examples of the noise removal parameters include the type of noise removal algorithm, a combination of several types of noise removal algorithm, and the coefficients used in a noise removal algorithm. That is, selecting the noise removal parameters may mean selecting one noise removal algorithm from among several types, selecting a combination of two or more noise removal algorithms from among several types, or selecting the coefficients applied to a particular noise removal algorithm.
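The three kinds of selection (one algorithm, a combination of algorithms, and coefficients) can be illustrated with the two algorithms the specification names as examples, high-frequency emphasis filtering and spectral subtraction. The Python sketch below uses textbook forms of both; the parameter names `alpha`, `coeff`, and `floor`, the position keys, and all numeric values are assumptions, and the spectral subtraction function works on magnitude spectra that the caller is assumed to have computed already.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


def pre_emphasis(samples: Sequence[float], alpha: float) -> List[float]:
    """First-order high-frequency emphasis: y[n] = x[n] - alpha * x[n-1]."""
    if not samples:
        return []
    out = [float(samples[0])]  # first sample has no predecessor
    for prev, cur in zip(samples, samples[1:]):
        out.append(cur - alpha * prev)
    return out


def spectral_subtraction(magnitudes: Sequence[float],
                         noise_magnitudes: Sequence[float],
                         coeff: float,
                         floor: float = 0.0) -> List[float]:
    """Subtract a scaled noise magnitude estimate from each frequency bin,
    clamping the result at `floor` so no bin goes negative."""
    return [max(m - coeff * n, floor)
            for m, n in zip(magnitudes, noise_magnitudes)]


@dataclass(frozen=True)
class NoiseRemovalParams:
    """Hypothetical noise removal parameter set."""
    algorithms: Tuple[str, ...]  # algorithm names to apply, in order
    alpha: float                 # coefficient for pre_emphasis
    coeff: float                 # coefficient for spectral_subtraction


# Assumed table: a single algorithm in the cabin, a combination on deck.
NOISE_PARAMS_BY_POSITION: Dict[str, NoiseRemovalParams] = {
    "cabin": NoiseRemovalParams(("pre_emphasis",), 0.95, 0.0),
    "out_deck": NoiseRemovalParams(
        ("pre_emphasis", "spectral_subtraction"), 0.97, 2.0),
}
```

Keeping the algorithm list and the coefficients in one record means that a single lookup switches both the choice of algorithm and its tuning.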

Examples of noise removal algorithms include high-frequency emphasis filtering and spectral subtraction.
The invention according to claim 6 is the target sound processing apparatus according to any one of claims 1 to 5, wherein the sound signal processing means includes target sound extraction means for executing target sound section extraction processing that extracts from the input sound signal a signal section containing the target sound, and the parameter switching means includes target sound extraction parameter switching means for switching the target sound extraction parameters applied to the target sound section extraction processing executed by the target sound extraction means.

With this configuration, the parameters of the target sound section extraction processing are switched according to the position at which the target sound is generated in the transport equipment, so the signal section containing the target sound can be extracted accurately. In particular, when combined with the configuration of claim 2, the target sound extraction parameters are changed taking the attribute information and/or environment information of the transport equipment into account as well, so the signal section of the target sound can be extracted with still higher accuracy.

When the target sound is human speech, the target sound section extraction processing may be utterance section extraction (detection) processing that extracts the signal of the section in which the speaker spoke.
The target sound section extraction processing may, for example, analyze the frequency spectrum of the input sound signal and apply statistical processing to the analysis result to extract the signal section of the target sound from the sound signal. In that case, the parameters applied to the statistical processing may be switched according to the position at which the target sound is generated in the transport equipment and the attribute/environment information of the transport equipment.
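The specification does not detail the spectral analysis or the statistical processing, so the Python sketch below swaps in a simpler, commonly used stand-in: frame energies are compared with a threshold of mean plus `k` standard deviations of the leading noise-only frames. The factor `k` plays the role of the switchable statistical parameter; the function names, the assumption that the signal starts with noise only, and the frame layout are all hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each full, non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_sections(energies: Sequence[float],
                    noise_frames: int,
                    k: float) -> List[Tuple[int, int]]:
    """Return (start, end) frame index pairs, end exclusive, for runs of
    frames whose energy exceeds mean + k * stddev of the first
    `noise_frames` frames, which are assumed to contain noise only
    (noise_frames must be at least 1)."""
    noise = list(energies[:noise_frames])
    threshold = statistics.mean(noise) + k * statistics.pstdev(noise)
    sections: List[Tuple[int, int]] = []
    start = None
    for i, energy in enumerate(energies):
        if energy > threshold and start is None:
            start = i                      # a target sound section begins
        elif energy <= threshold and start is not None:
            sections.append((start, i))    # the section ended at frame i
            start = None
    if start is not None:                  # section runs to the end of input
        sections.append((start, len(energies)))
    return sections
```

A larger `k` would suit a position with fluctuating noise, such as an open deck, while a smaller `k` would suit a quiet cabin; those pairings are illustrative, not taken from the source.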

The invention according to claim 7 is the target sound processing apparatus according to any one of claims 1 to 6, wherein the sound signal processing means has a plurality of types of acoustic model against which the sound signal of the target sound can be matched, and the parameter switching means includes acoustic model switching means for switching among the plurality of types of acoustic model.
With this configuration, the acoustic model is switched according to the position at which the target sound is generated in the transport equipment. The target sound can thus be matched against the acoustic model accurately, which, for example, raises the accuracy of target sound recognition.

In particular, when combined with the configuration of claim 2, the acoustic model is changed taking the attribute information and/or environment information of the transport equipment into account as well, so still more accurate matching becomes possible.
The invention according to claim 8 is the target sound processing apparatus according to any one of claims 1 to 7, wherein the transport equipment includes a plurality of identification information carriers that are arranged at a plurality of different positions in the transport equipment and that each carry (record) position identification information, and the generation position specifying means includes a reading device that reads the position identification information carried by the identification information carriers.

With this configuration, the position at which the target sound is generated is specified by having the reading device read the position identification information carried by the identification information carriers arranged at a plurality of positions in the transport equipment.
An identification information carrier need only be placed near a position at which the target sound is generated. For example, when the generation position (utterance position) moves around within the transport equipment, as when the target sound is the user's voice, the user has the reading device read the position identification information of the identification information carrier near the place where the user is about to speak. Parameters corresponding to the user's utterance position are thereby set.

Examples of the identification information carrier include information recording media such as IC tags and barcode labels. When IC tags are used, an IC tag reader is used as the reading device. Likewise, when barcode labels are used, a barcode reader is used as the reading device.
The reading device, or the entire target sound processing apparatus including the reading device, is preferably configured so that the user can carry or wear it.
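Position identification with IC tags or barcode labels amounts to mapping each read identifier to a known location and remembering the most recent valid read. The Python sketch below shows that logic only; the tag identifiers, the `PositionTracker` class, and the policy of ignoring unknown identifiers are assumptions, and the code does not talk to any real reader hardware.

```python
from typing import Dict, Optional

# Hypothetical mapping from position identification information to a position.
TAG_TO_POSITION: Dict[str, str] = {
    "tag-61": "cockpit",
    "tag-62": "cabin",
    "tag-63": "flying_deck",
    "tag-64": "out_deck",
}


class PositionTracker:
    """Keeps the most recently identified sound-generation position."""

    def __init__(self, tag_map: Dict[str, str],
                 initial: Optional[str] = None) -> None:
        self._tag_map = dict(tag_map)
        self._current = initial

    def on_tag_read(self, tag_id: str) -> Optional[str]:
        """Update the position when a known tag is read.
        Unknown identifiers are ignored (an assumed policy), so the
        previously specified position stays in effect."""
        position = self._tag_map.get(tag_id)
        if position is not None:
            self._current = position
        return self._current
```

The value returned by `on_tag_read` is what a parameter switching step would use as its lookup key each time the user presents the reader to a nearby tag.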

The generation position specifying means may include position information input means with which the user enters position information. Because the user enters the position information, highly accurate sound signal processing adapted to the position at which the target sound is generated can be achieved with a simple configuration. An input device such as a keyboard can be used as the position information input means.
The invention according to claim 9 is the target sound processing apparatus according to any one of claims 1 to 8, wherein the transport equipment is a ship, and the generation position specifying means specifies the position in the ship at which the target sound is generated.

In general, on a ship the positions at which a target sound may be generated, such as the cabin and the decks, are widely distributed, and the noise environment differs greatly with the distance from the prime mover and the exposure to wind. Specifying the position in the ship at which the target sound is generated and switching the sound signal processing parameters according to that position therefore makes highly accurate sound signal processing possible.
The invention according to claim 10 is the target sound processing apparatus according to any one of claims 1 to 9, wherein the target sound is speech uttered by a speaker in the transport equipment, and the sound signal processing means includes speech recognition processing means for executing speech recognition processing that recognizes the speaker's speech.

According to this configuration, the utterance position of the speaker within the transport device is specified, and the parameters applied to the voice recognition processing are switched according to the specified utterance position. The voice recognition rate can thereby be improved.
The speaker's voice may be input from a handset-type or headset-type microphone. The microphone and the target sound processing device may also be integrated in a portable housing. When the transport device is a ship, the microphone for inputting the speaker's voice may be attached to a wearable item such as a life jacket worn by an occupant, and the target sound processing device may be attached to the same item. To obtain a high voice recognition rate, the positional relationship between the utterance position and the microphone should preferably be kept constant, so it is preferable to use a headset-type microphone or a microphone attached to a wearable item.

A plurality of microphones for inputting the speaker's voice may also be distributed at a plurality of positions within the transport device. In this case, for example, the speaker holds an authentication information carrier carrying predetermined authentication information (for example, a card-like item such as an IC tag or a barcode label), and a reading device is placed near each microphone installation location. In use, the speaker has the reading device read the authentication information from the carrier. The position of the speaker can then be specified according to which location's reading device received the authentication information. The parameters applied to the voice recognition processing can be switched based on this specified position information.

The invention according to claim 11 is a transport device system comprising a transport device and the target sound processing device according to any one of claims 1 to 10. This configuration improves the accuracy of target sound processing on the transport device.
The invention according to claim 12 is a transport device system comprising: a transport device; the target sound processing device according to claim 10; transport device operation data generating means for identifying, based on the voice recognition result from the voice recognition processing means, an instruction given by the speaker for operating the transport device, and for generating transport device operation data corresponding to that instruction; and device control means for operating a device provided in the transport device based on the transport device operation data generated by the transport device operation data generating means.

With this configuration, transport device operation data corresponding to the voice instruction given by the speaker is generated, and the various devices provided in the transport device can be operated based on that data. Because the parameters applied to the voice recognition processing are switched appropriately according to the position of the speaker, a high voice recognition rate can be ensured. The transport device can therefore be operated smoothly by voice instruction.
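The flow from recognition result to device operation can be illustrated with a minimal sketch. The phrase vocabulary, the `OperationData` fields, and the `DeviceController` class are assumptions made for illustration only; the patent does not specify a command set or a data format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OperationData:
    device: str  # which on-board device to address (hypothetical name)
    action: str  # what that device should do (hypothetical name)


# Hypothetical vocabulary: recognized phrase -> operation data.
COMMANDS = {
    "engine start": OperationData("outboard_motor", "start"),
    "engine stop": OperationData("outboard_motor", "stop"),
    "launch boat": OperationData("rubber_boat", "deploy"),
}


def generate_operation_data(recognized_text: str) -> Optional[OperationData]:
    """Map a recognition result to operation data, or None if it is not a known command."""
    return COMMANDS.get(recognized_text.strip().lower())


class DeviceController:
    """Stand-in for the device control means; it records commands instead of driving hardware."""

    def __init__(self) -> None:
        self.sent = []

    def operate(self, data: OperationData) -> None:
        self.sent.append((data.device, data.action))


controller = DeviceController()
data = generate_operation_data(" Engine Start ")
if data is not None:
    controller.operate(data)
```

Returning `None` for an unknown phrase keeps a misrecognized utterance from triggering any device, which seems a reasonable default for a sketch like this.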

The transport device operation data generating means may be provided on the side of the target sound processing device that includes the voice recognition means. Alternatively, it may be provided on the transport device side.
The invention according to claim 13 is a target sound processing method for processing a predetermined target sound generated in a transport device, the method comprising: a step of specifying the generation position of the target sound within the transport device; a step of executing sound signal processing on a sound signal, which is an electrical signal of sound including the target sound; and a step of switching a parameter applied to the sound signal processing according to the specified generation position of the target sound.

With this method, the way the target sound is processed can be adapted to conditions that vary with the position within the transport device. The target sound can therefore be processed with high accuracy even inside a transport device.
The invention according to claim 14 is the target sound processing method according to claim 13, further comprising a step of acquiring transport device information including at least one of attribute information of the transport device and environmental information of the transport device, wherein the step of switching the parameter includes a step of switching the parameter applied to the sound signal processing based on the acquired transport device information in addition to the specified generation position of the target sound.
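As a rough illustration of switching by position and then refining by transport device information, the sketch below picks a named parameter profile. The position names, profile names, and numeric thresholds are all hypothetical; the patent does not give concrete values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TransportInfo:
    engine_rpm: float      # dynamic attribute information (assumed unit: rpm)
    wind_speed_mps: float  # environmental information (assumed unit: m/s)


# Hypothetical default profile for each position on the ship.
BASE_PROFILE = {
    "cockpit": "quiet",
    "cabin": "quiet",
    "out_deck": "engine_noise",
    "flying_deck": "wind_noise",
}


def select_profile(position: str, info: Optional[TransportInfo] = None) -> str:
    """Choose a parameter profile from the position, refined by transport info if given."""
    profile = BASE_PROFILE.get(position, "default")
    if info is not None:
        exposed = position in ("out_deck", "flying_deck")
        if exposed and info.wind_speed_mps >= 10.0:  # illustrative threshold
            profile = "wind_noise"
        elif profile == "quiet" and info.engine_rpm >= 4000.0:  # illustrative threshold
            profile = "engine_noise"
    return profile
```

Calling `select_profile("cockpit")` uses position alone, as in claim 13, while passing a `TransportInfo` adds the refinement described in claim 14.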

According to this method, the sound signal processing parameters are switched with the attribute information and environmental information of the transport device taken into account, so the accuracy of the sound signal processing can be further improved.
These method inventions can also be modified in the ways described above in relation to the target sound processing device.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
FIG. 1 is an illustrative view showing the conceptual configuration of a ship system as an example of a transport device system according to an embodiment of the present invention, and FIG. 2 is a schematic plan view of the ship constituting the ship system. In this ship system, the equipment provided on the ship 1, which is an example of a transport device, can be operated by voice instruction. In this embodiment, the operable equipment includes an outboard motor 2 as an example of a propulsion device, a steering device 3 for steering, a signal red flame device 4, a marine telephone device 5, and a rubber boat operating device 7. Operation of the equipment provided on the ship 1 is hereinafter referred to as "ship operation". Ship operation therefore includes "maneuvering operation", which is operation of equipment essential to the navigation of the ship 1 such as the propulsion device (outboard motor 2) and the steering device 3, and "accessory operation", which is operation of equipment not directly related to the navigation of the ship 1 such as the signal red flame device 4, the marine telephone device 5, and the rubber boat operating device 7.

The voice instruction device 10 for giving voice instructions for ship operation is a transceiver-type (handset-type) device carried by a speaker 8 who is an occupant of the ship 1. The voice instruction device 10 communicates wirelessly with the ship 1 using radio waves. The speaker 8 holding the voice instruction device 10 can thereby perform ship operation by voice instruction. The voice instruction device 10 corresponds to the target sound processing device.

The ship 1 includes a hull 51 and a superstructure 52 provided on the hull 51. In the example of FIG. 1, the superstructure 52 includes an out deck (outer deck) 53 covering the upper surface of the hull 51, a cabin 54 provided substantially at the center of the out deck 53, and a cockpit 55 provided in front of the cabin 54. The ceiling of the cabin 54 is located above the out deck 53 and forms a flying deck 56.

An outboard motor (outboard engine) 2 is attached to the stern of the hull 51. A steering unit 13, a shift/throttle operation unit 14, a key input unit 15, and a wireless communication unit 16 are provided in the hull 51. The signal red flame device 4, the marine telephone device 5, and the rubber boat operating device 7 described above are also arranged in the hull 51. The steering device 3 comprises the steering unit 13, a steering handle 18 coupled to the steering unit 13, a steering cable 19 coupled to the outboard motor 2, and related parts.

As shown in FIG. 2, IC tags 61, 62, ..., 6M (where M is a natural number of 2 or more) are arranged at a plurality of different positions on the ship 1 as identification information carriers carrying identification information (position identification information) for specifying each position. That is, the IC tags 61 to 6M are distributed over a plurality of places on the ship 1 where the speaker 8 may be located. More specifically, the IC tag 61 is arranged in the cockpit 55, the IC tag 62 is arranged in the cabin 54, the IC tag 63 is arranged on the bow side of the out deck 53 to the right (starboard) of the cockpit 55, the IC tag 64 is arranged on the bow side of the out deck 53 to the left (port) of the cockpit 55, and the IC tag 6M is arranged on the flying deck 56.

FIG. 3 is a block diagram showing mainly the electrical configuration inside the ship 1. The ship 1 is provided with an inboard LAN (local area network) 11. Connected to the inboard LAN 11 are a database device 12, the steering unit 13, the shift/throttle operation unit 14, the key input unit 15, the wireless communication unit 16, a processing/output unit 17, the signal red flame device 4, the marine telephone device 5, and the rubber boat operating device 7. An environmental information sensor 9 (environmental information detecting means) for obtaining environmental information about the ship 1 is also connected to the inboard LAN 11. Examples of the environmental information sensor 9 include a wind direction meter, an anemometer, and a wave height meter (for example, a G sensor). Well-known hardware standards and protocols can be applied to the inboard LAN 11.

The outboard motor 2 includes an engine 21, an engine control unit 22, and a starter motor 23. The engine control unit 22 is connected to the inboard LAN 11. The engine control unit 22 is also provided with an actuator 24 for changing the shift state and the throttle opening of the engine 21.
Power from a battery 26 is supplied to the starter motor 23 via a starter relay 27. The starter relay 27 is operated by the key input unit 15. When the starter relay 27 is closed, power is supplied to the starter motor 23 and the engine 21 starts. The key input unit 15 closes the starter relay 27 in response to a manual operation by an occupant, and also closes it in response to a predetermined control signal given from the inboard LAN 11.

The database device 12 stores attribute information of the ship 1 in advance and outputs that attribute information as necessary. Examples of the attribute information of the ship 1 include the size of the ship 1, the type of propulsion device, the engine speed, and the engine state. Of these, static attribute information, such as the size of the ship 1 and the type of propulsion device, is stored in the database device 12. Dynamic attribute information that varies over time, such as the engine speed and the engine state (throttle opening, shift state, and open/closed state of the engine cover), is detected by sensors 20 such as an engine speed sensor.

Examples of attribute information indicating the type of propulsion device include drive type information, such as outboard motor, inboard/outboard motor (inboard engine with outboard drive), inboard motor (inboard drive), and water jet drive, and engine type information, such as 2-stroke gasoline engine, 4-stroke gasoline engine, and diesel engine.

The steering unit 13 includes a handle angle detection unit 31 that detects the handle angle of the steering handle 18, and a steering actuator 32 that generates a mechanical displacement corresponding to the detected handle angle. The mechanical displacement generated by the steering actuator 32 is transmitted to the outboard motor 2 via the steering cable 19, which changes the direction of the outboard motor 2.

The shift/throttle operation unit 14 includes a shift information transmission unit 33 that transmits shift operation information to the engine control unit 22 via the inboard LAN 11, and a throttle information transmission unit 34 that transmits throttle operation information to the engine control unit 22 via the inboard LAN 11. The shift operation and the throttle operation in the shift/throttle operation unit 14 are, for example, operations that change the tilt angle of an operation lever (remote control lever) 14a (see FIG. 1). When the operation lever 14a is tilted forward by a certain amount, the shift changes from neutral to forward, and tilting the lever further forward increases the throttle opening. Likewise, when the operation lever 14a is tilted backward by a certain amount, the shift changes from neutral to reverse, and tilting the lever further backward increases the throttle opening.
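The lever behavior described here, a neutral band followed by a throttle opening that grows with tilt, can be sketched as a small mapping function. The engagement angle of 15 degrees, the maximum angle of 60 degrees, and the linear throttle curve are assumptions chosen for illustration; the patent only says "a certain amount".

```python
from typing import Tuple


def lever_to_command(tilt_deg: float,
                     engage_deg: float = 15.0,
                     max_deg: float = 60.0) -> Tuple[str, float]:
    """Convert a lever tilt angle into (shift state, throttle opening in 0.0..1.0).

    Positive angles mean the lever is tilted forward, negative angles backward.
    """
    magnitude = abs(tilt_deg)
    if magnitude < engage_deg:
        # Inside the neutral band: no shift engagement, throttle closed.
        return ("neutral", 0.0)
    shift = "forward" if tilt_deg > 0 else "reverse"
    # Throttle grows linearly with tilt beyond the engagement angle, capped at 1.0.
    throttle = min((magnitude - engage_deg) / (max_deg - engage_deg), 1.0)
    return (shift, throttle)
```

With these assumed defaults, a tilt of 37.5 degrees forward yields a half-open throttle in forward gear, and any tilt smaller than 15 degrees leaves the shift in neutral.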

The engine control unit 22 outputs a signal for operating the actuator 24 in accordance with the shift operation and throttle operation information from the shift/throttle operation unit 14. In response to this signal, the actuator 24 generates a mechanical displacement for adjusting the opening of a throttle valve (not shown) of the engine 21 or for operating a shift mechanism (a clutch, not shown). The rotational speed of the engine 21 is detected by the engine speed sensor, and other information representing the state of the engine 21, such as the throttle valve opening and the shift state (clutch state), is detected by the various sensors 20. The detection signals of the sensors 20, including the engine speed sensor, are supplied to the engine control unit 22, which can thereby determine the control state of the engine 21.

The wireless communication unit 16 communicates with the voice instruction device 10 via an antenna 35 (see FIG. 1). When the wireless communication unit 16 receives ship operation data from the voice instruction device 10, it passes the received data to the processing/output unit 17 via the inboard LAN 11. Based on the received ship operation data, the processing/output unit 17 transmits a control signal for ship operation to the corresponding device via the inboard LAN 11. That is, the processing/output unit 17 functions as device control means that operates the equipment provided on the ship 1 according to the ship operation data.

The wireless communication unit 16 also receives, from the voice instruction device 10, a request to transmit the attribute information and environmental information of the ship 1, and passes this request to the processing/output unit 17 via the inboard LAN 11. In response to the request, the processing/output unit 17 reads the static attribute information of the ship 1 from the database device 12 and acquires dynamic attribute information such as the engine speed from the engine control unit 22. The processing/output unit 17 further acquires the environmental information of the ship 1 from the environmental information sensor 9. The acquired attribute information and environmental information are passed from the processing/output unit 17 to the wireless communication unit 16 via the inboard LAN 11 and transmitted from the wireless communication unit 16 to the voice instruction device 10.
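The way the response is assembled from three sources can be sketched as follows. The dictionary layout and the key names are hypothetical; the patent does not define a message format for the attribute and environmental information.

```python
from typing import Any, Dict


def build_transport_info(static_attrs: Dict[str, Any],
                         dynamic_attrs: Dict[str, Any],
                         environment: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Combine database, engine control unit, and sensor readings into one response.

    static_attrs  : values stored in advance (database device)
    dynamic_attrs : values that change over time (engine control unit)
    environment   : values from the environmental information sensor
    """
    return {
        # Dynamic values are merged last so a fresh reading wins over a stored one.
        "attributes": {**static_attrs, **dynamic_attrs},
        "environment": dict(environment),
    }


response = build_transport_info(
    {"hull_length_m": 7.5, "drive_type": "outboard"},
    {"engine_rpm": 3200, "shift": "forward"},
    {"wind_speed_mps": 6.0, "wave_height_m": 0.4},
)
```

The example values are invented; a real system would fill them from the corresponding devices on the inboard LAN.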

FIG. 4 is a block diagram for explaining the electrical configuration of the voice instruction device 10. The voice instruction device 10 includes first and second microphones 41 and 42 for receiving the voice uttered by the speaker 8 on two channels, a communication device 44, an arithmetic processing unit 45, and an IC tag reader 46 for reading the position identification information from the IC tags 61 to 6M, all integrated in a transceiver-type housing. The communication device 44 is wireless communication means that transmits and receives radio waves to and from the wireless communication unit 16 on the ship 1 side via an antenna 44A.

The arithmetic processing unit 45 is an electronic system including a microcomputer and a memory. It is connected to the first and second microphones 41 and 42, the communication device 44, and the IC tag reader 46. The main functions of the arithmetic processing unit 45 are voice recognition processing, which recognizes the voice of the speaker 8 received by the first and second microphones 41 and 42, and ship operation data generation processing, which generates the ship operation data to be transmitted from the communication device 44 to the ship 1. The arithmetic processing unit 45 realizes these functions by having the microcomputer execute a program.

The arithmetic processing unit 45 includes A/D (analog/digital) converters 71 and 72 that convert the analog voice signals output by the first and second microphones 41 and 42 into digital voice signals, a noise removal processing unit 73 that performs noise removal processing on the voice signals output by the A/D converters 71 and 72, a voice recognition processing unit 74 that performs voice recognition processing on the voice signals after noise removal, a ship operation data generation unit 75 (transport device operation data generating means) that generates ship operation data based on the processing result (recognition result) of the voice recognition processing unit 74, and a communication processing unit 76 that executes data processing for communication via the communication device 44.

The noise removal processing unit 73 removes the noise component contained in the voice signal and outputs a voice signal with reduced noise. For example, noise removal methods such as spectral subtraction or a high-frequency emphasis filter can be applied to the processing in the noise removal processing unit 73. The arithmetic processing unit 45 includes a noise removal algorithm buffer 77 that stores a plurality of types of noise removal algorithms 731 to 73k (k is a natural number of 2 or more) applicable to the noise removal processing in the noise removal processing unit 73, and a noise removal algorithm switching unit 78 (noise removal parameter switching means) for switching the algorithm applied to the noise removal processing. The noise removal algorithm switching unit 78 switches the algorithm applied in the noise removal processing unit 73 based on the position identification information read by the IC tag reader 46 and on the information (attribute information and environmental information, that is, transport device information) acquired from the inboard LAN 11 by the communication device 44 and the communication processing unit 76.
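A buffer of interchangeable algorithms plus a switcher can be sketched as a dictionary of functions. The sketch below implements only a first-order high-frequency emphasis filter and a pass-through; the position-to-algorithm table and the coefficient 0.97 are assumptions for illustration, and spectral subtraction is omitted to keep the example short.

```python
from typing import Callable, Dict, List

Samples = List[float]


def pass_through(samples: Samples) -> Samples:
    """Leave the signal unchanged (for positions assumed to be quiet)."""
    return list(samples)


def high_emphasis(samples: Samples, coeff: float = 0.97) -> Samples:
    """First-order high-frequency emphasis: y[n] = x[n] - coeff * x[n-1]."""
    out = []
    prev = 0.0
    for x in samples:
        out.append(x - coeff * prev)
        prev = x
    return out


# Plays the role of the algorithm buffer: name -> processing function.
ALGORITHM_BUFFER: Dict[str, Callable[[Samples], Samples]] = {
    "pass_through": pass_through,
    "high_emphasis": high_emphasis,
}

# Hypothetical switching table: position -> algorithm name.
ALGORITHM_FOR_POSITION = {
    "cockpit": "pass_through",
    "cabin": "pass_through",
    "out_deck": "high_emphasis",
    "flying_deck": "high_emphasis",
}


def remove_noise(samples: Samples, position: str) -> Samples:
    """Apply the algorithm selected for the given position; unknown positions pass through."""
    name = ALGORITHM_FOR_POSITION.get(position, "pass_through")
    return ALGORITHM_BUFFER[name](samples)
```

Keeping the algorithms behind a common signature means the switcher only changes a name, which mirrors how a switching unit could select among stored algorithms without touching the processing path.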

The communication device 44 and the communication processing unit 76 function both as attribute information acquisition means for acquiring attribute information from the ship 1 and as environmental information acquisition means for acquiring environmental information from the ship 1.
FIG. 5 is a block diagram for explaining the functional configuration of the voice recognition processing unit 74. The voice recognition processing unit 74 includes an utterance section detection unit 91 (target sound extraction means) that extracts the voice signal of the utterance section of the speaker 8 from the two channels of voice signals that have undergone noise removal, an acoustic analysis unit 92 that analyzes the voice signal extracted by the utterance section detection unit 91 and generates a feature vector representing its acoustic features, and a collation unit 93 that collates the feature vector generated by the acoustic analysis unit 92 against an acoustic model and a recognition dictionary 98 prepared in advance and generates recognition result data.

The utterance section detection unit 91 is configured so that the parameter of the utterance section detection processing (the utterance detection parameter, or target sound extraction parameter) can be changed. An utterance detection parameter switching unit 94 is provided to switch this utterance detection parameter among a plurality of types based on the position identification information read by the IC tag reader 46 and on the information (attribute information and environmental information) acquired from the inboard LAN 11 via the communication device 44. The utterance detection parameter switching unit 94 selects one of a plurality of types of utterance detection parameters 911 to 91m (m is a natural number of 2 or more) stored in an utterance detection parameter buffer 95 and sets it in the utterance section detection unit 91; it thus functions as target sound extraction parameter switching means.
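One common way to detect an utterance section is to threshold per-frame energy, which makes the threshold a natural candidate for a switchable parameter. The sketch below assumes that approach; the patent does not say which detection method or which parameters are used, and the `DetectionParams` fields and values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class DetectionParams:
    energy_threshold: float  # frames at or above this energy count as speech
    min_frames: int          # shortest run of speech frames accepted as an utterance


def detect_utterance(frame_energies: Sequence[float],
                     params: DetectionParams) -> Optional[Tuple[int, int]]:
    """Return (start, end) frame indices, end exclusive, of the first utterance, or None."""
    start = None
    for i, energy in enumerate(frame_energies):
        if energy >= params.energy_threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= params.min_frames:
                return (start, i)
            start = None  # run too short or not started: reset
    if start is not None and len(frame_energies) - start >= params.min_frames:
        return (start, len(frame_energies))
    return None


# Hypothetical parameter sets for a quiet and a noisy position.
QUIET = DetectionParams(energy_threshold=0.5, min_frames=2)
NOISY = DetectionParams(energy_threshold=0.95, min_frames=1)
```

With the same input energies, the noisy-position parameters report a shorter section than the quiet-position ones, which illustrates why the parameter set has to match the background level at the speaker's position.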

An acoustic model switching unit 96 is provided to switch the acoustic model used for collation in the collation unit 93. The acoustic model switching unit 96 switches the acoustic model based on the position identification information read by the IC tag reader 46 and on the information (attribute information and environmental information) acquired from the inboard LAN 11 via the communication device 44. It selects one of a plurality of types of acoustic models 931 to 93n (n is a natural number of 2 or more) stored in an acoustic model buffer 97 and applies it to the collation processing in the collation unit 93.

The collation unit 93 generates recognition result data by collating the feature vector generated by the acoustic analysis unit 92 against the acoustic model selected by the acoustic model switching unit 96 and then against the recognition dictionary 98.
The speaker 8 moves to various positions on the ship 1 and uses the voice instruction device 10 to perform ship operation and the like. The noise environment around the speaker 8 differs depending on the position of the speaker 8 on the ship 1. For example, engine noise becomes louder toward the stern and is louder on the out deck 53 and the flying deck 56 than in the cockpit 55 or the cabin 54. Wind noise is louder on the out deck 53 than in the cockpit 55 and the cabin 54, and louder still on the flying deck 56 than on the out deck 53.

In this embodiment, therefore, the voice recognition rate is improved by switching the noise removal processing parameter (algorithm) and the voice recognition processing parameters (utterance detection parameter and acoustic model) according to the position of the speaker 8 on the ship 1.
The voice recognition rate is also affected by the attributes of the ship 1 and by environmental conditions. The attributes of the ship 1 are the properties of the ship 1 represented by the attribute information described above, and the environmental conditions are the surroundings of the ship 1 represented by the environmental information described above.

Static attribute information stored in the database device 12, dynamic attribute information detected by the sensors 20 such as an engine speed sensor, and environment information detected by the environment information sensor 9 are supplied from the inboard LAN 11 to the wireless communication unit 16. This information is then received by the communication device 44 and passed to the noise removal algorithm switching unit 78 and the speech recognition processing unit 74. Based on the attribute information and environment information, the parameters (algorithm) of the noise removal processing and the parameters of the speech recognition processing (utterance detection parameters and acoustic model) are switched, which further improves the speech recognition rate. Moreover, because attribute information such as the size of the ship 1 is also acquired, the voice instruction device 10 can readily adapt to multiple types of ship 1 and achieve a good speech recognition rate.

To give a ship operation instruction by voice, the speaker 8 holds the IC tag reader 46 over one of the IC tags 61 to 6M nearby so that the position identification information is read, and then speaks the ship operation command into the microphones 41 and 42. As long as the speaker 8 does not move within the ship 1, the speaker 8 may thereafter speak into the microphones 41 and 42 without reading the IC tags 61 to 6M again.

When the position identification information of any of the IC tags 61 to 6M is read, the communication processing unit 76 responds by sending the ship 1, via the communication device 44, a transmission request for the attribute information and environment information. This request is delivered to the wireless communication unit 16 provided in the ship 1. The wireless communication unit 16 acquires attribute information from the database device 12 and the engine control unit 22 via the inboard LAN 11, acquires environment information from the environment information sensor 9, and transmits both to the voice instruction device 10. In the voice instruction device 10, the attribute information and environment information sent from the ship 1 are received by the communication device 44 and passed to the communication processing unit 76.

The position identification information, attribute information, and environment information thus acquired are supplied to the noise removal algorithm switching unit 78, the utterance detection parameter switching unit 94, and the acoustic model switching unit 96. Noise removal processing and speech recognition processing well adapted to the noise environment are thus performed, so a high speech recognition rate can be achieved. Moreover, because no complex processing to estimate the noise environment from the audio signal is required, the speech recognition processing can be adapted to the noise environment immediately and with a light processing load.

FIG. 6 is a block diagram showing a configuration example of the noise removal processing unit 73. In this example, the noise removal processing unit 73 has a plurality of noise removal filter processing units: high-frequency emphasis filter units 81 and 82 as first filter processing units, and spectral subtraction filter units 83 and 84 as second filter processing units. For the processing in the spectral subtraction filter units 83 and 84, preprocessing units 89 and 90 are interposed between the high-frequency emphasis filter unit 81 and the spectral subtraction filter unit 83, and between the high-frequency emphasis filter unit 82 and the spectral subtraction filter unit 84, respectively.

The high-frequency emphasis filter unit 81 and the spectral subtraction filter unit 83 filter the audio signal output by the first microphone 41. Likewise, the high-frequency emphasis filter unit 82 and the spectral subtraction filter unit 84 filter the audio signal output by the second microphone 42. The two-channel audio signals output by the first and second microphones 41 and 42 thus undergo two kinds of noise removal filtering (high-frequency emphasis filtering and spectral subtraction filtering). More specifically, the audio signal of each channel undergoes high-frequency emphasis filtering followed by spectral subtraction filtering.

For the high-frequency emphasis filter units 81 and 82, a coefficient γ can be variably set to enable or disable the filtering and to adjust its characteristics. Likewise, for the spectral subtraction filter units 83 and 84, coefficients δ and ε can be variably set to enable or disable the filtering and to adjust its characteristics. In the configuration example of FIG. 6, multiple coefficient sets, each consisting of the coefficients γ, δ, and ε, are stored in the noise removal algorithm buffer 77; the noise removal algorithm switching unit 78 selects an appropriate set from among them and sets it in the noise removal processing unit 73.

The high-frequency emphasis filter units 81 and 82 execute, for example, the digital filter processing expressed by the following equations (1) and (2), respectively.
y₁(i) = x₁(i) − γ·x₁(i−1)   (1)
y₂(i) = x₂(i) − γ·x₂(i−1)   (2)
In equations (1) and (2), x₁(i) and x₂(i) denote the two-channel input signals sampled and digitized by the A/D converters 71 and 72 at the i-th sample (i is a natural number), and y₁(i) and y₂(i) denote the corresponding filtered output signals.
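The filter of equations (1) and (2) can be sketched in a few lines of Python. This is only an illustration: the function name `high_frequency_emphasis` is hypothetical, and taking the sample before the first one as 0 is an assumption, since the text does not say how the first sample is handled.

```python
def high_frequency_emphasis(x, gamma):
    """Apply y(i) = x(i) - gamma * x(i-1) to one channel of samples.

    Assumption: the sample preceding the first one is taken as 0.
    """
    y = []
    previous = 0.0
    for sample in x:
        y.append(sample - gamma * previous)
        previous = sample
    return y


x1 = [1.0, 2.0, 3.0, 4.0]
unchanged = high_frequency_emphasis(x1, 0.0)    # gamma = 0 disables the filter
differenced = high_frequency_emphasis(x1, 1.0)  # gamma = 1 gives first differences
```

The same function would be applied independently to each of the two channels.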

The coefficient γ is usually set to a value from 0.90 to 1.00. Setting γ to 0 disables the high-frequency emphasis filtering, giving y₁(i) = x₁(i) and y₂(i) = x₂(i). The filter characteristics can be adjusted by varying the value of γ.
The spectral subtraction filter units 83 and 84 execute, for example, the digital filter processing expressed by the following equations (3) and (4).

In equations (3) and (4), X₁(ω) and X₂(ω) denote the power spectra (functions of frequency) of the two-channel observed signals (in the example of FIG. 6, the audio signals after high-frequency emphasis filtering), and S₁(ω) and S₂(ω) denote the corresponding filtered output signals. N(ω) denotes the estimated power spectrum of the noise signal, and ω denotes angular frequency.

The coefficient δ is called the subtraction coefficient, with δ ≥ 0. The coefficient ε is called the flooring coefficient, with 0 ≤ ε ≤ 1. The optimum pair of δ and ε differs depending on the environment around the first and second microphones 41 and 42. Setting δ = 0 (and, if necessary, ε = 1) disables the spectral subtraction filtering. The filter characteristics can be adjusted by varying the values of δ and ε.
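The roles of δ and ε can be illustrated with a common textbook form of spectral subtraction. The sketch below is an assumption, not the patent's equations (3) and (4): it subtracts δ·N(ω) from X(ω) in each frequency bin and floors the result at ε·X(ω). The function name and the choice of flooring against X(ω) rather than N(ω) are hypothetical.

```python
def spectral_subtraction(X, N, delta, epsilon):
    """Per-bin spectral subtraction on power spectra.

    Assumed form: S = max(X - delta * N, epsilon * X).
    X: observed power spectrum, N: estimated noise power spectrum.
    """
    if delta < 0 or not 0.0 <= epsilon <= 1.0:
        raise ValueError("expected delta >= 0 and 0 <= epsilon <= 1")
    return [max(x - delta * n, epsilon * x) for x, n in zip(X, N)]


X1 = [10.0, 4.0, 1.0]   # made-up observed power spectrum
N0 = [2.0, 2.0, 2.0]    # made-up noise estimate
enabled = spectral_subtraction(X1, N0, delta=2.0, epsilon=0.4)
disabled = spectral_subtraction(X1, N0, delta=0.0, epsilon=1.0)  # returns X1 unchanged
```

With this assumed form, δ = 0 together with ε = 1 passes the spectrum through unchanged, which matches the way the text describes disabling the filter.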

The spectral subtraction method is disclosed, for example, in Non-Patent Document 4.
Table 1 below gives experimental results illustrating the effect of the high-frequency emphasis filter. It shows the speech recognition rate (in %) measured with the spectral subtraction filter units 83 and 84 disabled and the high-frequency emphasis filter units 81 and 82 switched between enabled (γ = 1) and disabled (γ = 0). Measurements were taken at several locations in the ship 1 for several engine speeds.

Inside the cabin there is no wind effect and engine noise is relatively low. The rear of the out deck is located just forward of the engine and has the loudest engine noise. However, wind noise there is low, because the deckhouse forming the cabin and other compartments acts as a windbreak. The flying deck is farther from the engine than the rear of the out deck, but is the most exposed to wind.

As Table 1 shows, where the influence of wind is strong (flying deck), the speech recognition rate is better with the high-frequency emphasis filter. Where both wind and engine noise are low (cabin at engine speeds of 500 and 2200; out deck at an engine speed of 500), the filter makes little difference. Conversely, where there is almost no wind and only engine noise is high (cabin at an engine speed of 2800; out deck at engine speeds of 2200 and 2800), the high-frequency emphasis filtering lowers the speech recognition rate.

Table 2 below gives experimental results illustrating the effect of the spectral subtraction filter. It shows the speech recognition rate (in %) measured with the high-frequency emphasis filter units 81 and 82 disabled and the spectral subtraction filter units 83 and 84 switched between enabled (for example, δ = 2.0, ε = 0.4) and disabled (for example, δ = 0, ε = 1). Measurements were taken at several locations in the ship 1 for several engine speeds.

Engine noise can be regarded as nearly stationary noise. As Table 2 shows, in the cabin and on the out deck the spectral subtraction filter is effective when engine noise is high (engine speed of 2800). In these locations, however, the speech recognition rate actually drops when engine noise is low (engine speed of 500). On the flying deck, the spectral subtraction filter shows no effect regardless of engine speed.

The experimental results in Tables 1 and 2 show that the optimum noise removal algorithm (the optimum combination of multiple noise removal algorithms and the optimum variable coefficients) differs depending on the location in the ship 1, the state of the engine, and the environmental conditions.
Table 3 below shows an example of the correspondence between the noise removal algorithm and the position of the speaker 8 in the ship 1 and the engine speed.

The noise removal algorithm switching unit 78 includes, for example, a memory 78A (see FIG. 4) storing table information such as that shown in Table 3. Experiments are conducted in various situations, and identification information of the noise removal algorithm that is optimum for each situation is stored in the memory 78A in advance.
In the example of Table 3, the noise removal algorithm switching unit 78 identifies the identification information of the corresponding noise removal algorithm from the table in the memory 78A, based on the position identification information provided by the IC tag reader 46 and the engine speed received from the ship 1 via the communication processing unit 76. The noise removal algorithm (set of coefficients γ, δ, and ε) corresponding to that identification information is read from the noise removal algorithm buffer 77 and set in the noise removal processing unit 73. That is, once the position of the speaker 8 in the ship 1 and the engine speed are determined, the one noise removal algorithm best suited to that situation is selected from the noise removal algorithm buffer 77.
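The two-step lookup described above (situation to algorithm identifier, then identifier to coefficient set) might be sketched as follows. Everything concrete here is hypothetical: the identifiers `A0` to `A2`, the location keys, the engine-speed band edges, and the table entries are placeholders and are not the contents of Table 3. Only the coefficient values themselves (γ of 0 or 1, δ = 2.0 with ε = 0.4, δ = 0 with ε = 1) are borrowed from the experiments described in the text.

```python
from bisect import bisect_right

# Stand-in for the noise removal algorithm buffer: id -> (gamma, delta, epsilon).
COEFFICIENT_SETS = {
    "A0": (0.0, 0.0, 1.0),  # both filters disabled
    "A1": (1.0, 0.0, 1.0),  # high-frequency emphasis only
    "A2": (0.0, 2.0, 0.4),  # spectral subtraction only
}

# Hypothetical engine-speed band edges: band 0 = low, 1 = middle, 2 = high.
ENGINE_SPEED_BOUNDS = [1000, 2500]

# Stand-in for the table in the memory: (location, band) -> algorithm id.
ALGORITHM_TABLE = {
    ("cabin", 0): "A0", ("cabin", 1): "A0", ("cabin", 2): "A2",
    ("out_deck", 0): "A0", ("out_deck", 1): "A0", ("out_deck", 2): "A2",
    ("flying_deck", 0): "A1", ("flying_deck", 1): "A1", ("flying_deck", 2): "A1",
}


def select_coefficients(location, engine_speed):
    """Return (algorithm id, (gamma, delta, epsilon)) for the given situation."""
    band = bisect_right(ENGINE_SPEED_BOUNDS, engine_speed)
    algorithm_id = ALGORITHM_TABLE[(location, band)]
    return algorithm_id, COEFFICIENT_SETS[algorithm_id]


algorithm_id, (gamma, delta, epsilon) = select_coefficients("cabin", 2800)
```

Keeping identifiers in the table and coefficients in a separate buffer mirrors the split between the memory 78A and the noise removal algorithm buffer 77, so new coefficient sets can be added without touching the situation table.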

Of course, when attribute information and/or environment information of the ship 1 other than the engine speed is also used to switch the noise removal algorithm, table information is prepared to implement the corresponding switching.
As shown in FIG. 6, the preprocessing unit 89 includes a framing unit 85 that divides the digital audio signal processed by the high-frequency emphasis filter unit 81 into frames, and a frequency analysis unit 87 that performs frequency analysis on the framed audio signal. Similarly, the preprocessing unit 90 includes a framing unit 86 that divides the digital audio signal processed by the high-frequency emphasis filter unit 82 into frames, and a frequency analysis unit 88 that performs frequency analysis on the framed audio signal.

The A/D converters 71 and 72 (see FIG. 4) sample the two-channel audio signals x₁(t) and x₂(t) (t denotes time) from the first and second microphones 41 and 42 at a predetermined time interval Δt. The signals y₁(i) and y₂(i), obtained for example by applying high-frequency emphasis filtering to the sampled audio signals x₁(i) and x₂(i), are input to the framing units 85 and 86. The framing units 85 and 86 successively group the audio signals y₁(i) and y₂(i) into frames of a predetermined number of samples.
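Framing, together with the per-frame frequency analysis that the frequency analysis units 87 and 88 then perform, can be sketched as below. The sketch makes several assumptions that the text does not state: frames do not overlap, an incomplete final frame is dropped, and no window function is applied. A direct discrete Fourier transform is used for brevity; an FFT computes the same values faster. The function names are hypothetical.

```python
import cmath


def split_into_frames(y, frame_length):
    """Group consecutive samples into non-overlapping frames (tail dropped)."""
    last_start = len(y) - frame_length
    return [y[s:s + frame_length] for s in range(0, last_start + 1, frame_length)]


def dft(frame):
    """Direct discrete Fourier transform of one frame (same result as an FFT)."""
    n = len(frame)
    return [
        sum(frame[i] * cmath.exp(-2j * cmath.pi * k * i / n) for i in range(n))
        for k in range(n)
    ]


samples = [float(i % 4) for i in range(10)]   # made-up signal
frames = split_into_frames(samples, 4)        # two frames of four samples
spectra = [dft(frame) for frame in frames]    # one complex spectrum per frame
```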

The frequency analysis units 87 and 88 perform frequency analysis on the audio signals y₁(i) and y₂(i) frame by frame using the fast Fourier transform (FFT), generate the frequency functions X₁(ω) and X₂(ω), and pass them to the spectral subtraction filter units 83 and 84.
FIG. 7 is a block diagram illustrating a configuration example of the utterance section detection unit 91. In this embodiment, the utterance section detection unit 91 uses the technique disclosed in Non-Patent Document 1. That is, the utterance section is detected based on the cross spectrum of the two-channel audio signals y₁(i) and y₂(i) that have undergone high-frequency emphasis filtering.

Assume that the audio signal y₂(i) corresponding to the sound received by the second microphone 42 is a time-shifted waveform of the audio signal y₁(i) corresponding to the sound received by the first microphone 41. If the delay time between the two audio signals y₁(i) and y₂(i) is i₀·Δt, the audio signal y₂(i) can be expressed by the following equation (5).
y₂(i) = y₁(i − i₀)   (5)
In this case, the relationship between the frequency functions S₁(ω) and S₂(ω) is expressed by the following equation (6).
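A relation of this kind follows from the standard time-shift property of the Fourier transform applied to equation (5). The lines below are an illustrative sketch under assumptions, not a reproduction of the patent's own equations: j denotes the imaginary unit, the cross spectrum G(ω) is taken with the conjugate on the second channel (the opposite convention only flips the sign of the phase), and noise and any effect of the filtering on phase are ignored.

```latex
\begin{align*}
S_2(\omega) &= S_1(\omega)\, e^{-j \omega i_0 \Delta t} \\
G(\omega) &= S_1(\omega)\, S_2^{*}(\omega)
           = \lvert S_1(\omega) \rvert^{2}\, e^{\,j \omega i_0 \Delta t} \\
\arg G(\omega) &= \omega\, i_0\, \Delta t
\end{align*}
```

Under these assumptions the phase of G(ω) is a straight line in ω whose slope is the delay i₀·Δt, and the slope is zero when the speaker is equally distant from both microphones.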

On the other hand, when the phase of the cross spectrum G(ω) is plotted against frequency on a two-dimensional plane for a frame in a non-utterance section, the result is, for example, as shown in FIG. 8(b). In a non-utterance section, the phase component of the cross spectrum G(ω) shows no consistent trend with frequency, because the noise received by the first and second microphones 41 and 42 in a non-utterance section has random phase.

Accordingly, frames in an utterance section can be distinguished from frames in a non-utterance section by checking whether the slope of the phase component of the cross spectrum G(ω) with respect to frequency shows a consistent tendency.
To carry out this processing, the utterance section detection unit 91 includes a cross spectrum calculation unit 100, a phase component extraction unit 101, a phase unwrap processing unit 102, and a main calculation unit 103.

The cross spectrum calculation unit 100 applies the calculation of equation (7) to the frequency functions S₁(ω) and S₂(ω) supplied by the noise removal processing unit 73 (spectral subtraction filter units 83 and 84) to obtain the cross spectrum G(ω). The phase component extraction unit 101 detects (extracts) the phase of the cross spectrum G(ω) obtained by the cross spectrum calculation unit 100, and the result is supplied to the phase unwrap processing unit 102. Based on the detected phase, the phase unwrap processing unit 102 applies unwrap processing to the cross spectrum G(ω) to smooth out phase discontinuities.
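The chain of cross spectrum, phase extraction, and unwrapping can be sketched as follows. This is an illustration under assumptions: S₁ and S₂ are treated as lists of complex spectral values per frequency bin, the cross spectrum is formed as S₁ times the conjugate of S₂ (the patent's equation (7) may use a different convention), and unwrapping is the usual removal of 2π jumps between neighbouring bins. The function names are hypothetical.

```python
import cmath
import math


def cross_spectrum(S1, S2):
    """Per-bin cross spectrum, assumed here to be S1 * conj(S2)."""
    return [a * b.conjugate() for a, b in zip(S1, S2)]


def extract_phase(G):
    """Wrapped phase of each bin, in the range (-pi, pi]."""
    return [cmath.phase(g) for g in G]


def unwrap(phases):
    """Remove 2*pi jumps so neighbouring bins differ by less than pi."""
    if not phases:
        return []
    unwrapped = [phases[0]]
    for p in phases[1:]:
        step = p - unwrapped[-1]
        step = (step + math.pi) % (2.0 * math.pi) - math.pi  # wrap into [-pi, pi)
        unwrapped.append(unwrapped[-1] + step)
    return unwrapped


# Made-up example: channel 2 lags channel 1, giving 0.9 rad of phase per bin.
S1 = [complex(1.0, 0.0)] * 10
S2 = [cmath.exp(-0.9j * k) for k in range(10)]
phase_line = unwrap(extract_phase(cross_spectrum(S1, S2)))
```

In the example the wrapped phase jumps by 2π after a few bins, while the unwrapped phase grows steadily by 0.9 rad per bin, which is the straight-line behaviour the slope calculation relies on.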

The main calculation unit 103 includes a frequency band division unit 105, which receives the unwrapped cross spectrum G(ω) from the phase unwrap processing unit 102; first to N-th slope calculation units 106-1 to 106-N (hereinafter collectively the "first to N-th slope calculation units 106"), which receive the N frequency segments generated by the frequency band division unit 105; and a histogram calculation unit 107, which generates a histogram of the frequency of occurrence of the slopes obtained by the first to N-th slope calculation units 106. The utterance section detection unit 91 further includes an utterance section determination unit 108, which determines from the histogram generated by the histogram calculation unit 107 whether each frame belongs to an utterance section or a non-utterance section, and an audio signal input on/off control unit 109.

The utterance section detection unit 91 further includes an inverse Fourier transform unit 104, which applies an inverse Fourier transform to the frequency function S₁(ω) of one channel, out of the two-channel frequency functions S₁(ω) and S₂(ω) supplied by the noise removal processing unit 73, to generate an audio signal z(i). The audio signal z(i) generated by the inverse Fourier transform unit 104 is supplied to the audio signal input on/off control unit 109.

The frequency band division unit 105 divides the phase component of the cross spectrum G(ω) into small frequency segments (N segments). For each of the N frequency segments, the first to N-th slope calculation units 106 calculate the slope of the phase component with respect to frequency, using for example the least squares method. A known technique such as that described in Non-Patent Document 2 can be used to obtain the slope of each segment by least squares.
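The band division and per-segment slope calculation can be sketched with an ordinary least-squares line fit. The sketch assumes equal-width segments of at least two bins each and drops any leftover bins at the top of the band; the text does not specify either point, and the function names are hypothetical.

```python
def least_squares_slope(xs, ys):
    """Slope of the least-squares straight line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def segment_slopes(frequencies, phases, n_segments):
    """Split the band into n_segments equal parts and fit a slope in each."""
    size = len(frequencies) // n_segments
    if size < 2:
        raise ValueError("each segment needs at least two frequency bins")
    return [
        least_squares_slope(frequencies[k * size:(k + 1) * size],
                            phases[k * size:(k + 1) * size])
        for k in range(n_segments)
    ]


freqs = [float(k) for k in range(12)]        # made-up frequency axis
phase_line = [0.5 * f for f in freqs]        # perfectly linear phase
slopes = segment_slopes(freqs, phase_line, 3)  # every segment has slope 0.5
```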

FIG. 9(a) is an example of a histogram obtained by the histogram calculation unit 107 for a frame in an utterance section, and FIG. 9(b) is an example for a frame in a non-utterance section. These are histograms of the slopes obtained for each segment. That is, FIGS. 9(a) and 9(b) show the distribution of the phase slope, with the number of segments having each slope expressed as a percentage of the total number of segments (that is, a normalized frequency).

As a comparison of FIGS. 9(a) and 9(b) makes clear, a distinct peak appears in the histogram for an utterance section frame; that is, the phase slopes are concentrated in a very narrow range. In a non-utterance section frame, by contrast, the histogram is flat with no distinct peak, and the phase slopes are spread over a wide range.
The utterance section determination unit 108 determines whether each frame is an utterance section frame or a non-utterance section frame based, for example, on the frequency of each phase slope in the histogram obtained by the histogram calculation unit 107. For example, if the frequency of occurrence of phase slopes within a predetermined range p ± α centered on the theoretical peak value p of the phase slope is at or above a predetermined threshold β (%), the frame is determined to belong to an utterance section; if it is below the threshold, the frame is determined to belong to a non-utterance section. The theoretical value p is 0 when the distances from the sound source (speaking position) to the first and second microphones 41 and 42 are equal.
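The decision rule with α and β can be sketched directly. Rather than building the full histogram, the sketch below counts the share of segment slopes that fall within p ± α and compares it with β, which yields the same decision for this rule. The function name and the sample numbers are hypothetical.

```python
def is_utterance_frame(slopes, p, alpha, beta_percent):
    """True if at least beta_percent of the slopes lie within p +/- alpha."""
    inside = sum(1 for s in slopes if p - alpha <= s <= p + alpha)
    share_percent = 100.0 * inside / len(slopes)
    return share_percent >= beta_percent


# Made-up slopes for one frame: three of five lie close to p = 0.
frame_slopes = [0.0, 0.01, -0.02, 0.5, -0.7]
loose = is_utterance_frame(frame_slopes, p=0.0, alpha=0.05, beta_percent=50.0)
strict = is_utterance_frame(frame_slopes, p=0.0, alpha=0.05, beta_percent=70.0)
```

Here 60% of the slopes fall inside the range, so the frame counts as an utterance frame with β = 50 but not with β = 70, which shows why the best α and β depend on how noisy the surroundings are.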

The values α and β are examples of utterance detection parameters applied to the utterance section detection processing. They can be variably set, and the utterance detection parameter switching unit 94 (see FIG. 5) can switch them among multiple values.
The audio signal input on/off control unit 109 switches between an on state and an off state according to the determination result of the utterance section determination unit 108. In the on state, the frame-by-frame audio signal z(i) generated by the inverse Fourier transform unit 104 is passed to the acoustic analysis unit 92 (see FIG. 5). In the off state, the audio signal z(i) is blocked from reaching the acoustic analysis unit 92. The audio signal input on/off control unit 109 enters the on state when the utterance section determination unit 108 determines that an audio signal frame belongs to an utterance section, and the off state when it determines that the frame belongs to a non-utterance section. Consequently, only audio signal frames in utterance sections are supplied to the acoustic analysis unit 92.

If the acoustic analysis unit 92 performs a fast Fourier transform (FFT) with the same specifications as the frequency analysis units 87 and 88, the frequency function S₁(ω) may be supplied to the acoustic analysis unit 92 via the audio signal input on/off control unit 109 instead of the audio signal z(i). In that case, the inverse Fourier transform unit 104 is unnecessary.
Table 4 below gives experimental results in which the optimum parameters α and β were determined at several locations in the ship 1 for several engine speeds.

Table 4 shows that the optimum parameters α and β differ depending on the position in the ship 1 and the engine speed. It is therefore expected that optimum parameters α and β also exist for each attribute of the ship 1 and each environment in which the ship 1 is placed.
Other methods of distinguishing speech from noise can also be applied to the utterance section detection processing. For example, a known method distinguishes speech from noise based on the zero-crossing count (the number of times the signal waveform crosses zero per unit time) and the amplitude level of the waveform. More specifically, a threshold for the amplitude level of the input waveform is set in advance according to the zero-crossing count per unit time. The zero-crossing count and amplitude level of the input waveform are then detected, and if the amplitude level exceeds the threshold corresponding to the zero-crossing count, the input signal is judged to be a signal in an utterance section. With this method too, utterance section detection accuracy can be raised by adjusting the threshold to suit the noise environment. That is, treating the threshold as an utterance detection parameter and setting it variably according to the position of the speaker 8 in the ship 1, the engine speed, and so on enables highly accurate utterance section detection.
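The zero-crossing method can be sketched as follows. The mapping from zero-crossing count to amplitude threshold is entirely hypothetical (the text gives no values), as are the function names and the use of the peak absolute sample as the amplitude level. The threshold mapping is passed in as a parameter so that a different one could be selected per location and engine speed.

```python
def count_zero_crossings(frame):
    """Number of sign changes between consecutive samples in the frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))


def example_threshold(crossings):
    """Hypothetical mapping: demand more amplitude when crossings are frequent."""
    return 0.1 if crossings < 10 else 0.3


def is_utterance(frame, threshold_for=example_threshold):
    """True if the peak amplitude exceeds the threshold for this crossing count."""
    level = max(abs(sample) for sample in frame)
    return level > threshold_for(count_zero_crossings(frame))


loud = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]        # made-up frame, peak 0.5
quiet = [0.05, -0.05, 0.05, -0.05, 0.05, -0.05]  # made-up frame, peak 0.05
decisions = (is_utterance(loud), is_utterance(quiet))
```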

Table 5 below shows an example of the correspondence between the utterance detection parameters and the position of the speaker 8 in the ship 1 and the engine speed.

The speech detection parameter switching unit 94 includes, for example, a memory 94A (see FIG. 5) that stores table information such as that shown in Table 5. Based on experiments conducted in various situations, the memory 94A stores in advance, for each situation, the identification information of the speech detection parameter that is optimal in that situation.
In the example of Table 5, the speech detection parameter switching unit 94 identifies the identification information of the corresponding speech detection parameter from the table in the memory 94A, based on the position identification information provided by the IC tag reader 46 and the engine speed received from the ship 1 side via the communication processing unit 76. The speech detection parameter corresponding to this identification information (the optimal pair of parameters α and β) is read from the speech detection parameter buffer 95 and set in the utterance section detection unit 91.
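
The two-stage lookup described above (table in the memory 94A, then the parameter buffer 95) can be sketched as below. The table contents, the parameter identifiers `P1` to `P4`, the (α, β) values, and the 3000 rpm boundary are hypothetical placeholders, since the actual entries of Table 5 are not reproduced here.

```python
from typing import Dict, Tuple

# Stand-in for the table held in memory 94A:
# (position ID, engine speed band) -> parameter ID. All entries are hypothetical.
PARAMETER_TABLE: Dict[Tuple[str, str], str] = {
    ("cockpit", "low"): "P1",
    ("cockpit", "high"): "P2",
    ("out_deck", "low"): "P3",
    ("out_deck", "high"): "P4",
}

# Stand-in for the speech detection parameter buffer 95:
# parameter ID -> (alpha, beta). Values are hypothetical.
PARAMETER_BUFFER: Dict[str, Tuple[float, float]] = {
    "P1": (0.10, 0.50),
    "P2": (0.20, 0.60),
    "P3": (0.15, 0.55),
    "P4": (0.30, 0.70),
}


def engine_speed_band(rpm: float, boundary_rpm: float = 3000.0) -> str:
    """Quantize the engine speed into a band; the boundary is an assumption."""
    return "high" if rpm >= boundary_rpm else "low"


def select_detection_parameters(position_id: str, rpm: float) -> Tuple[float, float]:
    """Look up the parameter ID for the situation, then fetch (alpha, beta)."""
    parameter_id = PARAMETER_TABLE[(position_id, engine_speed_band(rpm))]
    return PARAMETER_BUFFER[parameter_id]


print(select_detection_parameters("cockpit", 1200.0))
print(select_detection_parameters("out_deck", 4500.0))
```

Keeping the identifier table separate from the parameter values mirrors the split between the memory 94A and the buffer 95, so that several situations can share one parameter set.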

Of course, when the speech detection parameters are switched using attribute information and/or environment information of the ship 1 other than the engine speed, table information for switching the speech detection parameters accordingly is prepared.
Next, the principle of speech recognition will be described.
The acoustic analysis unit 92 analyzes the speech signal of the frames in the utterance section and generates a feature vector x representing their acoustic features. The matching unit 93 matches this feature vector x against the acoustic model and the recognition dictionary.

The acoustic model models the acoustic features of speech and serves as reference information for evaluating similarity to a partial or entire feature sequence of the uttered speech. For example, the probability p(x|w) that the feature x is observed from a word (or phoneme) w to be recognized is obtained from the acoustic model.
The recognition dictionary (word dictionary or language model) is information that constrains how acoustic models may be connected. A typical example of such information is the probability that a given word is followed by another word. For example, the probability p(w) that the word (or phoneme) w appears is obtained from the recognition dictionary.

For the feature vector x, the matching unit 93 obtains the most likely word (or phoneme) w′ by the following equation (8) and outputs it as the recognition result.
$$w' = \operatorname*{arg\,max}_{w} \; p(x \mid w)\, p(w) \qquad (8)$$
To improve the speech recognition rate, it is preferable to create the acoustic model in the noise environment in which speech recognition is actually used.
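
Equation (8) can be illustrated with a small sketch. By Bayes' rule, p(w|x) is proportional to p(x|w) p(w) because p(x) does not depend on w, so maximizing the product selects the most probable word. The sketch below works in the log domain, a common choice for numerical stability and an assumption here; the vocabulary and probability values are invented for illustration only.

```python
import math
from typing import Dict


def recognize(acoustic_likelihood: Dict[str, float], prior: Dict[str, float]) -> str:
    """Return w' = argmax_w p(x|w) * p(w), evaluated as a sum of logarithms.

    acoustic_likelihood[w] plays the role of p(x|w) from the acoustic model;
    prior[w] plays the role of p(w) from the recognition dictionary.
    """
    def log_score(word: str) -> float:
        p_x_given_w = acoustic_likelihood[word]
        p_w = prior.get(word, 0.0)
        # Candidates with zero probability can never win the argmax.
        if not (p_x_given_w > 0.0 and p_w > 0.0):
            return float("-inf")
        return math.log(p_x_given_w) + math.log(p_w)

    return max(acoustic_likelihood, key=log_score)


# Hypothetical values: "right" fits the audio slightly better,
# but "left" is more probable according to the dictionary.
likelihood = {"right": 0.020, "left": 0.015, "stop": 0.001}
word_prior = {"right": 0.3, "left": 0.5, "stop": 0.2}

print(recognize(likelihood, word_prior))
```

In this example the prior changes the outcome: the acoustic score alone would favor "right", while the product in equation (8) favors "left".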

Therefore, in this embodiment, a plurality of acoustic models 931 to 93n corresponding to the attribute information and environment information of the ship 1 are created in advance by experiment for a plurality of positions of the speaker 8 in the ship 1, and are stored in the acoustic model buffer 97.
Table 6 below shows an example of the correspondence between the position of the speaker 8 in the ship 1 and the engine speed, on the one hand, and the acoustic model, on the other.

The acoustic model switching unit 96 includes, for example, a memory 96A (see FIG. 5) that stores table information such as that shown in Table 6. The memory 96A stores in advance, for each situation, the identification information of the acoustic model that is optimal in that situation.
In the example of Table 6, the acoustic model switching unit 96 identifies the identification information of the corresponding acoustic model from the table in the memory 96A, based on the position identification information provided by the IC tag reader 46 and the engine speed received from the ship 1 side via the communication processing unit 76. The acoustic model corresponding to this identification information is read from the acoustic model buffer 97 and applied to the matching processing in the matching unit 93.

Of course, when the acoustic model is switched using attribute information and/or environment information of the ship 1 other than the engine speed, table information for switching the acoustic model accordingly is prepared.
Next, the operation of the ship operation data generation unit 75 (see FIG. 4), to which the recognition result data is passed, will be outlined.

When recognition result data is passed from the speech recognition processing unit 74, the ship operation data generation unit 75 generates, if the data is a ship operation command, the corresponding ship operation data and passes it to the communication processing unit 76.
Ship operation commands include, for example, commands for maneuvering the ship (maneuvering commands) and commands for operating accessory equipment (accessory operation commands).

Examples of maneuvering commands are as follows.
"Slowly forward" (ゆっくり前): command to move the ship forward at very low speed
"Forward" (前): command to move the ship forward at low speed
"Back" (うしろ): command to move the ship astern
"Stop" (とまれ): command to stop the ship
"Small right" (小さく右): command to steer to the right at the first steering angle
"Right" (右): command to steer to the right at the second steering angle (greater than the first steering angle)
"Large right" (大きく右): command to steer to the right at the third steering angle (greater than the second steering angle)
"Small left" (小さく左): command to steer to the left at the fourth steering angle
"Left" (左): command to steer to the left at the fifth steering angle (greater than the fourth steering angle)
"Large left" (大きく左): command to steer to the left at the sixth steering angle (greater than the fifth steering angle)
For example, the first and fourth steering angles are one quarter of the maximum steering angle, the second and fifth steering angles are one half of the maximum steering angle, and the third and sixth steering angles are the maximum steering angle.
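
The example steering ratios given above (one quarter, one half, and the full maximum steering angle) can be sketched as a simple lookup from a recognized command to a target angle. The English command keys, the sign convention (positive to the right), and the 30-degree maximum used in the demonstration are assumptions for illustration, not values stated in the embodiment.

```python
from typing import Dict, Optional

# Fraction of the maximum steering angle per steering command.
# Positive means right, negative means left (sign convention is an assumption).
STEERING_FRACTION: Dict[str, float] = {
    "small right": 0.25,   # first steering angle
    "right": 0.5,          # second steering angle
    "large right": 1.0,    # third steering angle
    "small left": -0.25,   # fourth steering angle
    "left": -0.5,          # fifth steering angle
    "large left": -1.0,    # sixth steering angle
}


def steering_angle_deg(command: str, max_angle_deg: float) -> Optional[float]:
    """Return the signed target steering angle, or None when the recognized
    word is not a steering command (for example a throttle command)."""
    fraction = STEERING_FRACTION.get(command)
    if fraction is None:
        return None
    return fraction * max_angle_deg


# Hypothetical maximum steering angle of 30 degrees.
for spoken in ("small right", "left", "large left", "stop"):
    print(spoken, steering_angle_deg(spoken, 30.0))
```

Returning `None` for non-steering words lets a caller fall through to other command categories, such as the accessory operation commands.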

Examples of accessory operation commands are as follows.
"Signal flare" (信号紅炎): command to ignite the signal flare
"Rubber boat" (ゴムボート): command to inflate the rubber boat and lower it onto the water
"Ship telephone" (船舶電話): command to start a call on the ship telephone
Ship operation data representing such ship operation commands is passed to the ship 1 side via the communication processing unit 76 and the communication device 44. On the ship 1 side, the ship operation data is received by the wireless communication unit 16 and passed to the processing/output unit 17. Based on the received ship operation data, the processing/output unit 17 operates the corresponding equipment in the ship 1. In this way, the ship can be operated by voice using the voice instruction device 10.

FIG. 10 is an illustrative view for explaining the configuration of a ship system according to a second embodiment of the present invention. In FIG. 10, functional parts equivalent to those shown in FIGS. 3 and 4 are given the same reference numerals.
In this embodiment, an arithmetic processing unit 125 for speech recognition processing and ship operation data generation is provided on the ship 1A side and is connected to the inboard LAN 11. The arithmetic processing unit 125 includes the noise removal processing unit 73, the speech recognition processing unit 74, the ship operation data generation unit 75, the noise removal algorithm buffer 77, and the noise removal algorithm switching unit 78, as well as a communication interface unit 126 for communication with the inboard LAN 11.

On the other hand, the arithmetic processing unit 45A on the voice instruction device 10A side includes the A/D conversion units 71 and 72 and the communication processing unit 76. The A/D conversion units 71 and 72 and the IC tag reader 46 are connected to the communication processing unit 76.
The communication processing unit 76 executes processing for wirelessly transmitting, to the ship 1A side via the communication device 44, the speech signals input from the microphones 41 and 42 and converted into digital signals by the A/D conversion units 71 and 72. The communication processing unit 76 also executes processing for wirelessly transmitting the position identification information read by the IC tag reader 46 to the ship 1A side via the communication device 44.

The arithmetic processing unit 125 of the ship 1A receives the speech signal from the voice instruction device 10A side via the wireless communication unit 16 and the inboard LAN 11 and executes speech recognition processing on it. The arithmetic processing unit 125 also obtains the position identification information from the voice instruction device 10A in the same way as the speech signal and supplies it to the noise removal algorithm switching unit 78 and the speech recognition processing unit 74. Further, the arithmetic processing unit 125 acquires the attribute information and environment information of the ship 1A via the communication interface unit 126 and supplies them to the noise removal algorithm switching unit 78 and the speech recognition processing unit 74.

With this configuration as well, the speaker 8 can give instructions for ship operation by voice. Because the parameters of the noise removal processing and the speech recognition processing are switched according to the position of the speaker 8 in the ship 1A and the attribute information and environment information of the ship 1, a high speech recognition rate is obtained.
Two embodiments of the present invention have been described above, but the present invention can also be implemented in other forms. For example, in the above embodiments, the speech signal supplied from the utterance section detection unit 91 to the acoustic analysis unit 92 is a one-channel speech signal corresponding to the first microphone 41, and the acoustic analysis processing is performed on this one-channel signal. To further increase the speech recognition rate, however, it is preferable to perform the acoustic analysis processing using the speech signals of both channels, that is, those of the first and second microphones 41 and 42.

To use such channel-synchronized addition speech, for example, in the configuration of FIG. 7, the frequency functions S₁(ω) and S₂(ω) may be supplied to the acoustic analysis unit 92 via the speech signal input on/off control unit 109. The acoustic analysis unit 92 then obtains the channel-synchronized addition speech as described above and performs acoustic analysis on it.
In the above embodiments, examples were described in which the noise removal algorithm, the speech detection parameter, and the acoustic model are all switched according to, among other things, the position of the speaker 8 in the ship 1. However, a configuration that switches only one or two of these three kinds of parameters is also possible. For example, only the noise removal algorithm may be switched while the parameters of the speech recognition processing are fixed. Alternatively, the noise removal algorithm may be fixed while the parameters of the speech recognition processing are varied. Further, within the speech recognition processing, one of the speech detection parameter and the acoustic model may be fixed while the other is varied.

Furthermore, a noise removal device that performs noise removal processing can be configured as speech signal processing that does not involve speech recognition processing. In this case, the speech recognition processing unit 74 and the ship operation data generation unit 75 may simply be omitted from the configuration of FIG. 4.
In the configuration of the first embodiment described above, the ship operation data generation unit 75 may instead be provided on the ship 1 side, with the recognition result data transmitted to the ship 1 side via the communication processing unit 76.

Furthermore, in the first and second embodiments described above, the position of the speaker 8 is specified by reading, with the IC tag reader 46, the position identification information of the IC tags 61 to 6M arranged at a plurality of locations on the ship 1. However, the position of the speaker 8 can also be specified by other methods. For example, the voice instruction device 10 may be provided with a key input unit through which the speaker 8 enters the position identification information. Alternatively, transmitting devices that transmit position identification information (for example, by an optical signal) in response to a user operation may be provided at a plurality of locations on the ship 1, and the position of the speaker 8 may be specified by having the voice instruction device 10 receive the position identification information sent from such a transmitting device. In this case, the transmitting device serves as the identification information carrier, and the receiving device provided on the voice instruction device 10 side serves as the reading device that reads the position identification information.

In the first and second embodiments described above, information is exchanged between the voice instruction device 10 and the inboard LAN 11 by wireless communication, but this communication may instead be wired communication over a communication cable. In this case, connection outlets may be provided at a plurality of locations on the ship 1, and the voice instruction device 10 may be connected to any one of them. Further, in this case, the position of the speaker 8 may be specified by determining which connection outlet the voice instruction device 10 is connected to.

It is also conceivable to modify the configuration of the second embodiment so that a plurality of microphones for inputting the voice of the speaker 8 are arranged at a plurality of locations in the ship 1. In this case, the speaker 8 carries an IC tag holding predetermined authentication information, and a reading device is placed near each microphone installation location. In use, the speaker 8 holds the IC tag over the reading device so that the authentication information recorded on the IC tag is read. The read authentication information is sent to the arithmetic processing unit 125 via the inboard LAN 11, and the position of the speaker 8 is specified according to which location's reading device supplied the authentication information. Based on the position information thus specified, the arithmetic processing unit 125 switches the noise removal algorithm and the parameters applied to the speech recognition processing.

Furthermore, the above embodiments describe a configuration in which the attribute information and environment information of the ship 1 are taken into the voice instruction device 10 through communication with the inboard LAN 11. However, the voice instruction device 10 may, for example, be provided with a key input unit, and the attribute information and environment information may be entered by operating that key input unit.
In the above embodiments, the environment information is detected by the environmental information sensor 9. However, the wireless communication unit 16 may, for example, communicate with an information center to acquire environment information such as the wind speed and wave height in the vicinity of the ship 1.

In the above embodiments, a ship is taken as an example of the transport equipment, but the present invention is suitably applicable to large transport equipment inside which occupants can move about, such as trains and aircraft.
Furthermore, although the above embodiments describe the case where the voice of the speaker 8 is the target sound, the present invention may also be applied, for example, to a sound processing device whose target sound is a specific mechanical sound generated inside the transport equipment (for example, a device that performs self-diagnosis of the transport equipment based on a specific mechanical sound).

In the above embodiments, an example was described in which the voice instruction device 10 is formed in the shape of a transceiver and the microphones 41 and 42 are attached to its housing. However, to keep the positional relationship between the utterance position (mouth) of the speaker 8 and the microphones 41 and 42 constant and thereby improve the speech recognition rate, it is preferable to use a headset microphone. The microphones may also be attached to a life jacket that the ship's occupants are required or recommended to wear. In this case, two microphones may be attached to the left and right shoulders of the life jacket, or a stereo microphone may be used.

Of course, a two-channel sound reception system is not always necessary; depending on the type of utterance section detection processing, acoustic analysis processing, and so on, a one-channel sound reception system may suffice.
In addition, although the above embodiments describe a device that processes the speech signals from the microphones 41 and 42, the present invention can also be applied to the processing of a speech signal obtained by playing back, on a playback device, sound recorded in advance by a recording device.

In addition to the above, various design changes can be made within the scope of the matters described in the claims.

FIG. 1 is an illustrative view showing the conceptual configuration of a ship system as an example of a transport equipment system according to one embodiment of the present invention.
FIG. 2 is a schematic plan view of the ship constituting the ship system.
FIG. 3 is a block diagram showing mainly the electrical configuration inside the ship.
FIG. 4 is a block diagram for explaining the electrical configuration of the voice instruction device.
FIG. 5 is a block diagram for explaining the functional configuration of the speech recognition processing unit.
FIG. 6 is a block diagram showing a configuration example of the noise removal processing unit.
FIG. 7 is a block diagram for explaining a configuration example of the utterance section detection unit.
FIG. 8(a) shows an example of how the phase of the cross spectrum varies with frequency for a frame in an utterance section, and FIG. 8(b) shows an example of how the phase of the cross spectrum varies with frequency for a frame in a non-utterance section.
FIG. 9(a) is a histogram of the phase slope obtained for frames in an utterance section, and FIG. 9(b) is a histogram of the phase slope obtained for frames in a non-utterance section.
FIG. 10 is an illustrative view for explaining the configuration of a ship system according to the second embodiment of the present invention.

Explanation of symbols

1, 1A Ship
2 Outboard motor
3 Steering device
4 Signal flare device
5 Ship telephone device
61 to 6M IC tags
7 Rubber boat operating device
8 Speaker
9 Environmental information sensor
10, 10A Voice instruction device
11 Inboard LAN
12 Database device
13 Steering unit
14 Shift/throttle operation unit
16 Wireless communication unit
17 Processing/output unit
18 Steering wheel
20 Sensors such as an engine speed sensor
21 Engine
22 Engine control unit
41 First microphone
42 Second microphone
44 Communication device
45, 45A Arithmetic processing unit
46 IC tag reader
51 Hull
52 Superstructure
53 Out deck
54 Cabin
55 Cockpit
56 Flying deck
73 Noise removal processing unit
731 to 73k Noise removal algorithms
74 Speech recognition processing unit
75 Ship operation data generation unit
76 Communication processing unit
77 Noise removal algorithm buffer
78 Noise removal algorithm switching unit
81, 82 High-frequency emphasis filter units
83, 84 Spectral subtraction filter units
85, 86 Framing units
87, 88 Frequency analysis units
89, 90 Preprocessing units
91 Utterance section detection unit
911 to 91m Utterance section detection parameters
92 Acoustic analysis unit
93 Matching unit
931 to 93n Acoustic models
94 Speech detection parameter switching unit
95 Speech detection parameter buffer
96 Acoustic model switching unit
97 Acoustic model buffer
98 Recognition dictionary
100 Cross spectrum calculation unit
107 Histogram calculation unit
108 Utterance section determination unit
109 Speech signal input on/off control unit
125 Arithmetic processing unit
126 Communication interface unit

Claims

1. A target sound processing device for processing a predetermined target sound generated in transport equipment, comprising:
generation position specifying means for specifying the position in the transport equipment at which the target sound is generated;
sound signal processing means that receives a sound signal, which is an electrical signal of sound including the target sound, executes sound signal processing on the sound signal, and is capable of changing parameters of the sound signal processing; and
parameter switching means for switching the parameters of the sound signal processing in the sound signal processing means in accordance with the generation position specified by the generation position specifying means.

2. The target sound processing device according to claim 1, further comprising transport equipment information acquisition means for acquiring transport equipment information including at least one of attribute information of the transport equipment and environment information of the transport equipment,
wherein the parameter switching means switches the parameters of the sound signal processing in the sound signal processing means based on the transport equipment information acquired by the transport equipment information acquisition means, in addition to the generation position specified by the generation position specifying means.

3. The target sound processing device according to claim 2, wherein the transport equipment includes a database storing attribute information of the transport equipment, and
the transport equipment information acquisition means includes attribute information acquisition means for acquiring the attribute information of the transport equipment from the database.

4. The target sound processing device according to claim 2, wherein the transport equipment includes transport equipment information detection means for detecting the transport equipment information, and
the transport equipment information acquisition means acquires, from the transport equipment, the transport equipment information detected by the transport equipment information detection means.

5. The target sound processing device according to claim 1, wherein the sound signal processing means includes noise removal processing means for performing noise removal processing that removes a noise component from the input sound signal, and
the parameter switching means includes noise removal parameter switching means for switching a noise removal parameter applied to the noise removal processing performed by the noise removal processing means.

6. The target sound processing device according to claim 1, wherein the sound signal processing means includes target sound extraction means for performing target sound section extraction processing that extracts a signal section including the target sound from the input sound signal, and
the parameter switching means includes target sound extraction parameter switching means for switching a target sound extraction parameter applied to the target sound section extraction processing executed by the target sound extraction means.

7. The target sound processing device according to claim 1, wherein the sound signal processing means has a plurality of types of acoustic models that can be matched against the sound signal of the target sound, and
the parameter switching means includes acoustic model switching means for switching among the plurality of types of acoustic models.

8. The target sound processing device according to claim 1, wherein the transport equipment includes a plurality of identification information carriers that are arranged at a plurality of different positions in the transport equipment and each carry position identification information, and
the generation position specifying means includes a reading device that reads the position identification information carried by the identification information carriers.

9. The target sound processing device according to claim 1, wherein the transport equipment is a ship, and
the generation position specifying means specifies the position in the ship at which the target sound is generated.

10. The target sound processing device according to claim 1, wherein the target sound is a voice uttered by a speaker in the transport equipment, and
the sound signal processing means includes speech recognition processing means for executing speech recognition processing that recognizes the voice of the speaker.

11. A transport equipment system comprising:
transport equipment; and
the target sound processing device according to claim 1.

12. A transport equipment system comprising:
transport equipment;
the target sound processing device according to claim 10;
transport equipment operation data generation means for identifying, based on the speech recognition result of the speech recognition processing means, an instruction given by the speaker for operating the transport equipment, and for generating transport equipment operation data corresponding to the instruction; and
equipment control means for operating equipment provided in the transport equipment based on the transport equipment operation data generated by the transport equipment operation data generation means.

13. A target sound processing method for processing a predetermined target sound generated in transport equipment, comprising:
a step of specifying the position in the transport equipment at which the target sound is generated;
a step of performing sound signal processing on a sound signal that is an electrical signal of sound including the target sound; and
a step of switching a parameter applied to the sound signal processing in accordance with the specified generation position of the target sound.

14. The target sound processing method according to claim 13, further comprising a step of acquiring transport equipment information including at least one of attribute information of the transport equipment and environment information of the transport equipment,
wherein the step of switching the parameter includes a step of switching the parameter applied to the sound signal processing based on the acquired transport equipment information, in addition to the specified generation position of the target sound.