JP2021105684A

JP2021105684A - Speech-in-noise recognition device and speech-in-noise recognition system

Info

Publication number: JP2021105684A
Application number: JP2019237523A
Authority: JP
Inventors: Shuichi Tamura (田村 修一)
Original assignee: Toyota Motor Kyushu Inc
Current assignee: Toyota Motor Kyushu Inc
Priority date: 2019-12-26
Filing date: 2019-12-26
Publication date: 2021-07-26

Abstract

PROBLEM TO BE SOLVED: To provide a speech-in-noise recognition device and a speech-in-noise recognition system that extract spoken speech from the noise of a factory production line and store its waveform data in a storage region of the speech-in-noise recognition device and a storage region of a server. SOLUTION: A speech-in-noise recognition device consists of: speech data input means; means which segments a speech waveform input from the speech data input means; means which removes noise from the segmented speech waveform; means which extracts feature quantities classified into specific frequency bands from the noise-removed speech waveform; means which generates speech waveform data based on the extracted feature quantities; storage means which saves the generated speech waveform data in a storage region; and a server which is connected to a LAN and saves the speech waveform data. SELECTED DRAWING: Figure 1

Description

The present invention relates to a speech-in-noise recognition device and a speech-in-noise recognition system that, for example, recognize measurement data read aloud in a noisy factory environment and store that data.

Conventionally, when measuring an object such as an automobile part, a measurement worker took the measurement, read the value from the measuring instrument, and spoke it aloud to a recording worker. The recording worker listened, wrote the value down by hand, and later returned to the office and entered it into a personal computer, where the measurement data was saved and managed.

However, the conventional measurement, recording, and computer-entry work requires two or more workers, and errors can arise when the measurement data is misheard, when it is written down by hand, and when it is entered into the personal computer.

To solve these problems, the following techniques have been disclosed. They replace the conventional handwriting work by automatically recognizing the measurement data spoken by the measurer and saving it as electronic data on a personal computer or the like.

Patent Document 1 describes a speech recognition device configured so that defects found on a factory production line can be entered by voice. Specifically, it discloses a speech recognition device comprising: speech input means; processing means that extracts a predetermined characteristic portion from a series of speech input from the speech input means and divides the series of speech before and after that portion; and recognition means that recognizes speech information by collating each divided piece of speech against a speech database, wherein the characteristic portion consists of letters and numerals.

Patent Document 2 describes optimal speech recognition processing. Specifically, a feature quantity is extracted from the input signal according to the surrounding environment and usage conditions, a recognition vocabulary is selected according to that feature quantity, and a grammar is likewise selected according to it. The document discloses a configuration that performs speech recognition by pattern matching using the selected vocabulary and grammar.

Patent Document 1: Japanese Unexamined Patent Publication No. 2003-223184
Patent Document 2: Japanese Unexamined Patent Publication No. 2004-004182

However, a factory production line is very noisy, so the worker's utterance must be extracted from audio in which the spoken measurement data and the factory noise are mixed, and then recorded as measurement data. Neither Patent Document 1 nor Patent Document 2 discloses a technique for extracting a human utterance from audio in which speech and noise are mixed and recognizing the measurement data, so neither can solve this problem.

The present invention has been made to solve this problem. Its object is to provide a speech-in-noise recognition device and a speech-in-noise recognition system that extract the human utterance from audio in which a measurement worker's spoken measurement data is mixed with factory noise, and store the extracted speech data in a storage region of the speech-in-noise recognition device or of a server.

The speech-in-noise recognition device of the present invention comprises: speech data input means; means for segmenting the speech waveform input from the speech data input means; means for removing noise from the segmented speech waveform; means for generating speech waveform data from the noise-removed speech waveform; and storage means for saving the generated speech waveform data in a storage region.

The means for generating speech waveform data from the noise-removed speech waveform comprises: means for extracting, from the noise-removed speech waveform, feature quantities classified into specific frequency bands; speech master storage means that stores in advance speech masters holding the feature quantities of syllables; means for collating the extracted feature quantities with the stored speech master feature quantities and selecting the speech master closest to the extracted feature quantities; and means for using the selected speech master as the speech waveform data.
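As a non-limiting illustration, the collation of extracted feature quantities against the stored speech masters can be sketched as a nearest-neighbour lookup. The description does not specify the distance measure or the data layout, so the Euclidean distance, the three-band feature vectors, the syllable labels, and the names `SPEECH_MASTERS` and `closest_master` are all assumptions made for this sketch.

```python
import math

# Hypothetical speech masters: syllable label -> feature quantity per frequency band.
# The labels and values are illustrative only.
SPEECH_MASTERS = {
    "ichi": [0.9, 0.4, 0.1],
    "ni": [0.2, 0.8, 0.3],
    "san": [0.5, 0.5, 0.9],
}


def closest_master(features, masters=SPEECH_MASTERS):
    """Return the label of the master whose feature vector is nearest.

    Euclidean distance is an assumption; the description only says
    "closest" without defining the measure.
    """
    return min(masters, key=lambda label: math.dist(features, masters[label]))
```

For example, `closest_master([0.85, 0.45, 0.15])` selects `"ichi"` under these illustrative values.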

The means for generating speech waveform data from the noise-removed speech waveform comprises: waveform grid master storage means that stores in advance waveform grid masters for the speech waveforms of syllables; means for arranging the segmented speech waveform on a grid table, collating the resulting waveform grid pattern with the waveform grid masters, and selecting the waveform grid master closest to the waveform grid pattern; and means for using the selected waveform grid master as the speech waveform data.
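One possible reading of the grid-table collation is sketched below: the waveform is rasterized onto a small grid of cells (time along the columns, amplitude along the rows), and the resulting pattern is compared cell by cell with each stored master. The grid size, the 0/1 cell encoding, the matching-cell score, and the function names are assumptions for illustration; the description does not define them.

```python
def to_grid(samples, rows=4, cols=4):
    """Rasterize samples in [-1.0, 1.0] onto a rows x cols grid of 0/1 cells."""
    grid = [[0] * cols for _ in range(rows)]
    n = len(samples)
    for i, s in enumerate(samples):
        col = min(i * cols // n, cols - 1)
        # Map amplitude -1.0..1.0 onto row 0..rows-1, clamping out-of-range values.
        row = max(0, min(int((s + 1.0) / 2.0 * rows), rows - 1))
        grid[row][col] = 1
    return grid


def grid_score(a, b):
    """Count the cells that hold the same value in two equally sized grids."""
    return sum(x == y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def closest_grid_master(samples, masters, rows=4, cols=4):
    """Return the label of the grid master that best matches the waveform."""
    pattern = to_grid(samples, rows, cols)
    return max(masters, key=lambda label: grid_score(pattern, masters[label]))
```

In this sketch the masters are themselves grids, so they can be prepared by passing reference waveforms through `to_grid`.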

The means for generating speech waveform data from the noise-removed speech waveform comprises: means for extracting, from the noise-removed speech waveform, feature quantities classified into specific frequency bands; speech master storage means that stores in advance speech masters holding the feature quantities of syllables; means for collating the extracted feature quantities with the stored speech master feature quantities and selecting the speech master closest to the extracted feature quantities; means for using the selected speech master as the speech waveform data; and, for the case where a plurality of speech masters hold feature quantities close to the extracted feature quantities, waveform grid master storage means that stores in advance waveform grid masters for the speech waveforms of syllables, means for arranging the segmented speech waveform on a grid table, collating the resulting waveform grid pattern with the waveform grid masters, and selecting the waveform grid master closest to the waveform grid pattern, and means for using the selected waveform grid master as the speech waveform data.
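The two-stage decision in this configuration can be sketched as follows: feature matching is tried first, and the grid comparison is consulted only when more than one speech master is close. How "close" is judged is not stated in the description, so the `tolerance` parameter, the Euclidean distance, the matching-cell score, and all names here are assumptions for illustration.

```python
import math


def candidates_by_features(features, feature_masters, tolerance):
    """Return the labels whose distance is within `tolerance` of the best match."""
    dists = {label: math.dist(features, f) for label, f in feature_masters.items()}
    best = min(dists.values())
    return [label for label, d in dists.items() if d - best <= tolerance]


def recognize(features, pattern, feature_masters, grid_masters, tolerance=0.05):
    """Pick a syllable by features, falling back to the grid when ambiguous."""
    close = candidates_by_features(features, feature_masters, tolerance)
    if len(close) == 1:
        return close[0]  # feature matching alone was decisive

    def score(label):
        # Number of grid cells that agree with the observed pattern.
        master = grid_masters[label]
        return sum(x == y for ra, rb in zip(pattern, master) for x, y in zip(ra, rb))

    # Several feature candidates: decide among them by grid similarity.
    return max(close, key=score)
```

Restricting the grid comparison to the ambiguous candidates keeps the common case fast, which is a design choice of this sketch rather than something the description states.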

The speech-in-noise recognition system of the present invention comprises: the speech-in-noise recognition device; a speech input device that inputs a speech signal to the speech-in-noise recognition device; and a server that receives the speech waveform data, which the speech-in-noise recognition device transmits from its storage means through its communication means, and saves that data in a storage region.

According to the invention of claim 1, the speech-in-noise recognition device comprises speech data input means, means for segmenting the speech waveform input from the speech data input means, means for removing noise from the segmented speech waveform, means for generating speech waveform data from the noise-removed speech waveform, and storage means for saving the generated speech waveform data in a storage region. It can therefore be implemented in software on hardware such as a tablet terminal. Such a device is easy to carry, so a single worker can take it to a work site in the factory, take measurements, and record the data, which makes it highly convenient.

In addition, the device extracts the human utterance from a speech waveform in which speech and noise are mixed, recognizes the measurement data, and saves the waveform data in the storage region of the tablet terminal. Speech recognition can thus be performed in the noisy factory environment, and the measurement data can be saved and managed as electronic data.

Furthermore, work conventionally done by two people can be done by one, and transcription and input errors in the measurement data can be prevented.

According to the invention of claim 2, the means for generating speech waveform data from the noise-removed speech waveform comprises: means for extracting, from the noise-removed speech waveform, feature quantities classified into specific frequency bands; speech master storage means that stores in advance speech masters holding the feature quantities of syllables; means for collating the extracted feature quantities with the stored speech master feature quantities and selecting the speech master closest to the extracted feature quantities; and means for using the selected speech master as the speech waveform data. The closest speech master can therefore be selected simply by collating the extracted feature quantities with the speech masters held in the speech master storage means, so the configuration is simple and the processing is fast.

According to the invention of claim 3, the means for generating speech waveform data from the noise-removed speech waveform comprises: waveform grid master storage means that stores in advance waveform grid masters for the speech waveforms of syllables; means for arranging the segmented speech waveform on a grid table, collating the resulting waveform grid pattern with the waveform grid masters, and selecting the waveform grid master closest to the waveform grid pattern; and means for using the selected waveform grid master as the speech waveform data. The configuration is therefore simple and easy to understand, and the collation can easily be fine-tuned by changing the data stored in the waveform masters.

According to the invention of claim 4, the means for generating speech waveform data from the noise-removed speech waveform comprises: means for extracting, from the noise-removed speech waveform, feature quantities classified into specific frequency bands; speech master storage means that stores in advance speech masters holding the feature quantities of syllables; means for collating the extracted feature quantities with the stored speech master feature quantities and selecting the speech master closest to the extracted feature quantities; means for using the selected speech master as the speech waveform data; and, for the case where a plurality of speech masters hold feature quantities close to the extracted feature quantities, waveform grid master storage means that stores in advance waveform grid masters for the speech waveforms of syllables, means for arranging the segmented speech waveform on a grid table, collating the resulting waveform grid pattern with the waveform grid masters, and selecting the waveform grid master closest to the waveform grid pattern, and means for using the selected waveform grid master as the speech waveform data. Therefore, even when collating the extracted feature quantities with the speech masters in the speech master storage means does not single out one speech master, the segmented speech waveform can be arranged on the grid table and collated with the waveform grid masters to select the waveform grid master closest to the arranged waveform grid pattern, so the speech waveform data can be generated more accurately.

According to the invention of claim 5, the speech-in-noise recognition system comprises the speech-in-noise recognition device, a speech input device that inputs a speech signal to the speech-in-noise recognition device, and a server that receives the speech waveform data, which the speech-in-noise recognition device transmits from its storage means through its communication means, and saves that data in a storage region. Measurement data from the tests and inspections carried out daily on the factory floor can therefore be collected on the server in real time and analyzed, without any handwriting or computer-entry work. This saves labor in measurement work and speeds up data processing on a noisy factory floor.

FIG. 1 is a configuration diagram of the speech-in-noise recognition system according to the present invention.
FIG. 2 is a block diagram of the speech-in-noise recognition device according to the present invention.
FIG. 3 is a processing flowchart of the speech-in-noise recognition device and the speech-in-noise recognition system according to the present invention.
FIG. 4 is an explanatory diagram of the waveform segmentation processing.
FIG. 5 is an explanatory diagram of processing waveform segmentation and waveform recognition in parallel.
FIG. 6 is an explanatory diagram of the collation of the extracted feature quantities with the speech masters.
FIG. 7 is an explanatory diagram of the collation of the waveform grid pattern arranged on the grid table with the waveform grid masters.

The gist of the present invention is as follows. A speech waveform is segmented out of sound in which a human voice and noise are mixed, and noise is removed from the segmented waveform. Speech waveform data is then generated in one or both of two ways: by extracting feature quantities classified into specific frequency bands from the noise-removed waveform and generating the data from those feature quantities, and by collating a waveform grid pattern, obtained by arranging the speech waveform on a grid table, with waveform grid masters and generating the data from the collation result. The generated speech waveform data is stored in a storage unit, and is also collected over a communication line such as an in-factory LAN (Local Area Network) and saved in a storage region of a server so that it can be analyzed. This saves labor in measurement work and speeds up data processing on a noisy factory floor.

Embodiments of the present invention are described below with reference to the drawings. The drawings are schematic, and the arrangement and dimensional ratios of the parts do not necessarily match the actual ones.

FIG. 1 is a configuration diagram of the speech-in-noise recognition system 100 according to the present invention. As shown in the figure, the speech-in-noise recognition system 100 according to the embodiment comprises: the speech-in-noise recognition device 1; a microphone 5 that inputs a speech signal to the speech-in-noise recognition device 1; a wireless router 6 that sends and receives the speech waveform data transmitted by the communication unit 16 of the speech-in-noise recognition device 1; a server 8 that receives the speech waveform data forwarded by the wireless router 6 and saves it in a storage region; and a LAN 7 that connects the wireless router 6 and the server 8.

The speech-in-noise recognition device 1 is the core component of the speech-in-noise recognition system 100 according to the present invention and is configured as shown in the block diagram of FIG. 2. The CPU (Central Processing Unit) 10 is a processor that controls the overall operation of the speech-in-noise recognition device 1, and it controls each part of the device through the system controller 11. The CPU 10 loads the operating system and the various application programs written in advance in the program storage unit 17 into the RAM (Random Access Memory) 18 and executes processing according to the loaded programs, thereby realizing the control functions of each part of the speech-in-noise recognition device 1.

The CPU 10 also extracts feature quantities from the input speech data by arithmetic processing, generates speech waveform data from the extracted feature quantities by arithmetic processing, and stores the generated speech waveform data in the speech waveform data storage unit 22. The data is then transmitted to the server 8 via the communication unit 16, the wireless router 6, and the LAN 7, and saved in the storage unit 81 of the server 8.

The program storage unit 17 is a non-volatile memory. The operating system that the CPU 10 uses for control operations, the application program that performs the speech recognition processing, and the various data needed to run the programs are written to it in advance. Its contents can also be rewritten, for example when an application program is upgraded.

The RAM 18 is the main memory of the device. The operating system written in advance in the program storage unit 17 and the various application programs are loaded into it. The CPU 10 executes the speech processing program, an application program loaded into the RAM 18. The RAM 18 is also used as temporary storage for data produced during arithmetic processing.

The system controller 11 controls access to the program storage unit 17 and the RAM 18. It also controls the graphic controller 12, the touch panel controller 14, and the communication unit 16 that exchanges data with the server 8.
The system controller 11 also receives the measurement worker's speech signal input to the speech input unit 23 and the measurement worker's operation information accepted by the operation unit 25, and it outputs audio signals from the speaker unit 24.

The system controller 11 also controls reading from and writing to the speech master storage unit 20 and the waveform grid master storage unit 21, which the CPU 10 needs for its arithmetic processing, and the speech waveform data storage unit 22, which stores the results of that processing.

Specifically, the speech input unit 23 converts the speech picked up by the microphone 5, together with the factory noise mixed with it, into an electrical signal. Its built-in A/D conversion circuit (analog-to-digital converter) converts that signal into a digital signal that the CPU 10 and the system controller 11 can process, and the resulting data is output to the system controller 11 as a speech waveform.

The speaker unit 24 announces to the measurement worker, by voice, the result of the CPU 10 processing what the worker spoke. When the speech could not be identified or an utterance error occurred, it also notifies the measurement worker and prompts the worker to speak again.

The graphic controller (graphics controller) 12 is an image display controller that controls the images shown on the display 13.

The display 13 shows the measurement worker, as text on its screen, the content that the CPU 10 obtained by processing the worker's speech input. In addition to the on-screen display, the speaker unit 24 can announce the same content by voice. When the measurement worker's utterance could not be identified or an input error occurred, a message to that effect is shown on the screen to notify the worker and prompt the worker to speak again.
The display 13 can also be configured to show the work procedure to the measurement worker and thus serve as a work instruction sheet.

The display 13 is, for example, an LCD (Liquid Crystal Display) or an organic EL (electroluminescence) display. It may be monochrome, but a color screen is preferable.

The touch panel controller 14 controls the input of operation signals from the touch panel 15 arranged on the screen of the display 13. It reads from the touch panel 15 the coordinates on the operation screen that correspond to the position touched by the measurement worker and outputs them to the system controller 11.

When a measurement value has been entered by voice and the content shown on the display 13 is correct, the measurement worker moves on to the next measurement and speaks its value. If the content shown on the display 13 is incorrect, the measurement worker cancels it by operating the touch panel 15 and enters the value by voice again.
The CPU 10 performs processing based on the operation signals of the touch panel 15 received from the system controller 11.

The speech-in-noise recognition device 1 is configured as described above.
Next, the arithmetic processing that the speech-in-noise recognition device 1 applies to an input speech signal is described in detail.

FIG. 3 is a processing flowchart of the speech-in-noise recognition device 1 and the speech-in-noise recognition system 100 according to the present invention. The description below follows this flowchart.

The speech-in-noise recognition system 100 according to the present invention takes, as speech input, measurement data from the tests and inspections carried out daily on the factory floor. It segments the speech waveform 41 out of the input speech data at a predetermined time width and removes the data corresponding to noise from the segmented speech waveforms 42. It then generates speech waveform data in one or both of two ways: by extracting feature quantities 63 classified into specific frequency bands from the noise-removed speech waveform 61 and collating them with the speech masters 64, and by collating the waveform grid pattern 71, obtained by arranging the speech waveform on a grid table, with the waveform grid masters 73. The generated speech waveform data is stored in the storage unit and is also collected on the server 8 over a communication line such as the in-factory LAN 7 so that it can be analyzed.

As a specific application, the system can be used, for example, to measure the film thickness of the anti-chipping coating on an object to be measured 3, such as an iron tank that is an automobile part. Anti-chipping coating is a coating with improved chipping resistance that is applied to the underside of the floor of an automobile body, the inside of the wheel housings, and similar areas to protect the paint film from damage by flying stones and the like.

With the above as background, the processing steps are described next. First, the measurement worker uses the film thickness gauge 4 to measure the film thickness at predetermined locations on the object to be measured 3 according to a predetermined work procedure, and speaks the measured value aloud on the spot. The speech input unit 23 of the speech-in-noise recognition device 1 passes the audio data received through the microphone 5, in which the measurement worker's utterance and the factory noise are mixed, to the CPU 10 via the system controller 11 (S101).

In order to remove only the noise component from the voice information data in which the measurement worker's utterance and the factory noise are mixed, the CPU 10 uses a "waveform cutting thread" to cut the input voice waveform 41 into voice waveforms 41a to 41d, and so on, at the intervals shown in FIG. 4(a). The cutting width is 0.3 msec to 0.7 msec.
As shown in FIG. 4(b), the cut-out voice waveforms 42a to 42c, and so on, are stored in a pool area on the RAM 18. Here, a "thread" is a unit in which an application is processed (S102).
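The cutting step (S102) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `cut_waveform` and `fill_pool`, the use of a `deque` as the pool area, and the sample rate passed in by the caller are all assumptions introduced here.

```python
from collections import deque


def cut_waveform(samples, sample_rate_hz, width_msec):
    """Split samples into consecutive segments, each width_msec long."""
    # Samples per segment; at least one so the slicing loop always advances.
    seg_len = max(1, round(sample_rate_hz * width_msec / 1000.0))
    return [samples[i:i + seg_len] for i in range(0, len(samples), seg_len)]


def fill_pool(samples, sample_rate_hz, width_msec):
    """Store the cut-out segments in a FIFO pool, oldest segment first."""
    pool = deque()  # hypothetical stand-in for the pool area on the RAM 18
    for segment in cut_waveform(samples, sample_rate_hz, width_msec):
        pool.append(segment)
    return pool
```

With a 48 kHz sample rate and a 0.5 msec width, for instance, each segment holds 24 samples, and a final shorter segment carries any remainder.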

Next, a "waveform recognition processing thread" performs voice recognition processing on the cut-out voice waveforms 42a to 42c, and so on. The waveform recognition processing thread constantly monitors the pool area and processes the stored voice waveforms first-in, first-out, that is, it takes them out in the order in which they were stored.

Here, voice recognition processing means sorting the cut-out voice waveforms 42a to 42c, and so on, stored in the pool area into waveforms that contain voice and waveforms that are mere noise containing no voice. The measurement worker does not speak continuously; normally the worker utters a measured value only when a measurement is taken. The CPU 10, on the other hand, constantly receives voice information in which the worker's voice and the factory noise are mixed, so most of it is factory noise that contains none of the worker's voice. The "waveform recognition processing thread" therefore discards the voice waveforms that contain only factory noise.

The processing of the "waveform recognition processing thread" and that of the "waveform cutting thread" can be executed either in series or in parallel, as shown in FIGS. 5(a) and 5(b). In this embodiment they are executed in parallel as shown in FIG. 5(b) rather than in series as shown in FIG. 5(a), which makes processing about 32% faster in theory than serial processing (S103).
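The parallel arrangement of the two threads can be sketched as a producer and a consumer sharing a FIFO queue. This is only an illustration under assumptions: `run_pipeline`, the `DONE` sentinel, and the caller-supplied `contains_voice` predicate are hypothetical, and the source does not say how a segment is judged to contain voice at this step. The sketch makes no claim about the speed-up quoted above.

```python
import queue
import threading


def run_pipeline(segments, contains_voice):
    """Cut and recognize concurrently; return the segments judged to contain voice."""
    pool = queue.Queue()  # FIFO: segments come out in the order they went in
    kept = []
    DONE = object()       # sentinel telling the recognizer that cutting has finished

    def cutting_thread():
        for segment in segments:
            pool.put(segment)
        pool.put(DONE)

    def recognition_thread():
        while True:
            segment = pool.get()  # blocks until a segment is available
            if segment is DONE:
                break
            if contains_voice(segment):
                kept.append(segment)
            # noise-only segments are simply dropped

    threads = [threading.Thread(target=cutting_thread),
               threading.Thread(target=recognition_thread)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return kept
```

Because there is a single consumer reading from a FIFO queue, the kept segments preserve their original order.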

Next, noise is further removed from the voice waveforms processed by the waveform recognition processing thread. Specifically, a waveform is judged to be noise in the following cases.
(1) A voice waveform with a large amplitude is judged to be noise and discarded.
(2) A voice waveform whose frequency falls outside the human voice range of 450 Hz to 1050 Hz is judged to be noise and discarded (S104).
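The two rejection rules of S104 can be sketched as below. The source does not quantify "large amplitude" or say how the frequency of a segment is measured, so the `max_amplitude` threshold and the use of the strongest bin of a naive discrete Fourier transform as "the frequency" are assumptions made for illustration; the function names are hypothetical.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate_hz):
    """Return the frequency (Hz) of the strongest non-DC bin of a naive DFT."""
    n = len(samples)
    best_k, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        if abs(coeff) > best_mag:
            best_k, best_mag = k, abs(coeff)
    return best_k * sample_rate_hz / n


def is_noise(samples, sample_rate_hz, max_amplitude, band_hz=(450.0, 1050.0)):
    """Apply rule (1), then rule (2); True means the segment is discarded."""
    if max(abs(x) for x in samples) > max_amplitude:  # rule (1): too loud
        return True
    freq = dominant_frequency(samples, sample_rate_hz)
    return not (band_hz[0] <= freq <= band_hz[1])     # rule (2): outside voice band
```

The naive DFT costs O(n²) and is used here only to keep the sketch free of third-party dependencies.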

Next, as shown in FIG. 6, pattern matching based on the feature quantity is performed by the following procedure.
(1) The left part of FIG. 6(a) shows the voice waveform 61 from which noise has been removed to some extent as described above. The vertical axis is the amplitude of the voice, that is, its loudness, and the horizontal axis is time. The CPU 10 applies a Fourier transform to the voice waveform 61, converting it from an amplitude-versus-time waveform into the waveform 62, which plots sound pressure against frequency, as shown in the right part of FIG. 6(a) (S105).
Since the Fourier transform itself is a known technique, its description is omitted.
(2) For the waveform 62 obtained by the Fourier transform, the feature quantity is obtained from a 12th-order approximation curve using the following feature quantity formula.
Feature quantity = (conversion coefficient) × ln{(frequency ÷ rate) + 1}
The feature quantity 63 obtained with this formula is shown in the table of FIG. 6(b) (S106).
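The feature quantity formula can be written directly as a function. The source gives neither the conversion coefficient nor the rate. The formula has the same form as the commonly used mel-scale conversion, 1127 × ln(1 + f/700), so the default values below are borrowed from that convention purely as an assumption. The function name is hypothetical.

```python
import math


def feature_quantity(frequency_hz, conversion_coefficient=1127.0, rate=700.0):
    """Feature quantity = (conversion coefficient) * ln((frequency / rate) + 1).

    The default coefficient and rate are assumed mel-scale values,
    not values taken from the source.
    """
    return conversion_coefficient * math.log(frequency_hz / rate + 1.0)
```

With these assumed defaults the function maps 0 Hz to 0 and compresses higher frequencies logarithmically, so equal steps in the feature quantity correspond to progressively wider steps in frequency.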

Note that a "feature quantity" in the present invention refers to a combination of characteristics, such as size, weight, length, and shape, used to identify a given object. To identify people, cars, animals, characters, sounds, faces, and so on, it is important to determine the optimum feature quantity values for each kind of object. Because this technique is publicly known and is most widely adopted in the image recognition field, the description of how the feature quantity values and formulas are determined is omitted.

FIG. 6(c) shows the voice master 64. For the voice master 64, feature quantities for, for example, the 50 sounds of the Japanese syllabary in the order "a, i, u, e, o, ..." are stored in advance in the voice master storage unit 20. The CPU 10 reads the voice master 64 from the voice master storage unit 20 and collates it with the feature quantity 63 of FIG. 6(b) obtained with the above formula.

For example, when the feature quantity 63 of FIG. 6(b) is collated in order against the feature quantities stored in advance in the voice master 64 of FIG. 6(c), it is found to match the sound "a". The CPU 10 therefore determines that the voice waveform 61 of FIG. 6(a) is the sound "a" (S107).

At this point, as shown in FIG. 7(c), the CPU 10 generates the voice waveform of "a" by reading the waveform grid master 73 for the sound "a" from the waveform master 72 stored in the waveform grid master storage unit 21, which is described later.

The example above covers the case where the feature quantity 63 matches the sound "a" exactly. When the collation does not produce an exact match, but the feature quantities stored in advance in the voice master 64 include one that is close to the feature quantity 63 of the sound and no other stored feature quantity is close to it, the voice waveform 61 is determined to be the sound of that close feature quantity in the voice master 64 (S108). The voice waveform data for the voice waveform 61 is then generated by reading the corresponding waveform grid master 73 from the waveform master 72 stored in the waveform grid master storage unit 21 (S109).
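The decision logic of S107 and S108 (accept an exact match, or a close match only when it is the sole close candidate) can be sketched as follows. The source does not define "close", so representing feature quantities as numeric tuples, using Euclidean distance, and the `tolerance` parameter are assumptions; `match_voice_master` is a hypothetical name.

```python
import math


def match_voice_master(feature, voice_master, tolerance):
    """Return (sound, candidates).

    sound is None when no stored feature quantity is close,
    or when more than one is close and none matches exactly.
    """
    distances = {sound: math.dist(feature, ref)
                 for sound, ref in voice_master.items()}

    # S107: an exact match decides the sound immediately.
    exact = [sound for sound, d in distances.items() if d == 0.0]
    if len(exact) == 1:
        return exact[0], exact

    # S108: otherwise accept a close entry only if it is the only close one.
    close = sorted((s for s, d in distances.items() if d <= tolerance),
                   key=distances.get)
    if len(close) == 1:
        return close[0], close
    return None, close
```

When the function returns `None` together with several candidates, the caller would fall back to another collation method, which corresponds to the ambiguous case the description handles separately.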

The voice waveform data can be generated by reading the waveform grid master 73 for the sound, such as "a", stored in the waveform grid master storage unit 21, but generation is not limited to this. For example, a separate master voice waveform storage unit holding the 50 sounds in the order "a, i, u, e, o, ..." may be provided, and the voice waveform may be read from it. Alternatively, the voice waveform may be converted into character data and read out as character data (S109).

The CPU 10 stores the voice waveform data generated by the above processing in the voice waveform data storage unit 22 (S110).
The CPU 10 also transmits the voice waveform data to the server 8 via the communication unit 16, the wireless router 6, and the LAN 7, and the data is stored in the storage unit 81 of the server 8 (S111).
The server 8 then performs voice analysis based on the voice waveform data (S112).

The example above covers the case where the cutting of the voice waveform, the noise removal, and the collation by feature quantity all proceed without any particular problem and the collation result narrows down to a single sound.

In practice, however, the noise on the factory floor may be loud, the measurement worker's voice may be quiet, or the pronunciation may be unclear. In such cases the two feature quantities may not match and there may be several approximate feature quantities (S113). When this happens, the waveform patterns are additionally collated using the waveform grid master 73 described above.

FIG. 7 illustrates the collation of the waveform grid pattern 71, arranged on a grid table, with the waveform grid master 73. The CPU 10 takes the voice waveform 61 shown in FIG. 6(a) as the waveform grid pattern 71 and arranges it, as shown in FIG. 7(a), on a two-dimensional grid table made up of, for example, (2^16 - 1 = 65535) bits on the vertical axis and (2^16 - 1 = 65535) bits on the horizontal axis.

Meanwhile, as shown in FIG. 7(c), the waveform grid master storage unit 21 stores in advance the waveform master 72, which covers, for example, the 50 sounds in the order "a, i, u, e, o, ...". The CPU 10 reads the waveform grid masters 73 shown in FIG. 7(b), which are stored in advance in the waveform master 72, from the waveform master 72 one after another and collates each in turn with the waveform grid pattern 71 of FIG. 7(a) arranged on the grid table.
In other words, the waveform pattern is recognized by a visual method so that the waveform is identified accurately from its slope, amplitude, period, and overall length (S114).

Collation is performed coordinate by coordinate, and the coordinates that match are tallied as a score. The waveform grid master 73 with the highest total score is taken as the waveform grid master 73 of the sound. That is, if the waveform grid master 73 with the highest score is the sound "a", the sound of the waveform grid pattern 71 shown in FIG. 7(a) is determined to be "a", and the voice waveform of the sound is generated by reading the waveform grid master 73 for "a" from the waveform master 72 (S109).
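The grid collation and scoring can be sketched as below. The source does not describe how samples are mapped to grid cells, so this is an illustration under assumptions: one column per sample, amplitude quantized linearly into `levels` rows, and the score taken as the number of cells shared by the pattern and a master. The function names are hypothetical, and a tiny grid is used instead of the 65535-step table.

```python
def to_grid_pattern(samples, levels, full_scale=1.0):
    """Map each sample to a (column, row) cell on a grid with `levels` rows."""
    cells = set()
    for col, x in enumerate(samples):
        clipped = max(-full_scale, min(full_scale, x))
        # Linear quantization: -full_scale -> row 0, +full_scale -> row levels - 1.
        row = round((clipped + full_scale) / (2 * full_scale) * (levels - 1))
        cells.add((col, row))
    return cells


def best_grid_master(pattern, grid_masters):
    """Score each master by its number of matching cells; return (best, scores)."""
    scores = {sound: len(pattern & cells) for sound, cells in grid_masters.items()}
    best = max(scores, key=scores.get)  # on a tie, the first master listed wins
    return best, scores
```

A finer grid (more `levels`) makes the match stricter, since small amplitude differences then land in different cells.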

The CPU 10 stores the voice waveform data generated by the above processing in the voice waveform data storage unit 22 (S110).
The CPU 10 also transmits the voice waveform data to the server 8 via the communication unit 16, the wireless router 6, and the LAN 7, and the server 8 stores the voice waveform data in the storage unit 81 (S111).

The server 8 performs voice analysis based on the voice waveform data stored in the storage unit 81 and, based on the analysis result, compiles the measurement worker's measurement data into data such as an inspection report. That data is stored in the storage unit 81 of the server 8 and is used for shipping decisions for the product on the production line, quality statistics, and the like (S112).

The voice waveform data stored in the voice waveform data storage unit 22 and in the storage unit 81 of the server 8 is the data of the waveform grid master 73 for the sound itself, so it is stored as a clean voice waveform. Because the stored voice waveform data contains no noise component at all, voice recognition accuracy can be improved in the voice analysis performed by the server 8.

Because the in-noise voice recognition device 1 and the in-noise voice recognition system 100 according to the present invention are configured as described above, they provide the following notable effects.
For example, the in-noise voice recognition device 1 can be implemented in software on hardware resources such as a tablet terminal, so it is easy to carry. One person can readily take it to a work site in the factory to perform measurements and record data, which makes it highly convenient.

In addition, the human utterance is extracted from a noisy waveform in which the utterance and noise are mixed, the measurement data is recognized, and the voice waveform data is saved in the storage area of the tablet terminal. Correct voice recognition is thus possible in the noisy environment of a factory, and the measurement data can be saved and managed as electronic data.

In addition, because the closest voice master is extracted by collating the extracted feature quantity 63 with the feature quantities of the voice master 64 stored in the voice master storage unit 20, the configuration is simple and processing can be fast.

In addition, by arranging the cut-out voice waveform 61 on the grid table and collating it with the waveform grid masters 73, the waveform grid master 73 closest to the arranged waveform grid pattern 71 can be extracted. The configuration is therefore simple and easy to understand, and the collation accuracy can easily be fine-tuned by adjusting the data stored in the waveform master 72.

In addition, even when collating the extracted feature quantity 63 with the feature quantities of the voice master 64 stored in the voice master storage unit 20 yields several candidates, so that no single feature quantity of the voice master 64 can be selected, the waveform grid master 73 closest to the waveform grid pattern 71 can still be extracted by arranging the cut-out voice waveform 61 on the grid table and collating it with the waveform grid masters 73. Voice waveform data can therefore be generated more accurately.

In addition, measurement data from the tests and inspections performed daily on the factory floor can be collected on the server in real time and analyzed without handwriting or PC data entry, which saves labor in measurement work and speeds up data processing on noisy factory floors.

In addition, the voice waveform data stored in the voice waveform data storage unit 22 and in the storage unit 81 of the server 8 is the data of the waveform grid master 73 for the sound itself, so it is stored as a clean voice waveform. Because the stored voice waveform data contains no noise component at all, voice recognition accuracy can be improved in the voice analysis performed by the server 8.

The in-noise voice recognition device 1 and the in-noise voice recognition system according to the present invention described in the above embodiments are not limited to those embodiments. They also include configurations in which the elements disclosed in the above embodiments are substituted for one another or combined differently, and configurations in which known inventions and the elements disclosed in the above embodiments are substituted for one another or combined differently. The technical scope of the present invention is not limited to the above embodiments and extends to the matters recited in the claims and their equivalents.

For example, as an embodiment of the in-noise voice recognition device 1 according to the present invention, a portable personal computer may be used instead of a tablet terminal. A mobile phone or smartphone may also be used, as may a dedicated computer. In short, anything may be used as long as it has the hardware resources needed to implement the present invention.

As another embodiment of the collation of the voice waveform 61, only the collation of the feature quantity 63 calculated from the voice waveform 61 with the feature quantities of the voice master 64 may be performed.
Alternatively, only the collation of the waveform grid pattern 71, in which the voice waveform 61 is arranged on the grid table, with the waveform grid master 73 may be performed.
Alternatively, the waveform grid pattern 71, in which the voice waveform 61 is arranged on the grid table, may first be collated with the waveform grid masters 73 stored in advance in the waveform master 72, and, if that collation yields more than one result, the feature quantity 63 calculated from the voice waveform 61 may then be collated with the feature quantities stored in advance in the voice master 64.
In short, either the collation by the feature quantity 63 or the collation by the waveform grid pattern 71 may be performed alone, or both may be performed. When both are performed, either may come first.
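The freedom to run either collation alone, or both in either order, can be sketched as a small dispatcher. This is an illustrative assumption rather than the source's design: `recognize` is a hypothetical name, and each stage is assumed to be a callable that returns a sound, or `None` when it cannot narrow the result to a single sound.

```python
def recognize(segment, first_stage, second_stage=None):
    """Run first_stage; consult second_stage only if the first is inconclusive."""
    result = first_stage(segment)
    if result is not None or second_stage is None:
        return result
    return second_stage(segment)
```

Passing the feature-quantity matcher first and the grid matcher second reproduces the main embodiment; swapping the arguments gives the variant in which grid collation comes first.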

The grid table has been described as having (2^16 - 1 = 65535) bits on both the vertical and horizontal axes, but it may instead be configured with (2^8 - 1 = 255) bits. That is, the number of bits on the vertical and horizontal axes of the grid table can be set to any value according to the accuracy of the measurement data.

The in-noise voice recognition system 100 according to the present invention has been described using an example in which the in-noise voice recognition device 1 and the server 8 are connected via the wireless router 6, but they may instead be connected directly via a wired Internet line.

The in-noise voice recognition system 100 according to the present invention has been described using the measurement of automobile parts in an automobile factory as an example, but its use is not limited to automobile factories. It goes without saying that it can be applied widely to other manufacturing industries, noisy construction sites, and the like.

1 In-noise voice recognition device
3 Object to be measured
4 Film thickness measuring device
5 Microphone
6 Wireless router
7 LAN
8 Server
10 CPU
11 System controller
12 Graphics controller
13 Display
14 Touch panel controller
15 Touch panel
16 Communication unit
17 Program storage unit
18 RAM
20 Voice master storage unit
21 Waveform grid master storage unit
22 Voice waveform data storage unit
23 Voice input unit
24 Speaker unit
25 Operation unit
61 Voice waveform
63 Feature quantity
64 Voice master
71 Waveform grid pattern
72 Waveform master
73 Waveform grid master
81 Storage unit
100 In-noise voice recognition system

Claims

An in-noise voice recognition device comprising:
a voice data input means;
a means for cutting out a voice waveform input from the voice data input means;
a means for removing noise from the cut-out voice waveform;
a means for generating voice waveform data from the noise-removed voice waveform; and
a storage means for storing the generated voice waveform data in a storage area.

The in-noise voice recognition device according to claim 1, wherein the means for generating the voice waveform data from the noise-removed voice waveform comprises:
a means for extracting, from the noise-removed voice waveform, a feature quantity classified into a specific frequency band;
a voice master storage means for storing in advance a voice master of syllable feature quantities;
a means for collating the extracted feature quantity with the feature quantities of the voice master stored in advance and extracting the voice master closest to the extracted feature quantity; and
a means for using the extracted voice master as the voice waveform data.

The in-noise voice recognition device according to claim 1, wherein the means for generating the voice waveform data from the noise-removed voice waveform comprises:
a waveform grid master storage means for storing in advance waveform grid masters of syllable voice waveforms;
a means for arranging the cut-out voice waveform on a grid table, collating the arranged waveform grid pattern with the waveform grid masters, and extracting the waveform grid master closest to the waveform grid pattern; and
a means for using the extracted waveform grid master as the voice waveform data.

The in-noise voice recognition device according to claim 1, wherein the means for generating the voice waveform data from the noise-removed voice waveform comprises:
a means for extracting, from the noise-removed voice waveform, a feature quantity classified into a specific frequency band;
a voice master storage means for storing in advance a voice master of syllable feature quantities;
a means for collating the extracted feature quantity with the feature quantities of the voice master stored in advance and extracting the voice master closest to the extracted feature quantity;
a means for using the extracted voice master as the voice waveform data;
and, for the case where a plurality of feature quantities stored in the voice master are close to the extracted feature quantity,
a waveform grid master storage means for storing in advance waveform grid masters of syllable voice waveforms;
a means for arranging the cut-out voice waveform on a grid table, collating the arranged waveform grid pattern with the waveform grid masters, and extracting the waveform grid master closest to the waveform grid pattern; and
a means for using the extracted waveform grid master as the voice waveform data.

An in-noise voice recognition system comprising:
the in-noise voice recognition device;
a voice input device that inputs a voice signal to the in-noise voice recognition device; and
a server that receives the voice waveform data of the storage means, transmitted by a communication means provided in the in-noise voice recognition device, and stores the voice waveform data in a storage area.