JP2021033134A

JP2021033134A - Evaluation device, evaluation method, and evaluation program

Info

Publication number: JP2021033134A
Application number: JP2019154876A
Authority: JP
Inventors: 雅史西村; Masafumi Nishimura
Original assignee: Shizuoka University NUC
Current assignee: Shizuoka University NUC
Priority date: 2019-08-27
Filing date: 2019-08-27
Publication date: 2021-03-01
Anticipated expiration: 2039-08-27
Also published as: JP7378770B2

Abstract

To evaluate a suitable wearing position for a throat microphone when a voice signal is generated using the throat microphone. SOLUTION: An evaluation device 1 includes: a feature quantity extractor 11 that extracts a first spectral feature quantity based on a voice signal from a throat microphone M1 and extracts a second spectral feature quantity based on a voice signal from a close-talking microphone M2; a distance calculation unit 13 that calculates a spectral distance based on the first spectral feature quantity and the second spectral feature quantity; and an average value calculation unit 16 that calculates and outputs the average value of the spectral distances calculated continuously in time by the distance calculation unit 13. SELECTED DRAWING: Figure 1

Description

The present invention relates to an evaluation device, an evaluation method, and an evaluation program for evaluating the wearing position of a throat microphone on a user.

Conventionally, devices have been used that execute voice recognition processing on a voice signal generated by detecting voice with a microphone. For example, Patent Document 1 below discloses a system capable of detecting utterance sections from a voice signal with high accuracy even in a low S/N environment.

Patent Document 1: Japanese Unexamined Patent Publication No. 2009-210617

In voice signal processing technology such as that described above, when the voice of a conversation among several people, such as a meeting, is processed, the voice signal may contain noise, or the voices of several speakers may overlap in the voice signal. High-accuracy voice recognition is difficult for such a voice signal. This problem can sometimes be solved by using a throat microphone, which is a contact-type microphone that is attached directly to the neck and directly detects the vibration of the neck accompanying the speaker's vocalization. However, if the throat microphone is not attached at an appropriate position on the neck, the detection accuracy of the speaker's voice tends to drop significantly, and depending on the wearing position the sound quality of the voice signal is prone to deterioration.

The present invention has been made in view of this problem, and its object is to provide an evaluation device, an evaluation method, and an evaluation program that make it possible to evaluate a suitable wearing position of a throat microphone when a voice signal is generated using the throat microphone.

One aspect of the present invention is an evaluation device for evaluating the wearing position of a throat microphone, comprising: an extraction unit that extracts a first spectral feature quantity based on a voice signal from the throat microphone and extracts a second spectral feature quantity based on a voice signal from an acoustic microphone; a distance calculation unit that calculates a spectral distance based on the first spectral feature quantity and the second spectral feature quantity; and a distance output unit that calculates and outputs the average value of the spectral distances calculated continuously in time by the distance calculation unit. Here, "acoustic microphone" is a concept that, in contrast to contact-type microphones such as a throat microphone, broadly covers detection devices that detect a user's utterance as vibration transmitted through the air.

Another aspect of the present invention is an evaluation method for evaluating the wearing position of a throat microphone, comprising: an extraction step of extracting a first spectral feature quantity based on a voice signal from the throat microphone and extracting a second spectral feature quantity based on a voice signal from an acoustic microphone; a distance calculation step of calculating a spectral distance based on the first spectral feature quantity and the second spectral feature quantity; and a distance output step of calculating and outputting the average value of the spectral distances calculated continuously in time in the distance calculation step.

Yet another aspect of the present invention is an evaluation program that causes a computer to function as: an extraction unit that extracts a first spectral feature quantity based on a voice signal from a throat microphone and extracts a second spectral feature quantity based on a voice signal from an acoustic microphone; a distance calculation unit that calculates a spectral distance based on the first spectral feature quantity and the second spectral feature quantity; and a distance output unit that calculates and outputs the average value of the spectral distances calculated continuously in time by the distance calculation unit.

According to any of the above aspects, the spectral distance between the first spectral feature quantity, extracted from the voice signal of the throat microphone, and the second spectral feature quantity, extracted from the voice signal of the acoustic microphone, is calculated, and the average value of the spectral distances calculated continuously in time is calculated and output. This makes it possible to evaluate whether the throat microphone is worn at a suitable position, based on the similarity between the spectrum of the signal detected by the throat microphone and the spectrum of the signal detected by the acoustic microphone.

In the above aspect, it is preferable that the device further comprises a correction unit that corrects the first spectral feature quantity, using a correction model, so that it approaches the characteristics of the second spectral feature quantity, and that the distance calculation unit calculates the spectral distance based on the corrected first spectral feature quantity and the second spectral feature quantity. In this case, the first spectral feature quantity can be corrected to account for the difference between the spectral detection characteristics of the throat microphone and those of the acoustic microphone, and using the corrected first spectral feature quantity allows the wearing position of the throat microphone to be evaluated more appropriately.

It is also preferable that the distance calculation unit calculates the spectral distance by quantifying the difference between the first spectral feature quantity and the second spectral feature quantity. In this case, the similarity between the spectrum of the signal detected by the throat microphone and the spectrum of the signal detected by the acoustic microphone can be evaluated easily.

It is also preferable that the distance calculation unit calculates a mel-cepstrum distance as the spectral distance. In this case, the similarity between the spectrum of the signal detected by the throat microphone and the spectrum of the signal detected by the acoustic microphone can be evaluated easily and appropriately.

It is further preferable that the distance output unit calculates the average value of the spectral distances calculated within an utterance section recognized from the voice signal of the throat microphone or the acoustic microphone. In this case, the similarity between the two spectra can be evaluated within the user's utterance section, so the wearing position of the throat microphone can be evaluated more appropriately without being affected by noise.

It is further preferable that the distance output unit calculates the average value of the spectral distances while sequentially shifting a time window at fixed intervals, based on the voice signal from the throat microphone or the acoustic microphone, and sequentially outputs the average value for each shifted time window. With this configuration, the similarity between the spectrum of the signal detected by the throat microphone and the spectrum of the signal detected by the acoustic microphone can be evaluated continuously in time, and so can the wearing position of the throat microphone.

It is further preferable that the distance output unit sequentially displays a plurality of wearing positions on the user's throat on a screen and sequentially displays on the screen the average value of the spectral distances calculated for each wearing position. With this configuration, the similarity between the spectrum of the signal detected by the throat microphone and the spectrum of the signal detected by the acoustic microphone can be evaluated while the wearing position on the user's throat is indicated. As a result, the user can be guided to change the wearing position of the throat microphone step by step while a suitable wearing position is evaluated.

According to one aspect of the present invention, a suitable wearing position of a throat microphone can be evaluated when a voice signal is generated using the throat microphone.

A block diagram showing the schematic configuration of the evaluation device 1 according to the embodiment.
A diagram showing the hardware configuration of the evaluation device 1 of FIG. 1.
A flowchart showing the operation procedure of the pre-learning process in the evaluation device 1 of FIG. 1.
A flowchart showing the operation procedure of the wearing position evaluation process in the evaluation device 1 of FIG. 1.
A diagram showing an output image, on the input/output device 105, of the average value of the spectral distances produced by the average value calculation unit 16 of FIG. 1.
A block diagram showing the configuration of the evaluation program of the embodiment.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings. In the description, the same reference numerals are used for identical elements or elements having the same function, and duplicate descriptions are omitted.

FIG. 1 is a block diagram showing the schematic configuration of the evaluation device 1 of the embodiment. As shown in FIG. 1, the evaluation device 1 is a device for evaluating the wearing position of the throat microphone M1 on the user's throat. The evaluation device 1 is configured to receive analog voice signals over cables from the throat microphone M1 and from the close-talking microphone M2, which is an acoustic microphone. It has a function of executing voice recognition processing on the voice signal received from the throat microphone M1, converting the user's speech into text, and generating and storing the text data. However, the evaluation device 1 may instead be configured to receive the voice signal from one or both of the throat microphone M1 and the close-talking microphone M2 by wireless signals such as Bluetooth (registered trademark) or wireless LAN. The evaluation device 1 also does not necessarily need its own voice recognition function; it may transfer the voice signal as digital data to an external device and have the external device execute the voice recognition processing.

The throat microphone M1 is a detection device that is attached to the skin near the user's throat, detects the vibration of the skin caused by vocalization, and generates a voice signal corresponding to the vocalization. A throat microphone with a built-in piezo element, one with a built-in condenser microphone, or the like is used as the throat microphone M1. The close-talking microphone M2 is a detection device that is used close to the user's mouth and generates a voice signal by detecting the vibration of the air near the mouth caused by vocalization. However, the close-talking microphone M2 may be replaced by any other type of acoustic microphone that can detect vocalization as vibration transmitted through the air, including sound-collecting microphones such as a lavalier (pin) microphone or a vocal microphone.

As functional components, the evaluation device 1 includes a feature quantity extractor 11, a spectrum correction unit 12, a distance calculation unit 13, a section detection unit 14, a time window counter unit 15, and an average value calculation unit (distance output unit) 16.

FIG. 2 is a block diagram showing the hardware configuration of the evaluation device 1. As shown in FIG. 2, the evaluation device 1 is realized by an arithmetic unit 50 such as a smartphone, a tablet terminal, or a computer terminal. Physically, the arithmetic unit 50 is a computer or the like that includes a CPU (Central Processing Unit) 101 as a processor, a RAM (Random Access Memory) 102 and a ROM (Read Only Memory) 103 as recording media, a communication module 104, an input/output device 105, and so on, all of which are electrically connected internally. The input/output device 105 is a keyboard, a mouse, a display device, a touch panel display device, a speaker, or the like. Each functional unit of the evaluation device 1 described above is realized by loading the evaluation program of the embodiment onto hardware such as the CPU 101 and the RAM 102, operating the communication module 104, the input/output device 105, and so on under the control of the CPU 101, and reading and writing data in the RAM 102.

Returning to FIG. 1, the function of each functional unit of the evaluation device 1 is described in detail below.

The feature quantity extractor 11 receives voice signals from the throat microphone M1 and the close-talking microphone M2 simultaneously and A/D-converts each of them. The feature quantity extractor 11 then performs spectral analysis on every frame of the voice signal from the throat microphone M1 to extract the spectrum of the voice signal (first spectrum) and a spectral feature quantity (first spectral feature quantity), and performs spectral analysis on every frame of the voice signal from the close-talking microphone M2 to extract the spectrum of the voice waveform (second spectrum) and a spectral feature quantity (second spectral feature quantity). The spectral feature quantity is not limited to any particular one as long as it represents the characteristics of the spectrum. Examples include cepstra that represent the envelope of the voice spectrum, such as the LPC (Linear Predictive Coding) cepstrum and the LPC mel-cepstrum, obtained by Fourier transforming the spectrum.
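A minimal sketch of per-frame feature extraction, assuming a plain FFT-based real cepstrum in place of the LPC cepstrum or LPC mel-cepstrum named above. The function names, the Hamming window, and the frame and hop sizes are illustrative assumptions, not part of the described device.

```python
import cmath
import math


def split_frames(signal, frame_len, hop):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def dft(x):
    """Naive discrete Fourier transform (O(n^2)); adequate for short demo frames."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def real_cepstrum(frame, n_coeffs):
    """Return the first n_coeffs real-cepstrum coefficients of one frame.

    Assumption: this is a stand-in for the LPC-based cepstra in the text.
    """
    n = len(frame)
    # Hamming window to reduce spectral leakage (an assumed choice).
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    # Log-magnitude spectrum; the small floor avoids log(0).
    log_mag = [math.log(abs(c) + 1e-10) for c in dft(windowed)]
    # Inverse DFT of the log-magnitude spectrum gives the cepstrum.
    return [sum(log_mag[k] * cmath.exp(2j * cmath.pi * k * q / n)
                for k in range(n)).real / n
            for q in range(n_coeffs)]
```

In use, the same two functions would be applied to the A/D-converted samples of both microphones, so that frame i of the throat signal and frame i of the close-talking signal yield a pair of feature vectors.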

The spectrum correction unit 12 uses combinations of the first spectrum and the second spectrum, acquired simultaneously in advance by the feature quantity extractor 11 over a plurality of frames, to create a machine-learning correction model that corrects the frequency characteristics so that the first spectrum approaches the second spectrum, and stores the model in an internal memory (the RAM 102 or the like) (pre-learning function). A deep learning algorithm such as LSTM (Long Short Term Memory) is used as the algorithm of this correction model. Then, during evaluation of the wearing position of the throat microphone M1, the spectrum correction unit 12 sequentially corrects the first spectrum obtained by the feature quantity extractor 11, using the pre-trained correction model stored in the internal memory. As a result, the feature quantity extractor 11 extracts the first spectral feature quantity from the sequentially corrected first spectrum.
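To illustrate the learn-then-correct flow, the sketch below fits a much simpler stand-in model: an independent least-squares affine map per feature dimension, trained on paired frames from the two microphones. It is not the LSTM model named above, and the function names and the choice to operate on feature vectors are assumptions.

```python
def fit_affine_correction(throat_feats, acoustic_feats):
    """Fit y ~ a*x + b per dimension from paired frames (pre-learning stand-in).

    throat_feats, acoustic_feats: equal-length lists of equal-length vectors.
    Returns a list of (a, b) tuples, one per feature dimension.
    """
    n = len(throat_feats)
    model = []
    for d in range(len(throat_feats[0])):
        xs = [f[d] for f in throat_feats]
        ys = [f[d] for f in acoustic_feats]
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        var = sum((x - mean_x) ** 2 for x in xs)
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        a = cov / var if var > 0 else 1.0  # fall back to identity slope
        model.append((a, mean_y - a * mean_x))
    return model


def apply_correction(model, feat):
    """Correct one throat-microphone feature vector with the stored model."""
    return [a * x + b for (a, b), x in zip(model, feat)]
```

The two-function split mirrors the text: the model is fitted once and stored, then applied frame by frame during evaluation.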

The distance calculation unit 13 refers to the first and second spectral feature quantities extracted for each frame by the feature quantity extractor 11 and calculates the spectral distance for each frame continuously in time. For example, using formula (1), the distance calculation unit 13 calculates as the spectral distance the MCD (Mel-Cepstrum Distortion), which quantifies the difference (distance) between two spectral feature quantities (mel-cepstra).
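Formula (1) itself is not reproduced in this text. As an assumption, the commonly used definition of mel-cepstral distortion is given here; it matches the symbols m_x, m_x', and D and the references to a "root term" and a "Σ" in the description, but it may differ in detail from the formula in the original publication.

```latex
\mathrm{MCD}\,[\mathrm{dB}]
  = \frac{10}{\ln 10}
    \sqrt{\,2 \sum_{d=1}^{D} \bigl( m_{x}(d) - m_{x'}(d) \bigr)^{2}}
  \tag{1}
```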

In formula (1), m_x denotes the mel-cepstrum coefficients that form the second spectral feature quantity, m_x' denotes the mel-cepstrum coefficients that form the first spectral feature quantity, and D is an integer denoting the LPC order. The MCD is a parameter for evaluating the quality of perceived sound; the closer it is to 0, the closer the spectral characteristics of the two voices are. The distance calculation unit 13 may calculate any other parameter that can evaluate the closeness (distance) between the first spectrum and the second spectrum. For example, it may calculate the square-root term of formula (1) as the spectral distance, the value of the Σ term of formula (1) as the spectral distance, or the LPC cepstrum distance (LCD) as the spectral distance.
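A minimal sketch of the per-frame distance, assuming the widely used form of mel-cepstral distortion, (10 / ln 10) · sqrt(2 · Σ (m_x(d) − m_x'(d))²). The function name is illustrative, and whether the 0th (energy) coefficient is excluded is left to the caller, since the text does not say.

```python
import math


def mel_cepstral_distortion(m_x, m_x_prime):
    """Return the MCD in dB between two equal-length coefficient vectors.

    m_x:       coefficients from the acoustic (close-talking) microphone.
    m_x_prime: coefficients from the throat microphone, after any correction.
    """
    if len(m_x) != len(m_x_prime):
        raise ValueError("coefficient vectors must have the same length")
    squared_sum = sum((a - b) ** 2 for a, b in zip(m_x, m_x_prime))
    return (10.0 / math.log(10.0)) * math.sqrt(2.0 * squared_sum)
```

Returning only `math.sqrt(2.0 * squared_sum)` or only `squared_sum` would give the alternative distances mentioned above (the root term and the Σ term); all three rank wearing positions in the same order for a single frame.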

The section detection unit 14 identifies the user's utterance section from the per-frame voice signal extracted by the feature quantity extractor 11. The utterance section is identified by estimating the power or spectrum of the A/D-converted voice signal generated in the feature quantity extractor 11, judging whether the voice signal is voiced or silent, and identifying the voiced period. The section detection unit 14 then controls the distance calculation unit 13 so that the spectral distance is calculated for each frame contained in the utterance section.
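The voiced/silent decision can be sketched as a simple power threshold per frame. The fixed threshold and the function names are assumptions; the text only states that power or spectrum is estimated and used for the decision.

```python
def frame_power(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def detect_utterance_sections(frames, threshold):
    """Return (start, end) frame-index pairs, end exclusive, of voiced runs.

    A frame counts as voiced when its power exceeds the assumed threshold.
    """
    sections = []
    start = None
    for i, frame in enumerate(frames):
        voiced = frame_power(frame) > threshold
        if voiced and start is None:
            start = i                      # a voiced run begins
        elif not voiced and start is not None:
            sections.append((start, i))    # the run ended at the previous frame
            start = None
    if start is not None:                  # signal ended while still voiced
        sections.append((start, len(frames)))
    return sections
```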

The time window counter unit 15 sets a time window of fixed length starting from the start timing of the utterance section identified by the section detection unit 14, and sets successive time windows by shifting it in the time direction. The time window counter unit 15 then controls the distance calculation unit 13 so that, for each successively shifted time window, the spectral distance is calculated for the frames contained in that window.

The average value calculation unit 16 calculates the average value of the spectral distances calculated for each frame continuously in time by the distance calculation unit 13. That is, it calculates the average value of the spectral distances of all frames contained in the utterance section identified by the section detection unit 14. Alternatively, for each time window set by successive shifting in the time window counter unit 15, the average value calculation unit 16 calculates the average value of the spectral distances of all frames contained in that time window. The average value calculation unit 16 further outputs the calculated average value of the spectral distances to the input/output device 105. For example, the average value calculation unit 16 may output the change in the average value that accompanies a change in the wearing position of the user's throat microphone M1 to a display or the like so that it can be recognized visually, or may output it as sound through a speaker or the like so that the user can recognize the change by ear.
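Both averaging modes can be sketched over a list of per-frame spectral distances. The window length and shift are expressed in frames here, which is an assumption; the text specifies them only as a fixed time and a sequential shift.

```python
def section_mean(distances):
    """Average spectral distance over all frames of one utterance section."""
    return sum(distances) / len(distances)


def windowed_means(distances, window, shift):
    """Average spectral distance for each successively shifted time window.

    window and shift are frame counts; windows that would run past the end
    of the data are not emitted.
    """
    if window <= 0 or shift <= 0:
        raise ValueError("window and shift must be positive")
    return [sum(distances[i:i + window]) / window
            for i in range(0, len(distances) - window + 1, shift)]
```

For example, six per-frame distances with a window of four frames and a shift of two frames produce two averages, one for frames 0 to 3 and one for frames 2 to 5.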

Next, the operation of the evaluation device 1 in the pre-learning process and in the wearing position evaluation process is described, together with the flow of the evaluation method according to the embodiment. FIG. 3 is a flowchart showing the operation procedure of the pre-learning process in the evaluation device 1, and FIG. 4 is a flowchart showing the operation procedure of the wearing position evaluation process in the evaluation device 1.

First, at an arbitrary time before the wearing position evaluation process is executed, the pre-learning process is started with the throat microphone M1 and the close-talking microphone M2 worn by the user. The pre-learning process does not need to be executed every time the wearing position evaluation process is executed; it may instead be executed by the provider of the evaluation device 1 or the like with the microphones worn at the optimum position. When the pre-learning process starts, the evaluation device 1 receives voice signals from the throat microphone M1 and the close-talking microphone M2 as the user speaks continuously, and the feature quantity extractor 11 A/D-converts these voice signals (step S01). Next, the feature quantity extractor 11 extracts the first spectrum from the voice signal obtained from the throat microphone M1 and the second spectrum from the voice signal obtained from the close-talking microphone M2 (step S02). After that, based on pairs of first and second spectra obtained continuously over a plurality of frames, the spectrum correction unit 12 generates a machine-learning correction model for correcting the first spectral feature quantity calculated from the first spectrum (step S03). The spectrum correction unit 12 then stores the generated correction model in the internal memory (step S04).

Turning to FIG. 4, the flow of the wearing position evaluation process is described. The wearing position evaluation process is started, in response to an instruction input to the evaluation device 1, each time the user changes the wearing position of the throat microphone M1 while wearing the close-talking microphone M2.

First, the evaluation device 1 receives voice signals from the throat microphone M1 and the close-talking microphone M2 as the user speaks continuously, and the feature quantity extractor 11 A/D-converts these voice signals (step S101). At this time, it is preferable that the evaluation device 1 outputs an instruction to the input/output device 105, such as a display, prompting the user to utter sounds whose quality changes relatively strongly with the wearing position of the throat microphone M1 (for example, "shi" or "su"). It is also preferable that, at the same time, the evaluation device 1 outputs an instruction to the input/output device 105, such as a display, prompting the user to attach the throat microphone M1 to a predetermined part of the user's throat.

Next, for each successive frame, the feature quantity extractor 11 extracts the first spectral feature quantity and the second spectral feature quantity from the two A/D-converted voice signals (step S102). After that, the spectrum correction unit 12 reads the correction model stored in the internal memory and uses it to correct the first spectral feature quantity of each frame (step S103).

Next, the distance calculation unit 13 calculates and holds the spectral distance for each frame, using the second spectral feature quantity of each frame extracted by the feature quantity extractor 11 and the first spectral feature quantity of each frame corrected by the spectrum correction unit 12 (step S104). The extraction of the first and second spectral feature quantities, the correction of the first spectral feature quantity, and the calculation of the spectral distance are repeated for all frames contained in the utterance section, or for all frames contained in the fixed-length moving-analysis time window after the start of the utterance section (step S105).

The average value calculation unit 16 then calculates and outputs the average of the spectral distances over the utterance section or over each time window (step S106). Finally, it is determined whether the user has instructed the evaluation device 1 to end the wearing position evaluation process (step S107). If no such instruction has been given (step S107; No), the process returns to step S102 and the calculation and output of the average spectral distance are repeated. If the end has been instructed (step S107; Yes), the wearing position evaluation process ends.
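The flow of steps S103 to S106 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the evaluation device 1: the names `evaluate_position`, `correct`, `distance`, and `euclidean` are hypothetical, features are plain lists of floats, and the correction model and distance measure are passed in as callables because their concrete form is not fixed by this passage.

```python
from typing import Callable, Sequence

Feature = Sequence[float]


def evaluate_position(
    throat_frames: Sequence[Feature],
    close_frames: Sequence[Feature],
    correct: Callable[[Feature], Feature],
    distance: Callable[[Feature, Feature], float],
) -> float:
    """Average the per-frame spectral distance (steps S103 to S106)."""
    if len(throat_frames) != len(close_frames):
        raise ValueError("both signals must provide the same number of frames")
    if not throat_frames:
        raise ValueError("at least one frame is required")
    distances = []
    for first, second in zip(throat_frames, close_frames):
        corrected = correct(first)                     # step S103 (hypothetical model)
        distances.append(distance(corrected, second))  # step S104
    return sum(distances) / len(distances)             # step S106


def euclidean(a: Feature, b: Feature) -> float:
    """Placeholder distance used only for this example (an assumption)."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
```

Calling `evaluate_position(throat, close, lambda f: f, euclidean)` with an identity correction gives the plain average frame distance; a real correction model and spectral distance would be substituted for the two callables.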

FIG. 5 shows an example of how the average value calculation unit 16 outputs the average spectral distance on the input/output device 105, here a display device. Instruction information is displayed sequentially on the display screen 21 to direct the user to wearing positions “1”, “2”, “3”, ... of the throat microphone M1 on the throat, and the average value “X.XX” calculated for each wearing position is displayed sequentially on the display screen 21 in association with that position. In addition to the character string giving the average value, information indicating the change from the previous measurement (for example, the symbol “↑” for an increase) may be displayed. The average value calculation unit 16 is not limited to visual output on the display screen; it may also output sound through a speaker or the like so that the result can be recognized by ear. For example, a beep may be output when the average spectral distance decreases, or the magnitude of the average value may be expressed by the pitch of the beep.
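The display and audio output described above could look roughly like the sketch below. Everything here is illustrative: `format_result` and `beep_frequency_hz` are hypothetical names, the “↓” symbol for a decrease is an assumption (the text only gives “↑” as an example), and the pitch range and the maximum distance used for scaling are arbitrary assumed values.

```python
from typing import Optional


def format_result(position: int, average: float,
                  previous: Optional[float] = None) -> str:
    """Build one display line: position label, average value, trend symbol."""
    line = f"Position {position}: {average:.2f}"
    if previous is not None:
        if average > previous:
            line += " ↑"   # average rose since the previous measurement
        elif average < previous:
            line += " ↓"   # assumed symbol for a decrease
    return line


def beep_frequency_hz(average: float, low_hz: float = 440.0,
                      high_hz: float = 1760.0,
                      max_distance: float = 10.0) -> float:
    """Map the average distance linearly onto a beep pitch (assumed ranges)."""
    ratio = min(max(average / max_distance, 0.0), 1.0)  # clamp to [0, 1]
    return low_hz + ratio * (high_hz - low_hz)
```

With these assumed defaults, a larger average distance produces a higher beep, so the user would move the microphone toward the position where the pitch is lowest.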

Next, an evaluation program for causing a computer to function as the evaluation device 1 is described with reference to FIG. 6.

The evaluation program P1 comprises a main module P10, a feature quantity calculation module P11, a spectrum correction module P12, a distance calculation module P13, a section detection module P14, a time window counter module P15, and an average value calculation module P16.

The main module P10 controls the overall operation of the evaluation device 1. The functions realized by executing the main module P10, the feature quantity calculation module P11, the spectrum correction module P12, the distance calculation module P13, the section detection module P14, the time window counter module P15, and the average value calculation module P16 are the same as those of the feature quantity extractor 11, the spectrum correction unit 12, the distance calculation unit 13, the section detection unit 14, the time window counter unit 15, and the average value calculation unit 16, respectively.

The evaluation program P1 is provided, for example, on a computer-readable recording medium such as a CD-ROM, DVD, or ROM, or in a semiconductor memory. The evaluation program P1 may also be provided over a network as a computer data signal superimposed on a carrier wave.

According to the evaluation device 1 described above, a spectral distance is calculated between the first spectral feature quantity, extracted from the voice signal of the throat microphone M1, and the second spectral feature quantity, extracted from the voice signal of the close-talking microphone M2, and the average of the spectral distances calculated continuously over time is calculated and output. This makes it possible to evaluate whether the throat microphone M1 is worn at a suitable position on the basis of the similarity between the spectrum of the signal detected by the throat microphone M1 and the spectrum of the signal detected by the close-talking microphone M2.

In the evaluation device 1, the first spectral feature quantity is also corrected with a correction model so that it approaches the characteristics of the second spectral feature quantity. The first spectral feature quantity can thus be corrected to account for the difference between the spectral detection characteristics of the throat microphone M1 and those of the close-talking microphone M2, and using the corrected first spectral feature quantity allows the wearing position of the throat microphone M1 to be evaluated more appropriately.

In the evaluation device 1, the mel-cepstrum distance, which quantifies the difference between the first spectral feature quantity and the second spectral feature quantity, is used as the spectral distance. The similarity between the spectrum of the signal detected by the throat microphone M1 and the spectrum of the signal detected by the close-talking microphone M2 can therefore be evaluated simply and appropriately.
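As an illustration of a mel-cepstrum distance between two frames, the sketch below uses one commonly used definition, (10 / ln 10) · sqrt(2 · Σ (c1_d − c2_d)²), expressed in dB. This is an assumption: the exact formula used by the distance calculation unit 13 is not restated in this passage, and skipping the 0th (energy) coefficient is a frequent convention rather than something the text specifies. The function name `mel_cepstral_distance` is hypothetical.

```python
import math
from typing import Sequence


def mel_cepstral_distance(c1: Sequence[float], c2: Sequence[float]) -> float:
    """Mel-cepstrum distance in dB between two mel-cepstral coefficient vectors.

    Assumed definition: (10 / ln 10) * sqrt(2 * sum((a - b) ** 2)),
    summed over coefficients 1..D (the 0th coefficient is skipped).
    """
    if len(c1) != len(c2):
        raise ValueError("coefficient vectors must have the same length")
    squared = sum((a - b) ** 2 for a, b in zip(c1[1:], c2[1:]))
    return (10.0 / math.log(10.0)) * math.sqrt(2.0 * squared)
```

Identical vectors give a distance of 0, and the value grows as the two spectra diverge, which is why a smaller average is read as higher similarity.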

Furthermore, in the evaluation device 1, the average of the spectral distances is calculated over all frames in the utterance section recognized from the voice signal of the throat microphone M1 or the close-talking microphone M2. This makes it possible to evaluate, within the user's utterance section, the similarity between the spectrum of the signal detected by the throat microphone M1 and the spectrum of the signal detected by the close-talking microphone M2, so that the wearing position of the throat microphone M1 can be evaluated more appropriately without being affected by noise.
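Restricting the average to the utterance section can be sketched as below. The per-frame speech flags are assumed to come from whatever section detection is in use; how those flags are produced is outside this example, and `utterance_average` is a hypothetical name.

```python
from typing import Sequence


def utterance_average(distances: Sequence[float],
                      is_speech: Sequence[bool]) -> float:
    """Average the spectral distance over frames flagged as speech only."""
    if len(distances) != len(is_speech):
        raise ValueError("one speech flag is required per frame")
    selected = [d for d, flag in zip(distances, is_speech) if flag]
    if not selected:
        raise ValueError("no speech frames detected")
    return sum(selected) / len(selected)
```

Frames outside the utterance section, where both microphones mostly pick up noise, then no longer contribute to the reported value.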

Alternatively, the evaluation device 1 calculates the average of the spectral distances while sequentially shifting a time window at fixed intervals, based on the voice signal from the throat microphone M1 or the close-talking microphone M2. This allows the similarity between the spectrum of the signal detected by the throat microphone M1 and the spectrum of the signal detected by the close-talking microphone M2 to be evaluated continuously over time, and hence the wearing position of the throat microphone M1 to be evaluated continuously over time.
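The shifted time-window averaging can be illustrated with a short sketch. The window length and shift are given in frames here, which is an assumption made for simplicity (the text speaks of fixed time intervals), and `windowed_averages` is a hypothetical name.

```python
from typing import List, Sequence


def windowed_averages(distances: Sequence[float],
                      window: int, shift: int) -> List[float]:
    """Average the distance inside each time window, moving by `shift` frames."""
    if window <= 0 or shift <= 0:
        raise ValueError("window and shift must be positive")
    averages = []
    # Only full windows are evaluated; a trailing partial window is skipped.
    for start in range(0, len(distances) - window + 1, shift):
        chunk = distances[start:start + window]
        averages.append(sum(chunk) / window)
    return averages
```

Each returned value corresponds to one window position, so outputting them in order gives the continuous evaluation described above.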

In addition, the evaluation device 1 sequentially displays a plurality of wearing positions on the user's throat on the screen, and sequentially displays on the screen the average spectral distance calculated for each wearing position. With this function, the similarity between the spectrum of the signal detected by the throat microphone M1 and the spectrum of the signal detected by the close-talking microphone M2 can be evaluated while the wearing position on the user's throat is indicated. As a result, the user can be led to evaluate a suitable wearing position while changing the wearing position of the throat microphone M1 in turn.
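Once an average has been obtained for each wearing position, comparing them could be automated as in the sketch below. This goes one step beyond the text, which leaves the comparison to the user, and it assumes that a smaller average spectral distance means higher similarity and therefore a more suitable position. `best_position` is a hypothetical name.

```python
from typing import Dict


def best_position(averages: Dict[str, float]) -> str:
    """Return the wearing position label with the smallest average distance."""
    if not averages:
        raise ValueError("at least one measured position is required")
    return min(averages, key=averages.get)
```

For example, `best_position({"1": 5.2, "2": 3.9, "3": 4.4})` selects position "2" under the stated assumption.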

Although various embodiments of the present invention have been described above, the present invention is not limited to these embodiments, and may be modified or applied to other uses without changing the gist set forth in each claim.

1 ... evaluation device, 11 ... feature quantity extractor (extraction unit), 13 ... distance calculation unit, 16 ... average value calculation unit (distance output unit), M1 ... throat microphone, M2 ... close-talking microphone (acoustic microphone), P1 ... evaluation program.

Claims

An evaluation device for evaluating the wearing position of a throat microphone, comprising:
an extraction unit that extracts a first spectral feature quantity based on a voice signal from the throat microphone and extracts a second spectral feature quantity based on a voice signal from an acoustic microphone;
a distance calculation unit that calculates a spectral distance based on the first spectral feature quantity and the second spectral feature quantity; and
a distance output unit that calculates and outputs the average value of the spectral distances calculated continuously in time by the distance calculation unit.

The evaluation device according to claim 1, further comprising a correction unit that corrects the first spectral feature quantity, using a correction model, so that it approaches the characteristics of the second spectral feature quantity,
wherein the distance calculation unit calculates the spectral distance based on the corrected first spectral feature quantity and the second spectral feature quantity.

The evaluation device according to claim 1 or 2, wherein the distance calculation unit calculates, as the spectral distance, a value quantifying the difference between the first spectral feature quantity and the second spectral feature quantity.

The evaluation device according to claim 3, wherein the distance calculation unit calculates a mel-cepstrum distance as the spectral distance.

The evaluation device according to any one of claims 1 to 4, wherein the distance output unit calculates the average value of the spectral distances calculated in an utterance section recognized based on the voice signal from the throat microphone or the acoustic microphone.

The evaluation device according to any one of claims 1 to 5, wherein the distance output unit calculates the average value of the spectral distances while sequentially shifting a time window at regular intervals, based on the voice signal from the throat microphone or the acoustic microphone, and sequentially outputs the average value for each time window.

The evaluation device according to any one of claims 1 to 6, wherein the distance output unit sequentially displays a plurality of wearing positions on the user's throat on a screen, and sequentially displays on the screen the average value of the spectral distances calculated for each wearing position.

An evaluation method for evaluating the wearing position of a throat microphone, comprising:
an extraction step of extracting a first spectral feature quantity based on a voice signal from the throat microphone and extracting a second spectral feature quantity based on a voice signal from an acoustic microphone;
a distance calculation step of calculating a spectral distance based on the first spectral feature quantity and the second spectral feature quantity; and
a distance output step of calculating and outputting the average value of the spectral distances calculated continuously in time in the distance calculation step.

An evaluation program that causes a computer to function as:
an extraction unit that extracts a first spectral feature quantity based on a voice signal from a throat microphone and extracts a second spectral feature quantity based on a voice signal from an acoustic microphone;
a distance calculation unit that calculates a spectral distance based on the first spectral feature quantity and the second spectral feature quantity; and
a distance output unit that calculates and outputs the average value of the spectral distances calculated continuously in time by the distance calculation unit.