JP7043081B2 - Voice recall recognition device, wearer, voice recall recognition method and program


Info

Publication number
JP7043081B2
Authority
JP
Japan
Prior art keywords
voice
time series
recognition device
recall
language
Prior art date
Legal status
Active
Application number
JP2019097202A
Other languages
Japanese (ja)
Other versions
JP2020191021A (en)
JP2020191021A5 (en)
Inventor
恒雄 新田
Original Assignee
恒雄 新田
Priority date
Filing date
Publication date
Application filed by 恒雄 新田
Priority to JP2019097202A
Priority to EP20809757.6A
Priority to CN202080037965.1A
Priority to US17/613,658
Priority to PCT/JP2020/020342
Publication of JP2020191021A
Publication of JP2020191021A5
Application granted
Publication of JP7043081B2
Status: Active
Anticipated expiration

Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/24 Detecting, measuring or recording bioelectric or biomagnetic signals of the body or parts thereof
    • A61B5/316 Modalities, i.e. specific diagnostic methods
    • A61B5/369 Electroencephalography [EEG]
    • A61B5/377 Electroencephalography [EEG] using evoked responses
    • A61B5/38 Acoustic or auditory stimuli
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/24 Speech recognition using non-acoustical features
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/24 Detecting, measuring or recording bioelectric or biomagnetic signals of the body or parts thereof
    • A61B5/316 Modalities, i.e. specific diagnostic methods
    • A61B5/369 Electroencephalography [EEG]
    • A61B5/372 Analysis of electroencephalograms
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/06 Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G10L21/10 Transforming into visible information
    • G10L21/12 Transforming into visible information by displaying time domain information
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L2015/025 Phonemes, fenemes or fenones being the recognition units

Description

The present invention relates to a voice recall recognition device, a wearing tool, a voice recall recognition method, and a program.

Spoken-language input devices in practical use so far receive the speech waves of an utterance with a microphone, or bone-conduction vibrations with a vibration pickup, and recognize spoken-language information from the obtained signal.
In recent years, high-speed, high-performance spoken-language recognition has been achieved by exploiting huge amounts of speech and language data and by accumulating and using, on the network, probabilistic information about phoneme sequences (acoustic models) and word sequences (language models). On the other hand, because speaking can disturb others or leak information, and because the number of patients who have difficulty speaking, such as those with amyotrophic lateral sclerosis (ALS), is increasing, the field of brain-computer interfaces (BCI) calls for language recognition based on speech imagery, that is, without overt speech.

As for spoken-language recognition from speech-imagery signals, spoken-language recognition accompanied by actual speech has recently been attempted by observing the subdural cortical surface potential (electrocorticogram; ECoG) at 64 to 128 points on the cortex (see Non-Patent Document 1). However, such methods, which require a craniotomy, are not realistic for anyone other than critically ill patients. A method that instead observes the electroencephalogram (EEG) with electrodes on the scalp would make an immeasurable social contribution if put into practical use, but to date, attempts to find meaningful spoken-language signals in the noise have not succeeded.

In recent years, research has progressed on analyzing the brain during speech with high-resolution instruments such as PET and fMRI, and on observing the ECoG while a patient speaks during a craniotomy, so it is becoming clear in which parts of the brain spoken language is processed. According to these results, after conceptual preparation in the left middle temporal gyrus (MTG), planning as language proceeds toward the left superior temporal gyrus (STG) (see Non-Patent Document 2). After that, syllabification takes place in the left inferior frontal gyrus (IFG; Broca's area), and during actual speech, articulation is carried out in the left precentral gyrus (PG; motor cortex) (see Non-Patent Document 3). From these findings, it is expected that decoding of spoken language without overt speech will also become possible if the linguistic representation that reaches Broca's area can be captured.
A technique has also been proposed that detects an electroencephalogram and extracts from it a signal related to a motor command (see Patent Document 1).

Non-Patent Document 1: Heger, D. et al., "Continuous Speech Recognition from ECoG," Interspeech 2015, pp. 1131-1135 (2015)
Non-Patent Document 2: Indefrey, P. et al., "The spatial and temporal signatures of word production components," Cognition 92, 101-144 (2004)
Non-Patent Document 3: Bouchard, K. E. et al., "Functional organization of human sensorimotor cortex for speech articulation," Nature 495, 327-332 (2013)
Non-Patent Document 4: Girolami, M., Advances in Independent Component Analysis, Springer (2000)
Non-Patent Document 5: Durbin, J., "The fitting of time series models," Rev. Inst. Int. Stat., v. 28, pp. 233-243 (1960)

Patent Document 1: Japanese Unexamined Patent Application Publication No. 2008-204135 (JP 2008-204135 A)

However, in spoken-language recognition from the EEG, the biggest problem is that it is unknown in what format the linguistic representation is expressed, so no concrete extraction method has been found. Furthermore, unless a method for converting the linguistic representation into phoneme units is provided, a much larger set of units, such as syllables, must be handled (in addition to short syllables there are many long syllables, amounting to several thousand in total), which makes efficient spoken-language processing very difficult; by contrast, there are only about 24 phonemes in Japanese and 44 in English (where weak and strong vowels are distinguished; in Japanese they usually are not).

The present invention has been made in view of the above circumstances, and its object is to provide a voice recall recognition device, a wearing tool, a voice recall recognition method, and a program that enable spoken-language recognition from the electroencephalogram.

To achieve the above object, the most important feature of the present invention is that, in order to recognize spoken language from the EEG during speech imagery, line spectrum components are extracted as the linguistic representation by a line spectrum component extractor, and these components are passed through a phoneme feature vector time-series converter using, for example, phoneme-specific convolution operations, thereby obtaining a phoneme feature vector time series.

A first aspect of the invention provides a voice recall recognition device that recognizes spoken language from the EEG during speech imagery, the device comprising: an analysis processing unit that analyzes the group of discrete EEG signals input for each electrode of an electrode group and outputs a spectral time series; and an extraction unit that outputs a phoneme feature vector time series based on the spectral time series.

A second aspect of the invention provides a wearing tool for a voice recall recognition device that recognizes spoken language from the EEG during speech imagery, the wearing tool comprising: an electrode group arranged around Broca's area; and an output unit that outputs signals from the electrode group, wherein the voice recall recognition device executes an analysis process of analyzing the group of discrete EEG signals output from the output unit for each electrode and outputting a spectral time series, and an extraction process of outputting a phoneme feature vector time series based on the spectral time series.

A third aspect of the invention provides a voice recall recognition method for recognizing spoken language from the EEG during speech imagery, the method comprising: an analysis processing step of analyzing the group of discrete EEG signals input for each electrode of an electrode group and outputting a spectral time series; and an extraction step of outputting a phoneme feature vector time series based on the spectral time series.

A fourth aspect of the invention provides a program for causing a computer to execute voice recall recognition processing that recognizes spoken language from the EEG during speech imagery, the program causing the computer to execute: an analysis process of analyzing the group of discrete EEG signals input for each electrode of an electrode group and outputting spectral components as a linguistic representation; and an extraction process of extracting a group of phoneme features based on the spectral components of each electrode.

According to the present invention, it is possible to provide a voice recall recognition device, a wearing tool, a voice recall recognition method, and a program that enable spoken-language recognition from the electroencephalogram.

FIG. 1 is a model diagram showing the configuration of the recognition device of the present invention.
FIG. 2 shows the EEG measurement electrodes (10-10 system) and the nine electrodes around Broca's area.
FIG. 3 shows the noise removal effect on the EEG.
FIG. 4 is an explanatory diagram of linear predictive analysis of the EEG during speech imagery.
FIG. 5 compares linear predictive analysis of the EEG during speech imagery with conventional Fourier analysis.
FIG. 6 shows groups of short-time sine waves in the EEG during speech imagery.
FIG. 7 is a flow diagram showing the processing procedure of the language feature extraction unit.
FIG. 8 shows an example of absorbing frequency fluctuation in the EEG during speech imagery.
FIG. 9 shows an example of a line spectrum time series of the EEG during speech imagery.
FIG. 10 shows an example of a line spectrum time series spanning multiple electrodes.
FIG. 11 is a flow diagram showing the procedure for designing and using the phoneme-specific convolution operators.
FIG. 12 shows examples of the phoneme eigenvectors constituting the phoneme-specific convolution operators.
FIG. 13 shows an example of a phoneme likelihood time series for the EEG during speech imagery.
FIG. 14 shows electrode position calibration by test recognition.
FIG. 15 shows another configuration example of the voice recall recognition device.
FIG. 16 shows another configuration example of the voice recall recognition device.
FIG. 17 shows another configuration example of the voice recall recognition device.

(Embodiment)
Hereinafter, an embodiment of the voice recall recognition device according to the present invention will be described with reference to the accompanying drawings. The attached drawings are used to explain the technical features of the present invention, and the device configuration, the procedures of the various processes, and so on described here are not intended to be limiting unless specifically stated otherwise. The same elements are given the same reference numerals throughout the description of the embodiment.

FIG. 1 is a model diagram showing the configuration of the voice recall recognition device 1. The configuration and operation of the voice recall recognition device 1 will be described with reference to FIG. 1.
The voice recall recognition device 1 recognizes spoken language from the EEG during speech imagery.
The voice recall recognition device 1 comprises: an EEG input unit 2 that converts the EEG input from a group of electrodes placed on the scalp (not shown) into groups of discrete signals; a preprocessing unit 3 that removes noise from the discrete signals of each electrode; an analysis processing unit 4 that analyzes the group of discrete signals of each electrode and outputs a spectral time series; a language feature extraction unit 5 that outputs a phoneme feature vector time series from the spectral time series of all electrodes; a word/sentence recognition unit 6 that recognizes the words and sentences of the spoken language from the phoneme feature vector time series; and a post-processing/output unit 7 that displays and outputs the spoken-language information as text and speech.

The EEG input unit 2 converts the analog signals x(q, t) of the multi-electrode EEG into discrete signals by A/D conversion or the like, and corrects the bias of each individual electrode using, for example, the average of the discrete signals of all electrodes. At the same time, for each electrode, unwanted frequency components below 70 Hz are blocked by a low-frequency rejection (high-pass) filter and unwanted components above 180 Hz are blocked by a high-frequency rejection (low-pass) filter, and the resulting signal x1(q, n) is output.
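The band limiting and bias correction described above are standard signal-processing operations; the following Python sketch (not part of the patent) shows one way they could be realized with SciPy. The sampling rate, the filter order, and the common-average reading of the bias correction are assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def eeg_input_stage(x: np.ndarray, fs: float = 1000.0) -> np.ndarray:
    """Rough sketch of the EEG input unit (unit 2).

    x : (n_electrodes, n_samples) raw multi-electrode EEG, already digitized.
    fs: sampling rate in Hz (assumed value; not fixed in the text).
    Returns the bias-corrected, 70-180 Hz band-limited signal x1(q, n).
    """
    # One plausible reading of the bias correction: subtract the average
    # over all electrodes (a common-average style re-referencing).
    x = x - x.mean(axis=0, keepdims=True)

    # 70 Hz high-pass and 180 Hz low-pass, realized as one band-pass filter.
    sos = butter(4, [70.0, 180.0], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x, axis=1)
```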

FIG. 2 shows the electrode placement of the standard international 10-10 system using 64 electrodes. Of these, speech-imagery signals are received from the nine electrodes {F3, F5, F7, FC3, FC5, FT7, C3, C5, T7} belonging to the region around Broca's area of the left hemisphere, language features are extracted, and the imagined content is recognized. Right-handed people are generally said to process language in the left hemisphere, and a considerable proportion of left-handed people are also said to process language in the left hemisphere. EEG signals can be subject to large disturbances (called artifacts) caused by actions such as blinking, but the filtering described above removes many of these unwanted components. For unwanted components that the filters cannot remove, independent component analysis (ICA) may additionally be applied: a small number of independent sources are estimated from the discrete signals of all electrodes, the unwanted sources are removed, and the signals are then mapped back to the original electrode outputs (here, nine electrodes).

The preprocessing unit 3 removes, for each electrode, the noise that passes through the filters. One example of this processing is described below. The discrete signal x1(q, n) of each electrode (q: electrode number, n: time) output by the EEG input unit is first multiplied by a fixed time window and then mapped from the time domain to the frequency domain by the fast Fourier transform (FFT). The amplitude spectrum time series X1(q, f, n') (f: frequency, n': time frame number after windowing) is then obtained from the complex components in the frequency domain as follows:

X(q, f, n') = FFT[ x1(q, n) w(n - n') ] = Re{X(q, f, n')} + j Im{X(q, f, n')}   ... (1)

X1(q, f, n') = sqrt( Re{X(q, f, n')}^2 + Im{X(q, f, n')}^2 )   ... (2)

where j is the imaginary unit and Re{ } and Im{ } denote the real part and the imaginary part, respectively. In the noise subtraction step, the average noise amplitude spectrum is obtained, as in the following equation, from the spectrum N(q, f, n') of the EEG signal observed prior to the speech imagery:

Nav(q, f, n') = (1/17) Σ_{m=-8}^{+8} N(q, f, n' + m)   ... (3)

In the above equation the average noise spectrum is computed from the 8 frames before and after time n', but this may be set as appropriate for the system. The time n' is normally set in one of two ways:
(a) the user performs speech imagery following a prompt signal (a signal instructing the start of imagery) given by the speech-imagery recognition application system; or
(b) the user performs speech imagery following a fixed call to the application system, such as "Yamada-san" (a wake-up word).
In either case, N(q, f, n') is computed from the EEG observed in the interval before or after the speech imagery.
Then, for each electrode q, Nav(q, f, n') is subtracted from the spectrum X1(q, f, n') of the speech-imagery signal:

X2(q, f, n') = X1(q, f, n') - Nav(q, f, n')   ... (4)

FIG. 3 shows an example in which noise in the EEG has been removed by this processing. FIG. 3(A) shows the spectrum before noise removal and FIG. 3(B) after noise removal. Comparing FIGS. 3(A) and 3(B), the effect of subtracting the noise spectrum is clearly pronounced. The amplitude spectrum time series after noise removal is converted back to a waveform x2(q, n) by the inverse fast Fourier transform (IFFT).
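Purely to illustrate the noise subtraction of equations (1)-(4), the sketch below computes a windowed amplitude-spectrum time series, averages the noise spectrum, subtracts it, and resynthesizes the waveform with the original phases. The window length and hop size are assumed values, the ±8-frame average of equation (3) is simplified to a mean over the whole noise segment, and the flooring at zero is an added safeguard not stated in the text.

```python
import numpy as np
from scipy.signal import stft, istft

def subtract_noise(x1: np.ndarray, noise: np.ndarray, fs: float = 1000.0,
                   nperseg: int = 256, noverlap: int = 192) -> np.ndarray:
    """Spectral-subtraction sketch for one electrode (unit 3).

    x1    : band-limited EEG observed during speech imagery (1-D).
    noise : EEG observed before (or after) the imagery interval (1-D).
    Returns the denoised waveform x2(n) obtained by IFFT/overlap-add.
    """
    # Amplitude spectrum time series X1(f, n') of the imagery signal, cf. eqs. (1)-(2).
    f, t, X = stft(x1, fs=fs, nperseg=nperseg, noverlap=noverlap)
    amp, phase = np.abs(X), np.angle(X)

    # Average noise amplitude spectrum; the patent averages over +/-8 frames
    # (eq. (3)), simplified here to the mean over the whole noise segment.
    _, _, N = stft(noise, fs=fs, nperseg=nperseg, noverlap=noverlap)
    n_av = np.abs(N).mean(axis=1, keepdims=True)

    # Eq. (4): subtract the noise spectrum; the floor at zero is an added safeguard.
    amp2 = np.maximum(amp - n_av, 0.0)

    # Back to a waveform using the original phases (inverse FFT / overlap-add).
    _, x2 = istft(amp2 * np.exp(1j * phase), fs=fs, nperseg=nperseg, noverlap=noverlap)
    return x2
```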

After noise removal, it is also effective to extract a small number of independent sources from the nine electrode signals, that is, to apply independent component analysis (ICA) (Non-Patent Document 4). This removes unwanted components that the filtering cannot remove and selects a small number of effective sources from the nine discrete electrode signals. ICA, however, suffers from the so-called permutation problem, in which the order of the resulting independent components differs from one analysis to the next; a way of overcoming this drawback when introducing ICA into the present invention is described later.
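As a minimal sketch of applying ICA to the multi-electrode signals, one possible implementation uses scikit-learn's FastICA; the number of retained sources is an assumption, and the permutation problem mentioned above is not addressed here.

```python
import numpy as np
from sklearn.decomposition import FastICA

def reduce_sources(x2: np.ndarray, n_sources: int = 4) -> np.ndarray:
    """Estimate a small number of independent sources from multi-electrode EEG.

    x2       : (n_electrodes, n_samples) denoised signals (e.g. 9 electrodes).
    n_sources: assumed number of independent sources to keep.
    Returns an (n_sources, n_samples) array of source signals.
    """
    ica = FastICA(n_components=n_sources, random_state=0)
    # scikit-learn expects (n_samples, n_features), so electrodes become features.
    sources = ica.fit_transform(x2.T)   # shape (n_samples, n_sources)
    return sources.T
```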

The analysis processing unit 4 may directly use the spectrum time series X2(q, f, n') of the speech-imagery signal obtained by the preprocessing unit 3 after noise removal (and after extraction of q independent components), but an analysis method that brings out the effect of the present invention more fully, linear predictive analysis (LPA), is described below as an example. The analysis processing unit 4 can use either the spectrum or the line spectrum.
Linear predictive coding (LPC) is currently the worldwide standard for voice communication. In speech there are two kinds of sources: a periodic pulse train produced by the vocal folds and a random wave produced by constrictions of the vocal tract. For this reason, the excitation is held separately as a codebook, every excitation in the codebook is passed through the linear prediction coefficients of the speech (which model the vocal-tract transfer function), and the resulting synthetic speech is compared with the original speech, which requires complicated processing.

In the EEG, on the other hand, the source can be regarded as a random wave only, as shown in FIG. 4, so EEG synthesis is simpler than speech synthesis. Various algorithms, such as the Levinson-Durbin method, have been proposed for obtaining the linear prediction coefficients {αm} from the autocorrelation coefficients r2(τ) of the EEG x2(q, n) (Non-Patent Document 4). As shown in FIG. 4, the speech-imagery EEG x(n) of each electrode is obtained by passing the white noise w(n) of the signal source through the impulse response s(n) of the nervous system; the ☆ in FIG. 4 denotes convolution.

In the frequency domain, this convolution can be expressed as X(f) = W(f)S(f) = S(f) (with W(f) = 1), where S(f) is the transfer (frequency) function of the impulse response s(n) that carries the spoken-language information. S(f) can be obtained from the Fourier transform of the linear prediction coefficients {αm} as in the following equation:

S(f) = 1 / F[ Σ_{m=0}^{p} αm δ(n - m) ]   ... (5)

where δ(n - p) is a function marking each time n = p of the signal and F[ ] denotes the Fourier transform. In linear predictive analysis (LPA) of the EEG, the synthesis model S(f) is used as an inverse filter, as shown in FIG. 4, and the spectral component is obtained as

A(q, f, n') = σ |S(f)| = σ / | Σ_{m=0}^{p} αm exp(-j 2π f m) |   ... (6)

(σ is an amplitude bias value). A scheme of this kind, which makes the analysis accurate through the synthesis process, is called analysis-by-synthesis (AbS) and is also effective for EEG analysis. In the Fourier transform F[ ] of the above equations, zero points are appended to the p linear prediction coefficients (α0 = 1.0) (so-called zero-padding), so that a Fourier transform of arbitrary length, e.g. 128 points, 256 points, and so on, can be computed. Through this zero-padding the frequency resolution can be adjusted freely to 64 points, 128 points, and so on, and the spectral component A(q, f, n') is obtained.
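The following sketch, offered only as an illustration of equations (5)-(6), estimates the linear prediction coefficients of one windowed EEG frame with the Levinson-Durbin recursion and evaluates the all-pole amplitude spectrum on a zero-padded FFT grid. The model order, FFT length, window choice, and the use of the square root of the prediction-error power as the gain σ are assumptions.

```python
import numpy as np

def levinson_durbin(r: np.ndarray, order: int):
    """Levinson-Durbin recursion: autocorrelation r[0..order] -> LP coefficients.

    Returns (a, e) where a = [1, a1, ..., a_order] and e is the prediction-error power.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    e = r[0]
    for m in range(1, order + 1):
        acc = r[m] + np.dot(a[1:m], r[m - 1:0:-1])
        k = -acc / e
        a_prev = a.copy()
        for i in range(1, m):
            a[i] = a_prev[i] + k * a_prev[m - i]
        a[m] = k
        e *= (1.0 - k * k)
    return a, e

def lpa_spectrum(frame: np.ndarray, order: int = 12, nfft: int = 256) -> np.ndarray:
    """All-pole (LPA) amplitude spectrum of one windowed EEG frame, cf. eqs. (5)-(6)."""
    frame = frame * np.hamming(len(frame))
    # Biased autocorrelation r(0..order); a lag window could be applied here (cf. Fig. 5).
    full = np.correlate(frame, frame, mode="full")
    r = full[len(frame) - 1:len(frame) + order]
    a, e = levinson_durbin(r, order)
    # Zero-padded Fourier transform of [1, a1, ..., ap] gives the denominator of S(f).
    denom = np.fft.rfft(a, n=nfft)
    sigma = np.sqrt(max(e, 1e-12))          # assumed interpretation of the amplitude bias
    return sigma / np.maximum(np.abs(denom), 1e-12)
```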

FIG. 5 compares the spectral pattern obtained by LPA with the spectral pattern obtained by an ordinary Fourier transform. Several LPA spectral patterns are shown in FIG. 5; they were obtained with a window function called a lag window, which attenuates the autocorrelation coefficients as the lag τ increases (the top trace uses no lag window, the slope of the lag window increases toward the bottom, and without a lag window the peaks become sharp). As the figure shows, LPA can represent the spectrum by the small number of essential peaks that the EEG possesses.

When viewed through LPA, the spectrum of the EEG during speech imagery is represented by a small number of spectral peaks. From this it is inferred that in the brain (in particular in Broca's area, where the linguistic information of speech imagery appears) the linguistic representation is composed of groups of short-time sine waves (tone bursts); in other words, the linguistic representation is expressed by characteristic line spectra. FIG. 6 shows examples of tone-burst wave groups and their spectral shapes. A short-time sine wave is intrinsically described by a single parameter, namely a single frequency, but, as shown in the figure (and also in FIG. 5), because the signal has transients before and after it, its spectrum spreads out under ordinary frequency analysis.

The language feature extraction unit 5 extracts the line spectrum components as the "linguistic representation" from these spread-out spectra and outputs, through phoneme-specific convolution operators, a phoneme likelihood vector time series as the language features.
The processing is described below along the processing flow diagram of the language feature extraction unit shown in FIG. 7. The language feature extraction unit 5 receives the spectral time series of electrode q from the analysis processing unit 4 (step S1). The spectrum of the speech-imagery EEG may fluctuate by about ±5 Hz, as shown in FIG. 8(A). These frequency fluctuations are absorbed using a median filter, a kind of nonlinear filtering (step S2).

For the data within a fixed time width (several frames before and after time n') and frequency width (adjacent frequencies f-1, f, f+1), the median of the whole neighbourhood is taken as the representative value. Because values far from the median are discarded, this processing absorbs the frequency fluctuation. The output of the nonlinear filter is then generally smoothed with a Gaussian window or the like. FIG. 8(B) shows the improvement in frequency fluctuation when median filtering over a total of 7 frames (3 frames before and after the centre frame n') was applied to a 70 Hz-170 Hz EEG signal (4 msec frame period); the figure shows that the fluctuation is reduced. After this, the frequency analysis pattern is smoothed in the time direction with a Gaussian window (coefficients {1/4, 1/2, 1/4}), and the time frame is reduced from 4 msec to around 8 msec. The processing that absorbs the frequency fluctuation can also be carried out in the preprocessing unit 3, after the noise component has been subtracted on the amplitude spectrum and before the signal is converted back to a waveform.
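A minimal sketch of the fluctuation-absorbing step S2: a median filter over a 3-frequency × 7-frame neighbourhood followed by the {1/4, 1/2, 1/4} smoothing window along the time axis. The 2:1 frame decimation is one plausible reading of dropping the frame period from 4 msec to around 8 msec.

```python
import numpy as np
from scipy.ndimage import median_filter, convolve1d

def absorb_fluctuation(spec: np.ndarray) -> np.ndarray:
    """spec: (n_freq, n_frames) spectral time series A(f, n') of one electrode.

    Returns the median-filtered, time-smoothed (and 2:1 decimated) spectrum.
    """
    # Median over 3 adjacent frequencies x 7 adjacent frames (centre frame +/-3).
    med = median_filter(spec, size=(3, 7), mode="nearest")
    # Gaussian-like smoothing window {1/4, 1/2, 1/4} along the time axis.
    smooth = convolve1d(med, np.array([0.25, 0.5, 0.25]), axis=1, mode="nearest")
    # Drop every other frame: 4 msec frames -> roughly 8 msec frames.
    return smooth[:, ::2]
```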

Next, the line spectrum extraction process is described (step S3). In this processing, for every time frame (8 msec), the components derived from peaks appearing along the frequency axis are extracted as line spectrum components. Specifically:
(i) frequencies at which the slope along the frequency axis is zero, i.e. a local maximum (Δf = 0);
(ii) at inflection points (ΔΔf = 0):
if Δf > 0, frequencies at which ΔΔf changes from positive to negative;
if Δf < 0, frequencies at which ΔΔf changes from negative to positive.
Only when these conditions are satisfied is a component taken as a sinusoidal frequency component with its original amplitude, that is, as a line spectrum component.
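As one concrete reading of conditions (i) and (ii), the sketch below scans a single frame along the frequency axis and keeps, with its original amplitude, every bin that is a local maximum or a qualifying inflection point; the exact handling of the edges is an assumption.

```python
import numpy as np

def extract_line_spectrum(frame_spec: np.ndarray) -> np.ndarray:
    """frame_spec: amplitude spectrum of one 8-msec frame (1-D over frequency).

    Returns an array of the same length whose non-zero entries are the
    line spectrum components (peak-derived bins keep their original amplitude).
    """
    d1 = np.gradient(frame_spec)   # Δf : slope along the frequency axis
    d2 = np.gradient(d1)           # ΔΔf : curvature
    line = np.zeros_like(frame_spec)
    for f in range(1, len(frame_spec) - 1):
        is_peak = d1[f - 1] > 0 and d1[f + 1] < 0                          # (i) local maximum
        rising_shoulder = d1[f] > 0 and d2[f - 1] > 0 and d2[f + 1] < 0    # (ii) ΔΔf : + -> -
        falling_shoulder = d1[f] < 0 and d2[f - 1] < 0 and d2[f + 1] > 0   # (ii) ΔΔf : - -> +
        if is_peak or rising_shoulder or falling_shoulder:
            line[f] = frame_spec[f]
    return line
```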

FIG. 9 shows an example of extracting the line spectrum components of the EEG during speech imagery. In this example the data were collected under a task of imagining /ga-gi-gu-ge-go/ three times, as continuously as possible. By repeating the same sequence three times, an experienced annotator can learn the pattern of each syllable as shown in the figure and can build a database in which syllable labels are attached to the EEG data.
FIG. 9 shows the result of syllable labelling performed on a line spectrum obtained by pooling the nine-electrode line spectrum time series in the electrode direction, that is, by extracting a representative pattern from the nine electrodes (for example by taking a p-norm, where p = ∞ corresponds to taking the maximum). The pooling here is performed only for reading off the syllable labels; the phoneme feature extraction below operates on the original nine-electrode line spectrum components.
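The electrode-direction pooling mentioned above (a p-norm over the nine electrodes, with p = ∞ corresponding to the maximum) can be written in a few lines; as the text notes, this pooled pattern is used only for labelling.

```python
import numpy as np

def pool_electrodes(line_specs: np.ndarray, p: float = np.inf) -> np.ndarray:
    """line_specs: (n_electrodes, n_freq, n_frames) line spectrum time series.

    Returns a single (n_freq, n_frames) pattern pooled over the electrode axis.
    """
    if np.isinf(p):
        return line_specs.max(axis=0)                              # p = infinity: element-wise maximum
    return (np.abs(line_specs) ** p).sum(axis=0) ** (1.0 / p)      # general p-norm
```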

The final aim of the language feature extraction unit 5 is to extract phoneme features, that is, to obtain from the line spectrum components of each electrode the phoneme components, the smallest units of spoken-language information, in the form of phoneme feature vectors. The spoken-language information in the EEG has a so-called tensor structure spanning three axes: line spectrum (frequency information), electrode (spatial information), and frame (time information). FIG. 10 shows an example of a line spectrum time series spanning the 3 × 3 = 9 electrodes over Broca's area, here for the single syllable /ka/. The syllable patterns appearing in Broca's area emerge at different electrode positions each time, which hints at the flexible information-processing mechanism of the cranial nervous system. In the brain's spoken-language processing, syllables appear in Broca's area as the smallest units of utterance; during actual speech, however, the speech organs are controlled by muscle movements, and this control is carried out with articulatory parameters that correspond one-to-one with phonemes. Against this background, a process that extracts phoneme features from the syllable patterns of FIG. 10 observed in Broca's area can be assumed to exist, and a method for realizing this process on a computer is described below, following the flow of FIG. 11 for designing and using the phoneme-specific convolution operators.

The flow of FIG. 11 shows the computation of phoneme likelihood vectors by phoneme-specific convolution operators, in order to extract phonemes efficiently from the frequency-time patterns of the nine electrodes. First, syllables belonging to the same phoneme context (for phoneme /s/: /sa/, /shi/, /su/, /se/, /so/; for phoneme /a/: /a/, /ka/, /sa/, /ta/, /na/, /ha/, ..., /ga/, /za/, ...) are accumulated in memory (step S11). The technique of storing this information and drawing on it for the necessary processing is called pooling.

Next, principal component analysis is performed for each syllable (step S12), and the eigenvectors of each syllable are grouped by related phoneme, e.g. phoneme /s/: {ψ/sa/(m), ψ/shi/(m), ψ/su/(m), ψ/se/(m), ψ/so/(m)}, phoneme /a/: {ψ/a/(m), ψ/ka/(m), ψ/sa/(m), ψ/ta/(m), ψ/na/(m), ...}. The autocorrelation matrix is then computed from the eigenvectors of the same phoneme group and integrated into phoneme-specific autocorrelation matrices Rs, Ra, ... (step S13). From the phoneme-specific autocorrelation matrices, the phoneme-specific subspaces (eigenvectors) φ/s/(m), φ/a/(m) can be obtained. FIG. 12 shows the eigenvectors of phonemes /s/ and /a/ (displaying the accumulation of the top three axes).
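A compact sketch of steps S11-S13 under several assumptions: each labelled syllable token is a flattened frequency-time patch, PCA is performed per syllable, the leading eigenvectors of the syllables sharing a phoneme are pooled into a phoneme-specific autocorrelation matrix, and its leading eigenvectors form the phoneme subspace φk. The patch size and the number of retained axes are assumptions.

```python
import numpy as np

def syllable_eigenvectors(tokens: np.ndarray, n_axes: int = 3) -> np.ndarray:
    """tokens: (n_tokens, dim) flattened line-spectrum patches of one syllable.
    Returns the top n_axes principal axes (eigenvectors), one per row."""
    x = tokens - tokens.mean(axis=0)
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return vt[:n_axes]

def phoneme_subspace(syllable_axes: list[np.ndarray], n_axes: int = 3) -> np.ndarray:
    """Integrate the eigenvectors of all syllables containing one phoneme (step S13).

    syllable_axes: list of (n_axes, dim) arrays, e.g. for /s/: /sa/, /shi/, /su/, /se/, /so/.
    Returns (n_axes, dim): leading eigenvectors of the phoneme autocorrelation matrix.
    """
    stacked = np.vstack(syllable_axes)      # pooled syllable eigenvectors
    r = stacked.T @ stacked                 # phoneme-specific autocorrelation matrix R_k
    eigvals, eigvecs = np.linalg.eigh(r)
    phi = eigvecs[:, np.argsort(eigvals)[::-1][:n_axes]].T
    return phi / np.linalg.norm(phi, axis=1, keepdims=True)   # norm-normalized, cf. eq. (7)
```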

Next, by using the group of eigenvectors obtained for each phoneme k as a "phoneme-specific convolution operator", the phoneme similarity (likelihood) L(k) of an unknown nine-electrode (or, after ICA, fewer-channel) line spectrum time series can be computed (step S4, step S14, step S15):

L(k) = Max_q Σ_m < X(q, f, n'), φk(m)(f, n') >^2   ... (7)

where Max denotes taking the maximum over the q channels (electrodes or ICA components) and < , > denotes the inner product; X(q, f, n') and φ(f, n') are each norm-normalized in advance.
The vector obtained by arranging the K likelihoods L(k) of the phonemes k = 1, 2, ..., K is the phoneme feature vector. Equation (7) constitutes a phoneme-specific convolution operator from the phoneme eigenvectors φ(f, n'); a scalar likelihood L(k) is obtained for each phoneme k, and the vector of K such values is output from the language feature extraction unit 5 as time-series data (the phoneme likelihood vector time series) as the time n' of the input X(f, n') advances (step S5, step S16).
FIG. 13 shows an example in which syllable likelihoods (L(go), L(ro), ...) were computed from the phoneme likelihoods (L(g), L(o), ...) and displayed. The example shows, in gray scale, the syllable likelihoods obtained when the consecutive digits "1, 2, 3, 4, 5, 6, 7, 8, 9, 0" were imagined in this order. The vertical axis lists the syllables (from the top: i, chi, ni, sa, N, yo, o, go, ro, ku, na, ha, kyu, u, ze, e, noise). It can be seen that the syllables constituting the consecutive digits are obtained with high likelihood.
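Under the reconstruction of equation (7) given above (which assumes summed squared projections onto each phoneme subspace, maximized over channels, since the original equation image is not reproduced here), the likelihood vector at one time position could be computed as follows.

```python
import numpy as np

def phoneme_likelihoods(x: np.ndarray, subspaces: dict[str, np.ndarray]) -> dict[str, float]:
    """x: (n_electrodes, dim) flattened line-spectrum patch at one time position,
         each electrode's patch norm-normalized beforehand.
    subspaces: phoneme label -> (n_axes, dim) norm-normalized eigenvectors φ_k.

    Returns the likelihood L(k) for every phoneme k, following eq. (7) as
    reconstructed above: squared inner products summed over axes, maximum over electrodes.
    """
    likelihoods = {}
    for k, phi in subspaces.items():
        proj = x @ phi.T                                          # (n_electrodes, n_axes) inner products
        likelihoods[k] = float((proj ** 2).sum(axis=1).max())     # sum over axes, max over electrodes
    return likelihoods
```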

Because it is currently difficult to collect large amounts of speech-imagery data, the problem has been solved here in the form of phoneme-specific convolution operators. However, as brain databases for speech imagery become richer in the future, deep convolutional networks (DCN) and the like, which have recently been used widely in fields such as image processing, can be used in place of the phoneme-specific convolution operators.

The word/sentence recognition unit 6 recognizes words and sentences from the time-series data of phoneme feature vectors (more precisely, the phoneme likelihood vector time-series data). For word/sentence recognition, methods using hidden Markov models (HMMs), which are in practical use in the field of speech recognition (where triphones, which include the preceding and following phoneme context, are employed), and methods using deep neural networks (such as LSTMs) can be applied. Linguistic information (probabilities of word sequences), one of the advantages of current speech recognition, can be used in the same way. Furthermore, although misalignment along the time axis is a problem in speech imagery, the "spotting" processing used in current robust speech systems, which continuously searches for words and sentences along the time direction, is also effective for improving performance in speech imagery.
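None of the HMM or neural-network machinery mentioned above is reproduced here. Only to make the idea of spotting concrete, the sketch below slides a word template (a short sequence of phoneme likelihood vectors) over the input and reports the best-matching start frame by cosine similarity; the templates and the scoring are assumptions, not the patent's method.

```python
import numpy as np

def spot_word(likelihood_seq: np.ndarray, template: np.ndarray) -> tuple[int, float]:
    """likelihood_seq: (n_frames, n_phonemes) phoneme likelihood vector time series.
    template      : (t_frames, n_phonemes) reference pattern for one word.

    Returns (best start frame, best score) from a sliding cosine-similarity match.
    """
    n, t = len(likelihood_seq), len(template)
    tem = template / (np.linalg.norm(template) + 1e-12)
    best_pos, best_score = -1, -np.inf
    for start in range(0, n - t + 1):
        window = likelihood_seq[start:start + t]
        win = window / (np.linalg.norm(window) + 1e-12)
        score = float((win * tem).sum())
        if score > best_score:
            best_pos, best_score = start, score
    return best_pos, best_score
```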

The post-processing/output unit 7 receives the recognized word (sequence) and performs the necessary display and speech output. It can also be given a function that, based on the recognition results for predetermined words or sentences, feeds back to the user whether the multi-electrode EEG sensor is in the correct position and helps the user find the proper position by moving the EEG sensor in response to on-screen or spoken instructions on a terminal such as a smartphone.

The post-processing/output unit 7 displays a screen that assists the user in adjusting the electrode group to its optimum position while performing speech imagery. FIG. 14 shows a display screen presented by the post-processing/output unit 7; the user adjusts the position of the electrode group while watching this screen.
As shown in FIG. 14, when a test phrase (such as "Yamada-san") is imagined, the EEG is input through the EEG input unit 2, and the accuracy of the recognition result can be indicated on the screen of the post-processing/output unit 7 by colour, by the size of the circle, by the depth of the shading (as in the example in the figure), and so on. In FIG. 14, the first electrode position (1) is shown in white, the next position (2) in light gray, the next position (3) in gray, the next position (4) in dark gray, and the next position (5) in light gray again; the user can therefore tell that electrode position (4) is the optimum one. This illustrates a function that lets the user recalibrate the sensor position by moving it in the direction that yields correct results while watching the change in accuracy over time.

The voice recall recognition device 1 shown in FIG. 1 can be implemented on a mobile terminal. It can also be implemented on a server, in which case it may consist of a plurality of servers. It can further be implemented with a mobile terminal and a server together, with part of the processing performed on the mobile terminal and the rest on the server, which again may be one server or several.

Although the voice recall recognition device 1 has been described, as shown in FIG. 1, as consisting of the EEG input unit 2, the preprocessing unit 3, the analysis processing unit 4, the language feature extraction unit 5, the word/sentence recognition unit 6, and the post-processing/output unit 7, the voice recall recognition device may also include the wearing tool and the electrode group.

FIG. 15 is a diagram showing another configuration example of the voice recall recognition device.
As shown in FIG. 15, the voice recall recognition device 10 comprises a wearing tool 11, a mobile terminal 12, and a server 13. The wearing tool 11 is a wearing tool for a voice recall recognition device that recognizes spoken language from the EEG during speech imagery. The wearing tool 11 has a sheet portion 21 that holds an electrode group 22, the electrode group 22 arranged around Broca's area, and a processing unit 23 that outputs the signals from the electrode group 22. The electrode group 22 consists of the nine electrodes described above, but the number of electrodes is not limited. The processing unit 23 may have a communication function and can perform part or all of the processing of the voice recall recognition device 1 shown in FIG. 1.

The processing unit 23 of the wearing tool 11, the mobile terminal 12, and the server 13 are each constituted by a computer having, for example, a CPU (central processing unit), memory, ROM (read-only memory), and a hard disk. The mobile terminal 12 can perform part or all of the processing of the voice recall recognition device 1 shown in FIG. 1, and so can the server 13.
The voice recall recognition method for recognizing spoken language from the EEG during speech imagery is executed by the wearing tool 11, the mobile terminal 12, and/or the server 13, which can execute it alone or in cooperation. For example, the method can be executed by the mobile terminal 12 and the server 13 together.

A program for causing a computer to execute the voice recall recognition processing that recognizes spoken language from the EEG during speech imagery is downloaded to or stored on the hard disk or the like, and causes the computer to execute an analysis process of analyzing the groups of discrete EEG signals input for each electrode from the electrode group and outputting a spectral time series, and an extraction process of extracting a phoneme feature vector time series based on the spectral components of each electrode.

FIG. 16 is a diagram showing another configuration example of the voice recall recognition device.
As shown in FIG. 16, the voice recall recognition device 20 consists of the wearing tool 11 and the server 13. The configuration of the wearing tool 11 is as described with reference to FIG. 15, but here the processing unit 23 of the wearing tool 11 has a function of communicating directly with the server 13. By having the wearing tool 11 exchange information directly with the server 13, the functions of the voice recall recognition device can be realized.

FIG. 17 is a diagram showing another configuration example of the voice recall recognition device.
As shown in FIG. 17, the voice recall recognition device 30 consists of the wearing tool 11 alone. When the processing unit 23 of the wearing tool 11 implements all of the functions of the voice recall recognition device shown in FIG. 1, the voice recall recognition device can be realized by the wearing tool 11 by itself.

As described above, according to the present embodiment, groups of line spectrum components serving as the linguistic representation can be extracted directly from the EEG during speech imagery and further converted into a phoneme feature vector time series, which has the advantage that the existing speech recognition framework can be exploited.

The following supplementary notes are further disclosed with respect to the above embodiment.

(Supplementary note 1)
A voice recall recognition method for recognizing spoken language from the EEG during speech imagery, comprising:
an analysis processing step of analyzing the group of discrete EEG signals input for each electrode of an electrode group and outputting a spectral time series; and
an extraction step of outputting a phoneme feature vector time series based on the spectral time series.

(Supplementary note 2)
The voice recall recognition method according to supplementary note 1, further comprising an input step of converting the EEG input from the electrode group into groups of discrete signals.
(Supplementary note 3)
The voice recall recognition method according to supplementary note 1 or 2, further comprising a preprocessing step of removing noise in the EEG by subtracting an average noise amplitude spectrum from the spectrum of the speech-imagery signal obtained by transforming the group of discrete signals of each electrode into the frequency domain.

(Appendix 4)
The voice recall recognition method according to Appendix 3, further comprising a step of performing independent component analysis that extracts a small number of independent information sources from each electrode signal after the noise removal.
(Appendix 5)
The voice recall recognition method according to any one of Appendices 1 to 4, further comprising a recognition step of recognizing the spoken language based on the phoneme feature vector time series.
(Appendix 6)
The voice recall recognition method according to any one of Appendices 1 to 5, further comprising an output step of outputting the recognized spoken language.
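For the independent component analysis of Appendix 4, one possible realization is FastICA applied to the denoised multi-electrode signals; the choice of FastICA and of four components is an assumption made only for illustration.

    import numpy as np
    from sklearn.decomposition import FastICA

    def separate_sources(denoised, n_sources=4):
        """denoised: (n_electrodes, n_samples) -> (n_sources, n_samples)."""
        ica = FastICA(n_components=n_sources, random_state=0)
        # FastICA expects (n_samples, n_features), so electrodes are treated as features
        sources = ica.fit_transform(denoised.T)
        return sources.T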

(Appendix 7)
The voice recall recognition method according to Appendix 6, further comprising a step of displaying a screen that assists in adjusting the electrode group to its optimum position while the voice is being recalled.
(Appendix 8)
The voice recall recognition method according to any one of Appendices 1 to 7, wherein the analysis processing step extracts the spectral time series by applying linear prediction analysis.
(Appendix 9)
The voice recall recognition method according to any one of Appendices 1 to 8, wherein the analysis processing step includes a step of absorbing frequency fluctuations based on the discrete signals of each electrode.
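The linear prediction analysis of Appendix 8 can be sketched as fitting an all-pole model per frame and evaluating its spectral envelope. The model order and frame handling are assumptions, and the moving-average smoothing below stands in, purely as an assumption, for the frequency-fluctuation absorption of Appendix 9.

    import numpy as np
    from scipy.linalg import solve_toeplitz

    def lpc_coefficients(frame, order=12):
        """Return LPC coefficients a[1..order] for one windowed frame."""
        r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
        return solve_toeplitz((r[:order], r[:order]), r[1:order + 1])

    def lpc_spectrum(frame, order=12, n_freqs=256):
        """All-pole spectral envelope |1 / A(e^jw)| evaluated on n_freqs points."""
        a = lpc_coefficients(frame, order)
        A = np.fft.rfft(np.concatenate(([1.0], -a)), n=2 * n_freqs)
        return 1.0 / np.abs(A[:n_freqs])

    def smooth_over_frames(spectra, width=3):
        """Moving average over adjacent frames; spectra: (n_freqs, n_frames)."""
        kernel = np.ones(width) / width
        return np.apply_along_axis(lambda s: np.convolve(s, kernel, mode="same"), 1, spectra)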

(Appendix 10)
The voice recall recognition method according to any one of Appendices 1 to 9, wherein the analysis processing step extracts, for each time frame, the frequencies derived from peaks on the frequency axis as line spectrum components.
(Appendix 11)
The voice recall recognition method according to any one of Appendices 1 to 10, wherein the extraction step outputs a phoneme likelihood vector time series, which is a language feature, using a predetermined convolution operator.
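A rough sketch combining Appendices 10 and 11: pick the strongest spectral peaks per time frame as line spectrum components, then apply a convolution over the resulting feature time series to produce phoneme likelihood vectors. In practice the convolution weights would be learned; here they are passed in, and every numeric choice (number of lines, kernel width) is an assumption.

    import numpy as np
    from scipy.signal import find_peaks

    def line_spectrum(frame_spectrum, n_lines=8):
        """Return the bin indices of the strongest spectral peaks in one frame."""
        peaks, props = find_peaks(frame_spectrum, height=0.0)
        strongest = np.argsort(props["peak_heights"])[::-1][:n_lines]
        lines = np.zeros(n_lines)
        lines[:len(strongest)] = np.sort(peaks[strongest])
        return lines

    def phoneme_likelihoods(features, kernels):
        """features: (n_features, n_frames); kernels: (n_phonemes, n_features, width)."""
        n_ph, _, width = kernels.shape
        n_out = features.shape[1] - width + 1
        scores = np.zeros((n_ph, n_out))
        for p in range(n_ph):
            for t in range(n_out):
                scores[p, t] = np.sum(kernels[p] * features[:, t:t + width])
        e = np.exp(scores - scores.max(axis=0, keepdims=True))  # softmax over phonemes
        return e / e.sum(axis=0, keepdims=True)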

(Appendix 12)
The voice recall recognition method according to any one of Appendices 1 to 11, which is executed by a mobile terminal, a server, or a mobile terminal and a server.
(Appendix 13)
The voice recall recognition method according to any one of Appendices 1 to 12, further comprising an output step of outputting signals from an electrode group arranged around Broca's area and provided on a wearer.

Thus, according to the voice recall recognition device, wearer, method, and program of the present invention, the brain waves at the time of voice recall can be converted directly into a group of line spectra and a group of phoneme features serving as a language representation, making it possible to provide a spoken-language brain-computer interface (BCI) built on the existing speech recognition framework.

1 Voice recall recognition device
2 EEG input unit
3 Preprocessing unit
4 Analysis processing unit
5 Language feature extraction unit
6 Word/character recognition unit
7 Post-processing/output unit

Claims (16)

1. A voice recall recognition device that recognizes spoken language from brain waves at the time of voice recall, comprising:
an analysis processing unit that analyzes the discrete EEG signal group input for each electrode from an electrode group and outputs a spectral time series; and
an extraction unit that outputs a phoneme feature vector time series based on the spectral time series,
wherein the extraction unit outputs a phoneme likelihood vector time series, which is a language feature, using a predetermined convolution operator.
2. The voice recall recognition device according to claim 1, further comprising an EEG input unit that converts the brain waves input from the electrode group into a discrete signal group.
3. The voice recall recognition device according to claim 1 or claim 2, further comprising a preprocessing unit that removes noise in the brain waves by subtracting an average noise amplitude spectrum from the spectrum of the voice recall signal obtained by converting the discrete signal group of each electrode into the frequency domain.
4. The voice recall recognition device according to claim 3, wherein the preprocessing unit performs independent component analysis that extracts a small number of independent information sources from each electrode signal after the noise removal.
5. The voice recall recognition device according to any one of claims 1 to 4, further comprising a recognition unit that recognizes the spoken language based on the phoneme feature vector time series.
6. The voice recall recognition device according to claim 5, further comprising an output unit that outputs the spoken language recognized by the recognition unit.
7. The voice recall recognition device according to claim 6, wherein the output unit displays a screen that assists in adjusting the electrode group to its optimum position while recognition by the recognition unit is being executed.
8. The voice recall recognition device according to any one of claims 1 to 7, wherein the analysis processing unit extracts the spectral time series by applying linear prediction analysis.
9. The voice recall recognition device according to any one of claims 1 to 8, wherein the analysis processing unit performs a process of absorbing frequency fluctuations based on the discrete signals of each electrode.
10. The voice recall recognition device according to any one of claims 1 to 9, wherein the analysis processing unit extracts, for each time frame, the frequencies derived from peaks on the frequency axis as line spectrum components.
11. The voice recall recognition device according to any one of claims 1 to 10, further comprising an electrode group arranged around Broca's area.
12. The voice recall recognition device according to claim 11, further comprising a wearer to be worn on the head.
13. The voice recall recognition device according to any one of claims 1 to 11, wherein the voice recall recognition device is composed of a mobile terminal, a server, or a mobile terminal and a server.
14. A wearer for a voice recall recognition device that recognizes spoken language from brain waves at the time of voice recall, comprising:
an electrode group arranged around Broca's area; and
a processing unit that outputs signals from the electrode group,
wherein the voice recall recognition device executes an analysis process of analyzing the discrete EEG signal group output from the processing unit for each electrode and outputting a spectral time series, and an extraction process of outputting a phoneme feature vector time series based on the spectral time series, and
the extraction process includes outputting a phoneme likelihood vector time series, which is a language feature, using a predetermined convolution operator.
15. A computer-executed voice recall recognition method for recognizing spoken language from brain waves at the time of voice recall, the method comprising:
an analysis processing step of analyzing the discrete EEG signal group input for each electrode from an electrode group and outputting a spectral time series; and
an extraction step of outputting a phoneme feature vector time series based on the spectral time series,
wherein the extraction step includes outputting a phoneme likelihood vector time series, which is a language feature, using a predetermined convolution operator.
16. A program for causing a computer to execute a voice recall recognition process of recognizing spoken language from brain waves at the time of voice recall, the program causing the computer to execute:
an analysis process of analyzing the discrete EEG signal group input for each electrode from an electrode group and outputting a spectral time series; and
an extraction process of extracting a phoneme feature vector time series based on the spectral components of each electrode,
wherein the extraction process includes outputting a phoneme likelihood vector time series, which is a language feature, using a predetermined convolution operator.
JP2019097202A 2019-05-23 2019-05-23 Voice recall recognition device, wearer, voice recall recognition method and program Active JP7043081B2 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
JP2019097202A JP7043081B2 (en) 2019-05-23 2019-05-23 Voice recall recognition device, wearer, voice recall recognition method and program
EP20809757.6A EP3973861A1 (en) 2019-05-23 2020-05-22 Speech imagery recognition device, wearing fixture, speech imagery recognition method, and program
CN202080037965.1A CN113873944A (en) 2019-05-23 2020-05-22 Speech association recognition device, wearing tool, speech association recognition method, and program
US17/613,658 US20220238113A1 (en) 2019-05-23 2020-05-22 Speech imagery recognition device, wearing fixture, speech imagery recognition method, and program
PCT/JP2020/020342 WO2020235680A1 (en) 2019-05-23 2020-05-22 Speech imagery recognition device, wearing fixture, speech imagery recognition method, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2019097202A JP7043081B2 (en) 2019-05-23 2019-05-23 Voice recall recognition device, wearer, voice recall recognition method and program

Publications (3)

Publication Number Publication Date
JP2020191021A JP2020191021A (en) 2020-11-26
JP2020191021A5 JP2020191021A5 (en) 2022-01-06
JP7043081B2 true JP7043081B2 (en) 2022-03-29

Family

ID=73454620

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2019097202A Active JP7043081B2 (en) 2019-05-23 2019-05-23 Voice recall recognition device, wearer, voice recall recognition method and program

Country Status (5)

Country Link
US (1) US20220238113A1 (en)
EP (1) EP3973861A1 (en)
JP (1) JP7043081B2 (en)
CN (1) CN113873944A (en)
WO (1) WO2020235680A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101932682B1 (en) * 2016-08-29 2019-03-20 정금진 Steam boiler of multi-pipe type

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009297059A (en) 2008-06-10 2009-12-24 Toyota Central R&D Labs Inc Brain training support apparatus
US20120022391A1 (en) 2010-07-22 2012-01-26 Washington University In St. Louis Multimodal Brain Computer Interface
JP2017074356A (en) 2015-10-16 2017-04-20 国立大学法人広島大学 Sensitivity evaluation method

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2515875B2 (en) * 1989-03-07 1996-07-10 日本電信電話株式会社 A syllable recognition device using EEG topography
JPH066118B2 (en) * 1989-10-14 1994-01-26 元 田村 EEG analyzer
US7054454B2 (en) * 2002-03-29 2006-05-30 Everest Biomedical Instruments Company Fast wavelet estimation of weak bio-signals using novel algorithms for generating multiple additional data frames
CN1991976A (en) * 2005-12-31 2007-07-04 潘建强 Phoneme based voice recognition method and system
JP4411442B2 (en) 2007-02-20 2010-02-10 国立大学法人 岡山大学 EEG-motor command converter
US9788043B2 (en) * 2008-11-07 2017-10-10 Digimarc Corporation Content interaction methods and systems employing portable devices
KR101783959B1 (en) * 2009-08-18 2017-10-10 삼성전자주식회사 Portable sound source playing apparatus for testing hearing ability and method for performing thereof
CN102781322B (en) * 2010-06-11 2015-02-25 松下电器产业株式会社 Evaluation system of speech sound hearing, method of same
WO2016011189A1 (en) * 2014-07-15 2016-01-21 The Regents Of The University Of California Frequency-multiplexed speech-sound stimuli for hierarchical neural characterization of speech processing
JP6580882B2 (en) * 2015-06-24 2019-09-25 株式会社東芝 Speech recognition result output device, speech recognition result output method, and speech recognition result output program
US11717686B2 (en) * 2017-12-04 2023-08-08 Neuroenhancement Lab, LLC Method and apparatus for neuroenhancement to facilitate learning and performance
US11478603B2 (en) * 2017-12-31 2022-10-25 Neuroenhancement Lab, LLC Method and apparatus for neuroenhancement to enhance emotional response
CN109741733B (en) * 2019-01-15 2023-01-31 河海大学常州校区 Voice phoneme recognition method based on consistency routing network
US11756540B2 (en) * 2019-03-05 2023-09-12 Medyug Technology Private Limited Brain-inspired spoken language understanding system, a device for implementing the system, and method of operation thereof
KR20210076451A (en) * 2019-12-16 2021-06-24 현대자동차주식회사 User interface system and operation method thereof

Also Published As

Publication number Publication date
WO2020235680A1 (en) 2020-11-26
JP2020191021A (en) 2020-11-26
CN113873944A (en) 2021-12-31
US20220238113A1 (en) 2022-07-28
EP3973861A1 (en) 2022-03-30

Similar Documents

Publication Publication Date Title
Giri et al. Attention wave-u-net for speech enhancement
Kingsbury et al. Robust speech recognition using the modulation spectrogram
Narendra et al. The detection of Parkinson's disease from speech using voice source information
CN111048071B (en) Voice data processing method, device, computer equipment and storage medium
Darabkh et al. An efficient speech recognition system for arm‐disabled students based on isolated words
Catellier et al. Wawenets: A no-reference convolutional waveform-based approach to estimating narrowband and wideband speech quality
Moselhy et al. LPC and MFCC performance evaluation with artificial neural network for spoken language identification
WO2014062521A1 (en) Emotion recognition using auditory attention cues extracted from users voice
Agrawal et al. Modulation filter learning using deep variational networks for robust speech recognition
Heckmann et al. A hierarchical framework for spectro-temporal feature extraction
Bulut et al. Low-latency single channel speech enhancement using u-net convolutional neural networks
CN108198576A (en) A kind of Alzheimer&#39;s disease prescreening method based on phonetic feature Non-negative Matrix Factorization
Ismail et al. Mfcc-vq approach for qalqalahtajweed rule checking
Mini et al. EEG based direct speech BCI system using a fusion of SMRT and MFCC/LPCC features with ANN classifier
Fazel et al. Sparse auditory reproducing kernel (SPARK) features for noise-robust speech recognition
JP7043081B2 (en) Voice recall recognition device, wearer, voice recall recognition method and program
Abdulbaqi et al. Residual recurrent neural network for speech enhancement
Sharon et al. An empirical study of speech processing in the brain by analyzing the temporal syllable structure in speech-input induced EEG
Martínez et al. Denoising sound signals in a bioinspired non-negative spectro-temporal domain
Angrick et al. Speech Spectrogram Estimation from Intracranial Brain Activity Using a Quantization Approach.
Krishna et al. Continuous Silent Speech Recognition using EEG
Kayser et al. Denoising convolutional autoencoders for noisy speech recognition
Murugan et al. Efficient Recognition and Classification of Stuttered Word from Speech Signal using Deep Learning Technique
Nemala et al. Biomimetic multi-resolution analysis for robust speaker recognition
Shome et al. Non-negative frequency-weighted energy-based speech quality estimation for different modes and quality of speech

Legal Events

Date Code Title Description
20211126 A521 Request for written amendment filed (Free format text: JAPANESE INTERMEDIATE CODE: A523)
20211126 A621 Written request for application examination (Free format text: JAPANESE INTERMEDIATE CODE: A621)
20211126 A871 Explanation of circumstances concerning accelerated examination (Free format text: JAPANESE INTERMEDIATE CODE: A871)
20220111 A131 Notification of reasons for refusal (Free format text: JAPANESE INTERMEDIATE CODE: A131)
20220112 A521 Request for written amendment filed (Free format text: JAPANESE INTERMEDIATE CODE: A523)
TRDD Decision of grant or rejection written
20220208 A01 Written decision to grant a patent or to grant a registration (utility model) (Free format text: JAPANESE INTERMEDIATE CODE: A01)
20220309 A61 First payment of annual fees (during grant procedure) (Free format text: JAPANESE INTERMEDIATE CODE: A61)
R150 Certificate of patent or registration of utility model (Ref document number: 7043081; Country of ref document: JP; Free format text: JAPANESE INTERMEDIATE CODE: R150)