JPH07502126A - Method and apparatus for recognizing individual words of input speech - Google Patents
Method and apparatus for recognizing individual words of input speech
- Publication number
- JPH07502126A (also indexed as JP5502468A and JP50246893A)
- Authority
- JP
- Japan
- Prior art keywords
- input
- individual words
- plane
- correlogram
- neuron
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
Landscapes
- Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Machine Translation (AREA)
- Image Analysis (AREA)
- Navigation (AREA)
- Character Discrimination (AREA)
Claims (1)
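1. A method for recognizing individual words of input speech, comprising the following method steps:
   - converting the speech input within a predetermined period into an electrical speech signal;
   - at time intervals t_s determined by the temporal duration of phonemes, determining the instantaneous spectral amplitude distribution of the speech signal and representing it as spectral vectors S_i (i = 0, 1, ..., m − 1), where each element (S_{i,0}, S_{i,1}, ..., S_{i,n−1}) of a spectral vector S_i represents the amplitude of a frequency band of bandwidth b_0;
   - forming a spectrogram S from the spectral vectors S_i determined within that period according to the following formula: [formula not reproduced]
   - determining from the spectrogram S a correlogram K with coordinates j, h, k, each element K_{j,h,k} of the correlogram K being formed according to the following formula: [formula not reproduced]
   - using the correlogram K as a word-specific feature pattern for classifying the spoken individual words.
2. The method according to claim 1, wherein the predetermined time interval t_s between two successive spectral vectors S_i and S_{i+1} is equal to 32 ms.
3. The method according to claim 1 or 2, wherein the following conditions are selected for the indices j, h, k of the correlogram K: k > 0; k × b_0 ≤ 1 kHz; j + k ≤ n − 1; j, h ≥ 0; h × t_s < 500 ms.
4. The method according to claim 1, 2 or 3, wherein a neural network is used to classify the spoken individual words.
5. The method according to claim 4, wherein the neural network has an input plane with a first number of neurons and an output plane with a second number of neurons, each element of the word-specific correlogram K is connected to a neuron of the input plane, the output of each of these neurons is connected to every neuron of the output plane, and the output of one neuron of the output plane indicates one predetermined recognized individual word.
6. The method according to claim 4 or 5, wherein each neuron has the following nonlinear transfer function: f(x) = x / (|x| + 1).
7. An apparatus for carrying out the method according to any one of claims 1 to 6, comprising a digital signal processor that is connected to a program memory, a working memory and an input/output unit via a bus system of data, address and control lines.
8. The apparatus according to claim 7, wherein the apparatus is configured as an IC component.
9. The method according to any one of claims 2 to 6, wherein, on the basis of the recognized individual word, a dialing process is triggered in a telephone set using the telephone number assigned to that individual word.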
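The first step of claim 1 turns the speech signal into spectral vectors S_i, one every t_s (32 ms per claim 2), each holding the amplitudes of n frequency bands of equal width. The patent text here does not say how the bands are computed, so the following is only a minimal sketch under stated assumptions: an 8 kHz sampling rate, hypothetical 250 Hz bands (16 bands up to 4 kHz), non-overlapping unwindowed frames of length t_s, and a band amplitude taken as the sum of the DFT magnitudes falling inside the band. The function name `spectral_vectors` and all defaults are illustrative, not from the patent.

```python
import cmath
import math
from typing import List


def spectral_vectors(signal: List[float], fs_hz: float, t_s_ms: float = 32.0,
                     band_hz: float = 250.0, n_bands: int = 16) -> List[List[float]]:
    """Return one band-amplitude vector S_i per frame of length t_s (hypothetical helper)."""
    frame_len = int(round(fs_hz * t_s_ms / 1000.0))  # samples per frame, e.g. 256 at 8 kHz
    bin_hz = fs_hz / frame_len                      # spacing of the DFT bins in Hz
    vectors: List[List[float]] = []
    # Non-overlapping frames, so consecutive spectral vectors are exactly t_s apart.
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        vec = [0.0] * n_bands
        for b in range(frame_len // 2 + 1):  # DFT bins from DC up to Nyquist
            # Plain DFT of bin b without a window; a real system would use an FFT.
            coeff = sum(x * cmath.exp(-2j * math.pi * b * t / frame_len)
                        for t, x in enumerate(frame))
            band = int(b * bin_hz // band_hz)  # index of the band this bin falls into
            if band < n_bands:
                vec[band] += abs(coeff)        # assumed: band amplitude = summed bin magnitudes
        vectors.append(vec)
    return vectors
```

Stacking the returned rows gives an m × n matrix with one row per S_i, which is one plausible reading of the spectrogram S; a pure 1 kHz tone sampled at 8 kHz, for example, lands in band 4 (1000 to 1250 Hz).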
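The formulas for the spectrogram S and the correlogram K in claim 1 are not reproduced in this text, so the sketch below rests on an assumed definition: S is the m × n matrix whose rows are the spectral vectors S_i, and each correlogram element is a time-lagged, frequency-shifted product sum K_{j,h,k} = Σ_i S_{i,j} · S_{i+h,j+k}, with j the band, h the lag in frames and k the shift in bands. The index limits follow claims 2 and 3 (t_s = 32 ms, k > 0, k × b ≤ 1 kHz, j + k ≤ n − 1, j, h ≥ 0, h × t_s < 500 ms). The function name and the 250 Hz default band width are hypothetical.

```python
from typing import Dict, List, Tuple

# S[i][j]: amplitude of frequency band j in spectral vector S_i (frame i)
Spectrogram = List[List[float]]
Correlogram = Dict[Tuple[int, int, int], float]


def correlogram(S: Spectrogram, t_s_ms: float = 32.0, band_hz: float = 250.0,
                max_lag_ms: float = 500.0, max_shift_hz: float = 1000.0) -> Correlogram:
    """Assumed correlogram K[(j, h, k)] = sum_i S[i][j] * S[i + h][j + k]."""
    m = len(S)
    n = len(S[0]) if m else 0
    # Time lags: h >= 0 and h * t_s < 500 ms; h < m so at least one product exists.
    lags = [h for h in range(m) if h * t_s_ms < max_lag_ms]
    # Frequency shifts: k > 0 and k * b <= 1 kHz; k < n keeps j + k inside the spectrum.
    shifts = [k for k in range(1, n) if k * band_hz <= max_shift_hz]
    K: Correlogram = {}
    for h in lags:
        for k in shifts:
            for j in range(n - k):  # enforces j >= 0 and j + k <= n - 1
                K[(j, h, k)] = sum(S[i][j] * S[i + h][j + k] for i in range(m - h))
    return K
```

With t_s = 32 ms the lag limit admits h = 0 to 15 (15 × 32 ms = 480 ms, whereas 16 × 32 ms = 512 ms is excluded), and with the assumed 250 Hz bands the shift limit admits k = 1 to 4. Returning a dictionary keeps the irregular (j, h, k) index set explicit; flattening it in sorted key order yields a fixed-length feature vector.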
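Claims 5 and 6 describe a network with an input plane (one neuron per correlogram element), an output plane (one neuron per word), full connections from every input neuron to every output neuron, and the transfer function f(x) = x / (|x| + 1) in each neuron. The following minimal sketch shows only the forward pass; the weights are hypothetical inputs (the claims do not specify training), and picking the most active output neuron as the recognized word is an assumed decision rule.

```python
from typing import List, Sequence


def transfer(x: float) -> float:
    """Transfer function of claim 6: f(x) = x / (|x| + 1), bounded in (-1, 1)."""
    return x / (abs(x) + 1.0)


def classify(features: Sequence[float], weights: List[List[float]],
             words: List[str]) -> str:
    """Return the word indicated by the most active output neuron.

    features: flattened correlogram, one value per input-plane neuron.
    weights:  weights[o][i] connects input neuron i to output neuron o
              (hypothetical values from some training procedure).
    words:    words[o] is the word indicated by output neuron o.
    """
    if len(weights) != len(words):
        raise ValueError("need one weight row per output neuron / word")
    # Input plane: each neuron receives one correlogram element.
    inputs = [transfer(x) for x in features]
    # Output plane: every input neuron feeds every output neuron.
    outputs = [transfer(sum(w * a for w, a in zip(row, inputs))) for row in weights]
    # Assumed decision rule: the most active output neuron names the word.
    best = max(range(len(outputs)), key=outputs.__getitem__)
    return words[best]
```

Because f is monotonic and saturates towards ±1, the ranking of output neurons is preserved while very large correlogram values cannot dominate the weighted sums.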
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AT148891 | 1991-07-25 | ||
AT1488/91 | 1991-07-25 | ||
PCT/AT1992/000100 WO1993002448A1 (de) | 1991-07-25 | 1992-07-21 | Method and arrangement for recognizing individual words of spoken speech |
Publications (2)
Publication Number | Publication Date |
---|---|
JPH07502126A (ja) | 1995-03-02 |
JP3274133B2 (ja) | 2002-04-15 |
Family
ID=3514977
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP50246893A, JP3274133B2 (ja); Expired - Fee Related | Method and apparatus for recognizing individual words of input speech | 1991-07-25 | 1992-07-21 |
Country Status (10)
Country | Link |
---|---|
US (1) | US5721807A (ja) |
EP (1) | EP0595889B1 (ja) |
JP (1) | JP3274133B2 (ja) |
AT (1) | ATE159374T1 (ja) |
DE (1) | DE59208973D1 (ja) |
DK (1) | DK0595889T3 (ja) |
ES (1) | ES2108127T3 (ja) |
GR (1) | GR3025319T3 (ja) |
NO (1) | NO306887B1 (ja) |
WO (1) | WO1993002448A1 (ja) |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100446471B1 (ko) * | 1998-09-28 | 2004-12-04 | 주식회사 삼양사 (Samyang Corporation) | High-flow polycarbonate resin |
US8983832B2 (en) * | 2008-07-03 | 2015-03-17 | The Board Of Trustees Of The University Of Illinois | Systems and methods for identifying speech sound features |
US9015093B1 (en) | 2010-10-26 | 2015-04-21 | Michael Lamport Commons | Intelligent control with hierarchical stacked neural networks |
US8775341B1 (en) | 2010-10-26 | 2014-07-08 | Michael Lamport Commons | Intelligent control with hierarchical stacked neural networks |
US8699637B2 (en) | 2011-08-05 | 2014-04-15 | Hewlett-Packard Development Company, L.P. | Time delay estimation |
US8738554B2 (en) | 2011-09-16 | 2014-05-27 | International Business Machines Corporation | Event-driven universal neural network circuit |
US8874498B2 (en) | 2011-09-16 | 2014-10-28 | International Business Machines Corporation | Unsupervised, supervised, and reinforced learning via spiking computation |
US8626684B2 (en) | 2011-12-14 | 2014-01-07 | International Business Machines Corporation | Multi-modal neural network for universal, online learning |
US8799199B2 (en) | 2011-12-14 | 2014-08-05 | International Business Machines Corporation | Universal, online learning in multi-modal perception-action semilattices |
US20150134580A1 (en) * | 2013-11-12 | 2015-05-14 | Persyst Development Corporation | Method And System For Training A Neural Network |
US20190070517A1 (en) * | 2017-09-05 | 2019-03-07 | Creata (Usa) Inc. | Digitally-Interactive Toy System and Method |
EP3502974A1 (de) | 2017-12-20 | 2019-06-26 | Siemens Aktiengesellschaft | Method for implementing a neural network |
CN110335617A (zh) * | 2019-05-24 | 2019-10-15 | 国网新疆电力有限公司乌鲁木齐供电公司 (State Grid Xinjiang Electric Power Co., Ltd., Urumqi Power Supply Company) | Noise analysis method for a substation |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2139052A (en) * | 1983-04-20 | 1984-10-31 | Philips Electronic Associated | Apparatus for distinguishing between speech and certain other signals |
US4975961A (en) * | 1987-10-28 | 1990-12-04 | Nec Corporation | Multi-layer neural network to which dynamic programming techniques are applicable |
US4928302A (en) * | 1987-11-06 | 1990-05-22 | Ricoh Company, Ltd. | Voice actuated dialing apparatus |
US5285522A (en) * | 1987-12-03 | 1994-02-08 | The Trustees Of The University Of Pennsylvania | Neural networks for acoustical pattern recognition |
JP2764277B2 (ja) * | 1988-09-07 | 1998-06-11 | 株式会社日立製作所 (Hitachi, Ltd.) | Speech recognition apparatus |
JPH0375860A (ja) * | 1989-08-18 | 1991-03-29 | Hitachi Ltd | Personalized terminal |
DE69030561T2 (de) * | 1989-12-28 | 1997-10-09 | Sharp Kk | Speech recognition device |
US5473759A (en) * | 1993-02-22 | 1995-12-05 | Apple Computer, Inc. | Sound analysis and resynthesis using correlograms |
1992
- 1992-07-21 JP JP50246893A patent/JP3274133B2/ja not_active Expired - Fee Related
- 1992-07-21 US US08/185,800 patent/US5721807A/en not_active Expired - Fee Related
- 1992-07-21 DE DE59208973T patent/DE59208973D1/de not_active Expired - Fee Related
- 1992-07-21 ES ES92915507T patent/ES2108127T3/es not_active Expired - Lifetime
- 1992-07-21 WO PCT/AT1992/000100 patent/WO1993002448A1/de active IP Right Grant
- 1992-07-21 AT AT92915507T patent/ATE159374T1/de not_active IP Right Cessation
- 1992-07-21 EP EP92915507A patent/EP0595889B1/de not_active Expired - Lifetime
- 1992-07-21 DK DK92915507.5T patent/DK0595889T3/da not_active Application Discontinuation
1994
- 1994-01-24 NO NO940241A patent/NO306887B1/no not_active IP Right Cessation
1997
- 1997-11-07 GR GR970402961T patent/GR3025319T3/el unknown
Also Published As
Publication number | Publication date |
---|---|
NO940241L (no) | 1994-01-24 |
DE59208973D1 (de) | 1997-11-20 |
GR3025319T3 (en) | 1998-02-27 |
WO1993002448A1 (de) | 1993-02-04 |
EP0595889B1 (de) | 1997-10-15 |
EP0595889A1 (de) | 1994-05-11 |
DK0595889T3 (da) | 1998-05-25 |
JP3274133B2 (ja) | 2002-04-15 |
ES2108127T3 (es) | 1997-12-16 |
NO306887B1 (no) | 2000-01-03 |
ATE159374T1 (de) | 1997-11-15 |
US5721807A (en) | 1998-02-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Lin et al. | | Speech enhancement using multi-stage self-attentive temporal convolutional networks |
CN105321525B (zh) | | System and method for reducing VoIP communication resource overhead |
JP2022529641A (ja) | | Speech processing method, apparatus, electronic device and computer program |
JPH07502126A (ja) | | Method and apparatus for recognizing individual words of input speech |
CN110675891B (zh) | | Speech separation method and module based on a multi-layer attention mechanism |
Luo et al. | | Ultra-lightweight speech separation via group communication |
CN105225672B (zh) | | System and method for dual-microphone directional noise suppression incorporating fundamental-frequency information |
CN105118501A (zh) | | Speech recognition method and system |
CN110600014B (zh) | | Model training method and apparatus, storage medium and electronic device |
CN109036460A (zh) | | Speech processing method and apparatus based on a multi-model neural network |
CN111243617B (zh) | | Speech enhancement method that reduces MFCC feature distortion using deep learning |
EP1913591B1 (en) | | Enhancement of speech intelligibility in a mobile communication device by controlling the operation of a vibrator in dependance of the background noise |
CN107293305A (zh) | | Method and apparatus for improving recording quality based on a blind source separation algorithm |
Shi et al. | | End-to-End Monaural Speech Separation with Multi-Scale Dynamic Weighted Gated Dilated Convolutional Pyramid Network. |
Barros et al. | | Estimation of speech embedded in a reverberant and noisy environment by independent component analysis and wavelets |
Saeki et al. | | Real-Time, Full-Band, Online DNN-Based Voice Conversion System Using a Single CPU. |
CN113763966B (zh) | | End-to-end text-independent voiceprint recognition method and system |
CN113782044B (zh) | | Speech enhancement method and apparatus |
CN113077798B (zh) | | Distress-call device for elderly people living at home |
Gandhiraj et al. | | Auditory-based wavelet packet filterbank for speech recognition using neural network |
CN111276132A (zh) | | Speech processing method, electronic device and computer-readable storage medium |
Zaman et al. | | Classification of Harmful Noise Signals for Hearing Aid Applications using Spectrogram Images and Convolutional Neural Networks |
CN110136741A (zh) | | Single-channel speech enhancement method based on multi-scale context |
CN110459235A (zh) | | Reverberation cancellation method, apparatus, device and storage medium |
Erten et al. | | Voice extraction by on-line signal separation and recovery |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | R250 | Receipt of annual fees | JAPANESE INTERMEDIATE CODE: R250 |
| | R250 | Receipt of annual fees | JAPANESE INTERMEDIATE CODE: R250 |
| | R250 | Receipt of annual fees | JAPANESE INTERMEDIATE CODE: R250 |
| | FPAY | Renewal fee payment (event date is renewal date of database) | PAYMENT UNTIL: 20080201; year of fee payment: 6 |
| | FPAY | Renewal fee payment (event date is renewal date of database) | PAYMENT UNTIL: 20090201; year of fee payment: 7 |
| | FPAY | Renewal fee payment (event date is renewal date of database) | PAYMENT UNTIL: 20090201; year of fee payment: 7 |
| | FPAY | Renewal fee payment (event date is renewal date of database) | PAYMENT UNTIL: 20100201; year of fee payment: 8 |
| | LAPS | Cancellation because of no payment of annual fees | |