KR970050112A - Real time voice recognition - Google Patents

Real time voice recognition

Info

Publication number
KR970050112A
Authority
KR
South Korea
Prior art keywords
speech
signal
feature
neural network
value
Prior art date
Application number
KR1019950047885A
Other languages
Korean (ko)
Other versions
KR100202424B1 (en)
Inventor
정호선
조영탁
Original Assignee
정호선
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 정호선
Priority to KR1019950047885A
Publication of KR970050112A
Application granted
Publication of KR100202424B1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/04 Segmentation; Word boundary detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present invention relates to a speech recognition method comprising: a signal difference accumulation process of storing signal differences obtained from a plurality of sample speech signals; a segmentation process of calculating a sequence of the maximum values of the stored signal differences and, from that sequence, separating the speech into syllables and distinguishing consonants from vowels; a normalization process of normalizing the speech signal in the time domain according to the consonants and vowels separated in the segmentation process and extracting speech features; and a process of classifying the speech from the features by applying the extracted features to an integer input-driven multilayer perceptron neural network having the same learning constant and activation function as a real-valued neural network.

According to the present invention, a speech recognition system can be implemented that is low-cost and small, yet capable of real-time processing with a high recognition rate.
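The time-domain normalization step named in the abstract can be illustrated with a minimal Python sketch. The full text of the patent is not published, so the details here are assumptions: the function names `resample` and `normalize_syllable`, the use of linear interpolation, and the idea of stretching the consonant and vowel segments to separate fixed lengths are all hypothetical choices made for illustration, not the inventors' stated method.

```python
def resample(segment, target_len):
    """Linearly interpolate `segment` to exactly `target_len` points."""
    if not segment:
        raise ValueError("segment must not be empty")
    if target_len == 1 or len(segment) == 1:
        return [float(segment[0])] * target_len
    # Map output index i to a fractional position in the input segment.
    scale = (len(segment) - 1) / (target_len - 1)
    out = []
    for i in range(target_len):
        pos = i * scale
        lo = int(pos)
        hi = min(lo + 1, len(segment) - 1)
        frac = pos - lo
        out.append(segment[lo] * (1 - frac) + segment[hi] * frac)
    return out


def normalize_syllable(consonant, vowel, consonant_len, vowel_len):
    """Assumed scheme: stretch each part to a fixed length, then concatenate.

    The result always has consonant_len + vowel_len values, so it can be fed
    to a classifier with a fixed number of inputs.
    """
    return resample(consonant, consonant_len) + resample(vowel, vowel_len)


features = normalize_syllable([0, 10], [1, 2, 3, 4, 5], 3, 3)
print(features)
```

Whatever the actual scheme, the point of such a step is that syllables spoken at different speeds yield feature vectors of the same length.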

Description

Real time speech recognition method

This publication discloses only the essential parts, so the full text is not included.

FIG. 1 is a block diagram showing the hardware configuration of a speech recognition toy according to the present invention.

FIG. 2 is a detailed block diagram of the speech recognition apparatus shown in FIG. 1.

FIG. 3 is a block diagram for explaining the speech recognition algorithm according to the present invention.

Claims (3)

1. A speech recognition method for extracting features from a speech signal and classifying speech from the extracted features, the method comprising: a signal difference accumulation process of storing signal differences obtained from a plurality of sample speech signals; a segmentation process of calculating a sequence of the maximum values of the stored signal differences and, from that sequence, separating the speech into syllables and distinguishing consonants from vowels; a normalization process of normalizing the speech signal in the time domain according to the consonants and vowels separated in the segmentation process and extracting speech features; and a process of classifying the speech from the features by applying the extracted features to an integer input-driven multilayer perceptron neural network having the same learning constant and activation function as a real-valued neural network.

2. The method of claim 1, wherein the signal difference accumulation process comprises: selecting a predetermined number of samples as one frame and sequentially calculating the difference between successive samples; storing the calculated difference values in the corresponding ones of a plurality of mel-scale storage units; counting the number of data items accumulated in each mel-scale storage unit; and shifting along the time axis by a predetermined time and repeating, for the remaining frames, the same process as for the first frame.
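The signal difference accumulation process of claim 2 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the claim does not say how a difference value is mapped to a mel-scale storage unit, so the sketch assumes the absolute difference is compared against a caller-supplied list of ascending thresholds (`bin_edges`, a hypothetical parameter). The names `accumulate_differences`, `frame_len`, and `hop` are likewise invented for this example.

```python
from bisect import bisect_right


def accumulate_differences(samples, frame_len, hop, bin_edges):
    """Per frame, bin successive-sample differences and count entries per bin.

    Returns one (sums, counts) pair per frame, where sums[k] is the total of
    the absolute differences stored in bin k and counts[k] is how many
    differences landed there.
    """
    results = []
    # Shift the frame along the time axis by `hop` samples each time.
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        sums = [0] * (len(bin_edges) + 1)
        counts = [0] * (len(bin_edges) + 1)
        for prev, cur in zip(frame, frame[1:]):
            diff = abs(cur - prev)
            # Assumed mapping: pick the storage unit by difference magnitude.
            k = bisect_right(bin_edges, diff)
            sums[k] += diff
            counts[k] += 1
        results.append((sums, counts))
    return results


frames = accumulate_differences([0, 1, 3, 7, 7, 2], frame_len=4, hop=2, bin_edges=[2, 4])
for sums, counts in frames:
    print(sums, counts)
```

The sketch uses only integer additions, comparisons, and counters, which is consistent with the low-cost, real-time goal stated in the abstract.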
3. A method for setting an optimal offset value in the activation function of an integer input-driven multilayer perceptron neural network, the method comprising: a first step of selecting 0 as the initial offset value; a second step of counting the number of iterations during which the total error is no longer reduced by further iteration; a third step of calculating the average of the net values if the counted number of iterations is greater than a predetermined constant; a third step of increasing the offset by one point if the calculated average is positive and decreasing the offset by one point if the calculated average is negative; a fourth step of calculating the weights and the error using the new offset value increased or decreased in the preceding step; and a fifth step of repeating from the second step until the total error reaches a predetermined desired value.

※ Note: This is published based on the content of the initial application.
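The offset-setting procedure of claim 3 can be read as a control loop wrapped around ordinary training. The sketch below is a hedged illustration only. `train_step` and `mean_net` are hypothetical callables standing in for one training iteration (returning the total error) and for the average net value of the network; neither is defined in the published text. `patience` plays the role of the claim's "predetermined constant", and `max_iters` is an added safety bound that the claim does not mention.

```python
def tune_offset(train_step, mean_net, target_error, patience, max_iters=10000):
    """Adjust an integer activation offset while training stalls.

    train_step(offset) -> total error after one iteration with this offset.
    mean_net(offset)   -> average net value over the training set.
    Returns (offset, last_error).
    """
    offset = 0                      # start from an offset of 0
    best = float("inf")
    stalled = 0
    err = None
    for _ in range(max_iters):
        err = train_step(offset)    # weights and error with the current offset
        if err <= target_error:     # stop once the desired error is reached
            break
        if err < best:
            best, stalled = err, 0
        else:
            stalled += 1            # count iterations without improvement
        if stalled > patience:
            avg = mean_net(offset)
            if avg > 0:
                offset += 1         # positive average: raise offset by one point
            elif avg < 0:
                offset -= 1         # negative average: lower offset by one point
            best, stalled = float("inf"), 0
    return offset, err


# Toy stand-ins: the error is smallest when the offset equals 3.
offset, err = tune_offset(lambda o: abs(3 - o), lambda o: 3 - o,
                          target_error=0, patience=2)
print(offset, err)
```

Because the offset moves in whole steps, it stays an integer, which fits the integer-only arithmetic of the network described in the claims.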
KR1019950047885A 1995-12-08 1995-12-08 Real time speech recognition method KR100202424B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1019950047885A KR100202424B1 (en) 1995-12-08 1995-12-08 Real time speech recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1019950047885A KR100202424B1 (en) 1995-12-08 1995-12-08 Real time speech recognition method

Publications (2)

Publication Number Publication Date
KR970050112A (en) 1997-07-29
KR100202424B1 (en) 1999-06-15

Family

ID=19438634

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1019950047885A KR100202424B1 (en) 1995-12-08 1995-12-08 Real time speech recognition method

Country Status (1)

Country Link
KR (1) KR100202424B1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102413692B1 (en) 2015-07-24 2022-06-27 삼성전자주식회사 Apparatus and method for calculating acoustic score for speech recognition, speech recognition apparatus and method, and electronic device
KR102192678B1 (en) 2015-10-16 2020-12-17 삼성전자주식회사 Apparatus and method for normalizing input data of acoustic model, speech recognition apparatus

Also Published As

Publication number Publication date
KR100202424B1 (en) 1999-06-15

Legal Events

Date Code Title Description
A201 Request for examination
E902 Notification of reason for refusal
E701 Decision to grant or registration of patent right
GRNT Written decision to grant
LAPS Lapse due to unpaid annual fee