KR20020024742A - An apparatus for abstracting the characteristics of voice signal using Non-linear method and the method thereof

Info

Publication number: KR20020024742A
Application number: KR1020000056532A
Authority: KR
Inventors: 백승표
Original assignee: 김대중; (주)네오싸이피아
Priority date: 2000-09-26
Filing date: 2000-09-26
Publication date: 2002-04-01

Abstract

PURPOSE: A device and a method for extracting the characteristics of a sound signal by a nonlinear method are provided to reduce the processing time of the sound signal, increase the data compression rate, simplify the required database construction work, and obtain good-quality sound signal data. CONSTITUTION: The device includes a sound signal input part; a correlation coefficient detection part (10) for correcting the frequency amplitude deviation of the sound signal using a correlation coefficient; an average zero crossing rate detection part (11) for classifying the phonemes of the input sound signal as voiced sound, unvoiced sound, or silence; a voiced sound nonlinear processing part (13) for extracting the signal characteristics of voiced sound; an unvoiced sound nonlinear processing part (23) for extracting the signal characteristics of unvoiced sound; a zero point processing part (33) for processing silence as zero point; and a combining part (30) for linearly connecting and combining the nonlinearly processed voiced sound data, unvoiced sound data, and silence data at linear time intervals and storing the characteristics of the sound signal in a database.

Description

An apparatus for abstracting the characteristics of voice signal using Non-linear method and the method thereof

The present invention relates to an apparatus and method for extracting features of a voice signal, and more particularly, to an apparatus and method for extracting features of a voice signal by a nonlinear method and converting them into data.

Voice is the most natural medium of communication used by humans; people rely heavily on voice to express their intentions and to generate information. In addition, with today's rapid changes in computing and communication environments, the need for a man-machine interface that uses voice as its medium is increasing. The marked tendency to apply speech recognition technology to database query applications such as stock information, telephone directory assistance, and local information services demonstrates the need for such a man-machine interface.

Beyond these examples, speech recognition systems have a wide range of applications, such as operating a computer's operating system by voice, education, and entertainment. To meet these needs, speech recognition systems based on various techniques are being developed.

As shown in FIG. 1, one conventional apparatus for extracting features of a voice signal is the "Voice signal feature extraction method and apparatus" of Korean Laid-Open Patent Publication No. 1998-0034074. It divides the voice signal using a plurality of frequency band filters, detects zero crossing points and peak values in the divided signals, generates a histogram of the peak values for each divided signal by a plurality of nonlinear processing means, and combines the histograms to output the features of the voice signal.

Because this method passes the input voice signal through a multi-layer bank of band filters and nonlinearly processes each result, it requires unnecessary band filtering. Because the nonlinear processing stage uses only zero crossing detection and its result, it cannot represent the bandwidth characteristics of the voice well. In addition, processing the voice signal with the results of all the band filters taken into account lengthens the processing time, and because noisy bands other than those containing phonemes are also included, the data extracted as the characteristics of the voice signal become inaccurate, which makes the method poorly suited to practical use.

As shown in FIG. 2, another conventional example is the "Speech synthesis apparatus and method thereof" of Korean Laid-Open Patent Publication No. 1998-038570. It comprises a voiced/unvoiced detector that detects voiced and unvoiced sounds in the voice signal; a pitch detector that detects the pitch of the voice signal when a voiced sound is detected; a pitch set group forming unit that forms pitch set groups from the detected pitches; a correlation coefficient detector that detects correlation coefficients in the pitch set groups; a linear prediction application unit that averages the detected correlation coefficients to calculate prediction coefficients linearly; a coefficient synthesizer that synthesizes the voice signal from the calculated prediction coefficients; and a Gaussian processor that treats the detected voice signal as Gaussian noise when it is unvoiced.

This method is limited to speech synthesis. Moreover, it must perform multi-stage processing, combining correlation coefficient detection with linear prediction to compensate for the shortcomings of its nonlinear processing, so the process is complicated and the processing time is long. Because the result of the linear prediction method is used as the input to the nonlinear processing, the characteristics of the voice cannot be represented well.

The present invention has been made to solve the above problems, and provides an apparatus and a method for extracting features of a voice signal by a nonlinear method.

More specifically, in order to improve sound quality and increase the compression rate of voice data when a voice signal is converted into data or encoded and decoded, the invention takes the phoneme itself, rather than a sentence or a word, as the unit of processing. It samples the voice signal by dividing frequency and time nonlinearly for each frequency band, classifies the signal into voiced sound, unvoiced sound, and silence by a zero crossing method, and then performs nonlinear processing suited to each class using the minimum number of band filters for each band. The objects of the invention are thereby to provide noise removal and improved sound quality when the extracted features are converted into data and output again; to provide a high voice data compression rate for the transmission and storage of voice signals; to shorten the time needed to build an extensive database by storing phoneme parameters rather than a database of languages or sentences; and to shorten the processing time of the voice signal itself.

FIG. 1 is a view showing a conventional voice signal feature extraction method and apparatus.

FIG. 2 is a view showing a conventional speech synthesis apparatus and method, as another example of the prior art.

FIG. 3 is a block diagram showing an embodiment of an apparatus for extracting features of a voice signal by a nonlinear method according to the present invention.

FIG. 4 is a flowchart showing the process of a method for extracting features of a voice signal by a nonlinear method according to the present invention.

FIG. 5 is a flowchart showing the subroutine of the voiced sound nonlinear processing in the process of FIG. 4.

FIG. 6 is a flowchart showing the subroutine of the unvoiced sound nonlinear processing in the process of FIG. 4.

* Explanation of symbols for the main parts of the drawings *

1: nonlinear voice signal feature extraction apparatus

10: correlation coefficient detector

11: average zero crossing rate detector

12: switch unit

13: voiced sound nonlinear processing unit

14: first voiced sound band filter

15: second voiced sound band filter

16, 26: comparison unit

17, 27: band comparison unit

23: unvoiced sound nonlinear processing unit

24: first unvoiced sound band filter

25: second unvoiced sound band filter

30: combination unit

33: zero point processing unit

To achieve the above objects, the apparatus and method for extracting features of a voice signal by a nonlinear method according to the present invention improve noise removal and sound quality and increase the compression rate of voice data when a voice signal is converted into data or encoded and decoded. This is achieved by storing in a database the parameters of the phonemes themselves rather than of sentences or words, sampling the voice signal by dividing frequency and time nonlinearly for each frequency band, classifying the signal into voiced sound, unvoiced sound, and silence by a zero crossing method, and then performing nonlinear processing suited to each class using the minimum number of band filters for each band.

Hereinafter, an embodiment of the apparatus and method for extracting features of a voice signal using a nonlinear method according to the present invention will be described in more detail with reference to the accompanying drawings.

FIG. 3 is a block diagram showing an embodiment of an apparatus for extracting features of a voice signal by a nonlinear method according to the present invention.

The apparatus (1) for extracting features of a voice signal by a nonlinear method according to the present invention comprises: a voice signal input unit (not shown) that receives the voice signal; a correlation coefficient detector (10) that detects a correlation coefficient by applying a Gaussian curve, in order to correct the shift of the reference value caused by noise-induced deviation of the input voice signal; an average zero crossing rate detector (11) that detects the average zero crossing rate of the voice signal corrected by the correlation coefficient detector (10) and separates the signal into voiced sound, unvoiced sound, or silence; a switch unit (12) that routes the sounds classified by the average zero crossing rate detector (11) so that nonlinear processing can be performed on voiced and unvoiced sound separately; a voiced sound nonlinear processing unit (13) that nonlinearly processes the voiced sound; an unvoiced sound nonlinear processing unit (23) that nonlinearly processes the unvoiced sound; a zero point processing unit (33) that zero-processes the silence; and a combination unit (30) that combines the nonlinearly processed voiced and unvoiced sounds and the zero-processed silence to generate data for the voice signal.

Here, after the correlation coefficient detector (10) has used the Gaussian curve to correct the bandwidth of the voice signal shifted by noise, the average zero crossing rate detector (11) detects the zero crossing rate of the voice signal, which contains multiple frequencies, without passing it through band filters. This makes it possible to classify each phoneme in the input voice signal as voiced sound, unvoiced sound, or silence without the plurality of band filters otherwise needed to divide the signal into frequency bands. A segment is classified as silence if the number of zero crossings in 10 msec is 0 to 5 or 81 or more, as voiced sound if it is 6 to 30, and as unvoiced sound if it is 31 to 80.
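The thresholds stated above can be illustrated with a minimal Python sketch. It assumes a plain sign-change count over non-overlapping 10 msec frames; the function names, the frame layout, and the handling of zero-valued samples are illustrative assumptions and are not taken from the specification, and the Gaussian-curve correction performed by the correlation coefficient detector (10) is not modeled.

```python
from typing import List, Sequence

FRAME_MS = 10  # window length stated in the text


def count_zero_crossings(frame: Sequence[float]) -> int:
    """Count sign changes between consecutive samples of one frame.

    Assumption: a sample equal to zero is treated as non-negative.
    """
    crossings = 0
    for prev, cur in zip(frame, frame[1:]):
        if (prev >= 0) != (cur >= 0):
            crossings += 1
    return crossings


def classify_by_crossings(crossings: int) -> str:
    """Apply the thresholds from the text to a per-10-msec crossing count."""
    if 6 <= crossings <= 30:
        return "voiced"
    if 31 <= crossings <= 80:
        return "unvoiced"
    return "silence"  # 0 to 5, or 81 and above


def classify_frames(samples: Sequence[float], sample_rate: int) -> List[str]:
    """Label each complete, non-overlapping 10 msec frame (hypothetical layout)."""
    frame_len = sample_rate * FRAME_MS // 1000
    labels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        labels.append(classify_by_crossings(count_zero_crossings(frame)))
    return labels
```

For instance, a pure 1 kHz tone crosses zero about 20 times in 10 msec and would therefore fall in the voiced range under these thresholds.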

The data for each phoneme of the voice signal, classified as voiced sound, unvoiced sound, or silence by the average zero crossing rate detector (11), then undergo the following processing.

First, voiced sound is sent to the voiced sound nonlinear processing unit (13). This unit contains a first voiced sound band filter (14) and a second voiced sound band filter (15), so that the voiced sound can be nonlinearly processed in two different bands and the results compared. It also includes a comparison unit (16) that checks whether the results from the two band filters are equal, and a band comparison unit (17) that, when the results do not match and the signal is fed back for re-comparison, determines whether the band of the signal being re-compared still lies within the voiced sound band.

The voiced sound nonlinear processing unit (13) uses two band filters and compares the nonlinearly processed results because the extracted voice signal may span different bands, and the influence of signal components in other bands can prevent the average zero crossing rate detector (11) from accurately distinguishing voiced sound, unvoiced sound, and silence.

The voiced sound nonlinear processing unit (13) uses the voiced sound nonlinear processing method described below to check whether each phoneme was correctly classified as voiced. If the phoneme is confirmed as voiced, it is classified as voiced and the voice signal characteristics of the phoneme are extracted. If it is determined that the average zero crossing rate detector (11) misclassified a sound that is not voiced, the phoneme is treated as unvoiced and its sampled data signal is sent to the unvoiced sound nonlinear processing unit (23).

While the voiced sound processing proceeds, the unvoiced sound detected by the average zero crossing rate detector (11) is sent to the unvoiced sound nonlinear processing unit (23). In a manner similar to the voiced sound nonlinear processing unit (13), this unit takes both the signals classified as unvoiced by the average zero crossing rate detector (11) and the signals reclassified as unvoiced and forwarded by the voiced sound nonlinear processing unit (13), nonlinearly processes them using two different unvoiced sound band filters, namely the first unvoiced sound band filter (24) and the second unvoiced sound band filter (25), and compares the results. If the signal is confirmed as unvoiced, it is classified as unvoiced and the voice signal characteristics of the phoneme are extracted. If the comparison indicates that the signal is not unvoiced, the band of the phoneme is changed and the unvoiced comparison is performed again. During this process, the band comparison unit (27) checks whether the band of the signal being processed has left the unvoiced sound band. If it has not, the unvoiced sound nonlinear processing is repeated; if it has, the signal is treated as silence and sent to the zero point processing unit (33) for zero point processing.

The zero point processing unit (33) performs zero point processing on silence in parallel with the voiced and unvoiced sound nonlinear processing, and also zero-processes any signal determined to be silence during the unvoiced sound nonlinear processing.
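The fallback chain among the three processing units (voiced, then unvoiced, then silence) can be sketched as follows. This is only an illustration of the routing order: the function name `route_phoneme` and the two boolean flags are hypothetical stand-ins for the outcomes of the comparisons performed inside the voiced and unvoiced sound nonlinear processing units, which the text does not specify at code level.

```python
def route_phoneme(initial_label: str,
                  voiced_confirmed: bool,
                  unvoiced_confirmed: bool) -> str:
    """Return the class under which a phoneme is finally processed.

    initial_label      -- label from the zero crossing stage:
                          "voiced", "unvoiced", or "silence"
    voiced_confirmed   -- hypothetical outcome of the voiced-band comparison
    unvoiced_confirmed -- hypothetical outcome of the unvoiced-band comparison
    """
    label = initial_label
    if label == "voiced":
        if voiced_confirmed:
            return "voiced"      # features extracted as voiced sound
        label = "unvoiced"       # handed over to the unvoiced unit
    if label == "unvoiced":
        if unvoiced_confirmed:
            return "unvoiced"    # features extracted as unvoiced sound
        label = "silence"        # handed over to zero point processing
    return "silence"             # zero-processed
```

A phoneme first labeled as voiced can thus end up in any of the three classes, whereas one first labeled as silence is zero-processed directly.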

The data signals for the phoneme parameters, separated in this way into voiced sound, unvoiced sound, and silence and processed accordingly, are merged by the combination unit (30), which integrates the input voice signal phoneme by phoneme and converts it into data so that it can be handled like ordinary data in subsequent stages.

The output of the combination unit (30) can be used in any voice-related application, such as compression and transmission of voice data, execution of voice recognition commands, and output of voice signals.

Extracting voice signal characteristics requires a database as reference data; conventionally this is a database of words, sentences, and the like for the voice signal.

In the present invention, the database used for voice signal comparison does not store information about words or sentences. Instead, it stores phoneme parameters for phonemes such as 'ㄱ' (g), 'ㄴ' (n), ..., '아' (a), ..., and uses them for extracting and synthesizing the characteristic data of the voice signal. This not only resolves the difficulty of building the database used for conventional voice signal feature extraction but also markedly reduces the time and effort required.

FIG. 4 is a flowchart showing the overall process of the method for extracting features of a voice signal by the nonlinear method.

As shown, the method comprises: detecting a correlation coefficient for the input voice signal using a Gaussian curve in the correlation coefficient detector (10) (S401, S402); detecting the average zero crossing rate of the voice signal to which the Gaussian curve has been applied, in the average zero crossing rate detector (11) (S403); classifying the voice signal as voiced sound, unvoiced sound, or silence in the switch unit (12) according to the detected average zero crossing rate (S404); performing voiced sound nonlinear processing in the voiced sound nonlinear processing unit (13) when the signal is classified as voiced (S405); performing unvoiced sound nonlinear processing when the phoneme extracted from the input voice signal is classified as unvoiced by the average zero crossing rate detector (11) (S406); performing zero point processing on silence in the zero point processing unit (33) when the phoneme is classified as silence (S407); and combining the phoneme parameter data values for the voiced, unvoiced, and silent phonemes produced by the preceding three steps to convert the voice signal into data (S408, S409).

Next, the method for extracting features of a voice signal by the nonlinear method is described in more detail.

First, when a voice signal is input, the correlation coefficient detector (10) obtains the correlation using a Gaussian curve, removes the influence of noise on the voice signal, and sends the processed signal to the average zero crossing rate detector (11) (S401, S402). The average zero crossing rate detector (11) does not separate the signal into bands; it detects the average zero crossing rate at the frequencies of phonemes having the same frequency characteristics within the voice signal. The data it detects are the zero crossing rate for the voice signal frequency, the start time at which a signal with a given zero crossing rate begins, and the duration for which that zero crossing rate persists (S403). Once the average zero crossing rate is detected, the voice signal is separated into voiced sound, unvoiced sound, and silence according to that rate. As stated above, a segment is classified as voiced sound if the number of zero crossings in 10 msec is 6 to 30, as unvoiced sound if it is 31 to 80, and as silence if it is 0 to 5 or 81 or more (S404).

Voiced sound is nonlinearly processed in the voiced sound nonlinear processing unit (13), which at the same time determines whether the voiced sound was classified correctly (S405). If so, the voice signal data characteristics of the voiced sound are extracted and converted into data. If step S405 determines that the signal is not voiced, the signal that the average zero crossing rate detector (11) classified as voiced is reclassified as unvoiced and sent to the unvoiced sound nonlinear processing unit (23). The unvoiced sound nonlinear processing unit (23) performs nonlinear processing of the unvoiced sound detected by the average zero crossing rate detector (11) in parallel with the voiced sound nonlinear processing, and also determines whether the phoneme signals reclassified as unvoiced by the voiced sound nonlinear processing unit (13) are indeed unvoiced. If a phoneme signal is determined to be unvoiced, the voice signal data characteristics of the unvoiced sound are extracted and converted into data.

In parallel with the voiced and unvoiced sound nonlinear processing, the zero point processing unit (33) performs zero point processing on the signals classified as silence by the average zero crossing rate detector (11), and also on the signals determined to be silence during the unvoiced sound nonlinear processing and sent to the zero point processing unit (33) (S407).
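The start time and duration detected in step S403 can be pictured as a run-length grouping of per-frame labels. The following Python sketch assumes that frames are 10 msec long and already labeled; the function name `group_segments` and the dictionary layout are illustrative assumptions rather than the data format used by the apparatus.

```python
from itertools import groupby
from typing import Dict, List, Sequence, Union

Segment = Dict[str, Union[str, int]]


def group_segments(labels: Sequence[str], frame_ms: int = 10) -> List[Segment]:
    """Merge consecutive frames with the same label into one segment.

    Each segment records the label, the time at which the run starts,
    and how long the run lasts, both in milliseconds.
    """
    segments: List[Segment] = []
    frame_index = 0
    for label, run in groupby(labels):
        run_length = len(list(run))
        segments.append({
            "label": label,
            "start_ms": frame_index * frame_ms,
            "duration_ms": run_length * frame_ms,
        })
        frame_index += run_length
    return segments
```

Each resulting segment can then be handed to the voiced, unvoiced, or zero point processing path according to its label.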

Next, the nonlinearly processed voice signal characteristic data for the voiced and unvoiced sounds are combined with the voice signal data for the silence zero-processed by the zero point processing unit (33) (S408). The combined signal is converted into data and can be used in voice recognition systems, voice-to-text output systems, voice data transmission systems, voice signal storage systems, and the like.

When the combination unit (30) combines the voice signals extracted by the nonlinear method, the nonlinearly extracted voiced sound, unvoiced sound, and silence data are linearly interconnected at linear time intervals, which preserves the data between phonemes and improves sound quality.

FIG. 5 is a flowchart showing the subroutine of the voiced sound nonlinear processing in the process of FIG. 4.

The subroutine of the voiced sound nonlinear processing comprises: creating an integer variable j, which holds the integer value designating a divided frequency band for voiced sound, and assigning it the value 4 (S551); performing the dynamic wavelet transform using the Haar wavelet, described below, on the frequency band corresponding to the assigned integer value and on the frequency band corresponding to that value plus 1, and comparing the results (S552); if the results of step S552 are equal, treating the signal as voiced sound and detecting the characteristics of the voiced sound (S553); if the results of step S552 differ, adding 1 to j (S554) and determining whether j is greater than 6, in order to decide whether the band corresponding to the incremented value is still a voiced sound band (S556); and, if j is less than 6 in step S556, repeating the process from step S552, or, if j is greater than 6, classifying the phoneme data of the voice signal as unvoiced sound and moving to step S406.
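The control flow of steps S551 to S556 can be sketched in Python as below. The callable `bands_match` is a hypothetical placeholder for "transform bands j and j+1 and compare the results"; the text does not define how equality of two transform results is tested. The text also leaves the case j = 6 unspecified (it names only "less than 6" and "greater than 6"), so this sketch assumes the loop ends once j reaches 6, which matches the statement elsewhere in the description that the comparison is performed only twice.

```python
from typing import Callable


def voiced_subroutine(bands_match: Callable[[int, int], bool],
                      first_band: int = 4,
                      limit: int = 6) -> str:
    """Sketch of the voiced sound subroutine (S551 to S556).

    bands_match(a, b) -- hypothetical comparison of the transform results
                         for band indices a and b
    """
    j = first_band                      # S551: j starts at 4
    while True:
        if bands_match(j, j + 1):       # S552: compare band j with band j + 1
            return "voiced"             # S553: confirmed as voiced sound
        j += 1                          # S554: move to the next band
        if j >= limit:                  # S556: assumption, j == 6 also exits
            return "unvoiced"           # handed over to step S406
```

With the default values, at most two comparisons are made: bands 4 and 5, then bands 5 and 6.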

As described above, step S552, which checks whether the voiced sound has been detected correctly, performs its comparison with a dynamic wavelet transform using the Haar wavelet because voiced sound has distinct periodicity and strong low-frequency components, and the Haar wavelet gives good results for voiced sound of this kind, in which the low-frequency components are distinct and the changes are instantaneous. The results of the dynamic wavelet transform in two band filters, namely the first voiced-sound band filter (14) and the second voiced-sound band filter (15), are compared in order to improve the accuracy of the result obtained when the speech signal detected by the average zero-crossing rate detector (11) is processed nonlinearly, that is, to improve the accuracy of the classification based on the average zero-crossing rate. The comparison is performed only twice because, after two steps, the preset frequency band designated by j has the characteristics of high-frequency components other than voiced sound. In addition, when the dynamic wavelet transform results in the two voiced-sound bands do not match, the signal is set to unvoiced sound because a speech signal judged to be voiced may in fact be a mixture of voiced and unvoiced sound, a mixture of voiced sound and silence, or unvoiced sound influenced by voiced sound; such a signal is therefore classified as unvoiced sound first so that nonlinear processing suited to unvoiced sound can be applied.

FIG. 6 is a flowchart showing the subroutine for unvoiced-sound nonlinear processing within the process of FIG. 4.

The subroutine for unvoiced-sound nonlinear processing comprises: a step (S661) of creating an integer variable j, which holds the integer value designating each of the divided frequency bands for unvoiced sound, and assigning it the value 6; a step (S662) of performing a dynamic wavelet transform using the spline wavelet, described below, on the frequency band designated by the assigned integer value and on the frequency band designated by that value plus 1, and comparing the two results; a step (S663) of treating the signal as unvoiced sound and detecting the characteristics of the unvoiced sound if the comparison in step S662 shows that the results are equal; a step of adding 1 to the variable j (S664) if the comparison in step S662 shows that the results differ, and then determining whether the integer value assigned to the variable is greater than 9 (S666) in order to decide whether the frequency band corresponding to the incremented value is still an unvoiced-sound band; and a step of repeating the process from step S662 if j is less than 9 in step S666, or classifying the phoneme data of the speech signal as silence and proceeding to step S407 if j is greater than 9.

The unvoiced-sound nonlinear processing performed by the unvoiced-sound nonlinear processing unit (23) is applied both to speech signals judged to be unvoiced sound by the average zero-crossing rate detector and to speech signals classified as unvoiced sound by the voiced-sound nonlinear processing unit (13).

The nonlinear processing of unvoiced sound uses a dynamic wavelet transform with the spline wavelet because unvoiced sound has unclear periodicity and strong high-frequency components, and the spline wavelet, which is built from a combination of cosines, represents these characteristics well. The results of the dynamic wavelet transform in two band filters, namely the first unvoiced-sound band filter (24) and the second unvoiced-sound band filter (25), are compared in order to improve the accuracy of the result obtained when the speech signal detected by the average zero-crossing rate detector (11) is processed nonlinearly, that is, to improve the accuracy of the classification based on the average zero-crossing rate. The comparison is performed only three times because, after three steps, the preset frequency band designated by j has the characteristics of high-frequency components other than unvoiced sound. In addition, when the dynamic wavelet transform results in the two unvoiced-sound bands do not match, the signal is set to silence because a speech signal judged to be unvoiced then corresponds to a transition interval between unvoiced sound and silence, a junction between phonemes, or other noise. A speech signal treated as silence in the unvoiced-sound nonlinear processing is sent to the zero-point processing unit (33) and zero-processed.
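Taken together, the two subroutines form a cascade: a frame first labelled voiced can be demoted to unvoiced, and a frame labelled unvoiced can be demoted to silence. A minimal sketch of that cascade is given below. The callables `haar_band` and `spline_band` are hypothetical placeholders for the per-band transform results, exact equality is used as the comparison only because the text says the results are "equal", and the loop bounds assume that j = 6 (voiced) and j = 9 (unvoiced) end the search.

```python
from typing import Callable, Sequence

Frame = Sequence[float]
BandFn = Callable[[Frame, int], float]  # hypothetical: result of transforming band j


def bands_agree(frame: Frame, band: BandFn, start_j: int, limit_j: int) -> bool:
    """True if some adjacent pair (j, j + 1) with start_j <= j < limit_j matches."""
    return any(band(frame, j) == band(frame, j + 1) for j in range(start_j, limit_j))


def refine_label(initial: str, frame: Frame, haar_band: BandFn, spline_band: BandFn) -> str:
    """Refine the zero-crossing label by the voiced and unvoiced subroutines."""
    label = initial
    # Voiced check: bands 4..6, i.e. two comparisons.
    if label == "voiced" and not bands_agree(frame, haar_band, 4, 6):
        label = "unvoiced"  # handed to unvoiced-sound processing
    # Unvoiced check: bands 6..9, i.e. three comparisons.
    if label == "unvoiced" and not bands_agree(frame, spline_band, 6, 9):
        label = "silence"  # handed to zero-point processing
    return label
```

A practical implementation would probably compare results within a tolerance rather than by exact equality; the text does not specify one.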

The present invention uses the dynamic wavelet transform (DyWT) for the nonlinear processing of voiced and unvoiced sound; other wavelet transforms include the continuous wavelet transform (CWT) and the discrete wavelet transform (DWT). However, considering the frequency-bandwidth characteristics of speech signals, a nonlinear set of frequency bands whose bandwidths are powers of 2 matches those characteristics best. The present invention therefore extracts the features of the speech signal using the dynamic wavelet transform, whose nonlinear frequency bands have bandwidths that are powers of 2.
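To make the idea of power-of-2 bandwidths concrete, the sketch below lists band edges whose width doubles with each index. The text does not give the actual band edges or units, so the layout used here, band j spanning [base · 2^j, base · 2^(j+1)), is purely an assumption for illustration, as are the function names.

```python
from typing import Tuple


def octave_band(j: int, base: float = 1.0) -> Tuple[float, float]:
    """Edges of hypothetical band j: [base * 2**j, base * 2**(j + 1))."""
    low = base * 2.0 ** j
    return low, 2.0 * low


def bandwidth(j: int, base: float = 1.0) -> float:
    """Width of hypothetical band j; it doubles each time j grows by 1."""
    low, high = octave_band(j, base)
    return high - low


# Under this assumed layout, bands 4, 5 and 6 have widths 16, 32 and 64 (in units of base).
widths = [bandwidth(j) for j in range(4, 7)]
```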

For the dynamic wavelet transform, the present invention uses the Haar wavelet for the nonlinear processing of voiced sound and the spline wavelet for the nonlinear processing of unvoiced sound, although either wavelet can be used regardless of whether the sound is voiced or unvoiced. Other wavelets besides these two include the Gaussian wavelet and the minimum-phase wavelet.

However, as described above, voiced sound has distinct periodicity and strong low-frequency components, so the Haar wavelet, which can represent the characteristics from the changes in the input signal alone, is used for it. Unvoiced sound has unclear periodicity and strong high-frequency components, so the spline wavelet is used for it; being built from a combination of cosines so as to represent changes over the whole signal, the spline wavelet represents these characteristics of unvoiced sound well.

That is, when the input speech signal is processed nonlinearly by the dynamic wavelet transform, its features are extracted best when the Haar wavelet is used for voiced sound and the spline wavelet is used for unvoiced sound.

This method allows a speech signal to be expressed with only three elements: the speech feature, the utterance duration, and the utterance onset time. Consequently, when a database for speech signals is built, only phoneme data need to be stored, without regard to sentences, phrases, or words, which simplifies construction of the database.

Here, the speech feature is the data value of each phoneme obtained from the voiced-sound, unvoiced-sound, and silence processing described above for each speech signal; the utterance onset time is the time at which a given frequency first begins; and the utterance duration is the length of time for which that frequency persists.
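The three-element representation can be pictured as a run-length encoding of per-frame results. The sketch below is only an illustration under stated assumptions: each frame is assumed to already carry a feature label (a string stands in for the phoneme data value), frames are assumed to be 10 ms long, and the names `Segment` and `to_segments` are hypothetical.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Sequence


@dataclass
class Segment:
    feature: str        # stand-in for the phoneme data value
    onset_ms: float     # utterance onset time
    duration_ms: float  # utterance duration


def to_segments(frame_features: Sequence[str], frame_ms: float = 10.0) -> List[Segment]:
    """Collapse runs of identical per-frame features into (feature, onset, duration)."""
    segments: List[Segment] = []
    index = 0
    for feature, run in groupby(frame_features):
        length = len(list(run))
        segments.append(Segment(feature, index * frame_ms, length * frame_ms))
        index += length
    return segments
```

For example, the frame sequence `["a", "a", "b", "a"]` collapses to three segments, so storage grows with the number of changes rather than with the number of frames.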

The following equations express the dynamic wavelet transform using the Haar wavelet and the spline wavelet described above.

Equation 1 expresses the dynamic wavelet transform using the Haar wavelet.

Here, ⓧ is the convolution operator, j is an integer constant, and b is the shift parameter. In addition, $g_{2^j}(t) = \frac{1}{2^j}\, g_H\!\left(\frac{t}{2^j}\right)$, where $g_H(t)$ is the Haar wavelet, defined as

$$
g_H(t) =
\begin{cases}
1, & 0 < t < 1/2 \\
-1, & 1/2 < t < 1 \\
0, & \text{otherwise.}
\end{cases}
$$
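The Haar wavelet and its dilated form translate directly into code. The sketch below implements g_H(t) and g_{2^j}(t) exactly as defined above; the function `convolve_at` additionally assumes that the transform at shift b is a plain discrete convolution of the samples with g_{2^j}, which is an assumption because Equation 1 itself is not reproduced in this text.

```python
from typing import Sequence


def haar(t: float) -> float:
    """Haar wavelet g_H(t): +1 on (0, 1/2), -1 on (1/2, 1), 0 elsewhere."""
    if 0.0 < t < 0.5:
        return 1.0
    if 0.5 < t < 1.0:
        return -1.0
    return 0.0


def haar_scaled(t: float, j: int) -> float:
    """Dilated wavelet g_{2^j}(t) = (1 / 2^j) * g_H(t / 2^j)."""
    scale = 2 ** j
    return haar(t / scale) / scale


def convolve_at(x: Sequence[float], j: int, b: int) -> float:
    """Assumed discrete convolution: sum over n of x[n] * g_{2^j}(b - n)."""
    return sum(x[n] * haar_scaled(b - n, j) for n in range(len(x)))
```

Because the wavelet has equal positive and negative lobes, a constant signal gives a zero response and a step in the signal gives a non-zero one, which is the sense in which the Haar wavelet responds to changes in the input.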

Next, Equation 2 expresses the dynamic wavelet transform using the spline wavelet.

Here, ⓧ is the convolution operator, j is an integer constant, and b is the shift parameter.

The spline wavelet is given by

$$G_s(2\omega) = H_1(\omega)\,\varphi(\omega),$$

where $\varphi(\omega)$ is (expression not reproduced in this text),

the relation

$$|H_1(\omega)|^2 + |H(\omega)|^2 = 1$$

is satisfied,

$$H(\omega)^2 = \left(\cos(\omega/2)\right)^4,$$

and (expression not reproduced in this text).
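The two relations that survive in the text are enough to work out the magnitude of $H_1$. The short derivation below assumes that $H(\omega)$ is real-valued, so that $H(\omega)^2$ and $|H(\omega)|^2$ coincide; that reading is an assumption rather than something the text states.

```latex
\begin{aligned}
|H_1(\omega)|^2 &= 1 - |H(\omega)|^2 \\
                &= 1 - \cos^4(\omega/2) \\
                &= \bigl(1 - \cos^2(\omega/2)\bigr)\bigl(1 + \cos^2(\omega/2)\bigr) \\
                &= \sin^2(\omega/2)\,\bigl(1 + \cos^2(\omega/2)\bigr)
\end{aligned}
```

Under this assumption $|H(0)| = 1$ and $|H_1(0)| = 0$, while $|H(\pi)| = 0$ and $|H_1(\pi)| = 1$, so $H$ behaves as a low-pass filter and $H_1$ as the complementary high-pass filter.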

The apparatus and method for extracting features of a speech signal using a nonlinear method according to the present invention provide the following effects.

Because voiced sound, unvoiced sound, and silence are each processed nonlinearly in different band filters, the results can be expressed clearly.

Because the input speech signal is first classified as voiced sound, unvoiced sound, or silence by the average zero-crossing rate method and the classified signal is then processed nonlinearly, the processing time for the speech signal is shortened.

Because the nonlinearly processed result can be expressed with only three elements (the speech feature, the utterance duration, and the utterance onset time), the data compression ratio can be increased.

Because the input speech signal is processed in units of phonemes rather than words or sentences, the database of comparison parameters used during processing need not be an extensive database of words or sentences; only a minimal database of phoneme parameters has to be built, which simplifies the required database construction.

Another effect of the present invention is that, because the speech signal is processed phoneme by phoneme only in the frequency bands that carry the characteristics of each phoneme, noise distributed in other bands is removed and high-quality data are obtained for the speech signal.

A further effect of the present invention is that the junctions between phonemes are processed nonlinearly in the frequency domain and linearly in the time domain, so that no data are lost between phonemes and a result close to natural sound is obtained.

Claims

A speech signal input unit that receives a speech signal from outside;

A correlation coefficient detector that obtains a correlation coefficient using a Gaussian curve and uses the correlation coefficient to correct the deviation in the frequency amplitude of the speech signal caused by noise;

An average zero-crossing rate detector that, operating in conjunction with a phoneme database storing speech signal data for comparison with the speech signal, detects the average zero-crossing rate of the speech signal output from the correlation coefficient detector and classifies the phonemes of the input speech signal as voiced sound, unvoiced sound, and/or silence;

A voiced-sound nonlinear processing unit that performs a dynamic wavelet transform on the voiced sound detected by the average zero-crossing rate detector in order to convert it into data by nonlinear processing, compares the results of the dynamic wavelet transform in two different bandwidths to determine whether unvoiced sound or silence has been misclassified as voiced sound, forwards the speech signal to unvoiced-sound processing if the signal is determined not to be voiced sound, and extracts the speech signal features of the voiced sound from the speech signal if the signal is determined to be voiced sound;

An unvoiced-sound nonlinear processing unit that performs a dynamic wavelet transform on the unvoiced sound detected by the average zero-crossing rate detector in order to convert it into data by nonlinear processing, compares the results of the dynamic wavelet transform in two different bandwidths to determine whether the signal has been misclassified as unvoiced sound, forwards the speech signal to silence processing if the signal is determined not to be unvoiced sound, and extracts the speech signal features of the unvoiced sound from the speech signal if the signal is determined to be unvoiced sound;

A zero-point processing unit that zero-processes the silence signal detected by the average zero-crossing rate detector and the silence signal forwarded after being determined to be silence by the unvoiced-sound nonlinear processing unit; and

A combining unit that linearly interconnects and synthesizes the voiced-sound data, unvoiced-sound data, and silence data processed nonlinearly by dynamic wavelet transform in the voiced-sound nonlinear processing unit, the unvoiced-sound nonlinear processing unit, and the zero-point processing unit, and converts the result into data, wherein the foregoing units constitute an apparatus for extracting features of a speech signal by a nonlinear method.

The apparatus of claim 1, wherein the average zero-crossing rate detector

classifies the signal as silence if the number of zero crossings per 10 msec is 0 to 5 or is 81 or more, as voiced sound if the number is 6 to 30, and as unvoiced sound if the number is 31 to 80.
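The thresholds in this claim amount to a small lookup on the zero-crossing count of each 10 msec frame. The sketch below is an illustration only: it reads the 31-to-80 range as unvoiced sound (silence being already covered by the 0-to-5 range), treats counts of 81 and above as silence, and counts a zero crossing as a sign change between consecutive samples; the function names are hypothetical.

```python
from typing import Sequence


def zero_crossings(frame: Sequence[float]) -> int:
    """Count sign changes between consecutive samples of one frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def classify_by_zcr(count: int) -> str:
    """Map a zero-crossing count per 10 msec frame to a class label."""
    if 6 <= count <= 30:
        return "voiced"
    if 31 <= count <= 80:
        return "unvoiced"
    return "silence"  # 0 to 5, or 81 and above (assumed reading)
```

A frame would be classified with `classify_by_zcr(zero_crossings(frame))`, and the resulting label then decides which nonlinear processing path the frame enters.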

The apparatus of claim 1 or 2, wherein the average zero-crossing rate detector

extracts the nonlinear features of the speech signal in a nonlinear time domain and a nonlinear frequency domain.

The apparatus of claim 3, wherein the average zero-crossing rate detector

extracts, from the speech signal, frequency information having the same zero-crossing rate, time information indicating when a frequency having the same zero-crossing rate begins, and time information indicating how long the frequency having the same zero-crossing rate persists.

The apparatus of claim 1, wherein the voiced-sound nonlinear processing unit

performs nonlinear processing on voiced sound using a Haar wavelet.

The apparatus of claim 1, wherein the unvoiced-sound nonlinear processing unit

performs nonlinear processing on unvoiced sound using a spline wavelet.

The apparatus of claim 1, wherein the speech signal database

stores only phoneme parameters as the data for comparison with the speech signal.

A first step of receiving a speech signal from outside;

A second step in which the correlation coefficient detector detects a correlation coefficient for the input speech signal using a Gaussian curve;

A third step of detecting the average zero-crossing rate of the speech signal to which the Gaussian curve has been applied;

A fourth step of classifying the speech signal as voiced sound, unvoiced sound, or silence according to the detected average zero-crossing rate;

A fifth step of performing voiced-sound nonlinear processing on a speech signal classified as voiced sound in the fourth step, performing unvoiced-sound nonlinear processing on a speech signal classified as unvoiced sound in the fourth step, and performing zero-point processing in the zero-point processing unit on a signal classified as silence in the fourth step; and

A sixth step of synthesizing the nonlinear phoneme data values for the voiced-sound, unvoiced-sound, and/or silence phonemes processed in the voiced-sound nonlinear processing, the unvoiced-sound nonlinear processing, and/or the zero-point processing of the fifth step, and converting the speech signal into data.

The method of claim 8, wherein the voiced-sound nonlinear processing of the fourth step comprises:

A seventh step of creating an integer variable j, which holds the integer value designating each of the divided frequency bands for voiced sound, and assigning it the value 4;

An eighth step of performing a dynamic wavelet transform using a Haar wavelet on the frequency band designated by the assigned integer value and on the frequency band designated by that value plus 1, and comparing the two results;

A ninth step of treating the signal as voiced sound and detecting the characteristics of the voiced sound if the results compared in the eighth step are equal;

A tenth step of adding 1 to the variable j if the results compared in the eighth step differ, and determining whether j is greater than 6 in order to decide whether the frequency band corresponding to the incremented value is a voiced-sound band; and

An eleventh step of repeating the process from the eighth step if the tenth step determines that j is less than 6, or classifying the phoneme data of the speech signal as unvoiced sound and performing the unvoiced-sound processing of the fourth step if j is greater than 6.

The method of claim 8, wherein the unvoiced-sound nonlinear processing of the fourth step comprises:

A twelfth step of creating an integer variable j, which holds the integer value designating each of the divided frequency bands for unvoiced sound, and assigning it the value 6;

A thirteenth step of performing a dynamic wavelet transform using a spline wavelet on the frequency band designated by the assigned integer value and on the frequency band designated by that value plus 1, and comparing the two results;

A fourteenth step of treating the signal as unvoiced sound and detecting the characteristics of the unvoiced sound if the results compared in the thirteenth step are equal;

A fifteenth step of adding 1 to the variable j if the results compared in the thirteenth step differ, and determining whether j is greater than 9 in order to decide whether the frequency band corresponding to the incremented value is an unvoiced-sound band; and

A sixteenth step of repeating the process from the thirteenth step if the fifteenth step determines that j is less than 9, or classifying the phoneme data of the speech signal as silence and performing the zero-point processing of the fourth step if j is greater than 9.

The method of claim 8, wherein the data extracted and combined as the features of the speech signal

are obtained by nonlinear processing of the phonemes of the speech signal.

The method of claim 8, wherein the combining of the sixth step

connects the nonlinearly extracted voiced-sound, unvoiced-sound, and silence data of the speech signal linearly at linear time intervals.