KR20030095474A

KR20030095474A - Method and apparatus for analysing a pitch, method and system for discriminating a corporal punishment, and computer readable medium storing a program thereof

Info

Publication number: KR20030095474A
Application number: KR1020020032379A
Authority: KR
Inventors: 김영중; 한정희
Original assignee: 휴먼씽크(주)
Priority date: 2002-06-10
Filing date: 2002-06-10
Publication date: 2003-12-24
Also published as: KR100427243B1

Abstract

PURPOSE: A method and apparatus for analyzing a pitch, and a method and system for discriminating four constitutions, are provided to discriminate the constitution of a speaker from the analyzed voice of the speaker. CONSTITUTION: An A/D converter(100) converts an analog audio signal provided from a microphone(90) into a digital audio signal. A pure audio signal extracting unit(200) low-pass filters the converted audio signal and extracts the signal above a predetermined threshold to obtain a pure audio signal. A time domain analyzing unit(300) analyzes the waveform of the pure audio signal and outputs time domain analyzed information. An analyzing unit(400) processes the time domain analyzed information and outputs first analyzed information through pitch analysis, second analyzed information through formant analysis, and third analyzed information through frequency domain analysis. A gram and characteristic factor extracting unit(500) displays the first to third analyzed information and extracts characteristic factors. A characteristic factor fusing unit(600) fuses the extracted characteristic factors and outputs them. A four-constitution discriminating unit(800) checks the output characteristic factors and outputs four-constitution analyzed information corresponding to the audio signal.

Description

METHOD AND APPARATUS FOR ANALYSING A PITCH, METHOD AND SYSTEM FOR DISCRIMINATING A SASANG CONSTITUTION, AND COMPUTER READABLE MEDIUM STORING A PROGRAM THEREOF

The present invention relates to pitch analysis and Sasang constitution discrimination, and more particularly to a pitch analysis method and apparatus for analyzing the pitch of speech, a Sasang constitution discrimination method and system, and a computer-readable recording medium storing a Sasang constitution discrimination program.

In digital speech signal processing such as speech recognition, synthesis, and analysis, accurately detecting the fundamental frequency (pitch frequency), that is, the pitch, is a foundational and very important task. The fundamental frequency is very difficult to detect in transition segments of speech or in speech corrupted by noise, because the signal changes sharply and it is hard to set a threshold for each segment. If pitch information can be detected accurately, speech recognition can minimize speaker-dependent effects and so improve the accuracy of recognition based on formant frequencies, and speech synthesis can separate the formant frequencies from the vocal tract components and recombine them freely, so that naturalness and individuality can easily be changed or maintained.

In analysis, performing the analysis synchronously with the pitch removes the influence of the glottis and reduces analysis error, so that accurate vocal tract parameters, and hence high sound quality, can be obtained.

Meanwhile, Sasang constitution refers to the human constitutions set out in the system of Sasang medicine, which was originally devised by Lee Je-ma (pen name Dongmu) in the Joseon Dynasty. According to Sasang medicine, every person is classified into one of four basic constitutions: Taeyangin, Taeeumin, Soyangin, and Soeumin. Just as faces and personalities differ, so do constitutions, and therefore the food a person eats, the way an illness is treated, and the medicinal herbs used should differ accordingly. This view is widely known not only in the field of Oriental medicine but also among the general public.

In addition to the Sasang constitution, hot and cold constitutions are sometimes further distinguished. Subdividing the Sasang constitution by these so-called heat-cold constitutions allows a person's constitution to be classified more finely and accurately, and health maintenance and treatment can then be handled correspondingly more precisely.

However, accurately determining which of these constitutions a given person has is difficult. There are several conventional methods for discriminating a person's constitution.

Representative methods include pulse diagnosis, wearing a gold or silver ring on a finger, and using a questionnaire.

Pulse diagnosis determines the constitution from a person's pulse, distinguishing constitutions by the fine strength, rhythm, and rate of the pulse. However, this method is difficult not only for laypeople but also for novice Oriental medicine practitioners; even when a practitioner with beginner-level experience takes the pulse, the misdiagnosis rate is known to be about 30 to 40%.

The ring method places a silver or gold ring on the fingers regarded as the connection points of the liver, spleen, lungs, and kidneys, namely the thumb, the middle finger, the 무명지, and the 약지 (both terms for the ring finger), and discriminates the constitution by measuring the energy of the body's response. Because this method also discriminates the constitution by the relative size of the liver, spleen, lungs, and kidneys, it can neglect consideration of the person as a whole.

The questionnaire method is a statistical survey method that determines the constitution by gathering data such as the subject's physical condition, family history, personality, and preferences. The data are collected through questions designed to suit the purpose of the survey, and the conclusion is obtained by analyzing the collected data. This method too is lacking in accuracy and is unsystematic in places, leaving room for improvement.

The technical problem addressed by the present invention arises from these points. A first object of the present invention is to provide a pitch analysis method using the fast Fourier transform.

A second object of the present invention is to provide a pitch analysis apparatus for performing the above pitch analysis method.

A third object of the present invention is to provide a Sasang constitution discrimination method using voice.

A fourth object of the present invention is to provide a Sasang constitution discrimination system for performing the above Sasang constitution discrimination method.

A fifth object of the present invention is to provide a computer-readable recording medium storing the above Sasang constitution discrimination program.

FIG. 1 illustrates a Sasang constitution discrimination system according to the present invention.

FIG. 2 is a waveform diagram of the digital voice signal output from the A/D converter.

FIG. 3 illustrates the pure voice signal output from the pure voice signal extraction unit.

FIG. 4 illustrates the time domain signal output from the time domain analysis unit.

FIGS. 5a to 5d show the shape and frequency spectrum of typical voice signals. FIGS. 5a and 5b illustrate the shapes of voiced and unvoiced sounds, and FIGS. 5c and 5d illustrate the relationship between formants and pitch through the frequency spectra of the signals in FIGS. 5a and 5b.

FIG. 6 illustrates the pitch analysis unit of FIG. 1.

FIG. 7 illustrates the pitch analysis signal output from the pitch analysis unit of FIG. 6.

FIG. 8 illustrates the formant analysis unit of FIG. 1.

FIG. 9 illustrates a formant analysis signal.

FIG. 10 illustrates the frequency analysis unit of FIG. 1.

FIG. 11 is a simulation diagram illustrating a frequency analysis signal.

FIG. 12 is a simulation diagram illustrating the three-dimensional gram analysis of FIG. 11.

FIG. 13 is a simulation diagram illustrating three-dimensional signal analysis.

FIG. 14 is a simulation diagram of FIG. 13 viewed along the Z axis onto the X-Y plane.

FIG. 15 illustrates an example of constitution identification using the constitution classification system according to the present invention.

FIG. 16 is a flowchart of the Sasang constitution analysis method according to the present invention.

FIG. 17 is a flowchart of the sequence of steps for extracting the pure voice signal in FIG. 16.

FIG. 18 is a flowchart of the digital signal processing for classification in FIG. 16.

FIG. 19 is a flowchart of the formant analysis processing of FIG. 18.

FIG. 20 is a flowchart of the pitch analysis of FIG. 18.

FIG. 21 is a flowchart of the frequency analysis of FIG. 18.

FIG. 22 is a flowchart of the novel formant analysis method of FIG. 18.

FIG. 23 illustrates an example of a questionnaire form for discriminating the Sasang constitution of the speaker according to the present invention.

<Description of reference numerals for the main parts of the drawings>

100: A/D converter
200: pure voice signal extraction unit
300: time domain analysis unit
400: analysis unit
410: pitch analysis unit
420: formant analysis unit
430: frequency domain analysis unit
510: three-dimensional gram display unit
520: feature factor extraction unit
600: feature factor fusion unit
700: database
800: constitution analysis unit

A pitch analysis method according to one aspect for achieving the first object of the present invention analyzes the pitch of speech, which arises from the periodic opening and closing of the glottis and is used in speech coders, speech recognition, voice conversion, and the like during speech modeling. The method comprises: (a) designing a low-pass filter and setting a loop count; (b) checking whether the current loop count is greater than the set loop count; (c) if step (b) finds that the current loop count is less than the set loop count, reading data and demodulating the read data; (d) decimating the demodulated data; (e) incrementing the current loop count and removing the DC component of the decimated signal; (f) applying an FFT to the DC-removed decimated signal to estimate the spectrum, and then returning to step (b); (g) if step (b) finds that the current loop count is equal to or greater than the set loop count, extracting feature factors; and (h) adjusting the display scale and displaying colors corresponding to contour levels.
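The loop in steps (a) to (g) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: "demodulation" is taken to mean squaring the samples, the designed low-pass filter is replaced by a moving average, a plain DFT stands in for the FFT, and the function names, frame length, and decimation factor are all hypothetical. Step (h), the display, is omitted.

```python
import cmath
import math


def lowpass(x, width):
    """Moving-average low-pass filter (a stand-in for the designed filter)."""
    out, acc = [], 0.0
    for i, v in enumerate(x):
        acc += v
        if i >= width:
            acc -= x[i - width]
        out.append(acc / min(i + 1, width))
    return out


def dft_magnitude(x):
    """Magnitude of DFT bins 0..N/2 (O(N^2), acceptable for short frames)."""
    n = len(x)
    return [abs(sum(v * cmath.exp(-2j * math.pi * k * i / n)
                    for i, v in enumerate(x)))
            for k in range(n // 2 + 1)]


def pitch_spectra(signal, fs, frame_len=1024, factor=8, loop_count=4):
    """Return one low-band spectrum and one peak frequency per loop pass."""
    spectra, peaks = [], []
    loop = 0                                    # (a) loop counter
    while loop < loop_count:                    # (b) compare with the set count
        frame = signal[loop * frame_len:(loop + 1) * frame_len]  # (c) read data
        if len(frame) < frame_len:
            break
        demod = [v * v for v in frame]          # (c) demodulate (assumed: squaring)
        low = lowpass(demod, factor)            # limit bandwidth before decimating
        dec = low[::factor]                     # (d) decimate
        loop += 1                               # (e) increment the loop counter
        mean = sum(dec) / len(dec)
        dec = [v - mean for v in dec]           # (e) remove the DC component
        mag = dft_magnitude(dec)                # (f) spectrum estimate
        spectra.append(mag)
        k = max(range(1, len(mag)), key=mag.__getitem__)
        peaks.append(k * (fs / factor) / len(dec))  # (g) feature: peak frequency
    return spectra, peaks
```

For an amplitude-modulated test tone whose envelope varies at 150 Hz, each pass should report a peak within one frequency bin (about 10.8 Hz with these defaults at 11025 Hz) of 150 Hz.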

A pitch analysis apparatus according to one aspect for achieving the second object of the present invention analyzes the pitch of speech, which arises from the periodic opening and closing of the glottis and is used in speech coders, speech recognition, voice conversion, and the like during speech modeling. The apparatus comprises: a band-pass filter that outputs only the portion of an externally provided voice signal lying within a desired frequency bandwidth; a square-law detector that squares the band-limited signal; a low-pass filter that reduces the frequency bandwidth by passing only the part of the voice signal up to a certain frequency below which pitch may exist and removing the rest, and outputs the signal with the reduced bandwidth; a decimation unit that receives the reduced-bandwidth voice signal and decimates it at the maximum decimation ratio at which aliasing does not occur; a high-pass filter that removes the DC component contained in the decimated signal and outputs the signal in the desired frequency band; a Hanning window unit that models the spectrum of the high-pass filtered signal; a real-time Fourier transform unit that performs real-time Fourier transform processing on the spectrum modeled by the Hanning window unit; a linear integration unit that integrates the real-time Fourier transformed spectrum; and a normalization unit that normalizes the integrated frequency spectrum and then outputs a pitch analysis signal.
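The last stages of this chain (Hanning window, Fourier transform, linear integration, normalization) might look like the sketch below. It assumes that "linear integration" means a plain average of per-frame power spectra and that "normalization" means scaling by the largest bin; the source states neither, and the function names are hypothetical. The earlier filter and decimation stages are not shown.

```python
import cmath
import math


def hanning(n):
    """Symmetric Hanning window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def power_spectrum(frame):
    """Window one frame and return the power in DFT bins 0..N/2."""
    n = len(frame)
    xw = [a * b for a, b in zip(frame, hanning(n))]
    return [abs(sum(v * cmath.exp(-2j * math.pi * k * i / n)
                    for i, v in enumerate(xw))) ** 2
            for k in range(n // 2 + 1)]


def integrate_and_normalize(frames):
    """Average the per-frame spectra, then scale so the largest bin is 1."""
    spectra = [power_spectrum(f) for f in frames]
    nbins = len(spectra[0])
    avg = [sum(s[k] for s in spectra) / len(spectra) for k in range(nbins)]
    peak = max(avg)
    return [v / peak for v in avg] if peak > 0 else avg
```

Averaging several frames reduces the frame-to-frame variance of the spectrum estimate, and scaling by the peak makes recordings made at different levels comparable.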

A Sasang constitution discrimination method according to one aspect for achieving the third object of the present invention comprises: (a) checking whether a voice for Sasang constitution discrimination has been recorded and, if so, digitizing and storing the recorded voice; (b) extracting pure voice data from the stored voice data; and (c) performing digital signal processing for Sasang constitution classification based on the pure voice data, and outputting the resulting Sasang constitution classification information.

A Sasang constitution discrimination method according to another aspect for achieving the third object of the present invention comprises: (a) checking whether a voice for Sasang constitution discrimination has been recorded and, if so, digitizing and storing the recorded voice; (b) extracting pure voice data from the stored voice data; (c) checking whether a prediction based on a basic survey for Sasang constitution discrimination has been completed; (d) if step (c) finds that the prediction has been completed, processing the prediction information and displaying first Sasang constitution discrimination information; (e) storing the prediction information processed in step (d) and checking whether Sasang constitution classification has been selected; and (f) if step (e) finds that classification has been selected, performing digital signal processing for classification and outputting the resulting constitution classification information.

A Sasang constitution discrimination system according to one aspect for achieving the fourth object of the present invention comprises: an analog-to-digital converter that converts an analog voice signal provided from a microphone into a digital voice signal; a pure voice signal extraction unit that low-pass filters the digitized voice signal and extracts the signal above a certain threshold to obtain a pure voice signal; a time domain analysis unit that analyzes the waveform of the pure voice signal and outputs time domain analysis information; an analysis unit that processes the time domain analysis information and outputs first analysis information through pitch analysis, second analysis information through formant analysis, and third analysis information through frequency domain analysis; a gram display and feature factor extraction unit that displays the first to third analysis information and extracts feature factors; a feature factor fusion unit that fuses the extracted feature factors and outputs them; and a Sasang constitution analysis unit that checks the output feature factors and outputs Sasang constitution analysis information corresponding to the voice signal.

A computer-readable recording medium storing a Sasang constitution discrimination program according to one aspect for achieving the fifth object of the present invention stores a program that performs: (a) checking whether a voice for Sasang constitution discrimination has been recorded and, if so, digitizing and storing the recorded voice; (b) extracting pure voice data from the stored voice data; and (c) performing digital signal processing for Sasang constitution classification based on the pure voice data, and outputting the resulting Sasang constitution classification information.

A computer-readable recording medium storing a Sasang constitution discrimination program according to another aspect for achieving the fifth object of the present invention stores a program that performs: (a) checking whether a voice for Sasang constitution discrimination has been recorded and, if so, digitizing and storing the recorded voice; (b) extracting pure voice data from the stored voice data; (c) checking whether a prediction based on a basic survey for Sasang constitution discrimination has been completed; (d) if step (c) finds that the prediction has been completed, processing the prediction information and displaying first Sasang constitution discrimination information; (e) storing the prediction information processed in step (d) and checking whether Sasang constitution classification has been selected; and (f) if step (e) finds that classification has been selected, performing digital signal processing for classification and outputting the resulting constitution classification information.

With this pitch analysis method and apparatus, Sasang constitution discrimination method and system, and computer-readable recording medium storing a Sasang constitution discrimination program, the acoustic features of a voice input through a microphone are extracted, and the Sasang constitution of the speaker can be discriminated on the basis of the extracted feature factors.

Hereinafter, the present invention is described in more detail with reference to the accompanying drawings.

Referring to FIG. 1, the Sasang constitution discrimination system according to the present invention includes an A/D converter (100), a pure voice signal extraction unit (200), a time domain analysis unit (300), an analysis unit (400), a gram display and feature factor extraction unit (500), a feature factor fusion unit (600), a database (700), and a constitution analysis unit (800).

The A/D converter (100) digitizes the analog voice signal that the microphone (90) has converted into an electrical signal, and provides the digitized voice signal, shown in FIG. 2, to the pure voice signal extraction unit (200). Following Nyquist's sampling theorem, the sampling rate is set to twice the maximum frequency of the signal. In the present invention, the sampling frequency (Fs) was set to 11025 Hz, and the signal was recorded with a 16-bit digital recorder and stored as a WAV file.
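As a rough illustration of these recording parameters, the sketch below writes a mono 16-bit WAV at 11025 Hz using Python's standard `wave` module and rejects tones at or above the Nyquist frequency (Fs/2 = 5512.5 Hz). The function name, the test tone, and the amplitude are assumptions made for the example; the source specifies only the sampling frequency, the bit depth, and the file format.

```python
import io
import math
import struct
import wave

FS = 11025          # sampling frequency given in the description, in Hz
NYQUIST = FS / 2    # highest frequency representable without aliasing


def write_tone_wav(freq_hz, seconds=0.1, amplitude=0.5):
    """Return the bytes of a mono 16-bit WAV file containing a sine tone."""
    if freq_hz >= NYQUIST:
        raise ValueError("tone is at or above the Nyquist frequency")
    n = int(FS * seconds)
    samples = [int(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * i / FS))
               for i in range(n)]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)       # mono
        w.setsampwidth(2)       # 2 bytes = 16 bits per sample
        w.setframerate(FS)
        w.writeframes(struct.pack("<%dh" % n, *samples))
    return buf.getvalue()
```

At this rate, content up to about 5.5 kHz is preserved, which covers the pitch and the first few formants discussed later in the description.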

The pure voice signal extraction unit (200) low-pass filters the digitized voice signal and then extracts only the pure voice signal by keeping the signal above a certain threshold (reference threshold). It provides the extracted pure voice signal, shown in FIG. 3, to the time domain analysis unit (300). Here, the reference threshold is the maximum value of the signal multiplied by a fixed coefficient.
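A minimal sketch of this thresholding step follows, assuming a moving average as the low-pass filter and a coefficient of 0.1; the source gives neither the filter design nor the coefficient, and the function names are hypothetical. The sketch keeps the span from the first to the last sample whose magnitude reaches the reference threshold.

```python
def moving_average(x, width=5):
    """Crude low-pass filter: average each sample with its neighbours."""
    half = width // 2
    out = []
    for i in range(len(x)):
        window = x[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def extract_pure_voice(x, coeff=0.1, width=5):
    """Return the filtered span whose ends reach coeff * max(|signal|)."""
    if not x:
        return []
    filtered = moving_average(x, width)
    # Reference threshold: the signal maximum times a fixed coefficient.
    threshold = coeff * max(abs(v) for v in filtered)
    keep = [i for i, v in enumerate(filtered) if abs(v) >= threshold]
    if not keep:
        return []
    return filtered[keep[0]: keep[-1] + 1]
```

Tying the threshold to the signal maximum makes the cut-off independent of the recording level, so quiet and loud recordings are trimmed in the same way.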

The time domain analysis unit (300) analyzes the waveform of the pure voice signal provided from the pure voice signal extraction unit (200) and provides the analyzed time domain information, shown in FIG. 4, to the analysis unit (400). The time domain analysis unit (300) is implemented with a high-pass filter, a DC notch filter, and an x-squared (square-law) detection module.

The analysis unit (400) consists of a pitch analysis unit (410), a formant analysis unit (420), and a frequency domain analysis unit (430). It applies pitch analysis, formant analysis, and frequency domain analysis to the time domain analysis information provided from the time domain analysis unit (300) and provides the results to the gram display and feature factor extraction unit (500).

More specifically, the pitch analysis unit (410) closely analyzes the low-frequency regions of the voice in order to extract the fundamental voice frequency, and outputs pitch analysis information to the gram display and feature factor extraction unit (500). The pitch analysis information comprises the peak frequency, magnitude, and bandwidth of the pitch. In a voice signal, pitch generally means the most basic frequency of the signal, that is, the frequency of the large peaks seen on the time axis, and it is used synonymously with the fundamental frequency produced by the periodic vibration of the vocal cords. Pitch is a parameter to which human hearing is very sensitive; it is used to distinguish the speaker of a voice signal and strongly affects the timbre of the voice.

FIGS. 5a to 5d show the shape and frequency spectrum of typical voice signals. FIGS. 5a and 5b illustrate the shapes of voiced speech and unvoiced speech, and FIGS. 5c and 5d illustrate the relationship between formants and pitch through the frequency spectra of the signals in FIGS. 5a and 5b.

As shown in FIG. 5c, the spectrum as a whole has a fine sawtooth-like ripple of nearly constant period, and the spacing between adjacent teeth is the pitch frequency in the frequency domain. By contrast, the unvoiced spectrum shown in FIG. 5d shows almost no pitch structure.

Pitch is thus an important parameter for high-quality synthesis, speech compression, speech coding, and improving the speech recognition rate. Approaches to pitch extraction include time domain analysis, frequency domain analysis, and hybrid methods that combine the two. Conventionally, pitch is analyzed using the cepstrum, in which the signal is Fourier transformed, the logarithm of the resulting spectrum is taken, and the result is inverse Fourier transformed. The present invention instead closely analyzes only the low-frequency band, because the desired low-frequency band can be selected and analysis can be confined to the selected band. The pitch analysis unit (410) is described in more detail below with reference to FIG. 6.
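For contrast with the low-band analysis used in this description, the conventional cepstrum approach mentioned above can be sketched as follows. This is a textbook-style illustration, not part of the invention: the function names, the 60 to 400 Hz search range, and the use of a plain DFT in place of an FFT are assumptions chosen to keep the example self-contained.

```python
import cmath
import math


def dft(x, inverse=False):
    """Plain O(N^2) discrete Fourier transform (inverse scales by 1/N)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(v * cmath.exp(sign * 2j * math.pi * k * i / n)
               for i, v in enumerate(x))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def cepstrum_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate pitch as fs divided by the lag of the largest cepstral peak."""
    spectrum = dft(frame)
    # Log magnitude spectrum; the small offset avoids log(0).
    log_mag = [math.log(abs(v) + 1e-12) for v in spectrum]
    ceps = [v.real for v in dft(log_mag, inverse=True)]
    qmin = int(fs / fmax)                         # shortest period searched
    qmax = min(int(fs / fmin), len(frame) // 2)   # longest period searched
    q = max(range(qmin, qmax + 1), key=lambda i: ceps[i])
    return fs / q
```

The harmonic ripple of a voiced spectrum is periodic along the frequency axis, so after the logarithm and the inverse transform it collects into a single peak at the lag equal to the pitch period in samples.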

또한, 포만트 분석부(420)는 주파수 상에서 나타나는 음성의 포락(Envelope) 파형을 검출하여 포만트 분석 정보를 출력한다. 통상적으로 원하는 음을 만들기 위해서는 허파에서 적당한 양의 공기를 압축하고 기존에 기억하고 있던 성도의 모양을 만든 후 각각 서로 다른 성도를 거치면서 나오게되고, 사람의 귀는 그 신호를 감지하고 인식한다. 신호처리 입장에서 보면 허파에서는 백색 잡음(White Noise)이 발생하고, 성도를 지나면서 그 특성에 해당하는 주파수에는 공진이 일어나고 그 외에는 상쇄되어 없어진다고 할 수 있다. 이런 현상은 파이프 오르간과 같은 악기에서 볼 수 있는데, 성대와 비도에서 발생하는 공진 주파수를 포만트 주파수 혹은 간단히 포만트라 한다.In addition, the formant analyzer 420 detects an envelope waveform of the voice appearing on the frequency and outputs formant analysis information. Normally, in order to make a desired sound, a moderate amount of air is compressed in the lungs, and the shape of the previously remembered saints is formed, and each one passes through different saints, and the human ear senses and recognizes the signal. From the standpoint of signal processing, white noise is generated in the lung, and resonance occurs at the frequency corresponding to the characteristic as it passes through the saint and is canceled out. This can be seen in instruments such as pipe organs, where the resonant frequencies occurring in the vocal cords and nasal passages are called the formant frequency, or simply the formant.

상기한 도 5c에서는 스펙트럼의 전체적인 모양이 3개의 봉우리를 가짐을 알 수 있다. 즉, 대략 700[㎐], 1200[㎐], 2600[㎐]에 각각 하나의 큰 봉우리가 보이는데 이것이 바로 주파수 영역에서 관찰한 포만트 주파수이다.In FIG. 5C, it can be seen that the overall shape of the spectrum has three peaks. In other words, one large peak is seen at approximately 700 [Hz], 1200 [Hz] and 2600 [Hz], which is the formant frequency observed in the frequency domain.

Formants thus depend on the geometric shape of the vocal cords, and a particular speech signal can be represented by a few representative formants. For example, the sounds 'a' (아) and 'eo' (어) are produced by changing the shape of the vocal tract, and their formant frequencies show different patterns.

In particular, the present invention uses a multiple adaptive filter to detect the speech envelope on the basis of a gain and band set by the operator, and outputs formant analysis information. The formant analysis information comprises the peak frequency, magnitude, and bandwidth of each formant. The formant analysis unit 420 is described in more detail below with reference to FIG. 8.

The frequency-domain analysis unit 430 divides the speech signal, across its entire band, into sub-bands of a fixed width, performs a precise frequency analysis on each sub-band, and outputs frequency-domain analysis information. This information comprises the tonal features, the correlation between tonal frequencies, the energy sum, the mean energy, and the mean and standard deviation of the tonal frequencies.

That is, even pure speech recorded within the same time interval is a complex sound in which the several tones making up the speech are mixed. Just as visible light passed through a prism is dispersed into the colors of the rainbow, from red to violet, according to the frequencies of its constituent light, speech is separated by the frequency-domain analysis unit 430 into sound pressure levels for each of its constituent frequency bands. The plot obtained by analyzing the sound pressure level of each component frequency, with frequency on the X axis and sound pressure level on the Y axis, is called the acoustic spectrum. The frequency-domain analysis unit 430 is described in more detail below with reference to FIG. 10.

The gram display and feature factor extraction unit 500 consists of a three-dimensional gram display unit 510 and a feature factor extraction unit 520. It displays the pitch analysis information, formant analysis information, and frequency-domain analysis information provided by the pitch analysis unit 410, the formant analysis unit 420, and the frequency-domain analysis unit 430, respectively, and extracts feature factors, which it provides to the feature factor fusion unit 600. The extracted feature factors include the peak frequency in each frequency band and the magnitude at that peak frequency.

The feature factor fusion unit 600 fuses the information provided by the gram display and feature factor extraction unit 500, processes the data for pattern matching or for input to a neural network, and provides the resulting fusion information to the constitution analysis unit 800.

The database 700 stores Sasang constitution information determined in a first pass from the answers the speaker has entered on a predetermined questionnaire such as that shown in FIG. 23. In response to a request from the constitution analysis unit 800 for a given speaker's Sasang constitution information, it provides that first-pass information.

The constitution analysis unit 800 outputs information on whether the speaker is a Taeyangin, Taeeumin, Soyangin, or Soeumin, based on the fusion information provided by the feature factor fusion unit 600 and the first-pass Sasang constitution information stored in the database 700. When the fusion information provided by the feature factor fusion unit 600 is intended for pattern matching, the extracted feature factors are converted into a pattern, and the Sasang constitution can be discriminated by recognizing this pattern and comparing it with stored pattern data corresponding to previously stored feature factors.

The A/D converter 100, pure speech signal extraction unit 200, time-domain analysis unit 300, analysis unit 400, gram display and feature factor extraction unit 500, feature factor fusion unit 600, and constitution analysis unit 800 described above may each be implemented as a separate IC and applied to a computer system.

On the other hand, when the fusion information provided by the feature factor fusion unit 600 is data for input to a neural network, the speaker's Sasang constitution can be discriminated by an already trained neural network without separately requesting the first-pass Sasang constitution information from the database 700.

Referring to FIG. 6, the pitch analysis unit 410 according to the present invention includes a band-pass filter (BPF) 411, a square-law detection module 412, a low-pass filter (LPF) 413, a decimation module 414, a high-pass filter (HPF) 415, a Hanning window module 416, a real-time fast Fourier transform module 417, a linear integration module 418, and a normalization module 419. It performs pitch analysis on the time-domain analysis information provided by the time-domain analysis unit 300 and provides the resulting pitch analysis information to the gram display and feature factor extraction unit 500.

The band-pass filter 411 reduces the bandwidth of the speech signal by passing only the components that fall within the band chosen for pitch analysis and removing the rest, and provides the band-limited signal to the square-law detection module 412.

The square-law detection module 412 squares the band-limited signal provided by the band-pass filter 411 and outputs the result to the low-pass filter 413.

The low-pass filter 413 further reduces the bandwidth by passing only the components up to a certain frequency below which pitch may be present and removing the rest, and provides the band-limited signal to the decimation module 414.

The decimation module 414 receives the band-limited speech signal, decimates it at the largest decimation ratio that does not cause aliasing, and provides the result to the high-pass filter 415. The purpose of decimation is to reduce the data sample rate and thereby improve frequency resolution. For a fixed number of frequency points, the frequency resolution is determined by the sample rate of the data. The sample rate before this processing stage is 11,025 Hz; decimation lowers it by a factor of four or more, so the frequency resolution improves by a factor of four or more. This allows the sampled data to be analyzed precisely in the frequency domain.
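
The relationship between sample rate and frequency resolution can be illustrated with a short calculation. The sketch below is only an illustration, not the patented implementation: the transform length `N_POINTS` is a hypothetical value, and the factor of four is the minimum ratio mentioned in the text.

```python
def bin_spacing_hz(sample_rate_hz, n_points):
    """Spacing between adjacent DFT bins for an n_points transform."""
    return sample_rate_hz / n_points


def decimate(samples, factor):
    """Keep every factor-th sample (the input must already be low-pass filtered)."""
    return samples[::factor]


FS_HZ = 11025.0  # sample rate stated in the text
N_POINTS = 1024  # hypothetical transform length (assumption)
FACTOR = 4       # decimation ratio; the text says "four or more"

before = bin_spacing_hz(FS_HZ, N_POINTS)
after = bin_spacing_hz(FS_HZ / FACTOR, N_POINTS)
print(f"bin spacing: {before:.3f} Hz before, {after:.3f} Hz after decimation")
```

With a factor of four the new Nyquist frequency is 11,025 / 8, about 1378 Hz, so the preceding low-pass filter must remove content above that frequency for the decimation to be free of aliasing.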

The high-pass filter 415 completely removes the DC component contained in the decimated signal provided by the decimation module 414, which makes signal analysis much easier. After unwanted frequency bands have been removed by the high-pass filtering, the Hanning window module 416 models the spectrum of the frame and provides it to the real-time fast Fourier transform module 417. The Hanning window is applied in order to strongly attenuate sidelobe components during frequency analysis. This is one of the preprocessing steps for accurate analysis of the speech signal and for handling noise.

The real-time fast Fourier transform module 417, preferably a real-time N-FFT (Fast Fourier Transform) module, performs a real-time Fourier transform on the spectrum modeled through the Hanning window; the linear integration module 418 integrates the Fourier-transformed spectrum; and the normalization module 419 normalizes the integrated frequency spectrum and provides the pitch analysis signal shown in FIG. 7 to the feature factor extraction unit 510 and the three-dimensional gram display unit 520. The normalization module 419 analyzes the speech signal by relative rather than absolute magnitude. This prevents the same signal from being misclassified in different ways under different conditions, such as different noise environments, and secures the objectivity of the feature factors.
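
The last three stages of the pitch path (window, transform, integrate and normalize) can be sketched as follows. This is a minimal illustration under stated assumptions: "linear integration" is interpreted here as averaging the magnitude spectra of successive frames, normalization as scaling to a peak of 1, and a plain DFT stands in for the FFT (it yields the same values). The sample rate, frame length, and test tone are hypothetical.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann (Hanning) window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]


def magnitude_spectrum(frame):
    """Hann-window one frame and return |DFT| for bins 0..n/2."""
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, hann(n))]
    return [
        abs(sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def integrate_and_normalize(spectra):
    """Average the per-frame spectra, then scale so the largest bin is 1."""
    n_bins = len(spectra[0])
    mean = [sum(s[k] for s in spectra) / len(spectra) for k in range(n_bins)]
    peak = max(mean)
    return [m / peak for m in mean] if peak > 0 else mean


FS_HZ = 2756.25  # hypothetical rate after 4x decimation of 11,025 Hz
FRAME = 128      # hypothetical frame length
TONE_HZ = 150.0  # hypothetical pitch-like component

signal = [math.sin(2.0 * math.pi * TONE_HZ * n / FS_HZ) for n in range(4 * FRAME)]
frames = [signal[i:i + FRAME] for i in range(0, len(signal), FRAME)]
normalized = integrate_and_normalize([magnitude_spectrum(f) for f in frames])
peak_bin = max(range(len(normalized)), key=normalized.__getitem__)
print(f"strongest bin is centered at {peak_bin * FS_HZ / FRAME:.1f} Hz")
```

Because every value is expressed relative to the largest bin, two recordings of the same voice at different loudness yield the same normalized spectrum, which is the property the text attributes to the normalization module.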

Common methods for obtaining speech parameters include cepstrum analysis, in which the power spectrum of the logarithmic spectrum, that is, the cepstrum, is analyzed to distinguish voiced from unvoiced sounds; the partial autocorrelation (PARCOR) method, which replaces the linear prediction coefficients with partial autocorrelation coefficients obtained by orthogonalizing them; and the linear prediction (LPC) method, which predicts the current signal from past signals. The present invention, however, uses the fast Fourier transform (FFT), which exploits the periodicity and symmetry of the DFT to reduce computation.

The band-pass filter (BPF) 411, square-law detection module 412, low-pass filter (LPF) 413, decimation module 414, high-pass filter (HPF) 415, Hanning window module 416, real-time fast Fourier transform module 417, linear integration module 418, and normalization module 419 described above may each be implemented as a separate IC and applied to a computer system.

Referring to FIG. 8, the formant analysis unit 420 according to the present invention includes a Hanning window module 421, a mathematical auto-regressive (AR) model calculation module 422, a transfer function extraction module 423, a frequency response module 424, a multiple adaptive band-pass filter 425, a Hanning window module 426, a real-time FFT module 427, and a normalization module 428. It performs formant analysis on the time-domain analysis information provided by the time-domain analysis unit 300 and provides the resulting formant analysis information to the gram display and feature factor extraction unit 500.

The Hanning window module 421 receives the time-domain analysis information from the time-domain analysis unit 300, models the spectrum, and provides the modeled signal to the mathematical auto-regressive model calculation module 422.

The mathematical auto-regressive model calculation module 422 estimates parameters for the signal modeled by the Hanning window module 421, preferably using the Yule-Walker modeling equation, which is given as Equation 1.

The transfer function extraction module 423 uses the parameters estimated with the Yule-Walker modeling equation to extract the transfer function relating the output signal to the input signal. The transfer function here has its usual meaning: a function expressing the input-output characteristics of a system.

The frequency response module 424 extracts the frequency response corresponding to the transfer function provided by the transfer function extraction module 423 and provides it to the feature factor extraction unit 510 and the three-dimensional gram display unit 520.
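
The chain from auto-regressive parameter estimation to frequency response can be sketched as follows. This is an illustration only: it assumes the textbook form of the Yule-Walker equations, solved by the Levinson-Durbin recursion, and an all-pole transfer function H(z) = sqrt(err) / A(z); it is not a reproduction of Equation 1. The model order, the synthetic resonator, and all function names are hypothetical.

```python
import cmath
import math
import random


def autocorrelation(x, max_lag):
    """Biased autocorrelation estimates r[0..max_lag]."""
    n = len(x)
    return [sum(x[t] * x[t + lag] for t in range(n - lag)) / n
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for A(z) = 1 + a1 z^-1 + ... + ap z^-p.

    Returns (a, err): a[0] == 1 and err is the prediction-error power.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient of stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def ar_magnitude_response(a, err, freq_hz, sample_rate_hz):
    """|H(f)| of the all-pole transfer function H(z) = sqrt(err) / A(z)."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / sample_rate_hz)
    return math.sqrt(err) / abs(sum(ak * z_inv ** k for k, ak in enumerate(a)))


# Synthetic test signal (assumption): noise driving a single resonance at 700 Hz.
FS_HZ = 11025.0
RESONANCE_HZ, RADIUS = 700.0, 0.95
theta = 2.0 * math.pi * RESONANCE_HZ / FS_HZ
rng = random.Random(0)
y = [0.0, 0.0]
for _ in range(2000):
    y.append(2.0 * RADIUS * math.cos(theta) * y[-1] - RADIUS ** 2 * y[-2]
             + rng.uniform(-1.0, 1.0))

a, err = levinson_durbin(autocorrelation(y, 2), 2)
peak_hz = max(range(0, 5500, 10), key=lambda f: ar_magnitude_response(a, err, f, FS_HZ))
print(f"estimated resonance near {peak_hz} Hz")
```

Peaks of the resulting magnitude response play the role of formant candidates; a real speech frame would need a higher model order than the second-order model used for this single synthetic resonance.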

Meanwhile, the multiple adaptive band-pass filter 425 receives the time-domain analysis information from the time-domain analysis unit 300, filters it using the desired frequency bands and gains, each of which can be set to a different value, and provides the result to the Hanning window module 426. The multiple adaptive band-pass filter 425 is used to remove signal characteristics peculiar to speech and fixed characteristics tied to particular pronunciations, thereby making the signal more objective and transparent. It also improves the signal-to-noise ratio in the band targeted for focused analysis.

The Hanning window module 426 applies a Hanning window to the signals filtered with different bandwidths and gains and provides the modeled signal to the real-time FFT module 427. As the Hanning window becomes longer, the spectrum becomes finer and the spectral resolution improves. However, because duration in the time domain is inversely proportional to bandwidth in the frequency domain, a longer window reduces the time resolution, so that temporal changes in the speech signal can no longer be modeled adequately. The window length must therefore be chosen as a trade-off between spectral resolution and time resolution.
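
The trade-off can be made concrete with a small table of frame duration against DFT bin spacing. The sketch below is illustrative only; the window lengths are hypothetical, and the 11,025 Hz rate is the one quoted for the pitch path, used here simply as an example.

```python
def window_tradeoff(sample_rate_hz, window_length):
    """Return (frame duration in seconds, DFT bin spacing in Hz)."""
    return window_length / sample_rate_hz, sample_rate_hz / window_length


FS_HZ = 11025.0  # example sample rate (assumption for this path)
for n in (128, 256, 512, 1024):  # hypothetical window lengths
    duration_s, spacing_hz = window_tradeoff(FS_HZ, n)
    print(f"N={n:5d}: {1000.0 * duration_s:6.1f} ms per frame, {spacing_hz:6.2f} Hz per bin")
```

The product of the two quantities is always 1, so halving the bin spacing necessarily doubles the time span over which the speech is assumed to be stationary.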

The real-time FFT module 427 receives the modeled signal from the Hanning window module 426, performs a real-time Fourier transform, and provides the result to the normalization module 428. The normalization module 428 normalizes the Fourier-transformed signal and provides the formant analysis signal shown in FIG. 9 to the feature factor extraction unit 510 and the three-dimensional gram display unit 520.

The Hanning window module 421, mathematical auto-regressive model calculation module 422, transfer function extraction module 423, frequency response module 424, multiple adaptive band-pass filter 425, Hanning window module 426, real-time FFT module 427, and normalization module 428 described above may each be implemented as a separate IC and applied to a computer system.

Referring to FIG. 9, a waveform with six peaks can be seen; these peaks are the formants.

FIG. 8 has been described as analyzing formants with a Hanning window, which models the spectrum better than a rectangular window, but a Hamming window, a Parzen window, a Bartlett window, or the like may be used instead.

Referring to FIG. 10, the frequency-domain analysis unit 430 includes a square-law detection module 431, a low-pass filter 432, a decimation module 433, a high-pass filter 434, a data overlap and frequency shifting module 435, a band selection module 436, an interpolation module 437, and a normalization module 438. It performs frequency analysis on the time-domain analysis information provided by the time-domain analysis unit 300 and provides the resulting frequency-domain analysis information to the feature factor extraction unit 510 and the three-dimensional gram display unit 520.

The square-law detection module 431 squares the time-domain analysis information provided by the time-domain analysis unit 300 and outputs the result to the low-pass filter 432. The low-pass filter 432 low-pass filters the squared time-domain analysis information and provides only the low-band information to the decimation module 433.

The decimation module 433 receives the band-limited speech signal from the low-pass filter 432, decimates it at the largest decimation ratio that does not cause aliasing, and provides the result to the high-pass filter 434. The purpose of decimation is to reduce the data sample rate and thereby improve frequency resolution. For a fixed number of frequency points, the frequency resolution is determined by the sample rate of the data. The sample rate before the decimation stage is 11,025 Hz; the decimation stage lowers it to one quarter or less, so the frequency resolution improves by a factor of four or more. This allows the sampled data to be analyzed precisely in the frequency domain.

The high-pass filter 434 high-pass filters the decimated time-domain analysis information and provides only the high-band information to the data overlap and frequency shifting module 435.

The data overlap and frequency shifting module 435 overlaps the data in the time domain so that aliasing does not occur in the frequency domain, shifts the frequency of the overlapped data, and provides the result to the band selection module 436. Frequency shifting allows the signal in a desired band to be analyzed precisely. Zooming in on the decimated analysis signal requires low-pass filtering (the band at or below 1 kHz), but the band to be analyzed precisely may lie in the high band (at or above 1 kHz). The high-band signal is therefore shifted down to the low band and then low-pass filtered, so that the signal can be viewed in magnified form and analyzed in finer detail. The data are overlapped in order to minimize the noise-like components caused by discontinuities in the data during conversion to the frequency domain and to improve the continuity of the signal.
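
Shifting a high-band component down to the low band can be sketched as multiplication by a complex exponential. This is an illustration under stated assumptions, not the patented module: the sample rate, frame length, tone, and shift are hypothetical values chosen so that every frequency falls exactly on a DFT bin, and the data-overlap step is not shown.

```python
import cmath
import math


def shift_down(samples, shift_hz, sample_rate_hz):
    """Multiply by exp(-j*2*pi*shift*n/fs) so a component at f moves to f - shift."""
    return [
        s * cmath.exp(-2j * math.pi * shift_hz * n / sample_rate_hz)
        for n, s in enumerate(samples)
    ]


def dft_bin_magnitude(samples, k):
    """Magnitude of DFT bin k of the given sequence."""
    n = len(samples)
    return abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))


FS_HZ = 8192.0     # hypothetical sample rate
N = 256            # hypothetical frame length, giving 32 Hz per bin
TONE_HZ = 1600.0   # hypothetical high-band component (bin 50)
SHIFT_HZ = 1024.0  # hypothetical shift (32 bins)

tone = [math.cos(2.0 * math.pi * TONE_HZ * n / FS_HZ) for n in range(N)]
shifted = shift_down(tone, SHIFT_HZ, FS_HZ)
baseband_bin = round((TONE_HZ - SHIFT_HZ) / (FS_HZ / N))
print(f"component moved from bin 50 to bin {baseband_bin}")
```

After the shift the component of interest sits at 576 Hz, below the 1 kHz low-pass limit mentioned in the text; the image of the real tone's negative-frequency half lands far outside that band and would be removed by the subsequent low-pass filter.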

The band selection module 436 selects the desired bandwidth from the shifted frequencies and provides the signal of the selected band to the interpolation module 437. The interpolation module 437 interpolates over the selected bandwidth and provides the result to the normalization module 438, which normalizes the interpolated signal and provides it to the feature factor extraction unit 510 and the three-dimensional gram display unit 520. The interpolation performs data smoothing based on cubic spline interpolation in order to provide more accurate information to the feature factor extraction unit 510 and the three-dimensional gram display unit 520. At least ten times as many additional data points are computed between the actual data points, which makes it easier to extract precise information and gives a smooth display.
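
Tenfold cubic spline interpolation can be sketched as follows. This is a minimal illustration that assumes natural boundary conditions (zero second derivative at both ends), which the text does not specify; the sample spectrum values and function names are hypothetical.

```python
def natural_cubic_spline(xs, ys):
    """Per-interval coefficients (b, c, d) of a natural cubic spline through (xs, ys)."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
    diag = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):  # forward sweep of the tridiagonal system
        diag[i] = 2.0 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / diag[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / diag[i]
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):  # back substitution
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2.0 * c[j]) / 3.0
        d[j] = (c[j + 1] - c[j]) / (3.0 * h[j])
    return b, c, d


def upsample(xs, ys, factor=10):
    """Evaluate the spline at `factor` evenly spaced points per original interval."""
    b, c, d = natural_cubic_spline(xs, ys)
    fine_x, fine_y = [], []
    for j in range(len(xs) - 1):
        h = xs[j + 1] - xs[j]
        for m in range(factor):
            t = h * m / factor
            fine_x.append(xs[j] + t)
            fine_y.append(ys[j] + b[j] * t + c[j] * t * t + d[j] * t ** 3)
    fine_x.append(xs[-1])
    fine_y.append(ys[-1])
    return fine_x, fine_y


# Hypothetical coarse spectrum: frequency in kHz, level in dB.
freq_khz = [0.0, 0.25, 0.5, 0.75, 1.0]
level_db = [-40.0, -5.0, -30.0, -8.0, -35.0]
fine_f, fine_level = upsample(freq_khz, level_db, factor=10)
print(f"{len(freq_khz)} points interpolated to {len(fine_f)} points")
```

The interpolated curve passes through every original data point, so the measured values are preserved while the display between them becomes smooth.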

The square-law detection module 431, low-pass filter 432, decimation module 433, high-pass filter 434, data overlap and frequency shifting module 435, band selection module 436, interpolation module 437, and normalization module 438 described above may each be implemented as a separate IC and applied to a computer system.

Referring to FIG. 11, the X axis represents frequency and the Y axis represents amplitude (or energy level) in decibels (dB). Three tonals can be seen in the frequency range from 0 to 1 kHz, at 0.25 kHz, 0.5 kHz, and 0.75 kHz. These tonals can be used to determine the tonal frequencies, the correlation between tonal frequencies, and so on.

Although the drawing shows only the frequency range from 0 to 1 kHz, the ranges from 1 kHz to 2.1 kHz, 2.1 kHz to 3.1 kHz, 3.1 kHz to 4.1 kHz, and 4.1 kHz to 5 kHz can be plotted in the same way, and the tonal frequencies and the correlation between tonal frequencies can be determined for each range.

FIG. 12 shows the data of FIG. 11 viewed in terms of amplitude, with frequency in kHz on the X axis and time in seconds on the Y axis. As shown, a frequency line can be tracked at the tonal frequency near 0.25 kHz. More specifically, frequency-line tracking is performed in order to obtain the variation of the tonal frequency over time as data; a Kalman filter or an alpha filter (α-filter) is applied to estimate the data and track the optimal frequency line.
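
Frequency-line tracking with an alpha filter can be sketched as a fixed-gain recursive update. This is only an illustration: the gain `alpha` and the per-frame peak frequencies are hypothetical, and a Kalman filter would replace the fixed gain with one computed from noise statistics.

```python
def alpha_track(measurements, alpha=0.3):
    """Smooth a sequence of per-frame peak frequencies with a fixed-gain alpha filter."""
    estimate = measurements[0]
    track = [estimate]
    for z in measurements[1:]:
        estimate = estimate + alpha * (z - estimate)  # move a fraction alpha toward z
        track.append(estimate)
    return track


# Hypothetical per-frame peak frequencies (Hz) scattered around a 250 Hz tonal.
peaks_hz = [250.0, 254.0, 247.0, 252.0, 249.0, 251.0, 246.0, 253.0]
line_hz = alpha_track(peaks_hz, alpha=0.3)
print([round(f, 1) for f in line_hz])
```

A smaller `alpha` gives a smoother line that reacts more slowly to genuine changes in the tonal frequency; a larger value follows the measurements more closely but passes more noise.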

Referring to FIG. 13, the signal analysis waveform is shown in three-dimensional space, with the X axis covering frequencies from 0 to 4.5 kHz, the Y axis covering time from 0 to 300, and the Z axis covering energy levels over a range of 0 to 12.

FIG. 14 is a simulation view of FIG. 13 looking along the Z axis onto the X-Y plane, showing frequency along the X axis and time along the Y axis. The Z axis itself is not drawn, but the energy level can be conveyed adequately by color.

Referring to FIG. 15, the peak frequencies detected after noise removal in each frequency band, the computed mean energy of each band, and the standard deviation computed after detecting the fundamental frequency are received from the feature factor extraction unit 510. The feature factors are compared, analyzed, and evaluated, and the speaker is classified as Taeyangin, Taeeumin, Soyangin, or Soeumin, and the result is output.

That is, when the speech signal is judged to be detected in a high frequency band, the speaker is classified as Taeyangin. Speech classified as Taeyangin is generally characterized as high, clear, and round, and is said to harmonize with the 'sang' (상) tone.

When the energy sum over the entire band of the speech signal is judged to be high, the speaker is classified as Taeeumin. Speech classified as Taeeumin is generally characterized as heavy, murky, and angular, and is said to harmonize with the 'gung' (궁) tone.

When the energy sum over the entire band of the speech signal is judged to be low, the speaker is classified as Soyangin. Speech classified as Soyangin is generally characterized as light, low, and quick to recede, and harmonizes with the 'chi' (치) tone.

When the standard deviation of the fundamental frequency of the speech signal is judged to be large, the speaker is classified as Soeumin. Speech classified as Soeumin is generally characterized as stirring, smooth, and even, and harmonizes with the 'u' (우) tone. The speaker's Sasang constitution may be fixed to exactly one of Taeyangin, Taeeumin, Soyangin, and Soeumin and output as such, or a probability may be output for each of the constitutions.
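
The four rules above can be sketched as simple threshold checks. This is a hypothetical illustration only: the text gives no numeric thresholds and no precedence among the rules, so all threshold values, the representation of "detected in a high frequency band" as an energy ratio, and the names used are assumptions. The function returns every constitution whose rule fires rather than choosing one.

```python
from dataclasses import dataclass


@dataclass
class VoiceFeatures:
    high_band_energy_ratio: float  # share of energy in the upper band, 0..1 (assumed feature)
    total_energy: float            # energy summed over the whole band
    f0_std_hz: float               # standard deviation of the fundamental frequency


# Hypothetical thresholds; the text does not state numeric values.
HIGH_BAND_RATIO_MIN = 0.4
TOTAL_ENERGY_HIGH = 10.0
TOTAL_ENERGY_LOW = 2.0
F0_STD_LARGE_HZ = 30.0


def matching_constitutions(features):
    """Return every constitution whose rule fires for the given features."""
    matches = []
    if features.high_band_energy_ratio >= HIGH_BAND_RATIO_MIN:
        matches.append("Taeyangin")  # speech detected in a high frequency band
    if features.total_energy >= TOTAL_ENERGY_HIGH:
        matches.append("Taeeumin")   # high energy sum over the whole band
    if features.total_energy <= TOTAL_ENERGY_LOW:
        matches.append("Soyangin")   # low energy sum over the whole band
    if features.f0_std_hz >= F0_STD_LARGE_HZ:
        matches.append("Soeumin")    # large standard deviation of the fundamental
    return matches


print(matching_constitutions(VoiceFeatures(0.1, 12.0, 5.0)))
```

Because more than one rule can fire for the same voice, a complete system would need either a precedence order or per-constitution probabilities, the second of which is the alternative output the text mentions.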

Referring to FIG. 16, it is first checked whether speech has been recorded (step S10). If speech has been recorded, it is an analog signal and is therefore converted into a digital signal (step S20).

The digitized speech is then stored (step S30). It may be stored automatically, or manually through an operation by the operator.

Because the stored voice data contains ambient noise as well as the speaker's voice, only the pure voice data, with the noise removed, is then extracted (step S40). The pure voice data extraction step is described in more detail below with reference to FIG. 17.

Next, it is checked whether a prediction based on the basic survey has been completed (step S50). Here, the basic survey is a questionnaire-type survey, which may be filled in directly by the speaker whose voice is recorded.

If it is determined in step S50 that the prediction based on the basic survey has been completed, the prediction information is processed and the resulting constitution discrimination is displayed (step S60).

Next, the predicted information is stored (step S70), and it is checked whether classification has been set (step S80).

Next, the digital signal processing for the classification is performed, and the resulting constitution classification information is displayed (step S90). This digital signal processing is described in more detail below with reference to FIG. 18.

Finally, it is checked whether to terminate (step S100), and if so, the process ends.

Referring to FIG. 17, low-pass filtering is first performed on the analog-to-digital converted voice (step S410). Since the human voice generally lies within approximately 0 to 5 kHz, frequency components above 5 kHz are filtered out to reduce the data-processing load.
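A minimal sketch of the low-pass step (S410) is given below. The specification does not say how the filter is designed, so the windowed-sinc FIR design, the tap count, and the function names `design_lowpass` and `fir_filter` are assumptions chosen for illustration.

```python
import math
from typing import List


def design_lowpass(cutoff_hz: float, fs: float, num_taps: int = 101) -> List[float]:
    """Hamming-windowed sinc FIR low-pass taps, normalized to unity gain at DC."""
    fc = cutoff_hz / fs            # normalized cutoff in cycles per sample
    mid = (num_taps - 1) / 2       # center tap (num_taps is assumed odd)
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def fir_filter(x: List[float], taps: List[float]) -> List[float]:
    """Direct-form FIR convolution; the output has the same length as the input."""
    return [
        sum(h * x[n - k] for k, h in enumerate(taps) if n - k >= 0)
        for n in range(len(x))
    ]
```

With a 5 kHz cutoff, calling `fir_filter(samples, design_lowpass(5000.0, fs))` keeps the band described in the text and suppresses the components above it.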

Next, because the low-pass filtered voice signal contains ambient noise in addition to the speaker's voice, only the pure voice signal, with this noise removed, is extracted (step S420).
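The abstract describes this extraction as keeping the signal above a predetermined threshold. One way this might look is a frame-energy gate, sketched below; the framing into fixed blocks, the mean-square energy measure, and the name `extract_voiced` are assumptions, not details given in the specification.

```python
from typing import List


def extract_voiced(samples: List[float], frame_len: int, threshold: float) -> List[float]:
    """Keep only the frames whose mean-square energy exceeds `threshold`."""
    kept: List[float] = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(v * v for v in frame) / frame_len
        if energy > threshold:
            kept.extend(frame)   # frame is treated as voice
    return kept
```

A trailing partial frame is discarded in this sketch, which keeps the energy measure comparable across frames.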

Next, the pure voice signal is stored as a new file (step S430). The file is preferably stored in wave format.

Next, the pure voice signal is displayed in the form shown in FIG. 3, and the process ends (step S440).

Referring to FIG. 18, formant analysis is first performed and a 2-D formant graph is displayed (step S910), after which the formant feature factors are stored (step S920). The formant analysis is described in more detail below with reference to FIG. 19.

Next, pitch processing is performed and a 3-D pitch gram is displayed (step S930), after which the pitch feature factors are stored (step S940). The pitch processing is described in more detail below with reference to FIG. 20.

Next, frequency analysis is performed and a 3-D frequency gram is displayed (step S950). The frequency analysis is described in more detail below with reference to FIG. 21.

Next, the new formant analysis is performed (step S960), and the frequency feature factors are stored (step S970). The new formant analysis is described in more detail below with reference to FIG. 22.

Next, the feature factors are fused using pattern matching or a neural network, and are further fused with the primary discrimination data on the speaker's Sasang constitution, which is derived from data the speaker has entered in questionnaire form as shown in FIG. 23 (step S980).

Here, the pattern matching and the neural network take accurate input information, namely feature factors extracted from the voices of people whose constitution has been definitively determined, as input and the determined constitution as output, and from these derive a pattern by which the constitution can be objectified and identified. The derived pattern is compared with the output, corrected for the error, and applied again; by repeating this process many times, an optimal pattern that minimizes the error between the derived pattern and the output is obtained.

Similarly, the neural network repeatedly performs learning according to perceptron learning theory to obtain the optimal connection strengths, that is, the weights, and thereby models an efficient classification network. The pattern matching and the neural network used in step S980 are therefore already trained and configured.
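The repeated error-correcting update described above can be illustrated with a single-layer perceptron. This is a sketch under stated assumptions: the specification does not give the network topology, so a two-class, single-layer model is used, and the names `train_perceptron` and `predict`, the learning rate, and the epoch limit are all hypothetical.

```python
from typing import List, Tuple


def predict(w: List[float], b: float, x: List[float]) -> int:
    """Threshold unit: 1 if the weighted sum plus bias is positive, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def train_perceptron(data: List[Tuple[List[float], int]],
                     epochs: int = 50, lr: float = 0.1) -> Tuple[List[float], float]:
    """Perceptron learning rule: adjust weights by the output error until none remain."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, target in data:
            err = target - predict(w, b, x)   # -1, 0, or +1
            if err != 0:
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
                errors += 1
        if errors == 0:                       # every training example is classified
            break
    return w, b
```

In use, each training example would pair the feature factors of a speaker whose constitution is already known with that known class; a four-class decision would need one such unit per class or a multi-layer network, which this sketch does not cover.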

Next, the predicted constitution is displayed (step S990). The displayed prediction may name the single most probable of the four Sasang constitutions, or may show a probability for each of the constitutions.

Referring to FIG. 19, a Hamming window is first designed for the formant analysis, and the spectrum of the signal is modeled using the designed Hamming window (step S9110).

Next, the windowed voice is modeled with a mathematical auto-regressive (AR) model (step S9120), and the transfer function of the vocal tract is calculated (step S9130).

Next, the frequency response of the vocal tract is calculated (step S9140), peak points over the frequency bandwidth are detected from the calculated frequency response (step S9150), and the feature factors are extracted (step S9160).
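Steps S9110 to S9150 can be sketched as follows. The specification does not say how the AR parameters are estimated, so the autocorrelation method with the Levinson-Durbin recursion is an assumption, as are the model order, the number of frequency points, and all function names.

```python
import cmath
import math
from typing import List


def hamming(n: int) -> List[float]:
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def lpc(x: List[float], order: int) -> List[float]:
    """AR coefficients [1, a1, ..., ap] via autocorrelation and Levinson-Durbin."""
    r = [sum(x[i] * x[i + k] for i in range(len(x) - k)) for k in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a


def envelope(a: List[float], n_points: int) -> List[float]:
    """Magnitude of the all-pole transfer function 1/A(e^{jw}) from 0 up to Nyquist."""
    mags = []
    for m in range(n_points):
        w = math.pi * m / n_points
        mags.append(1.0 / abs(sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(a))))
    return mags


def formant_peaks(x: List[float], fs: float, order: int = 10,
                  n_points: int = 512) -> List[float]:
    """Window the frame, fit the AR model, and return the peak frequencies in Hz."""
    windowed = [v * w for v, w in zip(x, hamming(len(x)))]
    mags = envelope(lpc(windowed, order), n_points)
    return [m * fs / (2 * n_points) for m in range(1, n_points - 1)
            if mags[m - 1] < mags[m] > mags[m + 1]]
```

Bandwidths and amplitudes of the peaks, which the text also treats as feature factors, could be read from the same `envelope` output; they are left out to keep the sketch short.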

Referring to FIG. 20, a low-pass filter is first designed (step S9310), and it is checked whether the current loop count is greater than the preset loop count (step S9320). The loop count is set in consideration of the frequency resolution of the frequency band to be analyzed.

If it is determined in step S9320 that the current loop count is smaller than the preset loop count, data is read (step S9325) and the read data is demodulated (step S9330).

Next, the demodulated data is decimated (step S9335). The decimation is performed to reduce the sample rate on the time axis.

Next, the current loop count is incremented (step S9340), and the decimated signal is passed through a DC-notch high-pass filter to remove its DC component (step S9345).
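A DC-notch high-pass filter of the kind named in step S9345 is often realized as a one-pole, one-zero recursion. The sketch below uses that common form as an assumption; the specification does not give the filter structure, and the pole radius `r` and the name `dc_block` are hypothetical.

```python
from typing import List


def dc_block(x: List[float], r: float = 0.995) -> List[float]:
    """DC blocker y[n] = x[n] - x[n-1] + r*y[n-1]: zero at DC, pole just inside it."""
    y: List[float] = []
    prev_x = 0.0
    prev_y = 0.0
    for v in x:
        out = v - prev_x + r * prev_y
        y.append(out)
        prev_x, prev_y = v, out
    return y
```

A value of `r` closer to 1 narrows the notch around DC at the cost of a longer settling time.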

Next, a fast Fourier transform (FFT) is applied to estimate the spectrum (step S9350), and the process returns to step S9320.

Meanwhile, if it is determined in step S9320 that the current loop count is equal to or greater than the preset loop count, the feature factors are extracted (step S9360). That is, the magnitude and bandwidth of the fundamental frequency are first obtained for each frame along the time axis; frequency-line tracking is then performed over all frames to obtain the average fundamental frequency (here, fundamental frequency = pitch frequency), together with the magnitude, the frequency bandwidth, and the standard deviation of the frequency.
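The loop of FIG. 20 can be approximated by the sketch below. Several choices are assumptions made for illustration: "demodulation" is interpreted as square-law detection (the apparatus claim lists a square detector), the DC component is removed by subtracting the frame mean rather than by a notch filter, a plain DFT stands in for the FFT, and the cutoff, decimation factor, frame length, and function names are hypothetical.

```python
import cmath
import math
import statistics
from typing import List, Tuple


def lowpass_taps(cutoff_hz: float, fs: float, num_taps: int = 65) -> List[float]:
    """Hamming-windowed sinc low-pass FIR taps with unity DC gain."""
    fc = cutoff_hz / fs
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        taps.append(ideal * (0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))))
    total = sum(taps)
    return [t / total for t in taps]


def filter_and_decimate(x: List[float], taps: List[float], factor: int) -> List[float]:
    """Low-pass filter x and keep every `factor`-th output sample."""
    return [sum(h * x[n - k] for k, h in enumerate(taps) if n - k >= 0)
            for n in range(0, len(x), factor)]


def frame_peak_hz(frame: List[float], fs: float,
                  fmin: float = 50.0, fmax: float = 400.0) -> float:
    """Remove the mean, apply a Hann window, and return the DFT peak in [fmin, fmax]."""
    n = len(frame)
    mean = sum(frame) / n
    w = [(v - mean) * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
         for i, v in enumerate(frame)]
    best_f, best_mag = 0.0, -1.0
    for k in range(1, n // 2):
        f = k * fs / n
        if fmin <= f <= fmax:
            mag = abs(sum(v * cmath.exp(-2j * math.pi * k * i / n)
                          for i, v in enumerate(w)))
            if mag > best_mag:
                best_f, best_mag = f, mag
    return best_f


def pitch_statistics(x: List[float], fs: float, factor: int = 8,
                     frame_len: int = 250) -> Tuple[List[float], float, float]:
    """Per-frame pitch estimates plus their mean and standard deviation."""
    squared = [v * v for v in x]                        # square-law detection
    low = filter_and_decimate(squared, lowpass_taps(400.0, fs), factor)
    fs_d = fs / factor                                  # sample rate after decimation
    f0 = [frame_peak_hz(low[s:s + frame_len], fs_d)
          for s in range(0, len(low) - frame_len + 1, frame_len)]
    return f0, statistics.mean(f0), statistics.pstdev(f0)
```

Squaring a signal made of harmonics produces difference components at the harmonic spacing, which is why the low-passed, decimated spectrum shows a peak at the pitch frequency even when the fundamental itself is weak.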

Next, the display scale may be further adjusted so that the colors corresponding to the contour lines are rendered smoothly (step S9365).

Pitch, which arises from the periodic opening and closing of the glottis (vocal cords) in human speech production and is analyzed as one of the key parameters in speech modeling, may be used in a variety of applications such as speech coders (vocoders, speech codecs), speech recognition, and voice conversion.

Referring to FIG. 21, the parameters to be used in the frequency analysis are first set (step S9510). The parameters set here include the sample rate, the number of FFT points, the window size, the window type, the FFT overlap ratio, the gain of each filter, and the bandwidth of each filter.

Next, the frequency range of the voice signal is divided into fixed frequency bands, and a filter is designed for each band (step S9515). For example, assuming a voice bandwidth of 5 kHz, five band-pass filters are designed with 1 kHz as one unit. The gain differs from band to band: because the amplitude of a voice signal is high at low frequencies and low at high frequencies, the gain is lowered for the low bands and raised for the high bands to compensate, so that equalized information is obtained for each band.
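The five-band example with band-dependent gain can be sketched on a power spectrum instead of on time-domain band-pass filters. That substitution, the gain values, and the name `band_energies` are assumptions; the specification gives only the 1 kHz band width and the direction of the gain adjustment.

```python
from typing import List, Tuple

# Hypothetical gains: lower for the low bands, higher for the high bands.
BAND_GAINS = [0.5, 0.75, 1.0, 1.5, 2.0]


def band_energies(spectrum: List[Tuple[float, float]],
                  gains: List[float] = BAND_GAINS,
                  band_hz: float = 1000.0) -> List[float]:
    """Sum gain-weighted power per band; `spectrum` holds (frequency_hz, power) pairs."""
    energies = [0.0] * len(gains)
    for freq, power in spectrum:
        band = int(freq // band_hz)        # 0 for 0-1 kHz, 1 for 1-2 kHz, ...
        if 0 <= band < len(gains):         # components above the last band are ignored
            energies[band] += gains[band] * power
    return energies
```

Here the gain is applied to the band energy; applying it to the amplitude instead would square its effect, and the text does not specify which is meant.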

Next, the loop counter is set to '1' (step S9520), and it is checked whether the current loop count is greater than the preset loop count (step S9525). The loop count is set in consideration of the frequency resolution of the frequency band to be analyzed.

If it is determined in step S9525 that the current loop count is smaller than the preset loop count, the input data is read with partial overlap (step S9530), and the read data is FFT-processed and then frequency-shifted (step S9535). Because the time data of consecutive frames partially overlap, an accurate frequency analysis can be made for each frame. Frequency shifting is applied so that decimation can be performed band by band, which raises the frequency resolution within each band and enables precise frequency analysis.
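Reading the input with partial overlap (step S9530) amounts to advancing the frame start by less than one frame length. A minimal sketch follows; the name `overlapped_frames` and the expression of the overlap as a fraction of the frame length are assumptions.

```python
from typing import List, Sequence


def overlapped_frames(x: Sequence[float], frame_len: int, overlap: float) -> List[list]:
    """Split x into frames of frame_len samples that overlap by the given fraction."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    hop = max(1, int(round(frame_len * (1.0 - overlap))))   # samples between frame starts
    return [list(x[s:s + frame_len]) for s in range(0, len(x) - frame_len + 1, hop)]
```

For example, ten samples with a frame length of 4 and an overlap of 0.5 give four frames starting at samples 0, 2, 4, and 6.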

Next, to extract finer tonal features, the frequency data analyzed for each frame is interpolated for smoothing (step S9540), the counter is incremented by '1' (step S9545), and the process returns to step S9525.

Meanwhile, if it is determined in step S9525 that the current loop count is equal to or greater than the preset loop count, each frequency band for analysis is selected (step S9550), and the energy spectrum of the selected band is extracted and simultaneously displayed (step S9555).

Next, the average energy sum, peak frequencies, magnitudes, and bandwidths are detected for each frequency band (step S9560).

Next, the mean and standard deviation of each tonal, the number of tonals, and the correlations between tonals are examined (step S9565), and the frequencies analyzed in each band are then normalized (step S9570).

Next, the display scale is adjusted (step S9575), and the frequency lines are tracked to obtain the mean or standard deviation of the tonal frequency for each frame (step S9575). The display scale is adjusted so that an observer can see the frequency characteristics more easily; for example, as shown in FIGS. 12 and 13, higher signal energy levels are displayed in increasingly red colors. The tracked frequency line is the line connecting the peak values, as shown in FIG. 12.
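Frequency-line tracking as described here, a line connecting the peak values, can be sketched by taking the strongest bin of every frame. The single-peak-per-frame simplification and the name `track_frequency_line` are assumptions; a real tracker would likely follow several tonals at once.

```python
import statistics
from typing import List, Tuple


def track_frequency_line(frames: List[List[float]],
                         bin_hz: float) -> Tuple[List[float], float, float]:
    """Return the per-frame peak frequency line and its mean and standard deviation."""
    line = [max(range(len(f)), key=f.__getitem__) * bin_hz for f in frames]
    return line, statistics.mean(line), statistics.pstdev(line)
```

Each element of `frames` is one frame's magnitude spectrum, and `bin_hz` is the spacing between adjacent bins.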

Referring to FIG. 22, the time-domain analysis information is first filtered band by band, with a specific gain for each frequency band, using an adaptive multi-band-pass filter (step S9610). Filtering by frequency band removes the characteristics peculiar to the voice signal and the fixed features inherent in each pronunciation, which makes the signal more objective and transparent, and also improves the signal-to-noise ratio in the bands selected for focused analysis.

Next, the signals filtered in each of the multiple bands are windowed with a Hamming window (step S9615), a mathematical auto-regressive model of the voice signal is obtained and its parameters are estimated (step S9620), the transfer function of the vocal tract is obtained (step S9625), and the frequency response of the vocal tract is obtained (step S9630). Steps S9615 to S9630 constitute conventional formant analysis.

Next, the frequency, amplitude, and bandwidth (or bin) of each peak formant are found (step S9635), and the remaining formants, excluding the first formant, which has the highest value, are normalized (step S9640).

Next, the number of formants in each band is found, the energy of each found formant is obtained by integration (step S9645), and the correlation between formants is examined for each band (step S9650).
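The integration and correlation steps (S9645, S9650) might be sketched as below. Trapezoidal integration of the power spectrum and the Pearson coefficient are assumptions; the specification names neither the integration rule nor the correlation measure, and the function names are hypothetical.

```python
import math
from typing import List


def formant_energy(freqs: List[float], power: List[float],
                   lo: float, hi: float) -> float:
    """Trapezoidal integral of the power spectrum over the interval [lo, hi] in Hz."""
    total = 0.0
    for i in range(len(freqs) - 1):
        if freqs[i] >= lo and freqs[i + 1] <= hi:
            total += 0.5 * (power[i] + power[i + 1]) * (freqs[i + 1] - freqs[i])
    return total


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation between two equally long sequences, e.g. two formant tracks."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)
```

The interval `[lo, hi]` would be chosen around a detected formant peak, for example from its estimated bandwidth.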

The above describes only the Sasang constitution discrimination method, but the method may also be implemented as a program and stored on various computer-readable recording media, such as floppy disks or CD-ROMs, for use in many settings. For example, the Sasang constitution discrimination program may be installed on a computer in an oriental medicine clinic or herbal pharmacy so that the voice of a patient who needs a Sasang constitution diagnosis is recorded and the patient's constitution discrimination information is output. It is also evident that such information can be provided not only in clinics and herbal pharmacies but also through kiosks running the program of the present invention in busy places such as banks or stations.

As another example, the voices of clients who want a Sasang constitution discrimination may be recorded and received over an Internet service network and analyzed on an Internet service server, and the resulting Sasang constitution information may be provided to the client over the Internet or a wired/wireless network.

Although the invention has been described above with reference to embodiments, those skilled in the art will understand that it can be modified and changed in various ways without departing from the spirit and scope of the invention as set forth in the claims below.

As described above, according to the present invention, a voice input through a microphone is stored as a data file in a computer's storage device through analog-to-digital conversion, the stored data is analyzed by pitch analysis, formant analysis, and frequency-domain analysis to extract the acoustic features of the voice, and the analyzed voice is used to discriminate the speaker's Sasang constitution.

Such voice analysis may also be used as a key for a locking device, or to identify a specific person online as a means of authentication or payment.

In addition, the speaker's Sasang constitution is first discriminated from the contents of a questionnaire completed by the speaker, is discriminated a second time from the voice as described above, and is finally determined from both the first and the second discrimination information, which improves the reliability of the discrimination.

Claims

A pitch analysis method for analyzing the pitch of a voice, the pitch arising from the periodic opening and closing of the glottis and being used in speech coders, speech recognition, voice conversion, and the like, the method comprising:

(a) designing a low pass filter and setting a loop count counter value;

(b) checking whether the current loop number count value is larger than the set loop number counter value;

(c) if it is checked in step (b) that the current number of loops is less than the preset number of loops, reading data and demodulating the read data;

(d) decimating the demodulated data;

(e) increasing the current loop number and removing the DC component of the decimated signal;

(f) taking an FFT of the decimated signal from which the DC component has been removed to perform spectral estimation, and then returning to step (b);

(g) extracting a feature factor when it is determined in step (b) that the current loop count is equal to or greater than the preset loop count; and

(h) adjusting the display scale to display colors corresponding to contour lines.

A pitch analysis apparatus for analyzing the pitch of a voice, the pitch arising from the periodic opening and closing of the glottis and being used in speech coders, speech recognition, voice conversion, and the like, the apparatus comprising:

A band pass filter for outputting only a signal included in a desired frequency bandwidth among voice signals provided from the outside;

A square detector detecting a square of the reduced frequency bandwidth;

A low pass filter for reducing a frequency bandwidth and outputting a signal having a reduced frequency bandwidth by outputting only a signal up to a predetermined frequency in which a pitch may exist among voice signals and removing the rest;

A decimation unit configured to receive the voice signal having the reduced frequency bandwidth and to decimate it at the maximum decimation rate at which no aliasing occurs;

A high pass filter removing a DC signal component included in the decimated signal and outputting a desired frequency band signal;

A Hanning window unit for modeling the spectrum of the high-pass filtered signal;

A real-time Fourier transform unit for performing a real-time Fourier transform on the spectrum modeled by the Hanning window unit;

A linear integrator for integrating the real-time Fourier transformed spectrum; and

A normalization unit configured to normalize the integrated frequency spectrum and then output a pitch analysis signal.

(a) checking whether a voice for Sasang constitution discrimination has been recorded, and if so, digitally converting and storing the recorded voice;

(b) extracting pure voice data from the stored voice data; And

(c) performing digital signal processing for classifying the Sasang constitution on the basis of the pure voice data, and outputting the resulting Sasang constitution classification information.

The method of claim 3, wherein step (c) comprises:

(c-1) performing a formant analysis process on the extracted pure speech signal and displaying a 2-D formant graph;

(c-2) storing the formant feature factor;

(c-3) performing a pitch process on the extracted pure speech signal and displaying a 3-D pitch gram;

(c-4) storing the pitch feature factors;

(c-5) performing frequency analysis on the extracted pure speech signal and displaying a 3-D frequency gram;

(c-6) performing a new formant analysis process;

(c-7) storing the frequency feature factor;

(c-8) fusing the formant feature, the pitch feature, and the frequency feature by using pattern matching or a neural network; And

(c-8) displaying the predictive value of the constitution discrimination based on the fused feature factors.

The method of claim 4, wherein step (c-1) comprises:

(c-11) designing a hamming window and modeling a spectrum of the signal using the hamming window;

(c-12) modeling the windowed voice with a mathematical auto-regressive model and calculating the transfer function of the vocal tract;

(c-13) calculating the frequency response of the vocal tract; and

(c-14) detecting a peak point for a frequency bandwidth based on the calculated frequency response characteristic, and extracting a feature factor based on the peak point.

The method of claim 4, wherein step (c-3) comprises:

(c-31) designing a low pass filter and setting a loop count counter value;

(c-32) checking whether the current loop number count value is larger than the set loop number counter value;

(c-33) if it is checked in step (c-32) that the current number of loops is less than the preset number of loops, reading data and demodulating the read data;

(c-34) decimating the demodulated data;

(c-35) increasing the current loop number and removing the DC component of the decimated signal;

(c-36) taking an FFT of the decimated signal from which the DC component has been removed to perform spectral estimation, and then returning to step (c-32);

(c-37) extracting a feature factor when it is checked in step (c-32) that the current loop number is equal to or greater than a preset loop number counter value; And

and (c-38) adjusting the display scale to display a color corresponding to the contour line.

The method of claim 4, wherein step (c-5) comprises:

(c-51) setting a parameter to be processed in the frequency analysis;

(c-52) dividing the voice signal frequency into a predetermined frequency band, designing filters corresponding to each of the divided frequency bands, and setting the number of loops to '1';

(c-53) checking whether the current loop count count value is larger than a preset loop count counter value;

(c-54) if it is checked in the step (c-53) that the current number of loops is smaller than the preset number of loops, overlapping the input data and reading data;

(c-55) performing frequency shifting on the read data after performing FFT processing;

(c-56) performing interpolation for smoothing the frequency data analyzed for each frame;

(c-57) incrementing the counter value by '1' and feeding back to the step (c-53);

(c-58) selecting each frequency bandwidth for frequency analysis when it is checked in step (c-53) that the current number of loops is equal to or greater than a preset number of loops;

(c-59) extracting an energy spectrum for the selected frequency bandwidth and displaying the extracted energy spectrum;

(c-60) detecting average energy sum, peak frequency, magnitude and bandwidths per frequency bandwidth;

(c-61) examining the mean and standard deviation of each of the tonals, the number of tonals, and the correlations between the tonals;

(c-62) normalizing the analyzed frequency for each band; And

and (c-63) adjusting a display scale and tracing a frequency line to calculate an average and a standard deviation of the tonal frequencies of each frame.

The method of claim 4, wherein step (c-6) comprises:

(c-6a) filtering the time domain analysis information with a specific gain for each specific frequency band;

(c-6b) filtering a signal filtered for each of the multiple bands using a Hamming window;

(c-6c) estimating a parameter by obtaining a mathematical autoregression model of the filtered speech signal;

(c-6d) obtaining the transfer function of the vocal tract and obtaining the frequency response of the vocal tract;

(c-6e) finding the frequency, amplitude, and bandwidth of the peak formants, and normalizing the remaining formants, excluding the first formant, which has the highest value;

(c-6f) finding the number of formants in each of the bands and obtaining the energy of the found formants by integration; and

(c-6g) examining the correlation between the formants for each of the bands.

(b) extracting pure voice data from the stored voice data;

(c) checking whether a prediction based on a basic survey for Sasang constitution discrimination has been completed;

(d) if it is determined in step (c) that the prediction has been completed, processing the prediction information to display first Sasang constitution discrimination information;

(e) storing the first Sasang constitution discrimination information processed in step (d), and checking whether Sasang constitution classification has been set; and

(f) if it is determined in step (e) that Sasang constitution classification has been set, obtaining second Sasang constitution discrimination information through digital signal processing of a time-domain analysis signal obtained by analyzing the pure voice data in the time domain, and outputting third Sasang constitution discrimination information by fusing the first Sasang constitution discrimination information and the second Sasang constitution discrimination information.

The method of claim 9, wherein step (b) comprises:

(b-1) performing low pass filtering on the analog-digital converted voice;

(b-2) extracting a pure voice signal by removing noise from the low pass filtered signal;

(b-3) storing the pure voice signal as a new file including the pure voice signal; And

and (b-4) displaying the pure audio signal.

The method of claim 9, wherein step (f) comprises:

(f-1) performing a formant analysis process on the time domain analysis signal and displaying a 2-D formant graph;

(f-2) storing the formant feature factor;

(f-3) performing a pitch processing on the time domain analysis signal and displaying a 3-D pitch gram;

(f-4) storing the pitch feature factors;

(f-5) performing a frequency analysis process on the time domain analysis signal and displaying a 3-D frequency gram;

(f-6) performing a new formant analysis process;

(f-7) storing the frequency feature factor;

(f-8) fusing the formant feature, the pitch feature and the frequency feature by using pattern matching or a neural network; And

(f-8) displaying the third Sasang constitution discrimination information, which is the predicted value of the constitution discrimination.

The method of claim 11, wherein step (f-8) further comprises fusing with the first Sasang constitution discrimination information on the speaker's Sasang constitution, which is based on data entered in questionnaire form.

The method of claim 11, wherein step (f-1),

(f-11) designing a hamming window and modeling a spectrum of the signal using the hamming window;

(f-12) modeling the windowed voice with a mathematical auto-regressive model and calculating the transfer function of the vocal tract;

(f-13) calculating the frequency response of the vocal tract; and

(f-14) detecting a peak point for a frequency bandwidth based on the calculated frequency response characteristic, and extracting a feature factor based on the peak point.
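
The formant steps above follow the usual linear-prediction pattern: window the frame, fit an AR model, evaluate the all-pole frequency response, and pick its peaks. The sketch below is one way to do that with the standard library only. The model order, grid size, and the synthetic two-tone test frame are assumptions, and the autocorrelation method with Levinson-Durbin recursion is a common choice rather than something the claim prescribes.

```python
import cmath
import math
import random


def hamming(n):
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def autocorr(x, max_lag):
    return [sum(x[i] * x[i + k] for i in range(len(x) - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations; returns A(z) coefficients [1, a1, ..., ap]."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, m):
            a[j] = prev[j] + k * prev[m - j]
        a[m] = k
        err *= 1.0 - k * k                  # updated prediction error
    return a


def ar_spectrum(a, n_points, fs):
    """Magnitude of H = 1 / A(e^{jw}) on a grid from 0 Hz up to (not including) fs/2."""
    freqs, mags = [], []
    for i in range(n_points):
        f = i * (fs / 2) / n_points
        w = 2 * math.pi * f / fs
        denom = sum(c * cmath.exp(-1j * w * k) for k, c in enumerate(a))
        freqs.append(f)
        mags.append(1.0 / abs(denom))
    return freqs, mags


def formant_peaks(frame, fs, order=4, n_points=256):
    windowed = [s * w for s, w in zip(frame, hamming(len(frame)))]   # (f-11)
    a = levinson_durbin(autocorr(windowed, order), order)            # (f-12)
    freqs, mags = ar_spectrum(a, n_points, fs)                       # (f-13)
    return [freqs[i] for i in range(1, len(mags) - 1)                # (f-14)
            if mags[i - 1] < mags[i] > mags[i + 1]]


# Synthetic frame with resonances near 500 Hz and 1500 Hz plus a little noise.
fs = 8000
rng = random.Random(0)
frame = [math.sin(2 * math.pi * 500 * n / fs) + math.sin(2 * math.pi * 1500 * n / fs)
         + rng.uniform(-0.05, 0.05) for n in range(400)]
peaks = formant_peaks(frame, fs)
print(peaks)
```

Real speech would need a higher model order (often around 10 to 14 at this sampling rate) and would yield peak bandwidths as well as peak frequencies as feature factors.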

The method of claim 11, wherein step (f-3) comprises:

(f-31) designing a low pass filter and setting a loop counter limit;

(f-32) checking whether the current loop count has reached the set loop counter limit;

(f-33) if it is determined in step (f-32) that the current loop count is less than the set limit, reading data and demodulating the read data;

(f-34) decimating the demodulated data;

(f-35) increasing the current number of loops and removing the DC component of the decimated signal;

(f-36) taking an FFT of the decimated signal from which the DC component has been removed, performing spectral estimation, and then returning to step (f-32);

(f-37) extracting a feature factor when it is determined in step (f-32) that the current loop count is equal to or larger than the set loop counter limit; and

(f-38) adjusting the display scale to display a color corresponding to the contour line.
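
The loop in steps (f-31) to (f-37) can be sketched as follows. Square-law demodulation is an assumption here (the claim only says "demodulating"), as are the moving-average low-pass filter, the frame length, the decimation factor, and the amplitude-modulated test signal. The direct DFT stands in for an FFT to keep the example dependency-free.

```python
import cmath
import math


def moving_average(x, width):
    """Crude causal low-pass filter with zero initial state."""
    out, running = [], 0.0
    for i, s in enumerate(x):
        running += s
        if i >= width:
            running -= x[i - width]
        out.append(running / width)
    return out


def dft_magnitudes(x):
    """Magnitudes of DFT bins 0 .. n/2 - 1 (direct O(n^2) evaluation)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2)]


def pitch_track(signal, fs, frame_len, decim, max_loops):
    pitches = []
    loop = 0
    while loop < max_loops:                                   # (f-32)
        frame = signal[loop * frame_len:(loop + 1) * frame_len]
        demod = [s * s for s in frame]                        # (f-33), assumed square-law
        decimated = moving_average(demod, decim)[::decim]     # (f-34)
        loop += 1                                             # (f-35)
        mean = sum(decimated) / len(decimated)
        ac = [s - mean for s in decimated]                    # DC removed
        mags = dft_magnitudes(ac)                             # (f-36)
        peak_bin = max(range(1, len(mags)), key=mags.__getitem__)
        pitches.append(peak_bin * (fs / decim) / len(ac))
    return pitches                                            # input to (f-37)


# Hypothetical test signal: 1000 Hz carrier whose envelope repeats at 125 Hz.
fs = 8000
signal = [(1 + 0.8 * math.cos(2 * math.pi * 125 * n / fs))
          * math.cos(2 * math.pi * 1000 * n / fs) for n in range(4096)]
pitches = pitch_track(signal, fs, frame_len=1024, decim=8, max_loops=4)
print(pitches)
```

The moving average is only a rough anti-aliasing filter; a practical design would use a sharper low-pass filter before decimating.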

The method of claim 11, wherein step (f-5) comprises:

(f-51) setting a parameter to be processed in the frequency analysis;

(f-52) dividing the voice signal frequency into a predetermined frequency band, designing filters corresponding to each of the divided frequency bands, and setting the number of loops to '1';

(f-53) checking whether the current loop count has reached a preset loop counter limit;

(f-54) if it is determined in step (f-53) that the current loop count is smaller than the preset limit, reading data while overlapping the input data;

(f-55) performing frequency shifting on the read data after performing FFT processing;

(f-56) performing interpolation for smoothing the frequency data analyzed for each frame;

(f-57) increasing the counter value by '1' and feeding back to the step (f-53);

(f-58) if it is checked in step f-53 that the current number of loops is equal to or greater than the preset number of loops, selecting respective frequency bandwidths for frequency analysis;

(f-59) extracting an energy spectrum for the selected frequency bandwidth and displaying the extracted energy spectrum;

(f-60) detecting an average sum of energy, peak frequency, magnitude and bandwidths per frequency bandwidth;

(f-61) determining the mean and standard deviation of each tonal component, the number of tonal components, and the correlations between the tonal components;

(f-62) normalizing the analyzed frequency for each band; and

(f-63) adjusting the display scale and tracing the frequency lines to calculate the average and standard deviation of the tonal frequencies of each frame.
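
A reduced version of this frequency analysis, covering overlapped reads, a per-frame spectrum, per-band energy, and the mean and standard deviation of the per-frame peak (tonal) frequency, might look like the sketch below. The Hann window, frame length, 50% overlap, band edges, and two-tone test signal are assumptions; frequency shifting, interpolation, and display are left out.

```python
import cmath
import math
import statistics


def power_spectrum(frame):
    """Hann-windowed power spectrum for bins 0 .. n/2 (direct DFT)."""
    n = len(frame)
    x = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n)) for i, s in enumerate(frame)]
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n // 2 + 1)]


def band_features(signal, fs, frame_len, hop, bands):
    """Per band: mean energy, and mean/std of the per-frame peak frequency."""
    tracks = {band: {"energy": [], "peak": []} for band in bands}
    for start in range(0, len(signal) - frame_len + 1, hop):    # overlapped reads
        spec = power_spectrum(signal[start:start + frame_len])
        for lo, hi in bands:
            bins = [k for k in range(len(spec)) if lo <= k * fs / frame_len < hi]
            tracks[(lo, hi)]["energy"].append(sum(spec[k] for k in bins))
            peak_bin = max(bins, key=spec.__getitem__)
            tracks[(lo, hi)]["peak"].append(peak_bin * fs / frame_len)
    return {band: {"mean_energy": statistics.mean(t["energy"]),
                   "peak_mean": statistics.mean(t["peak"]),
                   "peak_std": statistics.pstdev(t["peak"])}
            for band, t in tracks.items()}


# Hypothetical signal: a strong 1000 Hz tone and a weaker 2500 Hz tone.
fs = 8000
signal = [math.sin(2 * math.pi * 1000 * n / fs) + 0.3 * math.sin(2 * math.pi * 2500 * n / fs)
          for n in range(256)]
features = band_features(signal, fs, frame_len=64, hop=32, bands=[(0, 2000), (2000, 4000)])
print(features)
```

Here `hop=32` with `frame_len=64` gives the 50% overlap; the claim itself does not state the overlap ratio.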

The method of claim 11, wherein step (f-6) comprises:

(f-6a) filtering the time domain analysis information with a specific gain for each specific frequency band;

(f-6b) windowing the signal filtered for each of the multiple bands using a Hamming window;

(f-6c) estimating parameters by fitting an autoregressive (AR) model to the filtered speech signal;

(f-6d) obtaining a transfer function of the vocal tract and obtaining a frequency response of the vocal tract;

(f-6e) finding the frequency, amplitude, and bandwidth of the formant peaks, and normalizing the remaining formants, excluding the first formant, which is the highest;

(f-6f) finding the number of formants for each of the bands and obtaining the energy of the found formants by integration; and

(f-6g) examining the correlation between the formants for each of the bands.

An analog-digital converter for converting an analog voice signal provided from a microphone into a digital voice signal;

A pure voice signal extracting unit which performs low pass filtering on the digitally converted voice signal and extracts a pure voice signal by extracting a signal having a predetermined threshold or more;

A time domain analyzer for analyzing the waveform of the pure voice signal and outputting time domain analysis information;

An analysis unit configured to output first analysis information through pitch analysis on the time domain analysis information, output second analysis information through formant analysis, and output third analysis information through frequency domain analysis;

A gram display and feature factor extractor configured to display the first to third analysis information and extract feature factors;

A feature factor fusion unit for fusing and outputting the extracted feature factors; and

A Sasang constitution analysis unit for checking the output feature factors and outputting Sasang constitution analysis information corresponding to the speech signal, the above units together constituting a Sasang constitution discrimination system.

The system of claim 17, wherein the analysis unit includes a pitch analysis unit comprising:

A band pass filter for outputting only the components of the time domain analysis information that fall within a predetermined bandwidth;

A square detection module for squaring the band-limited signal and outputting the result;

A low pass filter for outputting only signal components up to a predetermined frequency below which a pitch may exist in the voice signal;

A decimation module for receiving the low-pass filtered signal and decimating it at the maximum decimation rate at which no aliasing occurs;

A high pass filter for removing a DC component included in the decimated signal;

A Hanning window module for modeling a spectrum of each frame;

A real-time fast Fourier transform module for performing a real-time Fourier transform on the spectrum modeled through the Hanning window;

A linear integration module for integrating the Fourier transformed spectrum; and

A normalization module for normalizing the integrated frequency spectrum and providing the resulting first analysis information to the gram display and feature factor extractor.
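
The decimation module's rate limit follows from the sampling theorem: after decimating by M, the new Nyquist frequency fs/(2M) must not fall below the highest frequency the preceding low pass filter lets through. A small sketch, in which the function name and the 500 Hz pitch ceiling are assumptions:

```python
def max_decimation_factor(fs, f_max):
    """Largest integer M for which the decimated Nyquist frequency fs/(2M) is at least f_max."""
    return max(1, int(fs // (2 * f_max)))


fs = 8000          # sampling rate in Hz (assumed)
pitch_ceiling = 500  # highest frequency kept by the low pass filter, in Hz (assumed)
m = max_decimation_factor(fs, pitch_ceiling)
print(m, fs / m)   # decimation factor and resulting sampling rate
```

In practice a slightly smaller factor leaves room for the low pass filter's transition band.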

The system of claim 17, wherein the analysis unit includes a formant analysis unit comprising:

A Hanning window module for receiving the time domain analysis information, modeling a spectrum, and outputting a modeled signal;

An autoregressive (AR) model calculation module for estimating parameters for the signal modeled by the Hanning window module;

A transfer function extraction module for extracting a transfer function of an output signal with respect to an input signal using the estimated parameters;

A frequency response module for extracting a frequency response characteristic corresponding to the extracted transfer function and providing the extracted frequency response characteristic to the gram display and feature factor extractor;

A multiple adaptive band pass filter configured to receive the time domain analysis information and filter it while varying the desired frequency bands and gains;

A Hanning window module for modeling, through a Hanning window, the signal filtered with the different bandwidths and gains;

A real-time FFT module for receiving the modeled signal from the Hanning window module and performing a real-time Fourier transform; and

A normalization module configured to normalize the Fourier transformed signal and provide the resulting second analysis information to the gram display and feature factor extractor.

The system of claim 17, wherein the analysis unit includes a frequency domain analysis unit comprising:

A square detection module for squaring the time domain analysis information;

A low pass filter for low pass filtering the squared time domain analysis information and outputting only the low band information;

A decimation module for receiving the voice signal having a reduced frequency bandwidth from the low pass filter and decimating it at the maximum decimation rate at which no aliasing occurs;

A high pass filter for high pass filtering the decimated time domain analysis information and outputting only the high band information;

A data overlap and frequency shifting module for overlapping data in the time domain so as not to cause aliasing in the frequency domain, and shifting the frequency corresponding to the overlapped data;

A band selection module for selecting a desired bandwidth among the shifted frequencies and outputting a signal of the selected band;

An interpolation module for interpolating the selected bandwidth; and

A normalization module configured to normalize the interpolated signal and provide the resulting third analysis information to the gram display and feature factor extractor.

(b) extracting pure voice data from the stored voice data; and

(c) performing digital signal processing for classifying the Sasang constitution on the basis of the pure voice data and outputting the resulting Sasang constitution discrimination information, the above steps being carried out by a Sasang constitution discrimination program stored on a computer-readable recording medium.

(b) extracting pure voice data from the stored voice data;

(f) if it is determined in step (e) that the Sasang constitution classification setting has been obtained, obtaining second Sasang constitution discrimination information through digital signal processing based on the pure voice data, and outputting third Sasang constitution discrimination information through fusion of the first Sasang constitution discrimination information and the second Sasang constitution discrimination information, the above steps being carried out by a Sasang constitution discrimination program stored on a computer-readable recording medium.