KR20020049764A

KR20020049764A - Apparatus and method for speech detection using multiple sub-detection system

Info

Publication number: KR20020049764A
Application number: KR1020000079046A
Authority: KR
Inventors: 김승희; 이영직
Original assignee: 오길록; 한국전자통신연구원
Priority date: 2000-12-20
Filing date: 2000-12-20
Publication date: 2002-06-26
Also published as: KR100349656B1

Abstract

PURPOSE: An apparatus and a method for detecting speech using multiple speech sub-detection systems are provided, which combine the results of multiple independently operating speech detection systems to obtain optimized detection performance. CONSTITUTION: The apparatus includes a speech signal input unit (21), a plurality of detectors (22), a detection result combination unit (23), and a detection result output unit (24). The speech signal input unit receives an input speech signal transmitted from outside. Each detector independently detects the start point and end point of speech in the input signal. The detection result combination unit combines the results obtained by the detectors by computing a weighted average. The detection result output unit outputs the detected result.

Description

Apparatus and method for speech detection using multiple sub-detection system

The present invention relates to speech detection using a plurality of speech sub-detection systems. In particular, it relates to a speech detection apparatus and method that detect the final start point and end point of speech by computing a weighted average, with a different weight for each system, of the start points and end points obtained independently by the sub-detection systems, and to a computer-readable recording medium storing a program for implementing the method.

The present invention belongs to every field that uses speech, such as speech recognition, speech synthesis, and speech coding. In these fields it is very important to accurately detect only the speech portion of a signal that contains speech.

In general, one of the main causes of recognizer malfunction in real environments is a speech detector that outputs an incorrect detection result. That is, when non-speech is included in the speech region, or part of the speech region is cut off, the recognizer inevitably outputs a misrecognition result.

Many methods for speech detection have been proposed. Each has its own advantages, but none has delivered outstanding performance on its own. Methods that use several subsystems have also been proposed, but they only make a decision at each instant from the momentary outputs of the subsystems; they do not use each subsystem's final detection result.

Therefore, conventional methods, even when they use several systems, fail to draw out the combined performance of those systems and, bound by the constraint of real-time processing, cannot make use of a variety of systems.

The present invention is proposed to solve the problems described above. Its object is to provide a speech detection apparatus and method using a plurality of speech sub-detection systems, and a computer-readable recording medium storing a program for implementing the method, which detect the speech portion in a system that takes speech as input, without the constraint of real-time processing, by computing weighted averages of the start points and end points obtained from several independent sub-detection systems to determine the final start point and end point of the speech.

FIG. 1 is an exemplary configuration diagram of a general speech recognition system including a speech detection unit to which the present invention is applied.

FIG. 2 is a block diagram of one embodiment of a speech detection apparatus using a plurality of speech sub-detection systems according to the present invention.

FIG. 3 is a flowchart of one embodiment of a speech detection method using a plurality of speech sub-detection systems according to the present invention.

FIG. 4 is a flowchart of one embodiment of the training process for determining the weight of each sub-detection system in the speech detection method according to the present invention.

* Explanation of reference numerals for the main parts of the drawings

11 : speech input unit
12 : speech detection unit
13 : recognition unit
21 : signal input unit
22 : speech detector
23 : detection result combination unit
24 : detection result output unit

To achieve the above object, the present invention provides a speech detection apparatus using a plurality of speech sub-detection systems, comprising: speech signal input means for receiving a speech input signal transmitted from outside; a plurality of detection means, each independently detecting a start point and an end point of speech in the speech input signal delivered through the speech signal input means; detection result combining means for combining the detection results obtained by the plurality of detection means to obtain a weighted average value; and detection result output means for outputting the detection results for the start point and end point of the speech delivered through the detection result combining means.

The present invention also provides a speech detection method applied to a speech detection apparatus using a plurality of speech sub-detection systems, comprising: a first step of receiving a speech input signal transmitted from outside; a second step of independently detecting start points and end points of speech in the received speech input signal; a third step of combining the detection results thus obtained to obtain a weighted average value; and a fourth step of outputting the obtained weighted average value as the start point and end point of the speech determined by the overall system.

The present invention further provides a computer-readable recording medium storing a program that causes a speech detection apparatus having a processor and using a plurality of speech sub-detection systems to implement: a first function of receiving a speech input signal transmitted from outside; a second function of independently detecting start points and end points of speech in the received speech input signal; a third function of combining the detection results thus obtained to obtain a weighted average value; and a fourth function of outputting the obtained weighted average value as the start point and end point of the speech determined by the overall system.

The present invention combines the results of speech detection systems that each operate independently, assigning a weight to each system to obtain the final result. The weights are determined in a training process so as to give the best speech detection performance.

By gathering the results of several independently operating speech detection systems and combining them for the best detection performance, the present invention can achieve detection performance superior to that of previously published speech detection systems. Specifically, the invention does not aim at real-time processing. First, the sub-detection systems independently detect the start point of the speech. Next, a weighted average of the detected start points is computed, with the weight for each detection system determined in a training process. The same procedure is applied to the end point. The start point and end point obtained by weighted averaging become the start point and end point determined by the overall speech detection system.

The above objects, features, and advantages will become more apparent from the following detailed description taken in conjunction with the accompanying drawings. Hereinafter, a preferred embodiment of the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 is an exemplary configuration diagram of a general speech recognition system including a speech detection unit to which the present invention is applied.

As shown in FIG. 1, the speech input unit (11) reads an input signal from a microphone, or a signal stored as a file, and passes it to the speech detection unit (12) and the recognition unit (13).

The speech detection unit (12) then detects the speech portion of the received input signal and passes the start time and end time of the detected speech portion to the recognition unit (13).

Next, using the detection result from the speech detection unit (12), the recognition unit (13) performs recognition only on the speech portion of the input signal received from the speech input unit (11) and outputs the recognition result.
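The data flow of FIG. 1 can be sketched as below. This is a minimal illustration, not the patent's implementation: the function name `recognize_speech_only`, the `detect` and `recognize` callables, and the use of seconds for the detected times are all assumptions made for the example.

```python
from typing import Callable, Sequence, Tuple


def recognize_speech_only(
    samples: Sequence[float],
    sample_rate: int,
    detect: Callable[[Sequence[float], int], Tuple[float, float]],
    recognize: Callable[[Sequence[float]], str],
) -> str:
    """Run the recognizer only on the portion the detector marks as speech.

    `detect` stands in for the speech detection unit (12) and returns
    (start_time, end_time) in seconds (an assumed unit).
    `recognize` stands in for the recognition unit (13).
    """
    start_s, end_s = detect(samples, sample_rate)
    # Convert times to sample indices and clamp them to the signal length.
    first = max(0, int(round(start_s * sample_rate)))
    last = min(len(samples), int(round(end_s * sample_rate)))
    return recognize(samples[first:last])
```

An incorrect `(start_time, end_time)` pair either passes non-speech to the recognizer or truncates the speech, which is the failure mode the background section describes.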

FIG. 2 is a block diagram of one embodiment of a speech detection apparatus using a plurality of speech sub-detection systems according to the present invention.

As shown in FIG. 2, the speech detection apparatus according to the present invention comprises a signal input unit (21) that receives the speech input signal from the speech input unit; speech detectors (22) that each independently detect the start point and end point of speech in the signal delivered through the signal input unit (21); a detection result combination unit (23) that combines the detection results obtained by the speech detectors (22) to obtain a weighted average value; and a detection result output unit (24) that outputs the detection results for the start point and end point of the speech delivered through the detection result combination unit (23).

The operation of the speech detection apparatus having the above structure is described in detail as follows.

First, the input signal delivered from the speech input unit is passed to the detectors (22). FIG. 2 shows an example using a plurality of speech detectors (221 to 22N), but in practice there is no restriction on the type or number of speech detectors used, and any previously published speech detector can be used. As one embodiment, the detection systems listed below can be used.

First, speech detector 1 (221) uses the cepstrum as the feature vector for speech detection. It assumes that the leading part of the input is non-speech and computes the average cepstrum over that part. For every frame, it computes the difference from this non-speech cepstrum value; if the difference exceeds a set threshold, the frame is regarded as speech, and otherwise as non-speech.
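A rough sketch of this kind of detector is given below. It assumes the per-frame cepstral vectors have already been computed, and it uses the Euclidean distance as the "difference"; the source does not specify the distance measure, the number of leading frames, or the threshold, so `noise_frames`, `threshold`, and the function name are illustrative assumptions.

```python
import math
from typing import List, Sequence


def cepstral_distance_vad(
    frames: Sequence[Sequence[float]],
    noise_frames: int = 10,
    threshold: float = 1.0,
) -> List[bool]:
    """Label each frame as speech (True) or non-speech (False).

    frames: one cepstral feature vector per frame (precomputed elsewhere).
    The first `noise_frames` frames are assumed to contain no speech.
    """
    if not frames:
        return []
    lead = frames[:noise_frames]
    dim = len(frames[0])
    # Average cepstrum of the leading, assumed non-speech, frames.
    noise_mean = [sum(f[k] for f in lead) / len(lead) for k in range(dim)]
    decisions = []
    for f in frames:
        # Euclidean distance to the non-speech average (an assumed measure).
        dist = math.sqrt(sum((f[k] - noise_mean[k]) ** 2 for k in range(dim)))
        decisions.append(dist > threshold)
    return decisions
```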

Second, speech detector 2 (222) uses normalized energy and zero-crossing rate as the parameters for speech detection. A frame is judged to be speech if the energy and zero-crossing rate exceed the threshold values, and non-speech otherwise.
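The following sketch shows one way such a detector might look. The source does not say how the energy is normalized; dividing by the largest frame energy in the utterance is an assumption here, as are the function name and the two threshold parameters. Frames are assumed to be non-empty lists of samples.

```python
from typing import List, Sequence


def energy_zcr_vad(
    frames: Sequence[Sequence[float]],
    energy_threshold: float,
    zcr_threshold: float,
) -> List[bool]:
    """Label a frame as speech when both parameters exceed their thresholds."""
    # Mean-square energy of each frame.
    energies = [sum(x * x for x in f) / len(f) for f in frames]
    peak = max(energies) if energies else 0.0
    decisions = []
    for f, e in zip(frames, energies):
        # Normalization by the peak frame energy (assumed scheme).
        norm_e = e / peak if peak > 0 else 0.0
        # Fraction of adjacent sample pairs whose signs differ.
        crossings = sum(1 for a, b in zip(f, f[1:]) if (a >= 0) != (b >= 0))
        zcr = crossings / (len(f) - 1) if len(f) > 1 else 0.0
        decisions.append(norm_e > energy_threshold and zcr > zcr_threshold)
    return decisions
```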

Third, speech detector 3 (223) builds statistical models of the non-speech and speech portions in advance and, for every frame of the input, computes the probability under the non-speech model and under the speech model to detect the speech interval.
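As an illustration only, the sketch below uses a single one-dimensional Gaussian for each class and labels a frame as speech when the speech model gives the higher log-likelihood. The source does not state the model family or the decision rule, so the Gaussian form, the `(mean, variance)` parameterization, and the function names are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_log_likelihood(x: float, mean: float, var: float) -> float:
    """Log-density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def likelihood_vad(
    features: Sequence[float],
    speech_model: Tuple[float, float],
    nonspeech_model: Tuple[float, float],
) -> List[bool]:
    """Per-frame decision: True when the speech model is more likely.

    Each model is a (mean, variance) pair estimated beforehand.
    """
    return [
        gaussian_log_likelihood(x, *speech_model)
        > gaussian_log_likelihood(x, *nonspeech_model)
        for x in features
    ]
```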

Fourth, speech detector 4 (224) detects the speech portion using the first-order coefficient (C1) of the normalized mel-frequency cepstral coefficients (MFCC). If the C1 value exceeds a predetermined threshold, the frame is judged to be part of the speech portion.
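A minimal sketch of the C1 test follows, together with one possible way to turn per-frame decisions into the start and end times that each detector reports. The source does not describe that conversion; taking the first and last speech frames, and the `frame_shift_s` parameter, are assumptions for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def c1_vad(c1_values: Sequence[float], threshold: float) -> List[bool]:
    """Per-frame decision from the normalized first MFCC coefficient C1."""
    return [c > threshold for c in c1_values]


def endpoints_from_decisions(
    decisions: Sequence[bool], frame_shift_s: float = 0.01
) -> Optional[Tuple[float, float]]:
    """Return (start_time, end_time) in seconds, or None if no speech frame.

    Assumed rule: the start is the beginning of the first speech frame and
    the end is the end of the last speech frame.
    """
    speech = [i for i, d in enumerate(decisions) if d]
    if not speech:
        return None
    return speech[0] * frame_shift_s, (speech[-1] + 1) * frame_shift_s
```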

In this way, the speech detectors (221 to 22N) independently perform detection on the speech signal from the speech input unit. The time information for the speech start point and end point detected by each of the detectors (221 to 22N) is passed to the detection result combination unit (23). The detection result combination unit (23) computes weighted averages of the received detection results according to [Equation 1] and [Equation 2] below. The weighted averages thus obtained become the time information for the start point and end point of the final speech portion detected by the overall speech detection system.

$$S = \sum_{i=1}^{N} w_i^{S} S_i \qquad \text{[Equation 1]}$$

$$E = \sum_{i=1}^{N} w_i^{E} E_i \qquad \text{[Equation 2]}$$

Here, $S$ and $E$ are the start point (time) and end point of the speech portion output as the final result of the overall detection system, and $N$ is the number of detectors.

$w_i^{S}$ is the weight applied to the $i$-th detector when the detection results of the detectors are combined to obtain the final start point of the speech, and $w_i^{E}$ is the weight applied to the $i$-th detector when the detection results are combined to obtain the final end point of the speech. These weights are determined in a separate training process.

$S_i$ and $E_i$ are the start point and end point obtained by the $i$-th detector.
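The combination step can be sketched as follows. The function name and argument layout are assumptions. The sketch divides by the sum of the weights so that the result is a proper average even if the trained weights do not sum to one; whether the patent's weights are already normalized is not stated in the text.

```python
from typing import Sequence, Tuple


def combine_endpoints(
    detections: Sequence[Tuple[float, float]],
    start_weights: Sequence[float],
    end_weights: Sequence[float],
) -> Tuple[float, float]:
    """Weighted average of per-detector (start, end) times.

    detections[i] is the (start, end) pair reported by the i-th detector.
    Separate weight sets are used for start points and end points.
    """
    if not detections:
        raise ValueError("at least one detection is required")
    if not (len(detections) == len(start_weights) == len(end_weights)):
        raise ValueError("one start weight and one end weight per detector")
    w_s, w_e = sum(start_weights), sum(end_weights)
    if w_s == 0 or w_e == 0:
        raise ValueError("weights must not sum to zero")
    start = sum(w * s for w, (s, _) in zip(start_weights, detections)) / w_s
    end = sum(w * e for w, (_, e) in zip(end_weights, detections)) / w_e
    return start, end
```

For example, two detectors reporting (1.0, 3.0) and (2.0, 5.0), with start weights 0.5 and 0.5 and end weights 0.25 and 0.75, give a combined result of (1.5, 4.5).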

FIG. 3 is a flowchart of one embodiment of a speech detection method using a plurality of speech sub-detection systems according to the present invention.

As shown in FIG. 3, a speech input signal is received from the speech input unit (301), each speech detector independently begins detecting the start point of the speech (302), and a detector that has found the start point then begins detecting the end point (303).

Next, it is checked whether all detectors have detected the start of the speech (304). If so, the individual detection results are combined and the final speech start point is output (305); if not, the check of whether all speech detectors have detected the start of the speech (304) is repeated. When start detection has finished, the final start point of the speech is computed by [Equation 1] and [Equation 2] above and output.

After that, it is checked whether all speech detectors have detected the end of the speech (306). If so, the individual detection results are combined and the final speech end point is output (307); if not, the check of whether all speech detectors have detected the end of the speech (306) is repeated. When end detection has finished, the final end point of the speech is computed by [Equation 1] and [Equation 2] above and output.
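The wait-for-all-detectors structure of FIG. 3 can be approximated with a thread pool, as sketched below. This simplifies the flowchart: it waits until every detector has returned both its start and end point before combining, whereas the flowchart outputs the start point as soon as all start points are available. The detector call signature and all names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence, Tuple

Detector = Callable[[Sequence[float]], Tuple[float, float]]


def detect_with_all(
    signal: Sequence[float],
    detectors: Sequence[Detector],
    start_weights: Sequence[float],
    end_weights: Sequence[float],
) -> Tuple[float, float]:
    """Run every detector independently, then combine their results."""
    if not detectors:
        raise ValueError("at least one detector is required")
    with ThreadPoolExecutor(max_workers=len(detectors)) as pool:
        futures = [pool.submit(d, signal) for d in detectors]
        # result() blocks, so this line returns only when all have finished.
        results = [f.result() for f in futures]
    start = sum(w * s for w, (s, _) in zip(start_weights, results)) / sum(start_weights)
    end = sum(w * e for w, (_, e) in zip(end_weights, results)) / sum(end_weights)
    return start, end
```

Because the design does not target real-time processing, blocking until the slowest detector finishes is acceptable in this sketch.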

FIG. 4 is a flowchart of one embodiment of the training process for determining the weight of each sub-detection system in the speech detection method according to the present invention.

As shown in FIG. 4, the speech data used in the training process for determining the weights has had the start point and end point of the speech portion marked by hand. Each detector receives the training speech data (401), performs speech detection independently (402), and stores its result (403).

Next, it is checked whether speech detection has finished for all of the training data (404). If so, the weight for each detector is determined (44) by [Equation 3] and [Equation 4] below, using the stored detection results and the label data, that is, the hand-labeled detection results.

[Equation 3], [Equation 4] (equation images not reproduced in this text)

Here, $w_i^{S}$ is the weight that the speech detection system applies to the $i$-th detector when combining the detection results of the detectors to obtain the final start point of the speech (305), and $w_i^{E}$ is the weight applied to the $i$-th detector when combining the detection results of the detectors to obtain the final end point of the speech (307).

$S_{ij}$ is the start point of the speech obtained by the $i$-th detector for the $j$-th input utterance during training, and $E_{ij}$ is the end point of the speech obtained by the $i$-th detector for the $j$-th input utterance during training.

$\hat{S}_j$ is the hand-labeled start point of the speech for the $j$-th input utterance during training, and $\hat{E}_j$ is the hand-labeled end point of the speech for the $j$-th input utterance during training.

Here, $N$ is the total number of sub-detectors and $M$ is the size of the training data (the number of utterances).
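The weight equations themselves are not available in this text, so the sketch below does not reproduce them. It shows one plausible scheme under a clearly stated assumption: each detector's weight is proportional to the inverse of its mean absolute error against the hand labels, normalized so that the weights sum to one. The function name, the `eps` guard, and the error measure are all hypothetical.

```python
from typing import List, Sequence


def train_weights(
    detected: Sequence[Sequence[float]],
    labels: Sequence[float],
    eps: float = 1e-9,
) -> List[float]:
    """Assumed scheme: inverse mean-absolute-error weights, summing to 1.

    detected[i][j]: point (start or end) found by detector i on utterance j.
    labels[j]: hand-labeled point for utterance j.
    Call once with start points and once with end points to obtain the two
    weight sets. This is NOT the patent's [Equation 3] / [Equation 4].
    """
    m = len(labels)
    if m == 0 or not detected:
        raise ValueError("training data and detectors are required")
    errors = [
        sum(abs(row[j] - labels[j]) for j in range(m)) / m for row in detected
    ]
    # eps avoids division by zero for a detector with no error at all.
    inverse = [1.0 / (err + eps) for err in errors]
    total = sum(inverse)
    return [v / total for v in inverse]
```

Under this assumed scheme a detector that is on average four times closer to the hand labels receives four times the weight.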

The method of the present invention described above can be implemented as a program and stored on a computer-readable recording medium (CD-ROM, RAM, ROM, floppy disk, hard disk, magneto-optical disk, and the like).

The present invention described above is not limited to the foregoing embodiment and the accompanying drawings. It will be apparent to those of ordinary skill in the art to which the invention pertains that various substitutions, modifications, and changes are possible without departing from the technical spirit of the invention.

As described above, the present invention combines the results of independently operating speech detection systems, assigning a weight to each system to obtain the final result, with the weights obtained in a training process so as to give the best speech detection performance. Accordingly, by gathering the results of several independently operating speech detection systems and combining them for the best detection performance, the present invention achieves detection performance superior to that of previously published speech detection systems.

Claims

A speech detection apparatus using a plurality of speech sub-detection systems, comprising:

speech signal input means for receiving a speech input signal transmitted from outside;

a plurality of detection means, each independently detecting a start point and an end point of speech in the speech input signal delivered through the speech signal input means;

detection result combining means for combining the detection results obtained by the plurality of detection means to obtain a weighted average value; and

detection result output means for outputting the detection results for the start point and end point of the speech delivered through the detection result combining means.

The apparatus of claim 1,

wherein the weighted average value is obtained by the following equations and is the time information for the start point and end point of the final speech portion detected by the overall speech detection system:

$$S = \sum_{i=1}^{N} w_i^{S} S_i, \qquad E = \sum_{i=1}^{N} w_i^{E} E_i$$

(where $S$ and $E$ are the start point (time) and end point of the speech portion output as the final result of the overall detection system, $w_i^{S}$ is the weight applied to the $i$-th detector when the detection results of the detectors are combined to obtain the final start point of the speech, $w_i^{E}$ is the weight applied to the $i$-th detector when the detection results of the detectors are combined to obtain the final end point of the speech, and $S_i$ and $E_i$ are the start point and end point obtained by the $i$-th detector for the input speech).

The apparatus of claim 1 or 2,

wherein the weight is determined for each detector by the following equations, in a process of training on training data:

[equation images not reproduced in this text]

(where $w_i^{S}$ is the weight that the speech detection system applies to the $i$-th detector when combining the detection results of the detectors to obtain the final start point of the speech, $w_i^{E}$ is the weight applied to the $i$-th detector when combining the detection results of the detectors to obtain the final end point of the speech, $S_{ij}$ is the start point of the speech obtained by the $i$-th detector for the $j$-th input utterance during training, $E_{ij}$ is the end point of the speech obtained by the $i$-th detector for the $j$-th input utterance during training, $\hat{S}_j$ is the hand-labeled start point of the speech for the $j$-th input utterance during training, $\hat{E}_j$ is the hand-labeled end point of the speech for the $j$-th input utterance during training, $N$ is the total number of sub-detectors, and $M$ is the size of the training data (the number of utterances)).

A speech detection method applied to a speech detection apparatus using a plurality of speech sub-detection systems, comprising:

a first step of receiving a speech input signal transmitted from outside;

a second step of independently detecting start points and end points of speech in the received speech input signal;

a third step of combining the detection results thus obtained to obtain a weighted average value; and

a fourth step of outputting the obtained weighted average value as the start point and end point of the speech determined by the overall system.

The method of claim 4,

wherein the weighted average value is obtained by the following equations and is the time information for the start point and end point of the final speech portion detected by the overall speech detection system:

$$S = \sum_{i=1}^{N} w_i^{S} S_i, \qquad E = \sum_{i=1}^{N} w_i^{E} E_i$$

(where $S$ and $E$ are the start point (time) and end point of the speech portion output as the final result of the overall detection system, $w_i^{S}$ is the weight applied to the $i$-th detector when the detection results of the detectors are combined to obtain the final start point of the speech, $w_i^{E}$ is the weight applied to the $i$-th detector when the detection results of the detectors are combined to obtain the final end point of the speech, and $S_i$ and $E_i$ are the start point and end point obtained by the $i$-th detector).

The method of claim 4 or 5,

wherein the weight is determined for each detector by the following equations, in a process of training on training data:

[equation images not reproduced in this text]

A computer-readable recording medium storing a program that causes a speech detection apparatus having a processor and using a plurality of speech sub-detection systems to implement:

a first function of receiving a speech input signal transmitted from outside;

a second function of independently detecting start points and end points of speech in the received speech input signal;

a third function of combining the detection results thus obtained to obtain a weighted average value; and

a fourth function of outputting the obtained weighted average value as the start point and end point of the speech determined by the overall system.