KR101144162B1

KR101144162B1 - Apparatus for Detecting Audio Target Signal and Method of The same

Info

Publication number: KR101144162B1
Application number: KR1020100076920A
Authority: KR
Inventors: 강홍구; 이봉진
Original assignee: 연세대학교 산학협력단
Priority date: 2010-08-10
Filing date: 2010-08-10
Publication date: 2012-05-10
Also published as: KR20120014755A

Abstract

An apparatus and method for detecting an audio target signal are disclosed. More specifically, the invention relates to a target signal detection apparatus, and a target signal detection method using it, comprising: a segment divider that divides an audio stream into one or more segments of a predetermined time unit; a feature parameter extractor that extracts feature parameters for each divided segment; a normality measurer that measures how closely the per-segment feature parameters extracted by the feature parameter extractor match a normal distribution; and a target signal detector that detects the target signal by applying the value measured by the normality measurer, together with the feature parameters, to a predetermined reference model.

Description

Apparatus for Detecting Audio Target Signal and Method of The same

The present invention relates to a method and apparatus for detecting an audio target signal. More specifically, it relates to a technique for receiving an audio stream and detecting whether the received audio signal contains a target signal (an event signal or abnormal signal).

Conventionally, the most common approaches to detecting a target signal in an audio stream have been power/energy-based detection, which uses changes in power or energy, and statistics-based detection, a statistical approach that uses a Gaussian Mixture Model (hereinafter 'GMM').

Power/energy-based detection computes a power/energy value for each frame of the incoming audio signal and detects a noise signal according to whether that value exceeds a threshold. This approach is simple to implement and runs with few resources, but it is difficult to set a threshold that works in every environment, and because it judges the presence of noise from the power/energy value alone, its performance is limited.
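The power/energy-based approach described above can be sketched as follows. This is a minimal illustration only: the function names, the use of mean squared amplitude as the energy measure, and the non-overlapping framing are assumptions, not details given in the text.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame (assumed energy measure)."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def detect_by_energy(samples, frame_len, threshold):
    """Flag a frame when its energy exceeds one fixed threshold."""
    return [e > threshold for e in frame_energies(samples, frame_len)]


quiet = [0.01, -0.01] * 4   # 8 low-amplitude samples
loud = [0.5, -0.5] * 4      # 8 high-amplitude samples
flags = detect_by_energy(quiet + loud + quiet, frame_len=8, threshold=0.01)
```

The single `threshold` argument is exactly the weak point the text names: a value that separates these two toy frames would not carry over to a noisier or quieter environment.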

Statistics-based detection using a GMM, by contrast, computes the probability under each model from the voice signal arriving frame by frame and uses those probabilities to decide which model the frame most resembles. The GMM-based statistical approach performs well even for scratch noise with small power/energy values and outperforms power/energy-based noise detection, but it produces many errors when detecting signals with similar characteristics.

To solve the problems described above, a technical object of the present invention is to provide an apparatus and method for statistics-based audio target signal detection that, in order to minimize detection errors, measure how closely the signal extracted from the received audio stream matches a normal distribution (its normality) and apply that measurement to the reference model as a weight.

Another technical object of the present invention is to provide an apparatus and method for detecting an audio target signal that correct for the misalignment, in segments obtained by dividing the audio stream at predetermined time intervals, between the peak of the segment's normal-distribution matching degree and the peak of the likelihood ratio for the target signal.

However, the technical objects of the present invention are not limited to those mentioned above, and other objects not mentioned will be clearly understood by those skilled in the art from the following description.

To achieve the above technical objects, an apparatus for detecting a target signal in an audio stream according to the present invention includes a feature parameter extractor that extracts feature parameters from the audio stream, a normality measurer that measures the normal-distribution matching degree of the feature parameters extracted by the feature parameter extractor, and a target signal detector that detects the target signal by applying the value measured by the normality measurer and the feature parameters to a predetermined reference model.

Here, the target signal detector preferably applies the value measured by the normality measurer to the reference model as a weight.

The reference model is preferably a GMM (Gaussian Mixture Model).

More preferably, the GMM uses parameters of a pre-trained statistical model of the target signal and parameters of a pre-trained statistical model of the background signal.

The normal-distribution matching degree measured by the normality measurer may be a multivariate normal-distribution matching degree.

Also to achieve the above technical objects, an apparatus for detecting a target signal in an audio stream according to the present invention includes a segment divider that divides the audio stream into one or more segments of a predetermined time unit, a feature parameter extractor that extracts feature parameters for each of the divided segments, a normality measurer that measures the normal-distribution matching degree of the per-segment feature parameters extracted by the feature parameter extractor, and a target signal detector that detects the target signal by applying the value measured by the normality measurer and the feature parameters to a predetermined reference model.

Here, the value measured by the normality measurer is preferably a value for a segment formed by offset-correcting, by a predetermined time interval, the segment produced by the segment divider.

The target signal detector preferably applies the value measured by the normality measurer to the reference model as a weight.

The normal-distribution matching degree measured by the normality measurer is preferably a multivariate normal-distribution matching degree.

A method of detecting a target signal in an audio stream according to the present invention, for achieving the above technical objects, includes: (a) dividing the audio stream into one or more segments of a predetermined time unit; (b) extracting feature parameters for each of the divided segments; (c) measuring the normal-distribution matching degree of the extracted per-segment feature parameters; and (d) detecting the target signal by applying the value measured in step (c) and the feature parameters extracted in step (b) to a predetermined reference model.

Here, the value measured in step (c) is preferably a value for a segment formed by offset-correcting the divided segment by a predetermined time interval.

In step (d), the measured value may be applied to the reference model as a weight.

The reference model is preferably a GMM (Gaussian Mixture Model).

Also preferably, the GMM may use parameters of a pre-trained statistical model of the target signal and parameters of a pre-trained statistical model of the background signal.

In step (d), the target signal may be detected according to whether the value produced by applying the value measured in step (c) and the feature parameters extracted in step (b) to the predetermined reference model exceeds a predetermined threshold.

The normal-distribution matching degree of step (c) is preferably a multivariate normal-distribution matching degree.

With the apparatus and method for detecting an audio target signal according to the present invention, as understood from this specification, the normal-distribution matching degree (normality) of the signal extracted from the received audio stream is measured and applied to the reference model as a weight, so the audio target signal can be detected more precisely.

FIG. 1 is a block diagram schematically showing an audio target signal detection apparatus according to one embodiment of the present invention.
FIG. 2 is a block diagram schematically showing an audio target signal detection apparatus according to another embodiment of the present invention.
FIG. 3 is a table showing example results of measuring the normal-distribution matching degree obtained by running a normality test on signals received from an audio stream.
FIG. 4 is a graph showing normality test results for a machine gun sound and a white noise sound.
FIG. 5 is a graph showing DET curves under segment correction.
FIG. 6 is a graph of simulation results comparing the performance of a conventional audio target signal detection method with that of the audio target signal detection method according to the present invention.
FIG. 7 is a flowchart of the audio target signal detection method according to the present invention.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. When a component is described here as being connected to another component, it may be connected directly or with a third component interposed between them. In assigning reference numerals to the components in each drawing, the same components are given the same numerals wherever possible, even when they appear in different drawings. The configuration and operation of the invention shown in and described through the drawings are presented as at least one embodiment, and they do not limit the technical idea of the invention or its core configuration and operation.

Before the detailed description of the invention, the conventional method of detecting an audio target signal using a GMM (Gaussian Mixture Model) is briefly introduced. This will make clearer how the invention improves on the prior art.

GMMs are widely used in pattern recognition applications such as speaker recognition and speech/music classification, and so they are also employed to detect event signals (also referred to below as abnormal signals or target signals) in audio signals. Conventional GMM-based audio target signal detection is performed through a likelihood ratio test of whether the received audio stream is close to the target signal; when the result of the likelihood ratio test exceeds a predetermined threshold, the signal is determined to be a target signal.

[Equation 1] and [Equation 2], used in the conventional GMM-based detection method described above, are given below.

[Equation 1] expresses the likelihood ratio test, and [Equation 2] detects the target signal by comparing the result of the likelihood ratio test with a predetermined threshold.

[Equation 1] (likelihood ratio test; the equation image is not preserved in this text)

Here,

$x_t$ is the feature vector (feature parameter) of the t-th frame in the segment, and k is the number of frames in the segment,

$\lambda_{T}$ is the parameter set of the statistical model of the target signal,

$\lambda_{B}$ is the parameter set of the statistical model of the background signal,

$X_n$ is the n-th feature segment, $X_n = \{x_1, \dots, x_k\}$, and

$\Lambda(X_n)$ is the result of the likelihood ratio test for the n-th feature segment.
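Because the equation itself is not legible in this text, the following is only a sketch of one common form of a GMM likelihood ratio test that is consistent with the definitions given for [Equation 1]. The symbol names ($x_t$, $X_n$, $\lambda_{T}$, $\lambda_{B}$, $\Lambda$), the use of logarithms, and the averaging over the k frames are assumptions, not notation confirmed by the source.

```latex
\Lambda(X_n) = \frac{1}{k} \sum_{t=1}^{k}
  \left[ \log p(x_t \mid \lambda_{T}) - \log p(x_t \mid \lambda_{B}) \right],
\qquad X_n = \{x_1, \dots, x_k\}
```

In this form a positive value means the frames of the segment are, on average, better explained by the target model than by the background model.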

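[Equation 2]

$\Lambda(X_n) > \theta$: the n-th feature segment is determined to be a target signal

Here, $\Lambda(X_n)$ is the result of the likelihood ratio test of [Equation 1] for the n-th feature segment, and θ is a predetermined threshold.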
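The likelihood ratio test and threshold comparison described for [Equation 1] and [Equation 2] can be sketched in code. This is a minimal sketch under stated assumptions: diagonal-covariance GMMs, log-likelihoods averaged over the frames of a segment, and hypothetical names (`gmm_log_likelihood`, `likelihood_ratio`, `is_target`) that do not come from the source. The model parameters are toy values, not trained models.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(x, gmm):
    """gmm is a list of (weight, mean, var) components; uses log-sum-exp."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def likelihood_ratio(segment, target_gmm, background_gmm):
    """Average log-likelihood ratio over the k frames of one segment."""
    k = len(segment)
    return sum(gmm_log_likelihood(x, target_gmm)
               - gmm_log_likelihood(x, background_gmm) for x in segment) / k


def is_target(segment, target_gmm, background_gmm, theta):
    """Threshold decision: target when the ratio exceeds theta."""
    return likelihood_ratio(segment, target_gmm, background_gmm) > theta


# Toy one-dimensional, single-component models (illustrative values only).
target_gmm = [(1.0, [3.0], [1.0])]
background_gmm = [(1.0, [0.0], [1.0])]
lr_hit = likelihood_ratio([[3.0], [2.9], [3.1]], target_gmm, background_gmm)
lr_miss = likelihood_ratio([[0.0], [0.1], [-0.1]], target_gmm, background_gmm)
```

With these toy models the per-frame log ratio reduces to 3x - 4.5, so a segment centered on 3 scores well above a threshold of 0 and a segment centered on 0 scores well below it.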

The apparatus for detecting an audio target signal according to the present invention is described below.

FIG. 1 is a block diagram schematically showing an audio target signal detection apparatus according to one embodiment of the present invention.

As shown in FIG. 1, in one embodiment the audio target signal detection apparatus (100) includes a feature parameter extractor (10), a normality measurer (20), and a target signal detector (30).

The feature parameter extractor (10) receives the audio stream entering the audio target signal detection apparatus (100) and extracts feature parameters of the audio signal from it. The feature parameters are feature vectors and are preferably computed by a known cepstral analysis method, although the invention is not limited to this and various other known methods may also be applied.
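The text names only "a known cepstral analysis method" without specifying which one. As one illustrative possibility, the sketch below computes a real cepstrum (inverse DFT of the log magnitude spectrum) with a naive DFT. The choice of the real cepstrum, the function names, and the magnitude floor are assumptions; practical systems would typically use an FFT and often mel-frequency cepstral coefficients instead.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a real sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
            for f in range(n)]


def real_cepstrum(frame, n_coeffs, floor=1e-10):
    """First n_coeffs real-cepstrum coefficients of one frame.

    floor guards against log(0) for empty spectral bins (assumed value).
    """
    n = len(frame)
    log_mag = [math.log(max(abs(s), floor)) for s in dft(frame)]
    return [sum(log_mag[f] * cmath.exp(2j * math.pi * f * q / n)
                for f in range(n)).real / n
            for q in range(n_coeffs)]


# A unit impulse has a flat magnitude spectrum, so its cepstrum is all zeros.
impulse_cepstrum = real_cepstrum([1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], n_coeffs=4)
```

Each frame then yields one feature vector of `n_coeffs` values, which is the per-frame input the later stages operate on.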

The normality measurer (20) measures the normal-distribution matching degree, that is, the degree to which the feature parameters extracted by the feature parameter extractor (10) correspond to a normal distribution. The normal distribution here is preferably a multivariate normal distribution, and the value resulting from this multivariate normal-distribution matching degree may be referred to below as the p-value.

Next, the target signal detector (30) detects the target signal by applying the value (p-value) measured by the normality measurer and the feature parameters to a predetermined reference model. The predetermined reference model used in the target signal detector (30) is a GMM. The target signal detector (30) therefore detects the audio target signal by applying the value (p-value) measured by the normality measurer (20) to the GMM as a weight.

This is described in more detail below in the description of another embodiment of the present invention.

FIG. 2 is a block diagram schematically showing an audio target signal detection apparatus according to another embodiment of the present invention.

As shown in FIG. 2, in another embodiment the audio target signal detection apparatus (100) includes a segment divider (5), a feature parameter extractor (10), a normality measurer (20), and a target signal detector (30).

The segment divider (5) divides the audio stream entering the audio target signal detection apparatus (100) into one or more segments of a predetermined time unit.

That is, the temporally continuous audio stream signal is divided at regular intervals into feature segments, the segments to be analyzed. Each divided segment contains frames.
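The division into segments that each contain frames can be sketched as follows. The function names, the non-overlapping framing, and the optional `hop` parameter for overlapping segments are assumptions for illustration; the text states only that segments have a predetermined duration and contain frames.

```python
def split_into_frames(samples, frame_len):
    """Cut the sample stream into consecutive, non-overlapping frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def split_into_segments(frames, frames_per_segment, hop=None):
    """Group frames into fixed-size segments; hop < frames_per_segment overlaps them."""
    hop = hop or frames_per_segment
    return [frames[i:i + frames_per_segment]
            for i in range(0, len(frames) - frames_per_segment + 1, hop)]


samples = list(range(24))                      # stand-in for audio samples
frames = split_into_frames(samples, frame_len=4)
segments = split_into_segments(frames, frames_per_segment=3)
```

Here 24 samples give 6 frames and 2 segments of 3 frames each; passing `hop=1` would instead slide the segment window one frame at a time.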

For the segment under analysis among the segments produced by the segment divider (5), the feature parameter extractor (10) extracts a feature parameter (feature vector) for each frame belonging to that segment. The method of extracting feature parameters has been described above, so the description is not repeated.

The normality measurer (20) measures the normal-distribution matching degree, that is, the degree to which the feature parameters of the segment under analysis, extracted by the feature parameter extractor (10), correspond to a normal distribution. As noted above, the normal distribution here is preferably a multivariate normal distribution, and the value resulting from this multivariate normal-distribution matching degree may be referred to below as the p-value.

Techniques for measuring how closely feature parameters correspond to a normal distribution are already known, so their description is omitted here for brevity.
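The source does not say which normality test produces the p-value. As one known option, the sketch below uses Mardia's multivariate kurtosis test, whose statistic is asymptotically normal with mean d(d+2) and variance 8d(d+2)/n for d-dimensional Gaussian data. The choice of this test and all function names are assumptions, and the sketch fails with a division by zero if the sample covariance is singular.

```python
import math
import random


def mean_and_cov(rows):
    """Sample mean and maximum-likelihood covariance (divide by n)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [[sum((r[a] - mean[a]) * (r[b] - mean[b]) for r in rows) / n
            for b in range(d)] for a in range(d)]
    return mean, cov


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting."""
    d = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(d)]
           for i, row in enumerate(matrix)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(d):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * c for v, c in zip(aug[r], aug[col])]
    return [row[d:] for row in aug]


def mardia_kurtosis_p_value(rows):
    """Two-sided p-value of Mardia's kurtosis test for multivariate normality."""
    n, d = len(rows), len(rows[0])
    mean, cov = mean_and_cov(rows)
    inv = invert(cov)
    b2 = 0.0
    for r in rows:
        diff = [r[j] - mean[j] for j in range(d)]
        maha = sum(diff[a] * inv[a][b] * diff[b]
                   for a in range(d) for b in range(d))
        b2 += maha * maha          # squared Mahalanobis distance, squared again
    b2 /= n
    z = (b2 - d * (d + 2)) / math.sqrt(8.0 * d * (d + 2) / n)
    return math.erfc(abs(z) / math.sqrt(2.0))


random.seed(0)
gaussian = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(400)]
spiky = gaussian[:380] + [[random.gauss(0, 100), random.gauss(0, 100)]
                          for _ in range(20)]
p_gauss = mardia_kurtosis_p_value(gaussian)
p_spiky = mardia_kurtosis_p_value(spiky)
```

Data with a few very large excursions, loosely analogous to an impulsive event over a steady background, gets a p-value near zero, while purely Gaussian data gets a larger one.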

The target signal detector (30) detects the target signal by applying the value (p-value) measured by the normality measurer and the feature parameters to a predetermined reference model. The predetermined reference model used in the target signal detector (30) is a GMM. The target signal detector (30) therefore detects the audio target signal by applying the value (p-value) measured by the normality measurer (20) to the GMM as a weight.

[Equation 3] below expresses the above as a formula, and the invention is described in more detail with reference to it.

[Equation 3] (normality-weighted likelihood ratio test; the equation image is not preserved in this text)

Here,

$\lambda_{T}$ is the parameter set of the statistical model of the target signal,

$\lambda_{B}$ is the parameter set of the statistical model of the background signal,

$X_n$ is the n-th feature segment, $X_n = \{x_1, \dots, x_k\}$,

$p_{n-s}$ is the p-value of the (n-s)-th segment, and

$\Lambda(X_n)$ is the result of the likelihood ratio test for the n-th feature segment.

[Equation 3] adds the term $(1 - p_{n-s})$, where $p_{n-s}$ is the p-value of the (n-s)-th segment, to the conventional GMM-based audio target signal detection method described above. This term applies, as a weight, the degree to which the feature parameters of the segment under analysis match a normal distribution, as measured by the normality measurer (20).
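As a hedged sketch of how such a weight could enter the score: the text fixes the weight as one minus the p-value of the (n-s)-th segment, but the exact placement of the term is not legible here. Assuming the weight simply multiplies the conventional likelihood ratio, with $\Lambda_w$ as a hypothetical name for the weighted result:

```latex
\Lambda_w(X_n) = \left(1 - p_{n-s}\right)\,\Lambda(X_n)
```

Under this assumption a segment that looks Gaussian (p-value near 1) has its score shrunk toward zero, while a segment that departs strongly from normality (p-value near 0) keeps nearly its full likelihood ratio.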

That is, a target signal (abnormal signal, event signal) has relatively lower normality, meaning it matches a normal distribution less closely, than a background signal (normal signal, non-event signal), which is why the present invention uses one minus the p-value when weighting the GMM. Since the object of the invention is to detect abnormal signals (target signals, event signals), this weight is applied so that a target signal with low normality is detected accurately when it enters the audio target signal detection apparatus (100).

This is explained with reference to FIGS. 3 and 4.

FIG. 3 is a table showing example results of measuring the normal-distribution matching degree obtained by running a normality test on signals received from an audio stream, and FIG. 4 is a graph showing normality test results for a machine gun sound and a white noise sound.

For example, when the machine gun signal among the data shown in FIG. 3 enters the audio target signal detection apparatus (100), the normality measurer (20) computes a p-value of 0.22 for it, so the weight in [Equation 3] is 1 - 0.22 = 0.78.

The resulting [Equation 3] value is therefore larger than the value computed when a white noise signal enters the audio target signal detection apparatus, in which case the weight in [Equation 3] is 1 - 0.5 = 0.5.
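The arithmetic of the two quoted weights can be checked directly. The p-values 0.22 and 0.5 are the ones given in the text; the shared unweighted score of 2.0 and the multiplicative use of the weight are hypothetical, chosen only to show the relative effect.

```python
def normality_weight(p_value):
    """Weight described in the text: one minus the normality-test p-value."""
    return 1.0 - p_value


machine_gun_weight = normality_weight(0.22)   # p-value quoted for the machine gun signal
white_noise_weight = normality_weight(0.5)    # p-value quoted for white noise

raw_score = 2.0  # hypothetical unweighted likelihood ratio, same for both signals
machine_gun_score = machine_gun_weight * raw_score
white_noise_score = white_noise_weight * raw_score
```

For an identical unweighted score, the less Gaussian machine gun signal ends up with the higher weighted score, which is the behavior the text describes.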

In this way, the audio target signal detection apparatus according to the present invention applies the value resulting from the multivariate normal-distribution matching degree of the input audio signal to the GMM as a weight, which reduces errors in detecting target signals that do not follow a normal distribution.

Furthermore, referring to [Equation 3], although the weight is applied to the n-th segment, the segment index of the p-value in the weighting term is n-s rather than n. This reflects a characteristic of the normality test and is explained with reference to FIG. 5.

The likelihood ratio is maximized whenever the target signal is present anywhere in the segment, whereas the p-value of the multivariate normality test changes sharply at the point within the segment where the target signal occurs. A p-value with an offset applied is used to correct for this difference.

Referring to FIG. 5, a graph showing DET (Detection Error Trade-off) curves under segment correction, the performance of the audio target signal detection apparatus is optimized when the offset s in [Equation 3] is set to 450 ms or 600 ms. However, the offset s need not be fixed at these values in the present invention; the person configuring the apparatus may of course adjust it to suit the conditions of the environment in which the apparatus is installed.
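The offset correction can be sketched as an index shift between the two per-segment sequences. In this sketch the conversion from milliseconds to a segment count (which needs a segment hop that the text does not give), the fallback weight of 1.0 for the first s segments, and the function names are all assumptions.

```python
def offset_in_segments(offset_ms, segment_hop_ms):
    """Convert a time offset into a whole number of segments (assumed hop)."""
    return round(offset_ms / segment_hop_ms)


def offset_weighted_scores(lr_scores, p_values, s):
    """Weight the n-th likelihood ratio by one minus the (n-s)-th p-value."""
    scores = []
    for n, lr in enumerate(lr_scores):
        if n - s >= 0:
            weight = 1.0 - p_values[n - s]
        else:
            weight = 1.0  # assumption: no earlier segment exists, leave unweighted
        scores.append(weight * lr)
    return scores


s = offset_in_segments(450, 150)   # 150 ms hop is a hypothetical value
shifted = offset_weighted_scores([1.0, 1.0, 4.0, 4.0], [0.5, 0.2, 0.5, 0.5], s=1)
```

In the toy call with `s=1`, the sharp p-value drop at index 1 is paired with the likelihood ratio peak that begins at index 2, which is the kind of peak misalignment the offset is meant to absorb.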

The results of an experiment comparing the performance of the audio target signal detection apparatus according to the present invention with that of a conventional audio target signal detection apparatus are described below.

FIG. 6 is a graph of simulation results comparing the performance of the conventional audio target signal detection apparatus with that of the apparatus according to the present invention.

Here the apparatus according to the present invention is labeled GMM+MVN (Multi-Variate Normality) and the conventional apparatus is labeled GMM. Note that only results at 0 dB and below are shown, because no errors occur at 0 dB and above.

As shown in FIG. 6, GMM+MVN outperforms GMM in many cases.

Next, the method of detecting an audio target signal according to the present invention is described.

FIG. 7 is a flowchart of the audio target signal detection method according to the present invention.

As shown in FIG. 7, the audio target signal detection method includes: step S10 of receiving an audio stream; step S20 of dividing the received audio stream into one or more segments of a predetermined time unit; step S30 of extracting feature parameters for each divided segment; step S40 of measuring the normal-distribution matching degree of the extracted per-segment feature parameters; step S50 of applying the value measured in step S40 and the feature parameters extracted in step S30 to a predetermined reference model; and step S60 of detecting the target signal by comparing the likelihood ratio result obtained in step S50 with a predetermined threshold.
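Steps S20 through S60 can be strung together in a short sketch. The stage functions are passed in as callables so the flow stays visible; the name `detect_targets`, the stub stages in the example, and the multiplicative weighting in S50 are assumptions made for illustration, not the patented implementation.

```python
def detect_targets(samples, frame_len, frames_per_segment,
                   extract, normality_p, likelihood_ratio, theta, s=0):
    """Return one boolean per segment: True when a target is detected."""
    # S20: divide the stream into frames, then group frames into segments.
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    segments = [frames[i:i + frames_per_segment]
                for i in range(0, len(frames) - frames_per_segment + 1,
                               frames_per_segment)]
    # S30: one feature vector per frame of each segment.
    features = [[extract(frame) for frame in seg] for seg in segments]
    # S40: one normality p-value per segment.
    p_values = [normality_p(feat) for feat in features]
    # S50: weight each segment's likelihood ratio by 1 - p of segment n - s.
    scores = []
    for n, feat in enumerate(features):
        weight = 1.0 - p_values[n - s] if n - s >= 0 else 1.0
        scores.append(weight * likelihood_ratio(feat))
    # S60: compare against the predetermined threshold.
    return [score > theta for score in scores]


# Stub stages (hypothetical): energy as the only feature, a constant p-value,
# and a "likelihood ratio" that is just mean energy minus one.
flags = detect_targets(
    samples=[0.0] * 8 + [2.0] * 8,
    frame_len=2,
    frames_per_segment=4,
    extract=lambda frame: [sum(x * x for x in frame) / len(frame)],
    normality_p=lambda feats: 0.5,
    likelihood_ratio=lambda feats: sum(f[0] for f in feats) / len(feats) - 1.0,
    theta=1.0,
)
```

In a real system the three callables would be replaced by a cepstral feature extractor, a multivariate normality test, and a GMM likelihood ratio built from trained target and background models.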

The audio target signal detection method using these steps can be understood from the description of the audio target signal detection apparatus above, so a detailed description is omitted here for brevity.

Although the present invention has been described with reference to limited embodiments and drawings, it is not limited to these embodiments, and those of ordinary skill in the art to which the invention pertains can make various modifications and variations from this description. Accordingly, the idea of the invention should be determined only by the claims set out below, and all equal or equivalent modifications of them fall within the scope of the idea of the invention.

5: segment division unit 10: characteristic parameter extraction unit
20: normality measurement unit 30: target signal detection unit
100: target signal detection apparatus

Claims

An apparatus for detecting a target signal in an audio stream, the apparatus comprising:
a characteristic parameter extraction unit for extracting characteristic parameters from the audio stream;
a normality measurement unit for measuring a normal distribution matching degree of the characteristic parameters extracted by the characteristic parameter extraction unit; and
a target signal detection unit for detecting the target signal using the characteristic parameters by applying the result value measured by the normality measurement unit as a weight to a predetermined reference model.

delete

The apparatus of claim 1,
wherein the reference model is a Gaussian Mixture Model (GMM).

The apparatus of claim 3,
wherein the GMM uses parameters of a pre-trained statistical model of the target signal and parameters of a pre-trained statistical model of a background signal.

The apparatus of claim 1,
wherein the normal distribution matching degree measured by the normality measurement unit is a multivariate normal distribution matching degree.

An apparatus for detecting a target signal in an audio stream, the apparatus comprising:
a segment division unit for dividing the audio stream into one or more segments of a predetermined time unit;
a characteristic parameter extraction unit for extracting characteristic parameters for each of the divided segments;
a normality measurement unit for measuring a normal distribution matching degree of the per-segment characteristic parameters extracted by the characteristic parameter extraction unit; and
a target signal detection unit for detecting the target signal by applying, to a predetermined reference model, the characteristic parameters and an offset-corrected value of the result value measured by the normality measurement unit.

delete

The apparatus of claim 6,
wherein the target signal detection unit applies the result value measured by the normality measurement unit as a weight to the reference model.

The apparatus of claim 6,
wherein the reference model is a Gaussian Mixture Model (GMM).

The apparatus of claim 9,
wherein the GMM uses parameters of a pre-trained statistical model of the target signal and parameters of a pre-trained statistical model of a background signal.

The apparatus of claim 6,
wherein the normal distribution matching degree measured by the normality measurement unit is a multivariate normal distribution matching degree.

A method for detecting a target signal in an audio stream, the method comprising:
(a) dividing the audio stream into one or more segments of a predetermined time unit;
(b) extracting characteristic parameters for each of the divided segments;
(c) measuring a normal distribution matching degree of the characteristic parameters extracted for each segment; and
(d) detecting the target signal using the characteristic parameters extracted in step (b) by applying the result value measured in step (c) as a weight to a predetermined reference model.

The method of claim 12,
wherein the result value measured in step (c) is a result value for a segment formed by offset-correcting the divided segment by a predetermined time interval.

delete

The method of claim 12,
wherein the reference model is a Gaussian Mixture Model (GMM).

The method of claim 15,
wherein the GMM uses parameters of a pre-trained statistical model of the target signal and parameters of a pre-trained statistical model of a background signal.

The method of claim 12,
wherein detecting the target signal in step (d) comprises applying the result value measured in step (c) and the characteristic parameters extracted in step (b) to the predetermined reference model and determining whether the result exceeds a predetermined threshold.

The method of claim 12,
wherein the normal distribution matching degree of step (c) is a multivariate normal distribution matching degree.