KR20220129878A - Method for Discriminating AMR Voice Data and Apparatus thereof

Info

Publication number: KR20220129878A
Application number: KR1020210034760A
Authority: KR
Inventors: 박철순; 김선교; 유훈
Original assignee: 국방과학연구소
Priority date: 2021-03-17
Filing date: 2021-03-17
Publication date: 2022-09-26
Also published as: KR102550760B1

Abstract

The present invention relates to a method and an apparatus for discriminating adaptive multi-rate (AMR) voice data, which make it easy to determine whether voice data has been encoded in the AMR format. According to one embodiment of the present invention, the method comprises: a target data processing step of extracting a centroid, which is a unique factor of target data, from target data compressed in the AMR format of one of the zeroth to seventh modes; a reference data processing step of extracting a reference unique factor, which is a unique factor of reference data, from reference data compressed in the AMR format; and a comparison and determination step of comparing the centroid and the reference unique factor using a maximum likelihood (ML) method to discriminate whether the target data is AMR data compressed in the AMR format. The comparison and determination step performs the discrimination using an MSE distribution, which is a probability density function of the mean squared error (MSE) between the centroid and the reference unique factor.

Description

Method for Discriminating AMR Voice Data and Apparatus thereof

The present invention relates to a method and an apparatus for discriminating AMR voice data, and more particularly, to a method and an apparatus that can easily determine whether target data is AMR voice data by comparing the target data with reference data.

Adaptive Multi-Rate (hereinafter, "AMR") is a patented audio data compression scheme optimized for speech coding, and the AMR codec is known as a codec that copes flexibly with changing conditions by supporting a range of bit rates. AMR is a multi-rate narrowband speech codec that encodes narrowband signals (200-3400 Hz) at a total of eight variable bit rates, 4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2 and 12.2 kbit/s, and in which the base station control system variably adjusts the AMR output mode according to the state of the radio network to guarantee the user's quality of service (QoS).

Embodiments of the present invention aim to provide a method and an apparatus that can easily determine whether voice data is AMR voice data encoded according to the AMR format through comparison with previously secured reference data.

A method according to an embodiment of the present invention includes: a target data processing step of extracting a centroid, which is a unique factor of target data, from target data compressed in the AMR format of one of the zeroth to seventh modes; a reference data processing step of extracting a reference unique factor, which is a unique factor of reference data, from reference data compressed in the AMR format; and a comparison determination step of comparing the centroid and the reference unique factor using a maximum likelihood (ML) method to determine whether the target data is AMR data compressed in the AMR format, wherein the comparison determination step performs the determination using an MSE distribution, which is a probability density function of the mean squared error (MSE) between the centroid and the reference unique factor.

The target data processing step may include: extracting a principal component from the target data; and extracting, from the principal component, the centroid for a partial region of the principal component, wherein the extracting of the centroid may include dividing the principal component into a plurality of subframes each including at least one bit, and specifying the centroid on the basis of bit information included in some of the plurality of subframes.

An input mode, which is the AMR mode in which the target data is compressed, and a check mode, which provides the number of bits used to extract the principal component for the input mode, each include the zeroth to seventh modes, and the extracting of the principal component may include: extracting a principal component of a matching mode, extracted for the input mode on the basis of a check mode identical to the input mode; and extracting a cross factor, which is a principal component of a mismatch mode, extracted for the input mode on the basis of a check mode different from the input mode.

The extracting of the principal component may include: a data receiving step of receiving a bitstream of target data compressed in the AMR format of one of the zeroth to seventh modes; a frame separation step of dividing the bitstream into frames each having a preset first number of bits; a bit array generation step of generating a bit array by integrating the separated frames; and a step of specifying the principal component of the bitstream on the basis of the bit array and the total number of frames, wherein the centroid may be extracted on the basis of some of the frames.

The reference data processing step may include: extracting a reference principal component from the reference data; and extracting, from the reference principal component, the reference unique factor for a partial region of the reference principal component, wherein the extracting of the reference unique factor may include dividing the reference principal component into a plurality of subframes each including at least one bit, and specifying the reference unique factor on the basis of bit information included in some of the plurality of subframes.

The comparison determination step may include: obtaining an MSE distribution for each mode on the basis of the centroid and the reference unique factor; obtaining a threshold value for each mode on the basis of the MSE distribution for each mode; and comparing the minimum value of the MSE distribution of the target data with the threshold value of the input mode among the threshold values for each mode, and determining that the target data is AMR voice data when the minimum value is smaller than the threshold value.
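The comparison just described can be sketched in Python as follows. This is a minimal illustration, not the claimed implementation: the function names `mse` and `discriminate`, the dictionary layout keyed by mode number, and the choice to treat the mode with the smallest MSE as the estimated input mode are all assumptions made for the example.

```python
from typing import Dict, Sequence, Tuple


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def discriminate(
    centroids: Dict[int, Sequence[float]],    # centroid extracted under each check mode
    references: Dict[int, Sequence[float]],   # reference unique factor of each mode
    thresholds: Dict[int, float],             # per-mode threshold values
) -> Tuple[bool, int, float]:
    """Return (is_amr, estimated_mode, minimum_mse).

    Assumption: the check mode with the smallest MSE is taken as the
    estimated input mode, and its threshold is the one compared against.
    """
    errors = {mode: mse(centroids[mode], references[mode]) for mode in centroids}
    best_mode = min(errors, key=errors.get)
    return errors[best_mode] < thresholds[best_mode], best_mode, errors[best_mode]
```

With made-up two-element vectors, a target whose centroid lies close to the reference of one mode is accepted only if the resulting minimum MSE is below that mode's threshold.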

The obtaining of the threshold value may include obtaining, as the threshold value, the point at which a first MSE distribution of the principal component of the matching mode and a second MSE distribution of the principal component of the mismatch mode have the same value.
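One way to locate such a point numerically is sketched below. It assumes, purely for illustration, that both MSE distributions are available as densities sampled on a common grid of MSE values and that the matching-mode peak lies at a lower MSE than the mismatch-mode peak; the function name `crossing_threshold` and the linear interpolation between grid points are choices made for this example and are not taken from the source.

```python
from typing import Sequence


def crossing_threshold(
    grid: Sequence[float],
    pdf_match: Sequence[float],
    pdf_mismatch: Sequence[float],
) -> float:
    """Return the MSE value between the two peaks where the curves are equal."""
    if not (len(grid) == len(pdf_match) == len(pdf_mismatch)):
        raise ValueError("grid and both densities must have the same length")
    # Locate the peak of each sampled density.
    i_lo = max(range(len(grid)), key=lambda i: pdf_match[i])
    i_hi = max(range(len(grid)), key=lambda i: pdf_mismatch[i])
    if i_lo > i_hi:
        raise ValueError("expected the matching-mode peak at the lower MSE")
    # Walk from one peak to the other and find where the difference changes sign.
    for i in range(i_lo, i_hi):
        d0 = pdf_match[i] - pdf_mismatch[i]
        d1 = pdf_match[i + 1] - pdf_mismatch[i + 1]
        if d0 > 0 and d1 <= 0:
            t = d0 / (d0 - d1)  # linear interpolation inside the interval
            return grid[i] + t * (grid[i + 1] - grid[i])
    raise ValueError("the two densities do not cross between their peaks")
```

If the two distributions are instead modelled parametrically, the same crossing could be found in closed form or with a root finder; the grid-based search is used here only because it makes no assumption about the shape of the distributions.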

A program for executing the method according to the above-described embodiments of the present invention may be stored in a computer-readable recording medium.

An apparatus according to an embodiment of the present invention includes: a target data processing unit that extracts a centroid, which is a unique factor of target data, from target data compressed in the AMR format of one of the zeroth to seventh modes; a reference data processing unit that extracts a reference unique factor, which is a unique factor of reference data, from reference data compressed in the AMR format; and a comparison unit that compares the centroid and the reference unique factor using a maximum likelihood (ML) method to determine whether the target data is AMR data compressed in the AMR format, wherein the comparison unit performs the determination using an MSE distribution, which is a probability density function of the mean squared error (MSE) between the centroid and the reference unique factor.

The target data processing unit may include: a principal component extraction unit that extracts a principal component from the target data; and a centroid extraction unit that extracts, from the principal component, the centroid for a partial region of the principal component, wherein the centroid extraction unit may divide the principal component into a plurality of subframes each including at least one bit and specify the centroid on the basis of bit information included in some of the plurality of subframes.

An input mode, which is the AMR mode in which the target data is compressed, and a check mode, which provides the number of bits used to extract the principal component for the input mode, each include the zeroth to seventh modes, and the principal component extraction unit may extract a principal component of a matching mode, extracted for the input mode on the basis of a check mode identical to the input mode, and may extract a cross factor, which is a principal component of a mismatch mode, extracted for the input mode on the basis of a check mode different from the input mode.

The principal component extraction unit may include: a data receiving unit that receives a bitstream of target data compressed in the AMR format of one of the zeroth to seventh modes; a frame separation unit that divides the bitstream into frames each having a preset first number of bits; a bit array generation unit that generates a bit array by integrating the separated frames; and a principal component specifying unit that specifies the principal component of the bitstream on the basis of the bit array and the total number of frames, wherein the centroid may be extracted on the basis of some of the frames.

The reference data processing unit may include: a reference principal component extraction unit that extracts a reference principal component from the reference data; and a reference unique factor extraction unit that extracts, from the reference principal component, the reference unique factor for a partial region of the reference principal component, wherein the reference unique factor extraction unit may divide the reference principal component into a plurality of subframes each including at least one bit and specify the reference unique factor on the basis of bit information included in some of the plurality of subframes.

The comparison unit may include: a per-mode MSE acquisition unit that obtains an MSE distribution for each mode on the basis of the centroid and the reference unique factor; a threshold acquisition unit that obtains a threshold value for each mode on the basis of the MSE distribution for each mode; and a comparison determination unit that compares the minimum value of the MSE distribution of the target data with the threshold value of the input mode among the threshold values for each mode and determines that the target data is AMR voice data when the minimum value is smaller than the threshold value.

The threshold acquisition unit may obtain, as the threshold value, the point at which a first MSE distribution of the principal component of the matching mode and a second MSE distribution of the principal component of the mismatch mode have the same value.

According to embodiments of the present invention, whether voice data is AMR voice data encoded according to the AMR format can be easily determined through comparison with previously secured reference data.

FIG. 1 is a block diagram schematically showing the configuration of an AMR voice data discrimination apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram schematically showing a part of the configuration of the apparatus of FIG. 1.
FIG. 3 is a diagram for explaining the data processing performed by the principal component extraction unit according to an embodiment of the present invention.
FIG. 4 is a diagram schematically showing one frame, and the subframes included in that frame, in one mode according to an embodiment of the present invention.
FIGS. 5 to 12 are graphs showing examples of the principal components of male and female voice data encoded in the zeroth to seventh modes of the present invention.
FIGS. 13 and 14 are diagrams for explaining a cross factor according to an embodiment.
FIG. 15 is a diagram showing a per-mode matrix generated by matching the input mode in which the target data is encoded against the check mode with which it is compared, according to an embodiment of the present invention.
FIG. 16 is a graph showing MSE distributions according to an embodiment of the present invention for the zeroth to third modes.
FIG. 17 is a graph showing MSE distributions according to an embodiment of the present invention for the fourth to seventh modes.
FIG. 18 is a flowchart for explaining a method for discriminating AMR voice data according to an embodiment of the present invention.

Since the present invention may be modified in various ways and may have various embodiments, specific embodiments are illustrated in the drawings and described in detail in the detailed description. The effects and features of the present invention, and the methods of achieving them, will become apparent with reference to the embodiments described in detail below in conjunction with the drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various forms.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. In the description with reference to the drawings, identical or corresponding components are given the same reference numerals, and redundant descriptions thereof are omitted.

In the following embodiments, terms such as first and second are used not in a limiting sense but to distinguish one component from another. In the following embodiments, a singular expression includes the plural unless the context clearly indicates otherwise. In the following embodiments, terms such as "include" or "have" mean that the features or components described in the specification are present, and do not exclude in advance the possibility that one or more other features or components may be added. In the drawings, the sizes of components may be exaggerated or reduced for convenience of description. For example, since the size and shape of each component shown in the drawings are chosen arbitrarily for convenience of description, the present invention is not necessarily limited to what is illustrated.

In the following embodiments, when films, regions, components, and the like are said to be connected, this includes not only the case where they are directly connected but also the case where they are indirectly connected with other films, regions, or components interposed between them. For example, when films, regions, components, and the like are said in this specification to be electrically connected, this includes not only the case where they are directly electrically connected but also the case where they are indirectly electrically connected with other films, regions, or components interposed between them.

FIG. 1 is a block diagram schematically showing the configuration of an AMR voice data discrimination apparatus 10 (hereinafter also referred to as the "discrimination apparatus") according to an embodiment of the present invention.

The discrimination apparatus 10 may include a target data processing unit 100, a reference data processing unit 200, and a comparison unit 300. Each component of the apparatus of FIG. 1 may be a physical unit built to perform a specific function, or a logical module implemented as a program and configured to perform that function.

The target data processing unit 100 may include a principal component extraction unit 110 and a centroid extraction unit 120 that extracts, from the principal component, a centroid, which is a unique factor of the target data. Hereinafter, "target data" means the data for which the present invention seeks to estimate whether or not it is AMR data. The principal component extraction unit 110 is described in more detail with reference to FIG. 2 below, and the centroid extraction unit 120 is described in more detail with reference to FIGS. 4 to 13 below.

The reference data processing unit 200 may include a principal component extraction unit 210 and a codeword extraction unit 220 that extracts, from the principal component of the reference data, a codeword, which is a unique factor of the reference data. Hereinafter, "reference data" means sample AMR data against which the target data is compared during analysis.

The comparison unit 300 may include a per-mode MSE acquisition unit 310 that obtains a mean squared error (hereinafter, "MSE") distribution for each mode, a threshold acquisition unit 320, and a comparison determination unit 330 that compares the minimum value of the MSE distribution with the threshold value to determine whether the target data is AMR voice data. Here, the per-mode MSE acquisition unit 310 may receive the centroid from the target data processing unit 100 and the unique factor from the reference data processing unit 200 to obtain the per-mode MSE distribution. The per-mode MSE distribution is described in more detail with reference to FIGS. 15 to 17 below.

FIG. 2 is a block diagram schematically showing the principal component extraction unit 110, which is a part of the configuration of the apparatus 10 of FIG. 1. The principal component extraction unit is described below as a component of the target data processing unit 100, but the principal component extraction unit 210 of the reference data processing unit 200 may have the same configuration, differing only in that the data it processes is the reference data.

The principal component extraction unit 110 may include a data receiving unit 111, a frame separation unit 112, a bit array generation unit 113, and a principal component specifying unit 114.

The data receiving unit 111 may receive a bitstream of voice data encoded in the AMR format of one of the zeroth to seventh modes. The AMR voice data of the present invention has a number of bits allocated according to the mode in which the voice data is encoded. AMR voice data may have distinct compression format characteristics for each of the eight bit rates (variable bit rates) available as variable-rate compression options.

More specifically, the raw data input for AMR encoding contains 160 samples per frame, and 50 frames are input per second. However, since the present invention receives a file whose AMR encoding is already complete and proposes a method for analyzing the received voice file, a detailed description of the encoding is omitted. The present invention proposes a method for accurately estimating the compression format of received AMR voice data when no information about that compression format is available.

For example, in [Table 1] above, the 10.2 kbit/s mode is called Mode 6, and encoding a voice sample in this mode generates a bitstream of 204 bits per frame. Of these 204 bits, 26 bits are allocated to the indices of the LSF submatrices, and the remaining 178 bits may be divided into subframes of 46, 43, 46 and 43 bits, respectively. As another example, the 4.75 kbit/s mode is called Mode 0 and generates a bitstream of 95 bits per frame. For clarity, Mode 0 to Mode 7 are hereinafter referred to as the zeroth mode to the seventh mode.
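The per-frame bit counts quoted here follow from the bit rates if each frame covers 20 ms, that is, 50 frames per second as stated for the encoder input. The short Python check below works through that arithmetic; the constant and function names are illustrative only.

```python
# Bit rates of the zeroth to seventh AMR modes, in kbit/s.
AMR_BITRATES_KBPS = [4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2]
FRAMES_PER_SECOND = 50  # one frame every 20 ms


def bits_per_frame(rate_kbps: float) -> int:
    """Number of bits in one frame at the given bit rate."""
    return round(rate_kbps * 1000 / FRAMES_PER_SECOND)


# Mode number -> bits per frame.
FRAME_BITS = {mode: bits_per_frame(r) for mode, r in enumerate(AMR_BITRATES_KBPS)}

# Layout of a 10.2 kbit/s frame as described in the text:
# 26 bits of LSF indices followed by four subframes.
MODE6_LSF_BITS = 26
MODE6_SUBFRAME_BITS = (46, 43, 46, 43)
```

The resulting counts are 95, 103, 118, 134, 148, 159, 204 and 244 bits, and the 26 LSF bits plus the four subframes add up to the 204 bits of a 10.2 kbit/s frame.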

The data receiving unit 111 receives a bitstream of voice data compressed in the AMR format of one of the zeroth to seventh modes, and the present invention assumes that the mode information within a single AMR file is uniform. For example, when the data receiving unit 111 receives an AMR file, that file contains only zeroth-mode information or only fourth-mode information, and the analysis of the file never detects first-mode and second-mode information at the same time. This condition reflects the fact that codec changes do not occur frequently on actual WCDMA channels.

The frame separation unit 112 divides the bitstream of the voice data received by the data receiving unit 111 into frames each having a preset first number of bits. The frame separation unit 112 may store information such as that of Table 1 and use it when dividing the bitstream into a plurality of frames. The first number of bits is a preset value, namely the number of bits contained in one frame of each mode in Table 1. The first number of bits may be any one of 95, 103, 118, 134, 148, 159, 204 and 244.
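The frame separation can be sketched in a few lines of Python. The function name `split_into_frames` is hypothetical, the bitstream is represented as a sequence of 0/1 integers for simplicity, and discarding any trailing bits that do not fill a whole frame is an assumption of this example; the text does not say how leftover bits are handled.

```python
from typing import List, Sequence


def split_into_frames(bits: Sequence[int], first_bit_count: int) -> List[List[int]]:
    """Split a bitstream into consecutive frames of `first_bit_count` bits.

    Assumption: trailing bits that do not fill a complete frame are dropped.
    """
    if first_bit_count <= 0:
        raise ValueError("first_bit_count must be positive")
    frame_count = len(bits) // first_bit_count
    return [
        list(bits[k * first_bit_count:(k + 1) * first_bit_count])
        for k in range(frame_count)
    ]
```

For instance, a 1020-bit stream split with a first number of bits of 204 yields five frames, while the same stream split with 244 yields four frames and leaves 44 bits unused.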

For example, if a bitstream of 1020 bits is received and the frame separation unit 112 is to divide it into frames of the sixth mode, the first number of bits is 204 and the bitstream can be divided into a total of five sixth-mode frames. If the final analysis result calculated with the initially selected first number of bits is not appropriate, the frame separation unit 112 may divide the bitstream into a plurality of frames again on the basis of a new first number of bits. The present invention repeats the process in this way because the received bitstream contains no mode information; every mode is therefore analyzed, and the mode analysis that gives the most ideal result is adopted.

The bit array generation unit 113 generates a bit array by integrating the separated frames.

Here, the bit array refers to processed data: because the bitstream of the received voice data is one long serial run of data from which features are difficult to extract, it is processed through a series of steps into this form. The bit array generation unit 113 receives the plurality of frames constituting the voice data from the frame separation unit 112 and, on the basis of the first number of bits corresponding to the frame size, generates a bit array that contains all the information included in the plurality of frames.

The principal component specifying unit 114 specifies the principal component on the basis of the bit array generated by the bit array generation unit 113 and the total number of frames. The principal component is described in more detail with reference to FIG. 3 below.

FIG. 3 is a diagram for explaining the data processing performed by the principal component extraction unit 110 according to an embodiment of the present invention.

In FIG. 3, the bitstream of the voice data consists of a total of (M×N) bits and is divided into N frames of M bits each. The bit array generation unit 113 may generate the bit array v_s on the basis of the N frames. In FIG. 3, the bit array generation unit 113 generates the bit array by summing the first bits, the second bits, the third bits, and so on, of the frames; depending on the embodiment, however, each value of the bit array may be determined through an operation other than simple summation.

또한, 비트배열는 여러 프레임들을 더함으로써 생성될 수 있으므로, 프레임 분리부(112)에 의해 생성된 프레임들보다 더 많은 비트수로 구성될 수 있다. 예를 들어, 도 3에서 v₁프레임은 M비트를 포함하는 프레임이지만, 비트배열 v_s는 (N×M)비트를 포함할 수 있다.In addition, since the bit array may be generated by adding several frames, it may be composed of a larger number of bits than the frames generated by the frame dividing unit 112 . For example, in FIG. 3 , the v ₁ frame is a frame including M bits, but the bit array v _s may include (N×M) bits.

Once the bit array v_s is generated, the principal component specifying unit 114 divides the values contained in the bit array by N, the total number of frames, to specify the principal component of the target voice data. Each bit is 0 or 1, and the principal component is the per-position sum of the frame bits divided by N. Interpreted mathematically, therefore, v_p(i) of the principal component v_p is the probability that the bit b_i equals 1 as i varies from 1 to N.

Equation 1 expresses the mathematical meaning of the principal component. In Equation 1, N is the total number of frames separated from the bitstream, and v_k denotes the k-th frame. E[v] in Equation 1 indicates that the principal component specified by the principal component specifying unit 114 can be expressed as a vector. Because the principal component is generated by combining a plurality of frames, each consisting of the first number of bits, its length may equal the first number of bits or n times the first number of bits (where n is a natural number greater than 1).
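As an illustration of the averaging described above, the following is a minimal Python sketch. Equation 1 itself is not reproduced in this text, so the sketch assumes it is the per-position frame average E[v] = (1/N) Σ v_k. The function name `principal_component` and the 12-bit sample stream are hypothetical and chosen only for the example.

```python
def principal_component(bits, frame_len):
    """Average the bits of all complete frames, position by position.

    bits      : flat list of 0/1 values (the serial bitstream)
    frame_len : M, the number of bits per frame (the "first number of bits")
    """
    n_frames = len(bits) // frame_len  # N, number of complete frames
    if n_frames == 0:
        raise ValueError("bitstream is shorter than one frame")

    # Bit array v_s: per-position sum over the N frames.
    v_s = [0] * frame_len
    for k in range(n_frames):
        frame = bits[k * frame_len:(k + 1) * frame_len]
        for i, bit in enumerate(frame):
            v_s[i] += bit

    # Principal component v_p: the bit array divided by N.
    return [total / n_frames for total in v_s]


# Hypothetical 12-bit stream split into N = 3 frames of M = 4 bits.
stream = [1, 0, 1, 1,
          1, 1, 0, 1,
          1, 0, 0, 1]
v_p = principal_component(stream, 4)
```

Each entry of `v_p` lies between 0 and 1 and can be read as the observed fraction of frames in which that bit position is 1.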

It will be apparent to those of ordinary skill in the art that the number of bits M in one frame, described with reference to FIG. 3, may take any of the values listed in [Table 1] above (such as 95 or 204).

The principal component specified as above becomes a candidate for the centroid extracted by the centroid extraction unit 120. Here, saying that the principal component is a candidate for the eigenfactor means that the information estimator 150 identifies the compression format information of the voice data from only part of the information contained in the principal component.

The centroid extraction unit 120 may divide the principal component into a plurality of subframes, each including at least one bit. The centroid may then be specified based on the bit information included in some of the plurality of subframes.

For example, the centroid extraction unit 120 may divide the principal component into first to fourth subframes, each including at least one bit. The target data processing unit 100 stores Table 1, and once the principal component extraction unit 110 specifies the principal component, the centroid extraction unit 120 may divide it into four subframes. Because the principal component is obtained by combining and averaging the bit values of the frames according to Equation 1, it has the same number of bits as a frame having the first number of bits and the same number of subframes as those frames. In other words, the principal component may be calculated as the average of the bit values over all frames, as in [Equation 1].

FIG. 4 is a diagram schematically showing one frame in the sixth mode and the subframes included in that frame.

Referring to FIG. 4, one frame in the sixth mode contains 204 bits: the first 26 bits carry the LSP, and the remaining 178 bits make up the first to fourth subframes. The bits of the frame are labeled s1 through s204. More specifically, with reference to Table 1, 8 bits of the first subframe (s27 to s34) are allocated to the adaptive codebook index, 31 bits (s35 to s65) to the algebraic codebook information, and the last 7 bits (s66 to s72) to the codebook gain. In the second subframe, 5 bits (s73 to s77) are allocated to the adaptive codebook index and the remaining 38 bits (s78 to s115) to the algebraic codebook information. Information is allocated to the third and fourth subframes in the same manner as to the first and second subframes, respectively.

The per-mode MSE acquisition unit 310 of the comparison unit 300 may determine whether the target data is AMR voice data based on the bit information included in the third and fourth subframes. This is because that bit information indicates the compression format of the voice data and can therefore function as an eigenfactor. Here, the eigenfactor of the target data may be defined as the centroid, and the eigenfactor of the reference data as the codeword.

Mode   Third subframe   Fourth subframe
0      s62 to s82       s83 to s95
1      s66 to s84       s85 to s103
2      s73 to s97       s98 to s118
3      s81 to s109      s110 to s134
4      s88 to s119      s120 to s148
5      s94 to s127      s128 to s159
6      s116 to s161     s162 to s204
7      s142 to s194     s195 to s244

[Table 2] above shows the bit positions of the eigenfactor within the principal component. Specifically, [Table 2] indicates where the third and fourth subframes are located in a principal component consisting of the same first number of bits as one frame. The comparison unit 300 can determine whether the target data is AMR voice data using only the bit information of the third and fourth subframes, which are a partial region of the principal component.
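The lookup described by [Table 2] can be sketched as a slice of the principal component. The following Python sketch is an assumption-laden illustration: the dictionary `EIGENFACTOR_RANGES` and the function `extract_centroid` are hypothetical names, and the sketch treats the third and fourth subframes as one contiguous span, which matches the adjacent ranges in the table.

```python
# Bit ranges of the third and fourth subframes taken from [Table 2].
# Each pair is (first label, last label), 1-based and inclusive, so
# mode 6 covers s116 through s204.
EIGENFACTOR_RANGES = {
    0: (62, 95),
    1: (66, 103),
    2: (73, 118),
    3: (81, 134),
    4: (88, 148),
    5: (94, 159),
    6: (116, 204),
    7: (142, 244),
}


def extract_centroid(v_p, mode):
    """Return the part of the principal component v_p that covers the
    third and fourth subframes of the given mode (0 to 7)."""
    start, end = EIGENFACTOR_RANGES[mode]
    if len(v_p) < end:
        raise ValueError("principal component is shorter than the frame of this mode")
    # Convert the 1-based inclusive labels to a 0-based Python slice.
    return v_p[start - 1:end]


# Placeholder input: the bit labels 1..204 stand in for real averages,
# so the slice boundaries are easy to check by eye.
labels = list(range(1, 205))
centroid_mode6 = extract_centroid(labels, 6)
```

Using the labels themselves as input makes the boundaries visible: the result starts at 116 and ends at 204. With real data, `v_p` would hold averages between 0 and 1.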

FIGS. 5 to 12 are graphs showing examples of the principal components of male and female voice data encoded in mode 0 to mode 7 of the present invention.

More specifically, FIGS. 5 to 12 are obtained by recording raw voice data while a man and a woman each read a document with the same content for one minute, encoding the raw voice data in mode 0 to mode 7, and specifying and plotting the principal component through the process described above. The blue graph shows the bit average values (principal component) for the male voice, and the red graph shows the bit average values for the female voice.

First, FIG. 5 shows the graph of the principal component in mode 0. The horizontal axis runs from 0 to 95 according to the number of bits in mode 0, and the vertical axis runs from 0 to 1.

Referring to FIG. 5, the graphs for the male voice and the female voice are extremely similar in the third and fourth subframes. This is a notable result, given that the two graphs differ markedly in the preceding first and second subframe sections. That is, the voice differences due to gender are reflected in the first and second subframe sections, and because the man and the woman read aloud a document printed with the same text, the agreement of the graphs in the third and fourth subframe sections can be understood as carrying information about mode 0.

Next, FIG. 6 shows the graph of the principal component in the first mode. The horizontal axis of FIG. 6 runs from 0 to 103 according to the number of bits in the first mode, and the vertical axis runs from 0 to 1. As in FIG. 5, the two graphs in FIG. 6 show a similar tendency in the third and fourth subframe sections regardless of gender.

Likewise, in all of the graphs shown in FIGS. 7 to 12, the graphs for the male voice and the female voice show an extremely similar tendency in the third and fourth subframe sections. Therefore, when the comparison unit 300 (in particular, the comparison determination unit 330) analyzes an AMR file whose encoding information is unknown based on the information in this region, the compression format information can be estimated accurately.

If the principal component specified through the above process is not suitable for analysis, the comparison unit 300 may change the first number of bits so that a new principal component is generated. When the frame separating unit 112 separates the bitstream of the voice data into frames of a mode that does not match the original mode, and the principal component is specified after a bit array is generated from those frames, the graph of that principal component may come out in a form unsuitable for analysis. Such a principal component may be called a cross factor, as a concept contrasted with the centroid described above.

FIGS. 13 and 14 are diagrams for explaining the cross factor (also referred to as the mutual factor) according to an embodiment.

FIG. 13 shows the waveform obtained when the principal component of a bitstream file encoded in mode 0 is computed for the first mode, and FIG. 14 shows the waveform obtained when the principal component of a bitstream file encoded in the seventh mode is computed for the fourth mode. In FIGS. 13 and 14, the blue and red graphs represent the male voice and the female voice, respectively.

As shown in FIGS. 13 and 14, when the vector average is computed with an inappropriate mode, even the correlation information remaining in the codec bitstream is lost, and only values close to 0.5 are observed in the graph. That is, graphs such as those of FIGS. 13 and 14 can only be interpreted as meaning that the amplitude of the voice signal in the AMR file is very weak or that the AMR file contains no usable information at all. Such principal components are cross factors.

FIG. 15 is a diagram showing a per-mode principal component matrix generated by pairing the input mode in which the target data was encoded with the check mode against which it is compared, according to an embodiment of the present invention. The horizontal axis is the input mode, that is, the mode in which the target data was encoded, and covers mode 0 to mode 7. The vertical axis is the check mode, that is, the mode for which the principal component of the target data is extracted, and likewise covers mode 0 to mode 7. The principal component matrix of FIG. 15 may therefore take the form of an 8×8 matrix containing all eigenfactors and cross factors for every mode of the AMR format, and the position of each principal component graph can be described by coordinates (i, j), where i and j are natural numbers less than or equal to 8.

Referring to FIG. 15, the graphs at the diagonal positions with coordinates (m, m), where m is a natural number less than or equal to 8, are principal component graphs of the matching mode, extracted with a check mode identical to the input mode, and correspond to eigenfactors. The graphs at the remaining positions with coordinates (m, n), where m and n are different natural numbers less than or equal to 8, are principal component graphs of the mismatching mode, extracted with a check mode different from the input mode, and correspond to the cross factors described above. The eigenfactors at the diagonal positions visibly show a distinct pattern of amplitude variation compared with the cross factors at the other positions. With some exceptions, the cross factors have a weaker signal magnitude or only slight amplitude variation compared with the eigenfactors.

In the present disclosure, whether the target voice data is AMR voice data is determined by estimation using the conditional probability of the MSE distribution. Each individual conditional MSE distribution is assumed to be a Gaussian distribution, and the overall MSE distribution is assumed to be a linear combination of Gaussian distributions with different properties.

The per-mode MSE acquisition unit 310 may receive the centroid from the target data processing unit 100 and the eigenfactor from the reference data processing unit 200 to obtain the MSE distribution for each mode. The method of obtaining the per-mode MSE distribution is described in detail below.

In the following, let c_i denote the codeword generated (defined) from the eigenfactor extracted from the principal component of the reference data, and let d_i denote the centroid, which is the eigenfactor extracted from the principal component of the target data to be estimated. The MSE value may then be defined as in [Equation 2] below.
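[Equation 2] is not reproduced in this text. The following Python sketch therefore assumes the standard definition of the mean squared error, MSE = (1/L) Σ (c_i − d_i)², where L is the common length of the codeword and the centroid. The function name `mse` and the sample values are hypothetical.

```python
def mse(codeword, centroid):
    """Mean squared error between a codeword c and a centroid d.

    Assumes the standard definition, since Equation 2 is not shown here:
    MSE = (1/L) * sum((c_i - d_i) ** 2 for i in 1..L).
    """
    if len(codeword) != len(centroid):
        raise ValueError("codeword and centroid must have the same length")
    if not codeword:
        raise ValueError("inputs must not be empty")
    squared_errors = [(c - d) ** 2 for c, d in zip(codeword, centroid)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical two-element eigenfactors, for illustration only.
example_mse = mse([0.5, 0.25], [0.5, 0.75])
```

Under this assumption, a centroid extracted with the matching mode would be expected to give a small MSE against the codeword of that mode, and a mismatching one a larger MSE.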

To discriminate AMR voice data according to an embodiment of the present invention, the MSE distribution at every position of the matrix of FIG. 15 is required. In other words, for the input mode of the target data, MSE distributions are needed for all principal component graphs, extracted not only with the identical check mode but also with the different check modes. Hereinafter, the case where the input mode and the check mode are the same is referred to as the 'matching mode', and the case where they differ as the 'mismatching mode'. The matching mode has coordinates (m, m) in the matrix of FIG. 15, and the mismatching mode has coordinates (m, n). An eigenfactor is obtained from the principal component of the matching mode, and a cross factor from the principal component of the mismatching mode.

Here, the conditional probability distribution of the MSE in the matching mode is defined as N₁(μ₁, σ₁), where μ₁ is the mean and σ₁ the standard deviation of the matching mode, and the conditional probability distribution of the MSE in the mismatching mode is defined as N₂(μ₂, σ₂), where μ₂ is the mean and σ₂ the standard deviation of the mismatching mode.

For each mode, reference data may be obtained from male voice data, the eigenfactor extracted from it, and the codeword obtained. In parallel, for each mode, target data may be obtained from female voice data and the centroid extracted from it. Using the codewords and centroids obtained in this way, 10 matching-mode data pairs (c_i, d_i) and 70 mismatching-mode data pairs (c_i, d_i) are substituted into Equation 2 to obtain a total of 80 constant MSE values. With the roles reversed, the codeword may be obtained from the female voice data and the centroid from the male voice data, likewise yielding a total of 80 constant MSE values.

The counts of 10 and 70 for the matching and mismatching modes are only an example. It is sufficient that the two counts differ by a factor of 7, or that a similar amount of data is secured for each coordinate of the matrix.

FIGS. 16 and 17, described below, show graphs of the MSE distribution, which is the probability density function of the MSE values obtained for each mode as described above, calculated from the averages.

FIG. 16 is a graph showing the mean squared error (MSE) distributions for mode 0 to mode 3 according to an embodiment of the present invention, and FIG. 17 is a graph showing the MSE distributions for mode 4 to mode 7 according to an embodiment of the present invention. The graph drawn with a red line is the MSE distribution of the matching mode (code match), that is, the first MSE distribution (N₁), and the graph drawn with a blue line is the MSE distribution of the mismatching mode (code mismatch), that is, the second MSE distribution (N₂). The first MSE distribution is calculated from the average of the 10 data items described above, and the second MSE distribution may likewise be calculated from the average of the 70 data items described above.

Referring to FIGS. 16 and 17, the first MSE distribution (N₁) is narrow and has a small variance, whereas the second MSE distribution (N₂) is wide and has a large variance (σ₁ < σ₂). The mean of the first MSE distribution (N₁) is also generally smaller than that of the second MSE distribution (N₂) (μ₁ < μ₂).

Through this experimental estimation of the conditional probability distribution of the MSE, it can be determined whether the target data is AMR data. For this determination, a threshold value must be obtained using the relationship between the first MSE distribution and the second MSE distribution.

The threshold value acquisition unit 320 may obtain the threshold value of the present invention by the method described below. Because the MSE distribution is assumed in the present disclosure to be a Gaussian (normal) distribution, the probability density function f(μ, σ) of the Gaussian distribution may be expressed by [Equation 3] below.

As the equation for obtaining the threshold value, [Equation 4] below can be derived in order to find the point x at which the probability density function of the first MSE distribution (N₁) and that of the second MSE distribution (N₂) take the same value. The point x may be the right-hand intersection of N₁ and N₂ in the graphs of FIGS. 16 and 17. In other words, the position at which the two MSE distributions (N₁, N₂) are equal may be used as the threshold value.

From Equation 4, the threshold value x may be calculated as in [Equation 5] below.
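[Equation 3] to [Equation 5] are not reproduced in this text. As a hedged reconstruction, the following Python sketch assumes the standard Gaussian density and solves f₁(x) = f₂(x) directly. Taking logarithms of both sides gives the quadratic (σ₂² − σ₁²)x² − 2(σ₂²μ₁ − σ₁²μ₂)x + σ₂²μ₁² − σ₁²μ₂² − 2σ₁²σ₂² ln(σ₂/σ₁) = 0, whose larger root is the right-hand intersection. The exact form of the patent's Equation 5 may differ. The function names and the numeric inputs are hypothetical.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Standard Gaussian probability density (assumed form of Equation 3)."""
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return math.exp(exponent) / (sigma * math.sqrt(2 * math.pi))


def right_intersection(mu1, sigma1, mu2, sigma2):
    """Larger x at which N1(mu1, sigma1) and N2(mu2, sigma2) have equal density.

    Requires 0 < sigma1 < sigma2, matching the observation that the
    matching-mode distribution is the narrower one.
    """
    if not 0 < sigma1 < sigma2:
        raise ValueError("expected 0 < sigma1 < sigma2")
    # Coefficients of a*x^2 + b*x + c = 0 obtained from ln f1(x) = ln f2(x).
    a = sigma2 ** 2 - sigma1 ** 2
    b = -2 * (sigma2 ** 2 * mu1 - sigma1 ** 2 * mu2)
    c = (sigma2 ** 2 * mu1 ** 2 - sigma1 ** 2 * mu2 ** 2
         - 2 * sigma1 ** 2 * sigma2 ** 2 * math.log(sigma2 / sigma1))
    discriminant = b * b - 4 * a * c
    # a > 0, so the "+" root is the right-hand intersection.
    return (-b + math.sqrt(discriminant)) / (2 * a)


# Illustrative parameters only, not the experimental values of the patent.
threshold = right_intersection(1.0, 0.5, 3.0, 1.5)
```

At the returned `threshold`, `gaussian_pdf(threshold, 1.0, 0.5)` and `gaussian_pdf(threshold, 3.0, 1.5)` agree up to floating-point error, which is a quick way to check the root.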

When the threshold value for each mode is computed from Equation 5 using the per-mode experimental data described above (σ₁, σ₂, μ₁, μ₂), the data of [Table 3] below can be obtained, as one example. The threshold data of [Table 3] are merely an example. The actual values may differ from those in Table 3 or may lie near them.

Mode   Threshold (x)
0      0.001243
1      0.000789
2      0.000839
3      0.000719
4      0.000297
5      0.000528
6      0.00019
7      0.000349

Thereafter, the comparison determination unit 330 may determine whether the target data is AMR voice data by comparing the minimum value of the MSE distribution of the target data with the threshold value of the input mode, that is, the AMR mode in which the target data was compressed, among the per-mode threshold values. More specifically, the comparison determination unit 330 may extract the minimum value of the MSE distribution of the target data and compare it with the threshold value. That is, a maximum likelihood method that estimates in the direction in which the MSE is minimized may be used.

When the minimum value of the MSE distribution is less than the threshold value, the comparison determination unit 330 may determine that the target data is AMR voice data. Conversely, when the minimum value of the MSE distribution is greater than or equal to the threshold value, it may determine that the target data includes non-AMR voice data compressed in a format other than the AMR format. This latter case covers not only target data that contains only non-AMR voice data but also target data that contains both AMR voice data and non-AMR voice data.

For example, if the target data is assumed to be compressed in the first mode, the input mode is the first mode, and the threshold value of the first mode is 0.000789 according to [Table 3] above. Let x_min be the minimum value of the MSE distribution of the target data. If x_min is less than 0.000789 (x_min < 0.000789), it belongs to the first MSE distribution of the matching mode, drawn with a red line in the graph of FIG. 16, and the target data can therefore be determined to be AMR data.
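The comparison just described can be sketched in a few lines of Python. The threshold values are those listed in [Table 3]. The names `THRESHOLDS` and `is_amr` and the sample MSE values are hypothetical and serve only to illustrate the rule "minimum MSE below the threshold of the input mode".

```python
# Per-mode thresholds as listed in [Table 3] (example values from the text).
THRESHOLDS = {
    0: 0.001243,
    1: 0.000789,
    2: 0.000839,
    3: 0.000719,
    4: 0.000297,
    5: 0.000528,
    6: 0.00019,
    7: 0.000349,
}


def is_amr(mse_values, input_mode, thresholds=THRESHOLDS):
    """Return True when the smallest MSE is strictly below the threshold
    of the input mode, and False when it is greater than or equal to it."""
    if not mse_values:
        raise ValueError("at least one MSE value is required")
    x_min = min(mse_values)
    return x_min < thresholds[input_mode]


# Hypothetical MSE values for target data assumed to be in the first mode.
decision = is_amr([0.0021, 0.0005, 0.0030], input_mode=1)
```

Here the minimum, 0.0005, is below 0.000789, so `decision` is `True`. A minimum exactly equal to the threshold falls on the non-AMR side, following the "greater than or equal to" wording of the text.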

FIG. 18 is a flowchart for explaining a method for discriminating AMR voice data according to an embodiment of the present invention. Because the method of FIG. 18 can be performed by the apparatus of FIGS. 1 and 2, it is described with reference to FIGS. 1 and 2 together, and overlapping descriptions may be omitted or abbreviated.

First, the target data processing unit 100 may extract the centroid from target data compressed in the AMR format (S100).

Likewise, the reference data processing unit 200 may extract the reference eigenfactor, which is the eigenfactor of the reference data, from reference data compressed in the AMR format (S200). The codeword may be generated (defined) based on the reference eigenfactor.

Although the figure shows step S200 being performed after step S100, steps S100 and S200 may be performed in parallel.

이후, 비교부(300)는 최대가능도(Maximum Likelihood; ML) 방식을 이용하여 상기 대상 데이터 및 상기 기준 데이터를 비교하여 상기 대상 데이터가 AMR 음성데이터인지 여부를 판정할 수 있다 (S300). 상기 비교부(300)는 상기 센트로이드 및 상기 기준 고유 팩터 간의 평균 제곱 오차(Mean Squared Error; MSE)의 확률 밀도 함수인 MSE 분포를 이용하여 판별할 수 있다. 상기 비교 판정 단계(S300)는 후술하는 단계들을 포함할 수 있다.Thereafter, the comparison unit 300 may determine whether the target data is AMR voice data by comparing the target data and the reference data using a Maximum Likelihood (ML) method ( S300 ). The comparison unit 300 may determine using the MSE distribution, which is a probability density function of a mean squared error (MSE) between the centroid and the reference eigen factor. The comparison determination step ( S300 ) may include steps to be described later.

먼저, 모드별 MSE 획득부(310)는 센트로이드 및 기준 고유 팩터를 기초로 모드별 MSE 분포를 획득할 수 있다(S310).First, the MSE acquisition unit 310 for each mode may acquire the MSE distribution for each mode based on the centroid and the reference intrinsic factor ( S310 ).

이후, 임계값 획득부(320)는 상기 모드 별 MSE 분포에 기초하여 모드 별 임계값(Threshold value)을 획득할 수 있다(S320). 이때 임계값은 전술한 [수학식 4] 및 [수학식 5]에 따라, 일치 모드의 주성분으로부터 획득한 제1 MSE 분포(N₁)와 불일치 모드의 주성분으로부터 획득한 제2 MSE 분포(N₂)가 같은 값을 가지는 지점으로 획득될 수 있다. 더 구체적으로는, 일치 모드의 센트로이드 및 코드워드(c_i, d_i)를 이용하여 제1 MSE 분포(N₁)를 획득하고, 불일치 모드의 센트로이드 및 코드워드(c_i, d_i)를 이용하여 제2 MSE 분포(N₂)를 획득할 수 있다.Thereafter, the threshold value acquisition unit 320 may acquire a threshold value for each mode based on the MSE distribution for each mode ( S320 ). In this case, the threshold value is the first MSE distribution (N ₁ ) obtained from the principal component of the concordance mode and the second MSE distribution (N ₂ ) obtained from the principal component of the mismatch mode according to the above-mentioned [Equation 4] and [Equation 5]. ) can be obtained as a point having the same value. More specifically, the first MSE distribution (N ₁ ) is obtained using the centroid and codewords (c _i , d _i ) of concordance mode, and the centroid and codewords (c _i , d _i ) of discordant mode are obtained. A second MSE distribution (N ₂ ) may be obtained using .

Thereafter, the comparison determination unit 330 may determine whether the target data is AMR voice data by comparing the MSE distribution value of the target data with the threshold value of the input mode among the threshold values for each mode.

In this case, the comparison determination unit 330 may first obtain the minimum value of the MSE distribution of the target data and determine whether the minimum value is smaller than the threshold value (S330). When the minimum value of the MSE is smaller than the threshold value (S330-Y), the target data may be determined to be AMR voice data (S340). Conversely, when the minimum value of the MSE is greater than or equal to the threshold value (S330-N), the target data may be determined to be non-AMR voice data (S350). Here, non-AMR voice data covers not only the case in which the target data contains only non-AMR voice data, but also the case in which the target data contains both AMR voice data and non-AMR voice data.
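The decision in steps S330 to S350 reduces to a single strict comparison, sketched below. The function name `classify_target` and the dictionary layout `thresholds_by_mode` (mode index mapped to threshold) are assumptions made for illustration only.

```python
from typing import Dict, Sequence


def classify_target(
    mse_values: Sequence[float],
    thresholds_by_mode: Dict[int, float],
    input_mode: int,
) -> str:
    """Return "AMR" if the minimum MSE is strictly below the input mode's threshold."""
    if len(mse_values) == 0:
        raise ValueError("mse_values must not be empty")
    threshold = thresholds_by_mode[input_mode]
    minimum_mse = min(mse_values)            # S330: minimum of the target's MSE values
    if minimum_mse < threshold:              # S330-Y
        return "AMR"                         # S340
    return "non-AMR"                         # S330-N -> S350 (includes equality)
```

Note that an MSE exactly equal to the threshold is classified as non-AMR, matching the "greater than or equal to" branch described above.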

The various embodiments described herein are illustrative and need not be practiced separately and independently of one another. The embodiments described herein may be practiced in combination with one another.

The various embodiments described above may be implemented in the form of a computer program executable through various components on a computer, and such a computer program may be recorded on a computer-readable medium. The medium may store the computer-executable program permanently, or may store it temporarily for execution or download. The medium may be any of various recording or storage means in the form of a single piece of hardware or several pieces of hardware combined; it is not limited to a medium directly connected to a particular computer system and may be distributed over a network. Examples of the medium include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROM and DVD; magneto-optical media such as floptical disks; and ROM, RAM, flash memory, and the like, configured to store program instructions. Other examples of the medium include recording or storage media managed by an app store that distributes applications, or by sites and servers that supply or distribute various other software.

While preferred embodiments of the present invention have been illustrated and described above, the present invention is not limited to the specific embodiments described. Various modifications may of course be made by those of ordinary skill in the art to which the invention pertains without departing from the gist of the invention as claimed in the claims, and such modifications should not be understood separately from the technical spirit or scope of the present invention.

Therefore, the spirit of the present invention should not be limited to the embodiments described above, and not only the claims set forth below but also all scopes equivalent to, or equivalently modified from, those claims fall within the scope of the spirit of the present invention.

10: AMR voice data discrimination apparatus
100: target data processing unit
200: reference data processing unit
110, 210: principal component extraction unit
120: centroid extraction unit
220: codeword extraction unit
300: comparison unit
310: per-mode MSE acquisition unit
320: threshold value acquisition unit
330: comparison determination unit

Claims

A method for discriminating AMR voice data, comprising:
a target data processing step of extracting a centroid, which is an intrinsic factor of target data, from the target data compressed in the AMR format of one of the 0th mode to the 7th mode;
a reference data processing step of extracting a reference intrinsic factor, which is an intrinsic factor of reference data, from the reference data compressed in the AMR format; and
a comparison determination step of determining whether the target data is AMR data compressed in the AMR format by comparing the centroid and the reference intrinsic factor using a maximum likelihood (ML) method,
wherein the comparison determination step performs the determination using an MSE distribution that is a probability density function of a mean squared error (MSE) between the centroid and the reference intrinsic factor.

The method of claim 1,
wherein the target data processing step includes:
extracting a principal component from the target data; and
extracting the centroid for a partial region of the principal component from the principal component,
wherein the extracting of the centroid includes dividing the principal component into a plurality of subframes each including at least one bit, and specifying the centroid based on bit information included in some of the plurality of subframes.

The method of claim 2,
wherein an input mode, which is the AMR mode in which the target data is compressed, and a check mode, which provides the number of bits for extracting the principal component with respect to the input mode, each include the 0th mode to the 7th mode, and
wherein the extracting of the principal component includes:
extracting a principal component of a matching mode, which is extracted for the input mode based on a check mode identical to the input mode; and
extracting a cross factor, which is a principal component of a mismatch mode extracted for the input mode based on a check mode different from the input mode.

The method of claim 2,
wherein the extracting of the principal component includes:
a data receiving step of receiving a bitstream of target data compressed in one of the 0th mode to the 7th mode;
a frame separation step of dividing the bitstream into frames having a predetermined first number of bits;
a bit array generating step of generating a bit array by integrating the separated frames; and
specifying the principal component of the bitstream based on the bit array and the total number of frames,
wherein the centroid is extracted based on some of the frames.

The method of claim 2,
wherein the reference data processing step includes:
extracting a reference principal component from the reference data; and
extracting the reference intrinsic factor for a partial region of the reference principal component from the reference principal component,
wherein the extracting of the reference intrinsic factor includes dividing the reference principal component into a plurality of subframes each including at least one bit, and specifying the reference intrinsic factor based on bit information included in some of the plurality of subframes.

The method of claim 3,
wherein the comparison determination step includes:
obtaining an MSE distribution for each mode based on the centroid and the reference intrinsic factor;
obtaining a threshold value for each mode based on the MSE distribution for each mode; and
comparing the minimum value of the MSE distribution of the target data with the threshold value of the input mode among the threshold values for each mode, and determining that the target data is AMR voice data when the minimum value is less than the threshold value.

The method of claim 6,
wherein the obtaining of the threshold value includes
acquiring, as the threshold value, a point at which a first MSE distribution of the principal component of the matching mode has the same value as a second MSE distribution of the principal component of the mismatch mode.

A computer-readable recording medium storing a program for executing the method according to any one of claims 1 to 7.

An apparatus for discriminating AMR voice data, comprising:
a target data processing unit for extracting a centroid, which is an intrinsic factor of target data, from the target data compressed in one of the 0th mode to the 7th mode;
a reference data processing unit for extracting a reference intrinsic factor, which is an intrinsic factor of reference data, from the reference data compressed in the AMR format; and
a comparison unit for determining whether the target data is AMR data compressed in the AMR format by comparing the centroid and the reference intrinsic factor using a maximum likelihood (ML) method,
wherein the comparison unit performs the determination using an MSE distribution that is a probability density function of a mean squared error (MSE) between the centroid and the reference intrinsic factor.

The apparatus of claim 9,
wherein the target data processing unit includes:
a principal component extraction unit for extracting a principal component from the target data; and
a centroid extraction unit for extracting the centroid for a partial region of the principal component from the principal component,
wherein the centroid extraction unit divides the principal component into a plurality of subframes each including at least one bit, and specifies the centroid based on bit information included in some of the plurality of subframes.

The apparatus of claim 10,
wherein an input mode, which is the AMR mode in which the target data is compressed, and a check mode, which provides the number of bits for extracting the principal component with respect to the input mode, each include the 0th mode to the 7th mode, and
wherein the principal component extraction unit
extracts a principal component of a matching mode, which is extracted for the input mode based on a check mode identical to the input mode, and
extracts a cross factor, which is a principal component of a mismatch mode extracted for the input mode based on a check mode different from the input mode.

The apparatus of claim 10,
wherein the principal component extraction unit includes:
a data receiver configured to receive a bitstream of target data compressed in the AMR format of one of the 0th mode to the 7th mode;
a frame dividing unit for dividing the bitstream into frames having a predetermined first number of bits;
a bit array generator for generating a bit array by integrating the separated frames; and
a principal component specifying unit for specifying the principal component of the bitstream based on the bit array and the total number of frames,
wherein the centroid is extracted based on some of the frames.

The apparatus of claim 10,
wherein the reference data processing unit includes:
a reference principal component extracting unit for extracting a reference principal component from the reference data; and
a reference intrinsic factor extracting unit for extracting the reference intrinsic factor for a partial region of the reference principal component from the reference principal component,
wherein the reference intrinsic factor extracting unit divides the reference principal component into a plurality of subframes each including at least one bit, and specifies the reference intrinsic factor based on bit information included in some of the plurality of subframes.

The apparatus of claim 11,
wherein the comparison unit includes:
a per-mode MSE acquisition unit for acquiring an MSE distribution for each mode based on the centroid and the reference intrinsic factor;
a threshold value acquisition unit for acquiring a threshold value for each mode based on the MSE distribution for each mode; and
a comparison determination unit that compares the minimum value of the MSE distribution of the target data with the threshold value of the input mode among the threshold values for each mode, and determines that the target data is AMR voice data when the minimum value is less than the threshold value.

The apparatus of claim 14,
wherein the threshold value acquisition unit
acquires, as the threshold value, a point at which a first MSE distribution of the principal component of the matching mode has the same value as a second MSE distribution of the principal component of the mismatch mode.