KR102225503B1

KR102225503B1 - System and method for audio watermarking

Info

Publication number: KR102225503B1
Application number: KR1020170140497A
Authority: KR
Inventors: 차재욱
Original assignee: 주식회사 케이티
Priority date: 2017-10-26
Filing date: 2017-10-26
Publication date: 2021-03-08
Also published as: KR20190046563A

Abstract

A method by which an audio watermarking system inserts an audio watermark into a stored sound source. At least one noise section contained in a sound source requested by a terminal is identified based on metadata indicating the at least one noise section. Based on the metadata and the sound source, a watermark to be inserted into the noise section is generated, the generated watermark is inserted into the noise section, and the sound source is then transmitted to the terminal.

Description

System and method for audio watermarking

The present invention relates to an audio watermarking system and method.

With the development of multimedia technology, a wide variety of digital content is being produced and distributed, and demand for digital content is growing as the types and capabilities of terminals that can use it advance. As the distribution of digital content becomes more active, the distribution of illegally copied digital content is also increasing.

Accordingly, various technologies are being studied to protect the copyrights of content authors in digital content. Among them, watermark technology protects ownership of digital content by inserting copyright information into digital image, audio, and video signals in a form that is difficult to perceive by the human eye or ear.

However, conventional watermark technology is not robust to signal processing. In addition, existing watermark technology embeds the watermark in digital images, which makes it difficult to apply to the protection of audio content.

One technology protects audio content by inserting a watermark into the audio header. Because this technology must use inaudible frequencies, its applicability to audio content is limited, and because the information is inserted in the header, it can easily be deleted.

Another technology protects content by inserting a watermark in the frequency domain. Because the watermark must be inserted in the frequency domain, this method is complicated, it is vulnerable to attacks such as frequency change, pitch adjustment, and sample bit change, and it degrades the quality of the audio content.

Accordingly, the present invention provides an audio watermarking system and method that insert an audio watermark into audio and provide the watermarked audio.

One aspect of the present invention for achieving the above technical object is an audio watermarking system for inserting an audio watermark into a sound source.

The system includes an audio server that identifies voice sections and noise sections in the sound source and generates metadata indicating the location of each identified noise section; and a watermarking processing server that receives the sound source and the metadata from the audio server, generates an audio watermark based on the sound source, the metadata, information on the time at which the sound source was requested, and identification information of the user who requested the sound source, inserts the audio watermark into the noise section, and delivers the sound source with the audio watermark inserted to the audio server.

The audio server may include an interface that transmits an audio signal containing the sound source and the metadata for the sound source to the watermarking processing server and, upon receiving the sound source into which the audio watermark has been inserted, transmits it to the terminal that requested the sound source; and a processor that receives the sound source requested by the terminal as divided sections split by a preset section unit, obtains the average volume of each divided section, and compares the average volume with a preset threshold strength to determine whether the divided section is a voice section or a noise section.

When the processor determines that a divided section is a noise section, it may determine, as the noise level of that divided section, the average volume obtained from the voice section closest to the noise section among the sections preceding or following the divided section. The processor may obtain the average volume using the signal strength of the divided section and the number of sound source data items contained in the divided section.

The processor may apply a DFT (Discrete Fourier Transform) to a voice section that is either the section preceding or the section following the noise section, compare the signal strengths of the plurality of frequencies contained in the DFT-processed divided section, and set a first frequency and a second frequency, taken in descending order of signal strength, as the main frequencies of the noise section.

The processor may generate metadata including sound source information of the divided section, the first frequency and the second frequency, and the noise level. For each of the first frequency and the second frequency, either preset index information for that frequency or the frequency value itself is inserted, and the sound source information may include at least one of a divided section start time, a divided section end time, and identification information of the divided section.

The audio server may include a metadata memory that stores metadata including information on the at least one noise section contained in the sound source, the main frequencies for the noise section, the noise level of the noise section, and information on the divided section that is the noise section; and an audio memory that stores sound sources transmitted from outside through the interface together with sound source identification information.

The watermarking processing server may include an interface that receives the metadata from the audio server and delivers to the audio server a second audio signal in which the audio watermark has been inserted into the divided section; a watermark processing unit that identifies the noise sections in the sound source based on the metadata, generates the audio watermark to be inserted into a noise section using the main frequencies and the noise level contained in the metadata, the information on the time at which the sound source was requested, and the identification information of the user who requested the sound source, and inserts the audio watermark into the noise section; and a memory that stores the noise and the insertion information used by the watermark processing unit to generate the audio watermark.

The watermark processing unit may generate noise of the first frequency and noise of the second frequency based on the main frequencies in the metadata, and may insert the noise of the first frequency into a first part, and the noise of the second frequency into a second part, of a binary code generated based on the sound source information contained in the metadata.

Another aspect of the present invention for achieving the above technical object is a method by which an audio watermarking system inserts an audio watermark into a stored sound source.

The method includes identifying at least one noise section contained in a sound source requested by a terminal, based on metadata indicating the at least one noise section; generating, based on the metadata and the sound source, a watermark to be inserted into the noise section; and inserting the generated watermark into the noise section and then transmitting the sound source to the terminal.

Before the identifying of the noise section, the method may include generating divided sections by dividing the sound source by a preset time unit; calculating an average volume for each generated divided section and determining, based on the calculated average volume, whether the divided section is a noise section or a voice section; when the divided section is determined to be a noise section, applying a DFT to the voice section closest to the noise section among the sections preceding or following the divided section to extract a signal strength for each of at least one frequency band; setting a first frequency and a second frequency having the strongest signal strengths as the main frequencies for the noise section; and generating metadata indicating that the divided section is a noise section, the metadata including a noise level set based on the average volume, the first frequency and the second frequency, and sound source information of the divided section.

The average volume is obtained using the time-domain signal strength of the divided section and the number of sound source data items contained in the divided section. If the average volume is less than or equal to a preset threshold strength, the divided section may be determined to be a noise section and the average volume may be determined as the noise level of the divided section.

The generating of the watermark may include generating noise for the first frequency and noise for the second frequency; and generating the watermark by inserting the noise of the first frequency into a first part, and the noise of the second frequency into a second part, of a binary code generated based on the sound source information contained in the metadata.

Yet another aspect of the present invention for achieving the above technical object is a method by which an audio watermarking system inserts an audio watermark into a sound source transmitted in real time.

The method includes dividing the transmitted sound source by a preset time unit and determining, based on the average volume of each divided section, whether the divided section is a voice section or a noise section; when the divided section is a noise section, selecting, as the main frequency of the noise section, the frequency with the strongest signal strength among the at least one frequency contained in a voice section adjacent to the noise section, the voice section being a divided section preceding or following the noise section; identifying noise corresponding to the main frequency and generating a watermark using the identified noise and sound source information of the divided section; and inserting the generated watermark into the divided section.

In the determining of whether the divided section is a noise section, the average volume may be calculated using the time-domain signal strength of the divided section and the number of sound source data items contained in the divided section.

After the inserting into the divided section, the method may include determining, based on the average volume of the divided section transmitted next, whether the noise section continues; and inserting the generated watermark if the noise section continues, and stopping insertion of the watermark if it does not.
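By way of a non-limiting sketch, the continue-or-stop decision for a real-time stream could be expressed as a generator that checks the average volume of each incoming divided section. The names `watermark_stream`, `embed`, and `threshold` are hypothetical, and `embed` stands in for whatever insertion routine is actually used; the mean of squared samples is assumed as the average volume.

```python
def watermark_stream(sections, threshold, embed):
    """Yield each divided section, watermarked only while noise persists.

    sections  -- iterable of sample lists, one per divided section (assumed)
    threshold -- preset threshold strength for the noise decision (assumed)
    embed     -- hypothetical callback that inserts the generated watermark
    """
    for section in sections:
        # Average volume: mean of the squared samples in this section.
        level = sum(s * s for s in section) / len(section)
        if level <= threshold:
            # The noise section continues, so keep inserting the watermark.
            yield embed(section)
        else:
            # A voice section arrived, so insertion stops for this section.
            yield section


# Illustrative use with a stand-in embed function that adds a small offset.
stream = [[0.0, 0.0], [0.0, 0.0], [1.0, -1.0]]
marked = list(watermark_stream(stream, 0.01, lambda sec: [s + 0.001 for s in sec]))
```

A generator keeps the decision per section, which suits a stream whose total length is not known in advance.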

After the inserting, the method may include storing, as metadata, the sound source information used to generate the watermark and the watermark generation history.

According to the present invention, a watermark can be inserted without degrading sound quality, so that a user can use a sound source service at its existing quality.

In addition, even if a sound source into which an audio watermark has been inserted is leaked, the initial leak path can be traced.

FIG. 1 is an exemplary diagram of an environment to which an audio watermarking system according to an embodiment of the present invention is applied.
FIG. 2 is a structural diagram of an audio server according to an embodiment of the present invention.
FIG. 3 is a structural diagram of a watermarking processing server according to an embodiment of the present invention.
FIG. 4 is a flowchart of an audio watermarking method according to a first embodiment of the present invention.
FIG. 5 is a flowchart of a metadata generation method according to the first embodiment of the present invention.
FIG. 6 is a flowchart of an audio watermarking method according to a second embodiment of the present invention.
FIG. 7 is an exemplary diagram of an audio signal according to an embodiment of the present invention.
FIG. 8 is an exemplary diagram of metadata according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can easily practice it. However, the present invention may be implemented in various different forms and is not limited to the embodiments described herein. In the drawings, parts irrelevant to the description are omitted in order to describe the present invention clearly, and similar reference numerals are assigned to similar parts throughout the specification.

Throughout the specification, when a part is said to "include" a certain component, this means that it may further include other components, rather than excluding them, unless specifically stated to the contrary.

In this specification, a terminal may refer to a mobile station (MS), a mobile terminal (MT), a subscriber station (SS), a portable subscriber station (PSS), user equipment (UE), an access terminal (AT), and the like, and may include all or some of the functions of a mobile terminal, a subscriber station, a portable subscriber station, user equipment, and the like.

Hereinafter, an audio watermarking system according to an embodiment of the present invention and an audio watermarking method using the same will be described with reference to the drawings.

FIG. 1 is an exemplary diagram of an environment to which an audio watermarking system according to an embodiment of the present invention is applied.

As shown in FIG. 1, the audio watermarking system 10, composed of the audio server 100 and the watermarking processing server 200, operates in conjunction with a plurality of terminals 300 to insert an audio watermark into the audio requested by the terminals 300 and provide it. Here, the audio server 100 may be an audio module or an audio processing part included in any one server, and need not be configured as a physical server. The watermarking processing server 200 is likewise referred to as a "server" for convenience of description, but may be configured as a module.

The audio server 100 uses a sound source received in real time or a sound source stored in advance to generate metadata indicating the locations of the noise sections contained in the sound source. When a sound source is requested by a terminal 300, the audio server 100 includes the metadata and the sound source in a first audio signal and delivers it to the watermarking processing server 200. When the audio server 100 receives from the watermarking processing server 200 a second audio signal into which the audio watermark has been inserted, it provides the second audio signal to the terminal 300.

When the watermarking processing server 200 receives the first audio signal from the audio server 100, it generates a watermark. It inserts the watermark, generated based on the metadata contained in the first audio signal, into the sound source, includes the audio-watermarked sound source in a second audio signal, and delivers the second audio signal to the audio server 100 so that the audio-watermarked sound source is provided to the terminal 300.

In the embodiment of the present invention, the audio server 100, which stores sound sources and generates metadata from them, and the watermarking processing server 200, which inserts an audio watermark into a sound source, are described as being physically separate by way of example. However, the two servers may be included in a single server, or a single server may perform all of the functions. The structures of the audio server 100 and the watermarking processing server 200 are described with reference to FIGS. 2 and 3.

FIG. 2 is a structural diagram of an audio server according to an embodiment of the present invention.

As shown in FIG. 2, the audio server 100 includes an interface 110, a processor 120, an audio memory 130, and a metadata memory 140.

The interface 110 of the audio server 100 receives a sound source from outside (for example, from a terminal or an audio database) or receives a sound source request signal from a terminal 300. The interface 110 also delivers to the watermarking processing server 200 a first audio signal containing the sound source requested by the terminal 300 and a plurality of metadata for that sound source.

In addition, when the interface 110 receives from the watermarking processing server 200 a second audio signal containing the audio-watermarked sound source, it delivers the received second audio signal to the terminal 300 that requested the sound source.

The processor 120 receives a sound source stored in the audio memory 130, or a sound source received through the interface 110, divided into units of a preset section. In the embodiment of the present invention, the processor 120 is described as receiving divided sections split by a time unit such as 10 msec, but the invention is not necessarily limited to this. The processor 120 may also divide a sound source in various ways, so the division is not limited to any one method.

The processor 120 receives a divided section and, if the divided section is a noise section, generates metadata indicating the noise section. In the embodiment of the present invention, a section containing a voice is referred to as a "voice section", and a section other than a voice section, whether it contains only noise or is silent with no noise at all, is referred to as a "noise section".

To generate metadata for a noise section, the processor 120 first obtains the average volume of the divided section. To do so, the processor 120 squares the signal strength of the divided section and then divides the result by the number of sound source data units contained in the 10 msec divided section.

Because the audio server 100 can accept a sound source in digital form, it can determine the signal strength of a divided section from the waveform contained in that section. The signal strength may include both positive and negative values, as described first with reference to FIG. 7, which shows an example of an audio signal examined by the audio server 100.

FIG. 7 is an exemplary diagram of an audio signal according to an embodiment of the present invention.

As shown in FIG. 7, the waveform of a sound source takes positive and negative values according to the strength of the voice, so the processor 120, which receives the signal in 10 msec units, squares the signal strength within each 10 msec section to eliminate the negative component. It then divides the squared signal strength by the number of sound source data items contained in the 10 msec section (N, where N is an integer) to obtain the average volume of the divided section.

For example, assume that a voice is recorded at 16 kHz so that one second contains 16,000 sound source data items. A 10 msec section then contains 160 (N) sound source data items, so dividing the squared signal strength by 160 yields the average volume of the 10 msec divided section. In the embodiment of the present invention, the average volume is described as the squared signal strength of each divided section divided by the number of sound source data items in that section, but it can be computed in various ways and is not necessarily limited to this method.
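By way of a non-limiting sketch, the average volume calculation above can be written as a mean of squared samples. It is assumed here that the squared values are summed over the section before dividing by N; the names `average_volume`, `SAMPLE_RATE_HZ`, and `SECTION_MS` are hypothetical.

```python
def average_volume(samples):
    """Return the mean of the squared samples of one divided section."""
    if not samples:
        raise ValueError("a divided section must contain at least one sample")
    # Squaring removes the sign, so negative amplitudes do not cancel positive ones.
    return sum(s * s for s in samples) / len(samples)


SAMPLE_RATE_HZ = 16_000   # 16 kHz recording, as in the example above
SECTION_MS = 10           # length of one divided section

# Number of samples N in one 10 msec section at 16 kHz.
samples_per_section = SAMPLE_RATE_HZ * SECTION_MS // 1000

# A synthetic section alternating between +0.5 and -0.5.
section = [0.5, -0.5] * (samples_per_section // 2)
volume = average_volume(section)
```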

Returning to FIG. 2, if the average volume of a divided section is less than or equal to a preset threshold strength, the processor 120 determines that the divided section is a noise section and sets the average volume of the divided section as its noise level. The processor 120 also determines the start time and end time of the divided section identified as a noise section.

On the other hand, if the average volume is greater than or equal to the preset threshold strength, the processor 120 determines that the divided section is a voice section containing meaningful data. The preset range against which the average volume is compared is not limited to any particular range.
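As a non-limiting sketch of the classification described above, each divided section can be labeled by comparing its average volume against the threshold. The function names and the threshold value are hypothetical, and a section whose average volume exactly equals the threshold is treated as noise here, which is an assumption made only for the sketch.

```python
def average_volume(samples):
    """Mean of the squared samples of one divided section."""
    return sum(s * s for s in samples) / len(samples)


def split_sections(samples, section_len):
    """Split a sample list into consecutive sections of section_len samples."""
    return [samples[i:i + section_len] for i in range(0, len(samples), section_len)]


def classify_sections(samples, section_len, threshold):
    """Return (start_index, end_index, label, level) for each divided section."""
    results = []
    for n, section in enumerate(split_sections(samples, section_len)):
        level = average_volume(section)
        # Assumption: equality with the threshold counts as a noise section.
        label = "noise" if level <= threshold else "voice"
        start = n * section_len
        results.append((start, start + len(section), label, level))
    return results


# Four quiet samples followed by four loud ones, in sections of four samples.
signal = [0.01, -0.01, 0.01, -0.01, 0.5, -0.5, 0.5, -0.5]
labels = [r[2] for r in classify_sections(signal, 4, 0.01)]
```

For a noise section, the returned `level` corresponds to the noise level, and the start and end indices correspond to the section boundaries recorded for that section.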

After classifying the divided sections into voice sections and noise sections, the processor 120 applies a DFT (Discrete Fourier Transform) or FFT (Fast Fourier Transform) to each divided section identified as a voice section to obtain the signal magnitude for each frequency band. The processor 120 then selects the two frequencies with the strongest signal strength in the divided section (hereinafter referred to as the "first frequency" and the "second frequency", or collectively as the "main frequencies"). Main frequencies are selected for each divided section, so the number of main-frequency selections equals the number of divided sections.

When the processor 120 selects the first frequency and the second frequency and the divided section is a noise section, it checks whether the divided section before or after that noise section is a voice section. If it is, the processor 120 selects the first frequency and the second frequency from that voice section. The first and second frequencies are selected to match the timbre of the sound source, and they need not be selected from a voice section as described in this embodiment. That is, preset frequencies may be used as the first frequency and the second frequency.
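As a non-limiting sketch, the two strongest frequencies of a voice section can be found with a direct DFT over the section. The function names are hypothetical, and skipping the DC bin (0 Hz) is an assumption made for the sketch. With 160 samples at 16 kHz, the bin spacing is 16000 / 160 = 100 Hz.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitude of each DFT bin from 0 up to the Nyquist bin."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def main_frequencies(samples, sample_rate_hz):
    """Return the two frequencies (Hz) with the largest DFT magnitudes."""
    n = len(samples)
    mags = dft_magnitudes(samples)
    # Assumption: bin 0 (DC) is ignored when ranking the bins.
    ranked = sorted(range(1, len(mags)), key=lambda k: mags[k], reverse=True)
    return [k * sample_rate_hz / n for k in ranked[:2]]


# Synthetic voice section: a strong 500 Hz tone plus a weaker 200 Hz tone.
rate = 16_000
voice = [math.sin(2 * math.pi * 500 * t / rate) + 0.5 * math.sin(2 * math.pi * 200 * t / rate)
         for t in range(160)]
first_frequency, second_frequency = main_frequencies(voice, rate)
```

A direct DFT is used only to keep the sketch dependency-free; an FFT yields the same magnitudes with less computation.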

When signal processing is complete for all divided sections, the processor 120 generates metadata including the start time, the end time, the first frequency, the second frequency, and the noise level of each divided section identified as a noise section. For the first frequency and the second frequency, either index information or the frequency value itself is inserted. For inserting index information for each main frequency, the embodiment of the present invention assumes that an index is defined for each frequency band as shown in Table 1 below.

Table 1
Index information | Frequency band
0 | 1~100 Hz
1 | 101~400 Hz
2 | 401~800 Hz
… | …

Suppose a divided section is a noise section, and the two strongest frequencies in the voice section nearest to it (among the sections preceding or immediately following it) are 500 Hz and 200 Hz. Then index 2 is inserted for the first frequency and index 1 for the second frequency.
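The index lookup in this example can be sketched as a small table search. Only the three bands that the source table actually lists are encoded here. The rows elided by "…" are deliberately left out, and the names `FREQUENCY_BANDS` and `frequency_to_index` are hypothetical.

```python
# (index, low_hz, high_hz) for the bands listed in Table 1.
# Bands beyond index 2 are elided in the source and are not guessed here.
FREQUENCY_BANDS = [
    (0, 1, 100),
    (1, 101, 400),
    (2, 401, 800),
]


def frequency_to_index(freq_hz):
    """Return the band index for a frequency, or None if no listed band matches."""
    for index, low_hz, high_hz in FREQUENCY_BANDS:
        if low_hz <= freq_hz <= high_hz:
            return index
    return None


first_index = frequency_to_index(500)   # 2
second_index = frequency_to_index(200)  # 1
```

Returning `None` for frequencies outside the listed bands is a design choice for the sketch. A real table would cover the whole range of interest.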

The metadata entries for a sound source, one per noise section, are collected and stored in the metadata memory 140. That is, if a sound source contains 10 noise sections, the metadata for that sound source stored in the metadata memory 140 contains 10 pieces of information indicating noise sections.

When the metadata memory 140 stores metadata, it stores the sound source identification information together with the metadata for that sound source. The sound source identification information may be provided by the processor 120 or generated by the terminal 300 that created and delivered the sound source, so the embodiment is not limited to either approach.

The audio memory 130 stores sound sources and their sound source identification information. The processor 120 uses the sound source identification information stored in the audio memory 130 to find the sound source requested by the terminal 300. It also checks whether metadata corresponding to that sound source's identification information is stored in the metadata memory 140.

If metadata for the sound source is stored in the metadata memory 140, the processor 120 generates a first audio signal containing the metadata and the sound source and delivers it to the watermarking processing server 200. If there is no metadata, or the sound source is being transmitted in real time, the processor 120 generates the metadata immediately, includes it in the first audio signal, and transmits the signal to the watermarking processing server 200. It then stores the generated metadata in the metadata memory 140.

For convenience of description, this embodiment shows the audio memory 130 and the metadata memory 140 as separate memories, but audio signals and metadata may be stored in a single memory. Various programs for operating the audio server 100 may also be stored in the memory.

FIG. 3 is a structural diagram of a watermarking processing server according to an embodiment of the present invention.

As shown in FIG. 3, the watermarking processing server 200 includes an interface 210, a watermark processing unit 220, and a memory 230.

The interface 210 receives the first audio signal transmitted from the audio server 100, and transmits the second audio signal generated by the watermark processing unit 220 to the audio server 100.

The watermark processing unit 220 extracts the metadata and the sound source contained in the first audio signal and, based on the metadata, determines which sections of the sound source are noise sections. It then generates the audio watermark to be inserted into the noise sections.

After identifying a noise section, the watermark processing unit 220 generates noise at the first frequency and noise at the second frequency, matched to the noise level. Either the first frequency or the second frequency may be replaced with silence, and neither the method of generating the noise nor the form of the generated noise is limited to any particular one. To alert a user listening to the sound source to the presence of the audio watermark, the watermark processing unit 220 may also generate the first-frequency and second-frequency noise loudly, at a level matching the voice or the audio in other sections.

Once it has generated the first-frequency noise and the second-frequency noise, the watermark processing unit 220 generates the watermark by combining the two noises with the embedding information. This embodiment describes, as an example, generating the audio watermark by combining the embedding information and the noise in binary form. The embedding information may be any of time information for the noise section, identification information of the sound source, or user identification information.

For example, suppose the binary code generated from the sound source identification information is '011001011010…'. The watermark processing unit 220 then inserts first-frequency noise at each '1' in the binary code and second-frequency noise at each '0' to form the audio watermark.
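The bit-to-noise mapping in this example can be sketched as below. The description leaves the form of the noise open, so the sine bursts, the 160-sample bit duration, and the names `tone` and `build_watermark` are assumptions chosen only to make the sketch concrete.

```python
import math


def tone(freq_hz, amplitude, n_samples, sample_rate):
    """Return a sine burst standing in for 'noise of a given frequency'."""
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * t / sample_rate)
        for t in range(n_samples)
    ]


def build_watermark(bits, first_hz, second_hz, noise_level,
                    samples_per_bit=160, sample_rate=16000):
    """Map each '1' to first-frequency noise and each '0' to second-frequency noise."""
    watermark = []
    for bit in bits:
        if bit not in "01":
            raise ValueError(f"not a binary digit: {bit!r}")
        freq_hz = first_hz if bit == "1" else second_hz
        watermark.extend(tone(freq_hz, noise_level, samples_per_bit, sample_rate))
    return watermark


# Usage: encode the first four bits of the example code at a low amplitude.
watermark = build_watermark("0110", first_hz=500, second_hz=200, noise_level=0.01)
```

Keeping the amplitude equal to the section's noise level reflects the description's point that the generated noise is matched to the existing noise level.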

To embed more watermark information than a binary code allows, the watermark processing unit 220 may select and use additional frequencies from among those produced by the DFT performed by the processor 120 of the audio server 100. For example, the watermark processing unit 220 may select any 8 of the frequencies produced by the processor 120 and encode the information in octal form. More generally, a base-2^N system may be used to embed more information in the watermark, where 2^N denotes the number of selected frequencies. Since selecting frequencies and encoding in a base determined by the number of frequencies can be done in various ways, this embodiment is not limited to any one method.
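One way to read the base-2^N idea is that each group of N bits selects one of 2^N frequencies. The sketch below regroups a bit string into such symbols. The zero-padding of the last group and the name `bits_to_symbols` are assumptions, since the description does not fix a particular scheme.

```python
def bits_to_symbols(bits, n_frequencies):
    """Regroup a bit string into symbols, one symbol per selected frequency.

    n_frequencies must be a power of two (2^N); each symbol carries N bits.
    Assumption: the final group is right-padded with zeros if it is short.
    """
    bits_per_symbol = n_frequencies.bit_length() - 1
    if n_frequencies < 2 or (1 << bits_per_symbol) != n_frequencies:
        raise ValueError("n_frequencies must be a power of two, at least 2")
    padded = bits + "0" * (-len(bits) % bits_per_symbol)
    return [
        int(padded[i:i + bits_per_symbol], 2)
        for i in range(0, len(padded), bits_per_symbol)
    ]


# With 8 frequencies, every 3 bits become one octal digit.
symbols = bits_to_symbols("011001011010", 8)  # [3, 1, 3, 2]
```

With 8 frequencies the 12-bit example needs only 4 symbols instead of 12, which is how the larger base fits more information into the same noise section.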

The watermark processing unit 220 may generate the noise of the frequencies to be combined with the embedding information as fixed noise used in common across the entire sound source, or it may generate different noise for each noise section. In addition, if the audio watermark is shorter than the noise section, the watermark processing unit 220 may insert the audio watermark repeatedly.
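Repeating a short watermark to cover a longer noise section can be sketched as follows. Cutting off the last repetition where the section ends is an assumption, and `fill_noise_section` is a hypothetical helper name.

```python
def fill_noise_section(watermark, section_length):
    """Repeat the watermark until it covers section_length samples.

    Assumption: the final repetition is truncated at the section boundary.
    """
    if not watermark:
        raise ValueError("watermark must not be empty")
    repeats = -(-section_length // len(watermark))  # ceiling division
    return (list(watermark) * repeats)[:section_length]


filled = fill_noise_section([1, 2, 3], 8)  # [1, 2, 3, 1, 2, 3, 1, 2]
```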

The watermark processing unit 220 stores in the memory 230 the embedding information that it combined with the noise to generate the audio watermark. The embedding information stored in the memory 230 may also be stored in the metadata memory 140 of the audio server 100. The memory 230 also stores various programs that the watermarking processing server 200 uses to generate audio watermarks.

A method of inserting an audio watermark into a sound source using the audio watermarking system 10 described above will now be described with reference to FIGS. 4 to 6. The first embodiment describes audio watermarking when the sound source is stored in the audio server 100, and the second embodiment describes audio watermarking when the sound source is transmitted from outside in real time.

FIG. 4 is a flowchart of an audio watermarking method according to the first embodiment of the present invention.

As shown in FIG. 4, when the audio server 100 receives from the terminal 300 a user request for a sound source (S100), it searches for whether the requested sound source and the metadata corresponding to it are stored (S110). In this description, the user request signal is assumed to contain the sound source identification information of the sound source to be provided to the terminal 300.

Using the sound source identification information, the audio server 100 checks whether metadata corresponding to the sound source is stored (S120) and, if there is none, generates the metadata (S130). The metadata generation procedure is described first, with reference to FIG. 5.

FIG. 5 is a flowchart of a metadata generation method according to the first embodiment of the present invention.

As shown in FIG. 5, the processor 120 of the audio server 100 first receives the sound source requested by the user from the audio memory 130 as divided sections of a preset time unit (S131). This embodiment describes, as an example, dividing one sound source into a plurality of divided sections of 10 msec each. Assuming the sound source was recorded at 16 kHz, each 10 msec section read by the processor 120 contains 160 items of sound source data.

The processor 120 calculates the average volume of the received 10 msec divided section (S132). If the calculated average volume is at or below a preset threshold strength, the processor 120 identifies the divided section as a noise section, and if the average volume is at or above the threshold strength, it identifies the section as a voice section (S133).
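Steps S131 to S133 can be sketched as below. The description does not define how the average volume is computed, so the mean absolute amplitude used here is an assumption, as are the function names and the choice to treat a value exactly equal to the threshold as noise.

```python
def split_into_sections(samples, sample_rate=16000, section_ms=10):
    """Split a sample list into fixed-length divided sections (S131)."""
    size = sample_rate * section_ms // 1000  # 160 samples at 16 kHz and 10 msec
    return [samples[i:i + size] for i in range(0, len(samples), size)]


def average_volume(section):
    """Mean absolute amplitude of one section (assumed definition, S132)."""
    return sum(abs(s) for s in section) / len(section)


def classify_section(section, threshold):
    """Return 'noise' or 'voice' by comparing the average volume (S133).

    Assumption: a value exactly at the threshold counts as noise.
    """
    return "noise" if average_volume(section) <= threshold else "voice"


# Usage: 10 msec of loud signal followed by 10 msec of near silence.
samples = [0.5] * 160 + [0.001] * 160
labels = [classify_section(s, threshold=0.01) for s in split_into_sections(samples)]
```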

The processor 120 applies a DFT to the voice section immediately preceding the divided section identified as a noise section (S134). If the noise section is the first section of the sound source and so has no preceding voice section, the next voice section is DFT-processed instead.

A filter bank is applied to the DFT-processed divided section by frequency band, and the signal magnitude for each band is obtained (S135). The way the processor 120 applies the filter bank, the type of filter bank, and the way the per-band signal magnitude is obtained can each be carried out in several ways, so this embodiment is not limited to any one method.

Among the per-band signal strengths for the voice section, the processor 120 identifies the first frequency, which has the strongest signal, and the second frequency, which has the second strongest (S136). For each divided section identified as a voice (sound) section or a noise section, the processor 120 generates metadata containing a flag indicating sound or noise, the start time and end time, the first and second frequencies in the case of sound, and the noise level in the case of noise (S137). The generated metadata is stored in the metadata memory 140.

The end time may be omitted from the metadata. In this embodiment, the flag is set to 1 when the divided section is a voice section and to 0 when it is a noise section, but the invention is not limited to this. The form of the metadata generated in step S137 is described first, with reference to FIG. 8.

FIG. 8 is an exemplary diagram of metadata according to an embodiment of the present invention.

As shown in FIG. 8, the processor 120 can generate the metadata in various forms.

First, (a) of FIG. 8 shows metadata formed either with flag information, section start time, end time, and the first and second frequencies, or with flag information, section start time, end time, and noise level. Metadata (a-1), which contains the first and second frequencies, is metadata for a voice section, and metadata (a-2), which contains the noise level, is metadata for a noise section.

(b) of FIG. 8 shows metadata formed either with flag information, section start time, and the first and second frequencies, or with flag information, section start time, end time, and noise level. Metadata (b-1), which contains the first and second frequencies, is metadata for a voice section, and metadata (b-2), which contains the noise level, is metadata for a noise section.

Alternatively, as shown in (c) of FIG. 8, the flag information, section start time, end time, first and second frequencies, and noise level may all be included in a single metadata record.
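The metadata forms of FIG. 8 can be pictured as a single record type with optional fields. The sketch below is illustrative only: the field names, the millisecond units, and the helper functions are assumptions, and only the flag convention (1 for voice, 0 for noise) and the list of fields come from the description.

```python
from dataclasses import dataclass
from typing import Optional

VOICE_FLAG = 1  # flag value for a voice section in this embodiment
NOISE_FLAG = 0  # flag value for a noise section in this embodiment


@dataclass
class SectionMetadata:
    flag: int
    start_ms: int
    end_ms: Optional[int] = None             # the end time may be omitted
    first_frequency: Optional[int] = None    # index or frequency value
    second_frequency: Optional[int] = None   # index or frequency value
    noise_level: Optional[float] = None


def voice_metadata(start_ms, first_frequency, second_frequency, end_ms=None):
    """Record for a voice section: carries the two main frequencies."""
    return SectionMetadata(VOICE_FLAG, start_ms, end_ms,
                           first_frequency, second_frequency)


def noise_metadata(start_ms, end_ms, noise_level):
    """Record for a noise section: carries the noise level."""
    return SectionMetadata(NOISE_FLAG, start_ms, end_ms, noise_level=noise_level)


# Usage: one voice section followed by one noise section.
records = [voice_metadata(0, 2, 1, end_ms=10), noise_metadata(10, 20, 0.004)]
```

Leaving unused fields as `None` lets the same type cover the separate voice and noise records as well as the combined form in which every field is filled.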

Returning to FIG. 4, if the check in step S120 finds that metadata corresponding to the sound source exists, the audio server 100 delivers a first audio signal containing the metadata and the sound source to the watermarking processing server 200 (S140).

The watermarking processing server 200 receives the first audio signal and extracts the metadata and the sound source. Based on the first and second frequencies contained in the metadata, it generates first-frequency noise and second-frequency noise (S150). For convenience of description, this embodiment describes generating unit noise that can be used in common throughout one sound source, but the invention is not limited to this.

The watermarking processing server 200 generates the watermark by combining the unit noise and the embedding information in the form of binary data or base-2^N data (S160). It inserts the generated watermark into the noise portions of the sound source (S170) and transmits the watermarked sound source to the terminal 300 via the audio server 100. The watermarking processing server 200 then stores the embedding information used to generate the watermark and the watermark generation history (S180). This embodiment describes the embedding information and watermark generation history as being stored in the watermarking processing server 200, but they may instead be stored in the audio server 100.

The audio watermarking method for a sound source stored in the audio server 100 has been described above. A sound source may, however, be transmitted from the terminal 300 in real time and streamed to another terminal. The audio watermarking method for this case is described with reference to FIG. 6.

FIG. 6 is a flowchart of an audio watermarking method according to the second embodiment of the present invention.

As shown in FIG. 6, when the audio server 100 receives a sound source transmitted from the terminal 300 (S200), it divides the received sound source into preset section units (S201). This embodiment describes, as an example, generating divided sections of 10 msec each.

Because the audio server 100 receives the sound source in real time, no metadata for the sound source is stored. It therefore calculates the average volume of each divided section and compares it with the threshold strength to determine whether the divided section is a voice section or a noise section (S202).

If the divided section checked in step S202 is a voice section, the processor 120 provides the sound source to the terminal 300 (S204) and, at the same time, DFT-processes the voice section (S206).

The processor 120 applies a filter bank to the DFT-processed divided section by frequency band and obtains the signal magnitude for each band. The DFT-processed divided section is delivered to the watermarking processing server 200. If, on the other hand, the divided section is a noise section, the processor 120 determines the noise level of that section (S205).

In the DFT-processed divided section, the watermarking processing server 200 identifies the first frequency, which has the strongest signal, and the second frequency, which has the second strongest, and selects them as the main frequency (S207). If the divided section is a noise section, the first and second frequencies are selected from the voice section nearest to it among the sections before or after it. The server then looks up the noise generated in advance for each of the first and second frequencies selected as the main frequency (S208). The second embodiment describes noise as being generated in advance for each frequency, but the noise may also be generated in real time.

The watermarking processing server 200 generates the watermark by combining the first-frequency and second-frequency noise, matched to the noise level (S209). This embodiment describes, as an example, generating the audio watermark by combining the embedding information and the noise in binary form. The embedding information may be any of time information for the noise section, identification information of the sound source, or user identification information.

The watermarking processing server 200 inserts the generated watermark into the divided section, that is, the noise section (S210), and transmits the watermarked sound source to the terminal 300 via the audio server 100 (S211). The audio server 100 keeps checking the divided sections arriving in real time to determine whether the noise section is continuing (S212).

If the noise section continues, the watermark is inserted into the noise section repeatedly, as in step S210. If the newly received divided section is found not to be a noise section, watermark insertion stops (S213).
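The real-time loop of steps S202 and S210 to S213 can be sketched as a generator over incoming sections. Several details are assumptions made for the sketch: the watermark is a list of samples mixed additively into the noise section, the watermark position restarts after each run of noise sections, and the name `stream_with_watermark` is hypothetical.

```python
def stream_with_watermark(sections, threshold, watermark):
    """Yield each incoming section, watermarking consecutive noise sections.

    Voice sections pass through unchanged. While noise sections keep
    arriving, the watermark is repeated across them; insertion stops as
    soon as a voice section arrives.
    """
    position = 0  # index of the next watermark sample to insert
    for section in sections:
        average = sum(abs(s) for s in section) / len(section)
        if average > threshold:
            position = 0  # assumption: restart the watermark after voice
            yield list(section)
            continue
        marked = []
        for sample in section:
            # assumption: 'inserting' is modeled as additive mixing
            marked.append(sample + watermark[position % len(watermark)])
            position += 1
        yield marked


# Usage: one voice section, two consecutive noise sections, one voice section.
incoming = [[1.0, 1.0], [0.0, 0.0], [0.0, 0.0], [1.0, 1.0]]
output = list(stream_with_watermark(incoming, 0.1, [0.01, 0.02, 0.03]))
```

Because the generator handles one section at a time, it never needs the whole sound source, which matches the streaming case where no stored metadata exists.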

The watermarking processing server 200 then stores, as metadata, the embedding information used to generate the watermark and the watermark generation history (S214). In addition, after the real-time broadcast or communication ends, the stored sound source file may be further processed by the same procedure as the first embodiment shown in FIG. 4. This embodiment describes the embedding information and watermark generation history as being stored in the watermarking processing server 200, but they may instead be stored in the audio server 100.

Although embodiments of the present invention have been described in detail above, the scope of the invention is not limited to them. Various modifications and improvements made by those skilled in the art using the basic concept of the invention as defined in the following claims also fall within its scope.

Claims

An audio watermarking system that inserts an audio watermark into a sound source, comprising:
an audio server that identifies voice sections and noise sections in the sound source, identifies, in the voice section nearest to an identified noise section among the sections before or after it, a first frequency and a second frequency in descending order of signal strength as a main frequency, generates first-frequency noise and second-frequency noise for the first frequency and the second frequency respectively, and generates metadata containing the main frequency, the noise level of the first-frequency noise and the second-frequency noise, and position information of the noise section; and
a watermarking processing server that receives the sound source and the metadata from the audio server, generates first-frequency noise and second-frequency noise for the first frequency and the second frequency, generates an audio watermark based on the sound source, the metadata, time information at which the sound source was requested, and identification information of the user who requested the sound source, inserts the audio watermark into the noise section, and delivers the sound source with the audio watermark inserted to the audio server.

The audio watermarking system of claim 1, wherein the audio server comprises:
an interface that transmits an audio signal containing the sound source and the metadata for the sound source to the watermarking processing server and, on receiving the sound source with the audio watermark inserted, transmits that sound source to the terminal that requested it; and
a processor that receives the sound source requested by the terminal as divided sections of a preset section unit, obtains the average volume of each divided section, and compares the average volume with a preset threshold strength to determine whether the divided section is a voice section or a noise section.

The audio watermarking system of claim 2, wherein the processor:
when the divided section is determined to be a noise section, sets the average volume obtained from the voice section nearest to the noise section, among the sections before or after the divided section, as the existing noise level of the divided section that is the noise section; and
calculates the average volume using the signal strength of the divided section and the number of sound source data included in the divided section.

The audio watermarking system of claim 3, wherein the processor performs Discrete Fourier Transform (DFT) processing on a voice section that is either the section preceding or the section following the noise section, and sets the main frequency from the DFT-processed divided section.

The audio watermarking system of claim 4, wherein:
the processor generates metadata containing sound source information of the divided section, the first frequency and the second frequency, and the noise level;
each of the first frequency and the second frequency holds either index information preset for each frequency or the frequency value; and
the sound source information includes at least one of a divided section start time, a divided section end time, and identification information of the divided section.

The audio watermarking system of claim 2, wherein the audio server further comprises:
a metadata memory that stores metadata containing information on at least one noise section included in the sound source, the main frequency for the noise section, the noise level of the noise section, and information on the divided section that is the noise section; and
an audio memory that stores sound sources received from outside through the interface together with sound source identification information,
wherein the main frequency indicates the two frequencies having the highest signal strength in the noise section.

The audio watermarking system of claim 6, wherein the watermarking processing server comprises:
an interface that receives the metadata from the audio server and transmits to the audio server a second audio signal in which the audio watermark has been inserted into the divided section;
a watermark processing unit that identifies the noise section of the sound source based on the metadata, generates the audio watermark to be inserted into the noise section using the main frequency and noise level contained in the metadata, the time information at which the sound source was requested, and the identification information of the user who requested the sound source, and inserts the audio watermark into the noise section; and
a memory that stores the noise and the embedding information used by the watermark processing unit to generate the audio watermark.

The audio watermarking system of claim 7, wherein the watermark processing unit:
generates noise of the first frequency and noise of the second frequency based on the main frequency in the metadata; and
inserts the noise of the first frequency into a first part, and the noise of the second frequency into a second part, of a binary code generated based on the sound source information contained in the metadata.

A method by which an audio watermarking system inserts an audio watermark into a stored sound source, the method comprising:
identifying at least one noise section included in the sound source requested by a terminal, based on metadata that indicates the at least one noise section and contains information on a main frequency set in descending order of per-frequency signal strength in the voice section nearest to the noise section;
generating noise of the main frequency based on the main frequency contained in the metadata, and generating a watermark to be inserted into the noise section based on the generated noise and the sound source; and
inserting the generated watermark into the noise section and then transmitting it to the terminal.

The method of claim 9, further comprising, before the step of identifying the noise section:
generating divided sections by dividing the sound source into preset time units;
calculating an average volume for each generated divided section, and determining whether the divided section is a noise section or a voice section based on the calculated average volume;
when the divided section is determined to be a noise section, performing discrete Fourier transform (DFT) processing on the voice section closest to the noise section, among the sections preceding or following the divided section, to extract the signal strength of each of at least one frequency band included in the voice section;
setting a first frequency and a second frequency having high signal strength as the main frequency for the noise section; and
generating metadata indicating that the divided section is a noise section, the metadata including a noise level set based on the average volume, the first frequency and the second frequency, and sound source information of the divided section.
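
The main-frequency step of the preceding claim can be illustrated with a minimal Python sketch. It is not taken from the specification: the function names (dft_magnitudes, main_frequencies, nearest_voice_index), the use of a naive DFT, the exclusion of the DC bin, and the tie-break toward the earlier section are all assumptions made for illustration.

```python
import cmath
import math
from typing import List, Optional, Sequence


def dft_magnitudes(section: Sequence[float]) -> List[float]:
    """Magnitudes of DFT bins 0..N/2, computed with a naive O(N^2) DFT."""
    n = len(section)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(section))
        mags.append(abs(acc))
    return mags


def main_frequencies(voice_section: Sequence[float],
                     sample_rate: float,
                     count: int = 2) -> List[float]:
    """Frequencies (Hz) of the strongest bins, in descending order of strength.

    Assumption: bin 0 (DC) is skipped because it carries no tone.
    """
    mags = dft_magnitudes(voice_section)
    n = len(voice_section)
    strongest = sorted(range(1, len(mags)),
                       key=lambda k: mags[k], reverse=True)[:count]
    return [k * sample_rate / n for k in strongest]


def nearest_voice_index(labels: Sequence[str], noise_idx: int) -> Optional[int]:
    """Index of the voice section closest to labels[noise_idx], or None.

    Assumption: on a tie, the earlier (preceding) section is chosen.
    """
    voiced = [i for i, label in enumerate(labels) if label == "voice"]
    if not voiced:
        return None
    return min(voiced, key=lambda i: abs(i - noise_idx))
```

For a voice section containing a strong 100 Hz component and a weaker 200 Hz component, main_frequencies returns the two frequencies in that order, which would correspond to the first frequency and the second frequency of the claim.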

The method of claim 10,
wherein the average volume is obtained from the time-domain signal strength of the divided section and the number of sound source data items included in the divided section, and
wherein, when the average volume is less than a preset threshold strength, the divided section is determined to be a noise section and the average volume is set as the noise level of the divided section.
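
The section classification described in the preceding claim can be sketched as follows. This is an illustrative Python sketch, not the claimed implementation: the claim does not say how signal strength is measured, so absolute sample amplitude is assumed here, and the names split_sections, average_volume, and classify_section are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def split_sections(samples: Sequence[float],
                   section_len: int) -> List[Sequence[float]]:
    """Divide the sound source into sections of a preset length (in samples)."""
    return [samples[i:i + section_len]
            for i in range(0, len(samples), section_len)]


def average_volume(section: Sequence[float]) -> float:
    """Sum of time-domain signal strengths divided by the number of samples.

    Assumption: signal strength is taken as the absolute amplitude.
    """
    if not section:
        return 0.0
    return sum(abs(x) for x in section) / len(section)


def classify_section(section: Sequence[float],
                     threshold: float) -> Tuple[str, Optional[float]]:
    """Return ("noise", noise_level) or ("voice", None).

    A section whose average volume is below the threshold is a noise
    section, and its average volume becomes its noise level.
    """
    volume = average_volume(section)
    if volume < threshold:
        return "noise", volume
    return "voice", None
```

With a threshold of 0.1, a section of samples alternating between 0.5 and -0.5 is classified as a voice section, while a section of samples near 0.01 is classified as a noise section with a noise level of about 0.01.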

The method of claim 10,
wherein the step of generating the watermark comprises:
identifying a first frequency and a second frequency corresponding to the main frequency, and generating noise of the first frequency and noise of the second frequency; and
generating the watermark by inserting the noise of the first frequency into a first part, and the noise of the second frequency into a second part, of a binary code generated based on the sound source information included in the metadata.
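
One possible reading of the preceding claim is a binary frequency-shift-keying scheme, sketched below in Python. The claim refers only to a "first part" and a "second part" of the binary code, so mapping 1-bits to the first frequency and 0-bits to the second is an assumption, as are the SHA-256 derivation of the binary code, the sine-tone form of the noise, the use of the noise level as amplitude, and every function name (binary_code, tone, make_watermark, insert_watermark).

```python
import hashlib
import math
from typing import List, Sequence


def binary_code(source_info: str, n_bits: int = 16) -> List[int]:
    """Derive a fixed-length bit list from sound source information.

    Assumption: a truncated SHA-256 digest (at most 256 bits) stands in
    for the unspecified encoding.
    """
    digest = hashlib.sha256(source_info.encode("utf-8")).digest()
    bits = [(byte >> shift) & 1 for byte in digest for shift in range(7, -1, -1)]
    return bits[:n_bits]


def tone(freq_hz: float, n_samples: int,
         sample_rate: float, level: float) -> List[float]:
    """Sine tone at freq_hz with peak amplitude equal to level."""
    return [level * math.sin(2 * math.pi * freq_hz * t / sample_rate)
            for t in range(n_samples)]


def make_watermark(bits: Sequence[int], f1: float, f2: float,
                   sample_rate: float, samples_per_bit: int,
                   noise_level: float) -> List[float]:
    """Encode each bit as a short tone: f1 for a 1-bit, f2 for a 0-bit."""
    out: List[float] = []
    for bit in bits:
        out.extend(tone(f1 if bit else f2, samples_per_bit,
                        sample_rate, noise_level))
    return out


def insert_watermark(noise_section: Sequence[float],
                     watermark: Sequence[float]) -> List[float]:
    """Add the watermark to the noise section sample by sample.

    Watermark samples beyond the end of the section are dropped.
    """
    mixed = list(noise_section)
    for i in range(min(len(mixed), len(watermark))):
        mixed[i] += watermark[i]
    return mixed
```

Keeping the tone amplitude at the noise level of the section is intended to keep the watermark no louder than the background it replaces; whether the specification scales the noise in this way is not stated in the claims.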

A method by which an audio watermarking system inserts an audio watermark into a sound source transmitted in real time, the method comprising:
dividing the transmitted sound source into preset time units, and determining whether each divided section is a voice section or a noise section based on the average volume of the divided section;
when the divided section is a noise section, selecting, as the main frequency of the noise section, the frequency having the highest signal strength among at least one frequency included in the voice section adjacent to the noise section, among the sections preceding or following the noise section;
identifying noise corresponding to the main frequency, and generating a watermark using the identified noise and sound source information of the divided section; and
inserting the generated watermark into the divided section.

The method of claim 13,
wherein the step of determining whether the divided section is a noise section comprises
calculating the average volume using the time-domain signal strength of the divided section and the number of sound source data items included in the divided section.

The method of claim 13, further comprising, after the step of inserting the watermark into the divided section:
determining whether the noise section continues, based on the average volume of the divided section transmitted next after the divided section; and
inserting the generated watermark if the noise section continues, and stopping insertion of the watermark if the noise section does not continue.
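
The real-time flow of claims 13 and 15 (classify each incoming section, start inserting when a noise section begins, keep inserting while it continues, stop when voice returns) can be sketched as a Python generator. This is a simplified illustration under several assumptions: only the preceding voice section is consulted, the watermark is reduced to a single tone at the main frequency without the sound source information encoded, the tone phase restarts in every section, and the names watermark_stream and pick_main_frequency are hypothetical.

```python
import math
from typing import Callable, Iterable, Iterator, List, Optional, Sequence


def average_volume(section: Sequence[float]) -> float:
    """Mean absolute amplitude of a section (assumed volume measure)."""
    return sum(abs(x) for x in section) / len(section) if section else 0.0


def watermark_stream(
    sections: Iterable[Sequence[float]],
    threshold: float,
    pick_main_frequency: Callable[[Sequence[float]], float],
    sample_rate: float,
) -> Iterator[List[float]]:
    """Yield each incoming section, watermarked while a noise run lasts."""
    last_voice: Optional[Sequence[float]] = None
    active_freq: Optional[float] = None  # set while a noise run continues

    for section in sections:
        level = average_volume(section)
        if level >= threshold:
            # Voice section: the noise run (if any) has ended, stop inserting.
            last_voice = section
            active_freq = None
            yield list(section)
            continue
        if active_freq is None and last_voice is not None:
            # First noise section of a run: choose the main frequency once.
            active_freq = pick_main_frequency(last_voice)
        if active_freq is None:
            # No voice section seen yet, so there is nothing to derive from.
            yield list(section)
            continue
        # Noise run continues: insert a tone at the section's noise level.
        yield [x + level * math.sin(2 * math.pi * active_freq * t / sample_rate)
               for t, x in enumerate(section)]
```

Because the main frequency is chosen only when a noise run starts, consecutive noise sections reuse the same frequency, which mirrors the "insert if the noise section continues, stop otherwise" condition of claim 15.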

The method of claim 13, further comprising, after the step of inserting:
storing, as metadata, the sound source information and the watermark generation details used to generate the watermark.