KR20050117607A

KR20050117607A - Sync signal insertion/detection method and apparatus for synchronization between audio contents and text

Info

Publication number: KR20050117607A
Application number: KR1020030016308A
Authority: KR
Inventors: 신승원; 이원하; 김남훈
Original assignee: (주)마크텍; 왕상주
Priority date: 2003-03-15
Filing date: 2003-03-15
Publication date: 2005-12-15
Also published as: KR20040034338A; KR100577558B1

Abstract

A method and apparatus are disclosed for inserting a sync signal into an audio file so that text can be output in synchronization with the audio file during playback. First, information about the size of a first portion of a frame is obtained from a second portion of the frame. Then, based on the obtained information, the start position and size of a third portion of the frame are determined, and at least part of the sync signal is inserted into the third portion of the frame. As a result, a sync signal can be inserted into an audio file efficiently without damaging the audio content.

Description

SYNC SIGNAL INSERTION/DETECTION METHOD AND APPARATUS FOR SYNCHRONIZATION BETWEEN AUDIO CONTENTS AND TEXT

The present invention relates to audio content and, more particularly, to a method and apparatus for synchronizing digital audio content with text in a portable digital playback device.

With recent advances in computer technology, techniques for playing audio content on a computer have developed rapidly. Accordingly, attention has turned to functions that visually display what the audio content says while it is playing. An example is displaying the lyrics of a song on screen while the song is playing.

A prior-art arrangement for displaying audio content together with its textual content is described below with reference to FIG. 10.

First, the audio content to be played and a text file storing the content of that audio are prepared. FIG. 10 shows such a conventional text file reorganized as a table. The text file stores not only the content of the audio but also the playback times at which that content is to be displayed. In the example of FIG. 10, the playback times, which indicate when characters are to be output to the LCD while a compressed voice or music file is playing, are stored in units of 1/1000 second (milliseconds).

For example, at playback time 0000040 ms the audio content is playing and the corresponding string "This invention, in a portable digital playback device" is output on a display. As playback continues, at 0001055 ms the string "while playing a music or voice file" is output alongside the audio.

In other words, the playback time is monitored while the audio content plays, and each string is output when the current playback time matches that string's playback time in the table.
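The table-driven prior-art scheme described above can be sketched as a lookup from the current playback time to the string that should be on screen. This is a minimal Python sketch, assuming the FIG. 10 table has already been parsed into a time-sorted list of (milliseconds, string) pairs; the table contents and the name `text_at` are hypothetical.

```python
import bisect

# Hypothetical FIG. 10-style table: (playback time in ms, string to display),
# sorted by time. The strings follow the example in the text.
LYRICS = [
    (40, "This invention, in a portable digital playback device"),
    (1055, "while playing a music or voice file"),
]


def text_at(playback_ms, table=LYRICS):
    """Return the string that should be on screen at playback_ms, or None."""
    times = [t for t, _ in table]
    # Index of the last entry whose time is <= playback_ms.
    i = bisect.bisect_right(times, playback_ms) - 1
    return table[i][1] if i >= 0 else None
```

The weakness the text points out is visible here: the player must track `playback_ms` at millisecond resolution to drive this lookup.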

This text file structure is essentially the same as that of, for example, the ".smi" file used to display subtitles for video, and it is best suited to environments with ample resources, such as a computer.

However, resources are limited when synchronizing text with digital audio content in a portable digital playback device. In practice, such a device cannot monitor the playback time of audio content at millisecond resolution, which makes it difficult to output text in step with such fine-grained playback times. The method described above is therefore unsuitable for playback on a portable digital playback device.

Digital content sometimes carries an embedded watermark for a specific purpose. In general, watermarking refers to storing information about a work in the audio itself, in a form the general public cannot perceive, in order to protect the work's copyright and to determine whether the work has been forged or altered. Because watermarking hides user-defined information in the actual audio signal of the work, it is common to use robust watermarks, which withstand signal processing attacks, compression and format conversion, and are difficult to remove maliciously.

Because such watermarking embeds data in the audio signal of digital content, detecting the hidden information again requires fairly complex computation and therefore a large amount of memory and processing. Implementing watermarking on the DSP that is typically used consumes substantial resources, which makes it difficult to use in portable digital playback devices, such as portable MP3 players, built around such DSPs. In addition, extra functions that consume many resources are undesirable given the limited battery life of portable playback devices. In particular, because most audio data is stored in a compressed format, conventional watermarking techniques cannot be applied.

As a technique for hiding information in compressed data, MP3Stego was proposed by F. Petitcolas (Computer Laboratory, Cambridge, August 1998). Because this technique hides data while the audio source is being compressed, it does not allow high-speed insertion.

Non-Invertible Watermarking Methods For MPEG Encoded Audio, proposed by L. Qiao and K. Nahrstedt (Security and Watermarking of Multimedia Contents, January 1999), is likely to degrade the MP3 audio and can hide only a limited amount of information.

A Compressed-Domain Watermarking Algorithm for MPEG Audio Layer 3, proposed by D. K. Koukopoulos and Y. C. Stamatiou (ACM Multimedia 2001, September 30 to October 5, Ottawa, Ontario, Canada), may allow high-speed extraction but does not allow high-speed insertion.

The present invention has been devised to solve the problems described above. Its object is to provide a sync signal insertion method and apparatus for synchronizing audio content with text that allow high-speed insertion and processing while minimizing the effect on sound quality.

Another object of the present invention is to provide a sync signal detection method and apparatus for detecting a sync signal in an audio file into which a sync signal has been inserted.

To achieve these objects, the present invention provides a method of inserting a sync signal into an audio file that comprises a plurality of frames, each frame having a first portion storing audio content, a second portion holding at least information about the size of the first portion, and a third portion whose start position and size are determined by the size of the first portion. The method comprises: obtaining, from the second portion of a frame, information about the size of the first portion of the frame; determining, based on the obtained information, the start position and size of the third portion of the frame; and inserting at least part of a sync signal into the third portion of the frame.

The sync signal includes information about the position of the text corresponding to the first portion of the frame.

The step of inserting at least part of the sync signal into the third portion of the frame may include determining whether a sync signal is to be inserted into the third portion of the frame and, in response to a determination that no sync signal is to be inserted, inserting text information corresponding to the first portion of the frame into the third portion.

The step of inserting at least part of the sync signal into the third portion of the frame may also include comparing the size of the sync signal with the sync signal insertion area in the third portion and, if the insertion area is smaller than the sync signal, inserting into the third portion only the part of the sync signal that equals the insertion area in size.
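The size comparison described above means one sync signal is spread over several frames, each frame taking only as many bits as its insertion area holds. A minimal Python sketch, assuming the sync signal is represented as a string of '0'/'1' characters and each frame's stuffing capacity (in bits) is already known; the function name is hypothetical.

```python
def split_across_frames(sync_bits, capacities):
    """Spread a sync signal over consecutive stuffing areas.

    sync_bits  -- the sync signal as a string of '0'/'1' characters
    capacities -- stuffing bits available in each successive frame
    Returns (frame_index, chunk) pairs; frames with no room are skipped.
    """
    chunks = []
    pos = 0
    for frame, cap in enumerate(capacities):
        if pos >= len(sync_bits):
            break
        if cap == 0:
            continue
        # Take no more than the insertion area holds.
        chunk = sync_bits[pos:pos + cap]
        chunks.append((frame, chunk))
        pos += len(chunk)
    if pos < len(sync_bits):
        raise ValueError("not enough stuffing capacity for the whole sync signal")
    return chunks
```

For example, a 7-bit signal over frames with 3, 0, 2 and 5 free bits lands in frames 0, 2 and 3.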

The audio content may also be generated by converting the text with TTS (Text-to-Speech).

The third portion includes an area indicating the presence of a sync signal and an area indicating the content of the sync signal.
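The two areas described above (presence and content) can be illustrated with a toy bit layout. This Python sketch assumes a hypothetical 4-bit presence marker followed by a 16-bit text position and an 8-bit character count, the two content fields the first embodiment mentions; the patent does not specify actual field widths or marker values.

```python
MARKER = "1010"  # hypothetical bit pattern for the "presence" area


def encode_sync(text_pos, n_chars, pos_bits=16, len_bits=8):
    """Build a sync signal: presence marker + text position + character count."""
    if not (0 <= text_pos < (1 << pos_bits) and 0 <= n_chars < (1 << len_bits)):
        raise ValueError("field value does not fit in its bit width")
    return (MARKER
            + format(text_pos, f"0{pos_bits}b")
            + format(n_chars, f"0{len_bits}b"))


def decode_sync(bits, pos_bits=16, len_bits=8):
    """Return (text_pos, n_chars), or None if no sync signal is present."""
    if not bits.startswith(MARKER):
        return None
    body = bits[len(MARKER):]
    if len(body) < pos_bits + len_bits:
        return None
    return int(body[:pos_bits], 2), int(body[pos_bits:pos_bits + len_bits], 2)
```

A fixed marker keeps the presence check cheap on the player, which only has to compare a few bits before parsing the rest.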

The present invention also provides a method of detecting a sync signal in an audio file that comprises a plurality of frames, each having a first portion storing audio content, a second portion holding at least information about the size of the first portion, and a third portion whose start position and size are determined by the size of the first portion. The method comprises: extracting information about the start position and size of the third portion based on the information about the size of the first portion; analyzing the third portion to determine whether a sync signal is present; and, in response to a determination that a sync signal is present, obtaining at least part of the sync signal from the third portion.

The present invention further provides an apparatus for detecting a sync signal in an audio file that comprises a plurality of frames, each having a first portion storing audio content, a second portion holding at least information about the size of the first portion, and a third portion whose start position and size are determined by the size of the first portion. The apparatus comprises: a sync signal presence determination unit that extracts information about the start position and size of the third portion based on the information about the size of the first portion and analyzes the third portion to determine whether a sync signal is present; and a sync signal acquisition unit that, in response to a determination that a sync signal is present, obtains at least part of the sync signal from the third portion.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

FIG. 1 is a conceptual diagram of the overall process for synchronizing a music file with lyric text in a portable digital playback device.

Referring to FIG. 1, a music file and its lyric text are first input to a text synchronization program. Using this input, the program has the user directly enter the time at which each lyric line should be output. As shown in FIG. 10, the information entered by the user may consist of each text string to be output linked to its playback time. Using the sync signal insertion method according to the present invention, the text synchronization program then inserts, at a given position in the music file, information indicating the position in the lyric text of the lyrics to be output at that point.
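The text does not say how the synchronization program turns the user-entered millisecond times into positions in the music file. One plausible mapping, sketched here as an assumption, uses the fact that an MPEG-1 Layer III frame carries 1152 samples (about 26.1 ms at 44.1 kHz) and that the sample rate is constant over the file; `ms_to_frame` is a hypothetical name.

```python
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III


def ms_to_frame(ms, sample_rate=44100):
    """Index of the MP3 frame that is playing at time `ms`.

    Integer arithmetic avoids floating-point rounding at frame boundaries.
    """
    return (ms * sample_rate) // (SAMPLES_PER_FRAME * 1000)
```

With the FIG. 10 times of 40 ms and 1055 ms, the corresponding sync signals would go into frames 1 and 40 of a 44.1 kHz file.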

Later, when the music file is played on a portable playback device and a sync signal is detected during playback, the device analyzes the sync signal and shows the string at the text position indicated by the sync signal on its display.

Alternatively, the audio file that is synchronized with text according to the present invention may be one generated by a TTS (Text-to-Speech) engine. FIG. 2 is a conceptual diagram of the process of synchronizing text with a voice file generated by TTS technology.

TTS is a technology that synthesizes speech from text to produce a voice file. To convert text into a voice file, a phoneme database is built from the smallest pronunciation units of each language, and phonemes retrieved from that database with regard to the surrounding context of the text are synthesized into a speech signal.

In the configuration described above with reference to FIG. 1, the user must directly enter the text positions to be synchronized with the audio file. With TTS speech synthesis, however, the position of the corresponding text in the text file is known automatically as the voice file is generated, so no separate user input is needed.
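Why no user input is needed can be shown with a small sketch: if the TTS engine synthesizes the text segment by segment and reports how many samples each segment produced (an assumption; real engines expose this differently), the start frame, text position and character count of every segment follow directly. The function name and the 1152-samples-per-frame default (MPEG-1 Layer III) are illustrative.

```python
def tts_cues(segments, samples_per_frame=1152):
    """Derive (frame_index, text_pos, n_chars) cues from TTS output.

    segments -- (text, n_samples) pairs in synthesis order, where n_samples
                is the length of audio the TTS engine produced for that text
    """
    cues = []
    text_pos = 0
    sample_pos = 0
    for text, n_samples in segments:
        # Each segment starts where the previous segment's audio ended.
        cues.append((sample_pos // samples_per_frame, text_pos, len(text)))
        text_pos += len(text)
        sample_pos += n_samples
    return cues
```

Each cue carries exactly the information a sync signal needs: where to insert it, and which slice of the text to display.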

The embodiments below use MP3 as the music file format, but those skilled in the art will appreciate that the sync signal insertion method according to the present invention can also be applied or adapted to music files stored in other audio formats such as WMA, AAC, and AC3.

FIG. 3 shows the structure of an MP3 frame. As shown in FIG. 3, an MP3 audio file consists of a plurality of frames, each made up of a sync (301) consisting of 12 sync bits, side information (303), main data (305), and stuffing bits (307).
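To locate the side information and main data, a player first reads the frame header that begins with the 12-bit sync. The following Python sketch parses an MPEG-1 Layer III header and derives the frame length (144 x bitrate / sample rate + padding) and the side-information size (17 bytes mono, 32 bytes otherwise). It is restricted to MPEG-1 Layer III and rejects free-format headers; the function name and returned fields are hypothetical.

```python
# MPEG-1 Layer III bitrate (kbps) and sample-rate tables, indexed by header fields.
BITRATES_KBPS = [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, None]
SAMPLE_RATES = [44100, 48000, 32000, None]


def parse_header(b):
    """Parse the 4-byte header of an MPEG-1 Layer III frame."""
    h = int.from_bytes(b[:4], "big")
    if h >> 20 != 0xFFF:
        raise ValueError("no 12-bit sync word")
    if (h >> 19) & 1 != 1 or (h >> 17) & 3 != 1:
        raise ValueError("not MPEG-1 Layer III")
    bitrate = BITRATES_KBPS[(h >> 12) & 0xF]
    rate = SAMPLE_RATES[(h >> 10) & 3]
    if not bitrate or rate is None:
        raise ValueError("free-format or reserved header")
    padding = (h >> 9) & 1
    mono = ((h >> 6) & 3) == 3           # channel mode 3 = single channel
    return {
        "frame_bytes": 144 * bitrate * 1000 // rate + padding,
        "side_info_bytes": 17 if mono else 32,
        "has_crc": ((h >> 16) & 1) == 0,  # protection bit 0 means CRC follows
    }
```

For a common 128 kbps, 44.1 kHz joint-stereo header (`FF FB 90 64`), this gives a 417-byte frame with 32 bytes of side information.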

The main data (305) holds the audio content compressed losslessly by Huffman coding. Because the compressed main data (305) is stored in whole bytes, Huffman coding leaves surplus bits that carry no audio content at all. These surplus bits are called stuffing bits (307); they are empty space that is never used during music playback. Since the stuffing bits (307) pad the frame, including the main data (305), to a whole number of bytes, their size depends on the size of the main data (305) produced by Huffman coding the audio content.
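Under the byte-padding model described above, the stuffing area starts where the Huffman-coded data ends and runs to the next byte boundary. In MPEG-1 Layer III, the side information gives the bit length of each granule/channel's main data in its `part2_3_length` fields; this sketch simply sums them. Real streams also have a bit reservoir and may carry ancillary data, so the space actually available can differ; the sketch models only the description in the text, and the function name is hypothetical.

```python
def stuffing_after_main_data(main_data_start_bit, part2_3_lengths):
    """Return (end_bit, n_stuffing_bits) under the byte-padding model.

    part2_3_lengths -- bit lengths of the Huffman-coded main data, one per
                       granule/channel, as read from the side information
    """
    end_bit = main_data_start_bit + sum(part2_3_lengths)
    # Bits needed to reach the next byte boundary (0 if already aligned).
    n_stuffing = (-end_bit) % 8
    return end_bit, n_stuffing
```

When the main data already ends on a byte boundary, the result is zero stuffing bits, which is the case the first embodiment handles by allocating a new byte.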

As described in more detail below, the present invention exploits this structural property of the frame to insert a sync signal into the stuffing space.

FIG. 4 is a flowchart of the sync signal insertion process according to a first embodiment of the present invention. Referring to FIG. 4, when an MP3 audio file is selected, it is first divided into frames (S501).

Each frame is then analyzed (S503). The frame analysis examines the sync (301) and side information (303) to obtain the start position and size of the main data (305). The size and position of the stuffing bits (307) are then derived from the size of the main data (305).

If, based on the size of the main data (305), no stuffing bits (307) exist but space for inserting a sync signal is judged to be necessary, stuffing bits (307) are created (S507). In this case one byte is newly allocated for the stuffing space, and the frames are reconstructed so that every subsequent frame is shifted back by one byte (S509).
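The reconstruction in S509 is mainly offset bookkeeping: one byte goes into the current frame and every later frame starts one byte further on. The Python sketch below shows only that bookkeeping on an in-memory byte string; the function name is hypothetical, and it deliberately does not rewrite headers or bit-reservoir pointers, which a real encoder-side tool would likely also need to adjust.

```python
def insert_stuffing_byte(stream, frame_offsets, frame_index, at):
    """Insert one zero byte at absolute offset `at` inside frame `frame_index`.

    stream        -- the audio file as bytes
    frame_offsets -- start offset of each frame, in order
    Returns the new stream and the updated frame offsets.
    """
    new_stream = stream[:at] + b"\x00" + stream[at:]
    # Every frame after the modified one is pushed back by one byte.
    new_offsets = [off + 1 if i > frame_index else off
                   for i, off in enumerate(frame_offsets)]
    return new_stream, new_offsets
```

Keeping the offsets list in step with the stream lets later iterations of the loop still find each frame's start.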

Next, it is determined whether a sync signal should be inserted into the frame (S511). If so, the sync signal is inserted into the stuffing bits (S513). Because a sync signal is generally larger than the number of stuffing bits in one frame, the whole sync signal is not inserted into a single stuffing space; instead, part of it goes into each stuffing space, so that one sync signal is spread over several stuffing spaces. The sync signal inserted into the stuffing spaces includes a part indicating the presence of the sync signal and a part indicating the position of the text and the number of characters to be output.

By repeating this process for every frame, the sync signal is inserted into the audio file made up of those frames.
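The per-frame write in S513 amounts to overwriting a few bits at a known bit offset while leaving the audio bits before that offset untouched. A minimal Python sketch, assuming MSB-first bit order within each byte and a string of '0'/'1' characters for the chunk; both helpers are hypothetical names, and `read_bits` is the matching operation a detector would use.

```python
def write_bits(buf: bytearray, bit_offset: int, bits: str) -> None:
    """Overwrite len(bits) bits of buf starting at bit_offset (MSB-first)."""
    for i, b in enumerate(bits):
        byte, shift = divmod(bit_offset + i, 8)
        mask = 0x80 >> shift
        if b == "1":
            buf[byte] |= mask
        else:
            buf[byte] &= ~mask & 0xFF


def read_bits(buf, bit_offset: int, n: int) -> str:
    """Read n bits starting at bit_offset (MSB-first) as a '0'/'1' string."""
    out = []
    for i in range(n):
        byte, shift = divmod(bit_offset + i, 8)
        out.append("1" if buf[byte] & (0x80 >> shift) else "0")
    return "".join(out)
```

For example, if the main data ends 3 bits into the last byte, a 5-bit chunk written at that offset fills exactly the stuffing bits.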

Next, a second embodiment of the present invention is described with reference to FIGS. 5 and 6. FIG. 5 is a flowchart of the sync signal insertion process according to the second embodiment.

Although not shown in FIG. 5, steps S501 to S509 of FIG. 4 are also performed before step S611 of FIG. 5. They are omitted for ease of illustration and explanation, and their configuration and operation are the same as shown and described for S501 to S509 of FIG. 4.

In step S611, it is determined whether a sync signal needs to be inserted. If so, the sync signal is inserted into the stuffing space (S613).

If no sync signal needs to be inserted, text is inserted into the stuffing space (S615).

When a sync signal is to be inserted, it is written into the stuffing bits (S613). As described with reference to FIG. 4, a sync signal is generally larger than the number of stuffing bits in one frame, so part of it is inserted into each stuffing space and one sync signal is spread over several stuffing spaces. In this embodiment, however, it is sufficient for the sync signal in the stuffing space to contain only the part indicating its presence. During playback, the stuffing areas of the frames preceding the frame in which the sync signal is detected hold pieces of the text, so collecting those pieces yields the text to be displayed when the sync signal is detected.

FIG. 6 is a frame-by-frame schematic diagram of an audio file into which sync signals have been inserted according to the second embodiment. The stuffing area of each frame contains either text information or a sync signal, and the playback time of a frame containing a sync signal is the time at which the text inserted into the preceding frames is output.
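The FIG. 6 layout can be sketched as a planning step: for each cue, the text is cut into pieces that go into the stuffing areas of the frames before the cue frame, and the cue frame itself carries only a sync marker. This assumes, for simplicity, that every frame's stuffing area holds a fixed number of characters (`chars_per_frame`), which the patent does not state; the function name and payload tuples are hypothetical.

```python
def layout_frames(n_frames, cues, chars_per_frame):
    """Plan second-embodiment stuffing payloads.

    cues -- (sync_frame, text) pairs sorted by sync_frame
    Returns one entry per frame: None, ("text", piece) or ("sync", None).
    """
    payload = [None] * n_frames
    start = 0  # first frame free to carry text for the next cue
    for sync_frame, text in cues:
        pieces = [text[i:i + chars_per_frame]
                  for i in range(0, len(text), chars_per_frame)]
        if start + len(pieces) > sync_frame:
            raise ValueError(f"not enough frames before frame {sync_frame} for {text!r}")
        for k, piece in enumerate(pieces):
            payload[start + k] = ("text", piece)
        payload[sync_frame] = ("sync", None)
        start = sync_frame + 1
    return payload
```

Because the text travels inside the audio file, the player needs no separate text file in this embodiment.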

The sync signal detection process according to the present invention is described below.

FIG. 7 is a schematic diagram of the sync signal detection process according to the present invention.

Referring to FIG. 7, the sync signal detection apparatus according to the present invention includes a memory (701), a frame analyzer (703), a stuffing bit identifier (705), a sync signal and text composer (707), an LCD controller (709), and an LCD (711).

The memory (701) stores an MP3 audio file. In response to a play command for the file, its data is read from the memory (701) and sent to the frame analyzer (703) as an MP3 stream.

The frame analyzer (703) divides the audio file received as an MP3 stream into frames.

The stuffing bit identifier (705) then uses the sync and side information of each frame to extract the size of the audio content. Because the size and position of the stuffing area follow from the size of the audio content, the identifier can locate the stuffing area. It then sends the sync signal and text composer (707) information on whether stuffing bits are present and, if so, their position and size.

The sync signal and text composer (707) analyzes the content of a detected sync signal, determines the position in the text file indicated by the sync signal and the length of the string to be displayed, and reads that part of the text file. In the second embodiment, where the text is contained in the MP3 audio file itself, the composer reads the stuffing bits of each frame that carries no sync signal and appends them to a separate memory area; when a sync signal is detected, it outputs the content stored in that memory area as text and then clears it from the memory area. The resulting text string is then sent to the LCD controller (709).
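The composer's two modes described above can be sketched side by side. In the first embodiment the decoded sync signal selects a slice of an external text file; in the second, text pieces are buffered until a sync frame arrives and then emitted and cleared. The payload representation (None, ("text", piece) or ("sync", None) per frame) and both function names are assumptions made for illustration.

```python
def first_embodiment_text(text_file, text_pos, n_chars):
    """First embodiment: the sync signal names a slice of the external text file."""
    return text_file[text_pos:text_pos + n_chars]


def second_embodiment_display(payloads):
    """Second embodiment: collect text pieces until a sync frame, then emit them.

    payloads -- per-frame stuffing contents: None, ("text", piece) or ("sync", None)
    Returns (frame_index, text) pairs in display order.
    """
    buffer = []  # plays the role of the separate memory area
    shown = []
    for frame, payload in enumerate(payloads):
        if payload is None:
            continue
        kind, data = payload
        if kind == "text":
            buffer.append(data)
        elif kind == "sync":
            shown.append((frame, "".join(buffer)))
            buffer.clear()  # remove the text once it has been output
    return shown
```

Both paths need only string slicing or concatenation per frame, which is consistent with the text's claim that the detector fits on a microcontroller.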

The LCD controller (709) then controls the LCD (711) to clear the string currently displayed and display the new string. If the text is longer than the LCD can display at once, the string can be scrolled automatically from right to left; such scrolling is well known to those skilled in the art.
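The right-to-left scrolling mentioned above can be modeled as a window of the display's width sliding over the string one character at a time. A minimal sketch; `scroll_frames` is a hypothetical name, and the timing between successive windows is left to the LCD controller.

```python
def scroll_frames(text, width):
    """Successive display windows for scrolling `text` right to left.

    Text that fits on the display is shown as a single static window.
    """
    if len(text) <= width:
        return [text]
    return [text[i:i + width] for i in range(len(text) - width + 1)]
```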

The sync signal detection apparatus of FIG. 7 can be implemented in a portable digital playback device as shown in FIGS. 8 and 9. Such functions are commonly implemented in the DSP, but because the MICOM (microcontroller) controls all external devices, it is advantageous to implement text synchronization in the MICOM, as in FIG. 8, if the MICOM has sufficient spare resources. Since synchronization by the proposed method requires very little processing speed and memory, the MICOM can readily handle it.

FIG. 8 shows the internal configuration of a portable digital playback device in which the sync signal detection apparatus for text synchronization according to the present invention is implemented in the MICOM, and FIG. 9 shows the internal configuration when it is implemented in the DSP.

FIGS. 8 and 9 show the internal configuration of a typical playback device. When the user presses the play button, the MICOM retrieves the name of the file to be played. It then reads the file's data and passes it to a buffer, and the DSP decodes the compressed data in the buffer and plays the music through the speaker.

When the present invention, which displays the lyrics or the spoken content of the file being played on the LCD, is added to this process, the overall structure changes as follows. The MICOM fetches the file to be played in the same way. After the file is fetched, the data read from it is passed to the buffer, and the synchronization signal detector checks whether the transferred data contains a synchronization signal. When the detector finds a synchronization signal, it notifies the MICOM's controller that a synchronization signal has been found and what it contains. The MICOM's LCD controller then outputs the information reported by the synchronization signal detector to the LCD screen.
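The flow above (buffer the frame, let the detector check it, and have the LCD controller display what the detector reports) can be sketched as a small simulation. Everything here is an illustrative assumption rather than the patent's format: each frame is modeled as compressed audio plus a few spare bytes, a hypothetical marker byte `SYNC_FLAG` announces a synchronization signal, and the signal's content is taken to be UTF-8 text.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

SYNC_FLAG = 0xA5  # hypothetical marker byte: "this frame carries a sync signal"


@dataclass
class Frame:
    audio: bytes  # compressed audio, passed to the buffer for DSP decoding
    extra: bytes  # spare bytes in which a sync signal may be hidden


def detect_sync_text(frame: Frame) -> Optional[str]:
    """Synchronization signal detector: return the text the frame carries, if any."""
    if frame.extra[:1] == bytes([SYNC_FLAG]):
        return frame.extra[1:].decode("utf-8")
    return None


class LcdController:
    """Stands in for the MICOM's LCD controller."""

    def __init__(self) -> None:
        self.screen = ""

    def show(self, text: str) -> None:
        # Erase whatever is currently displayed and output the new string.
        self.screen = text


def play(frames: Iterable[Frame], lcd: LcdController) -> list[str]:
    """Simulate playback: check each buffered frame and update the LCD on a sync signal."""
    shown = []
    for frame in frames:
        # In a real player, frame.audio would now be handed to the DSP decoder.
        text = detect_sync_text(frame)
        if text is not None:
            lcd.show(text)
            shown.append(text)
    return shown


frames = [
    Frame(b"\x00" * 4, bytes([SYNC_FLAG]) + "first line".encode("utf-8")),
    Frame(b"\x00" * 4, b""),
    Frame(b"\x00" * 4, bytes([SYNC_FLAG]) + "second line".encode("utf-8")),
]
lcd = LcdController()
print(play(frames, lcd))  # ['first line', 'second line']
print(lcd.screen)         # second line
```

Frames without a signal leave the display unchanged, so the text on the LCD always belongs to the most recent frame that carried one.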

FIGS. 8 and 9 differ only in where the synchronization signal detector is located inside the device. Whichever form suits the structure of the portable playback device, the overall execution procedure is the same.

The present invention has been described with reference to particular embodiments for particular applications. Those of ordinary skill in the art with access to the present teachings will recognize additional modifications, applications, and embodiments within its scope.

Accordingly, the appended claims are intended to cover any and all such applications, modifications, and embodiments within the spirit of the invention.

By adding a text synchronization device to a digital portable player, the present invention provides a function that automatically displays on the LCD the lyrics of the music or the spoken content of the voice file being played.

While a compressed file is being played, the present invention detects in real time the synchronization signal hidden in the music file and displays the corresponding text on the LCD screen in step with the point currently being played. The user can therefore follow the content currently being played on the player's LCD screen. In addition, because all of the information, from the text itself to the time at which it should be displayed, is hidden in the digital content, the user does not need to store a separate text file or any other information on the portable player.
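Because each frame covers a fixed slice of playback time, a synchronization signal stored in a given frame is reached exactly when that frame is decoded, with no separate timestamp file. As an illustration, assuming MPEG-1 Layer III audio (1152 samples per frame) at a constant sample rate, the moment a frame starts playing follows directly from its index. The function name below is hypothetical.

```python
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III; other layers and MPEG versions differ


def frame_start_seconds(frame_index: int, sample_rate_hz: int) -> float:
    """Playback time in seconds at which the 0-based frame `frame_index` begins."""
    return frame_index * SAMPLES_PER_FRAME / sample_rate_hz


# At 44.1 kHz each frame lasts about 26.12 ms, so a signal hidden in
# frame 100 is reached about 2.61 s into playback.
print(f"{frame_start_seconds(1, 44100) * 1000:.2f} ms")  # 26.12 ms
print(f"{frame_start_seconds(100, 44100):.2f} s")        # 2.61 s
```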

Because the present invention applies broadly, from the lyrics of ordinary songs to the content of foreign-language teaching materials, it can be used very effectively in digital portable players for language learning.

Existing players that display text output it to the LCD arbitrarily according to elapsed playback time, so what is actually being played does not match what is shown on the LCD, and they cannot automatically display the content currently being played.

With the present invention, the content being played can be displayed accurately on the LCD screen while files in various compressed formats are played.

FIG. 2 is a conceptual diagram illustrating a process of synchronizing text with a voice file generated by text-to-speech (TTS) technology.

FIG. 3 is a diagram illustrating the structure of an MP3 frame.

FIG. 4 is a flowchart illustrating a synchronization signal insertion process according to a first embodiment of the present invention.

FIG. 5 is a flowchart illustrating a synchronization signal insertion process according to a second embodiment of the present invention.

FIG. 6 is a schematic diagram showing, frame by frame, an audio file into which synchronization signals have been inserted according to the second embodiment of the present invention.

FIG. 7 is a schematic diagram illustrating a synchronization signal detection process according to the present invention.

Claims

1. A method for inserting a synchronization signal into an audio file comprising a plurality of frames, each frame having a first portion in which audio content is stored, a second portion containing at least information about the size of the first portion, and a third portion whose starting position and size are determined by the size of the first portion, the method comprising:

obtaining information about the size of the first portion of a frame from the second portion of the frame;

determining a starting position and size of the third portion of the frame based on the obtained information; and

inserting at least a portion of the synchronization signal into the third portion of the frame.

2. The method of claim 1,

wherein the synchronization signal includes information about the position of the text corresponding to the first portion of the frame.

3. The method of claim 1,

wherein inserting at least a portion of the synchronization signal into the third portion of the frame comprises:

determining whether to insert a synchronization signal into the third portion of the frame; and

in response to a determination not to insert the synchronization signal, inserting text information corresponding to the first portion of the frame into the third portion of the frame.

4. The method according to any one of claims 1 to 3, further comprising:

comparing the synchronization signal insertion region in the third portion with the size of the synchronization signal and, when the insertion region is smaller than the synchronization signal, inserting only a portion of the synchronization signal into the third portion.

5. The method of claim 1,

wherein the audio content is generated by text-to-speech (TTS) conversion of the text.

6. The method of claim 1,

wherein the third portion includes an area indicating the presence of a synchronization signal and an area indicating the content of the synchronization signal.

7. A method for detecting a synchronization signal from an audio file comprising a plurality of frames, each frame having a first portion in which audio content is stored, a second portion containing at least information about the size of the first portion, and a third portion whose starting position and size are determined by the size of the first portion, the method comprising:

extracting information about the starting position and size of the third portion based on the information about the size of the first portion;

analyzing the third portion to determine whether a synchronization signal is present; and

in response to determining that a synchronization signal is present, obtaining at least a portion of the synchronization signal from the third portion.

8. The method of claim 7, further comprising:

in response to determining that no synchronization signal is present, extracting text information from the third portion.

9. The method of claim 7, further comprising:

analyzing the content of the synchronization signal and selecting the position of the corresponding text based on the analysis.

10. The method according to any one of claims 7 to 9, further comprising:

if the portion of the synchronization signal obtained from the third portion is not the complete synchronization signal, combining it with the portion of the synchronization signal obtained from a subsequent frame.

11. An apparatus for detecting a synchronization signal from an audio file comprising a plurality of frames, each frame having a first portion in which audio content is stored, a second portion containing at least information about the size of the first portion, and a third portion whose starting position and size are determined by the size of the first portion, the apparatus comprising:

a synchronization signal presence determining unit that extracts information about the starting position and size of the third portion based on the information about the size of the first portion, and analyzes the third portion to determine whether a synchronization signal is present; and

a synchronization signal acquiring unit that acquires at least a portion of the synchronization signal from the third portion in response to a determination that the synchronization signal is present.
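The insertion method of claims 1, 4 and 6 and the detection method of claims 7 and 10 can be illustrated with a toy round trip. This is a minimal sketch under simplifying assumptions that are not taken from the patent: each frame is a byte string whose hypothetical 2-byte header (standing in for the second portion) gives the audio size, the audio (first portion) follows, and the remaining spare bytes (third portion) start out zero-filled. Real MP3 frames differ, since the header encodes bitrate and sample rate and the side information and bit reservoir determine where the main data ends. The layout, the flag values, and all function names are illustrative, and the text fallback of claims 3 and 8 is omitted.

```python
HEADER_SIZE = 2        # hypothetical second portion: audio size, big-endian
META_SIZE = 2          # flag byte + chunk-length byte at the start of the third portion
FLAG_NONE = 0x00       # presence area: no synchronization signal in this frame
FLAG_SYNC_MORE = 0x01  # partial signal; the rest follows in later frames
FLAG_SYNC_LAST = 0x02  # chunk that completes the signal


def make_frame(audio: bytes, frame_size: int) -> bytes:
    """Build a toy frame: header, audio (first portion), zero-filled third portion."""
    header = len(audio).to_bytes(HEADER_SIZE, "big")
    return header + audio + bytes(frame_size - HEADER_SIZE - len(audio))


def third_portion(frame: bytes) -> tuple[int, int]:
    """Start offset and size of the third portion, derived from the audio size."""
    audio_size = int.from_bytes(frame[:HEADER_SIZE], "big")
    start = HEADER_SIZE + audio_size
    return start, len(frame) - start


def insert_sync(frames: list[bytes], signal: bytes) -> list[bytes]:
    """Write `signal` into third portions, splitting it when one frame is too small."""
    out, pos = [], 0
    for frame in frames:
        start, size = third_portion(frame)
        room = min(size - META_SIZE, 255)  # chunk length must fit in one byte
        if pos >= len(signal) or room <= 0:
            out.append(frame)  # nothing left to write, or no usable space here
            continue
        chunk = signal[pos:pos + room]
        pos += len(chunk)
        flag = FLAG_SYNC_LAST if pos == len(signal) else FLAG_SYNC_MORE
        body = bytes([flag, len(chunk)]) + chunk
        # Audio bytes before `start` are left untouched; frame length is preserved.
        out.append(frame[:start] + body + frame[start + len(body):])
    if pos < len(signal):
        raise ValueError("not enough spare space for the synchronization signal")
    return out


def detect_sync(frames: list[bytes]) -> list[bytes]:
    """Return each complete signal found, joining partial chunks across frames."""
    signals, pending = [], b""
    for frame in frames:
        start, size = third_portion(frame)
        if size < META_SIZE:
            continue
        flag, length = frame[start], frame[start + 1]
        if flag not in (FLAG_SYNC_MORE, FLAG_SYNC_LAST):
            continue  # FLAG_NONE (or unknown): no signal in this frame
        pending += frame[start + META_SIZE:start + META_SIZE + length]
        if flag == FLAG_SYNC_LAST:
            signals.append(pending)
            pending = b""
    return signals


frames = [make_frame(b"\x11" * 10, 18), make_frame(b"\x22" * 12, 18), make_frame(b"\x33" * 8, 18)]
marked = insert_sync(frames, b"line:0007")
print(detect_sync(marked))  # [b'line:0007']
```

In this example the 9-byte signal does not fit in any single third portion (4, 2 and 6 usable bytes), so it is split across all three frames and reassembled by the detector, which is the situation claims 4 and 10 address.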