WO2005104549A1

WO2005104549A1 - Method and apparatus of synchronizing caption, still picture and motion picture using location information

Info

Publication number: WO2005104549A1
Application number: PCT/KR2005/000476
Authority: WO
Inventors: Jong-Sik Woo
Original assignee: Jong-Sik Woo
Priority date: 2004-04-27
Filing date: 2005-02-22
Publication date: 2005-11-03
Also published as: KR20060000172A; KR100573219B1

Abstract

Disclosed are a method and apparatus for synchronizing a caption with an audio file format reproduced as a bitstream (e.g., wav, MP3, wma, ogg, asf), a musical instrument digital interface (MIDI) file format for reproducing audio, and a file format combining picture and audio data reproduced as a bitstream, whether compressed or not. More particularly, the invention relates to a method and apparatus for synchronizing a caption in which position information of interest is input per bit and the caption is synchronized across various file formats, such as a bitstream file format, an interface file format, or a multimedia file format. As a result, the caption can easily be adapted to speed changes, fast forward, or a new multimedia file format, and it is synchronized using synchronization information produced by an appliance (e.g., a mobile device or computer system) so that it is consistently tracked or colored along with the audio during reproduction, regardless of speed changes, as in a computer song accompaniment device.

Description

METHOD AND APPARATUS OF SYNCHRONIZING CAPTION, STILL PICTURE AND MOTION PICTURE USING LOCATION INFORMATION

Technical Field

[1] The present invention relates to a method and apparatus for generating synchronization to synchronize a caption for an audio file format reproduced as a bitstream (e.g., wav, MP3, wma, ogg, asf), a musical instrument digital interface (MIDI) file format for audio reproduction, and a file format having video and audio data reproduced as a bitstream (e.g., wmv, mpeg4, mpeg2, mpeg1), whether or not the file is compressed. The present invention also relates to a method and apparatus for generating synchronization to synchronize a caption, in which position information is input per bit and the caption is synchronized in various file formats, such as a bitstream file format, an interface file format, or a multimedia file format. The caption can therefore easily be adapted to a speed change, fast forward, or a new multimedia file format, and it is synchronized using synchronization information produced by an appliance (e.g., a mobile device or computer system) so that it is consistently tracked or colored along with the audio during reproduction, whether or not the speed is changed, as in a computer song accompaniment device.

[2] The present invention provides a method for generating and regenerating synchronization to synchronize an interface file format (MIDI) and a bitstream file format, comprising a first step of assigning position information, a second step of adjusting the size of the information, and a third step of composing a new bitstream. The present invention also provides a synchronization generator comprising a synchronization generator input unit, a file system buffer and buffer information, a data player, a counter, and a position information assigning unit; and a synchronization regenerator comprising a synchronization regenerator input unit, file system buffer information, a counter, a buffer, and a synchronization position information comparator.

[3] Background Art

[4] There are five conventional methods for synchronizing a caption, and the method used depends on the file format.

[5] The methods are: (1) allocating a separate song-word track in MIDI, which is musical performance information, and embedding the caption directly into that track; (2) embedding one piece of link information per caption character on a song-word track in MIDI and storing the song words as a separate text file; (3) storing playback time information separately for an audio bitstream format such as MP3, or incorporating the playback time information into a text file; (4) parsing the algorithm of a bitstream file format such as MP3 and replacing spare bits that are not used during reproduction with caption character data; and (5) embedding the caption as an image into the video frame data of a multimedia file format that mixes audio and video information, such as a karaoke CD.

[6] Specifically, the first and second methods are used by the computer song accompaniment device, a representative captioning application, which is built on a musical performance information file called MIDI. A MIDI file contains musical performance information rather than a recorded song: it is a control file specifying the instruments to play, the length of each performance, and how it is performed, and it is therefore different from a music file. The performance information allocated to one instrument is called a track. In these methods, the song words written under the musical notes are treated as one instrument, and their performance length is allocated as note rhythm. In the first method, a separate song-word (caption) track is allocated according to the musical notes among the tracks of the MIDI format, and the song words are embedded directly into that caption track. In the second method, the song words are not embedded directly into the separately constructed caption track. Instead, a special character standing for each caption character (commonly "_") and its performance length information are allocated to the track. Each special character is mapped one-to-one to a character in an external text file, which allows detailed character tracking. These methods work only if the musical note information for the caption is known. Furthermore, MIDI requires considerable professional manual work on each note of the score and is tied to the particular arrangement. Because the caption information is kept in a format unique to MIDI, users who are not MIDI experts have difficulty when the caption is out of sync.

[7] FIG. 1 illustrates MIDI information organized as tracks by musical instrument. As shown in FIG. 1, the MIDI bitstream begins with a MIDI header and includes 13 tracks. Each column is called a track, and the second track contains the performance information for the caption. Each track includes information such as channel, patch, bank, and name, together with the performance information for the relevant instrument. In FIG. 1, the performance information for the caption appears to the right of the second track, "word". In this performance information, one character is specified per bar: the bar information indicates the start of a character, and the caption coloring duration is represented by the number of bars, notes, and rhythms in the MIDI data. To use captions in a MIDI performance information file, one must therefore understand the MIDI file format and how the file was produced before the synchronization task can be performed.

[8] The third and fourth conventional methods synchronize captions appearing in an audio bitstream. Caption synchronization for MP3, a compressed audio format, is discussed below as representative prior art.

[9] First, an MP3 format will be discussed briefly.

[10] An MP3 file stored in a computer or electronic device and represented in a bitstream using a file system is shown in FIG. 2.

[11] As shown in FIG. 2, the compressed MP3 file consists of frames, each made up of a frame header, which distinguishes the frame, and data. The header includes information (Info) containing the synchronization information that indicates the start of the frame and the various algorithm information needed for decoding.

[12] A multimedia file reproduced as a bitstream is divided into frames, the basic units of the bitstream. Each frame carries independent information, and the main information required to decode it is stored in the frame header.

[13] Frame header information of the MP3 file is as follows:

[14] 1. Syncword

[15] The syncword is 0xFFF (twelve 1 bits) and marks the beginning of the frame. It takes the same form in Layers I, II and III.

[16]

[17] 2. Frame information

[18] The frame information consists of the following fields. Audio is decoded normally by parsing the main data with this frame information according to the respective algorithm.

[19] ID[1 bit]: 1 - MPEG Audio, 0 - reserved

[20] Layer [2 bits]: 11 - Layer I, 10 - Layer II

[21] 01 - Layer III, 00 - reserved

[22] Protection bit [1 bit]: indicates whether an error-detection code (CRC) is used.

[23] 0: CRC present, 1: no CRC

[24] Bitrate index [4 bits]: indicates the bit rate.

[25] Sampling frequency [2 bits]: 00 - 44.1 kHz, 01 - 48 kHz

[26] 10 - 32 kHz, 11 - reserved

[27] Padding bit [1 bit]: set to 1 to maintain the average bit rate; this is needed only when the sampling frequency is 44.1 kHz, and the bit is 0 otherwise.

[28] Private bit [1 bit]

[29] Mode [2 bits]: 00 (stereo), 01 (joint stereo), 10 (dual channel), 11 (single channel)

[30] Mode extension [2 bits]: used only in joint stereo mode; it indicates the subbands that are jointly coded, while all lower subbands are coded in normal stereo.

[31] 00 - 4 to 31 subbands

[32] 01 - 8 to 31 subbands

[33] 10 - 12 to 31 subbands

[34] 11 - 16 to 31 subbands

[35] Copyright [1 bit]: 0 (not copyrighted), 1 (copyright protected)

[36] Original [1 bit]: 0 (copy), 1 (original)

[37] Emphasis [2 bits]: 00 (no emphasis), 01 (50/15 µs emphasis), 10 (reserved), 11 (CCITT J.17)
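The header fields listed above occupy the first 32 bits of every frame. The following is a minimal Python sketch, assuming an MPEG-1 audio frame header and the field codes of ISO/IEC 11172-3 (for example, layer code 01 for Layer III); `FrameHeader` and `parse_header` are illustrative names, not part of the disclosure. It only extracts the fields and checks the 0xFFF syncword; it does not verify the CRC or decode audio.

```python
from __future__ import annotations

from dataclasses import dataclass

# Field codes from ISO/IEC 11172-3 (MPEG-1 audio); missing keys are reserved codes.
LAYERS = {0b11: "I", 0b10: "II", 0b01: "III"}
SAMPLE_RATES = {0b00: 44100, 0b01: 48000, 0b10: 32000}
MODES = {0b00: "stereo", 0b01: "joint stereo", 0b10: "dual channel", 0b11: "single channel"}


@dataclass
class FrameHeader:
    mpeg_id: int          # 1 = MPEG audio
    layer: str
    protected: bool       # True when a 16-bit CRC follows the header
    bitrate_index: int    # meaning depends on the layer
    sample_rate: int      # Hz
    padding: int
    private: int
    mode: str
    mode_extension: int
    copyright: bool
    original: bool
    emphasis: int


def parse_header(raw: bytes) -> FrameHeader:
    """Split the first 4 bytes of a frame into the header fields listed above."""
    if len(raw) < 4:
        raise ValueError("a frame header is 4 bytes long")
    h = int.from_bytes(raw[:4], "big")
    if (h >> 20) != 0xFFF:
        raise ValueError("syncword 0xFFF not found")
    layer_code = (h >> 17) & 0b11
    rate_code = (h >> 10) & 0b11
    if layer_code not in LAYERS or rate_code not in SAMPLE_RATES:
        raise ValueError("reserved layer or sampling-frequency code")
    return FrameHeader(
        mpeg_id=(h >> 19) & 1,
        layer=LAYERS[layer_code],
        protected=((h >> 16) & 1) == 0,
        bitrate_index=(h >> 12) & 0xF,
        sample_rate=SAMPLE_RATES[rate_code],
        padding=(h >> 9) & 1,
        private=(h >> 8) & 1,
        mode=MODES[(h >> 6) & 0b11],
        mode_extension=(h >> 4) & 0b11,
        copyright=bool((h >> 3) & 1),
        original=bool((h >> 2) & 1),
        emphasis=h & 0b11,
    )
```

For the common header bytes FF FB 90 64, this reports MPEG-1 Layer III without CRC, bitrate index 9 (128 kbit/s for Layer III), 44.1 kHz, and joint stereo.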

[38]

[39] 4. Side information

[40] Main_data_begin [9 bits]: specifies the position at which the main data of the frame begins

[41] Private_bits [5 or 3 bits]: private bits (5 bits for single channel, 3 bits otherwise)

[42] Scfsi[scfsi_band] [1 bit]: scale factor selection information, indicating whether scale factors are transmitted or reused from the first granule

[43] Part2_3_length [12 bits]: number of bits used for the scale factors and Huffman code data

[44] Global_gain [8 bits]: specifies the quantizer step size used in requantization

[45] Scalefac_compress [4 bits]: selects the number of bits used to transmit the scale factors

[46] Subblock_gain [3 bits]: gain offset of each subblock relative to the global gain

[47] Preflag [1 bit]: determines whether high-frequency components are additionally amplified

[48] Scalefac_scale [1 bit]: determines the quantization step size of the scale factors

[49] Scalefac_scale = 0: stepsize sqrt(2), scalefac_scale = 1: stepsize 2

[50]

[51] In the third conventional method, the actual playback time is the reference for caption synchronization. To calculate the actual playback time, the frames and the header information they contain are parsed: the number of frames from the beginning or from a specific position, together with their header information, is counted and converted into the actual playback time. From this playback time, a playback time information table is created and mapped one-to-one to an external text file, or the playback time information is mixed with the caption data, producing caption synchronization that carries playback time information. The playback time information is expressed as the actual playback time of the audio file, the number of samples of the decoded bitstream, or the like. Table 1 illustrates LRC synchronization information, currently supported by the WINAMP audio player application.

[52]

[53] Table 1 <Table 1>
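To make the third method concrete, the sketch below parses LRC-style lines of the form `[mm:ss.xx]text` and looks up the caption for a playback time computed by counting frames. It assumes every frame carries 1152 samples (true of MPEG-1 Layers II and III) and that the frame count is already known; the function names are hypothetical, and the sample lyrics in the usage are placeholders, not the contents of Table 1.

```python
from __future__ import annotations

import bisect
import re

# One LRC time tag, e.g. [01:23.45]; a line may carry several tags.
TIME_TAG = re.compile(r"\[(\d+):(\d{2}(?:\.\d+)?)\]")


def parse_lrc(text: str) -> list[tuple[float, str]]:
    """Return (seconds, caption) pairs sorted by time; lines without time tags are skipped."""
    entries = []
    for line in text.splitlines():
        tags = TIME_TAG.findall(line)
        if not tags:
            continue  # e.g. metadata lines such as [ar:...]
        caption = TIME_TAG.sub("", line).strip()
        for minutes, seconds in tags:
            entries.append((int(minutes) * 60 + float(seconds), caption))
    entries.sort(key=lambda entry: entry[0])
    return entries


def playback_time(frames: int, sample_rate: int, samples_per_frame: int = 1152) -> float:
    """Playback time in seconds after `frames` frames (1152 samples each for Layers II/III)."""
    return frames * samples_per_frame / sample_rate


def caption_at(entries: list[tuple[float, str]], seconds: float) -> str | None:
    """Caption whose time tag is the latest one not after `seconds`."""
    times = [t for t, _ in entries]
    i = bisect.bisect_right(times, seconds) - 1
    return entries[i][1] if i >= 0 else None
```

For example, 500 frames at 44.1 kHz correspond to about 13.06 s, so with tags at 12.00 s and 17.50 s the first line is shown. The weakness discussed next is that `frames` must be known exactly, which requires counting every frame up to the current position.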

[54]

[55] In the method in which playback time information is mixed with caption data, the playback time of the current MP3 position cannot be determined immediately when fast forward or fast rewind is performed on a typical appliance, when playback resumes after a fast forward or backward feed from the current position, when playback jumps to an arbitrary position in a computer program, when the file is compressed at various bit rates, or when the MP3 reproduction speed is changed. Accordingly, synchronization is difficult to obtain, and calculating the playback time takes considerable time and is prone to errors. Moreover, most compressed audio formats use a compression rate that varies with the amount of audio information. To obtain the correct playback time for a position after a fast feed in a compressed audio file such as MP3, the headers of all frames from the beginning of the file up to the desired point must be parsed. The frame header information includes various variables, such as the sampling frequency, bit rate, compression rate, and number of channels, so this requires many calculations and considerable hardware resources. The method also cannot find the correct synchronization position if it fails to count the frames from the beginning or from a specific position. Further, as described above, the individual frame header information is complex, and calculating it requires an accurate understanding of the algorithm as well as computing resources and time. The prior art therefore has difficulty performing real-time synchronization, as required in a computer song accompaniment device.
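The cost described above can be seen in a sketch that converts a byte offset into playback time for a variable-bit-rate MP3: every frame header from the start of the file must be read, because each frame's length depends on its own bitrate index and padding bit. This is a minimal illustration assuming a bare MPEG-1 Layer III stream that begins directly with a frame (no ID3 tag, no free-format frames) and using the standard frame-length formula floor(144 × bitrate / sampling rate) + padding; `frame_info` and `time_at_offset` are hypothetical names.

```python
from __future__ import annotations

# MPEG-1 Layer III bitrates (kbit/s) by bitrate index; 0 = free format or invalid here.
BITRATES_KBPS = [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 0]
SAMPLE_RATES = [44100, 48000, 32000, 0]  # index 3 is reserved
SAMPLES_PER_FRAME = 1152


def frame_info(header: bytes) -> tuple[int, int]:
    """Return (frame length in bytes, sampling rate) for one MPEG-1 Layer III header."""
    h = int.from_bytes(header[:4], "big")
    if (h >> 20) != 0xFFF or ((h >> 17) & 0b111) != 0b101:  # ID = 1, layer code 01
        raise ValueError("not an MPEG-1 Layer III frame header")
    kbps = BITRATES_KBPS[(h >> 12) & 0xF]
    rate = SAMPLE_RATES[(h >> 10) & 0b11]
    if kbps == 0 or rate == 0:
        raise ValueError("free-format, invalid or reserved header value")
    padding = (h >> 9) & 1
    return 144 * kbps * 1000 // rate + padding, rate


def time_at_offset(stream: bytes, offset: int) -> float:
    """Start time (s) of the frame containing byte `offset`, found by walking every header."""
    pos, seconds = 0, 0.0
    while pos + 4 <= len(stream):
        length, rate = frame_info(stream[pos:pos + 4])
        if pos + length > offset:
            break  # `offset` lies inside this frame
        seconds += SAMPLES_PER_FRAME / rate
        pos += length
    return seconds
```

The loop is linear in the number of frames before the target, and a single misread header corrupts every later result; this is the dependency that the position-based approach of the present invention avoids.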

[56] In the fourth conventional method, MP3 file bitstreams are parsed so that spare information bits, which are not used for decoding, can carry caption data. Within the MP3 bitstream, each frame has a spare information space called ancillary data, as shown in FIG. 3, and caption data may be embedded in this ancillary space, as shown in FIG. 4. The embedded caption information is then used to synchronize the caption while the frame is being reproduced, which helps to obtain correct synchronization. However, converting digital content stored in this way into another file format, while not impossible, requires considerable time and cost. In addition, each time a new file format is created, its algorithm must be parsed to find spare bits in the file bitstream that are not used for reproduction before captions can be synchronized.

[57] The fifth conventional method is used for a multimedia file having mixed audio and video data. It has been mainly used for a video CD or a karaoke CD.

[58] The video CD or karaoke CD will be briefly discussed.

[59] CD-ROMs that record compressed video and audio signals to deliver digital motion pictures are called video CDs, karaoke CDs, or compact disc-interactive/digital video (CD-I/DV). The structure of the data on a video CD is shown in FIG. 5. Although these CDs have different names, they share the same principle of recording and reproducing compressed video and audio signals on a CD. Compression (encoding) of video and audio signals has been widely studied, and the international standard for digital motion picture and audio compression is generally called MPEG. MPEG compression is also applied to the motion picture and audio signals of video CDs and digital video karaoke.

[60] Because playback time is a key factor for the video CD, the standard data transmission rate of the music CD was adopted to obtain a 74-minute playback time. Accordingly, the video and audio signals on a video CD must satisfy a strict data transmission rate of up to 1.4 Mbps. Within this limited rate, the allocation of code between video and audio is a key factor in determining the quality of the system.

[61] In the video CD, the amount of audio code is determined first. A rate of 224 kbps is selected from the range of 32 kbps to 448 kbps, considering both sound quality and the scale of the decoding hardware. This rate was determined from several sound-quality evaluation experiments by music software experts, and it yields sound quality comparable to CD for most music. Once the amount of audio code is fixed, the share of audio sectors (about 16.4%) follows from the recording format of the MPEG bit string described later, and the remainder is allocated to video sectors. In this case, the amount of compressed video code is calculated to be about 1.15 Mbps; the resulting structure is shown in FIG. 6. In the structure of FIG. 6, the order of the video and audio sectors may be changed. In the video CD, the background video and the caption coloring data corresponding to an audio data area are embedded into one screen. For example, as shown in FIG. 6, after a video synchronization sector (Vsy) and an audio synchronization sector (Asy), the caption coloring data corresponding to A1 is embedded into video data V10, V11, V12, V13 and V14. If the A1 audio data is "ABCDE", video data V10, V11, V12, V13 and V14 contain the background image, the "ABCDE" image, and the "ABCDE" coloring image, as shown for V10 in FIG. 7, V11 in FIG. 8, V12 in FIG. 9, V13 in FIG. 10, and V14 in FIG. 11. Accordingly, caption tracking remains possible during fast feed, position movement, speed change, and the like. However, applying existing fixed digital content to a new multimedia file format requires equipment, time, and cost. Moreover, a user other than the content provider fundamentally cannot perform the synchronization task.

[62] Disclosure of Invention

Technical Problem

[63] As described above, conventional caption synchronization methods do not support caption tracking during fast forward/backward feed and reproduction, movement to an arbitrary position and reproduction, or speed changes. Moreover, synchronization is difficult, requires professional work, and consumes time and money.

Technical Solution

[64] The present invention has been made to solve the above problems. An object of the present invention is to provide a method and apparatus for generating and regenerating synchronization for audio and multimedia file formats reproduced as bitstreams and for interface file formats carrying musical performance information. Bitstream "position information" of a file format reproduced as a bitstream, or tick "position information" of an interface file format carrying musical performance information, is extracted and allocated as caption position information. Because the position of the bitstream being reproduced is always known, the caption can be tracked regardless of the position in the file bitstream at which reproduction begins.

Advantageous Effects

[65] That is, using synchronization information generated by an appliance such as a portable device or computer system, the caption can be synchronized correctly so that it is tracked or colored during reproduction whether or not the speed is changed, as in a computer song accompaniment device. Accordingly, caption tracking is supported for fast forward/fast rewind, movement to any position and playback, and speed changes, and composed digital content can be modified easily. Further, whenever a new format (e.g., WMA, OGG Vorbis) is added to multimedia information composed in various file and compression formats, existing digital content can be adapted easily, without parsing a new algorithm and file format and without much time or cost.

[66] According to one aspect of the present invention, there is provided a method for generating and regenerating synchronization for a caption, a still picture and a motion picture using position information in synchronizing an interface file format (MIDI) that is musical performance information, the method comprising: a first step of assigning tick position information using a tick counter value in an interface file format; a second step of adjusting the size of information using the allocated tick position information; and a third step of separately storing synchronization information related to the position information and the caption and composing a new synchronized bitstream.

[67] According to another aspect of the present invention, there is provided a method for generating and regenerating synchronization for a caption, a still picture and a motion picture using position information in synchronizing a file format that is reproduced as a bitstream, the method comprising: a first step of assigning bitstream position information using a bitstream byte counter value; a second step of adjusting the size of the information using the allocated bitstream position information; and a third step of separately storing synchronization information related to the position information and the caption and composing a new synchronized bitstream.

[68] In an embodiment, the position information in the first step includes start position information and end position information.

[69] In the second step, a multiple of the position information is used to adjust the information size.

[70] In the second step, difference information between the start position information and the end position information is used to adjust the information size.
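The size adjustment of the second step can be sketched as follows, assuming a hypothetical record layout in which positions are divided by a fixed multiple and the end position is stored as a difference from the start. Dividing by a multiple shrinks the stored numbers at the cost of resolution (positions snap down to the nearest multiple), while storing the difference keeps the second field small because a caption spans only a short part of the file. Neither the multiple nor the layout is specified in the disclosure; `SyncEntry`, `compact`, and `expand` are illustrative names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SyncEntry:
    start: int     # start position (byte or tick counter value)
    end: int       # end position
    caption: str


def compact(entries: list[SyncEntry], multiple: int = 1) -> list[tuple[int, int, str]]:
    """Store start // multiple and the difference of the scaled end and start."""
    packed = []
    for e in entries:
        start = e.start // multiple
        packed.append((start, e.end // multiple - start, e.caption))
    return packed


def expand(packed: list[tuple[int, int, str]], multiple: int = 1) -> list[SyncEntry]:
    """Invert `compact`; exact only when the original positions were multiples of `multiple`."""
    return [SyncEntry(s * multiple, (s + d) * multiple, c) for s, d, c in packed]
```

With a multiple of 1024, a caption spanning bytes 4096 to 6144 is stored as (4, 2); a caption from 1000 to 2500 with a multiple of 512 comes back as 512 to 2048, showing the resolution that is traded for size.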

[71] The third step comprises mixing the synchronization information and the caption.

[72] The third step incorporates the synchronization information and the caption into a MIDI file.

[73] The third step comprises: storing the position information of the file bitstream, measured from the start of the file, for the portion being reproduced; obtaining synchronization after a fast feed by simple addition and subtraction on the absolute position of the file bitstream in the reproduced portion; and, when the audio speed changes, displaying as a caption the absolute position reached at the changed speed.

[74] According to another aspect of the present invention, there is provided a synchronization generator for a caption, a still picture, and a motion picture using position information, the synchronization generator comprising: a synchronization input unit for reading an audio file format, an audio musical performance information format, and a multimedia file format having audio and video information to form a bitstream; a buffer and buffer information of a file system for storing the bitstream read from each file format; a data player for playing back the data stored in the buffer; a counter for measuring a number of ticks or bytes; and a position information assigning unit for assigning position information using the value measured by the counter.

[75] According to another aspect of the present invention, there is provided a synchronization regenerator for a caption, a still picture and a motion picture using position information, the regenerator comprising: a synchronization regenerator input unit for inputting an interface file or a bitstream; a file system buffer for storing the interface file or bitstream input through the synchronization regenerator input unit; a counter for measuring a number of ticks or bytes; a regenerator buffer for aligning transferred data using the buffer information of the file system; and a synchronization position information comparator for comparing the counter value with the synchronization position information for the caption and still/motion picture data.
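The following sketch illustrates why the third step needs only addition and subtraction: a fast feed changes the absolute byte position by a known amount, and the caption is then found by searching the stored start positions, with no playback-time calculation. It is a minimal Python illustration under assumed names (`CaptionTrack`, `seek`); a speed change alters only how quickly positions advance, not which caption belongs to a position, so the lookup itself is unaffected.

```python
from __future__ import annotations

import bisect


class CaptionTrack:
    """Captions keyed by absolute start position (bytes from the start of the file)."""

    def __init__(self, entries: list[tuple[int, str]]):
        entries = sorted(entries)
        self.starts = [pos for pos, _ in entries]
        self.captions = [text for _, text in entries]

    def caption_at(self, position: int) -> str | None:
        """Caption whose start position is the last one not after `position`."""
        i = bisect.bisect_right(self.starts, position) - 1
        return self.captions[i] if i >= 0 else None


def seek(position: int, delta: int, file_size: int) -> int:
    """Fast forward (delta > 0) or rewind (delta < 0) by plain addition, clamped to the file."""
    return max(0, min(file_size, position + delta))
```

For instance, with captions starting at bytes 0, 3600, and 3648, jumping forward 3650 bytes lands on the caption that starts at 3648, and rewinding 100 bytes from there returns to the caption that starts at 0.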
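The regenerator's comparator can be modeled as a loop that advances a byte counter as buffered data is passed to the player and releases each caption once the counter reaches its stored position. This is a sketch under assumptions: the synchronization data is a list of (position, caption) pairs sorted by position, the counter counts bytes, and `start` models resuming playback after a seek; the function name `regenerate` is hypothetical.

```python
from __future__ import annotations

import bisect
from typing import Iterable, Iterator


def regenerate(chunks: Iterable[bytes], sync: list[tuple[int, str]],
               start: int = 0) -> Iterator[tuple[int, str]]:
    """Yield (counter, caption) once the byte counter reaches each stored position.

    `sync` must be sorted by position; `start` is the byte offset playback resumes from.
    """
    positions = [pos for pos, _ in sync]
    counter = start
    nxt = bisect.bisect_left(positions, start)  # skip captions that start before the resume point
    for chunk in chunks:
        counter += len(chunk)                   # bytes handed to the player so far
        while nxt < len(sync) and sync[nxt][0] <= counter:  # the comparator
            yield counter, sync[nxt][1]
            nxt += 1
```

Because the comparison is between two integers, resuming at byte 1200 needs only a binary search over the stored positions rather than a replay of the file from its start.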

[76] Brief Description of the Drawings

[77] The above objects, other features and advantages of the present invention will become more apparent by describing the preferred embodiment thereof with reference to the accompanying drawings, in which:

[78] FIG. 1 illustrates MIDI information;

[79] FIG. 2 is a diagram illustrating the structure of an audio data bitstream in an MP3 format;

[80] FIG. 3 is a diagram illustrating the internal structure of a frame in an MP3 bitstream;

[81] FIG. 4 illustrates embedding data used for caption;

[82] FIG. 5 illustrates the structure of data contained in a video CD;

[83] FIG. 6 illustrates an arrangement of video sectors and audio sectors in a video CD;

[84] FIG. 7 illustrates caption coloring for a video sector V10 in a video CD form;

[85] FIG. 8 illustrates caption coloring for a video sector V11 in a video CD form;

[86] FIG. 9 illustrates caption coloring for a video sector V12 in a video CD form;

[87] FIG. 10 illustrates caption coloring for a video sector V13 in a video CD form;

[88] FIG. 11 illustrates caption coloring for a video sector V14 in a video CD form;

[89] FIG. 12 illustrates the structure of a MIDI file header and multiple tracks;

[90] FIG. 13 is a block diagram of a MIDI synchronization generator;

[91] FIG. 14 is a block diagram of a synchronization generator for an audio file that can be reproduced as a bitstream;

[92] FIG. 15 illustrates a bitstream that is a series of bytes of an audio file;

[93] FIG. 16 is a block diagram of a synchronization generator for a multimedia file format;

[94] FIG. 17 illustrates a method of generating synchronization for a file format including audio and video;

[95] FIG. 18 illustrates generating synchronization of a still picture and a motion picture with audio data;

[96] FIG. 19 illustrates a format of a caption and still picture synchronization data;

[97] FIG. 20 is a block diagram of a synchronization regenerator for a musical performance information interface file format;

[98] FIG. 21 is a block diagram of a synchronization regenerator for an audio file format;

[99] FIG. 22 illustrates fast forward/backward movement upon reproducing a compressed audio bitstream; and

[100] FIG. 23 is a block diagram of a synchronization regenerator for a multimedia file format.

[101] Best Mode for Carrying Out the Invention

[102] Reference will now be made in detail to preferred embodiments of the present invention, examples of which are illustrated in the accompanying drawings.

[103] A computer device is generally composed of an application program, an operating system, and a hardware system including a storage device. Data and control information flow in both directions between the application program and the operating system and between the operating system and the hardware system. The application program is a media player on the computer device and provides functions such as file reading, file storing, playback, stop, pause, fast feed, and speed change through a program user interface. When a file to be reproduced is read through the application program, the program user interface reads the data in the storage device of the hardware system through the file system of the operating system, stores the data as a series of bitstreams in a buffer, reproduces the buffered data with the application program, and sends the reproduced data back to the operating system, which passes it to the audio or video device of the hardware system. Each of the synchronization generator and the synchronization regenerator used in the present invention is composed of an application program having input and output functions, a file system included in an operating system, and a hardware system. Each may be implemented as a program on a computer, or as an electronic or portable device that includes such a program.

[104] In a computer device, including a portable player, digital data synchronization is generated and regenerated through a data structure called a file, which is a collection of the bits making up the data. Although a file is conceptually a series of bytes followed by an end-of-file mark, an actual file is stored as a combination of data blocks on a disk, tape, or main memory. Every file has a name so that programs can handle it, and the names form part of a larger data structure called the file system in the operating system.

[105] Every system that accesses files has a file system, although file systems differ in complexity. File systems take various forms, such as those of UNIX and LINUX, Microsoft's FAT and NTFS, and user-defined file systems, but their structures are similar. In a file system, a file is made up of several data blocks, each of which may contain executable binary code or text desired by the user. The internal structure of a file can be modified by a program. The content of a file is not tied to its name; the extension after the file name (e.g., .TXT or .MP3) is used to distinguish the nature of the file. The file system manages file input/output by buffering, so that tasks operate on a bitstream arranged in a line within the system task environment, regardless of where or in what structure the data are written in the various memory spaces of the disk. The user can therefore regard a file simply as a sequence of bytes: although the system divides a file into data blocks stored in several places on the disk, all of its logically stored bytes are presented to the user as a single linear bitstream. In other words, the user can manipulate the file without regard to its internal storage structure. Files read by the file system are reproduced as a bitstream by a program.

[106] Files read by the file system are presented as a single bitstream through buffering. This bitstream is independent of system variables and of other files. It can be accessed in units of bits or bytes on a computer or electronic device, and the system maintains a file pointer for accessing each file. As stated above, the file system logically connects actual data distributed over several places in the storage device into one bitstream, and input, output, and data processing are controlled through the file pointer.
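In Python, for instance, a file object exposes this file pointer through `tell()` and `seek()`, so the byte offset of every buffered chunk is available without knowing how the file system stores its blocks. The sketch below is illustrative only: `read_positions` is a hypothetical helper, and an in-memory stream stands in for a disk file.

```python
from __future__ import annotations

import io
from typing import BinaryIO, Iterator


def read_positions(f: BinaryIO, chunk_size: int = 4096) -> Iterator[tuple[int, bytes]]:
    """Yield (offset, chunk); offset is the file-pointer value before each buffered read."""
    while True:
        offset = f.tell()
        chunk = f.read(chunk_size)
        if not chunk:
            return
        yield offset, chunk


if __name__ == "__main__":
    stream = io.BytesIO(b"abcdefghij")       # stands in for a file opened with open(path, "rb")
    print(list(read_positions(stream, 4)))   # [(0, b'abcd'), (4, b'efgh'), (8, b'ij')]
    stream.seek(6)                           # move the file pointer directly
    print(stream.read(2))                    # b'gh'
```

The offsets are the same whether the file is one contiguous block or scattered across the disk, which is the property the position-based synchronization relies on.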

[107] According to the present invention, the method for generating synchronization in a caption synchronization generator using position information may be classified into two types depending on whether to reproduce in a bitstream file format or in an interface file format.

[108] MIDI, the musical instrument digital interface file format, is composed of multiple tracks, each indicating the instrument, the performance length, the performance intensity, and the like, as described above. When a MIDI file is read through the file system, the MIDI file header and the multiple tracks are arranged in a line, as shown in FIG. 12. Each track is a bitstream ordered over time, and the MIDI player processes the multiple tracks in parallel. That is, as shown in FIG. 12, the instruments appearing at the same SLOT are activated simultaneously, and the sounds of the instruments on several tracks are output and synthesized at the same time. Accordingly, the musical performance information interface file format cannot be reproduced as a single bitstream ordered over time.
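The parallel handling of tracks can be illustrated by converting each track's delta times into absolute ticks and merging the tracks into one tick-ordered timeline, which is effectively what a player does when it sounds several tracks at the same SLOT. The event representation used here, (delta ticks, label) pairs, is a simplification assumed for illustration and is not the Standard MIDI File byte encoding; the function names are hypothetical.

```python
from __future__ import annotations

import heapq
from itertools import accumulate


def absolute(track: list[tuple[int, str]]) -> list[tuple[int, str]]:
    """Turn (delta_ticks, event) pairs of one track into (absolute_tick, event) pairs."""
    ticks = accumulate(delta for delta, _ in track)
    return [(tick, event) for tick, (_, event) in zip(ticks, track)]


def merge_tracks(tracks: list[list[tuple[int, str]]]) -> list[tuple[int, str]]:
    """Interleave all tracks by absolute tick; events at equal ticks keep track order."""
    return list(heapq.merge(*(absolute(t) for t in tracks), key=lambda te: te[0]))
```

The merged list shows why a caption position in MIDI is best expressed as a tick value: the byte order of the file follows the tracks one after another, while the tick order interleaves them.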

[109] The method for generating and regenerating synchronization for an interface file format (MIDI) as musical performance information according to an embodiment of the present invention will be described.

[110] At a first step, start position information or both start position information and end position information are assigned using a tick counter value of an interface file format corresponding to an embedded position of a relevant caption. At a second step, information size is adjusted using the allocated file bitstream position information, a multiple of the position information, or difference information between the start position information and the end position information. At a third step, the synchronization information related to the position information and the caption are separately stored, mixed, or incorporated in a multimedia file to compose a new synchronized bitstream.

[Ill] FIG. 13 illustrates a MIDI synchronization generator. When an input unit of the synchronization generator that is an application program selects a MIDI file to play back, the file system in the operating system reads data at several places in the storage device and inputs the data in a series of bitstreams in a MIDI data buffer. The tick counter value is initialized into "0." When the synchronization generator input unit that is the application program then inputs a reproduction signal, the data is input from the MIDI data buffer to the MIDI data player and the tick counter value is automatically incremented by a number of input bytes. That is, the data player translates the tick counter value into musical performance information such as bar, note, rhythm length and the like in Table 2 (prior art), and synthesizes it with an external sound souce to carry out musical performance.

[112] In this case, when a caption corresponds to the sound that the data player is synthesizing for audio output, the position information assigning unit assigns the current tick counter value as the caption position information. The synchronization position information of the caption is thus an integer value indicating which byte, counted from the beginning of the file (the file pointer) in the MIDI data buffer, the caption corresponds to.

[113] Table 2 shows an example of the position information according to the present invention.

[114] Table 2: Comparison of the bar:beat:tick values (bar, note and rhythm length) of the prior art, as translated by a MIDI player, with the position information generated by the tick counter according to the present invention (in parentheses)

MTrk   Track number = 2   Length = 650   Track name: Words
6     1:1:0      (0)      @LENGL
13    1:1:0      (0)      @TWONDERFUL TONIGHT
e     1:1:0      (0)      @TEric Clapton
5     10:2:48    (3600)
5     10:3:0     (3648)   late
5     10:3:48    (3696)   in
4     10:4:0     (3744)   the
4     10:4:48    (3792)   eve
4     11:1:48    (3888)   ning
6     12:2:48    (4368)   /She's
6     12:3:0     (4416)   wond'
4     12:3:48    (4464)   ring
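The tick values in parentheses are consistent with a resolution of 96 ticks per beat and four beats per bar; for example, 10:2:48 gives (9 × 4 + 1) × 96 + 48 = 3600. These two constants are inferred from the table values, not stated in the text, so the following Python sketch of the bar:beat:tick conversion is an assumption that holds for this track only.

```python
TICKS_PER_BEAT = 96  # inferred from Table 2, not stated in the text
BEATS_PER_BAR = 4    # assumes 4/4 time throughout the track


def to_tick(bar: int, beat: int, tick: int) -> int:
    """Convert a 1-based bar:beat:tick display value to an absolute tick count."""
    return ((bar - 1) * BEATS_PER_BAR + (beat - 1)) * TICKS_PER_BEAT + tick


def to_bbt(abs_tick: int) -> tuple[int, int, int]:
    """Convert an absolute tick count back to a 1-based bar:beat:tick value."""
    beats, tick = divmod(abs_tick, TICKS_PER_BEAT)
    bar, beat = divmod(beats, BEATS_PER_BAR)
    return bar + 1, beat + 1, tick


print(to_tick(10, 2, 48))  # 3600
print(to_bbt(4464))        # (12, 3, 48)
```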

[115]

[116] The method for generating and regenerating synchronization for a file format that can be reproduced in a bitstream, according to another embodiment of the present invention, will now be described.

[117] At a first step, start position information, or both start position information and end position information, is assigned using the byte counter value of the file bitstream at the position where the relevant caption is embedded. At a second step, the information size is adjusted using the allocated file bitstream position information, a multiple of the position information, or the difference between the start position information and the end position information. At a third step, the synchronization information related to the position information and the caption are stored separately, mixed, or incorporated in a multimedia file to compose a new synchronized bitstream.

[118] As such, in the present invention, the synchronization information is generated from position information counted from the beginning of the file bitstream until caption synchronization is complete, i.e., information indicating which byte of the bitstream the data occupies. Unlike conventional synchronization methods, the actual playback time information is not stored, and the header or frame information of the various file formats does not need to be parsed.

[119] FIG. 14 shows a synchronization generator for an audio file format that can be reproduced in a bitstream. When the synchronization generator input unit, which is an application program, selects a file to be played back, the file system in the operating system reads the data stored in block form in several places of the storage device and presents the file as a single logical bitstream. The bitstream is stored in the audio bitstream buffer, and the byte counter is initialized to "0." When the synchronization generator input unit then inputs a reproduction signal, the number of bytes input from the audio bitstream buffer to the audio player is stored in the byte counter.

[120] When a caption corresponds to the audio being reproduced, the position information assigning unit assigns the current byte counter value as the caption position information. That is, the synchronization position information of the caption is an integer value indicating which byte, counted from the beginning of the file (the file pointer) in the audio bitstream buffer, the caption corresponds to.

[121] The process by which the position information assigning unit assigns position information using the value measured by the byte counter will now be described in more detail.

[122] First, when the first byte in the bitstream buffer is input to the audio player, the position information value is incremented to "1." When the second byte is input, the value is incremented to "2," and when the third byte is input, it is incremented to "3." If the sound corresponding to the caption begins to be output at this point, position information "3" is assigned to the caption information as its start position information. When the fourth byte of the file bitstream is input to the player, the position information value is incremented to "4." If the sound corresponding to the caption ends at this point, position information "4" is assigned to the caption information as its end position information. Since the caption position information starts at "3" and ends at "4", the caption begins to be colored when the third byte is reproduced, and the coloring of that caption is completed when the fourth byte is reproduced, giving fine caption synchronization.
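The byte-counter walkthrough above can be simulated in a few lines of Python. The sketch assumes, for illustration only, that the moments at which a caption's sound starts and ends are supplied as sets of byte-counter values (`starts`, `ends`), standing in for the operator's input to the synchronization generator; `generate_sync` is a hypothetical name.

```python
def generate_sync(bitstream: bytes, starts: set[int], ends: set[int]) -> list[tuple[int, int]]:
    """Return (start, end) byte positions for each caption, as in paragraph [122].

    `starts` and `ends` are hypothetical stand-ins for the operator marking when
    a caption's sound begins and ends during playback.
    """
    counter = 0          # byte counter, initialized to 0
    open_start = None    # start position of the caption currently sounding
    pairs = []
    for _ in bitstream:  # one iteration = one byte handed to the audio player
        counter += 1
        if counter in ends and open_start is not None:
            pairs.append((open_start, counter))  # close the current caption first
            open_start = None
        if counter in starts:
            open_start = counter                 # a caption may start where the last ended
    return pairs


print(generate_sync(bytes(6), starts={3}, ends={4}))  # [(3, 4)]
```

Closing a caption before opening the next lets consecutive captions share a boundary byte, as in the "A" (2, 3), "B" (3, 4) example of FIG. 15.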

[123] That is, the position information is obtained by combining the caption information with information indicating which byte, counted from the beginning of the file bitstream, is currently being restored and reproduced, and this position information is used to generate the synchronization information. The position information includes the allocated position information itself as well as relative values derived from it by operations such as taking a multiple of it or a difference from it.

[124] A method for synchronization using a synchronization generator will be described.

[125] When synchronization is to be generated for an MP3 file that can be reproduced in a bitstream, the synchronization generator input unit, which is the application program, selects 'to read a file', and the file system in the operating system reads the MP3 file and forms a bitstream, which is a series of bytes, as in FIG. 15. The bitstream is arranged in temporal order, and each byte holds compressed audio data. If the audio corresponding to caption "A" is output as the sound "A" when byte B2 of the bitstream is input to and played back by the audio player, the start position information is assigned as "2" and the end position information as "3", since caption "A" extends from the beginning of the second byte to the beginning of the third byte.

[126] Likewise, if the audio corresponding to caption "B" is output as the sound "B" when byte B3 is input to and played back by the audio player, the start position information is assigned as "3" and the end position information as "4", since caption "B" extends from the beginning of the third byte to the beginning of the fourth byte. The same applies to captions "C", "D" and "E." For the captions "ABCDE," the position information may be represented in various ways, for example as "(2,3)(3,4)(4,5)...", "(2)(3)(4)...", or "(2,1)(3,1)(4,1)...", that is, as pairs of (start position information, end position information), as the start position information only, or as (start position information, end position information - start position information), respectively. The synchronization information can be generated in various forms by mixing the position information and the caption. Further, the caption and the synchronization information may be managed as separate files, incorporated before or after the audio file, or included in an ID3 tag added to the audio file format.
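The three textual representations listed above can be produced from the same (start, end) pairs. The following Python sketch is illustrative only; the helper names are hypothetical and the output strings simply follow the notation used in the text.

```python
def as_pairs(pairs: list[tuple[int, int]]) -> list[tuple[int, ...]]:
    """(start position information, end position information)."""
    return [(s, e) for s, e in pairs]


def starts_only(pairs: list[tuple[int, int]]) -> list[tuple[int, ...]]:
    """Start position information only."""
    return [(s,) for s, _ in pairs]


def start_and_length(pairs: list[tuple[int, int]]) -> list[tuple[int, ...]]:
    """(start, end - start): the difference form."""
    return [(s, e - s) for s, e in pairs]


def to_text(items: list[tuple[int, ...]]) -> str:
    return "".join("(" + ",".join(str(v) for v in item) + ")" for item in items)


pairs = [(2, 3), (3, 4), (4, 5)]  # captions "A", "B", "C" from FIG. 15
for form in (as_pairs, starts_only, start_and_length):
    print(to_text(form(pairs)))
```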

[127] FIG. 16 is a block diagram of a synchronization generator for a multimedia file format that can be reproduced in a bitstream. When the synchronization generator input unit, which is an application program, selects 'to read a file', the file system in the operating system reads a multimedia file and inputs it to a multimedia bitstream buffer, and the byte counter value is initialized to "0." When the synchronization generator input unit inputs a reproduction signal, the multimedia bitstream buffer inputs data to the multimedia player, and the byte counter value is incremented by the number of input bytes. When a caption corresponds to the multimedia audio, the byte counter value is allocated as the position information of the caption.

[128] FIG. 17 illustrates a method of generating synchronization for a file format including audio and video according to the present invention. FIG. 17 shows a bitstream having video and audio data, caption data synchronized to the audio information, and the byte information of the bitstream. According to the synchronization method, the start position information of caption "A" is allocated as "12" and its end position information as "13"; the start position information of caption "C" as "18" and its end position information as "19"; and the start position information of caption "D" as "24" and its end position information as "25." For successive captions, the synchronization information may be formed from the start position information alone, (12, 18, 24); from the start and end position information, (12,13)(18,19)(24,25); from the differences between the start and end position information; or from other operations performed on the position information. For example, when the start position information allocated in caption phoneme units is a, b, c, d, ... in sequence, the synchronization information may be represented as (a-0, b-a, c-b, d-c, ...) to reduce its size, or formed as (a/512, b/512, c/512, d/512, ...), ((a-0)/64, (b-a)/64, (c-b)/64, (d-c)/64, ...) or the like.
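A minimal Python sketch of the size-reducing encodings follows. The helper names are hypothetical, and the use of floor division for a/512 and a/64 is an assumption (the text does not say how remainders are handled); it is lossless only when every value is a multiple of the divisor.

```python
from itertools import accumulate


def delta_encode(starts: list[int]) -> list[int]:
    """(a-0, b-a, c-b, ...): store each start position relative to the previous one."""
    return [s - p for p, s in zip([0] + starts[:-1], starts)]


def delta_decode(deltas: list[int]) -> list[int]:
    """Undo delta_encode with a running sum."""
    return list(accumulate(deltas))


def scale(values: list[int], unit: int) -> list[int]:
    """(a/unit, b/unit, ...) using floor division (an assumption of this sketch)."""
    return [v // unit for v in values]


starts = [12, 18, 24]                      # captions "A", "C", "D" in FIG. 17
print(delta_encode(starts))                # [12, 6, 6]
print(delta_decode(delta_encode(starts)))  # [12, 18, 24]
print(scale([1024, 2048, 4096], 512))      # [2, 4, 8]  (hypothetical positions)
```

The ((a-0)/64, (b-a)/64, ...) form corresponds to `scale(delta_encode(starts), 64)`.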

[129] In FIG. 17, the position information is allocated in phoneme units so that the caption can be colored finely, phoneme by phoneme, but it may be allocated in word or sentence units by the same method.

[130] FIG. 18 illustrates, for an audio file format that can be reproduced in a bitstream, synchronizing a caption to an audio bitstream such as an orally narrated fairy tale or animation, and synchronizing separate still pictures and motion pictures to the audio data. The caption is synchronized by the method described above. To synchronize a still picture or motion picture, the file name or link information of the still picture or motion picture is treated as a caption and allocated position information. The link information is any information that can reference the still picture or motion picture data. For example, when "A" is a still picture and "B" is a motion picture, the position information is allocated as follows: "A" is the still picture data, file name, or link information, with start position information "2" and end position information "3"; "B" is the motion picture data, file name, or link information, with start position information "3" and end position information "4."

[131] FIG. 19 illustrates the format of caption and still picture synchronization data. The format includes an ID 30 of the content producer or provider, a content ID 40 for text song word/still picture selection, a type 50 of the data format of the text song word/still picture, the entire size 60 of the text song word and still picture, an index 70 setting the type of byte offset, byte offset information 80 at the instant the text song word/still picture starts, byte offset information 90 at the instant the text song word/still picture ends, and text song word/still picture data 100 in FIG. 3. The data format type 50 of the text song word/still picture includes the country, the type of caption data code, options and the like. The index 70 indicates the type of byte offset information and selects an operation format such as a/512 or a/64 as described above. The byte offset information 80 and 90 either directly indicates a byte offset according to the index or indicates the number of actual bytes obtained by a byte offset operation, such as a multiple or a difference. The text song word may contain one character or several characters, and the data 100 may directly contain motion picture data or contain file link information.
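FIG. 19 names the fields of a synchronization record but not their sizes or byte order. The Python sketch below packs one record with the standard `struct` module under hypothetical field widths (big-endian, listed in the comment) and treats field 60 as the length of the data 100; both choices are assumptions made only for illustration.

```python
import struct

# Hypothetical layout (big-endian); FIG. 19 gives the fields but not their widths:
#   provider ID 30: uint32, content ID 40: uint32, data type 50: uint16,
#   size 60: uint32, offset index 70: uint8, start offset 80: uint32,
#   end offset 90: uint32, followed by the data 100 itself.
HEADER = struct.Struct(">IIHIBII")


def pack_record(provider_id: int, content_id: int, data_type: int, index: int,
                start: int, end: int, data: bytes) -> bytes:
    """Serialize one caption/still-picture synchronization record."""
    return HEADER.pack(provider_id, content_id, data_type, len(data),
                       index, start, end) + data


def unpack_record(buf: bytes) -> tuple[int, int, int, int, int, int, bytes]:
    """Parse a record produced by pack_record."""
    provider_id, content_id, data_type, size, index, start, end = HEADER.unpack_from(buf)
    data = buf[HEADER.size:HEADER.size + size]
    return provider_id, content_id, data_type, index, start, end, data


rec = pack_record(7, 42, 1, 0, 2, 3, "A".encode("utf-8"))  # caption "A" at bytes 2..3
print(len(rec), unpack_record(rec))
```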

[132] A synchronization regenerator for a caption of MIDI, audio and multimedia file, a still picture and a motion picture will be now described.

[133] Each of a synchronization generator and a synchronization regenerator used in the present invention is composed of an application program having input and output functions, a file system included in an operating system, and a hardware system. Each is implemented by a program on a computer or an electronic device or portable device having a computer. The synchronization regenerator performs a number of functions through a user interface, such as file reading, file storing, playback, stop, pause, fast feed, speed change and the like.

[134] When the synchronization regenerator input unit selects 'to read a file', the file system reads data from several places of the storage device, which is the hardware system, and inputs the data into a buffer as a series of bitstreams. The buffered data is then played back by a player.

[135] First, FIG. 20 shows a block diagram of the synchronization regenerator for an interface file format carrying musical performance information. In FIG. 20, when the synchronization regenerator input unit selects 'to read a file', the file system reads a MIDI file and inputs it to the MIDI data buffer. The tick counter is initialized to "0." When a reproduction signal is input to the synchronization regenerator input unit, the tick counter is incremented automatically, and the data stored in the MIDI data buffer is input to and played back by the MIDI player. As the data is input from the MIDI data buffer to the MIDI player, the synchronization position information comparator compares the tick counter value with the caption synchronization position information. When the tick counter value matches the synchronization start position information of the caption, the coloring of the caption and the playback of the still/motion picture begin; when it matches the end position information, the coloring of the caption and the playback of the still/motion picture stop.
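The comparator's behavior can be simulated with a simple loop. In this Python sketch, `regenerate` is a hypothetical name, each entry is a (start, end, label) triple whose label may be caption text or a picture file name or link, and the counter is assumed to advance by one per step; the same loop applies unchanged to a byte counter.

```python
def regenerate(total_ticks: int, entries: list[tuple[int, int, str]]) -> list[tuple[int, str, str]]:
    """Simulate the synchronization position information comparator.

    entries: (start, end, label), where label is caption text or a picture
    file name/link. Returns (tick, action, label) in playback order.
    """
    log = []
    for tick in range(total_ticks + 1):  # tick counter starts at 0 and increments
        for start, end, label in entries:
            if tick == end:
                log.append((tick, "stop", label))   # stop coloring / picture playback
            if tick == start:
                log.append((tick, "begin", label))  # begin coloring / picture playback
    return log


print(regenerate(5, [(2, 3, "A"), (3, 4, "B")]))
```

Scanning every entry on every tick keeps the sketch short; a practical regenerator would keep the entries sorted and advance a pointer instead.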

[136] According to the MIDI analysis data, the tick counter is automatically incremented once the MIDI musical performance starts. When the tick counter value matches the start position information in the synchronization information, the difference from the end position information is extracted, and the coloring of one caption's font width is divided over that difference. If the difference value is "a" and the font width of the caption is "b", then each time the tick counter value is incremented by "1", a further b/a of the caption width is colored.
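The b/a rule above amounts to linear interpolation of the colored width between the start and end positions. A minimal Python sketch, with the hypothetical helper `colored_width` and an illustrative caption spanning ticks 3648 to 3696 with a width of 96 pixels (both values assumed for the example):

```python
def colored_width(counter: int, start: int, end: int, width: float) -> float:
    """Width of a caption already colored, advancing width / (end - start) per count."""
    if counter <= start:
        return 0.0            # coloring has not started yet
    if counter >= end:
        return float(width)   # coloring is complete
    return (counter - start) * width / (end - start)


# a = 3696 - 3648 = 48 ticks, b = 96 pixels, so 2 pixels are colored per tick.
print(colored_width(3660, 3648, 3696, 96))  # 24.0
```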

[137] Caption coloring operation and still/motion picture synchronization operation in the MIDI synchronization regenerator when the MIDI synchronization regenerator input unit instructs speed change playback and fast feed will be described.

[138] When the MIDI synchronization regenerator input unit inputs a fast feed and reproduction signal together with a feed position, the file system aligns the data at the feed position in the MIDI data buffer using its internal buffer information. Further, the tick counter is assigned a new value according to the buffer information in the file system and continues to be incremented automatically. The synchronization position information comparator compares the new tick counter value with the caption and still/motion picture synchronization position information to perform caption coloring and still/motion picture synchronization regeneration.

[139] When the MIDI synchronization regenerator input unit inputs a speed change playback signal and a speed change value, a speed at which data is input from the MIDI data buffer to the MIDI player is changed and a tick counter increment speed is adjusted with the speed change to perform the synchronization regeneration.

[140] Next, a player for an audio file format that can be reproduced in a bitstream is shown in FIG. 21. In FIG. 21, when the synchronization regenerator input unit selects 'to read a file', the file system reads an audio file and inputs it to the audio data buffer. The byte counter is initialized to "0." When a reproduction signal is input to the synchronization regenerator input unit, the byte counter is incremented automatically, and the data stored in the audio data buffer is input to and played back by the audio player. As the data is input from the audio data buffer to the audio player, the synchronization position information comparator compares the byte counter value with the synchronization position information of the caption and still/motion picture data. When the byte counter value matches the synchronization start position information of the caption and still/motion picture data, the coloring of the caption and the playback of the still/motion picture begin; when it matches the end position information, the coloring of the caption and the playback of the still/motion picture stop.

[141] Caption coloring operation and still/motion picture synchronization operation in the audio synchronization regenerator when the audio synchronization regenerator input unit instructs speed change playback and fast feed will be described.

[142] When the audio synchronization regenerator input unit inputs a fast feed and reproduction signal together with a feed position, the file system aligns the data at the feed position in the audio data buffer using its internal buffer information. Further, the byte counter is assigned a new value according to the buffer information in the file system. The synchronization position information comparator compares the new byte counter value with the caption and still/motion picture synchronization position information to perform caption coloring and still/motion picture synchronization regeneration.

[143] When the audio synchronization regenerator input unit inputs a speed change playback signal and a speed change value, the speed at which data is input byte by byte from the bitstream data buffer to the audio player changes, and the byte counter increment speed automatically follows the speed at which bytes are input from the bitstream buffer to the audio player. The synchronization position information comparator therefore compares the byte counter value with the caption and still/motion picture position information, so that synchronization regeneration is performed. Dividing the coloring of a caption over the byte difference between its start position information and end position information allows fine caption coloring.
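Why a speed change does not disturb synchronization can be seen in a small simulation. The Python sketch below models playback speed as a hypothetical `bytes_per_step` parameter (the number of bytes handed to the player per time step); this parameterization is an assumption for illustration. Because every byte still passes the comparator, the same byte positions fire at any speed, and only the time step at which they fire changes.

```python
def play(total_bytes: int, entries: list[tuple[int, int, str]], bytes_per_step: int):
    """Simulate playback where `bytes_per_step` bytes reach the player per time step.

    A larger value models faster playback. Returns (step, counter, action, label).
    """
    log = []
    counter = 0
    step = 0
    while counter < total_bytes:
        step += 1
        for _ in range(min(bytes_per_step, total_bytes - counter)):
            counter += 1  # the byte counter follows the bytes actually consumed
            for start, end, label in entries:
                if counter == end:
                    log.append((step, counter, "stop", label))
                if counter == start:
                    log.append((step, counter, "begin", label))
    return log


entries = [(2, 3, "A"), (3, 4, "B")]
normal = play(8, entries, bytes_per_step=1)
fast = play(8, entries, bytes_per_step=2)
# Same byte positions fire in both runs; only the time step differs.
print([e[1:] for e in normal] == [e[1:] for e in fast])  # True
```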

[144] For example, when playback jumps to position T by fast-forward or backward movement while the compressed audio bitstream in FIG. 22 is being reproduced, it is difficult to determine the frame number at T directly when synchronization relies on existing frame information. In particular, with the VBR method the actual playback time cannot be obtained without decoding the frame headers from the beginning. In the present invention, by contrast, accurate real-time synchronization is possible, since position T in FIG. 1 lies e bytes from the beginning of the file and can be matched to the closest caption byte offset, such as that of the same phoneme. The same method applies to still picture synchronization.
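After a jump, the regenerator only needs the new byte counter value e. The following Python sketch shows one way to find the caption to resume, assuming (as an interpretation of "closest") that it is the caption whose start offset is the last one at or before e; `caption_after_seek` is a hypothetical name, and the search uses Python's standard `bisect` module.

```python
from bisect import bisect_right
from typing import Optional


def caption_after_seek(position: int, starts: list[int]) -> Optional[int]:
    """Index of the caption whose start offset is the last one at or before `position`.

    `starts` must be sorted. Only byte offsets are compared, so no frame header
    (CBR or VBR) has to be decoded to resynchronize after a jump.
    """
    i = bisect_right(starts, position) - 1
    return i if i >= 0 else None


starts = [12, 18, 24]                  # start offsets from FIG. 17
print(caption_after_seek(20, starts))  # 1 (the caption starting at byte 18)
```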

[145] Next, a player for a multimedia file format that can be reproduced in a bitstream is shown in FIG. 23. In FIG. 23, when the synchronization regenerator input unit selects 'to read a file', the file system reads a multimedia file and inputs it to the multimedia bitstream buffer. The byte counter is initialized to "0." When a reproduction signal is input to the synchronization regenerator input unit, the byte counter is incremented automatically, and the data stored in the multimedia bitstream buffer is input to and played back by the multimedia player. As data is input byte by byte from the multimedia bitstream buffer to the multimedia player, the synchronization position information comparator compares the byte counter value with the synchronization position information of the caption data. When the byte counter value matches the synchronization start position information of the caption, the coloring of the caption begins; when it matches the end position information, the coloring of the caption stops. Dividing the coloring of a caption over the byte difference between its start position information and end position information allows fine caption coloring.

[146] Caption coloring operation in the multimedia synchronization regenerator when the multimedia synchronization regenerator input unit instructs speed change playback and fast feed will be described.

[147] When the multimedia synchronization regenerator input unit inputs a fast feed and reproduction signal together with a feed position, the file system aligns the bitstream data at the feed position in the multimedia bitstream buffer using its internal buffer information. Further, the byte counter is assigned a new value according to the buffer information. The synchronization position information comparator compares the new byte counter value with the caption synchronization position information to perform caption synchronization regeneration. [148] When the multimedia synchronization regenerator input unit inputs a speed change playback signal and a speed change value, the speed at which data is input byte by byte from the bitstream buffer to the multimedia player changes, and the byte counter increment speed automatically follows the speed at which bytes are input from the bitstream buffer to the multimedia player. The synchronization position information comparator therefore compares the byte counter value with the caption position information, so that synchronization regeneration is performed. Dividing the coloring of a caption over the byte difference between its start position information and end position information allows fine caption coloring.

[149] As described above, in the present invention, a caption of a file in any format can be synchronized irrespective of changes in the reproduction speed of the bitstream, the parsing algorithm, fast feed, or VBR encoding, as long as the tick counter value or byte counter value for the currently reproduced portion of the file is known. The present invention provides a method for easily performing a synchronization task on an audio file by assigning caption synchronization information as position information and translating the position information into the time domain. For example, synchronization data produced for a MIDI file can easily be edited using the synchronization information of an MP3 file of the same piece: tick position values are compared with and translated into bitstream position values, so that the caption synchronization information of the MIDI file is modified based on the caption synchronization information stored for the MP3 file.

[150] Industrial Applicability

[151] As described above, the present invention solves the conventional problems. A caption can easily be synchronized and regenerated on a portable device or computer using only simple position information, whether the file is an audio file in a bitstream format, an interface file carrying musical performance information, or a multimedia file format carrying audio and video in a bitstream format, and whether or not the file is compressed. Existing caption synchronization information can also easily be converted to other formats, and a caption can be synchronized accurately and rapidly for VBR-encoded files during fast feed such as FF/FW and during playback from any position of compressed audio and video. When a speed change function is used during playback, synchronization can be acquired accurately at fast and slow reproduction speeds, as well as at the normal speed, using only the position information of the currently reproduced data. Because the file data is read faster during fast reproduction and slower during slow reproduction, accurate synchronization is obtained simply by comparing the amount of data currently read with the position information, irrespective of the output speed.

Claims

[1] A method for generating and regenerating synchronization for a caption, a still picture and a motion picture using position information in synchronizing an interface file format (MIDI) that is musical performance information, the method comprising: a first step of assigning tick position information using a tick counter value in an interface file format; a second step of adjusting the size of information using the allocated tick position information; and a third step of separately storing synchronization information related to the position information and the caption and composing a new synchronized bitstream. [2] A method for generating and regenerating synchronization for a caption, a still picture and a motion picture using position information in synchronizing a file format that is reproduced in a bitstream, the method comprising: a first step of assigning bitstream position information using a bitstream byte counter value; a second step of adjusting the size of information using the allocated bitstream position information; and a third step of separately storing synchronization information related to the position information and the caption and composing a new synchronized bitstream. [3] The method according to claim 1 or 2, wherein the position information in the first step is start position information. [4] The method according to claim 1 or 2, wherein the position information in the first step includes start position information and end position information. [5] The method according to claim 1 or 2, wherein a multiple of position information in the second step is used to adjust the information size. [6] The method according to claim 1 or 2, wherein difference information between start position information and end position information in the second step is used to adjust the information size. [7] The method according to claim 1 or 2, wherein the third step comprises mixing the synchronization information and the caption. 
[8] The method according to claim 1 or 2, wherein the third step incorporates the synchronization information and the caption into a MIDI file. [9] The method according to claim 1 or 2, wherein the third step comprises: storing the position information for a file bitstream in a file reproduction portion from a file start portion; performing simple addition and subtraction operations on an absolute position of the file bitstream in the file reproduction portion to acquire synchronization upon fast feed; and representing an absolute position corresponding to a speed change as a caption depending on a variable speed upon audio speed change. [10] A synchronization generator for a caption, a still picture, and a motion picture using position information, the synchronization generator comprising: a synchronization input unit for reading an audio file format, an audio musical performance information format, and a multimedia file format having audio and video information to form a bitstream; a buffer and buffer information of a file system for storing a bitstream read from the respective file formats; a data player for playing back data stored in the buffer; a counter for measuring a number of ticks or bytes; and a position information assigning unit for assigning position information using a value measured by the counter. 
[11] A synchronization regenerator for a caption, a still picture and a motion picture using position information, the regenerator comprising: a synchronization regenerator input unit for inputting an interface file or a bitstream; a file system buffer for storing the interface file or bitstream input by the synchronization regenerator input unit; a counter for measuring a number of ticks or bytes; a regenerator buffer for aligning transferred data using buffer information of a file system; and a synchronization position information comparator for comparing a value of the counter to synchronization position information for the caption and still/motion picture data.