WO2006111041A1 - Subtitle editing method and the device thereof - Google Patents

Subtitle editing method and the device thereof

Info

Publication number
WO2006111041A1
Authority
WO
WIPO (PCT)
Prior art keywords
time
character
sound file
graphic
display
Prior art date
Application number
PCT/CN2005/000535
Other languages
French (fr)
Chinese (zh)
Inventor
Rong Yi
Original Assignee
Rong Yi
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rong Yi
Priority to PCT/CN2005/000535
Publication of WO2006111041A1

Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G11B27/34 Indicating arrangements
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/06 Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02 Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031 Electronic editing of digitised analogue information signals, e.g. audio or video signals

Definitions

  • The invention relates to a method and a device for editing and producing subtitles.
  • This kind of editing system lets the producer listen to the music while watching where the "position pointer" (the red line in Figure 1) sits on the waveform, and so determine the time range of each word.
  • Compared with earlier editing, which divided time by ear alone, this kind of system is a considerable advance: it gives the operator some auxiliary information that reduces the difficulty of determining times and improves calibration accuracy.
  • With such a system, single characters and phrases (Chinese-type text) or words (Western-type text) can already be time-calibrated.
  • As Figure 1 shows, however, the waveform carries very little information, and the operator cannot read anything intuitive from it. Time setting still relies mainly on listening to the music, which demands intense concentration, is laborious and tiring, and is inefficient. It is very difficult for an operator to pick out words in music that mixes instruments and vocals; and for alphabetic scripts such as English and French, it is also very difficult to refine the time division down to a single letter or phoneme.
  • The object of the present invention is to provide a subtitle editing and production method and device that greatly reduce the difficulty of time-calibrating character regions and easily achieve precision down to a single character (for Chinese-type characters) or a syllable (letter or phoneme) (for Western-type characters).
  • A subtitle editing and production method comprising the following steps:
  • Step 3-1) includes the following steps:
  • In step 3-1-2), the logarithm of the feature values is taken and the result is normalized to a set maximum value; the graphic is then output in gradient or height form.
  • When the time-calibrated character regions are stored in step 4), a data structure is used that includes at least the start and end positions, within the character set, of the string corresponding to the character region, and the playback start and stop times corresponding to that string.
  • When the start and stop times of the character regions are calibrated, the start time of the next character region is determined from the stop time of the previous character region.
  • When the start and stop times of the character regions are calibrated, the difference between the start and stop times of another, already calibrated character region (that is, its playing time) is copied as the playing time of the character region to be calibrated.
  • In step 3), the graphic is displayed in segments or scrolled continuously, in time-axis order, in a display window that has an operation interface, and the time axis shown in the display window has a configurable time span. When the graphic is displayed in segments, an indicator moving along the time axis marks the corresponding position in the sound file currently being played in synchrony; when the graphic scrolls continuously, an indicator at a fixed position marks that position.
  • The moving speed of the indicator or the continuous scrolling speed of the graphic, together with the playback speed of the synchronously played sound file, is lower than the original playback speed of the sound file.
  • A subtitle editing and production device, including: a data storage device for storing the sound file and the character set used as material;
  • a data processing device configured to convert the sound file into feature values with time and frequency as two-dimensional variables, and to divide the character set into a plurality of load segments, each load segment including one or more character regions;
  • a graphic display device configured to display and output the converted sound file in graphic form;
  • an instruction receiving device configured to receive an editing instruction issued by the user and convert it into an instruction signal that the instruction executing device can recognize;
  • an instruction executing device configured, according to the instruction signal, to change the range of characters included in a character region and to calibrate the start and stop times of each character region.
  • The beneficial technical effects of the present invention are as follows: 1) The sound file data is converted into feature values with time and frequency as two-dimensional variables and displayed in graphic form, which greatly enriches the visual information available to the producer. In many cases the playback start and stop positions of a character region can be read directly from the graphic, which greatly reduces the difficulty and intensity of the producer's work, improves the accuracy of time calibration, and makes subtitle editing an easy and pleasant task.
  • FIG. 1 is a screenshot of the waveform output window in the operation interface of an existing subtitle editing system.
  • FIG. 2 is a block diagram of a circuit configuration of the subtitle editing and production device provided by the present invention.
  • FIG. 3 is a flowchart of the subtitle editing and production method provided by the present invention.
  • FIG. 4 is a two-dimensional grayscale spectrogram of one line of lyrics.
  • FIG. 5 is a screenshot of a display window in which the spectrogram is displayed in a green gradient.
  • FIG. 6 is a screenshot of an operation interface in which a character region is being time-calibrated.
  • FIG. 7 is a spectrogram displayed in three-dimensional form.
  • Embodiment 1: A subtitle editing and production device which, with reference to the circuit configuration block diagram of FIG. 2, includes: a data storage device 1 for storing the sound and the character set used as material; a data processing device 2 for converting the sound file into feature values with time and frequency as two-dimensional variables, and for dividing the character set into a plurality of load segments, each load segment including one or more character regions; a graphic display device 3 for displaying and outputting the converted sound file in graphic form; an instruction receiving device 4 for receiving an editing instruction issued by the user and converting it into an instruction signal that the instruction executing device can recognize; and an instruction executing device 5 for changing, according to the instruction signal, the range of characters included in a character region and for calibrating the start and stop times of each character region.
  • The data processing device 2 and the instruction executing device 5 may be realized by the microprocessor of a computer reading and executing a processing program stored on a temporary or fixed storage device.
  • The graphic display device 3 is a device capable of providing a display output window for the processing result, such as a monitor or a projector.
  • The instruction receiving device 4 can generally be any device capable of sending recognizable instructions to the microprocessor, for example a keyboard, mouse, or trackball.
  • Embodiment 2: A subtitle editing and production method which, with reference to the flowchart of FIG. 3, includes the following steps:
  • Each load segment includes one or more character regions (Region).
  • A load segment of the character set usually corresponds to one line displayed in the editing interface and generally follows the natural sentence breaks of the text (for example, the Enter character in text editing).
  • Regions are divided according to specific rules. For example, the space character can be used as a separator between Regions (generally applied to Western-type languages, giving one Region per word), while for Chinese-type languages division by single character is usual, that is, each character is one Region.
  • Each Region can contain one or more characters, and the user can expand or reduce the range of characters contained in a Region by a specific operation (for example, by entering merge or split instructions through an input device).
  • 3) The start and stop times of the character regions are calibrated according to the sound file. This is the core of the entire subtitle production process and the part that takes the most time and effort. The following process provides a way to complete this step easily, intuitively, and with high accuracy.
  • The feature values are normalized with 255 as the maximum value, and a two-dimensional plane is established with time and frequency as the coordinate axes: the horizontal axis represents time, the vertical axis represents frequency, and each point on the plane corresponds to a feature value.
  • The feature values are displayed in gradient form, that is, the feature value at each point on the plane is converted into an RGB color value; for example, the 256-level gradient is converted directly into the green component value, giving a color or monochrome two-dimensional image.
  • Figure 4 shows the 256-level grayscale spectrogram obtained by converting the lyric "Happy birthday to you" from the song "Happy Birthday".
  • The resulting sound graphic is stored and, as directed by the editing instruction, displayed in segments or scrolled continuously, in time-axis order, in a display window that has an operation interface; the time axis shown in the display window has a configurable time span.
  • Figure 5 shows a screenshot of a display window with a time span of 8000 ms, in which the spectrogram is displayed in a 256-level green gradient.
  • The spectrogram is displayed in segments, and an indicator moving along the time axis (the white vertical line in FIG. 5) marks the time position in the sound file currently being played in synchrony.
  • The moving speed of the indicator and the playback speed of the synchronously played sound file are 0.5 times the original playback speed of the sound file.
  • 3-2) The start and stop times of the character regions are calibrated according to the displayed graphic and the synchronously played sound file.
  • Figure 6 is a screenshot of the operation interface while a character region is being time-calibrated.
  • The program increases the time at the left end of the window by 4 s (that is, it moves the spectrogram shown in the right half to the left half), reads in the spectrogram for the next 4000 ms, and returns the indicator to the middle of the window, where it continues moving. This repeats until playback ends.
  • The operator listens to the music played at slow speed while watching the changing position of the indicator (the white vertical line in Fig. 6). Once the start or stop time of the character region currently being calibrated (the character underlined in red in the figure) has been identified, the operator pauses playback, which stops the indicator, and then uses an input device (such as a mouse or keyboard) in the display window to calibrate the start and stop times of the current character region.
  • The area marked by the yellow line in Figure 6 is the playback interval of the corresponding character.
  • The white line indicates the current playback time point.
  • The red vertical line is the time tag currently being edited.
  • The current playback time point can be set as the start time (or stop time) of the character region currently being calibrated by entering a confirmation signal.
  • The operator can also adjust the range of characters covered by a character region, for example expanding it to a phrase or narrowing it to a single phoneme or letter.
  • The start and stop time tags can still be changed afterwards.
  • The start time of the next character region can be determined from the stop time of the previous one. For example, the program can be set so that, if no start time has been set, the stop time of the previous character region is used as the start time of the next, which saves the operator editing steps.
  • Copying the playing time of an already calibrated character region as the playing time of a character region to be calibrated that has the same melody can greatly reduce the operator's editing time and improve editing efficiency.
  • Embodiment 3: Another subtitle editing and production method, basically the same as that of Embodiment 2 except that in step 3-1-2) the feature values are displayed and output in height form.
  • Figure 7 shows the three-dimensional graphic obtained by converting the lyric "Happy birthday to you" from the song "Happy Birthday" into 256 height levels.
  • Figure 7 shows the height of each point in a different color. A three-dimensional display can be seen to provide more stereoscopic visual information and to express the information more richly and completely.
  • The subtitle editing method and device provided by the invention can be used not only for song subtitles but also for pure speech subtitles, such as movie and TV subtitles, and as an aid to learning foreign languages. Because the method greatly reduces the difficulty of subtitle production and improves the accuracy of time-tag editing, it can make "subtitle DIY" a new form of entertainment for ordinary non-professional users.

Abstract

A subtitle editing method is disclosed. The method comprises the steps of: storing the audio file and the character set in memory; dividing the character set into several load segments, each of which includes one or several character regions; converting the audio file into characteristic values with time and frequency as two-dimensional variables, then outputting and displaying the characteristic values in graphic form; demarcating the start time and end time of each character region according to the displayed graphic; and storing the demarcated character regions. The beneficial technical effect of the invention is that, because the audio file data is converted into characteristic values with time and frequency as two-dimensional variables and then displayed in graphic form, the visual information available to the editor is greatly enriched, so that the start and end positions of the character regions can often be observed directly from the graphic. The invention thus reduces the intensity and complexity of the work and improves the precision of the time demarcation, making subtitle editing an easy and pleasant task.

Description

Subtitle editing and production method and device
[Technical Field]
The invention relates to a subtitle editing and production method and device.
[Background Art]
The prevalence of pop music and the spread of multimedia playback software and equipment have made karaoke an easy choice for leisure and entertainment. When enjoying, learning, or singing along with a song, subtitles that appear in step with the rhythm of the music undoubtedly make the experience easier and more complete, and music files with sing-along subtitles are often more popular than plain music files.
In early karaoke subtitle production, special hardware was needed, and subtitles were synchronized with songs through tedious operations. After computer platforms became widespread, a variety of lyric subtitle editing systems appeared, making it possible to produce Karaoke special-effect subtitles, with lyrics separated from the music, and line-by-line subtitles. The usual way of operating such a system is to load a music file, enter the lyric text without time tags, and then, while the music file plays, determine the start and end time tags of each sentence or each word through various operations. Because existing lyric subtitle editing systems are inefficient to use, making the subtitles for one song generally requires playing the song file over many times, which ordinary users find hard to do.
Throughout subtitle production, determining the time tags of the lyrics is the core task and the one that takes the most time and effort. Earlier editing systems did not convert the audio signal into a visible graphic; the producer listened to the music and, on hearing the relevant melody, indicated the time by tapping the keyboard, clicking the mouse, or the like. This way of editing is highly subjective and depends heavily on the producer's musical training and even reaction speed, so the error in the final time calibration is very large, and usually only whole lines of lyrics can be given start and stop times. More advanced editing systems now convert the audio signal into a waveform. Figure 1 is a screenshot of the waveform output window in the operation interface of such a system. The system lets the producer listen to the music while watching where the "position pointer" (the red line in Figure 1) sits on the waveform, and so determine the time range of each word. Compared with earlier editing, which divided time by ear alone, this is a considerable advance: it gives the operator some auxiliary information that reduces the difficulty of determining times and improves calibration accuracy. With such a system, single characters and phrases (Chinese-type text) or words (Western-type text) can already be time-calibrated.
As Figure 1 shows, however, the waveform carries very little information, and the operator cannot read anything intuitive from it. Time setting still relies mainly on listening to the music, which demands intense concentration, is laborious and tiring, and is inefficient. It is very difficult for an operator to pick out words in music that mixes instruments and vocals; and for alphabetic scripts such as English and French, it is also very difficult to refine the time division down to a single letter or phoneme.
[Summary of the Invention]
The object of the present invention is to provide a subtitle editing and production method and device that greatly reduce the difficulty of time-calibrating character regions and easily achieve precision down to a single character (for Chinese-type characters) or a syllable (letter or phoneme) (for Western-type characters).
The technical solution that achieves this object is a subtitle editing and production method comprising the following steps:
1) storing the sound file and the character set in memory;
2) dividing the character set into a plurality of load segments, each load segment including one or more character regions;
3) calibrating the start and stop times of the character regions according to the sound file; this step is carried out by a process comprising the following steps:
3-1) converting the sound file into feature values with time and frequency as two-dimensional variables, and displaying them in graphic form;
3-2) calibrating the start and stop times of the character regions according to the displayed graphic and the synchronously played sound file;
4) storing the time-calibrated character regions.
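Step 2) of the method can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the division rules given in Embodiment 2 (load segments follow line breaks, Western text is split at spaces, and each Chinese character forms its own region). The names `Region`, `is_cjk`, `split_regions`, `merge_with_next`, and `split_character_set` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    start: int  # index of the first character within the load segment
    end: int    # index one past the last character


def is_cjk(ch: str) -> bool:
    # Assumption: the basic CJK Unified Ideographs block is enough for this sketch.
    return "\u4e00" <= ch <= "\u9fff"


def split_regions(segment: str) -> List[Region]:
    """One region per CJK character, one per space-separated word otherwise."""
    regions: List[Region] = []
    word_start = None
    for i, ch in enumerate(segment):
        if ch.isspace() or is_cjk(ch):
            if word_start is not None:      # close the word in progress
                regions.append(Region(word_start, i))
                word_start = None
            if is_cjk(ch):                  # a CJK character is its own region
                regions.append(Region(i, i + 1))
        elif word_start is None:
            word_start = i
    if word_start is not None:
        regions.append(Region(word_start, len(segment)))
    return regions


def merge_with_next(regions: List[Region], index: int) -> List[Region]:
    """Merge regions[index] with its right neighbour (a 'merge' editing instruction)."""
    merged = Region(regions[index].start, regions[index + 1].end)
    return regions[:index] + [merged] + regions[index + 2:]


def split_character_set(text: str) -> List[List[Region]]:
    # Load segments follow the natural line breaks of the text.
    return [split_regions(line) for line in text.splitlines() if line.strip()]
```

Storing index pairs rather than substrings keeps merging and splitting cheap and matches the idea of a region as a range of characters that the user can widen or narrow.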
Preferably, step 3-1) comprises the following steps:
3-1-1) dividing the sound file into multiple frames in time order and calculating the spectrum of each frame, giving feature values with time and frequency as two-dimensional variables;
3-1-2) establishing a two-dimensional plane with time and frequency as the coordinate axes and displaying the corresponding feature values as a planar graphic in gradient form; or displaying the corresponding feature values as a three-dimensional graphic in height form.
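Step 3-1-1) can be illustrated with a short sketch that frames the samples and computes a magnitude spectrum per frame. The text does not specify a frame length, overlap, window function, or transform, so the frame size, hop, Hann window, and naive DFT below are all assumptions chosen for readability; a practical implementation would normally use an FFT. The function names are hypothetical.

```python
import cmath
import math
from typing import List


def frame_spectrum(frame: List[float]) -> List[float]:
    """Magnitude spectrum of one frame via a naive DFT (bins 0 .. N/2)."""
    n = len(frame)
    # Hann window to reduce spectral leakage (an assumption; no window is named in the text).
    windowed = [
        x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
        for i, x in enumerate(frame)
    ]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(windowed)
        )
        spectrum.append(abs(acc))
    return spectrum


def spectrogram(samples: List[float], frame_size: int = 64, hop: int = 32) -> List[List[float]]:
    """Feature values indexed as result[frame_index][frequency_bin]."""
    return [
        frame_spectrum(samples[start:start + frame_size])
        for start in range(0, len(samples) - frame_size + 1, hop)
    ]
```

The result is a two-dimensional array whose first index is time (frame number) and whose second index is frequency (bin number), which is the form step 3-1-2) needs for plotting.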
Preferably, in step 3-1-2), the logarithm of the feature values is taken and the result is normalized to a set maximum value; the graphic is then output in gradient or height form.
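A minimal sketch of this normalization follows. It normalizes each frame separately, which is one reading of the stated benefit that every time point then has a peak of the same gradient or height; the use of `log1p` to avoid taking the logarithm of zero is an assumption. The maximum of 255 and the mapping to the green component follow the example given in Embodiment 2. The function names are hypothetical.

```python
import math
from typing import List, Tuple


def normalize_frame(frame: List[float], max_level: int = 255) -> List[int]:
    """Take the log of each feature value, then scale so the frame's peak equals max_level."""
    logs = [math.log1p(v) for v in frame]  # log1p(v) = log(1 + v); avoids log(0)
    peak = max(logs)
    if peak == 0:
        return [0] * len(frame)            # a silent frame stays at level 0
    return [round(max_level * v / peak) for v in logs]


def to_green(level: int) -> Tuple[int, int, int]:
    """Map a 0..255 level to an RGB pixel that uses only the green component."""
    return (0, level, 0)
```

For the height form, the same normalized level would be used as the z coordinate of the point instead of a color.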
Preferably, when the time-calibrated character regions are stored in step 4), a data structure is used that includes at least the start and end positions, within the character set, of the string corresponding to the character region, and the playback start and stop times corresponding to that string.
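The record described here has four required fields. The sketch below shows one possible layout; the class name `TimedRegion`, the field names, the use of milliseconds, and the exclusive end index are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TimedRegion:
    char_start: int                 # start position of the string in the character set
    char_end: int                   # end position (exclusive here; an assumption)
    start_ms: Optional[int] = None  # playback start time; None until calibrated
    stop_ms: Optional[int] = None   # playback stop time; None until calibrated

    def text(self, character_set: str) -> str:
        """Recover the region's string from the character set."""
        return character_set[self.char_start:self.char_end]

    def is_calibrated(self) -> bool:
        return self.start_ms is not None and self.stop_ms is not None
```

Keeping positions instead of a copy of the string means a region can be widened or narrowed later by changing two integers, without touching the character set itself.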
Preferably, when the start and stop times of the character regions are calibrated, the start time of the next character region is determined from the stop time of the previous character region.
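This rule can be sketched as a single pass over the regions in time order. The representation of a region as a `(start_ms, stop_ms)` pair with `None` for an unset time, and the function name `fill_start_times`, are assumptions made for the example.

```python
from typing import List, Optional, Tuple

# Each region is a (start_ms, stop_ms) pair; None marks a time not yet calibrated.
Times = Tuple[Optional[int], Optional[int]]


def fill_start_times(regions: List[Times]) -> List[Times]:
    """Where a start time is unset, reuse the previous region's stop time."""
    filled: List[Times] = []
    previous_stop: Optional[int] = None
    for start, stop in regions:
        if start is None and previous_stop is not None:
            start = previous_stop
        filled.append((start, stop))
        previous_stop = stop
    return filled
```

A start time that the operator has set explicitly is left alone, so gaps such as instrumental breaks can still be marked by hand.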
Preferably, when the start and stop times of the character regions are calibrated, the difference between the start and stop times of another, already calibrated character region (that is, its playing time) is copied as the playing time of the character region to be calibrated.
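As a sketch of this copying, the first function below transfers the playing time of one calibrated region to a new start time. The second applies the same idea to a whole verse by shifting every region by a constant offset; preserving the gaps between regions in this way is an assumption that goes slightly beyond the text, which speaks only of copying playing times. Both function names are hypothetical, and times are in milliseconds.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start_ms, stop_ms)


def copy_playing_time(calibrated: Span, new_start_ms: int) -> Span:
    """Give a new region the same playing time (stop - start) as a calibrated one."""
    start, stop = calibrated
    return (new_start_ms, new_start_ms + (stop - start))


def copy_verse(calibrated_verse: List[Span], new_verse_start_ms: int) -> List[Span]:
    """Shift a calibrated verse so that its first region begins at new_verse_start_ms."""
    shift = new_verse_start_ms - calibrated_verse[0][0]
    return [(start + shift, stop + shift) for start, stop in calibrated_verse]
```

With a helper like `copy_verse`, a repeated verse needs only one calibrated time, the start of its first region, which is how a song with several verses on the same melody could be finished without playing it through.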
Preferably, in step 3), the graphic is displayed in segments or scrolled continuously, in time-axis order, in a display window that has an operation interface, and the time axis shown in the display window has a configurable time span. When the graphic is displayed in segments, an indicator moving along the time axis marks the corresponding position in the sound file currently being played in synchrony; when the graphic scrolls continuously, an indicator at a fixed position marks that position.
More preferably, the moving speed of the indicator or the continuous scrolling speed of the graphic, together with the playback speed of the synchronously played sound file, is lower than the original playback speed of the sound file.
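The segmented display mode with slowed playback can be sketched as two small functions. The 8000 ms span, the 0.5x speed, and the half-window shift that returns the indicator to the middle are taken from the example in Embodiment 2; the assumption here is that the shift is triggered when the indicator reaches the right edge of the window. The function names are hypothetical.

```python
from typing import Tuple


def audio_position_ms(elapsed_ms: float, speed: float = 0.5) -> float:
    """Position in the sound file after elapsed_ms of wall-clock playback at the given speed."""
    return elapsed_ms * speed


def window_state(position_ms: int, span_ms: int = 8000) -> Tuple[int, float]:
    """Return (time at the left edge of the window, indicator position as a 0..1 fraction).

    The window advances by half its span each time the indicator reaches the
    right edge, so the indicator resumes from the middle of the window.
    """
    half = span_ms // 2
    left = max(0, (position_ms // half - 1) * half)
    return left, (position_ms - left) / span_ms
```

In the continuous scrolling mode the roles are reversed: the indicator fraction is fixed and only the left-edge time changes with the playback position.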
For the purpose of the present invention, a subtitle editing and production device is also proposed, including:
a data storage device for storing the sound file and the character set used as material;
a data processing device for converting the sound file into feature values with time and frequency as two-dimensional variables, and for dividing the character set into a plurality of load segments, each load segment including one or more character regions;
a graphic display device for displaying and outputting the converted sound file in graphic form;
an instruction receiving device for receiving an editing instruction issued by the user and converting it into an instruction signal that the instruction executing device can recognize; and
an instruction executing device for changing, according to the instruction signal, the range of characters included in a character region and for calibrating the start and stop times of each character region.
With the above technical solution, and in combination with the embodiments detailed below, the beneficial technical effects of the present invention are as follows:
1) The sound file data is converted into feature values with time and frequency as two-dimensional variables and displayed in graphic form, which greatly enriches the visual information available to the producer. In many cases the playback start and stop positions of a character region can be read directly from the graphic, which greatly reduces the difficulty and intensity of the producer's work, improves the accuracy of time calibration, and makes subtitle editing an easy and pleasant task.
2) Displaying the two-dimensional sound graphic in gradient form gives a clean interface that better suits editors' habits, and the display computation is relatively simple; a three-dimensional display provides more stereoscopic visual information and expresses the information more richly and completely.
3) Normalizing the feature values gives the output graphic a peak of the same gradient or height at every time point, so the operator can more easily pick out the meaningful changes.
4) Most of the time (for example, in parts of a song without an interlude), character regions follow one another continuously on the time axis, so determining the start time of the next character region from the stop time of the previous one saves the operator editing steps.
5) Songs often set different lyric text to the same melody, with each character unit sung for the same length of time (for example, in songs with several verses, each verse repeats the same melody at the same rhythm). Copying the playing time of an already calibrated character region as the playing time of a character region to be calibrated that has the same melody can greatly reduce the operator's editing time and improve editing efficiency. In some cases all the lyrics can be time-calibrated without even playing the whole song, an advantage that no current lyric editing system offers.
6) Adjusting the span of the display window's time axis changes the length of sound file shown on one screen, which makes the editing system easier to operate.
7) Playing the sound and the corresponding graphic at low speed gives the operator ample time to recognize and edit, which improves editing precision and reduces how often playback has to be repeated.
The present invention is described in further detail below through embodiments and with reference to the accompanying drawings.
[Description of the Drawings]
FIG. 1 is a screenshot of the waveform output window in the operation interface of an existing subtitle editing system.
FIG. 2 is a block diagram of a circuit configuration of the subtitle editing and production device provided by the present invention.
FIG. 3 is a flowchart of the subtitle editing and production method provided by the present invention.
FIG. 4 is a two-dimensional grayscale spectrogram of one line of lyrics.
FIG. 5 is a screenshot of a display window in which the spectrogram is displayed in a green gradient.
FIG. 6 is a screenshot of an operation interface in which a character region is being time-calibrated.
FIG. 7 is a spectrogram displayed in three-dimensional form.
【具体实施方式】  【detailed description】
实施例一、 一种字幕编辑制作装置, 结合图 2所表示的其电路配置方框图, 包括: 数据 存储装置 1, 用于存储作为素材的声音与字符集; 数据处理装置 2, 用于将声音文件转换成以 时间、 频率为二维变量的特征值, 以及, 将字符集划分为多个载入段, 各个载入段中包括一 个或多个字符区域; 图形显示装置 3 , 用于将转换后的声音文件以图形的形式进行显示输出; 指令接收装置 4, 用于接收使用者发出的编辑指令, 并转换为可供指令执行装置识别的指令 信号; 以及指令执行装置 5, 用于根据指令信号, 更改字符区域所包含的字符范围, 以及对 各个字符区域进行起止时间的标定。 Embodiment 1 A subtitle editing and manufacturing apparatus, which is combined with the circuit configuration block diagram shown in FIG. 2, includes: a storage device 1 for storing a sound and a character set as a material; a data processing device 2 for converting the sound file into feature values whose time and frequency are two-dimensional variables, and dividing the character set into a plurality of loads a segment, each of the loading segments includes one or more character regions; a graphic display device 3, configured to display and output the converted sound file in a graphical form; and an instruction receiving device 4, configured to receive an editing command issued by the user And converted into a command signal for the instruction execution device to recognize; and an instruction execution device 5 for changing the range of characters included in the character region according to the command signal, and performing calibration of the start and end time of each character region.
In this embodiment, the data processing device 2 and the instruction execution device 5 may be implemented by a computer's microprocessor reading and executing a processing program stored on a temporary or permanent storage device. On this basis, the graphic display device 3 is any device that can provide a window for displaying the processing results, such as a monitor or a projector, and the instruction receiving device 4 can generally be any device capable of sending recognizable instructions to the microprocessor, such as a keyboard, a mouse, or a trackball.
Embodiment 2. A subtitle editing method, described with reference to the flowchart of Fig. 3, comprises the following steps:
1) Store the sound file and the character set in memory. These source files may be loaded into the current memory device from an external storage device, downloaded over a communication network, or recorded directly through a voice input device. Where necessary, the source files should also be converted into a format that the program can edit. For example, sound files downloaded from a network are usually in a highly compressed format and must be decompressed with a decoder; in general, the sound file needs to be converted into a PCM (Pulse Code Modulation) audio data stream for the subsequent data conversion. Character sets are usually highly compatible, and common text files (for example txt, rtf, or Word files) can generally be used.
2) Divide the character set into a plurality of load segments, each containing one or more character regions (Regions). A load segment usually corresponds to one line shown in the editing interface and is generally formed at the natural line breaks of the text (for example, at the Enter character in text editing). Character regions are divided according to a specific rule. For example, the space character can serve as the delimiter between Regions; this is generally applied to Western languages and yields Regions of one word each. For Chinese-type languages, division by single character is usually used, so that each character forms one Region. Each Region may contain one or more characters, and the user can enlarge or reduce the range of characters a Region contains by a specific operation (for example, by entering a merge or split instruction through an input device).
3) Calibrate the start and stop times of the character regions according to the sound file. This is the core of the whole subtitle production process and also the part that takes the most time and effort. The procedure below provides a way to complete this step easily, intuitively, and with high accuracy.
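The segmentation rule of step 2) can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the names `Region`, `split_regions`, `merge_regions`, and `split_region` are hypothetical, and the choice of the CJK Unified Ideographs range to detect Chinese characters is an assumption.

```python
import re
from dataclasses import dataclass
from typing import List

# Assumption: "Chinese-type" characters are detected via the basic CJK block.
CJK = re.compile(r"[\u4e00-\u9fff]")


@dataclass
class Region:
    start: int  # index of the first character within the whole character set
    end: int    # index one past the last character


def split_regions(text: str) -> List[List[Region]]:
    """Return one list of Regions per load segment (one segment per line)."""
    segments: List[List[Region]] = []
    offset = 0
    for line in text.split("\n"):
        regions: List[Region] = []
        i = 0
        while i < len(line):
            ch = line[i]
            if ch.isspace():
                i += 1  # spaces only delimit Regions
            elif CJK.match(ch):
                regions.append(Region(offset + i, offset + i + 1))  # one character
                i += 1
            else:
                j = i
                while j < len(line) and not line[j].isspace() and not CJK.match(line[j]):
                    j += 1
                regions.append(Region(offset + i, offset + j))  # one word
                i = j
        segments.append(regions)
        offset += len(line) + 1  # +1 accounts for the line break
    return segments


def merge_regions(regions: List[Region], k: int) -> List[Region]:
    """Enlarge Region k so that it also covers Region k + 1."""
    merged = Region(regions[k].start, regions[k + 1].end)
    return regions[:k] + [merged] + regions[k + 2:]


def split_region(regions: List[Region], k: int, at: int) -> List[Region]:
    """Cut Region k in two at offset `at`, counted from the start of that Region."""
    r = regions[k]
    cut = r.start + at
    if not r.start < cut < r.end:
        raise ValueError("cut point must fall strictly inside the Region")
    return regions[:k] + [Region(r.start, cut), Region(cut, r.end)] + regions[k + 1:]
```

With this sketch, splitting the first Region of "Happy birthday to you" at offset 2 would yield the syllable Regions "Ha" and "ppy" used in the Fig. 6 example.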
3-1) Convert the sound file into feature values that are a function of two variables, time and frequency, and display them in graphic form. Chinese patent No. ZL00802335.2 discloses a method of rendering sound as a "sound spectrogram", in which two such spectrograms are compared statically and the speaker is identified from their degree of match. In the present patent, the sound graphic (hereinafter the "spectrogram") is used for dynamic output and observation, but the graphic can be generated essentially by the method of that patent. Specifically:
3-1-1) First obtain the PCM audio data stream of the sound file (if necessary, by decompressing it with third-party software or a similar method), at a sampling rate of 44100 Hz. Then divide the audio data into frames at a set unit time interval, usually 512 samples, which gives an interval of samples / sampling rate = 512 / 44100 ≈ 11.61 ms; each frame contains N = 512 samples. Multiply each frame by a Hamming (or Hanning) window function and apply a fast Fourier transform. The result is the raw spectrum of each frame, that is, feature values that are a function of time and frequency.
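The framing, windowing, and transform of step 3-1-1) might look like the sketch below. It uses only the Python standard library so that it stays self-contained; the function names are hypothetical, the standard Hamming coefficients 0.54 and 0.46 are assumed, and a production tool would more likely use an optimized FFT library.

```python
import cmath
import math
from typing import List, Sequence

FRAME_SIZE = 512       # samples per frame (N in the text)
SAMPLE_RATE = 44100    # Hz
FRAME_MS = FRAME_SIZE / SAMPLE_RATE * 1000  # about 11.61 ms per frame


def hamming(n: int) -> List[float]:
    """Hamming window with the conventional 0.54 / 0.46 coefficients."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def fft(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def spectrogram(samples: Sequence[float]) -> List[List[float]]:
    """Magnitude spectrum of each non-overlapping frame.

    Rows are frames (time); columns are frequency bins 0 .. FRAME_SIZE/2 - 1.
    A trailing partial frame is dropped.
    """
    window = hamming(FRAME_SIZE)
    rows: List[List[float]] = []
    for start in range(0, len(samples) - FRAME_SIZE + 1, FRAME_SIZE):
        frame = [samples[start + i] * window[i] for i in range(FRAME_SIZE)]
        spectrum = fft(frame)
        rows.append([abs(c) for c in spectrum[: FRAME_SIZE // 2]])
    return rows
```

Each row of the result corresponds to one 512-sample frame, so row index times `FRAME_MS` gives the frame's position on the time axis, and bin `k` corresponds to the frequency `k * SAMPLE_RATE / FRAME_SIZE`.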
3-1-2) Take the logarithm of the feature values and normalize them to a maximum of 255. Set up a two-dimensional plane with time and frequency as its axes, the horizontal axis representing time and the vertical axis frequency, and display the feature value at each point of the plane as a gradient. That is, convert each feature value into an RGB color value, for example by mapping the 256 gradient levels directly to the green component, to produce a color or monochrome two-dimensional image. Fig. 4 shows the spectrogram, in 256 gray levels, obtained by converting the lyric "Happy birthday to you" from the song "Happy Birthday". The resulting sound graphic is stored and, as directed by editing instructions, is displayed in segments or scrolled continuously, in time-axis order, in a display window that has an operation interface. The time axis shown in the display window has a settable time span. Fig. 5 is a screenshot of a display window whose time axis spans 8000 ms, in which the spectrogram is rendered in a 256-level green gradient. In this embodiment the spectrogram is displayed in segments, and an indicator that moves along the time axis (the white vertical line in Fig. 5) marks the time position corresponding to the sound file being played in synchronization. The indicator's speed along the time axis and the playback speed of the synchronized sound file are both 0.5 times the original playback speed of the sound file.
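The logarithm, normalization, and green-gradient mapping of step 3-1-2) can be sketched as below. The text only states that the values are log-scaled and normalized to a maximum of 255; the min-max scaling, the base-10 logarithm, and the small `floor` that guards against log(0) are assumptions made for this illustration, and the function names are hypothetical.

```python
import math
from typing import List, Tuple


def to_levels(spec: List[List[float]], max_level: int = 255,
              floor: float = 1e-10) -> List[List[int]]:
    """Log-scale the feature values and normalize them to 0 .. max_level."""
    logs = [[math.log10(max(v, floor)) for v in row] for row in spec]
    lo = min(min(row) for row in logs)
    hi = max(max(row) for row in logs)
    span = (hi - lo) or 1.0  # avoid dividing by zero for a constant image
    return [[round((v - lo) / span * max_level) for v in row] for row in logs]


def green(level: int) -> Tuple[int, int, int]:
    """Map one gradient level directly to the green component of an RGB pixel."""
    return (0, level, 0)


def gray(level: int) -> Tuple[int, int, int]:
    """Map one gradient level to a gray pixel, as in the Fig. 4 rendering."""
    return (level, level, level)
```

Applying `green` to every level gives the monochrome green rendering described for Fig. 5; swapping in `gray` gives the grayscale rendering of Fig. 4.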
3-2) Calibrate the start and stop times of the character regions according to the displayed graphic and the synchronously played sound file. Fig. 6 is a screenshot of the operation interface while character regions are being time-calibrated. During editing, a spectrogram segment spanning 4000 ms is first read and shown in the right half of the display window, and the indicator starts at the middle of the window and slides along the time axis at the set speed. When it reaches the right edge, the program advances the time at the left edge of the window by 4 s (that is, the spectrogram shown in the right half moves to the left half), reads in the next 4000 ms of spectrogram, and returns the indicator to the middle of the window, where it continues to move. This repeats until playback ends. The operator listens to the music played at slow speed while watching the position of the indicator (the white vertical line in Fig. 6). Once the start or stop time of the character region currently being calibrated (the characters underlined in red in the figure) has been identified, the operator pauses playback, which also stops the indicator, and then marks the start or stop position of the current character region in the display window with an input device (for example a mouse or keyboard). In Fig. 6, the area outlined in yellow is the playback interval of the corresponding characters, the white line marks the current playback time, and the red vertical line is the time label currently being edited. In general, the current playback time can be set as the start (or stop) time of the character region being calibrated by entering a confirmation signal. During editing the operator can also adjust the range of characters a character region covers, for example enlarging it to a phrase or reducing it to a single phoneme or letter. The start and stop time labels of an already calibrated character region can still be changed.
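The segment-by-segment display described in step 3-2) amounts to a small piece of arithmetic, sketched here under stated assumptions: the names are hypothetical, the window spans 8000 ms as in Fig. 5, and the left half of the window is assumed to be empty (a negative left-edge time) before the first page turn.

```python
from typing import Tuple

WINDOW_MS = 8000           # time span of the display window
PAGE_MS = WINDOW_MS // 2   # 4000 ms of spectrogram read in at each page turn
PLAYBACK_RATE = 0.5        # slow playback used in this embodiment


def window_state(audio_ms: float) -> Tuple[int, float]:
    """Return (time at the window's left edge in ms, indicator position 0..1).

    The indicator always travels from the middle of the window (0.5) toward
    the right edge (1.0); at each page turn the left edge advances by PAGE_MS.
    """
    page = int(audio_ms // PAGE_MS)
    left_ms = (page - 1) * PAGE_MS
    marker_x = (audio_ms - left_ms) / WINDOW_MS
    return left_ms, marker_x


def audio_position(wall_clock_ms: float) -> float:
    """Audio time reached after `wall_clock_ms` of slowed-down playback."""
    return wall_clock_ms * PLAYBACK_RATE
```

For example, `window_state(4000)` returns `(0, 0.5)`: the first 4000 ms have moved to the left half and the indicator is back at the middle of the window.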
Because character regions are contiguous on the time axis most of the time (for example, in the parts of a song without an instrumental interlude), the start time of a character region can be determined from the stop time of the preceding one. For example, the program can be set so that, unless specified otherwise, the stop time of the preceding character region is used as the start time of the next, which saves the operator editing steps. In addition, songs often repeat the same melody with different (or identical) lyrics, with each character unit sung for the same duration. In such cases, copying the playback durations of already calibrated character regions to the uncalibrated character regions that share the melody greatly reduces editing time and improves efficiency. For example, in the song "Happy Birthday" being edited in Fig. 6, the first two lines of "Happy birthday to you" have the same rhythm. After the time intervals of the character regions of the first line ("Ha", "ppy", "bir", "th", "day", "to", "you") have been calibrated, the operator only needs to mark the start time label of the first character region, "Ha", of the second line and then copy the playback durations of the first line's character regions to the corresponding regions of the second line. This completes the time calibration of every character region in the second line with little time or effort.
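The two shortcuts just described, chaining start times and copying playback durations between lines that share a melody, can be sketched as follows. The names `TimedRegion` and `copy_durations` are hypothetical, and the millisecond values in the usage note are made-up illustrative numbers, not measurements from the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TimedRegion:
    text: str
    start_ms: int
    end_ms: int


def copy_durations(source: List[TimedRegion], target_texts: List[str],
                   first_start_ms: int) -> List[TimedRegion]:
    """Calibrate a new line from an already calibrated line with the same rhythm.

    Only the start time of the first target region is supplied by the operator.
    Each target region receives the duration of the matching source region, and
    each region starts where the previous one stops (the default chaining rule).
    """
    if len(source) != len(target_texts):
        raise ValueError("both lines must contain the same number of regions")
    result: List[TimedRegion] = []
    start = first_start_ms
    for src, text in zip(source, target_texts):
        duration = src.end_ms - src.start_ms
        result.append(TimedRegion(text, start, start + duration))
        start += duration
    return result
```

As a usage illustration, if the first line were calibrated as `[TimedRegion("Ha", 1000, 1300), TimedRegion("ppy", 1300, 1600)]`, calling `copy_durations(first_line, ["Ha", "ppy"], 5000)` would place the second line's regions at 5000 to 5300 ms and 5300 to 5600 ms.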
4) Store the time-calibrated character regions. The data structure used for storage includes at least the start and end positions, within the character set, of the character string corresponding to each character region, together with the playback start and stop times of that string. The stored file can be read by a corresponding subtitle display plug-in and used to play and display subtitles in synchronization in various playback software.
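A record holding the minimum fields named in step 4) could be sketched as follows. The patent does not specify a file format; the use of JSON, the field names, and the millisecond unit are all assumptions chosen only to make the example concrete.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class StoredRegion:
    char_start: int  # start position of the string within the character set
    char_end: int    # end position of the string within the character set
    start_ms: int    # playback start time
    end_ms: int      # playback stop time


def dump_regions(regions: List[StoredRegion]) -> str:
    """Serialize the calibrated regions (hypothetical JSON layout)."""
    return json.dumps([asdict(r) for r in regions])


def load_regions(data: str) -> List[StoredRegion]:
    """Read the regions back, as a subtitle display plug-in might."""
    return [StoredRegion(**item) for item in json.loads(data)]
```

Storing character positions rather than the text itself keeps the timing file independent of the lyric text, which stays in the character set.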
Embodiment 3. Another subtitle editing method, whose procedure is essentially the same as that of Embodiment 2, except that in step 3-1-2) the feature values are displayed as heights in a three-dimensional graphic. Fig. 7 shows the three-dimensional spectrogram, in 256 height levels, obtained by converting the passage with the lyric "Happy birthday to you" from the song "Happy Birthday"; in Fig. 7 different colors also indicate the elevation of each point. As can be seen, a three-dimensional display conveys a stronger sense of depth, and the information is expressed more richly and completely.
The embodiments above describe a basic method of rendering sound graphically so that the present invention can be understood. In practice, various enhancements and optimizations may be applied as needed to the original sound file or to the sound spectrum, and the resulting graphic may be visually retouched and edited, to obtain an effect better suited to the requirements at hand. Such variations in the specific processing, being based on the present invention, do not depart from its scope of protection.
The subtitle editing method and apparatus provided by the present invention can be used not only to edit song subtitles but also to produce subtitles for pure spoken dialogue, such as film and television subtitles, and can serve as an aid for learning foreign languages. Because the method greatly reduces the difficulty of subtitle production and improves the precision of time-label editing, it can make "subtitle DIY" a new form of entertainment for ordinary non-professional users.

Claims

1. A subtitle editing method, comprising the following steps:
1) storing a sound file and a character set in memory;
2) dividing the character set into a plurality of load segments, each load segment containing one or more character regions;
3) calibrating the start and stop times of the character regions according to the sound file;
4) storing the time-calibrated character regions;
characterized in that step 3) comprises the following steps:
3-1) converting the sound file into feature values that are a function of two variables, time and frequency, and displaying them in graphic form;
3-2) calibrating the start and stop times of the character regions according to the displayed graphic and the synchronously played sound file.
2. The subtitle editing method according to claim 1, characterized in that: in step 4), the time-calibrated character regions are stored using a data structure that includes at least the start and end positions, within the character set, of the character string corresponding to each character region, and the playback start and stop times of that string.
3. The subtitle editing method according to claim 1, characterized in that: when the start and stop times of the character regions are calibrated, the start time of a character region is determined from the stop time of the preceding character region.
4. The subtitle editing method according to claim 2, characterized in that: when the start and stop times of the character regions are calibrated, the start time of a character region is determined from the stop time of the preceding character region.
5. The subtitle editing method according to claim 1, characterized in that: when the start and stop times of the character regions are calibrated, the difference between the start and stop times of another, already calibrated character region, that is, its playback duration, is copied as the playback duration of the character region to be calibrated.
6. The subtitle editing method according to claim 2, characterized in that: when the start and stop times of the character regions are calibrated, the difference between the start and stop times of another, already calibrated character region, that is, its playback duration, is copied as the playback duration of the character region to be calibrated.
7. The subtitle editing method according to claim 3, characterized in that: when the start and stop times of the character regions are calibrated, the difference between the start and stop times of another, already calibrated character region, that is, its playback duration, is copied as the playback duration of the character region to be calibrated.
8. The subtitle editing method according to claim 4, characterized in that: when the start and stop times of the character regions are calibrated, the difference between the start and stop times of another, already calibrated character region, that is, its playback duration, is copied as the playback duration of the character region to be calibrated.
9. The subtitle editing method according to any one of claims 1 to 8, characterized in that step 3-1) comprises the following steps:
3-1-1) dividing the sound file into a plurality of frames in time order and computing the spectrum of each frame, to obtain feature values that are a function of two variables, time and frequency;
3-1-2) setting up a two-dimensional plane with time and frequency as its axes and displaying the corresponding feature values as a gradient in a planar graphic; or displaying the corresponding feature values as heights in a three-dimensional graphic.
10. The subtitle editing method according to claim 9, characterized in that: in step 3-1-2), the logarithm of the feature values is taken and normalized to a set maximum value before the graphic is output in gradient or height form.
11. The subtitle editing method according to any one of claims 1 to 8, characterized in that: in step 3), the graphic is displayed in segments or scrolled continuously, in time-axis order, in a display window having an operation interface, the time axis shown in the display window having a settable time span; when the graphic is displayed in segments, an indicator moving along the time axis marks the position corresponding to the synchronously played sound file; when the graphic is scrolled continuously, an indicator at a fixed position marks the position corresponding to the synchronously played sound file.
12. The subtitle editing method according to claim 9, characterized in that: in step 3), the graphic is displayed in segments or scrolled continuously, in time-axis order, in a display window having an operation interface, the time axis shown in the display window having a settable time span; when the graphic is displayed in segments, an indicator moving along the time axis marks the position corresponding to the synchronously played sound file; when the graphic is scrolled continuously, an indicator at a fixed position marks the position corresponding to the synchronously played sound file.
13. The subtitle editing method according to claim 10, characterized in that: in step 3), the graphic is displayed in segments or scrolled continuously, in time-axis order, in a display window having an operation interface, the time axis shown in the display window having a settable time span; when the graphic is displayed in segments, an indicator moving along the time axis marks the position corresponding to the synchronously played sound file; when the graphic is scrolled continuously, an indicator at a fixed position marks the position corresponding to the synchronously played sound file.
14. The subtitle editing method according to claim 11, characterized in that: the moving speed of the indicator or the continuous scrolling speed of the graphic, and the playback speed of the synchronously played sound file, are lower than the original playback speed of the sound file.
15. The subtitle editing method according to claim 12, characterized in that: the moving speed of the indicator or the continuous scrolling speed of the graphic, and the playback speed of the synchronously played sound file, are lower than the original playback speed of the sound file.
16. The subtitle editing method according to claim 13, characterized in that: the moving speed of the indicator or the continuous scrolling speed of the graphic, and the playback speed of the synchronously played sound file, are lower than the original playback speed of the sound file.
17. A subtitle editing apparatus, comprising:
a data storage device for storing the sound file and the character set used as source material;
a data processing device for converting the sound file into feature values that are a function of two variables, time and frequency, and for dividing the character set into a plurality of load segments, each load segment containing one or more character regions;
a graphic display device for displaying the converted sound file in graphic form;
an instruction receiving device for receiving editing instructions issued by the user and converting them into instruction signals that the instruction execution device can recognize; and
an instruction execution device for changing, according to the instruction signals, the range of characters contained in a character region and for calibrating the start and stop times of each character region.
PCT/CN2005/000535 2005-04-19 2005-04-19 Subtitle editing method and the device thereof WO2006111041A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2005/000535 WO2006111041A1 (en) 2005-04-19 2005-04-19 Subtitle editing method and the device thereof

Publications (1)

Publication Number Publication Date
WO2006111041A1 true WO2006111041A1 (en) 2006-10-26

Family

ID=37114689

Country Status (1)

Country Link
WO (1) WO2006111041A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5194683A (en) * 1991-01-01 1993-03-16 Ricos Co., Ltd. Karaoke lyric position display device
US5243582A (en) * 1990-07-06 1993-09-07 Pioneer Electronic Corporation Apparatus for reproducing digital audio information related to musical accompaniments
JPH08234775A (en) * 1995-02-24 1996-09-13 Victor Co Of Japan Ltd Music reproducing device
US5997308A (en) * 1996-08-02 1999-12-07 Yamaha Corporation Apparatus for displaying words in a karaoke system
US20020193895A1 (en) * 2001-06-18 2002-12-19 Ziqiang Qian Enhanced encoder for synchronizing multimedia files into an audio bit stream

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase

Ref country code: DE

WWW Wipo information: withdrawn in national office

Country of ref document: DE

NENP Non-entry into the national phase

Ref country code: RU

WWW Wipo information: withdrawn in national office

Country of ref document: RU

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC - FORM EPO 1205A DATED 11-04-2008

122 Ep: pct application non-entry in european phase

Ref document number: 05743435

Country of ref document: EP

Kind code of ref document: A1