CN103997657A - Converting method and device of audio in video - Google Patents

Converting method and device of audio in video

Info

Publication number
CN103997657A
Authority
CN
China
Prior art keywords
audio
video
field
preset
image
Prior art date
Application number
CN201410248518.7A
Other languages
Chinese (zh)
Inventor
刘德建
汪松
关胤
Original Assignee
福建天晴数码有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 福建天晴数码有限公司
Priority to CN201410248518.7A
Publication of CN103997657A

Abstract

The invention discloses a method for converting audio in a video. The method comprises: obtaining subtitles that correspond to the video; identifying fields in the subtitle information that match fields in a preset field list, taking these as first fields, taking the time positions at which the first fields appear in the video as first positions, and taking the audio at the first positions as first audio; and converting the first audio according to a preset scheme. The invention further provides a device for converting audio in a video that implements the method. The technical solution effectively extracts from the subtitles the fields that deserve focused study and emphasizes them with audio-visual effects, improving the user's learning efficiency and retention in an entertaining way.

Description

Method and device for converting audio in a video

Technical Field

[0001] The present invention relates to the field of multimedia processing, and in particular to a method and device for converting audio in a video.

Background

[0002] As cultural life grows richer and the need for international exchange expands, more and more foreign-language learners choose to build vocabulary and improve their language skills by watching foreign-language documentaries, TV series, and films. Encountering foreign vocabulary incidentally during everyday leisure does help memory and learning efficiency, and memories tied to a particular context and to audio-visual effects leave a deeper impression on the learner. At present, however, using entertainment-oriented multimedia material for foreign-language learning is mostly something learners arrange for themselves. No existing technique systematically links the speech content of such material to the foreign-language knowledge base the learner needs, which would make learning more purposeful and targeted, and none exploits the properties of the multimedia medium to study a given knowledge point through sight and sound together.

Summary of the Invention

[0003] To this end, a method and a device for converting audio in a video are needed.

[0004] To achieve the above object, the inventors provide a method for converting audio in a video, comprising the steps of:

[0005] obtaining subtitles, the subtitles corresponding to the video;

[0006] identifying fields in the subtitle information that match fields in a preset field list, taking these fields as first fields, taking the time position at which a first field appears in the video as a first position, and taking the audio at the first position as first audio;

[0007] converting the first audio according to a preset scheme.

[0008] Further, in the method for converting audio in a video, after the step "converting the first audio according to a preset scheme", the method further comprises the step of:

[0009] converting a single frame or multiple frames of the video at the first position according to a preset scheme.

[0010] Further, in the method for converting audio in a video, the step of obtaining subtitles specifically comprises:

[0011] obtaining subtitles in text format; or

[0012] obtaining subtitles in image format, and recognizing and extracting the text information in the images.

[0013] Further, in the method for converting audio in a video, the step "converting the first audio according to a preset scheme" specifically comprises:

[0014] replacing the first audio with second audio, the second audio being audio corresponding to the first field; or

[0015] replacing the first audio with third audio, the third audio being a superposition of the processed first audio and the second audio.

[0016] Further, in the method for converting audio in a video, the step "converting a single frame or multiple frames of the video at the first position according to a preset scheme" specifically comprises:

[0017] adding text information corresponding to the first field at a preset position in the single frame or multiple frames; or

[0018] changing or replacing, according to a preset scheme, the text corresponding to the first field in the subtitles of the single frame or multiple frames.

[0019] In addition, the inventors provide a device for converting audio in a video, comprising a subtitle acquisition unit, a field recognition unit, and an audio conversion unit;

[0020] the subtitle acquisition unit is configured to obtain subtitles, the subtitles corresponding to the video;

[0021] the field recognition unit is configured to identify fields in the subtitle information that match fields in a preset field list, taking these fields as first fields, taking the time position at which a first field appears in the video as a first position, and taking the audio at the first position as first audio;

[0022] the audio conversion unit is configured to convert the first audio according to a preset scheme.

[0023] Further, the device for converting audio in a video also comprises a video conversion unit;

[0024] the video conversion unit is configured to convert a single frame or multiple frames of the video at the first position according to a preset scheme.

[0025] Further, in the device for converting audio in a video, the subtitle acquisition unit obtaining subtitles specifically comprises:

[0026] obtaining subtitles in text format; or

[0027] obtaining subtitles in image format, and recognizing and extracting the text information in the images.

[0028] Further, in the device for converting audio in a video, the audio conversion unit converting the first audio according to a preset scheme specifically comprises:

[0029] replacing the first audio with second audio, the second audio being audio corresponding to the first field; or

[0030] replacing the first audio with third audio, the third audio being a superposition of the processed first audio and the second audio.

[0031] Further, in the device for converting audio in a video, the video conversion unit converting a single frame or multiple frames of the video at the first position according to a preset scheme specifically comprises:

[0032] adding text information corresponding to the first field at a preset position in the single frame or multiple frames; or

[0033] changing or replacing, according to a preset scheme, the text corresponding to the first field in the subtitles of the single frame or multiple frames.

[0034] Unlike the prior art, the above technical solution effectively extracts from the subtitles the fields that deserve focused study and emphasizes them with audio-visual effects, improving the user's learning efficiency and retention while entertaining.

Brief Description of the Drawings

[0035] FIG. 1 is a flowchart of the method for converting audio in a video according to one embodiment of the invention;

[0036] FIG. 2 is a schematic diagram of the functional modules of the device for converting audio in a video according to another embodiment of the invention.

[0037] Description of reference numerals:

[0038] 1 - subtitle acquisition unit

[0039] 2 - field recognition unit

[0040] 3 - audio conversion unit

[0041] 4 - video conversion unit

[0042] 5 - storage unit

[0043] 6 - judgment unit

Detailed Description

[0044] To explain the technical content, structural features, objects, and effects of the technical solution in detail, a description follows with reference to specific embodiments and the accompanying drawings.

[0045] Refer to FIG. 1, a flowchart of the method for converting audio in a video according to one embodiment of the invention. The method comprises the steps of:

[0046] S1: obtaining subtitles, the subtitles corresponding to the video;

[0047] S2: identifying fields in the subtitle information that match fields in a preset field list, taking these fields as first fields, taking the time position at which a first field appears in the video as a first position, and taking the audio at the first position as first audio;

[0048] S3: converting the first audio according to a preset scheme;

[0049] S4: converting a single frame or multiple frames of the video at the first position according to a preset scheme.

[0050] Further, obtaining subtitles in step S1 specifically comprises obtaining subtitles in text format, or obtaining subtitles in image format and recognizing and extracting the text information in the images.

[0051] Specifically, the subtitles obtained may be image-format or text-format subtitles of the video. The path of the video file is obtained first, and subtitle files are searched for by extension under the same path. Text-format subtitle files usually have the extension .ass (Advanced SubStation Alpha), .srt (SubRip Text), and so on. Image-format subtitles commonly consist of a subtitle image file (such as a .sub file) and a subtitle index file (such as an .idx file); one .sub file may contain subtitles in several languages, which are selected through the .idx file. Image-format subtitles can be converted to text-format subtitles by some conversion method, such as optical character recognition (OCR) or a direct call to subtitle format conversion software such as Subrip, Vobsub, or SubToSrt. The conversion depends on how the subtitles are attached: if they are attached as external subtitles or as internal subtitles carried alongside the video, they are obtained directly and converted to text-format subtitles; if they are embedded in the picture, the text is obtained from the images of the corresponding video frames with an OCR algorithm. A preferred procedure is: take the current frame, crop a coarse-location image from the rectangle covering the bottom 10% of the image height and 100% of its width, and run OCR on the coarse-location image to obtain the text.
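The file search and the coarse crop described in [0051] can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function names `find_subtitle_files` and `coarse_subtitle_box` are hypothetical, the extension sets contain only the formats named in the text, and the OCR call itself is left out.

```python
from pathlib import Path

# Extensions named in the description; a real tool would likely accept more.
TEXT_EXTS = {".ass", ".srt"}
IMAGE_EXTS = {".sub", ".idx"}


def find_subtitle_files(video_path):
    """Return (text_files, image_files) found in the video's own directory."""
    video = Path(video_path)
    text_files, image_files = [], []
    for candidate in sorted(video.parent.iterdir()):
        if not candidate.is_file():
            continue
        ext = candidate.suffix.lower()
        if ext in TEXT_EXTS:
            text_files.append(candidate)
        elif ext in IMAGE_EXTS:
            image_files.append(candidate)
    return text_files, image_files


def coarse_subtitle_box(width, height, height_fraction=0.10):
    """Return (left, top, right, bottom) of the full-width strip at the
    bottom of a frame, covering height_fraction of the frame height."""
    strip = max(1, round(height * height_fraction))
    return (0, height - strip, width, height)
```

The box returned by `coarse_subtitle_box` is in pixel coordinates with the origin at the top-left corner, so it can be handed to whatever image library performs the crop before OCR.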

[0052] Furthermore, because the pixels where subtitles are drawn usually do not change, the text location can be found more precisely on this basis. The method is: take N consecutive frames (N = 5 in this example), obtain the coarse-location image of each frame, and compare corresponding pixels across the N coarse-location images. If the differences in a pixel's B, G, and R channel values stay within a range R (10 in this example), the pixel is marked as a fixed point. The image inside the minimum bounding rectangle containing all fixed points is cropped, and the text is obtained with an OCR algorithm.
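The fixed-point test in [0052] might look like the sketch below. It assumes one reading of the text: a pixel counts as fixed when, for each of the B, G, and R channels, the spread between its largest and smallest value over the N frames is at most R. The function name `stable_pixel_box` and the plain nested-list image representation are illustrative choices, not part of the patent.

```python
def stable_pixel_box(frames, tolerance=10):
    """Find the minimum bounding rectangle of pixels that stay fixed.

    frames: list of N images of equal size, each a list of rows of
            (b, g, r) tuples (for example, the coarse-location crops).
    Returns (left, top, right, bottom) with inclusive coordinates,
    or None if no pixel is stable across all frames.
    """
    height, width = len(frames[0]), len(frames[0][0])
    xs, ys = [], []
    for y in range(height):
        for x in range(width):
            samples = [frame[y][x] for frame in frames]
            # zip(*samples) groups the N values of each channel together.
            if all(max(ch) - min(ch) <= tolerance for ch in zip(*samples)):
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))
```

A static background would also pass this test, so in practice the result is only as tight as the amount of motion behind the subtitle allows.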

[0053] In some embodiments, subtitles may also be obtained by analyzing the audio and separating background sound from human voice, then using speech recognition to determine the text corresponding to the voice and its position in the video.

[0054] The fields in step S2 may be words, phrases, or sentences. The preset field list may hold the English or Chinese forms of the entries in a preset vocabulary such as the College English Test Band 4 vocabulary, the College English Test Band 6 vocabulary, the TOEFL examination vocabulary, or the IELTS examination vocabulary; it may hold the Japanese or Chinese forms of the entries in vocabularies such as the College Japanese Band 4 vocabulary, the College Japanese Band 6 vocabulary, or the vocabularies for levels N1 to N5 of the Japanese-Language Proficiency Test (JLPT); or it may hold entries from textbook syllabus vocabularies, examination syllabus vocabularies, and bilingual or multilingual dictionaries for many other foreign languages. The preset field list may also be a user-defined vocabulary whose contents the user adds according to their own progress and mastery.
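Step S2 can be illustrated with text-format subtitles in SRT form. The sketch below parses the cue timings and reports every preset field that occurs in a cue, using the cue's time span as the first position. The helper names `parse_srt` and `find_first_fields` are hypothetical, and plain case-insensitive substring matching is an assumption; the patent does not say how matching is done.

```python
import re

_TIME = r"(\d+):(\d+):(\d+)[,.](\d+)"
_CUE_TIMING = re.compile(_TIME + r"\s*-->\s*" + _TIME)


def parse_srt(text):
    """Yield (start_seconds, end_seconds, cue_text) for each SRT cue."""
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = _CUE_TIMING.search(line)
            if match:
                h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, match.groups())
                start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000
                end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000
                yield start, end, " ".join(lines[i + 1:])
                break


def find_first_fields(cues, field_list):
    """Return (field, start, end) for every preset field found in a cue."""
    hits = []
    for start, end, text in cues:
        lowered = text.lower()
        for field in field_list:
            if field.lower() in lowered:
                hits.append((field, start, end))
    return hits
```

Each hit gives a first field and its first position; the audio between `start` and `end` is then the candidate first audio for step S3.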

[0055] In step S3, converting the first audio according to a preset scheme specifically comprises:

[0056] replacing the first audio with second audio, the second audio being audio corresponding to the first field; or

[0057] replacing the first audio with third audio, the third audio being a superposition of the processed first audio and the second audio.

[0058] Specifically, suppose the spoken line in the video is "I suspect a diaphragmatic rupture." and the corresponding bilingual subtitle is "I suspect a diaphragmatic rupture. 我怀疑是横隔膜破裂". Recognition in step S2 determines that the phrase "横膈膜破裂" (diaphragmatic rupture) is in the preset field list and is therefore a first field, and that the English speech "diaphragmatic rupture" in the original audio stream is the first audio. In this step, the Chinese speech for "横膈膜破裂" is used as the second audio and replaces the first audio in the original audio stream.

[0059] As another example, the subtitle "电击受伤可能导致腹腔组织受损 The electrical injury may have caused intra-abdominal tissue damage" appears in the video. Recognition in step S2 determines that the phrase "电击伤" (electrical injury) is in the preset field list and is therefore a first field, and the time position at which this subtitle appears in the video is the first position. Detection and judgment show, however, that the audio stream at this first position contains no voice corresponding to this sentence; the Chinese speech "电击伤" or the English speech "electrical injury" is then used as the second audio to replace the first audio in the original audio stream. Further, if the original audio stream contains no voice corresponding to the subtitle but detection and judgment determine that background sound such as music is present at the first position, the audio stream at the first position may also be taken as the first audio and superimposed with the second audio to generate third audio, which replaces the first audio. Further still, the background sound may be attenuated before it is superimposed with the second audio to generate the third audio, which replaces the first audio.
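The two conversion schemes, plain replacement and replacement by an attenuated overlay, can be sketched on raw sample lists. Everything here is an assumption made for illustration: audio is modelled as a list of floats in [-1, 1], positions are sample indices rather than timestamps, the names `replace_segment` and `overlay` are hypothetical, and the second audio is taken as given however it was produced.

```python
def replace_segment(track, start, end, new_audio):
    """Return a copy of track with samples [start, end) replaced.

    new_audio is padded with silence or truncated so that the track
    keeps its original length and stays in sync with the video.
    """
    length = end - start
    fitted = (list(new_audio) + [0.0] * length)[:length]
    return track[:start] + fitted + track[end:]


def overlay(first_audio, second_audio, background_gain=0.3):
    """Build third audio: attenuated first audio plus second audio.

    The result has the length of first_audio and is clipped to [-1, 1].
    """
    length = len(first_audio)
    second = (list(second_audio) + [0.0] * length)[:length]
    return [
        max(-1.0, min(1.0, background_gain * a + b))
        for a, b in zip(first_audio, second)
    ]
```

For the overlay scheme the two helpers combine as `replace_segment(track, start, end, overlay(track[start:end], second_audio))`; setting `background_gain=1.0` gives the unattenuated superposition.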

[0060] As another example, the subtitle "他需要持续的心脏重症监护 He needs sustained cardiac ICU" appears in the video. Recognition in step S2 determines that the word "持续的" (sustained) is in the preset field list and is therefore a first field, and the time position at which this subtitle appears in the video is the first position. Detection and judgment show, however, that the audio stream at this first position is a different spoken line, "He's gonna die if we don't do anything."; that is, the voice here is found not to correspond to the subtitle. The handling may then be either to still use the English speech "sustained" or the Chinese speech "持续的" as the second audio to replace the first audio at the first position, or to perform no replacement at all, so that audio replacement does not impair the entertainment value or comprehension of the video.

[0061] That is, before the replacement in step S3 there is also a judgment step: analyzing and judging whether the first audio corresponds to the first field. The audio is then converted according to the preset scheme on the basis of that result.
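One possible policy for the judgment step, covering the three examples in [0058] to [0060], is sketched below. The enum `Scheme`, the function `choose_scheme`, and its boolean inputs are hypothetical; the patent leaves the concrete preset scheme open, and detecting whether a voice matches the field is outside this sketch.

```python
from enum import Enum


class Scheme(Enum):
    REPLACE = "replace the first audio with the second audio"
    OVERLAY = "replace the first audio with background plus second audio"
    SKIP = "leave the audio unchanged"


def choose_scheme(voice_matches_field, has_other_voice, has_background,
                  replace_on_mismatch=False):
    """Pick a conversion scheme from the result of the judgment step."""
    if voice_matches_field:
        # The spoken line contains the first field ([0058]).
        return Scheme.REPLACE
    if has_other_voice:
        # A different line is being spoken ([0060]): replacing is allowed
        # but may hurt comprehension, so it is opt-in here.
        return Scheme.REPLACE if replace_on_mismatch else Scheme.SKIP
    if has_background:
        # No voice, only music or effects ([0059]): keep them underneath.
        return Scheme.OVERLAY
    # Nothing relevant at this position: plain replacement.
    return Scheme.REPLACE
```

Keeping the decision separate from the audio editing makes it easy to swap in a different preset scheme, for example one that always overlays.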

[0062] Further, "converting a single frame or multiple frames of the video at the first position according to a preset scheme" in step S4 specifically comprises:

[0063] adding text information corresponding to the first field at a preset position in the single frame or multiple frames; or changing or replacing, according to a preset scheme, the text corresponding to the first field in the subtitles of the single frame or multiple frames.

[0064] The "text information corresponding to the first field" may be the entry itself, its definition, its phonetic transcription, or an example sentence for the first field in the preset field list. For example, after step S3 has used the Chinese speech for "横膈膜破裂" as the second audio to replace the first audio in the original audio stream, the text "横膈膜破裂 diaphragmatic rupture" is added at a preset position, such as the bottom of the picture, in the single frame or multiple frames at the first position. Alternatively, the original subtitle is checked for bilingual text in the part corresponding to the first field: if the subtitle is bilingual, the part corresponding to the first field is highlighted, or the corresponding text in the original subtitle is replaced with colored text; if the original subtitle contains only the Chinese or only the English form of the first field, the corresponding English or Chinese text is marked above or below the subtitle.
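The subtitle-side handling in [0064] can be sketched on text alone. The function `annotate_subtitle` and the bracket markers are hypothetical stand-ins: a real renderer would draw highlighted or colored text onto the frames, whereas this sketch only decides which lines should be shown.

```python
import re


def _highlight(text, term, marks):
    """Wrap every case-insensitive occurrence of term in the marker pair."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return pattern.sub(lambda m: marks[0] + m.group(0) + marks[1], text)


def annotate_subtitle(cue_text, chinese, english, marks=("[", "]")):
    """Return the subtitle lines to display for a cue with a first field.

    chinese and english are the two forms of the first field.
    """
    has_chinese = chinese in cue_text
    has_english = english.lower() in cue_text.lower()
    if has_chinese and has_english:
        # Bilingual subtitle: emphasize both forms in place.
        return [_highlight(_highlight(cue_text, chinese, marks), english, marks)]
    if has_english:
        # Only the English form is present: show the Chinese form below.
        return [cue_text, chinese]
    if has_chinese:
        # Only the Chinese form is present: show the English form below.
        return [cue_text, english]
    # Neither form appears in the cue: add both at the preset position.
    return [cue_text, chinese + " " + english]
```

Whether the extra line goes above or below the original subtitle, and how emphasis is drawn, are presentation choices the patent leaves to the preset scheme.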

[0065] The purpose of this step is to reinforce the first field visually and, combined with the sound effect, to further strengthen the user's impression of the vocabulary to be mastered.

[0066] Refer to FIG. 2, a schematic diagram of the functional modules of the device for converting audio in a video according to another embodiment of the invention. The device comprises a subtitle acquisition unit 1, a field recognition unit 2, and an audio conversion unit 3;

[0067] the subtitle acquisition unit 1 is configured to obtain subtitles, the subtitles corresponding to the video;

[0068] the field recognition unit 2 is configured to identify fields in the subtitle information that match fields in a preset field list, taking these fields as first fields, taking the time position at which a first field appears in the video as a first position, and taking the audio at the first position as first audio;

[0069] the audio conversion unit 3 is configured to convert the first audio according to a preset scheme.

[0070] Further, the device also comprises a video conversion unit 4, configured to convert a single frame or multiple frames of the video at the first position according to a preset scheme.

[0071] Further, the subtitle acquisition unit 1 obtaining subtitles specifically comprises obtaining subtitles in text format, or obtaining subtitles in image format and recognizing and extracting the text information in the images.

[0072] Specifically, the subtitles obtained by the subtitle acquisition unit 1 may be image-format or text-format subtitles of the video. The subtitle acquisition unit 1 first obtains the path of the video file, searches for subtitle files by extension under the same path, and stores them in the storage unit 5 for later processing. Text-format subtitle files usually have the extension .ass (Advanced SubStation Alpha), .srt (SubRip Text), and so on. Image-format subtitles commonly consist of a subtitle image file (such as a .sub file) and a subtitle index file (such as an .idx file); one .sub file may contain subtitles in several languages, which are selected through the .idx file. Image-format subtitles can be converted to text-format subtitles by some conversion algorithm, such as optical character recognition (OCR) or a direct call to subtitle format conversion software such as Subrip, Vobsub, or SubToSrt. The conversion depends on how the subtitles are attached: if they are attached as external subtitles or as internal subtitles carried alongside the video, they are obtained directly and converted to text-format subtitles; if they are embedded in the picture, the text is obtained from the images of the corresponding video frames with an OCR algorithm. The preferred mode of operation of the subtitle acquisition unit 1 here is: take the current frame, crop a coarse-location image from the rectangle covering the bottom 10% of the image height and 100% of its width, and run OCR on the coarse-location image to obtain the text.

[0073] Furthermore, because the pixels where subtitles are drawn usually do not change, the subtitle acquisition unit 1 can find the text location more precisely on this basis. Specifically, it takes N consecutive frames (N = 5 in this example), obtains the coarse-location image of each frame, and compares corresponding pixels across the N coarse-location images. If the differences in a pixel's B, G, and R channel values stay within a range R (10 in this example), the pixel is marked as a fixed point. The image inside the minimum bounding rectangle containing all fixed points is cropped, and the text is obtained with an OCR algorithm.

[0074] In some embodiments, the subtitle acquisition unit 1 may also obtain subtitles by analyzing the audio and separating background sound from human voice, then using speech recognition to determine the text corresponding to the voice and its position in the video.

[0075] Further, the fields identified by the field recognition unit 2 may be words, phrases, or sentences, and the preset field list is stored in the storage unit 5. The preset field list may hold the English or Chinese forms of the entries in a preset vocabulary such as the College English Test Band 4 vocabulary, the College English Test Band 6 vocabulary, the TOEFL examination vocabulary, or the IELTS examination vocabulary; it may hold the Japanese or Chinese forms of the entries in vocabularies such as the College Japanese Band 4 vocabulary, the College Japanese Band 6 vocabulary, or the vocabularies for levels N1 to N5 of the Japanese-Language Proficiency Test (JLPT); or it may hold entries from textbook syllabus vocabularies, examination syllabus vocabularies, and bilingual or multilingual dictionaries for many other foreign languages. The preset field list may also be a user-defined vocabulary whose contents the user adds according to their own progress and mastery.

[0076] Further, the audio conversion unit 3 converting the first audio according to a preset scheme specifically comprises:

[0077] replacing the first audio with second audio, the second audio being audio corresponding to the first field; or

[0078] replacing the first audio with third audio, the third audio being a superposition of the processed first audio and the second audio.

[0079] Specifically, suppose the spoken line in the video is "I suspect a diaphragmatic rupture." and the corresponding bilingual subtitle is "I suspect a diaphragmatic rupture. 我怀疑是横隔膜破裂". Recognition by the field recognition unit 2 determines that the phrase "横膈膜破裂" (diaphragmatic rupture) is in the preset field list and is therefore a first field, and that the English speech "diaphragmatic rupture" in the original audio stream is the first audio. The audio conversion unit 3 then uses the Chinese speech for "横膈膜破裂" as the second audio and replaces the first audio in the original audio stream.

[0080] As another example, the subtitle "电击受伤可能导致腹腔组织受损 The electrical injury may have caused intra-abdominal tissue damage" appears in the video. Recognition by the field recognition unit 2 determines that the phrase "电击伤" (electrical injury) is in the preset field list and is therefore a first field, and the time position at which this subtitle appears in the video is the first position. Detection and judgment show, however, that the audio stream at this first position contains no voice corresponding to this sentence; the Chinese speech "电击伤" or the English speech "electrical injury" is then used as the second audio to replace the first audio in the original audio stream. Further, if the original audio stream contains no voice corresponding to the subtitle but detection and judgment determine that background sound such as music is present at the first position, the audio stream at the first position may also be taken as the first audio and superimposed with the second audio to generate third audio, which replaces the first audio. Further still, the background sound may be attenuated before it is superimposed with the second audio to generate the third audio, which replaces the first audio.

[0081] As another example, the subtitle "他需要持续的心脏重症监护 He needs sustained cardiac ICU" appears in the video. Recognition by the field recognition unit 2 determines that the word "持续的" (sustained) is in the preset field list and is therefore a first field, and the time position at which this subtitle appears in the video is the first position. Detection and judgment show, however, that the audio stream at this first position is a different spoken line, "He's gonna die if we don't do anything."; that is, the voice here is found not to correspond to the subtitle. The handling may then be either to still use the English speech "sustained" or the Chinese speech "持续的" as the second audio to replace the first audio at the first position, or to perform no replacement at all, so that audio replacement does not impair the entertainment value or comprehension of the video.

[0082] That is, before the audio conversion unit 3 transforms the audio, a judgment unit 6 performs a judgment step: it analyzes and judges whether the first audio corresponds to the first field. The audio conversion unit 3 then transforms the audio according to the preset scheme, based on the result of this analysis and judgment.
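The judgment described in paragraphs [0080] to [0082] can be summarized as a small decision function. The following Python sketch is only an illustration under stated assumptions: the three boolean inputs are taken to be the outputs of detection steps that are not shown, the names `Action` and `choose_action` are hypothetical, and the case in which the voice does correspond to the first field is assumed here to result in replacement by the second audio.

```python
from enum import Enum


class Action(Enum):
    REPLACE = "replace the first audio with the second audio"
    OVERLAY = "replace the first audio with the third (superimposed) audio"
    SKIP = "leave the audio unchanged"


def choose_action(has_voice, voice_matches_field, has_background,
                  replace_on_mismatch=False):
    """Pick an audio transformation from the judgment results.

    has_voice:            a human voice is present at the first position
    voice_matches_field:  that voice corresponds to the first field
    has_background:       music or other background sound is present
    replace_on_mismatch:  assumed switch between the two options that
                          paragraph [0081] allows for a mismatched voice
    """
    if has_voice and not voice_matches_field:
        # Paragraph [0081]: either replace anyway or do nothing.
        return Action.REPLACE if replace_on_mismatch else Action.SKIP
    if not has_voice and has_background:
        # Paragraph [0080]: superimpose the second audio on the background.
        return Action.OVERLAY
    # No voice and no background, or (by assumption) a matching voice.
    return Action.REPLACE
```

For the "sustained" example above, `choose_action(True, False, False)` returns `Action.SKIP`, which leaves the mismatched spoken line untouched.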

[0083] Further, the video conversion unit 4 "transforming a single frame or multiple frames of the video at the first position according to a preset scheme" specifically includes:

[0084] adding text information corresponding to the first field at a preset position in the single frame or multiple frames; or changing or replacing, according to a preset scheme, the text corresponding to the first field in the subtitle in the single frame or multiple frames.

[0085] The "text information corresponding to the first field" may be the field itself, or an explanation, phonetic transcription, or example sentence relating to the first field in the preset field list. For example, after the audio conversion unit 3 has replaced the first audio in the original audio stream with Chinese voice audio for "横膈膜破裂" as the second audio, the video conversion unit 4 adds the text "横膈膜破裂 diaphragmatic rupture" at a preset position, such as the bottom of the video, in the single frame or multiple frames at the first position. Alternatively, it checks whether the part of the original subtitle corresponding to the first field is bilingual. If it is bilingual, the part corresponding to the first field is highlighted, or the corresponding text in the original subtitle is replaced with colored text. If the original subtitle contains the first field only in Chinese or only in English, the corresponding English or Chinese text is marked above or below the subtitle.

[0086] The role of the video conversion unit 4 is to reinforce the first field through visual effects. Combined with the sound effects, this further strengthens the user's impression of the vocabulary to be mastered.
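The subtitle handling in paragraph [0085] can be sketched on plain subtitle text. The Python example below is a simplified illustration, not the rendering logic of the invention: it works on strings rather than video frames, the `[[` and `]]` highlight markers are hypothetical stand-ins for highlighting or colored text, and the function names are assumptions.

```python
import re


def highlight(text, term, mark):
    """Wrap every case-insensitive occurrence of term in the given markers."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return pattern.sub(lambda m: mark[0] + m.group(0) + mark[1], text)


def annotate_subtitle(subtitle, zh, en, mark=("[[", "]]")):
    """Return the subtitle lines to display for one first field.

    zh / en are the Chinese and English forms of the first field.
    - Bilingual subtitle: highlight both forms in place.
    - Only one language present: add the other form on a line below.
    - Field absent from the subtitle text: add both forms below.
    """
    has_zh = zh in subtitle
    has_en = re.search(re.escape(en), subtitle, re.IGNORECASE) is not None
    if has_zh and has_en:
        return [highlight(highlight(subtitle, zh, mark), en, mark)]
    if has_zh:
        return [subtitle, en]
    if has_en:
        return [subtitle, zh]
    return [subtitle, zh + " " + en]
```

Placing the added line below the subtitle is one of the two positions the description allows; a renderer could equally place it above.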

[0087] All or some of the steps of the methods in the above embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a storage medium readable by a computer device and is used to perform all or some of the steps of the methods of the above embodiments. The computer device is, for example, a personal computer, server, network device, smart mobile terminal, smart home device, wearable smart device, or in-vehicle smart device. The storage medium is, for example, RAM, ROM, magnetic disk, magnetic tape, optical disc, flash memory, USB flash drive, removable hard disk, memory card, memory stick, network server storage, or network cloud storage.

[0088] The above are only embodiments of the present invention and do not thereby limit its scope of patent protection. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (10)

1. A method for transforming audio in a video, comprising the steps of: obtaining a subtitle, the subtitle corresponding to the video; identifying fields in the subtitle information that match fields in a preset field list, taking these fields as first fields, taking the time position at which a first field appears in the video as a first position, and taking the audio at the first position as a first audio; and transforming the first audio according to a preset scheme.
2. The method for transforming audio in a video according to claim 1, further comprising, after the step of "transforming the first audio according to a preset scheme", the step of: transforming a single frame or multiple frames of the video at the first position according to a preset scheme.
3. The method for transforming audio in a video according to claim 1 or 2, wherein the step of obtaining a subtitle specifically comprises: obtaining a subtitle in text format; or obtaining a subtitle in image format, and recognizing and extracting the text information in the image.
4. The method for transforming audio in a video according to claim 1 or 2, wherein the step of "transforming the first audio according to a preset scheme" specifically comprises: replacing the first audio with a second audio, the second audio being audio corresponding to the first field; or replacing the first audio with a third audio, the third audio being a superposition of the processed first audio and the second audio.
5. The method for transforming audio in a video according to claim 2, wherein the step of "transforming a single frame or multiple frames of the video at the first position according to a preset scheme" specifically comprises: adding text information corresponding to the first field at a preset position in the single frame or multiple frames; or changing or replacing, according to a preset scheme, the text corresponding to the first field in the subtitle in the single frame or multiple frames.
6. A device for transforming audio in a video, comprising a subtitle acquisition unit, a field recognition unit, and an audio conversion unit; the subtitle acquisition unit is configured to obtain a subtitle, the subtitle corresponding to the video; the field recognition unit is configured to identify fields in the subtitle information that match fields in a preset field list, to take these fields as first fields, to take the time position at which a first field appears in the video as a first position, and to take the audio at the first position as a first audio; the audio conversion unit is configured to transform the first audio according to a preset scheme.
7. The device for transforming audio in a video according to claim 6, further comprising a video conversion unit; the video conversion unit is configured to transform a single frame or multiple frames of the video at the first position according to a preset scheme.
8. The device for transforming audio in a video according to claim 6 or 7, wherein the subtitle acquisition unit obtaining a subtitle specifically comprises: obtaining a subtitle in text format; or obtaining a subtitle in image format, and recognizing and extracting the text information in the image.
9. The device for transforming audio in a video according to claim 6 or 7, wherein the audio conversion unit transforming the first audio according to a preset scheme specifically comprises: replacing the first audio with a second audio, the second audio being audio corresponding to the first field; or replacing the first audio with a third audio, the third audio being a superposition of the processed first audio and the second audio.
10. The device for transforming audio in a video according to claim 7, wherein the video conversion unit transforming a single frame or multiple frames of the video at the first position according to a preset scheme specifically comprises: adding text information corresponding to the first field at a preset position in the single frame or multiple frames; or changing or replacing, according to a preset scheme, the text corresponding to the first field in the subtitle in the single frame or multiple frames.
CN201410248518.7A 2014-06-06 2014-06-06 Converting method and device of audio in video CN103997657A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410248518.7A CN103997657A (en) 2014-06-06 2014-06-06 Converting method and device of audio in video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410248518.7A CN103997657A (en) 2014-06-06 2014-06-06 Converting method and device of audio in video

Publications (1)

Publication Number Publication Date
CN103997657A true CN103997657A (en) 2014-08-20

Family

ID=51311642

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410248518.7A CN103997657A (en) 2014-06-06 2014-06-06 Converting method and device of audio in video

Country Status (1)

Country Link
CN (1) CN103997657A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104679405A (en) * 2015-02-06 2015-06-03 深圳市金立通信设备有限公司 Terminal

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1964428A (en) * 2005-11-10 2007-05-16 国际商业机器公司 Method and apparatus for creating alternative audio via closed caption data
CN101978425A (en) * 2008-03-20 2011-02-16 迪讯广播公司 Method and apparatus for replacement of audio data in recorded audio/video stream
CN103226947A (en) * 2013-03-27 2013-07-31 广东欧珀移动通信有限公司 Mobile terminal-based audio processing method and device
CN103491429A (en) * 2013-09-04 2014-01-01 张家港保税区润桐电子技术研发有限公司 Audio processing method and audio processing equipment

Legal Events

Date Code Title Description
C06 Publication
C10 Entry into substantive examination