CN105898556A - Plug-in subtitle automatic synchronization method and device - Google Patents

Plug-in subtitle automatic synchronization method and device

Info

Publication number
CN105898556A
CN105898556A (application CN201511018280.XA)
Authority
CN
China
Prior art keywords
plug
time
audio
initial time
short sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201511018280.XA
Other languages
Chinese (zh)
Inventor
蔡炜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Leshi Zhixin Electronic Technology Tianjin Co Ltd
Original Assignee
Leshi Zhixin Electronic Technology Tianjin Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Leshi Zhixin Electronic Technology Tianjin Co Ltd filed Critical Leshi Zhixin Electronic Technology Tianjin Co Ltd
Priority to CN201511018280.XA priority Critical patent/CN105898556A/en
Publication of CN105898556A publication Critical patent/CN105898556A/en
Pending legal-status Critical Current

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439Processing of audio elementary streams
    • H04N21/4394Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/488Data services, e.g. news ticker
    • H04N21/4884Data services, e.g. news ticker for displaying subtitles

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention relates to the technical field of video playback and discloses a method and device for automatic synchronization of plug-in subtitles. The method comprises the following steps: extracting the audio portion of a video file and decoding it to obtain pulse code modulation data; cutting the pulse code modulation data into audio fragments and classifying the audio fragments; dividing the audio fragments classified as speech into short sentences and determining the start time and end time of each short sentence; searching for a match item in a plug-in subtitle file according to the determined start time and end time of a short sentence; and changing the start time of the match item to the presentation time stamp (PTS) of the current video and, according to the presentation time stamp, updating the start time of each item in the plug-in subtitle file whose start time is greater than that of the match item. The display time of the subtitle file is thus made consistent with the playback time of the audio and video, realizing automatic synchronization of plug-in subtitles and improving the user's viewing experience.

Description

Method and device for automatic synchronization of plug-in subtitles
Technical field
The present invention relates to the technical field of video playback, and in particular to a method and device for automatic synchronization of plug-in subtitles.
Background art
Subtitles refer to the non-image content, such as dialogue, of television programs, films, and stage productions displayed in written form, and also to the text added in the post-production of film and television works. When a video work such as a film is produced, the video file and the subtitle file may be merged into one, so that the subtitles cannot be changed or removed during playback; such subtitles are called embedded subtitles. In other works, the video file and the subtitle file exist separately, and a subtitle file of the desired version can be imported at playback time; such subtitle files are called plug-in subtitles. Compared with embedded subtitles, plug-in subtitles are versatile and flexible, convenient to import, and do not compromise video quality.
Plug-in subtitles are typically produced with dedicated subtitling software. This production method first requires a person to listen to all the lines and type the complete dialogue into an electronic text according to the content of each line. Then, using the subtitling software, the producer listens to the dialogue again while manually marking breaks, so as to determine the start time and duration of each line of dialogue, the so-called "timeline". When the whole subtitle production is complete, the software exports a plug-in subtitle file in one or several formats. When a playback system recognizes and supports the plug-in subtitle playback mode, the subtitle file can be loaded during video playback. However, because of the way plug-in subtitle files are produced, their time markers are of poor accuracy, so they synchronize poorly with the audio and video during playback; manually adjusting the subtitle display time is cumbersome for the user and seriously affects normal viewing.
Summary of the invention
An object of the present invention is to provide a method and device for automatic synchronization of plug-in subtitles, so that the display time of the subtitle file is consistent with the playback time of the audio and video, thereby realizing automatic synchronization of plug-in subtitles and improving the user's viewing experience.
To solve the above technical problem, an embodiment of the present invention provides a method for automatic synchronization of plug-in subtitles, comprising the following steps: extracting the audio portion of a video file and decoding the audio portion to obtain pulse code modulation data; cutting the pulse code modulation data into audio fragments and classifying the audio fragments, wherein the classification categories comprise silence, speech, and non-speech; dividing the audio fragments classified as speech into short sentences and determining the start time and end time of each short sentence; searching for a match item in a plug-in subtitle file according to the determined start time and end time of a short sentence; and changing the start time of the match item to the presentation time stamp (PTS) of the current video and, according to the presentation time stamp, updating the start time of each item in the plug-in subtitle file whose start time is greater than that of the match item.
An embodiment of the present invention further provides a device for automatic synchronization of plug-in subtitles, comprising an extraction module, a cutting module, a division module, a search module, and an update module. The extraction module is configured to extract the audio portion of a video file and decode the audio portion to obtain pulse code modulation data. The cutting module is configured to cut the pulse code modulation data into audio fragments and classify the audio fragments, wherein the classification categories comprise silence, speech, and non-speech. The division module is configured to divide the audio fragments classified as speech into short sentences and determine the start time and end time of each short sentence. The search module is configured to search for a match item in a plug-in subtitle file according to the determined start time and end time of a short sentence. The update module is configured to change the start time of the match item to the presentation time stamp (PTS) of the current video and, according to the presentation time stamp, update the start time of each item in the plug-in subtitle file whose start time is greater than that of the match item.
Compared with the prior art, the embodiments of the present invention extract the audio portion of a video file, decode it to obtain pulse code modulation data, cut the pulse code modulation data into audio fragments, and classify the fragments as speech, silence, or non-speech; the fragments classified as speech are then divided into short sentences, and the start time and end time of each short sentence are determined. According to the determined start time and end time of a short sentence, a match item is searched for in the plug-in subtitle file, the start time of the match item is changed to the presentation time stamp (PTS) of the current video, and, according to the presentation time stamp, the start time of each item in the plug-in subtitle file whose start time is greater than that of the match item is updated. As a result, the display time of each line of dialogue in the subtitle file is automatically synchronized with video playback, improving the user's viewing experience.
Preferably, the step of searching for a match item in the plug-in subtitle file according to the determined start time and end time of the short sentence comprises the following sub-steps: within a preset duration before and after the start time, finding corresponding items in the plug-in subtitle file; among the corresponding items found, finding all items whose dialogue duration is within an allowed error range of the dialogue duration of the short sentence; and, if more than one item is found, comparing the previous record of the determined short sentence with the previous record of each found item until the most similar one is found as the match item. This improves the efficiency and accuracy of matching subtitles to the audio and video.
Preferably, in the step of dividing the audio fragments into short sentences, the division is performed according to speech pauses, wherein a speech pause comprises at least a first preset number of audio segments. This improves the efficiency of sentence division.
Preferably, the first preset number is 2. Shorter sound events can thus be ignored, better protecting the integrity of a sentence.
Preferably, a short sentence comprises at least a second preset number of audio segments, and the second preset number is 3. Transient invalid information in the audio can thus be filtered out, improving the efficiency of sentence division.
Brief description of the drawings
Fig. 1 is a flowchart of the method for automatic synchronization of plug-in subtitles according to the first embodiment of the present invention;
Fig. 2 is a schematic diagram of the short sentence and subtitle item matching algorithm according to the first embodiment of the present invention;
Fig. 3 is a structural block diagram of the device for automatic synchronization of plug-in subtitles according to the second embodiment of the present invention.
Detailed description of the invention
To make the objects, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are explained in detail below with reference to the accompanying drawings. Those skilled in the art will understand, however, that many technical details are set forth in the embodiments merely to help the reader better understand the present application; the technical solutions claimed in the present application can be realized even without these technical details and with various changes and modifications based on the following embodiments.
The first embodiment of the present invention relates to a method for automatic synchronization of plug-in subtitles. The specific flow is shown in Fig. 1 and comprises the following steps.
Step 10: extract the audio portion of the video file and decode the audio portion to obtain pulse code modulation data.
A video file is a synthesis of a video stream and an audio stream. When a video is played online, the audio stream is first extracted from the video file. The open-source library ffmpeg can be used to extract the audio portion of the video file, and the corresponding decoder then decodes the audio portion into PCM (Pulse Code Modulation) data.
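The extraction and decoding step can be sketched as follows. This is a minimal sketch built on the ffmpeg command line: the description only names the ffmpeg library, so the exact flags, the 16 kHz mono output, and the function names here are illustrative assumptions.

```python
import subprocess

def pcm_command(video_path, sample_rate=16000):
    """Build an ffmpeg command line that decodes the audio track of
    `video_path` to raw 16-bit mono PCM on stdout. One common choice
    of flags; the patent itself does not fix the invocation."""
    return [
        "ffmpeg", "-i", video_path,  # input video file
        "-vn",                       # drop the video stream
        "-f", "s16le",               # raw signed 16-bit little-endian PCM
        "-acodec", "pcm_s16le",
        "-ac", "1",                  # mono
        "-ar", str(sample_rate),     # target sample rate
        "-",                         # write to stdout
    ]

def extract_pcm(video_path, sample_rate=16000):
    """Run ffmpeg and return the decoded PCM bytes."""
    return subprocess.run(pcm_command(video_path, sample_rate),
                          capture_output=True, check=True).stdout
```

In practice the raw bytes returned by `extract_pcm` would be handed to the classification step below as 16-bit samples.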
Step 11: cut the pulse code modulation data into audio fragments and classify the audio fragments.
In this embodiment, the Marsyas software can be used to classify the extracted audio (the pulse code modulation data); for example, Marsyas can determine the category of the audio data: silence, speech, or non-speech. Through the interface provided by Marsyas, the audio frame length can be set to 32 ms, and 5 audio frames can be treated as one audio segment, i.e., an audio segment length of 0.16 s. During classification, classifying one audio segment at a time improves efficiency. This embodiment does not limit the classification method for audio fragments, as long as speech and non-speech can be distinguished. The classification in this step yields the start time and end time of the speech fragments within the audio, laying the foundation for extracting speech sentences from the audio fragments.
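The frame-to-segment aggregation described above can be sketched as follows. Marsyas (or any classifier that separates speech from non-speech) is assumed to supply the per-frame labels; the majority-vote aggregation is an illustrative choice, not specified by the description.

```python
from collections import Counter

FRAME_MS = 32           # frame length set through the classifier interface
FRAMES_PER_SEGMENT = 5  # five 32 ms frames form one 0.16 s audio segment

def segment_labels(frame_labels):
    """Aggregate per-frame labels ("silence", "speech", "non-speech")
    into one label per 0.16 s audio segment by majority vote."""
    out = []
    for i in range(0, len(frame_labels) - FRAMES_PER_SEGMENT + 1,
                   FRAMES_PER_SEGMENT):
        window = frame_labels[i:i + FRAMES_PER_SEGMENT]
        out.append(Counter(window).most_common(1)[0][0])
    return out
```

Working at segment granularity rather than frame granularity is what gives the efficiency gain the description mentions.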
Step 12: divide the audio fragments classified as speech into short sentences, and determine the start time and end time of each short sentence. Through the classification in step 11, the start times and end times of speech, non-speech, and silence can be determined, and the speech can then be divided into short sentences according to speech pauses.
Detecting the beginning and end of a sentence is the key to short sentence division in this embodiment: only with sufficiently high endpoint detection accuracy can sentence length and sentence count be controlled as intended. Based on the classification information obtained in step 11, this step cuts speech units (i.e., short sentences) out of the audio with a preset segmentation algorithm. Specifically, the following strategy can be used to cut the audio: the time point at which the silence segment or non-speech segment preceding a continuous speech segment ends is taken as the start time of a sentence, and the time point of the last speech segment at the end of a continuous speech segment is taken as the end time of the sentence. Cutting the audio with speech pauses of a certain length as sentence boundaries thus yields semantically relatively complete "sentence-like" units, i.e., the short sentences of this embodiment.
However, detecting sentence endpoints with the above cutting strategy can produce some extreme cases: for example, some extremely short sentences, only one or two audio segments long, may be marked off. Such sentences usually contain only one or two words, or even no valid speech information at all; they must therefore be filtered out and cannot serve as valid sentences for subtitle display.
To improve cutting efficiency, the cutting strategy requires a speech pause to comprise at least a first preset number of audio segments; preferably, the first preset number is, for example, 2 audio segments. By setting a minimum length for speech pauses, shorter sound events, such as a speaker's momentary intake of breath, can be ignored, thereby protecting the integrity of a sentence.
Further, a cut short sentence comprises at least a second preset number of audio segments; preferably, the second preset number can be, for example, 3 audio segments, i.e., speech units with a total length of less than 0.48 seconds are ignored. By limiting the minimum length of a sentence, transient invalid information in the audio, such as a speaker's cough, can be filtered out.
It should be understood that this embodiment does not limit the specific values of the first preset number or the second preset number; in practical applications, they can be adjusted according to the characteristics of the language so as to determine the start time and end time of a sentence unit more accurately and reliably.
Through step 12, the extracted audio is cut into mutually independent sentences, and the start time and end time of each sentence are obtained, from which the playback duration of the sentence can be determined.
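The cutting strategy of step 12 can be sketched as follows, using the preferred values (pauses of at least 2 segments, sentences of at least 3 segments, 0.16 s per segment). The representation of labels and the return value as segment indices are illustrative assumptions.

```python
SEGMENT_SEC = 0.16   # one audio segment, as in step 11
MIN_PAUSE = 2        # first preset number: a pause spans >= 2 segments
MIN_SENTENCE = 3     # second preset number: a sentence spans >= 3 segments

def split_sentences(labels):
    """Cut a list of per-segment labels into short sentences.

    Returns (start, end) segment indices (end exclusive); multiply by
    SEGMENT_SEC for times in seconds. Pauses shorter than MIN_PAUSE are
    bridged, and sentences shorter than MIN_SENTENCE are filtered out."""
    sentences = []
    start, gap = None, 0
    for i, lab in enumerate(labels + ["silence"] * MIN_PAUSE):  # flush tail
        if lab == "speech":
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= MIN_PAUSE:       # a real pause: close the sentence
                end = i - gap + 1      # index just past the last speech segment
                if end - start >= MIN_SENTENCE:
                    sentences.append((start, end))
                start, gap = None, 0
    return sentences
```

For example, a one-segment pause inside a sentence is bridged, while a trailing two-segment run (under 0.48 s) is discarded as invalid.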
Step 13: search for a match item in the plug-in subtitle file according to the determined start time and end time of the short sentence.
In general, a plug-in subtitle file includes information such as the start time and dialogue duration of each item. In this embodiment, the plug-in subtitle file is obtained at playback time, and a <start time, dialogue duration> data structure, datastruct1, is created from it, so that the start time and dialogue duration of each line of dialogue can be looked up conveniently. Then, according to the start time and end time of a short sentence (i.e., a line of dialogue in the video) divided off in step 12, a match item is searched for in datastruct1.
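Building datastruct1 might look like the following sketch. The patent does not fix a subtitle format, so SRT-style timing lines are an assumption here, and only the timing information is kept.

```python
import re

# matches SRT-style timing lines such as "00:00:01,000 --> 00:00:04,000"
TIMING = re.compile(r"(\d+):(\d+):(\d+)[,.](\d+)\s*-->\s*"
                    r"(\d+):(\d+):(\d+)[,.](\d+)")

def _seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0

def load_datastruct1(subtitle_text):
    """Build the <start time, dialogue duration> list the description
    calls datastruct1. A real implementation would also keep the
    dialogue text and support the other plug-in subtitle formats."""
    items = []
    for m in TIMING.finditer(subtitle_text):
        start = _seconds(*m.groups()[:4])
        end = _seconds(*m.groups()[4:])
        items.append({"start": start, "duration": end - start})
    return items
```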
Specifically, step 13 comprises the following sub-steps.
Sub-step 130: within a preset duration before and after the start time, find corresponding items in the plug-in subtitle file.
Ideally, the start time and end time of each line of dialogue in the audio portion (similar to the short sentences of this embodiment) are synchronized with the start time and end time of the corresponding dialogue in the subtitle file (i.e., the corresponding items of this embodiment). Because of the way subtitle files are produced in the prior art, however, the start times and end times of the corresponding items in the subtitle file deviate from those of the dialogue in the audio portion. This step therefore needs to find corresponding items in the plug-in subtitles within a preset duration (i.e., the largest likely difference between the start time of a corresponding subtitle item and the start time of the audio dialogue). The preset duration in this embodiment can be 1 minute, i.e., corresponding items are searched for in the plug-in subtitles within 1 minute before and after the start time of the short sentence extracted from the video file. It should be understood that the preset duration can be set according to the actual characteristics of the subtitle file; this embodiment does not limit its specific size.
Sub-step 131: among the corresponding items found, find all items whose dialogue duration is within the allowed error range of the dialogue duration of the short sentence.
For example, within 1 minute before and after the start time of the short sentence, datastruct1 is searched for all items whose dialogue duration is within an error of 3 seconds of that of the short sentence. For instance, if the dialogue duration of the short sentence is 4 seconds and, within the 1-minute window, 3 subtitle items with dialogue durations between 2.5 and 5.5 seconds are found, these 3 corresponding items are extracted. It should be understood that the specific values of the allowed error range in this embodiment are given merely for ease of understanding and do not limit the scope of the present invention.
Sub-step 132: judge whether more than one item has been found. If exactly one item has been found, that corresponding item is taken as the match item of the corresponding audio, and step 14 is executed; if more than one item has been found, the closest match must be screened out further, so sub-step 133 is executed.
Sub-step 133: compare the previous record of the determined short sentence with the previous record of each found item, until the most similar one is found as the match item.
An example is given here. As shown in Fig. 2, suppose that in sub-step 131 short sentence P finds 3 subtitle items in datastruct1 (subtitle item A, subtitle item B, and subtitle item C). Then the previous record of short sentence P, short sentence P-1, is matched against the previous subtitle items of A, B, and C, namely subtitle items A-1, B-1, and C-1; the matching algorithm can compare start times, dialogue durations, and so on. If short sentence P-1 still matches 2 or more subtitle items, the previous record of short sentence P-1, short sentence P-2, is matched against the previous records of the remaining subtitle items, and so on, until the subtitle item matching the short sentence is found.
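Sub-steps 130-133 can be sketched together as follows. The window of 1 minute and the 3-second duration tolerance are the example values from the description; comparing durations of the previous records is one concrete choice of the "most similar" criterion, which the description leaves open.

```python
SEARCH_WINDOW = 60.0  # preset duration: +/- 1 minute around the sentence start
DURATION_TOL = 3.0    # allowed error on the dialogue duration, in seconds

def find_match(sentences, idx, items):
    """Locate the subtitle item matching sentences[idx].

    sentences: list of (start, duration) pairs from the audio cutting
    items:     datastruct1-style list of {"start", "duration"} dicts
    Filters by time window, then by dialogue duration; while several
    candidates remain, steps back one record at a time and keeps the
    candidates whose previous items best match the previous sentences."""
    s_start, s_dur = sentences[idx]
    candidates = [i for i, it in enumerate(items)
                  if abs(it["start"] - s_start) <= SEARCH_WINDOW
                  and abs(it["duration"] - s_dur) <= DURATION_TOL]
    back = 0
    while len(candidates) > 1 and idx - back - 1 >= 0:
        back += 1
        prev_dur = sentences[idx - back][1]
        # score each candidate by how well its `back`-th previous item
        # matches the `back`-th previous sentence
        scores = {c: abs(items[c - back]["duration"] - prev_dur)
                  for c in candidates if c - back >= 0}
        if not scores:
            break
        best = min(scores.values())
        candidates = [c for c in scores if scores[c] == best]
    return items[candidates[0]] if candidates else None
```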
Step 14: change the start time of the match item to the presentation time stamp (PTS) of the current video and, according to the presentation time stamp, update the start time of each item in the plug-in subtitle file whose start time is greater than that of the match item.
Specifically, the start time of the match item is first changed to the presentation time stamp (PTS) of the current video. The start time of each item in the plug-in subtitle file whose start time is greater than that of the match item can then be updated by the following formula:
start time 2 = start time 1 − (item.start time − video.pts)
where item.start time is the start time of the current match item, and video.pts is the time of the current video frame, so that (item.start time − video.pts) represents the time difference between the current match item and the audio and video. Start time 1 is the start time of a subtitle item in datastruct1 before correction, and start time 2 is its start time after correction.
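Applying the correction formula might look like the following sketch. Items are shifted in place; using `>=` also moves the match item itself onto the video PTS in the same pass.

```python
def resync(items, match_index, video_pts):
    """Apply start time 2 = start time 1 - (item.start - video.pts).

    The matched item's start time becomes the current video PTS, and
    every item starting at or after it is shifted by the same offset;
    items before the match are left unchanged."""
    match_start = items[match_index]["start"]
    offset = match_start - video_pts      # item.start time - video.pts
    for it in items:
        if it["start"] >= match_start:
            it["start"] -= offset
    return items
```

For example, if the matched subtitle item starts at 22.0 s while the video PTS is 20.0 s, every later item is moved 2.0 s earlier.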
This embodiment can be embedded in playback software. During video playback, the method of this embodiment is executed at the start of playback and at preset intervals thereafter (for example, every 10 minutes): audio data of a certain length is acquired and decoded to obtain pulse code modulation data; this audio data is classified and cut into short sentences; the match item of a short sentence is found in the subtitle file; and the start times of the match item and of all subtitles displayed after it are then updated. Alternatively, the start times of all lines of dialogue in the audio data can be matched, so that the plug-in subtitles are fully synchronized with the audio and video, achieving a better viewing effect.
Compared with the prior art, this embodiment extracts the audio portion of a video file and decodes it to obtain pulse code modulation data, so that the speech information in the audio can be analyzed. The pulse code modulation data is then cut into audio fragments, which can be classified as speech, silence, or non-speech by analysis; the fragments classified as speech can further be divided into short sentences, whose start times and end times are determined together with the presentation time stamp (PTS) of the current video frame. According to the determined start time and end time of a short sentence, a match item is searched for in the plug-in subtitle file, so that the start time of the match item can be changed to the PTS of the current video and, according to the presentation time stamp, the start time of each item in the plug-in subtitle file whose start time is greater than that of the match item can be updated. Through the above steps, this embodiment can automatically correct the display time of subtitle items according to the dialogue, making subtitle display consistent with audio and video playback, so that the plug-in subtitles synchronize automatically with the audio and video, achieving a better viewing effect and improving the user experience.
The division of the steps of the above methods is made merely for clarity of description. In implementation, steps may be combined into one step, or a step may be split into multiple steps; as long as the same logical relationship is preserved, such variations fall within the scope of protection of this patent. Adding insignificant modifications to the algorithm or flow, or introducing insignificant designs, without changing the core design of the algorithm and flow, also falls within the scope of protection of this patent.
The second embodiment of the present invention relates to a device for automatic synchronization of plug-in subtitles. As shown in Fig. 3, it comprises an extraction module, a cutting module, a division module, a search module, and an update module.
The extraction module is configured to extract the audio portion of a video file and decode the audio portion to obtain pulse code modulation data.
The cutting module is configured to cut the pulse code modulation data into audio fragments and classify the audio fragments, wherein the classification categories comprise silence, speech, and non-speech.
The division module is configured to divide the audio fragments classified as speech into short sentences and determine the start time and end time of each short sentence. Specifically, the division module performs short sentence division according to speech pauses, wherein a speech pause comprises at least a first preset number of audio segments, and the audio fragments are divided into short sentences comprising at least a second preset number of audio segments. Here, the first preset number of audio segments defines the minimum length of a speech pause, and the second preset number of audio segments defines the minimum length of a short sentence. It should be understood that the first preset number and the second preset number are set according to the characteristics of the audio data and the subtitle file; this embodiment does not limit their specific values.
The search module further comprises a start matching sub-module, a dialogue matching sub-module, and a comparison matching sub-module. The start matching sub-module is configured to find corresponding items in the plug-in subtitle file within a preset duration before and after the start time, so as to search for a match item in the plug-in subtitle file according to the determined start time and end time of the short sentence. The dialogue matching sub-module is configured to find, among the corresponding items found by the start matching sub-module, all items whose dialogue duration is within the allowed error range of the dialogue duration of the short sentence. The comparison matching sub-module is configured to, when the dialogue matching sub-module finds more than one item, compare the previous record of the determined short sentence with the previous record of each found item until the most similar one is found as the match item.
The update module is configured to change the start time of the match item to the presentation time stamp (PTS) of the current video and, according to the presentation time stamp, update the start time of each item in the plug-in subtitle file whose start time is greater than that of the match item.
Compared with the prior art, this embodiment extracts the audio data from a video file, classifies the audio data, and cuts it into sentences, thereby obtaining accurate sentence start times and end times; it then finds the corresponding match items in the subtitle file and modifies the start times of the match items accordingly, so that the subtitle file is synchronized with the audio and video. This embodiment therefore enables the plug-in subtitles to synchronize automatically with the audio and video without manual adjustment by the user, achieving a better viewing effect and improving the user experience.
It can be seen that this embodiment is the device embodiment corresponding to the first embodiment, and can be implemented in cooperation with the first embodiment. The relevant technical details mentioned in the first embodiment remain valid in this embodiment and, to reduce repetition, are not described again here. Correspondingly, the relevant technical details mentioned in this embodiment also apply to the first embodiment.
It is worth mentioning that each module involved in this embodiment is a logic module. In practical applications, a logical unit may be a physical unit, a part of a physical unit, or a combination of multiple physical units. In addition, to highlight the innovative parts of the present invention, units not closely related to solving the technical problem proposed by the present invention are not introduced in this embodiment, but this does not mean that no other units exist in this embodiment.
Those skilled in the art will understand that the above embodiments are specific embodiments for realizing the present invention, and that in practical applications various changes in form and detail may be made to them without departing from the spirit and scope of the present invention.

Claims (11)

1. An automatic synchronization method for plug-in subtitles, characterized in that it comprises the following steps:
extracting the audio portion of a video file and decoding the audio portion to obtain pulse code modulation (PCM) data;
cutting the pulse code modulation data into audio segments and classifying the audio segments, wherein the classification categories comprise: silence, speech, and non-speech;
dividing the audio segments classified as speech into short sentences, and determining the start time and end time of each short sentence;
searching a plug-in subtitle file for a matching entry according to the determined start time and end time of the short sentence;
changing the start time of the matching entry to the presentation time stamp (PTS) of the current video, and, according to the presentation time stamp, updating the start time of every entry in the plug-in subtitle file whose start time is later than the start time of the matching entry.
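The final step of claim 1 can be illustrated with a short sketch. The entry format (dicts with start/end/text in seconds) and the function name are assumptions for illustration; applying the same offset to every later entry is one plausible reading of "updating according to the presentation time stamp", not necessarily the patent's exact rule.

```python
def resync(entries, match_index, current_pts):
    """Shift a subtitle track so the matched entry starts at current_pts.

    entries: list of dicts with 'start', 'end', 'text' (times in seconds).
    The matched entry and every entry starting at or after it are shifted
    by the same offset; earlier entries are left untouched.
    """
    old_match_start = entries[match_index]["start"]
    offset = current_pts - old_match_start
    resynced = []
    for e in entries:
        if e["start"] >= old_match_start:
            resynced.append({"start": e["start"] + offset,
                             "end": e["end"] + offset,
                             "text": e["text"]})
        else:
            resynced.append(dict(e))
    return resynced
```

A usage example: if the third entry of a drifting track is heard at PTS 7.0 s but its file time is 5.0 s, `resync(subs, 2, 7.0)` moves it and everything after it forward by 2.0 s.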
2. The automatic synchronization method for plug-in subtitles according to claim 1, characterized in that the step of searching a plug-in subtitle file for a matching entry according to the determined start time and end time of the short sentence comprises the following sub-steps:
finding candidate entries in the plug-in subtitle file within a preset duration before and after the start time;
among the candidate entries found, selecting all entries whose dialogue duration is within an allowed error range of the duration of the short sentence;
if more than one entry is selected, comparing the previous record of the determined short sentence with the previous record of each selected entry until the most similar entry is found and taken as the matching entry.
3. The automatic synchronization method for plug-in subtitles according to claim 1 or 2, characterized in that in the step of dividing the audio segments into short sentences, the division is performed according to speech pauses;
wherein a speech pause comprises at least a first preset number of audio sections.
4. The automatic synchronization method for plug-in subtitles according to claim 3, characterized in that the first preset number is 2.
5. The automatic synchronization method for plug-in subtitles according to claim 3, characterized in that a short sentence comprises at least a second preset number of audio sections.
6. The automatic synchronization method for plug-in subtitles according to claim 5, characterized in that the second preset number is 3.
7. The automatic synchronization method for plug-in subtitles according to claim 1, characterized in that
in the step of determining the start time and end time of the short sentence, the time point of the silent or non-speech section immediately preceding a continuous speech section is taken as the start time of the sentence, and the time point of the last speech section that ends the continuous speech section is taken as the end time of the sentence.
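The boundary rule of claim 7, together with the minimum pause and sentence lengths of claims 3 to 6, can be sketched as follows. The label list and fixed segment duration come from the earlier classification step; the exact time-point convention (start of the preceding segment, end of the last speech segment) is an assumption for the example.

```python
def find_sentences(labels, seg_dur, min_pause=2, min_speech=3):
    """Return (start_time, end_time) pairs from a list of segment labels.

    A run of 'speech' segments ends once min_pause consecutive non-speech
    segments follow it (claim 4: pause >= 2 segments); runs shorter than
    min_speech segments are discarded (claim 6: sentence >= 3 segments).
    """
    sentences = []
    run_start = None  # index of the first speech segment of the current run
    pause = 0         # consecutive silence/non-speech segments seen so far
    # Pad with silence so a run at the very end is still closed off.
    for i, label in enumerate(labels + ["silence"] * min_pause):
        if label == "speech":
            if run_start is None:
                run_start = i
            pause = 0
        else:
            pause += 1
            if run_start is not None and pause >= min_pause:
                run_end = i - pause  # index of the run's last speech segment
                if run_end - run_start + 1 >= min_speech:
                    # Claim 7: start at the segment just before the run,
                    # end at the end of the run's last speech segment.
                    sentences.append((max(run_start - 1, 0) * seg_dur,
                                      (run_end + 1) * seg_dur))
                run_start = None
    return sentences
```

For example, with 1-second segments, `["silence", "speech", "speech", "speech", "silence", "silence"]` yields one sentence spanning 0 to 4 seconds, while a two-segment speech run is rejected as too short.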
8. An automatic synchronization apparatus for plug-in subtitles, characterized in that it comprises: an extraction module, a cutting module, a division module, a search module, and an update module;
the extraction module is configured to extract the audio portion of a video file and decode the audio portion to obtain pulse code modulation data;
the cutting module is configured to cut the pulse code modulation data into audio segments and classify the audio segments, wherein the classification categories comprise: silence, speech, and non-speech;
the division module is configured to divide the audio segments classified as speech into short sentences and to determine the start time and end time of each short sentence;
the search module is configured to search a plug-in subtitle file for a matching entry according to the determined start time and end time of the short sentence;
the update module is configured to change the start time of the matching entry to the presentation time stamp (PTS) of the current video and, according to the presentation time stamp, to update the start time of every entry in the plug-in subtitle file whose start time is later than the start time of the matching entry.
9. The automatic synchronization apparatus for plug-in subtitles according to claim 8, characterized in that the search module comprises: an initial matching sub-module, a dialogue matching sub-module, and a comparison matching sub-module;
the initial matching sub-module is configured to find candidate entries in the plug-in subtitle file within a preset duration before and after the start time;
the dialogue matching sub-module is configured to select, among the candidate entries found by the initial matching sub-module, all entries whose dialogue duration is within an allowed error range of the duration of the short sentence;
the comparison matching sub-module is configured to, when more than one entry is selected by the dialogue matching sub-module, compare the previous record of the determined short sentence with the previous record of each selected entry until the most similar entry is found and taken as the matching entry.
10. The automatic synchronization apparatus for plug-in subtitles according to claim 8 or 9, characterized in that the division module is further configured to perform the division according to speech pauses;
wherein a speech pause comprises at least a first preset number of audio sections.
11. The automatic synchronization apparatus for plug-in subtitles according to claim 10, characterized in that the division module is further configured to divide the audio segments into short sentences each comprising at least a second preset number of audio sections.
CN201511018280.XA 2015-12-30 2015-12-30 Plug-in subtitle automatic synchronization method and device Pending CN105898556A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201511018280.XA CN105898556A (en) 2015-12-30 2015-12-30 Plug-in subtitle automatic synchronization method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201511018280.XA CN105898556A (en) 2015-12-30 2015-12-30 Plug-in subtitle automatic synchronization method and device

Publications (1)

Publication Number Publication Date
CN105898556A true CN105898556A (en) 2016-08-24

Family

ID=57002208

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201511018280.XA Pending CN105898556A (en) 2015-12-30 2015-12-30 Plug-in subtitle automatic synchronization method and device

Country Status (1)

Country Link
CN (1) CN105898556A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101021854A (en) * 2006-10-11 2007-08-22 鲍东山 Audio analysis system based on content
US20090213924A1 (en) * 2008-02-22 2009-08-27 Sheng-Nan Sun Method and Related Device for Converting Transport Stream into File
CN103647909A (en) * 2013-12-16 2014-03-19 宇龙计算机通信科技(深圳)有限公司 Caption adjusting method and caption adjusting device

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106504773A (en) * 2016-11-08 2017-03-15 上海贝生医疗设备有限公司 Wearable device and voice and activity monitoring system
CN109413475A (en) * 2017-05-09 2019-03-01 北京嘀嘀无限科技发展有限公司 Method, device and server for adjusting subtitles in a video
CN109005444A (en) * 2017-06-07 2018-12-14 纳宝株式会社 Content providing server, content providing terminal and content providing method
CN107562737A (en) * 2017-09-05 2018-01-09 语联网(武汉)信息技术有限公司 Video segmentation method and system for translation
CN107402530A (en) * 2017-09-20 2017-11-28 淮安市维达科技有限公司 Computer control system using line captions as the core for coordinated linkage of stage equipment
CN108305636A (en) * 2017-11-06 2018-07-20 腾讯科技(深圳)有限公司 Audio file processing method and device
WO2019086044A1 (en) * 2017-11-06 2019-05-09 腾讯科技(深圳)有限公司 Audio file processing method, electronic device and storage medium
US11538456B2 (en) 2017-11-06 2022-12-27 Tencent Technology (Shenzhen) Company Limited Audio file processing method, electronic device, and storage medium
CN108924664B (en) * 2018-07-26 2021-06-08 海信视像科技股份有限公司 Synchronous display method and terminal for program subtitles
CN108924664A (en) * 2018-07-26 2018-11-30 青岛海信电器股份有限公司 Synchronous display method and terminal for program subtitles
CN110781649A (en) * 2019-10-30 2020-02-11 中央电视台 Subtitle editing method and device, computer storage medium and electronic equipment
CN110781649B (en) * 2019-10-30 2023-09-15 中央电视台 Subtitle editing method and device, computer storage medium and electronic equipment
CN111050201B (en) * 2019-12-10 2022-06-14 Oppo广东移动通信有限公司 Data processing method and device, electronic equipment and storage medium
CN111050201A (en) * 2019-12-10 2020-04-21 Oppo广东移动通信有限公司 Data processing method and device, electronic equipment and storage medium
WO2023015416A1 (en) * 2021-08-09 2023-02-16 深圳Tcl新技术有限公司 Subtitle processing method and apparatus, and storage medium
CN113992940A (en) * 2021-12-27 2022-01-28 北京美摄网络科技有限公司 Web end character video editing method, system, electronic equipment and storage medium
CN113992940B (en) * 2021-12-27 2022-03-29 北京美摄网络科技有限公司 Web end character video editing method, system, electronic equipment and storage medium
CN114640874A (en) * 2022-03-09 2022-06-17 湖南国科微电子股份有限公司 Subtitle synchronization method and device, set top box and computer readable storage medium
WO2023169240A1 (en) * 2022-03-09 2023-09-14 湖南国科微电子股份有限公司 Subtitle synchronization method and apparatus, set-top box and computer readable storage medium

Similar Documents

Publication Publication Date Title
CN105898556A (en) Plug-in subtitle automatic synchronization method and device
CN108780643B (en) Automatic dubbing method and device
US8281231B2 (en) Timeline alignment for closed-caption text using speech recognition transcripts
US8179475B2 (en) Apparatus and method for synchronizing a secondary audio track to the audio track of a video source
US20080219641A1 (en) Apparatus and method for synchronizing a secondary audio track to the audio track of a video source
CN106792145A (en) Method and apparatus for automatically superimposing subtitles on audio and video
US8958013B2 (en) Aligning video clips to closed caption files
US9609397B1 (en) Automatic synchronization of subtitles based on audio fingerprinting
US8564721B1 (en) Timeline alignment and coordination for closed-caption text using speech recognition transcripts
US20200126559A1 (en) Creating multi-media from transcript-aligned media recordings
CN105635782A (en) Subtitle output method and device
KR20150057591A (en) Method and apparatus for controlling playing video
CN106162293B (en) Method and device for synchronizing video sound and image
US11064245B1 (en) Piecewise hybrid video and audio synchronization
US20210151082A1 (en) Systems and methods for mixing synthetic voice with original audio tracks
US10692497B1 (en) Synchronized captioning system and methods for synchronizing captioning with scripted live performances
KR102308651B1 (en) Media environment-oriented content distribution platform
WO2017062961A1 (en) Methods and systems for interactive multimedia creation
Federico et al. An automatic caption alignment mechanism for off-the-shelf speech recognition technologies
US9905221B2 (en) Automatic generation of a database for speech recognition from video captions
CN109963092B (en) Subtitle processing method and device and terminal
EP3839953A1 (en) Automatic caption synchronization and positioning
CN106162323A (en) Video data processing method and device
CN112714348A (en) Intelligent audio and video synchronization method
CN103152607B (en) Ultra-fast rough-cut method for video

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20160824)