CN105898556A - Plug-in subtitle automatic synchronization method and device - Google Patents
- Publication number
- CN105898556A CN105898556A CN201511018280.XA CN201511018280A CN105898556A CN 105898556 A CN105898556 A CN 105898556A CN 201511018280 A CN201511018280 A CN 201511018280A CN 105898556 A CN105898556 A CN 105898556A
- Authority
- CN
- China
- Prior art keywords
- plug
- time
- audio
- initial time
- short sentence
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/4302—Content synchronisation processes, e.g. decoder synchronisation
- H04N21/4307—Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
- H04N21/4394—Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/488—Data services, e.g. news ticker
- H04N21/4884—Data services, e.g. news ticker for displaying subtitles
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
The invention relates to the technical field of video playback and discloses a method and device for automatic synchronization of plug-in (external) subtitles. The method comprises the following steps: extracting the audio portion of a video file and decoding it to obtain pulse-code modulation data; cutting the pulse-code modulation data into audio clips and classifying the clips; dividing the clips classified as speech into short sentences and determining the start time and end time of each short sentence; searching the plug-in subtitle file for a match item according to the determined start and end times; changing the start time of the match item to the presentation time stamp (PTS) of the current video and, according to the presentation time stamp, updating the start time of every item in the plug-in subtitle file whose start time is later than that of the match item. The display time of the subtitle file is thus made consistent with the playback time of the audio and video, realizing automatic synchronization of plug-in subtitles and improving the user's viewing experience.
Description
Technical field
The present invention relates to the field of video playback, and in particular to a method and device for automatic synchronization of plug-in subtitles.
Background art
Subtitles refer to non-visual content, such as dialogue, displayed in written form in television programs, films and stage works, and also to the text added in the post-production of film and television works. When a video work such as a film is produced, the video file and the subtitle file may be integrated so that the subtitles cannot be changed or removed during playback; such subtitles are referred to as embedded subtitles. In other works, the video file and the subtitle file exist independently of each other, and a subtitle file of the desired version can be imported at playback time; such subtitle files are referred to as plug-in (external) subtitles. Compared with embedded subtitles, plug-in subtitles are versatile and flexible, convenient to import, and do not compromise video quality.
Plug-in subtitles are typically produced with dedicated subtitling software. This production method first requires a person to listen to the complete dialogue and, line by line, type the subtitle text into an electronic document. The subtitling software is then used to insert break points manually while listening to the dialogue, so as to determine the start time and duration of each line, i.e., the so-called "time axis". When the subtitles are complete, the software exports a plug-in subtitle file in one or more formats. When a playback system can recognize and support plug-in subtitles, the subtitle file can be loaded during video playback. However, because of the way plug-in subtitle files are produced, their time markers are of poor accuracy, resulting in poor synchronization with the audio and video during playback; manually adjusting the subtitle display time is cumbersome for the user and seriously affects normal viewing.
Summary of the invention
An object of the present invention is to provide a method and device for automatic synchronization of plug-in subtitles, so that the display time of the subtitle file is consistent with the playback time of the audio and video, thereby realizing automatic synchronization of plug-in subtitles and improving the user's viewing experience.
To solve the above technical problem, embodiments of the present invention provide an automatic synchronization method for plug-in subtitles, comprising the following steps: extracting the audio portion of a video file and decoding it to obtain pulse-code modulation data; cutting the pulse-code modulation data into audio clips and classifying the clips, the classification categories comprising silence, speech and non-speech; dividing the clips classified as speech into short sentences and determining the start time and end time of each short sentence; searching the plug-in subtitle file for a match item according to the determined start and end times; and changing the start time of the match item to the presentation time stamp (PTS) of the current video and, according to the presentation time stamp, updating the start time of every item in the plug-in subtitle file whose start time is later than that of the match item.
Embodiments of the present invention also provide an automatic synchronization device for plug-in subtitles, comprising an extraction module, a cutting module, a division module, a search module and an update module. The extraction module extracts the audio portion of a video file and decodes it to obtain pulse-code modulation data. The cutting module cuts the pulse-code modulation data into audio clips and classifies them, the classification categories comprising silence, speech and non-speech. The division module divides the clips classified as speech into short sentences and determines the start time and end time of each short sentence. The search module searches the plug-in subtitle file for a match item according to the determined start and end times. The update module changes the start time of the match item to the presentation time stamp (PTS) of the current video and, according to the presentation time stamp, updates the start time of every item in the plug-in subtitle file whose start time is later than that of the match item.
Compared with the prior art, embodiments of the present invention extract the audio portion of a video file, decode it into pulse-code modulation data, cut the data into audio clips, classify the clips as speech, silence or non-speech, divide the clips classified as speech into short sentences and determine their start and end times, then search the plug-in subtitle file for a match item according to those times, change the start time of the match item to the presentation time stamp (PTS) of the current video and, according to the presentation time stamp, update the start time of every item in the subtitle file whose start time is later than that of the match item. The display time of the subtitle dialogue is thereby automatically synchronized with video playback, improving the user's viewing experience.
Preferably, the step of searching the plug-in subtitle file for a match item according to the determined start and end times comprises the following sub-steps: finding candidate items in the plug-in subtitle file within a preset duration before and after the start time; among the candidates found, selecting all items whose dialogue duration is within an allowed error range of the short sentence's duration; and, if more than one item is selected, comparing the record preceding the short sentence with the record preceding each selected item until the most similar one is found as the match item. This improves the efficiency and accuracy of matching subtitles to the audio and video.
Preferably, in the step of dividing the audio clips into short sentences, the division is performed according to speech pauses, a speech pause comprising at least a first preset number of audio sections. This improves the efficiency of sentence division. Preferably, the first preset number is 2, so that very short silences can be ignored and the integrity of a sentence is better protected. Preferably, a short sentence comprises at least a second preset number of audio sections, and the second preset number is 3, so that brief invalid information in the audio can be filtered out and the efficiency of sentence division improved.
Brief description of the drawings
Fig. 1 is a flow chart of the automatic synchronization method for plug-in subtitles according to the first embodiment of the invention;
Fig. 2 is a schematic diagram of the algorithm for matching short sentences to subtitle items according to the first embodiment of the invention;
Fig. 3 is a structural block diagram of the automatic synchronization device for plug-in subtitles according to the second embodiment of the invention.
Detailed description of the invention
To make the objects, technical solutions and advantages of the present invention clearer, the embodiments of the invention are explained in detail below with reference to the accompanying drawings. Those skilled in the art will understand that many technical details are set forth in the embodiments to help the reader better understand the application; however, the technical solutions claimed in the claims can be realized even without these details, and with various changes and modifications based on the following embodiments.
The first embodiment of the present invention relates to an automatic synchronization method for plug-in subtitles, the flow of which is shown in Fig. 1 and comprises the following steps.

Step 10: extract the audio portion of the video file and decode it to obtain pulse-code modulation data.
A video file is a synthesis of a video stream and an audio stream. When the video is played online, the audio stream is first extracted from the video file. The open-source library ffmpeg can be used to extract the audio portion of the video file, which is then decoded by the corresponding decoder into PCM (pulse-code modulation) data.
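The extraction-and-decode step can be sketched as follows. This is a minimal Python sketch assuming the ffmpeg command-line tool is available (the text names only the ffmpeg library, not this exact invocation) and that 16 kHz mono 16-bit output is acceptable; file names and parameters are illustrative.

```python
import subprocess

def build_ffmpeg_cmd(video_path, sample_rate=16000):
    """Assemble an ffmpeg invocation that writes the audio track of the
    video as raw 16-bit mono PCM to stdout."""
    return [
        "ffmpeg", "-i", video_path,  # input video file
        "-vn",                       # drop the video stream
        "-f", "s16le",               # raw signed 16-bit little-endian PCM
        "-acodec", "pcm_s16le",
        "-ac", "1",                  # mono
        "-ar", str(sample_rate),     # output sample rate
        "-",                         # write to stdout
    ]

def extract_pcm(video_path, sample_rate=16000):
    """Run ffmpeg and return the decoded PCM bytes."""
    out = subprocess.run(build_ffmpeg_cmd(video_path, sample_rate),
                         capture_output=True, check=True)
    return out.stdout
```

In practice a player would more likely link against libavformat/libavcodec directly rather than spawn a process; the subprocess form is only the shortest way to show the data flow.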
Step 11: cut the pulse-code modulation data into audio clips and classify the clips.

In the present embodiment, the Marsyas software can be used to classify the extracted audio (the pulse-code modulation data); through Marsyas, the category of the audio data can be determined: silence, speech or non-speech. The audio frame length can be set to 32 ms through the interface provided by Marsyas, and every 5 audio frames form one audio section, i.e., an audio section is 0.16 s long. Classifying one audio section at a time during the classification process improves classification efficiency. The present embodiment does not limit the classification method, as long as speech and non-speech can be distinguished. Through the classification of this step, the start and end times of the speech fragments within the audio clips are obtained, laying the foundation for extracting speech sentences from the audio clips.
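A toy illustration of the per-section labelling described above: the RMS-energy thresholds below stand in for the Marsyas classifier the text actually uses, so the labels are only plausible, not the patent's method; the frame and section lengths follow the 32 ms / 5-frame setup.

```python
from array import array

FRAME_MS = 32           # frame length set via the Marsyas interface
FRAMES_PER_SECTION = 5  # 5 frames -> one 0.16 s audio section

def rms(samples):
    """Root-mean-square energy of a sequence of PCM samples."""
    if not samples:
        return 0.0
    return (sum(s * s for s in samples) / len(samples)) ** 0.5

def classify_sections(pcm, sample_rate=16000, silence_thr=500, speech_thr=2000):
    """Label each 0.16 s audio section as 'silence', 'speech' or 'non-speech'.

    Toy stand-in for a real classifier: pure energy thresholds cannot
    truly separate speech from other sounds, but show the interface.
    """
    samples = array("h", pcm)  # 16-bit signed PCM
    section_len = sample_rate * FRAME_MS * FRAMES_PER_SECTION // 1000
    labels = []
    for i in range(0, len(samples) - section_len + 1, section_len):
        energy = rms(samples[i:i + section_len])
        if energy < silence_thr:
            labels.append("silence")
        elif energy < speech_thr:
            labels.append("non-speech")
        else:
            labels.append("speech")
    return labels
```

Whatever classifier is used, its output per 0.16 s section is what the subsequent sentence-division step consumes.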
Step 12: divide the audio clips classified as speech into short sentences and determine the start time and end time of each short sentence. From the classification of step 11, the start and end times of speech, non-speech and silence can be determined, and the speech can then be divided into short sentences according to speech pauses.

Detecting the beginning and end of a sentence is the key to short-sentence division in the present embodiment: only with sufficiently high end-point detection accuracy can the length and number of sentences be controlled as intended. Based on the classification information obtained in step 11, a preset segmentation algorithm cuts speech units (i.e., short sentences) out of the audio. Specifically, the following cutting strategy can be used: the time point at which the silence or non-speech section preceding a continuous speech run ends is taken as the start time of a sentence, and the time point of the last speech section of the continuous run is taken as its end time. After cutting the audio with speech pauses of a certain length as segment boundaries, semantically relatively complete "sentence-like" units, i.e., the short sentences of the present embodiment, are obtained.
However, detecting sentence end points with the above cutting strategy can produce some extreme cases: for example, extremely short sentences, only one or two audio sections long, may be cut out. Such sentences generally contain only one or two words, or even no valid speech information at all, and must therefore be filtered out; they cannot serve as valid sentences for subtitle display.

To improve cutting efficiency, the cutting strategy requires a speech pause to comprise at least a first preset number of audio sections; preferably, the first preset number is, for example, 2 audio sections. By setting a minimum pause length, very short silences, such as a speaker's momentary breath, can be ignored, protecting the integrity of a sentence.

Further, a short sentence that is cut out must comprise at least a second preset number of audio sections; preferably, the second preset number is, for example, 3 audio sections, i.e., speech units shorter than 0.48 seconds in total are ignored. By limiting the minimum sentence length, brief invalid information in the audio, such as a speaker's cough, can be filtered out.
It should be appreciated that the present embodiment does not restrict the specific values of the first and second preset numbers; in practical applications they can be adjusted to the characteristics of the language so that the start and end times of sentence units are determined more accurately.

Through step 12, the extracted audio has been cut into relatively independent sentences whose start and end times have been obtained, from which the playback duration of each sentence can be determined.
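The pause-based cutting of step 12, including the two preset numbers (minimum pause of 2 sections, minimum sentence of 3 sections), can be sketched on top of the per-section labels; function and variable names are illustrative.

```python
SECTION_SEC = 0.16  # one audio section = 5 frames x 32 ms

def cut_sentences(labels, min_pause=2, min_sentence=3):
    """Group consecutive 'speech' sections into short sentences.

    A run of at least `min_pause` non-speech sections ends a sentence
    (first preset number), and sentences shorter than `min_sentence`
    sections are discarded as invalid (second preset number).
    Returns (start_time, end_time) pairs in seconds.
    """
    sentences = []
    start = None  # index of the first speech section of the current sentence
    pause = 0     # length of the current run of non-speech sections
    for i, label in enumerate(labels):
        if label == "speech":
            if start is None:
                start = i
            pause = 0
        elif start is not None:
            pause += 1
            if pause >= min_pause:
                end = i - pause + 1  # one past the last speech section
                if end - start >= min_sentence:
                    sentences.append((start * SECTION_SEC, end * SECTION_SEC))
                start, pause = None, 0
    if start is not None:  # a sentence still running at the end of the audio
        end = len(labels) - pause
        if end - start >= min_sentence:
            sentences.append((start * SECTION_SEC, end * SECTION_SEC))
    return sentences
```

Note how a single-section silence inside a sentence is absorbed (pause shorter than the first preset number), while a one-section speech burst between pauses is dropped (shorter than the second preset number).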
Step 13: search the plug-in subtitle file for a match item according to the determined start time and end time of the short sentence.

Generally, a plug-in subtitle file includes information such as the start time and dialogue duration of each item. In the present embodiment, the plug-in subtitle file is obtained at playback time and a <start time, dialogue duration> data structure, datastruct1, is created from it, so that the start time and dialogue duration of each line of dialogue can be looked up easily. A match item is then searched for in datastruct1 according to the start time and end time of the short sentence (i.e., a line of dialogue in the video) determined in step 12.
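Building datastruct1 might look as follows, assuming SRT-style timing lines; the text does not name a concrete subtitle format, so the parsing here is an assumption of this sketch.

```python
import re

TIME_RE = re.compile(
    r"(\d+):(\d+):(\d+)[,.](\d+)\s*-->\s*(\d+):(\d+):(\d+)[,.](\d+)"
)

def to_seconds(h, m, s, ms):
    """Convert an HH:MM:SS,mmm timestamp to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0

def build_datastruct1(subtitle_text):
    """Build the <start time, dialogue duration> list called datastruct1
    from SRT-style subtitle text."""
    items = []
    for m in TIME_RE.finditer(subtitle_text):
        start = to_seconds(*m.groups()[:4])
        end = to_seconds(*m.groups()[4:])
        items.append({"start": start, "duration": end - start})
    return items
```

Storing (start, duration) pairs rather than raw timestamps makes both the windowed search and the later time-shift correction straightforward.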
Specifically, step 13 comprises the following sub-steps.

Sub-step 130: find candidate items in the plug-in subtitle file within a preset duration before and after the start time.

Ideally, the start and end times of each line of dialogue in the audio portion (a short sentence in the present embodiment) coincide with the start and end times of the corresponding item in the subtitle file. Because of how subtitle files are produced in the prior art, however, the start and end times of items in the subtitle file deviate from the start and end times of the dialogue in the audio. This step therefore finds candidate items in the plug-in subtitles within a preset duration (the largest plausible difference between a subtitle item's start time and the dialogue's start time). In the present embodiment the preset duration can be 1 minute, i.e., candidates are sought in the plug-in subtitles within 1 minute before and after the start time of the short sentence extracted from the video file. It should be appreciated that the preset duration can be set according to the actual characteristics of the subtitle file, and the present embodiment does not restrict its specific value.
Sub-step 131: among the candidates found, select all items whose dialogue duration is within the allowed error range of the short sentence's duration.

For example, within 1 minute before and after the start time of the short sentence, datastruct1 is searched for all items whose dialogue duration is within an error of 3 seconds of the short sentence's. If the short sentence's dialogue lasts 4 seconds and, within that minute, 3 subtitle items with durations between 2.5 and 5.5 seconds are found, those 3 candidate items are extracted. It should be appreciated that the specific values of the error range are given merely for convenience of illustration and do not limit the scope of protection of the present invention.
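Sub-steps 130 and 131 amount to a windowed filter over datastruct1. A sketch with the 1-minute window and a 3-second duration tolerance taken from the examples above; items are assumed to be dicts with "start" and "duration" fields, and both thresholds are tunable.

```python
def find_candidates(items, start, duration, window=60.0, tol=3.0):
    """Keep subtitle items whose start time lies within `window` seconds
    of the detected sentence start and whose dialogue duration differs
    from the sentence's by at most `tol` seconds.

    Returns (index, item) pairs so callers can refer back to positions
    in the subtitle list.
    """
    return [
        (i, item) for i, item in enumerate(items)
        if abs(item["start"] - start) <= window
        and abs(item["duration"] - duration) <= tol
    ]
```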
Sub-step 132: determine whether more than one item has been selected. If exactly one item has been selected, it is taken as the match item for the corresponding audio, and step 14 is executed; if more than one item has been selected, the closest match must be screened out further, so sub-step 133 is executed.

Sub-step 133: compare the record preceding the short sentence with the record preceding each selected item until the most similar one is found as the match item.

An example is shown in Fig. 2: if, in sub-step 131, 3 subtitle items (subtitle items A, B and C) are found for short sentence P in datastruct1, then the preceding short sentence P-1 is matched against the preceding subtitle items A-1, B-1 and C-1 respectively; the matching algorithm may compare start times, dialogue durations, and so on. If 2 or more subtitle items still match for P-1, the next preceding record, short sentence P-2, is matched against the records preceding the remaining subtitle items, and so on, until the subtitle item matching the short sentence is found.
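The backtracking disambiguation of sub-step 133 can be sketched as follows. Only dialogue durations are compared here, one of the criteria the text mentions (start times could be compared as well), and the candidate/sentence bookkeeping is an assumption of this sketch.

```python
def pick_match(sentences, k, items, candidates, tol=3.0):
    """When several subtitle items match sentence k, step back through
    preceding sentence/item pairs until one candidate remains.

    `sentences` holds (start, end) pairs; `candidates` holds indices
    into `items` (subtitle dicts with a "duration" field).
    """
    depth = 1
    while len(candidates) > 1 and k - depth >= 0:
        prev_start, prev_end = sentences[k - depth]
        prev_dur = prev_end - prev_start
        survivors = [
            c for c in candidates
            if c - depth >= 0
            and abs(items[c - depth]["duration"] - prev_dur) <= tol
        ]
        if not survivors:  # comparison is inconclusive; stop narrowing
            break
        candidates = survivors
        depth += 1
    return candidates[0]  # index of the most similar subtitle item
```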
Step 14: change the start time of the match item to the presentation time stamp (PTS) of the current video and, according to the presentation time stamp, update the start time of every item in the plug-in subtitle file whose start time is later than that of the match item.

Specifically, the start time of the match item is first changed to the PTS (presentation time stamp) of the current video; the start time of every item in the plug-in subtitle file whose start time is later than that of the match item can then be updated by the following formula:

start time 2 = start time 1 - (item.start time - video.pts)

where item.start time is the start time of the current match item, video.pts is the time of the current video frame, and (item.start time - video.pts) therefore represents the time difference between the current match item and the audio/video. Start time 1 denotes the start time of a subtitle item in datastruct1 before correction, and start time 2 its start time after correction.
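Applying the update formula to datastruct1 can be sketched as follows; items are assumed to be a list of dicts with a mutable "start" field.

```python
def resync(items, match_idx, video_pts):
    """Shift the matched subtitle item to the current video PTS and apply
    the same correction, new start = old start - (item.start - video.pts),
    to every item that starts at or after the match."""
    anchor = items[match_idx]["start"]
    offset = anchor - video_pts  # (item.start time - video.pts)
    for item in items:
        if item["start"] >= anchor:
            item["start"] -= offset
    return items
```

The anchor start time is saved before any mutation, so the comparison stays stable while the list is updated in place.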
The present embodiment can be embedded in playback software. During video playback, the present embodiment is executed at the start of the video and at predetermined intervals thereafter (for example, every 10 minutes): audio data of a certain length is obtained and decoded into pulse-code modulation data, this audio data is classified and cut into short sentences, the match item of a short sentence is found in the subtitle file, and the match item and the start times of all subtitles after it are then updated. Alternatively, the start times of all lines of dialogue in the audio data can be matched, so that the plug-in subtitles are completely synchronized with the audio and video for an even better viewing effect.
Compared with the prior art, the present embodiment extracts the audio portion of a video file and decodes it into pulse-code modulation data so that the speech information in the audio can be analyzed; cuts the data into audio clips so that they can be classified as speech, silence or non-speech by analysis; further divides the clips classified as speech into short sentences and determines their start and end times; searches the plug-in subtitle file for a match item according to the determined start and end times; changes the start time of the match item to the presentation time stamp (PTS) of the current video; and, according to the presentation time stamp, updates the start time of every item in the plug-in subtitle file whose start time is later than that of the match item. Through these steps, the present embodiment corrects the display time of subtitle items automatically according to the dialogue, making subtitle display consistent with audio/video playback, so that the plug-in subtitles are automatically synchronized with the audio and video, a better viewing effect is achieved, and the user experience is improved.
The division of the above methods into steps is merely for clarity of description; in implementation, steps may be merged into one or a step may be split into several, and all such variants fall within the scope of protection of this patent as long as the same logical relationship is included. Adding insignificant modifications to an algorithm or flow, or introducing insignificant designs, without changing the core design of the algorithm and flow, also falls within the scope of protection of this patent.
The second embodiment of the invention relates to an automatic synchronization device for plug-in subtitles which, as shown in Fig. 3, comprises an extraction module, a cutting module, a division module, a search module and an update module.
The extraction module extracts the audio portion of a video file and decodes it to obtain pulse-code modulation data.
The cutting module cuts the pulse-code modulation data into audio clips and classifies them, the classification categories comprising silence, speech and non-speech.
The division module divides the audio clips classified as speech into short sentences and determines the start time and end time of each short sentence. Specifically, the division module divides short sentences according to speech pauses, a speech pause comprising at least a first preset number of audio sections, and cuts the audio clips into short sentences each comprising at least a second preset number of audio sections; the first preset number thus sets the minimum duration of a speech pause and the second preset number the minimum duration of a short sentence. It should be appreciated that the first and second preset numbers are set according to the characteristics of the audio data and the subtitle file, and the present embodiment does not restrict their specific values.
The search module further comprises a start-matching sub-module, a dialogue-matching sub-module and a comparison-matching sub-module. The start-matching sub-module finds candidate items in the plug-in subtitle file within a preset duration before and after the start time, so as to search the plug-in subtitle file for a match item according to the determined start time and end time of the short sentence. The dialogue-matching sub-module selects, among the candidates found by the start-matching sub-module, all items whose dialogue duration is within the allowed error range of the short sentence's duration. The comparison-matching sub-module, when the dialogue-matching sub-module selects more than one item, compares the record preceding the short sentence with the record preceding each selected item until the most similar one is found as the match item.
The update module changes the start time of the match item to the presentation time stamp (PTS) of the current video and, according to the presentation time stamp, updates the start time of every item in the plug-in subtitle file whose start time is later than that of the match item.
Compared with the prior art, the present embodiment extracts the audio data from the video file, classifies it and cuts it into sentences, thereby obtaining accurate start and end times of the sentences; finds the corresponding match items in the subtitle file; and modifies the start times of the match items accordingly, so that the subtitle file is synchronized with the audio and video. The present embodiment therefore synchronizes plug-in subtitles with the audio and video automatically, without the user manually adjusting them, achieving a better viewing effect and improving the user experience.
It can be seen that the present embodiment is the system embodiment corresponding to the first embodiment, and the two can be implemented in cooperation with each other. The relevant technical details mentioned in the first embodiment remain effective in the present embodiment and, to reduce repetition, are not repeated here; correspondingly, the relevant technical details mentioned in the present embodiment are also applicable to the first embodiment.
It is noted that the modules involved in the present embodiment are logical modules. In practical applications, a logical unit may be a physical unit, a part of a physical unit, or a combination of several physical units. In addition, to highlight the innovative part of the invention, units less closely related to solving the technical problem proposed by the invention are not introduced in the present embodiment, but this does not mean that no other units exist in the present embodiment.
Those skilled in the art will understand that the above embodiments are specific embodiments for realizing the present invention and that, in practical applications, various changes may be made to them in form and detail without departing from the spirit and scope of the invention.
Claims (11)
1. An automatic synchronization method for plug-in subtitles, characterized by comprising the steps of:
extracting the audio part of a video file and decoding the audio part to obtain pulse code modulation
(PCM) data;
cutting the pulse code modulation data into audio segments and classifying the audio
segments, wherein the classification categories comprise: silence, speech, and non-speech;
dividing the audio segments classified as speech into short sentences, and determining the start time
and end time of each short sentence;
searching a plug-in subtitle file for a matching entry according to the determined start time and end
time of the short sentence;
changing the start time of the matching entry to the presentation time stamp (PTS) of the current video,
and, according to the presentation time stamp, updating the start time of each entry in the plug-in
subtitle file whose start time is greater than the start time of the matching entry.
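A minimal sketch of the final step of claim 1, assuming subtitle entries are dicts with a 'start' time in seconds, sorted by start time (the function name and data layout are hypothetical, not from the patent):

```python
def resync_subtitles(entries, match_index, current_pts):
    """Pin the matched entry to the video's presentation time stamp (PTS) and
    shift every entry whose start time is greater than the matched entry's
    original start time by the same offset."""
    match_start = entries[match_index]["start"]
    offset = current_pts - match_start
    for i, entry in enumerate(entries):
        if i != match_index and entry["start"] > match_start:
            entry["start"] += offset
    entries[match_index]["start"] = current_pts
    return entries
```

Earlier entries are deliberately left untouched, matching the claim's "greater than" condition.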
2. The automatic synchronization method for plug-in subtitles according to claim 1, characterized in that
the step of searching a plug-in subtitle file for a matching entry according to the determined start time
and end time of the short sentence comprises the following sub-steps:
finding candidate entries in the plug-in subtitle file within a preset duration before and after the start time;
selecting, from the candidate entries found, all entries whose dialogue duration is within an allowed
error range of the duration of the short sentence;
if more than one entry is selected, comparing the previous record of the determined short sentence with
the previous record of each selected entry, until the most similar one is found and taken as the matching entry.
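The three sub-steps of claim 2 can be sketched as follows. The `window` and `tolerance` values and all names are illustrative assumptions; the patent does not fix them:

```python
def find_match(subs, sent_start, sent_duration, window=10.0, tolerance=0.5,
               prev_sentence_duration=None):
    """subs: list of dicts with 'start'/'end' times in seconds, in file order.
    Returns the index of the matching entry, or None."""
    # Sub-step 1: entries whose start time lies within +/- window of the
    # detected sentence start.
    candidates = [i for i, s in enumerate(subs)
                  if abs(s["start"] - sent_start) <= window]
    # Sub-step 2: keep entries whose dialogue duration matches the sentence
    # duration within the allowed error.
    candidates = [i for i in candidates
                  if abs((subs[i]["end"] - subs[i]["start"]) - sent_duration)
                  <= tolerance]
    if len(candidates) <= 1:
        return candidates[0] if candidates else None
    # Sub-step 3: more than one candidate left -- compare each candidate's
    # previous record with the previously detected sentence, most similar wins.
    def prev_diff(i):
        if i == 0 or prev_sentence_duration is None:
            return float("inf")
        prev = subs[i - 1]
        return abs((prev["end"] - prev["start"]) - prev_sentence_duration)
    return min(candidates, key=prev_diff)
```

Comparing durations of the previous records is one plausible reading of "most similar"; a text- or audio-based similarity would also fit the claim.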
3. The automatic synchronization method for plug-in subtitles according to claim 1 or 2, characterized in that
in the step of dividing the audio segments into short sentences, the division is performed according to
speech pauses, wherein a speech pause comprises at least a first preset number of audio sections.
4. The automatic synchronization method for plug-in subtitles according to claim 3, characterized in that
the first preset number is 2.
5. The automatic synchronization method for plug-in subtitles according to claim 3, characterized in that
a short sentence comprises at least a second preset number of audio sections.
6. The automatic synchronization method for plug-in subtitles according to claim 5, characterized in that
the second preset number is 3.
7. The automatic synchronization method for plug-in subtitles according to claim 1, characterized in that
in the step of determining the start time and end time of the short sentence, the time point of the
silent or non-speech section immediately before a continuous speech section is taken as the start time of the
sentence, and the time point of the last speech section ending a continuous speech section is taken as
the end time of the sentence.
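A sketch of the boundary rule of claim 7 combined with the thresholds of claims 3-6 (a pause of at least 2 non-speech segments ends a sentence; a sentence needs at least 3 speech segments). The segment duration, label strings, and all names are assumptions for illustration only:

```python
def sentence_boundaries(labels, seg_dur=0.1, min_pause=2, min_speech=3):
    """labels: per-segment classification, e.g. ['sil', 'speech', ...].
    Returns (start_time, end_time) pairs in seconds. The sentence starts at
    the silent/non-speech segment just before the first speech segment and
    ends at the end of the last speech segment before a long enough pause."""
    sentences = []
    start_idx = None      # index of the segment just before the first speech
    last_speech = None
    n_speech = 0
    pause = 0
    for i, lab in enumerate(labels):
        if lab == "speech":
            if start_idx is None:
                start_idx = max(i - 1, 0)
            last_speech = i
            n_speech += 1
            pause = 0
        elif start_idx is not None:
            pause += 1
            if pause >= min_pause:            # pause long enough: close sentence
                if n_speech >= min_speech:    # drop sentences that are too short
                    sentences.append((start_idx * seg_dur,
                                      (last_speech + 1) * seg_dur))
                start_idx, last_speech, n_speech, pause = None, None, 0, 0
    if start_idx is not None and n_speech >= min_speech:
        sentences.append((start_idx * seg_dur, (last_speech + 1) * seg_dur))
    return sentences
```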
8. An automatic synchronization device for plug-in subtitles, characterized by comprising: an extraction
module, a cutting module, a division module, a search module, and an update module;
the extraction module is configured to extract the audio part of a video file and decode the audio part
to obtain pulse code modulation (PCM) data;
the cutting module is configured to cut the pulse code modulation data into audio segments and classify
the audio segments, wherein the classification categories comprise: silence, speech, and non-speech;
the division module is configured to divide the audio segments classified as speech into short sentences
and determine the start time and end time of each short sentence;
the search module is configured to search a plug-in subtitle file for a matching entry according to the
determined start time and end time of the short sentence;
the update module is configured to change the start time of the matching entry to the presentation time
stamp (PTS) of the current video and, according to the presentation time stamp, update the start time of
each entry in the plug-in subtitle file whose start time is greater than the start time of the matching entry.
9. The automatic synchronization device for plug-in subtitles according to claim 8, characterized in that
the search module comprises: an initial matching sub-module, a dialogue matching sub-module, and a
comparison matching sub-module;
the initial matching sub-module is configured to find candidate entries in the plug-in subtitle file
within a preset duration before and after the start time;
the dialogue matching sub-module is configured to select, from the candidate entries found by the
initial matching sub-module, all entries whose dialogue duration is within an allowed error range of the
duration of the short sentence;
the comparison matching sub-module is configured to, when more than one entry is selected by the
dialogue matching sub-module, compare the previous record of the determined short sentence with the
previous record of each selected entry, until the most similar one is found and taken as the matching entry.
10. The automatic synchronization device for plug-in subtitles according to claim 8 or 9, characterized in that
the division module is further configured to perform the division according to speech pauses;
wherein a speech pause comprises at least a first preset number of audio sections.
11. The automatic synchronization device for plug-in subtitles according to claim 10, characterized in that
the division module is further configured to divide the audio segments into short sentences each
comprising at least a second preset number of audio sections.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201511018280.XA CN105898556A (en) | 2015-12-30 | 2015-12-30 | Plug-in subtitle automatic synchronization method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201511018280.XA CN105898556A (en) | 2015-12-30 | 2015-12-30 | Plug-in subtitle automatic synchronization method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105898556A true CN105898556A (en) | 2016-08-24 |
Family
ID=57002208
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201511018280.XA Pending CN105898556A (en) | 2015-12-30 | 2015-12-30 | Plug-in subtitle automatic synchronization method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105898556A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101021854A (en) * | 2006-10-11 | 2007-08-22 | 鲍东山 | Audio analysis system based on content |
US20090213924A1 (en) * | 2008-02-22 | 2009-08-27 | Sheng-Nan Sun | Method and Related Device for Converting Transport Stream into File |
CN103647909A (en) * | 2013-12-16 | 2014-03-19 | 宇龙计算机通信科技(深圳)有限公司 | Caption adjusting method and caption adjusting device |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106504773A (en) * | 2016-11-08 | 2017-03-15 | 上海贝生医疗设备有限公司 | A kind of wearable device and voice and activities monitoring system |
CN109413475A (en) * | 2017-05-09 | 2019-03-01 | 北京嘀嘀无限科技发展有限公司 | Method of adjustment, device and the server of subtitle in a kind of video |
CN109005444A (en) * | 2017-06-07 | 2018-12-14 | 纳宝株式会社 | Content providing server, content providing terminal and content providing |
CN107562737A (en) * | 2017-09-05 | 2018-01-09 | 语联网(武汉)信息技术有限公司 | A kind of methods of video segmentation and its system for being used to translate |
CN107402530A (en) * | 2017-09-20 | 2017-11-28 | 淮安市维达科技有限公司 | Control system of one computer using lines captions as core coordination linkage stage equipment |
CN108305636A (en) * | 2017-11-06 | 2018-07-20 | 腾讯科技(深圳)有限公司 | A kind of audio file processing method and processing device |
WO2019086044A1 (en) * | 2017-11-06 | 2019-05-09 | 腾讯科技(深圳)有限公司 | Audio file processing method, electronic device and storage medium |
US11538456B2 (en) | 2017-11-06 | 2022-12-27 | Tencent Technology (Shenzhen) Company Limited | Audio file processing method, electronic device, and storage medium |
CN108924664B (en) * | 2018-07-26 | 2021-06-08 | 海信视像科技股份有限公司 | Synchronous display method and terminal for program subtitles |
CN108924664A (en) * | 2018-07-26 | 2018-11-30 | 青岛海信电器股份有限公司 | A kind of synchronous display method and terminal of program credits |
CN110781649A (en) * | 2019-10-30 | 2020-02-11 | 中央电视台 | Subtitle editing method and device, computer storage medium and electronic equipment |
CN110781649B (en) * | 2019-10-30 | 2023-09-15 | 中央电视台 | Subtitle editing method and device, computer storage medium and electronic equipment |
CN111050201B (en) * | 2019-12-10 | 2022-06-14 | Oppo广东移动通信有限公司 | Data processing method and device, electronic equipment and storage medium |
CN111050201A (en) * | 2019-12-10 | 2020-04-21 | Oppo广东移动通信有限公司 | Data processing method and device, electronic equipment and storage medium |
WO2023015416A1 (en) * | 2021-08-09 | 2023-02-16 | 深圳Tcl新技术有限公司 | Subtitle processing method and apparatus, and storage medium |
CN113992940A (en) * | 2021-12-27 | 2022-01-28 | 北京美摄网络科技有限公司 | Web end character video editing method, system, electronic equipment and storage medium |
CN113992940B (en) * | 2021-12-27 | 2022-03-29 | 北京美摄网络科技有限公司 | Web end character video editing method, system, electronic equipment and storage medium |
CN114640874A (en) * | 2022-03-09 | 2022-06-17 | 湖南国科微电子股份有限公司 | Subtitle synchronization method and device, set top box and computer readable storage medium |
WO2023169240A1 (en) * | 2022-03-09 | 2023-09-14 | 湖南国科微电子股份有限公司 | Subtitle synchronization method and apparatus, set-top box and computer readable storage medium |
Similar Documents
Publication | Title |
---|---|
CN105898556A (en) | Plug-in subtitle automatic synchronization method and device | |
CN108780643B (en) | Automatic dubbing method and device | |
US8281231B2 (en) | Timeline alignment for closed-caption text using speech recognition transcripts | |
US8179475B2 (en) | Apparatus and method for synchronizing a secondary audio track to the audio track of a video source | |
US20080219641A1 (en) | Apparatus and method for synchronizing a secondary audio track to the audio track of a video source | |
CN106792145A (en) | A kind of method and apparatus of the automatic overlapping text of audio frequency and video | |
US8958013B2 (en) | Aligning video clips to closed caption files | |
US9609397B1 (en) | Automatic synchronization of subtitles based on audio fingerprinting | |
US8564721B1 (en) | Timeline alignment and coordination for closed-caption text using speech recognition transcripts | |
US20200126559A1 (en) | Creating multi-media from transcript-aligned media recordings | |
CN105635782A (en) | Subtitle output method and device | |
KR20150057591A (en) | Method and apparatus for controlling playing video | |
CN106162293B (en) | A kind of method and device of video sound and image synchronization | |
US11064245B1 (en) | Piecewise hybrid video and audio synchronization | |
US20210151082A1 (en) | Systems and methods for mixing synthetic voice with original audio tracks | |
US10692497B1 (en) | Synchronized captioning system and methods for synchronizing captioning with scripted live performances | |
KR102308651B1 (en) | Media environment-oriented content distribution platform | |
WO2017062961A1 (en) | Methods and systems for interactive multimedia creation | |
Federico et al. | An automatic caption alignment mechanism for off-the-shelf speech recognition technologies | |
US9905221B2 (en) | Automatic generation of a database for speech recognition from video captions | |
CN109963092B (en) | Subtitle processing method and device and terminal | |
EP3839953A1 (en) | Automatic caption synchronization and positioning | |
CN106162323A (en) | A kind of video data handling procedure and device | |
CN112714348A (en) | Intelligent audio and video synchronization method | |
CN103152607B (en) | The supper-fast thick volume method of video |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20160824 |