CN103945140B - The generation method and system of video caption - Google Patents

Method and system for generating video captions

Info

Publication number
CN103945140B
CN103945140B (application CN201310018669.9A)
Authority
CN
China
Prior art keywords
video caption
video
caption
information
captions
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310018669.9A
Other languages
Chinese (zh)
Other versions
CN103945140A (en)
Inventor
Zhao Yonggang (赵永刚)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lenovo Beijing Ltd filed Critical Lenovo Beijing Ltd
Priority to CN201310018669.9A priority Critical patent/CN103945140B/en
Publication of CN103945140A publication Critical patent/CN103945140A/en
Application granted granted Critical
Publication of CN103945140B publication Critical patent/CN103945140B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a method and system for generating video captions: detecting video caption play-type control information; obtaining the video caption play information that matches the play-type control information; determining the video caption animation model corresponding to that play information; extracting the video caption text information; and finally converting the text information with the animation model to generate the video caption. Because the generated caption carries a caption animation model, the purpose of giving video captions a dynamic effect is achieved.

Description

Method and system for generating video captions
Technical field
The present invention relates to the technical field of data processing, and more specifically to a method and system for generating video captions.
Background art
Video, including film and television, improves the viewing experience and has therefore become rapidly popular.
However, in the prior art, video captions can only be displayed as static, flat text; dynamic display cannot be achieved.
Summary of the invention
In view of this, the present invention provides a method for generating video captions, so as to generate video captions with a dynamic effect.
To achieve this goal, the proposed scheme is as follows:
A video caption generation method, including:
detecting video caption play-type control information;
obtaining the video caption play information that matches the play-type control information;
determining the video caption animation model corresponding to the play information;
extracting video caption text information;
converting the text information with the animation model to generate the video caption.
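As a rough illustration, the steps above amount to two lookups followed by a conversion. The table contents, names, and "models" in the following sketch are invented for illustration only; the patent does not prescribe any concrete representation of control information, play information, or animation models.

```python
# Illustrative sketch of the claimed steps as one lookup pipeline.
# Every concrete value here is an assumption, not part of the patent.

PLAY_INFO = {            # play-type control information -> play information
    "happy": "bounce",
    "angry": "shake",
}
ANIMATION_MODEL = {      # play information -> animation model (a format string here)
    "bounce": "<bounce>{}</bounce>",
    "shake": "<shake>{}</shake>",
}

def generate_caption(control_info: str, text: str) -> str:
    play_info = PLAY_INFO[control_info]      # obtain matching play information
    model = ANIMATION_MODEL[play_info]       # determine the animation model
    return model.format(text)                # convert the text, generate the caption
```

For example, `generate_caption("happy", "hello")` would yield a caption wrapped in the bounce model.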
Preferably, detecting the video caption play-type control information includes:
collecting facial expression information of the speaker corresponding to the captions in the video.
Preferably, detecting the video caption play-type control information includes:
receiving video caption play-type control information input by a user.
Preferably, detecting the video caption play-type control information includes:
collecting the tone of the speaker corresponding to the captions in the video;
calculating the tone variation over a preset time period, and determining the video caption play-type control information corresponding to that variation.
Preferably, extracting the video caption text information includes:
collecting the voice information of the speaker corresponding to the captions in the video;
recognizing the voice information and generating the text information corresponding to the voice.
Preferably, before the video caption is generated, the method also includes:
collecting the speech volume of the speaker corresponding to the captions in the video;
adjusting the parameters of the video caption animation model according to the speech volume.
A video caption generation system, including:
a detector, for detecting video caption play-type control information;
a processor, for obtaining the video caption play information that matches the play-type control information; determining the video caption animation model corresponding to the play information; extracting video caption text information; and converting the text information with the animation model to generate the video caption.
Preferably, the detector is an image collector, for collecting facial expression information of the speaker corresponding to the captions in the video.
Preferably, the detector is a receiver, for receiving video caption play-type control information input by a user.
Preferably, the detector is a voice collector, for collecting the tone of the speaker corresponding to the captions in the video;
the processor is further configured to obtain the tone, calculate the tone variation over a preset time period, and determine the video caption play-type control information corresponding to that variation.
Preferably, the way the processor extracts the video caption text information includes:
collecting the voice information of the speaker corresponding to the captions in the video;
recognizing the voice information and generating the text information corresponding to the voice.
Preferably, the processor is further configured to collect, before the video caption is generated, the speech volume of the speaker corresponding to the captions in the video, and to adjust the parameters of the video caption animation model according to the speech volume.
It can be seen from the above technical scheme that, in the video caption generation method disclosed by the invention, the generated caption carries a caption animation model, so the dynamic effect of video captions is achieved.
Brief description of the drawings
In order to describe the technical schemes of the embodiments of the present invention or of the prior art more clearly, the accompanying drawings needed in the description are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from them without creative work.
Fig. 1 is a flow chart of a video caption generation method disclosed in an embodiment of the present invention;
Fig. 2 is a flow chart of a video caption generation method disclosed in another embodiment of the present invention;
Fig. 3 is a flow chart of a video caption generation method disclosed in yet another embodiment of the present invention;
Fig. 4 is a flow chart of a video caption generation method disclosed in yet another embodiment of the present invention;
Fig. 5 is a flow chart of a video caption generation method disclosed in yet another embodiment of the present invention;
Fig. 6 is a structural diagram of a video caption generation system disclosed in another embodiment of the present invention.
Detailed description of the embodiments
The technical schemes in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work fall within the scope of protection of the present invention.
An embodiment of the present invention provides a method for generating video captions with a dynamic effect.
Referring to Fig. 1, the video caption generation method disclosed in this embodiment of the present invention includes the steps:
S101, detecting video caption play-type control information;
here, the play-type control information controls the generation style of the video caption, and when the video caption is played it is played in that generation style.
S102, obtaining the video caption play information that matches the play-type control information;
specifically, the correspondence between play-type control information and play information is stored in advance; after the play-type control information is obtained, the matching play information is looked up in that correspondence.
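Step S102 can be pictured as a table lookup over the pre-stored correspondence. The keys and values in the sketch below are invented for illustration; a fallback covers control information with no stored entry, which the patent text leaves unspecified.

```python
# Hypothetical pre-stored correspondence for step S102; all entries invented.
CONTROL_TO_PLAY_INFO = {
    "cheerful": "caption-bounce",
    "tense": "caption-shake",
}

def match_play_info(control_info: str) -> str:
    # Look up the play information matching the play-type control
    # information; fall back to a neutral style if nothing matches.
    return CONTROL_TO_PLAY_INFO.get(control_info, "caption-static")
```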
S103, determining the video caption animation model corresponding to the play information;
S104, extracting video caption text information;
specifically, the caption text information may be stored in advance and retrieved when a caption needs to be generated; alternatively, the caption text information may be received as input when a caption needs to be generated.
S105, converting the text information with the animation model to generate the video caption.
Here, when a video caption with an animation effect needs to be generated, it must be generated according to the video caption animation model.
In the video caption generation method disclosed in this embodiment, the generated caption carries a caption animation model, so the dynamic effect of video captions is achieved.
Preferably, in the method disclosed in this embodiment, the following steps may also be performed before step S105:
collecting the speech volume of the speaker corresponding to the captions in the video;
adjusting the parameters of the video caption animation model according to the speech volume.
Specifically, the parameters in the animation model control the degree of the animation effect of the generated caption; when captions with different degrees of animation effect are needed, the parameters of the model can be adjusted.
While the video plays, the words spoken by the speaker in the video correspond to the captions; the speaker's speech volume is collected, and the parameters of the animation model are adjusted according to that volume, generating captions with different degrees of animation effect.
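One simple way to realize the volume-to-parameter adjustment described above is a clamped linear mapping. The decibel thresholds below are invented for illustration; the patent does not specify how volume maps to the animation-degree parameter.

```python
def animation_degree(volume_db: float,
                     quiet_db: float = 40.0,
                     loud_db: float = 90.0) -> float:
    # Map the collected speech volume onto the model's animation-degree
    # parameter: louder speech -> stronger animation, clamped to [0, 1].
    # The quiet/loud thresholds are illustrative assumptions.
    frac = (volume_db - quiet_db) / (loud_db - quiet_db)
    return min(1.0, max(0.0, frac))
```

A shout near 90 dB would thus drive the animation to its full degree, while quiet speech leaves the caption nearly static.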
Another embodiment of the present invention also discloses a video caption generation method which, as shown in Fig. 2, includes the steps:
S201, collecting facial expression information of the speaker corresponding to the captions in the video;
specifically, while the video is displayed, the words spoken by the currently shown speaker can be identical to the captions. Moreover, the speaker's facial expression changes with the scene of the video; collecting the facial expression information of the currently shown speaker ensures that the animation effect of the generated caption matches the current scene of the video.
The facial expression information may include single features such as the pupil distance of the eyes, the outline of the eyes, and the mouth shape, or it may include all features that can reflect changes in facial expression, including the eyes, the corners of the mouth, the eyebrows, and so on.
S202, obtaining the video caption play information that matches the facial expression information;
specifically, after the facial expression information is collected, the current scene of the video is inferred by recognizing that information. Because the obtained play information matches the facial expression information, the generated caption is guaranteed to meet the demands of the video scene.
For example, when the collected facial expression information shows that the currently shown speaker is very happy, the current scene of the video is a cheerful one; when it shows that the speaker is very angry, the current scene is a tense one.
When the facial expression information is the pupil distance of the eyes, the size of the pupil distance can be analyzed to determine the mood of the currently shown speaker; when it is the outline of the eyes, the trend of that outline can be analyzed to determine the mood; when it is the mouth shape, the trend of the mouth shape can likewise be analyzed to determine the mood of the currently shown speaker.
When the facial expression information is integrated information that includes all features reflecting expression changes, the facial expression formed by those features can be matched against several basic facial expression templates; the mood indicated by the basic template with the highest matching degree is the mood of the currently shown speaker.
Alternatively, a neural network analysis method can be used: the basic facial expressions (generally six of them) serve as the output neurons and the collected facial expression information as the input neurons; the network computes the facial expression type corresponding to that information, thereby determining the mood of the currently shown speaker.
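The template-matching variant described above can be sketched as a nearest-template classifier over a small feature vector. The feature choices (pupil distance, eye openness, mouth curvature) and all template values below are invented for illustration and stand in for whatever features and basic expression templates an implementation would actually use.

```python
# Toy nearest-template matcher for the expression-matching idea: each face
# is a feature vector; the basic template with the smallest squared
# distance (i.e. the highest matching degree) wins. All values invented.

TEMPLATES = {
    "happy":   (0.30, 0.60, 0.80),
    "angry":   (0.25, 0.90, -0.70),
    "neutral": (0.28, 0.70, 0.00),
}

def classify_expression(features):
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(TEMPLATES, key=lambda name: dist(TEMPLATES[name], features))
```

The neural-network variant would replace the distance computation with a learned classifier over the same inputs, with the six basic expressions as outputs.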
S203, determining the video caption animation model corresponding to the play information;
here, different play information corresponds to different animation models; after the play information is obtained, the corresponding animation model must be determined.
For example, when the play information reflects that the mood of the currently shown speaker is happy, a model with a cheerful caption-bounce effect can be determined; when it reflects that the speaker is angry, a model with a destructive effect can be determined.
S204, extracting video caption text information;
as in the previous embodiment, the caption text information may be stored in advance and retrieved when a caption needs to be generated, or it may be received as input when a caption needs to be generated.
S205, converting the text information with the animation model to generate the video caption.
In the video caption generation method disclosed by the invention, the generated caption carries a caption animation model, achieving the dynamic effect of video captions; moreover, the caption's animation model also corresponds to the facial expression information of the speaker, so the dynamic effect of the captions matches the speaker's facial expression and the expressiveness of the picture on screen is enhanced.
As in the previous embodiment, this embodiment may also include, before step S205, the steps:
collecting the speech volume of the speaker corresponding to the captions in the video;
adjusting the parameters of the video caption animation model according to the speech volume.
Specifically, the parameters in the animation model control the degree of the animation effect of the generated caption; when captions with different degrees of animation effect are needed, the parameters can be adjusted.
For example, when the determined animation model is the model with a cheerful caption-bounce effect, the collected speech volume adjusts the parameters of that model, thereby determining the amplitude of the bouncing captions.
Another embodiment of the present invention also discloses a video caption generation method which, as shown in Fig. 3, includes the steps:
S301, receiving video caption play-type control information input by a user;
specifically, when the play type of the generated caption needs to be controlled manually, play-type control information can be input.
S302, obtaining the video caption play information that matches the play-type control information;
likewise, the correspondence between play-type control information and play information is stored in advance; after the play-type control information is obtained, the matching play information is looked up in that correspondence.
S303, determining the video caption animation model corresponding to the play information;
S304, extracting video caption text information;
S305, converting the text information with the animation model to generate the video caption.
The detailed process of this embodiment is covered in the two embodiments above and is not repeated here.
In the method disclosed in this embodiment, the animation model is ultimately determined from the play-type control information input by the user, and the caption text information is then converted with that model to generate the video caption; in this way, video captions can be generated according to the user's requirements.
Another embodiment of the present invention also discloses a video caption generation method which, as shown in Fig. 4, includes the steps:
S401, collecting the tone of the speaker corresponding to the captions in the video;
specifically, while the video is displayed, different scenes put the speaker in different moods and give the speech different tones; by collecting the tone of the speaker corresponding to the captions over a period of time, the current mood of the speaker can be judged.
S402, calculating the tone variation over a preset time period, and determining the video caption play-type control information corresponding to that variation;
specifically, the time period is set according to actual needs, the variation of the tone collected during that period is calculated, and the play-type control information is determined from the variation.
Generally, when the tone of the preset time period varies quickly, the speaker's mood is excited or angry, and the determined play-type control information can be control information that gives the video caption a violent animation effect;
when the tone varies little, or not at all, the speaker's mood is calm, and the determined play-type control information can be control information that gives the video caption a gentle animation effect.
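The decision rule in steps S401–S402 can be sketched as a threshold on the pitch variation within the preset window. The use of frequency range as the "variation" measure and the 30 Hz threshold are illustrative assumptions; the patent only requires that large, fast variation select the violent effect and small variation the gentle one.

```python
def control_info_from_pitch(pitches_hz, threshold_hz: float = 30.0) -> str:
    # Compute the tone variation over the preset time window (here: the
    # range of sampled fundamental frequencies) and pick the control info:
    # large variation -> excited/angry speaker -> "violent" animation;
    # small variation -> calm speaker -> "gentle" animation.
    variation = max(pitches_hz) - min(pitches_hz)
    return "violent" if variation > threshold_hz else "gentle"
```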
S403, obtaining the video caption play information that matches the play-type control information;
specifically, the correspondence between play-type control information and play information is stored in advance; after the play-type control information is obtained, the matching play information is looked up in that correspondence.
S404, determining the video caption animation model corresponding to the play information;
S405, extracting video caption text information;
S406, converting the text information with the animation model to generate the video caption.
In this embodiment, the video caption is generated according to the speaker's tone variation, so the dynamic effect of the caption matches that variation, likewise enhancing the expressiveness of the picture on screen.
In the embodiments corresponding to Fig. 3 and Fig. 4, preferably, the following steps can be performed before the video caption is generated:
collecting the speech volume of the speaker corresponding to the captions in the video;
adjusting the parameters of the video caption animation model according to the speech volume.
The specific process is covered in the embodiments corresponding to Fig. 1 and Fig. 2 and is not repeated here.
Referring to Fig. 5, yet another embodiment of the present invention discloses a video caption generation method including the steps:
S501, detecting video caption play-type control information;
S502, obtaining the video caption play information that matches the play-type control information;
S503, determining the video caption animation model corresponding to the play information;
S504, collecting the voice information of the speaker corresponding to the captions in the video;
S505, recognizing the voice information and generating the text information corresponding to the voice;
S506, converting the text information with the animation model to generate the video caption.
In this embodiment, the voice information is collected and recognized while the video plays, and the corresponding text information is generated; caption text neither needs to be stored in advance nor obtained separately, which is simpler and more convenient.
The detailed process of this embodiment is covered in all the embodiments above and is not repeated here.
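Steps S504–S505 replace pre-stored caption text with recognized speech. Real recognition would call an ASR engine; in the sketch below a stub recognizer stands in for it, so the whole example is illustrative only and makes no claim about any particular recognition API.

```python
# Stand-in for a speech recognizer: a real implementation would feed the
# collected audio to an ASR engine; here each "frame" is already a word.

def recognize(audio_frames) -> str:
    return " ".join(audio_frames)

def extract_caption_text(audio_frames) -> str:
    # Steps S504-S505: collect the speaker's voice, recognize it, and
    # return the corresponding text -- no pre-stored caption text needed.
    return recognize(audio_frames)
```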
Another embodiment of the present invention also discloses a video caption generation system which, referring to Fig. 6, includes:
a detector 101, for detecting video caption play-type control information;
a processor 102, for obtaining the video caption play information that matches the play-type control information; determining the video caption animation model corresponding to the play information; extracting video caption text information; and converting the text information with the animation model to generate the video caption.
Specifically, after the detector 101 detects the play-type control information it transmits it to the processor 102, in which the correspondence between play-type control information and play information is stored in advance. After the processor 102 receives the play-type control information, it looks up the matching play information in that correspondence, then determines the corresponding animation model, extracts the caption text information, and finally converts the text information with the animation model to generate the video caption.
The processor 102 may store the caption text information in advance and retrieve it when a caption needs to be generated, or it may receive the caption text information as input when a caption needs to be generated.
In the system disclosed in this embodiment, the detector 101 detects the play-type control information and sends it to the processor 102; the processor 102 obtains the matching play information, determines the corresponding animation model, extracts the caption text information, and converts it with the animation model to generate the video caption. In this way, the caption generated by the processor 102 carries a caption animation model, achieving the purpose of a dynamic caption effect.
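The division of labor between detector 101 and processor 102 can be sketched with two small classes. The class names, table contents, and "models" below are invented for illustration; the patent describes hardware components, not any particular software structure.

```python
# Minimal object sketch of the Fig. 6 system: a Detector hands the detected
# play-type control information to a processor, which does the matching,
# model lookup, and caption generation. All concrete values are invented.

class Detector:
    def __init__(self, control_info: str):
        self._control_info = control_info

    def detect(self) -> str:
        return self._control_info


class CaptionProcessor:
    PLAY_INFO = {"cheerful": "bounce"}          # control info -> play info
    MODELS = {"bounce": "<bounce>{}</bounce>"}  # play info -> animation model

    def generate(self, detector: Detector, text: str) -> str:
        play_info = self.PLAY_INFO[detector.detect()]
        return self.MODELS[play_info].format(text)
```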
Preferably, the detector 101 in the above embodiment can be an image collector, for collecting facial expression information of the speaker corresponding to the captions in the video.
Specifically, the image collector can be a camera that photographs the face of the speaker on screen; it may photograph the whole face, or only part of it, such as the eyes or the mouth.
The processor obtains the image captured by the camera, recognizes it, determines the speaker's current mood, and obtains the video caption play information that matches the facial expression information.
The process of recognizing the image to determine the speaker's current mood is covered in the embodiment corresponding to Fig. 2 and is not repeated here.
Alternatively, and preferably, the detector 101 in the above embodiment is a receiver, for receiving video caption play-type control information input by a user.
Specifically, the receiver can be connected to a communication interface through which the processor communicates with an external device; the user inputs the play-type control information on the human-computer interaction interface of the external device, and the information is transmitted to the processor through the communication interface.
Again alternatively, and preferably, the detector 101 in the above embodiment is a voice collector, for collecting the tone of the speaker corresponding to the captions in the video;
specifically, the voice collector can be a speech sensor that collects the frequency of the speaker's voice, i.e., the tone. The processor obtains the frequency collected by the speech sensor, calculates the tone variation over a preset time period, and determines the video caption play-type control information corresponding to that variation.
The time period is set according to actual needs; the variation of the tone collected during the period is calculated, and the play-type control information is determined from the variation.
The processor determines the play-type control information from the speed of the tone variation; the detailed process is covered in the embodiment corresponding to Fig. 4 and is not repeated here.
In all the embodiments above, the processor may extract the video caption text information by storing it in advance and retrieving it when a caption needs to be generated, or by receiving it as input when a caption needs to be generated.
It may also proceed as follows: while the video plays, the processor collects the voice information of the speaker corresponding to the captions, recognizes it, and generates the text information corresponding to the voice. In this way there is no need to store caption text information separately or to receive it as extra input; it is converted directly from the video speech, which is simple and convenient.
Moreover, in all the embodiments disclosed above, before the processor converts the caption text information with the animation model to generate the video caption, it can also perform the following operations:
collecting the speech volume of the speaker corresponding to the captions in the video;
adjusting the parameters of the video caption animation model according to the speech volume.
Specifically, the parameters in the animation model control the degree of the animation effect of the generated caption; when captions with different degrees of animation effect are needed, the parameters can be adjusted.
While the video plays, the words spoken by the speaker in the video correspond to the captions; the speaker's speech volume is collected, and the parameters of the animation model are adjusted according to that volume, generating captions with different degrees of animation effect.
Finally, it should be noted that, herein, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply that any such actual relation or order exists between those entities or operations. Moreover, the terms "comprise", "include", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or device that includes it.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the others, and for the parts that the embodiments have in common reference can be made between them.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein can be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (12)

  1. A video caption generation method, characterized by comprising:
    detecting video caption play type control information of a speech provider corresponding to captions in a video;
    obtaining video caption broadcast information matching the video caption play type control information;
    determining a video caption animation model corresponding to the video caption broadcast information, wherein video caption broadcast information reflecting different moods of the current speech provider corresponds to different video caption animation models, and a parameter in the video caption animation model is used to control the degree of animation effect of the generated video caption, so that video captions with different degrees of animation effect can be generated by adjusting the parameter of the video caption animation model;
    extracting video caption text information; and
    converting the video caption text information using the video caption animation model to generate the video caption.
  2. The method according to claim 1, characterized in that detecting the video caption play type control information comprises:
    collecting facial expression information of the speech provider corresponding to the captions in the video.
  3. The method according to claim 1, characterized in that detecting the video caption play type control information comprises:
    receiving video caption play type control information input by a user.
  4. The method according to claim 1, characterized in that detecting the video caption play type control information comprises:
    collecting the tone of the speech provider corresponding to the captions in the video; and
    calculating the tone variation over a preset time period, and determining the video caption play type control information corresponding to the tone variation.
  5. The method according to claim 1, characterized in that extracting the video caption text information comprises:
    collecting voice information of the speech provider corresponding to the captions in the video; and
    recognizing the voice information to generate text information corresponding to the voice.
  6. The method according to any one of claims 1-5, characterized by further comprising, before generating the video caption:
    collecting the speech volume of the speech provider corresponding to the captions in the video; and
    adjusting the parameter of the video caption animation model according to the speech volume.
  7. A video caption generation system, characterized by comprising:
    a detector for detecting video caption play type control information of a speech provider corresponding to captions in a video; and
    a processor for obtaining video caption broadcast information matching the video caption play type control information, determining a video caption animation model corresponding to the video caption broadcast information, extracting video caption text information, and converting the video caption text information using the video caption animation model to generate the video caption;
    wherein video caption broadcast information reflecting different moods of the current speech provider corresponds to different video caption animation models, and a parameter in the video caption animation model is used to control the degree of animation effect of the generated video caption, so that video captions with different degrees of animation effect can be generated by adjusting the parameter of the video caption animation model.
  8. The system according to claim 7, characterized in that the detector is an image collector for collecting facial expression information of the speech provider corresponding to the captions in the video.
  9. The system according to claim 7, characterized in that the detector is a receiver for receiving video caption play type control information input by a user.
  10. The system according to claim 7, characterized in that the detector is a voice collector for collecting the tone of the speech provider corresponding to the captions in the video;
    and the processor is further configured to obtain the tone, calculate the tone variation over a preset time period, and determine the video caption play type control information corresponding to the tone variation.
  11. The system according to claim 7, characterized in that the processor extracts the video caption text information by:
    collecting voice information of the speech provider corresponding to the captions in the video; and
    recognizing the voice information to generate text information corresponding to the voice.
  12. The system according to any one of claims 7-11, characterized in that the processor is further configured to, before generating the video caption, collect the speech volume of the speech provider corresponding to the captions in the video, and adjust the parameter of the video caption animation model according to the speech volume.
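The claimed pipeline (detect play type control information from expression, user input, or tone change; match broadcast information; select an animation model; attach recognized text) can be sketched end to end. Every function and mapping below is a hypothetical stand-in: the claims name no concrete APIs for expression analysis, speech recognition, or rendering, and the mood-to-style table is invented for illustration.

```python
# Hypothetical mood -> animation-model style table (broadcast information matching).
MOOD_TO_MODEL = {
    "excited": "bounce",
    "sad": "fade",
    "neutral": "static",
}

def detect_play_type(facial_expression=None, user_input=None, tone_change=None):
    """Play type control information may come from any of the three claimed
    detector kinds; here explicit user input is given priority."""
    return user_input or facial_expression or tone_change or "neutral"

def generate_caption(control_info, recognized_text):
    """Match broadcast information, pick an animation model, attach the text
    (recognized_text stands in for the speech-recognition output of claim 5)."""
    mood = control_info if control_info in MOOD_TO_MODEL else "neutral"
    return {"text": recognized_text, "animation": MOOD_TO_MODEL[mood]}

caption = generate_caption(detect_play_type(facial_expression="excited"), "Hello!")
```

A real system would replace the table lookup with the parameterized animation models of claims 1 and 7, whose degree is further adjusted by speech volume per claims 6 and 12.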
CN201310018669.9A 2013-01-17 2013-01-17 The generation method and system of video caption Active CN103945140B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310018669.9A CN103945140B (en) 2013-01-17 2013-01-17 The generation method and system of video caption

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310018669.9A CN103945140B (en) 2013-01-17 2013-01-17 The generation method and system of video caption

Publications (2)

Publication Number Publication Date
CN103945140A CN103945140A (en) 2014-07-23
CN103945140B true CN103945140B (en) 2017-11-28

Family

ID=51192596

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310018669.9A Active CN103945140B (en) 2013-01-17 2013-01-17 The generation method and system of video caption

Country Status (1)

Country Link
CN (1) CN103945140B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104392633B (en) * 2014-11-12 2020-08-25 国家电网公司 Explanation control method for power system simulation training
CN108419141B (en) * 2018-02-01 2020-12-22 广州视源电子科技股份有限公司 Subtitle position adjusting method and device, storage medium and electronic equipment
CN111507143B (en) 2019-01-31 2023-06-02 北京字节跳动网络技术有限公司 Expression image effect generation method and device and electronic equipment
CN110990623B (en) * 2019-12-04 2024-03-01 广州酷狗计算机科技有限公司 Audio subtitle display method and device, computer equipment and storage medium
CN111814540B (en) * 2020-05-28 2024-08-27 维沃移动通信有限公司 Information display method, information display device, electronic equipment and readable storage medium
CN113301428A (en) * 2021-05-14 2021-08-24 上海樱帆望文化传媒有限公司 Live caption device for electric competition events
CN118104241A (en) * 2021-10-27 2024-05-28 海信视像科技股份有限公司 Display apparatus

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1711756A (en) * 2002-11-15 2005-12-21 汤姆森许可贸易公司 Method and apparatus for composition of subtitles
CN1908965A (en) * 2005-08-05 2007-02-07 索尼株式会社 Information processing apparatus and method, and program
CN101309390A (en) * 2007-05-17 2008-11-19 华为技术有限公司 Visual communication system, apparatus and subtitle displaying method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20100044477A (en) * 2008-10-22 2010-04-30 삼성전자주식회사 Display apparatus and control method thereof

Also Published As

Publication number Publication date
CN103945140A (en) 2014-07-23

Similar Documents

Publication Publication Date Title
CN103945140B (en) The generation method and system of video caption
CN110531860B (en) Animation image driving method and device based on artificial intelligence
US11380316B2 (en) Speech interaction method and apparatus
CN109087669B (en) Audio similarity detection method and device, storage medium and computer equipment
CN111194465B (en) Audio activity tracking and summarization
CN106604125B (en) A kind of determination method and device of video caption
CN109637518A (en) Virtual newscaster's implementation method and device
CN112040263A (en) Video processing method, video playing method, video processing device, video playing device, storage medium and equipment
EP2595031A2 (en) Display apparatus and control method thereof
CN107316642A (en) Video file method for recording, audio file method for recording and mobile terminal
CN110097890A (en) A kind of method of speech processing, device and the device for speech processes
WO2020134926A1 (en) Video quality evaluation method, apparatus and device, and storage medium
CN103873919B (en) A kind of information processing method and electronic equipment
CN107918726A (en) Apart from inducing method, equipment and storage medium
CN109040641A (en) A kind of video data synthetic method and device
CN107809654A (en) System for TV set and TV set control method
CN107770598A (en) A kind of detection method synchronously played, mobile terminal
CN113301372A (en) Live broadcast method, device, terminal and storage medium
CN108364635A (en) A kind of method and apparatus of speech recognition
CN103414720A (en) Interactive 3D voice service method
JP2019082982A (en) Cooking support device, cooking information generation device, cooking support system, cooking support method, and program
WO2022041192A1 (en) Voice message processing method and device, and instant messaging client
KR101119867B1 (en) Apparatus for providing information of user emotion using multiple sensors
JP6305538B2 (en) Electronic apparatus, method and program
CN110164438A (en) A kind of audio recognition method, device and electronic equipment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant