CN109862422A - Video processing method and apparatus, computer-readable storage medium, and computer device - Google Patents
Video processing method and apparatus, computer-readable storage medium, and computer device
- Publication number
- CN109862422A (application number CN201910150506.3A)
- Authority
- CN
- China
- Prior art keywords
- role
- original video
- video frame
- frame
- file
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Studio Circuits (AREA)
Abstract
This application relates to a video processing method and apparatus, a computer-readable storage medium, and a computer device. The method includes: obtaining an audio clip from an audio file; performing voiceprint recognition on the audio clip to obtain a sound type corresponding to the audio clip; obtaining subtitle text corresponding to the audio clip from a subtitle file; obtaining an original video frame sequence corresponding to the audio clip from an original video file; identifying, based on the original video frames in the sequence, a character corresponding to the sound type; and generating, according to the character's position in the original video frame, a subtitle image frame containing the subtitle text, the subtitle image frame being used for compositing with the original video frame into a target video frame. The scheme provided by this application can improve a video's ability to convey information.
Description
Technical field
This application relates to the field of computer technology, and in particular to a video processing method and apparatus, a computer-readable storage medium, and a computer device.
Background art
Most videos today are produced by combining a video file, an audio file, and a subtitle file according to an agreed protocol, yielding a complete playable video that is presented to the user. Subtitles render the acoustic information in the audio file as on-screen text, so users can visually follow the dialogue between characters and better understand the content of the video.

However, subtitles are in most cases placed at a fixed position in the video picture. As a result, the connection between a subtitle and the character appearing in the picture is loose, and the video's ability to convey information is limited.
Summary of the invention
In view of the technical problem that fixed subtitle placement limits a video's ability to convey information, it is necessary to provide a video processing method and apparatus, a computer-readable storage medium, and a computer device.
A video processing method, comprising:

obtaining an audio clip from an audio file;

performing voiceprint recognition on the audio clip to obtain a sound type corresponding to the audio clip;

obtaining subtitle text corresponding to the audio clip from a subtitle file;

obtaining an original video frame sequence corresponding to the audio clip from an original video file;

identifying, based on the original video frames in the original video frame sequence, a character corresponding to the sound type; and

generating, according to the position of the character in the original video frame, a subtitle image frame containing the subtitle text, the subtitle image frame being used for compositing with the original video frame into a target video frame.
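The sequence of steps above can be sketched on toy data. All names, data structures, and the head-offset below are illustrative choices for this sketch, not from the patent; the voiceprint and face-recognition results are replaced by pre-filled dictionaries.

```python
# Toy stand-ins for the audio file, subtitle file, original video file,
# and a pre-built sound-type-to-character mapping (all invented).
voice_to_character = {"voice_1": "Alice"}
subtitles = {0: "Hello there"}                # clip id -> subtitle text
frames = {0: [{"Alice": (120, 40)}]}          # clip id -> per-frame character positions

def make_subtitle_frames(clip_id, sound_type):
    """Look up the character for the recognized sound type, then anchor
    the subtitle text near that character's position in each frame."""
    character = voice_to_character[sound_type]
    text = subtitles[clip_id]
    out = []
    for positions in frames[clip_id]:
        x, y = positions[character]
        # Place the caption slightly above the character (offset is arbitrary).
        out.append({"text": text, "anchor": (x, y - 20)})
    return out

result = make_subtitle_frames(0, "voice_1")
```

Compositing each such subtitle frame over its original frame then yields the target video frame.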
A video processing apparatus, the apparatus comprising:

an audio clip obtaining module, configured to obtain an audio clip from an audio file;

a sound type identification module, configured to perform voiceprint recognition on the audio clip to obtain a sound type corresponding to the audio clip;

a subtitle text obtaining module, configured to obtain subtitle text corresponding to the audio clip from a subtitle file;

an original video frame sequence obtaining module, configured to obtain an original video frame sequence corresponding to the audio clip from an original video file;

a character identification module, configured to identify, based on the original video frames in the original video frame sequence, a character corresponding to the sound type; and

a subtitle image frame generation module, configured to generate, according to the position of the character in the original video frame, a subtitle image frame containing the subtitle text, the subtitle image frame being used for compositing with the original video frame into a target video frame.
A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the above video processing method.

A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the above video processing method.
With the above video processing method, apparatus, computer-readable storage medium, and computer device, a correspondence between sound types and characters across the entire video is established in advance. The character matching the recognized sound type is located in the original video frame, and a subtitle image frame containing the subtitle text is generated according to that character's position in the frame. The generated subtitle image frame is composited with the original video frame to obtain a target video frame in which the subtitle text sits close to the character. Because the subtitle in the composited target video frame is displayed next to the character in the picture, the presentation is richer, the video picture conveys more information to the user, and the user can understand the video content more easily.
Brief description of the drawings
Fig. 1 is a diagram of the application environment of the video processing method in one embodiment;
Fig. 2 is an overall flow diagram of the video processing method in one embodiment;
Fig. 3 is a flow diagram of the video processing method in one embodiment;
Fig. 4 is a flow diagram of establishing the correspondence between sound types and characters in one embodiment;
Fig. 5 is a diagram of subtitle image frames corresponding to original video frames in one embodiment;
Fig. 6 is a flow diagram of a terminal obtaining a subtitle video file from a server in one embodiment;
Fig. 7 is a flow diagram of the video processing method in a specific embodiment;
Fig. 8 is a structural block diagram of the video processing apparatus in one embodiment;
Fig. 9 is a structural block diagram of the video processing apparatus in another embodiment;
Fig. 10 is a structural block diagram of the computer device in one embodiment.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of this application clearer, the application is further described below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain the application and are not intended to limit it.
Fig. 1 is a diagram of the application environment of the video processing method in one embodiment. Referring to Fig. 1, the video processing method is applied to a video processing system. The video processing system includes a terminal 110 and a server 120, connected through a network. The terminal 110 may specifically be a desktop terminal or a mobile terminal, and the mobile terminal may specifically be at least one of a mobile phone, a tablet computer, a notebook computer, an Internet TV, and the like. A client for video playback, such as a live-streaming client or a video client, may be installed on the terminal 110. The client running on the terminal 110 can interact with the server 120; after the client triggers the server 120 to obtain a composite video file, the terminal 110 can play and display the target video frames in that composite video file. The server 120 may be implemented as an independent server or as a server cluster composed of multiple servers.
Fig. 2 is an overall flow diagram of the video processing method provided by this application in one embodiment. Referring to Fig. 2, the server generates a subtitle video file from the subtitle file, audio file, and original video file of a complete film; the subtitle video file consists of subtitle image frames. The server can also composite the subtitle video frames in the subtitle video file with the original video frames in the original video file to obtain target video frames in which the subtitle text sits close to the character; these target video frames form a composite video file. In response to a request from the terminal, the server can send the composite video file together with the film's audio file to the terminal, and the terminal plays them. Because the subtitle text in each target video frame of the composite video file is close to the character in the picture, the target video frames strengthen the connection between subtitle and character and can convey more information to the user. In one embodiment, the terminal can obtain a request triggered by the user through a locally installed client and, based on that request, obtain the composite video file from the server.
As shown in Fig. 3, in one embodiment, a video processing method is provided. This embodiment is mainly illustrated by applying the method to the server 120 in Fig. 1. Referring to Fig. 3, the video processing method specifically includes the following steps:
S302: obtain an audio clip from an audio file.

Here, the audio file is a file containing all the voice data of a complete film, which may be the dubbing data of all the characters in that film. An audio clip is one unit of the audio file as divided into individual voice segments. Specifically, the server may obtain the audio file of the complete film, take the first audio clip of the audio file according to the film's play order, and process the audio clips in sequence starting from that first clip.
S304: perform voiceprint recognition on the audio clip to obtain the sound type corresponding to the audio clip.

Here, a sound type is a category of sound. Within one complete film, if different characters use different dubbing voices, their sound types differ, and sound types and characters are in a one-to-one correspondence. If multiple characters share the same dubbing voice, those characters have the same sound type, and the correspondence between sound types and characters is many-to-one. If one character uses multiple dubbing voices, that character corresponds to multiple sound types, and the correspondence is one-to-many. The server can perform voiceprint recognition on the audio clip currently being processed to obtain its sound type. For example, different sound types can be labeled "voice 1", "voice 2", and so on, and a character can be represented by its name in the plot.
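The three cardinalities between sound types and characters can be made concrete with a few invented observations; the pair data and names below are purely illustrative.

```python
from collections import defaultdict

# Invented (sound_type, character) pairs illustrating the cardinalities:
observations = [
    ("voice_1", "RoleA"),                         # voice_1 maps to one character
    ("voice_2", "RoleB"), ("voice_2", "RoleC"),   # shared dub: many characters, one type
    ("voice_3", "RoleA"),                         # RoleA also has a second dub: one-to-many
]

type_to_characters = defaultdict(set)
character_to_types = defaultdict(set)
for sound_type, character in observations:
    type_to_characters[sound_type].add(character)
    character_to_types[character].add(sound_type)
```

Inspecting the two maps recovers each case: `voice_2` fans out to two characters, while `RoleA` fans out to two sound types.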
In one embodiment, before step S304, the video processing method further includes a step of establishing and storing the correspondence between the sound types and the characters in a complete film: traverse each audio clip in the audio file; perform voiceprint recognition on the current audio clip to obtain its sound type; if a character matching that sound type already exists, move on to the next audio clip; if no matching character exists, obtain the original video frame sequence corresponding to the current audio clip from the original video file, perform face recognition on each original video frame in the sequence to obtain the character in the frames, and store the sound type in correspondence with the identified character.
Here, the original video file is a file containing all the image data of a complete film. In general, a complete film is obtained by combining an audio file, a video file, and a subtitle file according to a preset protocol, and within the same film an audio clip in the audio file, a video frame sequence in the video file, and subtitle text in the subtitle file correspond to one another. For example, one piece of dubbing corresponds to one continuous stretch of video picture and, at the same time, to one line of subtitle.
Specifically, when establishing the correspondence between the sound types and the characters of a complete film, the server may traverse each audio clip in the audio file and perform voiceprint recognition on the clip currently traversed to obtain its sound type. If that sound type already appeared in a previously traversed clip, that is, a character corresponding to it has already been identified, the server moves on to the next clip. If the sound type has not appeared before, that is, no character has yet been associated with it, the server obtains the original video frame sequence corresponding to the clip from the original video file and performs face recognition on that sequence to identify the character corresponding to the sound type, thereby establishing the correspondence between the sound type and the character.
In some embodiments, even if the sound type appeared in a previously traversed clip, the server may still obtain the original video frame sequence corresponding to the current clip and perform face recognition on it. If the identified character is the same as the character already stored for that sound type, the server traverses the next audio clip; if not, the sound type may correspond to multiple characters, and the server can flag the sound type for manual confirmation.
Fig. 4 is a flow diagram of establishing the correspondence between sound types and characters in one embodiment. The steps in this flow are executed by the server. Referring to Fig. 4, the flow includes the following steps: S402, the server traverses the audio clips in the audio file; S404, perform voiceprint recognition on the clip currently traversed to obtain its sound type; S406, determine whether a character corresponding to that sound type already exists; if so, go to step S412, otherwise go to step S408; S408, perform face recognition on the original video frame sequence in the original video file corresponding to the current clip to obtain the character corresponding to the sound type; S410, store the sound type in correspondence with that character; S412, traverse the next audio clip.
In the above embodiments, if multiple faces appear in the picture when face recognition is performed on the original video frame sequence, the speaker in the picture can be further determined from the open/closed state of each face's mouth across the frames, and the speaker is then taken as the character corresponding to the sound type.
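The traversal of steps S402–S412 can be sketched as a single pass. The `recognize_voice` and `recognize_faces` callables stand in for the voiceprint and face recognizers and, like the toy clip data, are assumptions of this sketch; when several faces are found and no mouth-state cue resolves them, the sound type is flagged for manual review as described above.

```python
def build_voice_character_map(clips, recognize_voice, recognize_faces):
    """One pass over the audio clips, mirroring steps S402-S412."""
    mapping = {}
    flagged = []                                # sound types needing manual confirmation
    for clip in clips:
        sound_type = recognize_voice(clip)      # S404: voiceprint recognition
        if sound_type in mapping:               # S406: character already known
            continue                            # S412: next clip
        characters = recognize_faces(clip)      # S408: faces in the matching frames
        if len(characters) == 1:
            mapping[sound_type] = characters[0] # S410: store the correspondence
        else:
            flagged.append(sound_type)          # ambiguous: flag for manual review
    return mapping, flagged

# Toy recognizers that simply read invented per-clip annotations.
clips = [{"voice": "v1", "faces": ["Alice"]},
         {"voice": "v1", "faces": ["Alice", "Bob"]},
         {"voice": "v2", "faces": ["Alice", "Bob"]}]
mapping, flagged = build_voice_character_map(
    clips, lambda c: c["voice"], lambda c: c["faces"])
```

The second `v1` clip is skipped because the mapping already holds it, while `v2` is flagged since two faces are present.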
S306: obtain the subtitle text corresponding to the audio clip from a subtitle file.

Here, the subtitle file is a file containing all the subtitle text of a complete film. The audio file, original video file, and subtitle file of the same film can be aligned with one another by timestamp. The server can obtain, from the subtitle file, the subtitle text corresponding to the audio clip currently being processed; the subtitle text is the line of dialogue corresponding to the voice data in the clip. In this embodiment, step S306 may also be executed after step S308 or after step S310.
S308: obtain the original video frame sequence corresponding to the audio clip from the original video file.

Specifically, after obtaining the sound type of the audio clip currently being processed, the server can look up the character corresponding to that sound type in the pre-established correspondence between sound types and characters. The server then obtains the original video frame sequence corresponding to the audio clip from the original video file, in order to determine whether that sequence contains the same character as the one looked up. An original video frame sequence consists of multiple consecutive original video frames; the number of original video frames corresponding to the current audio clip is positively correlated with the clip's duration and with the frame rate of the original video file.
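The frame count is simple arithmetic; the function name is an invention for this sketch. The patent's own worked example later in the description (a 2-second clip at 30 frames per second spanning 60 frames) is reproduced in the test.

```python
def frames_for_clip(duration_s, fps):
    """Number of original video frames covered by an audio clip:
    positively correlated with both the clip duration and the
    original video file's frame rate."""
    return round(duration_s * fps)
```

A lookup over the original video file then returns that many consecutive frames starting at the clip's timestamp.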
S310: identify, based on the original video frames in the original video frame sequence, the character corresponding to the sound type.

Specifically, the server can perform face recognition on the obtained original video frame sequence and identify, in the original video frames, the character corresponding to the sound type. In one embodiment, if the identified character is the same as the character looked up for the sound type, the server can determine the character's position in the original video frame; that position can be expressed as the character's coordinates within the frame.
S312: generate, according to the character's position in the original video frame, a subtitle image frame containing the subtitle text; the subtitle image frame is used for compositing with the original video frame into a target video frame.

Specifically, after the server identifies the character corresponding to the sound type in the original video frame, it can generate a subtitle image frame of the same size as the original video frame. The subtitle image frame contains the subtitle text obtained in step S306, placed at a position in the subtitle image frame corresponding to the identified character's position in the original video frame; for example, the subtitle text is placed close to the character's head.
The subtitle image frame is a video frame with a transparent background. In some embodiments, the subtitle text can be rendered in a uniform color, such as white or black, or of course another color. In other embodiments, the font color need not be uniform; for example, it can be adjusted according to the background color of the original video frame at the position where the subtitle text is displayed, as long as the text remains legible. As another example, the font color can be a color that contrasts with the area close to the character in the original video frame, making the text more conspicuous: if the subtitle text is placed close to the character's head and the color there is black, the subtitle text can be white. The subtitle text can be displayed horizontally or vertically.
In one embodiment, the subtitle image frame may also contain a subtitle pointer that directs the subtitle text toward the character. The video processing method then further includes: determining the character's position in the original video frame; and, when the character's position changes across the original video frames in the sequence, correspondingly changing the subtitle pointer's position in the subtitle image frame, so that in the continuous video picture formed by the composited frames, the subtitle pointer moves along with the character.

Here, the subtitle pointer is a pointer used to direct the subtitle text toward the character corresponding to the sound type. The subtitle text and the subtitle pointer in the subtitle image frame work together: after the subtitle image frame is composited with the original video frame into a new video frame, the resulting picture makes the connection between the subtitle and the character explicit. The user can see intuitively which character's line it is, and more video information can be conveyed to hearing-impaired users.
Specifically, after identifying the character corresponding to the sound type in the original video frame sequence, the server can determine the character's position in each original video frame and, from the changes in that position, determine how the character moves within the picture over the duration corresponding to the audio clip. For example, if the current audio clip is 2 seconds long and the frame rate of the original video file is 30 frames per second, the obtained original video frame sequence contains 60 original video frames; for each of them the server determines the position of the identified character, thereby determining the character's position changes in the picture over those 2 seconds.
In one embodiment, if the identified character's position does not change over the duration corresponding to the audio clip, that is, the character is stationary in the picture, the subtitle pointer's position in the subtitle image frame also stays unchanged. If, over that duration, the position changes indicate that the character moves in the picture, for example linearly in the horizontal or vertical direction, the subtitle pointer's position in the subtitle image frame changes correspondingly, so that the video frames obtained by overlaying the subtitle image frames on the original video frames show the subtitle pointer moving along with the character.
In one embodiment, the subtitle text is displayed in a bubble text box at a position in the subtitle image frame close to the character's position in the original video frame. A bubble text box is a text box in which the subtitle text is wrapped in a bubble shape. In some embodiments, subtitle text carrying strong emotion can also be displayed in the subtitle image frame in a more visually striking artistic font.
Fig. 5 is a diagram of subtitle image frames corresponding to original video frames in one embodiment. On the left of Fig. 5 are three original video frames from the original video frame sequence corresponding to an audio clip. In the middle of Fig. 5, corresponding subtitle image frames are generated from the subtitle text corresponding to the audio clip and the position of the character identified in the original video frames on the left; the subtitle text in each subtitle image frame is displayed in a bubble text box, whose bubble pointer points at the character according to the character's position. The new video frames, obtained by compositing the original video frames with the subtitle image frames, are shown on the right of Fig. 5, where it can be seen that the pointer's position changes with the character's position.
In some embodiments, when the character's positions indicate that the character moves within the original video picture, the position of the bubble text box in the generated subtitle image frames is not changed, that is, the subtitle text stays put, and only the subtitle pointer's position changes correspondingly.
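This fixed-bubble, moving-pointer behavior reduces to per-frame pointer geometry. The tuple layout, coordinates, and function name below are illustrative: each frame pairs the fixed bubble anchor with the character's current position, which the pointer tip follows.

```python
def pointer_tracks(character_positions, bubble_anchor):
    """Per-frame pointer geometry: the bubble text box stays at a fixed
    anchor while the pointer tip follows the character's position."""
    return [(bubble_anchor, pos) for pos in character_positions]

# A character moving horizontally across three frames of a clip:
tracks = pointer_tracks([(100, 200), (110, 200), (120, 200)],
                        bubble_anchor=(60, 80))
```

Rendering each segment from anchor to tip yields a pointer that sweeps with the character while the bubble holds still.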
The above video processing method establishes in advance the correspondence between sound types and characters across the entire video, finds in the original video frame the character corresponding to the recognized sound type, and generates, according to that character's position in the frame, a subtitle image frame containing the subtitle text. The generated subtitle image frame is composited with the original video frame to obtain a target video frame in which the subtitle text sits close to the character. Because the subtitle in the composited target video frame is displayed next to the character in the picture, the presentation is richer, the video picture conveys more information to the user, and the user understands the video content more easily.
In one embodiment, the video processing method further includes: when no character corresponding to the sound type exists in the original video frame, determining a character name corresponding to the character, and generating a subtitle image frame containing the character name, the subtitle text, and a subtitle pointer, with the subtitle pointer pointing toward the edge of the subtitle image frame.

Specifically, after the server looks up the character corresponding to the sound type in the correspondence, if face recognition on the original video frames in the sequence does not identify a character matching the one looked up, that is, the character does not appear in the original video frame, the server can display the subtitle text in the generated subtitle image frame together with the character name of the looked-up character. In that subtitle image frame, the subtitle pointer points toward the image edge, and the character name can be prepended to the subtitle text. In this way, after the subtitle image frame is composited with the original video frame into a new video frame, the connection between the subtitle text and the speaker's identity can still be conveyed to the user.
In some embodiments, when the sound type of the current audio clip cannot be recognized because several people are speaking at once, the server can flag the current audio clip and determine its sound type through manual confirmation.

In some embodiments, for complex fast-motion pictures on which face recognition cannot be performed, the original video frames corresponding to such audio clips can be flagged for manual confirmation, so that the corresponding subtitle video frames can still be generated.
In one embodiment, the above video processing method further includes: determining the original video frames in the original video file for which no corresponding audio clip exists; and generating blank image frames to serve as the subtitle image frames corresponding to those original video frames.

Specifically, for original video frames with no corresponding audio clip, such as scenery shots, the server can pair those frames with blank image frames, so that the subtitle image frames in the generated subtitle video file correspond one-to-one with the original video frames in the original video file, that is, the subtitle video file and the original video file have the same duration. In this way, when compositing a new video, each subtitle video frame in the subtitle video file only needs to be overlaid on the corresponding original video frame in the original video file to obtain the new video frame. Alternatively, for original video frames with no corresponding audio clip, the original video frames themselves can be used directly and added to the subtitle video file together with the other subtitle image frames.
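The frame-for-frame alignment can be sketched as building a subtitle track the same length as the original file; the `None` placeholder standing in for a blank transparent frame, and the names, are choices made for this sketch.

```python
def build_subtitle_track(total_frames, captioned):
    """Frame-for-frame subtitle track: `captioned` maps a frame index to
    its subtitle image frame; every other index gets a blank placeholder,
    keeping the subtitle file and original file the same length."""
    return [captioned.get(i) for i in range(total_frames)]

# Frames 1 and 2 carry a bubble; the rest are blank (e.g., scenery shots).
track = build_subtitle_track(5, {1: "bubble_A", 2: "bubble_A"})
```

Compositing then becomes a positional zip of the two equally long tracks, with blank entries leaving the original frame untouched.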
In one embodiment, the above video processing method further includes: generating a subtitle video file from the subtitle video frames corresponding to each original video frame in the original video file, and storing the subtitle video file in correspondence with the video identifier of the original video file.

Specifically, after generating the subtitle video frame corresponding to each original video frame in an original video frame sequence, the server can obtain and process the next audio clip, generating the subtitle video frames for each original video frame in that clip's corresponding sequence, and so on until all audio clips in the audio file have been processed, thereby obtaining all subtitle video frames corresponding to the entire audio file. Original video frames with no corresponding audio clip are paired with blank image frames. The subtitle video frames corresponding to the entire original video file thus obtained constitute the subtitle video file, which the server can store in correspondence with the original video file's video identifier. Because the subtitle video file does not contain the content of the original video file, it can be composited with original video files of different resolutions or formats to obtain new composite video files; compared with generating a separate subtitle video file for every resolution or format of the original video file, this saves storage space.
In one embodiment, the above video processing method further includes: receiving a subtitle-on request sent by the terminal, carrying a video identifier and the current play-time node; obtaining the original video file and the subtitle video file corresponding to the video identifier; performing video compositing starting from the original video frame corresponding to the current play-time node in the original video file and the subtitle video frame corresponding to the current play-time node in the subtitle video file, to obtain a composite video file; and, in response to the subtitle-on request, feeding the composite video file back to the terminal. The composite video file fed back to the terminal instructs the terminal to play the video frames in order, starting from the video frame corresponding to the current play-time node.

Here, when the subtitle text in the subtitle video file is displayed in bubble text boxes, the subtitle-on request can be a bubble-subtitle-on request. The terminal can obtain a subtitle-on instruction triggered by the user, generate the corresponding subtitle-on request from the video identifier of the currently playing video and the current play-time node, and send it to the server to request the video file with this special subtitle. After receiving the subtitle-on request, the server extracts the video identifier and the current play-time node, obtains the corresponding subtitle video file and original video file, and composites them starting from the subtitle image frame and original video frame corresponding to the current play-time node to obtain the composite video file, which it feeds back to the terminal. The terminal can then play the composite video file, showing the video picture from the current play-time node onward and presenting to the user a picture containing subtitle text close to the characters. The server can also feed the audio file corresponding to the video identifier back to the terminal together with the composite video file, and the terminal can play the audio file and the composite video file together.
In some embodiments, if bullet comments (danmaku) are enabled while the terminal is playing the video, the terminal may automatically close the bullet comments after obtaining the user-triggered subtitle-on instruction, ensuring that the subtitles are not covered by bullet comments.
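A minimal sketch of this behavior, with the player state reduced to a dictionary; the key names are invented for illustration.

```python
def on_subtitle_enabled(player_state):
    """Turning subtitles on also closes bullet comments (danmaku),
    so the comments cannot cover the subtitle text."""
    state = dict(player_state)  # leave the caller's state untouched
    state["subtitles_on"] = True
    if state.get("danmaku_on"):
        state["danmaku_on"] = False
    return state

state = on_subtitle_enabled({"danmaku_on": True, "subtitles_on": False})
```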
FIG. 6 is a schematic flowchart of a terminal obtaining a subtitle video file from a server in one embodiment. Referring to FIG. 6, the flow includes the following steps. S602: the user clicks to enable bubble subtitles. After detecting this operation, the terminal performs S604: sending a bubble-subtitle-on request carrying the video identifier to the server. The server obtains the original video file and the subtitle video file according to the video identifier and performs S606: synthesizing the original video file and the subtitle video file to obtain a synthesized video file. The server then performs S608: returning the synthesized video file to the terminal. S610: the user can watch the video with bubble subtitles that the terminal plays from the synthesized video file.
As shown in FIG. 7, in one specific embodiment the video processing method includes the following steps:
S702: traverse each audio fragment in the audio file.
S704: perform voiceprint recognition on the current audio fragment to obtain the sound type corresponding to the audio fragment.
S706: when a matched role already exists for the sound type, traverse the next audio fragment.
S708: when no matched role exists for the sound type, obtain the original video frame sequence corresponding to the current audio fragment in the original video file, perform face recognition on each original video frame in the sequence to obtain the role in the original video frames, and store the sound type in correspondence with the identified role.
S710: obtain the audio fragment currently being processed in the audio file, and perform voiceprint recognition on the audio fragment to obtain the sound type corresponding to the audio fragment.
S712: obtain the subtitle text corresponding to the audio fragment being processed from the subtitle file.
S714: obtain the original video frame sequence corresponding to the audio fragment in the original video file.
S716: perform face recognition based on the original video frames in the original video frame sequence.
S718: when the role corresponding to the sound type of the currently processed audio fragment is recognized, determine the position of the role in the original video frame.
S720: according to the position of the role in the original video frame, generate a subtitle image frame including the subtitle text and a subtitle pointer pointing at the role; the subtitle image frame is used to synthesize a new video frame with the original video frame.
S722: when the position of the role changes across the original video frames of the sequence, correspondingly change the position of the subtitle pointer in the subtitle image frame, so that in the continuous video pictures formed by the new video frames the subtitle pointer moves with the movement of the role.
S724: when no role corresponding to the sound type of the currently processed audio fragment exists in the original video frames, determine the role name corresponding to the role, and generate a subtitle image frame including the role name, the subtitle text, and a subtitle pointer; the subtitle pointer points at an image edge of the subtitle image frame.
S726: determine original video frames in the original video file that have no corresponding audio fragment, generate a blank image frame, and use the blank image frame as the subtitle image frame corresponding to each such original video frame.
S728: generate a subtitle video file from the subtitle video frames corresponding to the original video frames in the original video file.
S730: store the subtitle video file in correspondence with the video identifier of the original video file.
S732: receive a subtitle-on request sent by a terminal, the request carrying a video identifier and a current playback time node.
S734: obtain the original video file and the subtitle video file corresponding to the video identifier.
S736: perform video synthesis starting from the original video frame corresponding to the current playback time node in the original video file and the subtitle video frame corresponding to the current playback time node in the subtitle video file, to obtain a synthesized video file.
S738: in response to the subtitle-on request, feed the synthesized video file back to the terminal; the synthesized video file instructs the terminal to play video frames in order starting from the video frame corresponding to the current playback time node in the synthesized video file.
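The subtitle-frame generation in steps S716–S724 can be sketched as below. Voiceprint and face recognition are replaced by precomputed per-frame lookups, since the patent does not fix particular recognition algorithms; all function names, the bounding-box format, and the 10-pixel pointer offset are assumptions of this sketch.

```python
def make_subtitle_frame(caption, role_bbox, frame_size, role_name):
    """Build one subtitle image frame description (steps S718-S724)."""
    width, height = frame_size
    if role_bbox is not None:
        x, y, w, h = role_bbox
        # S720: the subtitle pointer is anchored just above the role's face
        return {"text": caption, "pointer": (x + w // 2, max(0, y - 10))}
    # S724: the speaker is not in the picture - prepend the role name and
    # point the subtitle pointer at the image edge instead
    return {"text": f"{role_name}: {caption}",
            "pointer": (width - 1, height // 2)}

def subtitle_frames_for_fragment(caption, role, per_frame_faces, frame_size, names):
    """One subtitle frame per original frame of the fragment; because the
    pointer is recomputed per frame, it moves with the role (S722).
    per_frame_faces: per-frame {role: bounding box} from face recognition."""
    return [make_subtitle_frame(caption, faces.get(role), frame_size,
                                names.get(role))
            for faces in per_frame_faces]

# Role "A" is visible and moves right; in the last frame it leaves the picture.
per_frame_faces = [{"A": (100, 50, 40, 40)}, {"A": (120, 50, 40, 40)}, {}]
frames = subtitle_frames_for_fragment(
    "Hello!", "A", per_frame_faces, (640, 360), {"A": "Alice"})
```

In the example, the pointer follows the role from frame to frame, and the final frame falls back to the named, edge-pointing form of S724.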
The above video processing method establishes the correspondence between sound types and roles across the entire video, finds the role corresponding to the recognized sound type in the original video frames, and generates a subtitle image frame including the subtitle text according to the position of the role in the original video frame. The generated subtitle image frame is synthesized with the original video frame to obtain a target video frame in which the subtitle text is close to the role. Because the subtitle in the synthesized target video frame is displayed close to the role in the video picture, the form of expression is richer, which improves the ability of the video picture to convey information to the user and helps the user better understand the video content.
FIG. 7 is a schematic flowchart of the video processing method in one embodiment. It should be understood that although the steps in the flowchart of FIG. 7 are shown in sequence as indicated by the arrows, they are not necessarily executed in that sequence. Unless expressly stated otherwise herein, there is no strict ordering restriction on the execution of these steps, and they may be executed in other orders. Moreover, at least some of the steps in FIG. 7 may include multiple sub-steps or stages, which need not be completed at the same moment but may be executed at different times; their execution order is also not necessarily sequential, and they may be executed in turn or alternately with other steps, or with sub-steps or stages of other steps.
In one embodiment, as shown in FIG. 8, a video processing apparatus 800 is provided. The apparatus includes an audio fragment acquisition module 802, a sound type recognition module 804, a subtitle text acquisition module 806, an original video frame sequence acquisition module 808, a role recognition module 810, and a subtitle image frame generation module 812, in which:
the audio fragment acquisition module 802 is configured to obtain an audio fragment in an audio file;
the sound type recognition module 804 is configured to perform voiceprint recognition on the audio fragment to obtain the sound type corresponding to the audio fragment;
the subtitle text acquisition module 806 is configured to obtain subtitle text corresponding to the audio fragment from a subtitle file;
the original video frame sequence acquisition module 808 is configured to obtain the original video frame sequence corresponding to the audio fragment in an original video file;
the role recognition module 810 is configured to identify, based on the original video frames in the original video frame sequence, the role corresponding to the sound type; and
the subtitle image frame generation module 812 is configured to generate, according to the position of the role in the original video frame, a subtitle image frame including the subtitle text; the subtitle image frame is used to synthesize a target video frame with the original video frame.
In one embodiment, the subtitle image frame further includes a subtitle pointer pointing from the subtitle text to the role. The video processing apparatus 800 further includes a subtitle pointer processing module, configured to determine the position of the role in the original video frame and, when the position of the role changes across the original video frames of the sequence, correspondingly change the position of the subtitle pointer in the subtitle image frame, so that in the continuous video pictures formed by the video frames the subtitle pointer moves correspondingly with the movement of the role.
In one embodiment, the subtitle image frame generation module 812 is further configured to, when no role corresponding to the sound type exists in the original video frame, determine the role name corresponding to the role and generate a subtitle image frame including the role name, the subtitle text, and a subtitle pointer; the subtitle pointer points at an image edge of the subtitle image frame.
In one embodiment, the subtitle image frame generation module 812 is further configured to determine original video frames in the original video file that have no corresponding audio fragment, generate a blank image frame, and use the blank image frame as the subtitle image frame corresponding to such an original video frame.
In one embodiment, the video processing apparatus 800 further includes a correspondence storage module, configured to: traverse each audio fragment in the audio file; perform voiceprint recognition on the current audio fragment to obtain the sound type corresponding to the audio fragment; when a matched role already exists for the sound type, traverse the next audio fragment; and when no matched role exists for the sound type, obtain the original video frame sequence corresponding to the current audio fragment in the original video file, perform face recognition on each original video frame in the sequence to obtain the role in the original video frames, and store the sound type in correspondence with the identified role.
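The traversal performed by this module can be sketched as follows, with voiceprint and face recognition stubbed out as lookup functions since the patent does not fix particular recognition algorithms; the function names, and the simplification of taking the first recognized role, are assumptions of this sketch.

```python
def build_sound_type_role_map(fragments, voiceprint, faces_for_fragment):
    """Traverse the audio fragments and store each new sound type together
    with the role recognized in the corresponding original video frames."""
    mapping = {}
    for fragment in fragments:
        sound_type = voiceprint(fragment)      # stub for voiceprint recognition
        if sound_type in mapping:
            continue                           # matched role exists: next fragment
        roles = faces_for_fragment(fragment)   # stub for face recognition
        if roles:
            # simplification for this sketch: take the first recognized role
            mapping[sound_type] = roles[0]
    return mapping

voice = {"f1": "voice-A", "f2": "voice-B", "f3": "voice-A"}.get
faces = {"f1": ["Alice"], "f2": ["Bob"], "f3": ["Alice", "Bob"]}.get
mapping = build_sound_type_role_map(["f1", "f2", "f3"], voice, faces)
```

Fragment `f3` is skipped because its sound type already has a matched role, mirroring the "traverse the next audio fragment" branch.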
In one embodiment, the video processing apparatus 800 further includes a subtitle video file storage module, configured to generate a subtitle video file from the subtitle video frames corresponding to the original video frames in the original video file, and to store the subtitle video file in correspondence with the video identifier of the original video file.
In one embodiment, as shown in FIG. 9, the video processing apparatus 800 further includes a subtitle-on request receiving module 902, a synthesis module 904, and a sending module 906. The subtitle-on request receiving module 902 is configured to receive a subtitle-on request sent by a terminal, the request carrying a video identifier and a current playback time node. The synthesis module 904 is configured to obtain the original video file and the subtitle video file corresponding to the video identifier, and to perform video synthesis starting from the original video frame corresponding to the current playback time node in the original video file and the subtitle video frame corresponding to the current playback time node in the subtitle video file, obtaining a synthesized video file. The sending module 906 is configured to, in response to the subtitle-on request, feed the synthesized video file back to the terminal; the synthesized video file fed back to the terminal instructs the terminal to play video frames in order starting from the video frame corresponding to the current playback time node in the synthesized video file.
In one embodiment, the subtitle text in the subtitle image frame is displayed in a bubble text box close to the position corresponding to the role in the original video frame.
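Placing the bubble text box near the role while keeping it inside the frame can be sketched as below; the box size, the vertical offset, and the function name are invented values for illustration, not part of the patent.

```python
def bubble_position(role_bbox, frame_size, box_size=(120, 40)):
    """Anchor the bubble text box above the role's face, clamped so the
    whole box stays inside the video frame."""
    x, y, w, h = role_bbox
    bw, bh = box_size
    fw, fh = frame_size
    bx = x + w // 2 - bw // 2        # centered horizontally over the role
    by = y - bh - 8                  # just above the face
    bx = min(max(bx, 0), fw - bw)    # clamp to the frame horizontally
    by = min(max(by, 0), fh - bh)    # clamp to the frame vertically
    return bx, by

pos = bubble_position((20, 10, 40, 40), frame_size=(640, 360))
```

For a role near the top-left corner, as above, the clamp pushes the bubble back inside the picture instead of letting it overflow the frame edge.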
The above video processing apparatus 800 pre-establishes the correspondence between sound types and roles across the entire video, finds the role corresponding to the recognized sound type in the original video frames, and generates a subtitle image frame including the subtitle text according to the position of the role in the original video frame. The generated subtitle image frame is synthesized with the original video frame to obtain a target video frame in which the subtitle text is close to the role. Because the subtitle in the synthesized target video frame is displayed close to the role in the video picture, the form of expression is richer, which improves the ability of the video picture to convey information to the user and helps the user better understand the video content.
FIG. 10 shows an internal structure diagram of a computer device in one embodiment. The computer device may specifically be the server 120 in FIG. 1. As shown in FIG. 10, the computer device 1000 includes a processor 1004, a memory 1006, and a network interface 1008 connected by a system bus 1002. The memory 1006 includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device 1000 stores an operating system and may also store a computer program which, when executed by the processor 1004, causes the processor 1004 to implement the video processing method. The internal memory may also store a computer program which, when executed by the processor 1004, causes the processor 1004 to perform the video processing method.
Those skilled in the art will understand that the structure shown in FIG. 10 is merely a block diagram of the partial structure relevant to the solution of this application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In one embodiment, the video processing apparatus 800 provided by this application may be implemented in the form of a computer program, and the computer program can run on the computer device shown in FIG. 10. The memory of the computer device may store the program modules that make up the video processing apparatus, for example, the audio fragment acquisition module 802, the sound type recognition module 804, the subtitle text acquisition module 806, the original video frame sequence acquisition module 808, the role recognition module 810, and the subtitle image frame generation module 812 shown in FIG. 8. The computer program composed of these program modules causes the processor to perform the steps of the video processing method of the embodiments of this application described in this specification.
For example, the computer device shown in FIG. 10 may perform step S302 through the audio fragment acquisition module 802 of the video processing apparatus 800 shown in FIG. 8, step S304 through the sound type recognition module 804, step S306 through the subtitle text acquisition module 806, step S308 through the original video frame sequence acquisition module 808, step S310 through the role recognition module 810, and step S312 through the subtitle image frame generation module 812.
In one embodiment, a computer device is provided, including a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the above video processing method. Here, the steps of the video processing method may be the steps in the video processing method of each of the above embodiments.
In one embodiment, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, causes the processor to perform the steps of the above video processing method. Here, the steps of the video processing method may be the steps in the video processing method of each of the above embodiments.
Those of ordinary skill in the art will understand that all or part of the processes of the above-described embodiment methods can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above-described methods. Any reference to memory or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments have been described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of this application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of this patent. It should be pointed out that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of this application, and these all belong to the protection scope of this application. Therefore, the protection scope of this application patent shall be subject to the appended claims.
Claims (15)
1. A video processing method, comprising:
obtaining an audio fragment in an audio file;
performing voiceprint recognition on the audio fragment to obtain a sound type corresponding to the audio fragment;
obtaining subtitle text corresponding to the audio fragment from a subtitle file;
obtaining an original video frame sequence corresponding to the audio fragment in an original video file;
identifying, based on original video frames in the original video frame sequence, a role corresponding to the sound type; and
generating, according to a position of the role in the original video frame, a subtitle image frame comprising the subtitle text, the subtitle image frame being used to synthesize a target video frame with the original video frame.
2. The method according to claim 1, wherein the subtitle image frame further comprises a subtitle pointer pointing from the subtitle text to the role, and the method further comprises:
determining the position of the role in the original video frame; and
when the position of the role in the original video frames of the original video frame sequence changes, correspondingly changing the position of the subtitle pointer in the subtitle image frame, so that in continuous video pictures formed by the target video frames the subtitle pointer moves correspondingly with the movement of the role.
3. The method according to claim 1, wherein the method further comprises:
when no role corresponding to the sound type exists in the original video frame,
determining a role name corresponding to the role; and
generating a subtitle image frame comprising the role name, the subtitle text, and a subtitle pointer, the subtitle pointer pointing at an image edge of the subtitle image frame.
4. The method according to claim 1, wherein the method further comprises:
determining an original video frame in the original video file for which no corresponding audio fragment exists; and
generating a blank image frame and using the blank image frame as the subtitle image frame corresponding to the original video frame.
5. The method according to claim 1, wherein the method further comprises:
traversing each audio fragment in the audio file;
performing voiceprint recognition on a current audio fragment to obtain a sound type corresponding to the audio fragment;
when a matched role exists for the sound type, traversing a next audio fragment; and
when no matched role exists for the sound type, obtaining an original video frame sequence corresponding to the current audio fragment in the original video file, performing face recognition on each original video frame in the original video frame sequence to obtain a role in the original video frames, and storing the sound type in correspondence with the identified role.
6. The method according to claim 1, wherein the method further comprises:
generating a subtitle video file according to subtitle video frames corresponding to the original video frames in the original video file; and
storing the subtitle video file in correspondence with a video identifier of the original video file.
7. The method according to any one of claims 1 to 6, wherein the method further comprises:
receiving a subtitle-on request sent by a terminal, the request carrying a video identifier and a current playback time node;
obtaining an original video file and a subtitle video file corresponding to the video identifier;
performing video synthesis starting from the original video frame corresponding to the current playback time node in the original video file and the subtitle video frame corresponding to the current playback time node in the subtitle video file, to obtain a synthesized video file; and
in response to the subtitle-on request, feeding the synthesized video file back to the terminal, the synthesized video file fed back to the terminal instructing the terminal to play video frames in order starting from the video frame corresponding to the current playback time node in the synthesized video file.
8. The method according to any one of claims 1 to 6, wherein the subtitle text in the subtitle image frame is correspondingly displayed in a bubble text box close to the position of the role in the original video frame.
9. A video processing apparatus, comprising:
an audio fragment acquisition module, configured to obtain an audio fragment in an audio file;
a sound type recognition module, configured to perform voiceprint recognition on the audio fragment to obtain a sound type corresponding to the audio fragment;
a subtitle text acquisition module, configured to obtain subtitle text corresponding to the audio fragment from a subtitle file;
an original video frame sequence acquisition module, configured to obtain an original video frame sequence corresponding to the audio fragment in an original video file;
a role recognition module, configured to identify, based on original video frames in the original video frame sequence, a role corresponding to the sound type; and
a subtitle image frame generation module, configured to generate, according to a position of the role in the original video frame, a subtitle image frame comprising the subtitle text, the subtitle image frame being used to synthesize a target video frame with the original video frame.
10. The apparatus according to claim 9, wherein the subtitle image frame further comprises a subtitle pointer pointing from the subtitle text to the role; the apparatus further comprises a subtitle pointer processing module, configured to determine the position of the role in the original video frame and, when the position of the role in the original video frames of the original video frame sequence changes, correspondingly change the position of the subtitle pointer in the subtitle image frame, so that in continuous video pictures formed by the target video frames the subtitle pointer moves correspondingly with the movement of the role.
11. The apparatus according to claim 9, wherein the subtitle image frame generation module is further configured to, when no role corresponding to the sound type exists in the original video frame, determine a role name corresponding to the role and generate a subtitle image frame comprising the role name, the subtitle text, and a subtitle pointer, the subtitle pointer pointing at an image edge of the subtitle image frame.
12. The apparatus according to claim 9, wherein the apparatus further comprises a correspondence storage module, configured to: traverse each audio fragment in the audio file; perform voiceprint recognition on a current audio fragment to obtain a sound type corresponding to the audio fragment; when a matched role exists for the sound type, traverse a next audio fragment; and when no matched role exists for the sound type, obtain an original video frame sequence corresponding to the current audio fragment in the original video file, perform face recognition on each original video frame in the original video frame sequence to obtain a role in the original video frames, and store the sound type in correspondence with the identified role.
13. The apparatus according to claim 9, wherein the apparatus further comprises a synthesized video file sending module, configured to: receive a subtitle-on request sent by a terminal, the request carrying a video identifier and a current playback time node; obtain an original video file and a subtitle video file corresponding to the video identifier; perform video synthesis starting from the original video frame corresponding to the current playback time node in the original video file and the subtitle video frame corresponding to the current playback time node in the subtitle video file, to obtain a synthesized video file; and, in response to the subtitle-on request, feed the synthesized video file back to the terminal, the synthesized video file fed back to the terminal instructing the terminal to play video frames in order starting from the video frame corresponding to the current playback time node in the synthesized video file.
14. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method according to any one of claims 1 to 8.
15. A computer device, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910150506.3A CN109862422A (en) | 2019-02-28 | 2019-02-28 | Method for processing video frequency, device, computer readable storage medium and computer equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109862422A true CN109862422A (en) | 2019-06-07 |
Family
ID=66899259
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910150506.3A Pending CN109862422A (en) | 2019-02-28 | 2019-02-28 | Method for processing video frequency, device, computer readable storage medium and computer equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109862422A (en) |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110427930A (en) * | 2019-07-29 | 2019-11-08 | 中国工商银行股份有限公司 | Multimedia data processing method and device, electronic equipment and readable storage medium storing program for executing |
CN110446062A (en) * | 2019-07-18 | 2019-11-12 | 平安科技(深圳)有限公司 | Receiving handling method, electronic device and the storage medium of large data files transmission |
CN110572691A (en) * | 2019-08-01 | 2019-12-13 | 浙江大华技术股份有限公司 | Video reading method, device, equipment and storage medium |
CN111309963A (en) * | 2020-01-22 | 2020-06-19 | 百度在线网络技术(北京)有限公司 | Audio file processing method and device, electronic equipment and readable storage medium |
CN111601174A (en) * | 2020-04-26 | 2020-08-28 | 维沃移动通信有限公司 | Subtitle adding method and device |
CN112153461A (en) * | 2020-09-25 | 2020-12-29 | 北京百度网讯科技有限公司 | Method and device for positioning sound production object, electronic equipment and readable storage medium |
CN112312196A (en) * | 2020-11-13 | 2021-02-02 | 深圳市前海手绘科技文化有限公司 | Video subtitle making method |
CN112380922A (en) * | 2020-10-23 | 2021-02-19 | 岭东核电有限公司 | Method and device for determining compound video frame, computer equipment and storage medium |
CN112383809A (en) * | 2020-11-03 | 2021-02-19 | Tcl海外电子(惠州)有限公司 | Subtitle display method, device and storage medium |
CN112752165A (en) * | 2020-06-05 | 2021-05-04 | 腾讯科技(深圳)有限公司 | Subtitle processing method, subtitle processing device, server and computer-readable storage medium |
CN112750184A (en) * | 2019-10-30 | 2021-05-04 | 阿里巴巴集团控股有限公司 | Data processing, action driving and man-machine interaction method and equipment |
CN112820265A (en) * | 2020-09-14 | 2021-05-18 | 腾讯科技(深圳)有限公司 | Speech synthesis model training method and related device |
CN113660536A (en) * | 2021-09-28 | 2021-11-16 | 北京七维视觉科技有限公司 | Subtitle display method and device |
CN113992972A (en) * | 2021-10-28 | 2022-01-28 | 维沃移动通信有限公司 | Subtitle display method and device, electronic equipment and readable storage medium |
CN114007116A (en) * | 2022-01-05 | 2022-02-01 | 凯新创达(深圳)科技发展有限公司 | Video processing method and video processing device |
CN114339081A (en) * | 2021-12-22 | 2022-04-12 | 腾讯音乐娱乐科技(深圳)有限公司 | Subtitle generating method, electronic equipment and computer readable storage medium |
CN114342353A (en) * | 2019-09-10 | 2022-04-12 | 华为技术有限公司 | Method and system for video segmentation |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH11261890A (en) * | 1998-03-11 | 1999-09-24 | Nippon Telegr & Teleph Corp <Ntt> | Video caption inserting method and device, and recording medium recording the inserting method |
CN101518055A (en) * | 2006-09-21 | 2009-08-26 | 松下电器产业株式会社 | Subtitle generation device, subtitle generation method, and subtitle generation program |
US20130141551A1 (en) * | 2011-12-02 | 2013-06-06 | Lg Electronics Inc. | Mobile terminal and control method thereof |
CN103533256A (en) * | 2013-10-28 | 2014-01-22 | 广东威创视讯科技股份有限公司 | Method and device for processing subtitle and subtitle display system |
CN105979169A (en) * | 2015-12-15 | 2016-09-28 | 乐视网信息技术(北京)股份有限公司 | Video subtitle adding method, device and terminal |
CN106021496A (en) * | 2016-05-19 | 2016-10-12 | 海信集团有限公司 | Video search method and video search device |
CN108419141A (en) * | 2018-02-01 | 2018-08-17 | 广州视源电子科技股份有限公司 | A kind of method, apparatus, storage medium and the electronic equipment of subtitle position adjustment |
CN108540845A (en) * | 2018-03-30 | 2018-09-14 | 优酷网络技术(北京)有限公司 | Barrage method for information display and device |
CN109376145A (en) * | 2018-11-19 | 2019-02-22 | 深圳Tcl新技术有限公司 | The method for building up of movie dialogue database establishes device and storage medium |
2019-02-28: CN application CN201910150506.3A filed (publication CN109862422A); status: active, Pending
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH11261890A (en) * | 1998-03-11 | 1999-09-24 | Nippon Telegr & Teleph Corp <Ntt> | Video caption inserting method and device, and recording medium recording the inserting method |
CN101518055A (en) * | 2006-09-21 | 2009-08-26 | 松下电器产业株式会社 | Subtitle generation device, subtitle generation method, and subtitle generation program |
US20130141551A1 (en) * | 2011-12-02 | 2013-06-06 | Lg Electronics Inc. | Mobile terminal and control method thereof |
CN103533256A (en) * | 2013-10-28 | 2014-01-22 | 广东威创视讯科技股份有限公司 | Method and device for processing subtitle and subtitle display system |
CN105979169A (en) * | 2015-12-15 | 2016-09-28 | 乐视网信息技术(北京)股份有限公司 | Video subtitle adding method, device and terminal |
CN106021496A (en) * | 2016-05-19 | 2016-10-12 | 海信集团有限公司 | Video search method and video search device |
CN108419141A (en) * | 2018-02-01 | 2018-08-17 | 广州视源电子科技股份有限公司 | Method, apparatus, storage medium and electronic device for subtitle position adjustment |
CN108540845A (en) * | 2018-03-30 | 2018-09-14 | 优酷网络技术(北京)有限公司 | Bullet-screen comment information display method and device |
CN109376145A (en) * | 2018-11-19 | 2019-02-22 | 深圳Tcl新技术有限公司 | Method and device for building a movie dialogue database, and storage medium |
Non-Patent Citations (1)
Title |
---|
周恕义 (Zhou Shuyi): "Practical Tutorial on Multimedia CAI Development" (《多媒体CAI开发实用教程》), China Water & Power Press, 30 April 1999 *
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110446062A (en) * | 2019-07-18 | 2019-11-12 | 平安科技(深圳)有限公司 | Receiving and processing method for large data file transmission, electronic device and storage medium |
CN110446062B (en) * | 2019-07-18 | 2022-11-25 | 平安科技(深圳)有限公司 | Receiving processing method for big data file transmission, electronic device and storage medium |
CN110427930A (en) * | 2019-07-29 | 2019-11-08 | 中国工商银行股份有限公司 | Multimedia data processing method and device, electronic device and readable storage medium |
CN110572691A (en) * | 2019-08-01 | 2019-12-13 | 浙江大华技术股份有限公司 | Video reading method, device, equipment and storage medium |
CN110572691B (en) * | 2019-08-01 | 2022-05-20 | 浙江大华技术股份有限公司 | Video reading method, device, equipment and storage medium |
CN114342353A (en) * | 2019-09-10 | 2022-04-12 | 华为技术有限公司 | Method and system for video segmentation |
CN112750184B (en) * | 2019-10-30 | 2023-11-10 | 阿里巴巴集团控股有限公司 | Method and equipment for data processing, action driving and man-machine interaction |
CN112750184A (en) * | 2019-10-30 | 2021-05-04 | 阿里巴巴集团控股有限公司 | Data processing, action driving and man-machine interaction method and equipment |
CN111309963A (en) * | 2020-01-22 | 2020-06-19 | 百度在线网络技术(北京)有限公司 | Audio file processing method and device, electronic equipment and readable storage medium |
CN111309963B (en) * | 2020-01-22 | 2023-07-04 | 百度在线网络技术(北京)有限公司 | Audio file processing method and device, electronic equipment and readable storage medium |
CN111601174A (en) * | 2020-04-26 | 2020-08-28 | 维沃移动通信有限公司 | Subtitle adding method and device |
CN112752165B (en) * | 2020-06-05 | 2023-09-01 | 腾讯科技(深圳)有限公司 | Subtitle processing method, subtitle processing device, server and computer readable storage medium |
CN112752165A (en) * | 2020-06-05 | 2021-05-04 | 腾讯科技(深圳)有限公司 | Subtitle processing method, subtitle processing device, server and computer-readable storage medium |
CN112820265B (en) * | 2020-09-14 | 2023-12-08 | 腾讯科技(深圳)有限公司 | Speech synthesis model training method and related device |
CN112820265A (en) * | 2020-09-14 | 2021-05-18 | 腾讯科技(深圳)有限公司 | Speech synthesis model training method and related device |
CN112153461B (en) * | 2020-09-25 | 2022-11-18 | 北京百度网讯科技有限公司 | Method and device for positioning sound production object, electronic equipment and readable storage medium |
CN112153461A (en) * | 2020-09-25 | 2020-12-29 | 北京百度网讯科技有限公司 | Method and device for positioning sound production object, electronic equipment and readable storage medium |
CN112380922B (en) * | 2020-10-23 | 2024-03-22 | 岭东核电有限公司 | Method, device, computer equipment and storage medium for determining multiple video frames |
CN112380922A (en) * | 2020-10-23 | 2021-02-19 | 岭东核电有限公司 | Method and device for determining compound video frame, computer equipment and storage medium |
CN112383809A (en) * | 2020-11-03 | 2021-02-19 | Tcl海外电子(惠州)有限公司 | Subtitle display method, device and storage medium |
CN112312196A (en) * | 2020-11-13 | 2021-02-02 | 深圳市前海手绘科技文化有限公司 | Video subtitle making method |
CN113660536A (en) * | 2021-09-28 | 2021-11-16 | 北京七维视觉科技有限公司 | Subtitle display method and device |
CN113992972A (en) * | 2021-10-28 | 2022-01-28 | 维沃移动通信有限公司 | Subtitle display method and device, electronic equipment and readable storage medium |
CN114339081A (en) * | 2021-12-22 | 2022-04-12 | 腾讯音乐娱乐科技(深圳)有限公司 | Subtitle generating method, electronic equipment and computer readable storage medium |
CN114007116A (en) * | 2022-01-05 | 2022-02-01 | 凯新创达(深圳)科技发展有限公司 | Video processing method and video processing device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109862422A (en) | Video processing method, device, computer-readable storage medium and computer equipment | |
US10733574B2 (en) | Systems and methods for logging and reviewing a meeting | |
JP4310916B2 (en) | Video display device | |
CN109155865A (en) | Advanced signaling of most-interested regions in pictures | |
US20100085363A1 (en) | Photo Realistic Talking Head Creation, Content Creation, and Distribution System and Method | |
JPH0823530A (en) | Method and apparatus for processing stream of audio signal and video signal | |
US8386909B2 (en) | Capturing and presenting interactions with image-based media | |
CN106210703A (en) | Utilization and display method and system for close-up shots in a VR environment | |
EP3742742A1 (en) | Method, apparatus and system for synchronously playing message stream and audio/video stream | |
KR20070084471A (en) | Captioned still image content creating device, captioned still image content creating program and captioned still image content creating system | |
CN106383576A (en) | Method and system for displaying parts of bodies of experiencers in VR environment | |
US20140139619A1 (en) | Communication method and device for video simulation image | |
US20200186887A1 (en) | Real-time broadcast editing system and method | |
CN1732687A (en) | Method, system and apparatus for telepresence communications | |
CN1112326A (en) | A picture communication apparatus | |
EP2352290A1 (en) | Method and apparatus for matching audio and video signals during a videoconference | |
CN110740283A (en) | Method for converting voice into text based on video communication | |
US20110141106A1 (en) | Method and apparatus for identifying speakers and emphasizing selected objects in picture and video messages | |
CN107483872A (en) | Video call system and video call method | |
CN106161871B (en) | Synchronization signal processing method and processing device | |
JP5813542B2 (en) | Image communication system, AR (Augmented Reality) video generation device, and program | |
CN108111781A (en) | Method and device for producing trial video subtitles | |
KR102417084B1 (en) | Method And System for Transmitting and Receiving Multi-Channel Media | |
CN105791964A (en) | Cross-platform media file playing method and system | |
JP5894505B2 (en) | Image communication system, image generation apparatus, and program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 2019-06-07 |