CN108063722A

CN108063722A - Video data generating method, computer readable storage medium and electronic equipment

Info

Publication number: CN108063722A
Application number: CN201711385340.0A
Authority: CN
Inventors: 王宏达; 赵正雄
Original assignee: Beijing Times Pulse Information Technology Co Ltd
Current assignee: Beijing Times Pulse Information Technology Co Ltd
Priority date: 2017-12-20
Filing date: 2017-12-20
Publication date: 2018-05-22

Abstract

This application discloses a video data generation method, a computer-readable storage medium, and an electronic device. The method obtains the audio stream data of a video, converts it into the corresponding text data by speech recognition, and generates video data for publishing or sharing from the synchronously acquired image stream data and the text data produced by speech recognition. Text that represents the speech in the video can thus be embedded in the video data, so that the speech is conveyed both visually and audibly, which gives a richer user experience. The method also lets a user generate multimedia content combining video, voice, images, and text with a "one-key" operation, which is convenient for the user. In mobile Internet publishing and interaction scenarios, the disclosed video generation and publishing method is more convenient and efficient than publishing text entered with an input method and keyboard.

Description

Video data generating method, computer readable storage medium and electronic equipment

Technical field

This application relates to video processing technology, and in particular to a video data generation method, a computer-readable storage medium, and an electronic device.

Background

With the continuing growth of Internet access bandwidth and the spread of smart terminals, video content is increasingly becoming the main information carrier of social applications. Compared with image-and-text information and audio information, video content conveys information that is more realistic and has greater visual impact. Users of an application can record video data on a user terminal and then publish and share it. In the prior art, however, the speech in video content often cannot be understood by viewers because of the speaker's accent, the environment, or the viewers themselves. In addition, adding extra information such as special effects or text to video content is a rather cumbersome operation.

Summary of the invention

In view of this, the application provides a video data generation method, a computer-readable storage medium, and an electronic device. The aim is to add text information automatically to the generated video and to achieve one-key operation, so that the user no longer depends on an input method and can freely input video, voice, images, and text.

In one aspect, the application proposes a video data generation method, including:

obtaining image stream data and audio stream data;

obtaining text data produced by speech recognition; and

generating video data to be published from the image stream data, the audio stream data, and the text data produced by speech recognition.

Preferably, acquisition of the image stream data and audio stream data starts when an operation on a first control is detected, and generation of the video data to be published is triggered when an operation on a second control located at the same position is detected, which achieves one-key operation.

Preferably, acquisition of the image stream data and audio stream data starts when an operation on the first control is detected, and when the recording time is detected to have reached a predetermined time limit, the method jumps automatically to generating the video data to be published, which achieves one-key operation.

Preferably, the method further includes:

uploading the video data to be published to a content server when a publish instruction is received.

Preferably, obtaining the text data produced by speech recognition is:

obtaining, in real time, the text stream information produced by speech recognition.

Preferably, the method further includes:

during video recording, displaying in real time the processed image stream data and the text stream information produced by speech recognition.

Preferably, displaying the processed image stream data and the text stream information in real time includes:

displaying the acquired image stream data in a first layer;

adding the selected mask image and/or filter effect in a second layer; and

displaying the text stream information produced by speech recognition in a third layer.

Preferably, the method further includes:

obtaining the mask image and/or filter selected by the user.

Preferably, displaying the text stream information produced by speech recognition in the third layer includes:

displaying the text stream information step by step according to the division marks it contains.

Preferably, obtaining the text data corresponding to the audio stream data includes:

sending the audio stream data to an online speech recognition server and receiving the text stream information produced by speech recognition; or

calling an offline speech recognition application programming interface to recognize the audio stream data and obtain the text stream information produced by speech recognition.

Preferably, the video data includes a video file and a subtitle file;

generating the video data to be published from the processed image stream data, the audio stream data, and the text data produced by speech recognition includes:

generating the video file from the processed image stream data and the audio stream data; and

generating a subtitle file with synchronization information from the correspondence between the text data produced by speech recognition and the audio stream data, the synchronization information being used to keep the text data time-synchronized with the video file during playback.

In another aspect, the application also proposes a computer-readable storage medium storing computer program instructions which, when executed by a processor, implement the method described above.

The application also proposes an electronic device including a memory and a processor, wherein the memory stores one or more computer program instructions, and the one or more computer program instructions are executed by the processor to implement the method of the first aspect.

The application obtains the audio stream data of a video, recognizes it to obtain text data by speech recognition, and generates video data for publishing or sharing from the synchronously acquired image stream data and that text data. Text that represents the speech can thus be embedded in the video data, so that the speech in the video is conveyed both visually and audibly, which gives a richer user experience. In addition, the video data generation method of the application lets a user generate multimedia content combining video, voice, images, and text with a "one-key" operation, which is convenient for the user. In mobile Internet publishing and interaction scenarios, the disclosed video generation and publishing method is more convenient and efficient than publishing text entered with an input method and keyboard.

Brief description of the drawings

The above and other objects, features, and advantages of the present invention will become clearer from the following description of its embodiments with reference to the accompanying drawings, in which:

Fig. 1 is a system block diagram of the video sharing system of an embodiment of the application;

Fig. 2 is a flowchart of the video data generation method of an embodiment of the application;

Fig. 3 is a flowchart of the video data generation method of an optional implementation of the embodiment of the application;

Fig. 4 is a schematic diagram of the graphical user interface before video recording starts in an embodiment of the application;

Fig. 5 is a schematic diagram of the graphical user interface after video recording starts in an embodiment of the application;

Fig. 6 is a schematic diagram of the graphical user interface during video recording in an embodiment of the application;

Fig. 7 is a schematic diagram of the graphical user interface during video recording in an embodiment of the application;

Fig. 8 is a schematic diagram of the graphical user interface when video recording ends in an embodiment of the application;

Fig. 9 is a schematic diagram of the user interface during video playback in an embodiment of the application;

Fig. 10 is a block diagram of an electronic device to which an embodiment of the application is applied.

Detailed description

The present invention is described below on the basis of embodiments, but it is not limited to these embodiments. The following detailed description sets out certain specific details; a person skilled in the art can fully understand the invention even without them. To avoid obscuring the substance of the invention, well-known methods, processes, flows, elements, and circuits are not described in detail.

In addition, a person of ordinary skill in the art will understand that the drawings provided here are for purposes of illustration and are not necessarily drawn to scale.

Unless the context clearly requires otherwise, words such as "include" and "comprise" throughout the specification and claims are to be construed in an inclusive sense rather than an exclusive or exhaustive sense; that is, in the sense of "including but not limited to".

In the description of the present invention, unless otherwise stated, "a plurality of" means two or more.

Fig. 1 is a system block diagram of the video sharing system of an embodiment of the application. As shown in Fig. 1, the video sharing system of this embodiment can include multiple user terminals 101, a network 102, and a content server 103. The user terminals 101 communicate with the content server 103 through the network 102. On the one hand, a user terminal 101 can record video data and upload it through the network 102 to the content server 103 for publishing. On the other hand, a user terminal 101 can obtain video data from the content server 103 for browsing and commenting. The content server 103 is configured to receive the video data uploaded by user terminals 101 and store it in a database, and to provide video data to other user terminals 101 in response to their requests.

In this embodiment, the user terminal 101 can be a communication data processing device loaded with a predetermined application, for example a smart mobile terminal, a smart television, or a general-purpose computer. Under the control of the application, the user terminal 101 can acquire image stream data and audio stream data and process the acquired data further. The content server 103 can also be a communication data processing device. Because the content server 103 has to store and publish the video data it receives from user terminals 101, it should in general have large storage capacity and good data processing capability. It should be understood that the content server 103 can be one or more centrally connected servers, or multiple server clusters that communicate with one another in a distributed manner. The network 102 can be a local area network (LAN) or a wide area network (WAN), and can be accessed over wired or wireless connections. Preferably, the network 102 is the Internet accessed over wireless connections.

Fig. 2 is a flowchart of the video data generation method of an embodiment of the application. As shown in Fig. 2, the video data generation method of this embodiment includes the following steps:

Step S210: obtain image stream data and audio stream data.

Step S220: obtain the text data produced by speech recognition.

Step S230: when video recording ends, generate the video data to be published from the image stream data, the audio stream data, and the text data produced by speech recognition.

When the method is applied to a terminal device with a touch input device, step S210 of this embodiment can be triggered when the program detects that the user has tapped the start-recording control. After step S210 has been triggered, video recording ends when the program detects that the user has tapped the end-recording control, which is located at the same position. The user only needs to tap the same position on the touch input device twice to generate a piece of video content with sound and text. This simplifies the generation of video data and achieves "one-key" operation.

In this embodiment, the generation operation can also be triggered by an automatic jump: acquisition of the image stream data and audio stream data starts when an operation on the first control is detected, and when the recording time is detected to have reached a predetermined time limit, the method jumps automatically to generating the video data to be published, which achieves one-key operation.

Thus, after recording starts, if the recording runs long (for example, reaches 60 seconds), the method jumps automatically to generating the video data to be published, which simplifies the operation further.
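The two trigger paths described above (a second tap at the same position, or an automatic jump at the time limit) can be sketched as a small state machine. This is only an illustrative sketch: the class `OneKeyRecorder`, its method names, and the 60-second default (taken from the example in the text) are hypothetical and are not part of the application.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    GENERATED = "generated"


@dataclass
class OneKeyRecorder:
    """Hypothetical controller: one on-screen position starts and ends recording."""
    max_seconds: float = 60.0   # assumed time limit, from the 60-second example
    state: State = State.IDLE
    started_at: float = 0.0

    def tap(self, now: float) -> State:
        """Handle a tap on the shared control position."""
        if self.state is State.IDLE:
            # First tap: start acquiring image and audio stream data.
            self.state, self.started_at = State.RECORDING, now
        elif self.state is State.RECORDING:
            # Second tap: end recording and generate the video data.
            self.state = State.GENERATED
        return self.state

    def tick(self, now: float) -> State:
        """Periodic check: jump to generation once the time limit is reached."""
        if self.state is State.RECORDING and now - self.started_at >= self.max_seconds:
            self.state = State.GENERATED
        return self.state
```

A caller would invoke `tap` from the touch handler and `tick` from a timer; either path ends in the same `GENERATED` state, which is what makes the operation "one-key" from the user's point of view.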

In step S210, on receiving a start-recording instruction input by the user, the user terminal 101 starts the camera and microphone, captures external moving images and audio, and forms them into image stream data and audio stream data. In this embodiment, the image stream data and audio stream data are streaming media data. Streaming media data is continuous time-based media data that uses streaming technology and is suited to use over a network; it can be transmitted as packets in an uninterrupted data stream and processed in real time.

The process of obtaining the text data produced by speech recognition can also start when video recording ends. In that case, all of the corresponding text data can be obtained from one complete audio stream data file.

In step S220, speech recognition is performed by calling a speech recognition application programming interface in offline or online form. Speech recognition yields the text data corresponding to the audio stream data, that is, the text data produced by speech recognition. This text data effectively represents the speech contained in the audio stream data.

Because streaming media data can be processed in real time, the text data can be obtained in near real time during video recording. Feedback in the form of recognized text is then available as the recording progresses. Text data obtained in real time in this way can also be displayed during recording to improve the user experience of the recording process.

In step S230, the image stream data, the audio stream data, and the text data produced by speech recognition that were obtained over the whole recording are integrated into the video data for publishing and sharing. The video data can be a single data file or a file package made up of multiple files. Besides seeing the images and hearing the sound, a viewer of the video data can therefore also see the text shown in the video in various forms, and can fully take in the information the video conveys regardless of the viewing environment, the producer's accent, or the viewer's own ability. This improves the experience of the producer who shares the video as well as the experience of its viewers.

As noted above, in an optional implementation the process of obtaining the text data can start when video recording ends, so that all of the corresponding text data is obtained from one complete audio stream data file.

The technical solution of the application obtains the audio stream data of a video, recognizes it to obtain text data by speech recognition, and generates video data for publishing or sharing from the synchronously acquired image stream data and that text data. Text that represents the speech can thus be embedded in the video data, so that the speech in the video is conveyed both visually and audibly, which gives a richer user experience. In addition, the video data generation method of the application lets a user generate multimedia content combining video, voice, and text with a "one-key" operation, which is convenient for the user.

Each step of the embodiment of the present invention can also be optimized to further improve the user's experience of recording video.

Fig. 3 is a flowchart of the video data generation method of an optional implementation of the embodiment of the application. As shown in Fig. 3, the method includes the following steps:

Step S310: enter the recording configuration interface and prompt the user to select a mask image and/or filter.

Step S320: obtain the mask image and/or filter selected by the user.

Step S330: after receiving the instruction to start recording, obtain image stream data and audio stream data in real time.

Step S340: obtain the text data produced by speech recognition.

Step S350: during video recording, display in real time the rendered image stream data and the text stream information produced by speech recognition.

Step S360: when video recording ends, generate the video data to be published from the processed image stream data, the audio stream data, and the text data produced by speech recognition.

Step S370: when a publish instruction is received, upload the video data to be published to the content server.

In step S310, before recording starts, the user can preselect the mask, filter, or other special effect that will subsequently be rendered into the video. An exemplary graphical user interface is shown in Fig. 4. The user can select a particular type of video option, for example the "mood recording" option, to enter the interface shown in Fig. 4. In the graphical user interface shown in Fig. 4, a list presents several selectable masks, including "black dusters", "dreamlike space", and "pixel world". It should be understood that, besides masks, various special effects such as filters and frames can also be selected. The graphical user interface also provides a start-recording control 11, and the user can start recording by tapping the start-recording control 11.

For step S320, it should be understood that in other implementations the selection of the mask image or filter can also take place during or after video recording. In these implementations, the processing of the image stream data based on the mask image and/or filter can likewise be deferred until during or after video recording.

Optionally, the user can change the selected mask image and/or filter several times during recording, so that the processed image stream data finally obtained shows different effects in different time periods according to when the user made each selection. For example, the user starts recording at time t0 with the "black dusters" mask initially selected, switches to the "dreamlike space" mask at time t1, and ends recording at time t2. In this case, the image stream data finally output is processed with the "black dusters" mask during t0-t1 and with the "dreamlike space" mask during t1-t2.
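The t0/t1/t2 example above amounts to a lookup of the most recent switch at or before a frame's timestamp. The following is a minimal sketch under that reading; the function `mask_at` and the list-of-tuples representation are hypothetical and not taken from the application.

```python
from bisect import bisect_right


def mask_at(switches, t):
    """Return the mask active at time t.

    switches: list of (switch_time, mask_name) sorted by switch_time,
    where the first entry is the mask selected when recording started.
    """
    times = [switch_time for switch_time, _ in switches]
    # Index of the last switch whose time is <= t.
    i = bisect_right(times, t) - 1
    if i < 0:
        raise ValueError("t precedes the start of recording")
    return switches[i][1]
```

With `switches = [(0.0, "black dusters"), (4.0, "dreamlike space")]`, a frame stamped 3.9 s would be processed with the first mask and a frame stamped 4.0 s or later with the second.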

In step S330, the image stream data and audio stream data are streaming media data and can be obtained, processed, and forwarded in real time. Specifically, the YUV420 frame data of the video output signal and the PCM data of the audio output signal are received as the image stream data and the audio stream data, respectively.
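For a sense of the data volumes involved, the sizes of the two raw streams can be computed directly from their formats. The sketch below assumes 8-bit samples for YUV 4:2:0 and interleaved linear PCM; the function names and the example resolution and sample rate are illustrative choices, not values stated in the application.

```python
def yuv420_frame_bytes(width: int, height: int) -> int:
    """Bytes in one 8-bit YUV 4:2:0 frame.

    The Y plane is full size; the U and V planes are each a quarter size,
    so the total is width * height * 1.5 bytes.
    """
    if width % 2 or height % 2:
        raise ValueError("4:2:0 subsampling assumes even dimensions")
    return width * height * 3 // 2


def pcm_bytes_per_second(sample_rate: int, channels: int, bits_per_sample: int) -> int:
    """Byte rate of uncompressed linear PCM audio."""
    return sample_rate * channels * bits_per_sample // 8
```

For example, a 1280x720 frame occupies 1,382,400 bytes, and 16 kHz mono 16-bit PCM produces 32,000 bytes per second, which is why the audio stream is far cheaper to send to a recognition service than the image stream would be.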

In step S340, the speech in the audio stream data is recognized by calling a speech recognition application programming interface (API), and the text stream information produced by speech recognition is obtained as the text data. In this implementation, the audio stream data is sent to an online speech recognition API, and the text stream information that the online speech recognition API returns is received. Optionally, an online speech recognition API provided by a third party such as Sogou, iFLYTEK, or Baidu can be used.

Further, in application scenarios where network traffic needs to be saved, an offline speech recognition API can be used for recognition instead.
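The choice between online and offline recognition can be sketched as a small dispatcher. Everything here is hypothetical: the `Recognizer` type, the `save_traffic` flag, and the two callables stand in for whatever recognition APIs an implementation actually uses, and the fallback to offline recognition on a connection error is an added assumption that the application does not describe.

```python
from typing import Callable

# Hypothetical recognizer signature: raw audio bytes in, recognized text out.
Recognizer = Callable[[bytes], str]


def recognize(audio: bytes, online: Recognizer, offline: Recognizer,
              save_traffic: bool) -> str:
    """Pick the offline recognizer when traffic must be saved, else the online one."""
    if save_traffic:
        return offline(audio)
    try:
        return online(audio)
    except ConnectionError:
        # Assumption (not from the source): fall back to offline on network failure.
        return offline(audio)
```

Passing the recognizers in as callables keeps the sketch independent of any vendor SDK; a real client would wrap its SDK calls in functions with this shape.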

In step S350, the acquired image stream data is processed according to the mask image and/or filter selected in step S320, and the rendered image stream data is output for display. At the same time, the text stream information that step S340 obtains in real time, corresponding to the speech in the audio stream, is also overlaid on the display.

Specifically, the image stream data can be rendered through the texture rendering of the OpenGL API to obtain a preview layer (the first layer), on which a mask layer or filter layer (the second layer) is added. The text stream information produced by speech recognition is then processed into a text layer (the third layer), and the layers are displayed stacked in the order image stream, mask layer, text layer.

Thus, while recording the video, the user can see the selected mask image overlaid on the recorded moving image, together with the text data corresponding to the input speech (that is, the text data produced by speech recognition) overlaid on the video.
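The stacking order of the three layers can be illustrated per pixel with the standard "over" compositing operator. This is a CPU-side sketch for explanation only, not the OpenGL rendering path the text mentions; the function names and the straight-alpha RGBA tuple representation (channels in the range 0 to 1) are assumptions.

```python
def over(top, bottom):
    """Composite one straight-alpha RGBA pixel over another (channels in 0..1)."""
    tr, tg, tb, ta = top
    br, bg, bb, ba = bottom
    alpha = ta + ba * (1 - ta)
    if alpha == 0:
        return (0.0, 0.0, 0.0, 0.0)

    def blend(t, b):
        return (t * ta + b * ba * (1 - ta)) / alpha

    return (blend(tr, br), blend(tg, bg), blend(tb, bb), alpha)


def composite(preview, mask, text):
    """Stack layers bottom to top: preview (first), mask/filter (second), text (third)."""
    return over(text, over(mask, preview))
```

A half-transparent black mask over an opaque white preview pixel yields mid-gray, which matches the dimmed look of a translucent mask, while an opaque text pixel on the third layer always wins over both lower layers.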

Further, the text stream information obtained in step S340 includes division marks derived from the pauses in the speech in the audio stream data and from speech recognition (including punctuation marks and marks that separate different words or phrases). When the text stream information is displayed in real time, the text data can be shown step by step, segment by segment, according to the division marks. This gives the effect of entering text in real time by voice and improves the user experience.
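Segment-by-segment display can be sketched by splitting the recognized text at its division marks and revealing one more segment at a time. The set of marks below (common Chinese and Western punctuation) is an assumption, as are the function names; a real recognizer may also return explicit word or phrase boundaries that this sketch ignores.

```python
import re

# Assumed division marks: common Chinese and Western punctuation.
_MARKS = "，。！？；,.!?;"
_SEGMENT = re.compile(rf"[^{re.escape(_MARKS)}]+[{re.escape(_MARKS)}]?")


def split_by_marks(text: str) -> list[str]:
    """Split text into segments, keeping each division mark with its segment."""
    return [part.strip() for part in _SEGMENT.findall(text) if part.strip()]


def progressive_frames(text: str) -> list[str]:
    """Return the successive on-screen strings as segments are revealed."""
    frames, shown = [], ""
    for segment in split_by_marks(text):
        # Segments are joined with a space, which suits space-delimited languages.
        shown = f"{shown} {segment}".strip()
        frames.append(shown)
    return frames
```

Each entry of `progressive_frames(...)` is what the text layer would show after one more segment arrives, so rendering them in sequence gives the "typed by voice" effect described above.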

Fig. 5 to Fig. 7 are schematic diagrams of the graphical user interface during video recording. After the user inputs the instruction to start recording, the graphical user interface switches to that of Fig. 5. The start-recording control 11 of Fig. 4 is replaced by the end-recording control 12, and the region 13 above the end-recording control 12 shows the image stream data obtained in real time together with the mask image. In the example of Fig. 5, the user has selected a granular, translucent mask image, so the image obtained in real time is obscured by the mask and appears dim. This effectively protects the user's privacy. As shown in Fig. 6 and Fig. 7, after recording starts, the user can record their own voice or that of others. After the user terminal 101 obtains the audio stream data, it calls the speech recognition application programming interface to recognize it and obtains the text stream information produced by speech recognition. This text stream information is displayed step by step in the graphical interface in units of characters, words, or short sentences. During display, text that has already been shown can also change position according to the amount of subsequent text, giving a gradually rising effect, for a better user experience.

It should be understood that the recording, recognition, and display operations of steps S330, S340, and S350 repeat in real time, in the order given above, as image stream data and audio stream data continue to be acquired, until the user inputs the end-recording instruction.

The user can trigger step S360 and end the recording by inputting the end-recording instruction. Alternatively, the program can monitor the recording time and, when the recording time reaches the predetermined time limit, jump automatically to end the recording.

In this step, the image stream data, the audio stream data, and the text data produced by speech recognition are integrated into a single video file, or into a file package that includes a video file and a subtitle file. After the video data is published, a video with the mask image and the text data produced by speech recognition can therefore be played by downloading the video file or file package.

Further, in the video data, the display of the text data is substantially synchronized with the speech in the video data. During playback, the user can then quickly follow the progress and content of the speech from the voice and the synchronously displayed text data.

To achieve this, one approach is to embed the text data synchronously and directly when the video file is synthesized.

Another approach is to generate a subtitle file with synchronization information from the text data. Specifically, this can include the following steps:

Step S361: generate a video file from the rendered image stream data and the audio stream data.

Preferably, the video file is an MP4 (MPEG-4 Part 14) file.

Step S362: generate a subtitle file with synchronization information from the correspondence between the text data and the audio stream data. The synchronization information is used to keep the text data time-synchronized with the video file during playback.

Preferably, the subtitle file can be a general-purpose subtitle file in a format such as srt, smi, or ssa.
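Step S362 can be illustrated with the srt format, where the synchronization information is a start and end timestamp per text segment. The sketch assumes the recognizer has already supplied `(start_seconds, end_seconds, text)` tuples; that input shape and the function names are hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset as the HH:MM:SS,mmm timestamp used in srt files."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments) -> str:
    """Build srt text from (start_seconds, end_seconds, text) tuples.

    Each cue is a 1-based index, a time range, and the text; cues are
    separated by a blank line.
    """
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(cues)
```

Writing the result next to the MP4 file gives the "file package" variant described above, since most players load a same-named `.srt` file alongside the video.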

Fig. 8 is a schematic diagram of the graphical user interface when video recording ends. As shown in Fig. 7 and Fig. 8, after the user taps the end-recording control 12 in Fig. 7, the user terminal 101 ends video acquisition and jumps to the interface shown in Fig. 8. In Fig. 8, the main part of the graphical user interface plays in a loop the video data generated in this step (including image, audio, and text). A publish control 14 is also displayed. In this interface, the user can preview the video and decide whether to publish or share it. By tapping the publish control 14, the user inputs a publish instruction and uploads the generated video data to the content server 103.

After the video data is published, other users of the application can log in to the content server through the application and obtain the video data recorded and generated in steps S310-S360 above.

Fig. 9 is a schematic diagram of the graphical user interface during video playback. As shown in Fig. 9, when the video data is played, the rendered images are displayed, the audio stream data is played, and the text data is displayed synchronously on the top layer. The text data can be displayed with a color change driven by the synchronization information, to indicate the corresponding playback progress.
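The color-change display can be sketched as a calculation of how many leading characters of a segment to draw in the highlight color at a given playback time. Linear interpolation across the segment's time range is an assumption made here for illustration; the application does not say how progress within a segment is determined, and the function name is hypothetical.

```python
def highlighted_chars(text: str, start: float, end: float, now: float) -> int:
    """Number of leading characters to draw in the highlight color at time `now`.

    Assumes progress advances linearly between the segment's start and end times.
    """
    if end <= start:
        raise ValueError("end must be after start")
    progress = min(max((now - start) / (end - start), 0.0), 1.0)
    return int(progress * len(text))
```

A renderer would draw `text[:n]` in the highlight color and `text[n:]` in the base color, where `n` is the returned count, and repeat the calculation on every frame.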

The technical solution of the application obtains the audio stream data of a video, recognizes it to obtain text data by speech recognition, and generates video data for publishing or sharing from the image stream data acquired synchronously with the audio stream data and from that text data. Text that represents the speech in the video can thus be embedded in the video data, so that the speech is conveyed both visually and audibly, which gives a richer user experience.

Further, the video data generation method of the application lets a user generate multimedia content combining video, voice, and text with a "one-key" operation, which is convenient for the user. The disclosed video generation method can also replace a conventional input method: text and voice are input directly by voice, which effectively improves the efficiency of information input.

Further, the embodiments of the application can process the image stream data according to the selected mask image and/or filter to achieve different image display effects, which gives the user privacy protection or an improved display effect.

Further, by recognizing the audio stream data in real time, the embodiments of the application obtain the corresponding text stream information in near real time. Displaying the recognized text stream while the video is being recorded gives the producer feedback and further improves the user experience.

Further, after recording ends, the text data produced by speech recognition is integrated into the generated video data in synchronization with the audio stream data. During subsequent playback, the text data is then played in step with the audio stream data, and the generated video data is more readable and displays better.

The technical solution of the embodiments of the application has been described above using a video sharing application as an example. It should be understood that the embodiments are not limited to this specific scenario; the video generation method of the application can also be used in any other scenario that requires recording a video for playback on other terminals. For example, the video data generation method of the embodiments can be used in instant messaging software: a short video carrying the text data produced by speech recognition is recorded and sent to another user terminal, or another group of user terminals, specified by the user, which achieves video sharing with rich display effects. As another example, the method can be used in the comment features of social networking applications, e-commerce applications, and the like: based on the method of the embodiments, a user can record a video carrying the text data produced by speech recognition as a comment on an information entity shown by such an application, which provides a richer way of commenting and improves the user experience.

Figure 10 is a schematic diagram of a terminal device for implementing the method of the embodiment of the present invention. Terminal device 10 includes a display A1, a memory A2 (which may include one or more computer-readable storage media), a memory controller A3, one or more processors (CPU) A4, a peripheral interface A5, a radio-frequency circuit A6, an input/output (I/O) subsystem A7, and one or more optical sensors A8 capable of capturing images. These components can communicate over one or more communication buses or signal lines A9. It should be understood that the electronic device 10 shown in Figure 10 is only an example; electronic device 10 may have more or fewer components than shown, may combine two or more components, or may have a different configuration or arrangement of components.

The memory A2 may store software components such as an operating system, a communication module, an interaction module and application programs. Each of the modules and application programs described above corresponds to a set of executable program instructions for performing one or more functions and the methods described in the embodiments of the invention.

Meanwhile as skilled in the art will be aware of, the various aspects of the embodiment of the present application may be implemented as be System, method or computer program product.Therefore, the various aspects of the embodiment of the present application can take following form：Complete hardware Embodiment, complete software embodiment (including firmware, resident software, microcode etc.) usually can all claim herein For the embodiment for being combined software aspects with hardware aspect of " circuit ", " module " or " system ".In addition, the side of the application Face can take following form：The computer program product realized in one or more computer-readable mediums, computer can Reading medium has the computer readable program code realized on it.

Any combination of one or more computer-readable media may be used. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example (but not limited to), an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of the embodiments of the present application, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by, or in connection with, an instruction execution system, apparatus or device.

A computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including but not limited to electromagnetic, optical, or any suitable combination thereof. A computer-readable signal medium may be any computer-readable medium that is not a computer-readable storage medium and that can communicate, propagate or transport a program for use by, or in connection with, an instruction execution system, apparatus or device.

Program code embodied on a computer-readable medium may be transmitted using any suitable medium, including but not limited to wireless, wired, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Python, Smalltalk, C++ and the like, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer; partly on the user's computer as a stand-alone software package; partly on the user's computer and partly on a remote computer; or entirely on a remote computer or server. In the latter case, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

Aspects of the application are described above with reference to flowchart illustrations and/or block diagrams of methods, devices (systems) and computer program products according to the embodiments of the present application. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer or other programmable data processing device to produce a machine, such that the instructions (executed via the processor of the computer or other programmable data processing device) create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable medium that can direct a computer, other programmable data processing device or other device to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instructions that implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer, other programmable data processing device or other device to cause a series of operational steps to be performed on the computer, other programmable device or other device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

The foregoing describes only preferred embodiments of the application and is not intended to limit the application; for those skilled in the art, the application may have various modifications and changes. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the application shall fall within the protection scope of the application.

Claims

1. A video data generating method, comprising:

obtaining image stream data and audio stream data;

obtaining text data converted by speech recognition; and

generating video data to be published according to the image stream data, the audio stream data and the text data converted by speech recognition.

2. The method according to claim 1, characterized in that, when an operation on a first control is detected, acquisition of the image stream data and audio stream data starts, and when an operation on a second control located at the same position is detected, generation of the video data to be published is triggered, thereby realizing one-key operation.

3. The method according to claim 1, characterized in that, when an operation on a first control is detected, acquisition of the image stream data and audio stream data starts, and when the recording time is detected to have reached a predetermined duration, the method automatically proceeds to generate the video data to be published, thereby realizing one-key operation.

4. The method according to claim 1, characterized in that the method further comprises:

5. The method according to claim 1, characterized in that obtaining the text data converted by speech recognition is:

6. The method according to claim 3, characterized in that the method further comprises:

during video recording, displaying in real time the processed image stream data and the text stream information converted by speech recognition.

7. The method according to claim 6, characterized in that displaying in real time the processed image stream data and the text stream information converted by speech recognition comprises:

displaying the obtained image stream data in a first layer;

8. The method according to claim 7, characterized in that the method further comprises:

obtaining a mask image and/or a filter selected by the user.

9. The method according to claim 7, characterized in that displaying the text stream information converted by speech recognition in the third layer comprises:

displaying the text stream information converted by speech recognition step by step according to the division marks in that text stream information.

10. The method according to claim 1, characterized in that obtaining the text data converted by speech recognition comprises:

sending the audio stream data to an online speech recognition server and receiving the text stream information converted by speech recognition; or

calling an offline speech recognition application programming interface to recognize the audio stream data and obtain the text stream information converted by speech recognition.

11. The method according to claim 1, characterized in that the video data comprises a video file and a subtitle file;

generating the video data to be published according to the image stream data, the audio stream data and the text data converted by speech recognition comprises:

generating a subtitle file with synchronization information according to the correspondence between the text data converted by speech recognition and the audio stream data, the synchronization information being used to keep the text data converted by speech recognition synchronized in time with the video file during playback.

12. A computer-readable storage medium storing computer program instructions, characterized in that the computer program instructions, when executed by a processor, implement the method according to any one of claims 1-11.

13. An electronic device comprising a memory and a processor, characterized in that the memory is configured to store one or more computer program instructions, wherein the one or more computer program instructions are executed by the processor to implement the method according to any one of claims 1-11.