CN108833969A - Method, apparatus and device for clipping a live stream - Google Patents

Method, apparatus and device for clipping a live stream

Info

Publication number
CN108833969A
Authority
CN
China
Prior art keywords
stream
highlight segment
live stream
highlight
live
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810689302.2A
Other languages
Chinese (zh)
Inventor
王释涵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201810689302.2A priority Critical patent/CN108833969A/en
Publication of CN108833969A publication Critical patent/CN108833969A/en
Pending legal-status Critical Current

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/433: Content storage operation, e.g. storage operation in response to a pause request, caching operations
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use

Abstract

The embodiments of the present application disclose a method, apparatus and related device for clipping a live stream. The method includes: obtaining a live stream and a commentary audio stream corresponding to the live stream; processing the commentary audio stream with an end-to-end highlight-segment recognition model to obtain the start and end times of highlight segments in the live stream corresponding to the commentary audio stream; and then, according to those start and end times, clipping the highlight segments out of the live stream. Clipping staff therefore no longer need to analyze the content of the live stream and the commentary audio stream manually to determine the start and end times of highlight segments, which improves clipping efficiency, guarantees clipping quality, and reduces labor cost, thereby meeting the business demand for live-stream applications in the Internet big-data environment.

Description

Method, apparatus and device for clipping a live stream
Technical field
The present application relates to the field of live-streaming technology, and in particular to a method, apparatus, device and computer-readable storage medium for clipping a live stream.
Background technique
With the rapid development of live-streaming technology, more and more users choose to obtain information through live streams. To give users more choices and a better viewing experience, live-streaming providers usually extract some excellent live-stream segments from the broadcast content during the live stream, so that users can view and share them. For example, during a live basketball game, the provider may offer short highlight videos based on the broadcast content, such as clips of spectacular shots and other memorable moments, so that users can directly select the parts they are interested in.
At present, live-streaming providers usually generate such live-stream segments by manual clipping: clipping staff watch and listen to the live stream, judge from personal experience which parts are highlights, and then produce the segments with video-editing software.
However, this approach demands considerable skill from the clipping staff, and the whole clipping process is time-consuming. Its labor cost is therefore relatively high, its clipping efficiency relatively low, and its clipping quality unstable, making it difficult to meet the business demand for live-stream applications in a big-data environment.
Summary of the invention
The embodiments of the present application provide a method, apparatus and related device for clipping a live stream, which can improve clipping efficiency.
In view of this, a first aspect of the present application provides a method for clipping a live stream, the method including:
obtaining a live stream and a commentary audio stream corresponding to the live stream;
inputting the commentary audio stream into a highlight-segment recognition model, and obtaining the start and end times of highlight segments output by the highlight-segment recognition model, the highlight-segment recognition model being an end-to-end neural network model;
according to the start and end times of the highlight segments, clipping the highlight segments of the live stream out of the live stream.
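Purely as an illustrative sketch (the application specifies no code; both the recognition model and the data layout below are placeholders), the three steps of the first aspect can be expressed as:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: int  # start time, in seconds from the beginning of the stream
    end: int    # end time, in seconds

def recognize_highlights(commentary_audio):
    # Stand-in for the end-to-end highlight-segment recognition model: here
    # the start/end pairs are assumed to be precomputed in the input, whereas
    # the real model would infer them from the semantics, speaking rate and
    # intonation of the commentary audio.
    return [Segment(s, e) for (s, e) in commentary_audio["highlight_marks"]]

def clip_live_stream(live_frames, commentary_audio):
    # Step 201: live_frames and commentary_audio have been obtained;
    # live_frames holds one element per second of the live stream.
    segments = recognize_highlights(commentary_audio)            # step 202
    return [live_frames[seg.start:seg.end] for seg in segments]  # step 203
```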
A second aspect of the present application provides an apparatus for clipping a live stream, the apparatus including:
an obtaining module, configured to obtain a live stream and a commentary audio stream corresponding to the live stream;
a processing module, configured to input the commentary audio stream into a highlight-segment recognition model and obtain the start and end times of the highlight segments output by the highlight-segment recognition model, the highlight-segment recognition model being an end-to-end neural network model;
a clipping module, configured to clip the highlight segments of the live stream out of the live stream according to the corresponding start and end times.
A third aspect of the present application provides a device, the device including a processor and a memory:
the memory is configured to store program code and transfer the program code to the processor;
the processor is configured to execute, according to instructions in the program code, the steps of the method for clipping a live stream described in the first aspect above.
A fourth aspect of the present application provides a computer-readable storage medium configured to store program code, the program code being used to execute the method for clipping a live stream described in the first aspect above.
A fifth aspect of the present application provides a computer program product including instructions which, when run on a computer, cause the computer to execute the method for clipping a live stream described in the first aspect above.
As can be seen from the above technical solutions, the embodiments of the present application have the following advantages:
The embodiments of the present application provide a method for clipping a live stream. The method uses an end-to-end highlight-segment recognition model to determine, from the commentary audio stream, the start and end times of highlight segments in the corresponding live stream, and then clips those highlight segments out of the live stream based on the determined times. The end-to-end highlight-segment recognition model intelligently analyzes features of the commentary audio stream such as semantics, speaking rate and intonation to determine the start and end times of highlight segments in the corresponding live stream. Clipping staff therefore no longer need to analyze the content of the live stream and the commentary audio stream manually, which greatly reduces the labor cost of live-stream clipping, improves clipping efficiency, and meets the business demand for live-stream applications in the Internet big-data environment.
Detailed description of the invention
Fig. 1 is a schematic diagram of an application scenario of a method for clipping a live stream in an embodiment of the present application;
Fig. 2 is a flow diagram of a method for clipping a live stream in an embodiment of the present application;
Fig. 3 is a schematic diagram of a live-streaming interface in an embodiment of the present application;
Fig. 4 is an architecture diagram of a highlight-segment recognition model in an embodiment of the present application;
Fig. 5 is an architecture diagram of the training process of a highlight-segment recognition model in an embodiment of the present application;
Fig. 6 is a flow diagram of a training method for a highlight-segment recognition model in an embodiment of the present application;
Fig. 7 is a schematic diagram of another application scenario of the method for clipping a live stream in an embodiment of the present application;
Fig. 8 is a structural diagram of a first apparatus for clipping a live stream in an embodiment of the present application;
Fig. 9 is a structural diagram of a second apparatus for clipping a live stream in an embodiment of the present application;
Fig. 10 is a structural diagram of a third apparatus for clipping a live stream in an embodiment of the present application;
Fig. 11 is a structural diagram of a fourth apparatus for clipping a live stream in an embodiment of the present application;
Fig. 12 is a structural diagram of a fifth apparatus for clipping a live stream in an embodiment of the present application;
Fig. 13 is a structural diagram of a sixth apparatus for clipping a live stream in an embodiment of the present application;
Fig. 14 is a structural diagram of a device for clipping a live stream in an embodiment of the present invention;
Fig. 15 is a structural diagram of another device for clipping a live stream in an embodiment of the present invention.
Specific embodiment
To help those skilled in the art better understand the solution of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the scope of protection of the present application.
The terms "first", "second", "third", "fourth" and the like (if present) in the description, claims and drawings of the present application are used to distinguish similar objects and are not intended to describe a particular order or sequence. It should be understood that data so termed are interchangeable where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device comprising a series of steps or units is not necessarily limited to the steps or units explicitly listed, but may include other steps or units that are not explicitly listed or that are inherent to the process, method, product or device.
In the prior art, manual clipping of live streams by clipping staff suffers from technical problems such as high labor cost, low clipping efficiency, and difficulty in meeting the business demand for live-stream applications in a big-data environment. To address these technical problems, the embodiments of the present application provide a method for clipping a live stream.
The method for clipping a live stream provided by the embodiments of the present application uses an end-to-end highlight-segment recognition model to determine, from the commentary audio stream, the start and end times of highlight segments in the corresponding live stream, and then clips those highlight segments out of the live stream based on the determined times. The highlight-segment recognition model is an end-to-end neural network model that takes the commentary audio stream corresponding to the live stream as input and outputs the start and end times of highlight segments in the live stream. The end-to-end model intelligently analyzes features of the commentary audio stream such as semantics, speaking rate and intonation to determine those start and end times. Clipping staff therefore no longer need to analyze the content of the live stream and the commentary audio stream manually, which greatly reduces the labor cost of live-stream clipping, improves clipping efficiency, guarantees stable clipping quality, and meets the business demand for live-stream applications in the Internet big-data environment.
The method for clipping a live stream provided by the embodiments of the present application can be applied to devices with multimedia data processing capability, such as terminal devices and servers. The terminal device may specifically be a computer, a smartphone, a personal digital assistant (PDA), a tablet computer, and so on; the server may be a standalone server or a cluster server, and may clip the highlight segments of multiple live streams simultaneously.
To facilitate understanding of the technical solution of the present application, the method for clipping a live stream provided by the embodiments of the present application is introduced below with reference to a practical application scenario.
Referring to Fig. 1, Fig. 1 is a schematic diagram of an application scenario of the method for clipping a live stream provided by the embodiments of the present application. The scenario includes a server 101, a camera 102 and a commentary microphone 103, where the server 101 executes the method for clipping a live stream provided by the embodiments of the present application, the camera 102 collects the live stream, and the commentary microphone 103 collects the commentary audio stream corresponding to the live stream collected by the camera 102.
During the live broadcast, the camera 102 collects scene pictures of the live venue in real time and sends them to the server 101 in the form of a live stream; at the same time, the commentary microphone 103 collects the commentary audio signal corresponding to the current scene pictures and sends it to the server 101 in the form of a commentary audio stream.
After the server 101 receives the live stream sent by the camera 102 and the commentary audio stream sent by the commentary microphone 103, it inputs the received commentary audio stream into the highlight-segment recognition model running on it; the model intelligently analyzes features such as semantics, speaking rate and intonation in the commentary audio stream and determines the start and end times of highlight segments in the corresponding live stream. The determined start and end times of the highlight segments and the live stream received by the server 101 are then input into the editing system running on the server 101, so that the editing system clips the highlight segments out of the live stream accordingly.
It should be understood that, besides the camera shown in Fig. 1, the live stream may also be collected by other devices with multimedia collection capability, such as a smartphone or a computer; likewise, the commentary audio stream may be collected by devices with audio-signal collection capability other than the commentary microphone shown in Fig. 1.
It should be understood that some devices with multimedia collection capability can collect multiple kinds of multimedia data at the same time; for example, a camera can collect image data and audio data simultaneously. Therefore, such a device may be used to collect the live stream and the commentary audio stream at the same time; that is, besides using the camera and the commentary microphone shown in Fig. 1 to collect the live stream and the commentary audio stream independently, a single device that combines multiple multimedia collection functions may collect both simultaneously.
It should be noted that the highlight-segment recognition model is an end-to-end neural network model. By intelligently analyzing features such as semantics, speaking rate and intonation in the commentary audio stream, it can determine the start and end times of highlight segments in the corresponding live stream. Clipping staff therefore no longer need to analyze the content of the live stream and the commentary audio stream manually to determine the clipping times of highlight segments, which greatly reduces the labor cost of live-stream clipping, improves clipping efficiency, and meets the business demand for live-stream applications in the Internet big-data environment.
It should be noted that the scenario shown in Fig. 1 is only an example. In practical applications, the method for clipping a live stream provided by the embodiments of the present application may also be applied to a terminal device, and other devices may be used to collect the live stream and the commentary audio stream; no specific limitation is placed here on the application scenarios of the method.
The method for clipping a live stream provided by the present application is introduced below through embodiments.
Referring to Fig. 2, Fig. 2 is a flow diagram of a method for clipping a live stream provided by the embodiments of the present application. For ease of description, this embodiment takes a server as the execution subject; it should be understood that the execution subject of the method is not limited to a server, and the method may also be applied to other devices with multimedia data processing capability. As shown in Fig. 2, the method for clipping a live stream includes the following steps:
Step 201: obtain a live stream and a commentary audio stream corresponding to the live stream.
A live stream is standard stream data generated by real-time encoding of the scene signal collected in real time. The scene signal may be a video signal or an audio signal; correspondingly, the live stream may be a live video stream or a live audio stream. For example, at a basketball game, the live video signal collected in real time by a camera can be encoded in real time to generate the corresponding live video stream; for another example, during a live music radio broadcast, the live audio signal collected in real time by a microphone can be encoded in real time to generate the corresponding live audio stream.
Commentary audio is the audio signal produced as a commentator explains, introduces and analyzes the content of the current scene signal in real time; the commentary audio signal is synchronized with the scene signal collected in real time. Correspondingly, the commentary audio stream is standard stream data generated by real-time encoding of the commentary audio signal synchronized with the scene signal collected in real time. For example, during a basketball game, a commentator explains and introduces the game in real time; the audio signal thus produced is the commentary audio signal, and real-time encoding of the commentary audio signal collected by the commentary microphone yields the corresponding commentary audio stream.
When the server needs to clip a live stream to obtain its highlight segments, it can obtain the live stream and the commentary audio stream corresponding to it. The obtained live stream is the standard stream data generated by real-time encoding of the scene signal collected in real time, and the obtained commentary audio stream is the standard stream data generated by real-time encoding of the commentary audio synchronized with that scene signal.
It should be noted that a highlight segment is a hot-spot or bright-spot segment of the live stream that attracts the attention of viewers or listeners. It may specifically be a particularly exciting segment of the live stream, a segment likely to trigger controversy and discussion, or a segment of relatively high topical value. For example, in a live basketball video, a dunk or other spectacular play by a star player is exciting and attracts many viewers, so it can be identified as a highlight segment; for another example, in a live game video, the segment in which a certain skill is released may be exciting and trigger much discussion among viewers, so it can also be identified as a highlight segment.
Optionally, when obtaining the live stream and the commentary audio stream corresponding to it, the server may obtain them in real time from a live-streaming server.
In one possible implementation, the device that collects the scene signal and the device that collects the synchronized commentary audio signal are two independent collection devices. For example, during a basketball game, the device that collects the video signal of the venue is a camera, and the device that collects the commentary audio signal of the game is a commentary microphone. The scene-signal collection device sends the scene signal collected in real time to the live-streaming server, and at the same time the commentary-audio collection device sends the synchronized commentary audio signal to the live-streaming server. After receiving the scene signal and the synchronized commentary audio signal, the live-streaming server encodes them in real time to obtain the corresponding live stream and commentary audio stream, and sends them to the server used for clipping the live stream.
In another possible implementation, the device that collects the scene signal and the device that collects the synchronized commentary audio signal are the same collection device, i.e., the device has both the scene-signal collection and commentary-audio collection functions. For example, during a live game broadcast, a computer can synchronously collect the game pictures shown on the screen and the commentary audio signal corresponding to the current game pictures. Such a device mixes the collected scene signal and commentary audio signal into one live signal and sends it to the live-streaming server; after receiving the live signal, the live-streaming server separates it into the scene signal and the commentary audio signal, encodes each in real time to obtain the corresponding live stream and commentary audio stream, and then sends them to the server used for clipping the live stream.
It is understood that the scene-signal and commentary-audio collection devices may also send the collected signals directly to the server used for clipping the live stream, which then encodes the scene signal and the commentary audio signal collected in real time to obtain the corresponding live stream and commentary audio stream; no restriction is placed here on the way the live stream and the commentary audio stream are obtained.
Step 202: input the commentary audio stream into the highlight-segment recognition model and obtain the start and end times of the highlight segments output by the highlight-segment recognition model; the highlight-segment recognition model is an end-to-end neural network model.
After the server obtains the live stream to be clipped and the synchronized commentary audio stream, it inputs the obtained commentary audio stream into the highlight-segment recognition model running on it. By intelligently analyzing features of the input commentary audio stream such as semantics, speaking rate and intonation, the highlight-segment recognition model determines the start and end times of highlight segments in the live stream synchronized with the commentary audio stream.
It should be noted that the highlight-segment recognition model is an end-to-end neural network that takes the commentary audio stream as input and the start and end times of highlight segments as output; that is, after the server inputs the commentary audio stream into the highlight-segment recognition model, the model correspondingly outputs the start and end times of highlight segments in the live stream synchronized with that commentary audio stream.
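To make the model's input/output contract concrete, a toy stand-in (not the disclosed network; the feature names, weights and bias below are assumptions) can be viewed as a function from per-window commentary features, say speaking rate, pitch and a semantic excitement score, to a per-window highlight probability:

```python
import math

# Hypothetical weights; a real end-to-end model would learn these
# from labeled commentary audio rather than have them hand-set.
WEIGHTS = {"speaking_rate": 0.8, "pitch": 0.6, "semantic_score": 1.5}
BIAS = -2.0

def highlight_probability(window_features):
    # Toy logistic scorer over the features of one commentary window.
    z = BIAS + sum(WEIGHTS[k] * window_features[k] for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))
```

With all three features high (fast excited speech about a decisive play), the probability rises well above the baseline for calm commentary.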
It should be understood that features such as semantics, speaking rate and intonation at each point of the commentary audio stream can, to a certain extent, reflect whether the corresponding part of the live stream belongs to a highlight segment. For example, if the highlight-segment recognition model's intelligent analysis determines that at a certain point the semantics of the commentary indicate that the game has reached match point, and that the speaking rate there is fast and the intonation high, then the part of the live stream corresponding to that point of the commentary can be determined to belong to a highlight segment. The start and end times of the highlight segment can then be determined further from the playback times of the parts of the live stream identified as belonging to it. For example, if the model determines that the input live stream from 5'32" to 6'49" belongs to a highlight segment, and that no part of the live stream within a preset period before 5'32" or after 6'49" belongs to one, then 5'32" can be determined as the start time of the highlight segment and 6'49" as its end time.
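One simple way to turn such per-position judgments into start and end times (an illustrative sketch with an assumed window size and gap threshold, not the algorithm disclosed by the application) is to merge consecutive windows flagged as highlights, closing an interval once no flagged window appears within the preset period:

```python
def windows_to_intervals(flags, window_sec=1, max_gap_sec=3):
    # flags: per-window booleans (True = this window of commentary indicates
    # a highlight). Returns merged (start_sec, end_sec) pairs; un-flagged
    # gaps of up to max_gap_sec inside a highlight are tolerated.
    intervals, start, last_hit = [], None, None
    for i, is_highlight in enumerate(flags):
        t = i * window_sec
        if is_highlight:
            if start is None:
                start = t          # open a new highlight interval
            last_hit = t
        elif start is not None and t - last_hit > max_gap_sec:
            intervals.append((start, last_hit + window_sec))  # close it
            start = None
    if start is not None:          # interval still open at end of stream
        intervals.append((start, last_hit + window_sec))
    return intervals
```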
It should be noted that the highlight-segment recognition model may use any one or more of the semantic, speaking-rate and intonation features of the commentary audio stream at a given point to determine whether the corresponding part of the live stream belongs to a highlight segment.
Step 203: according to the start and end times of the highlight segments, clip the highlight segments of the live stream out of the live stream.
After the highlight-segment recognition model running on the server determines, from the commentary audio stream, the start and end times of highlight segments in the synchronized live stream, the server further needs to enable the live-stream clipping function and perform the clipping operation on the live stream according to the determined start and end times, obtaining the highlight segments of the live stream.
When clipping the live stream according to the start and end times of the highlight segments, a cloud editing system may be used. In a concrete implementation, the server inputs the live stream to be clipped and the start and end times of the highlight segments output by the highlight-segment recognition model into the cloud editing system running on it; the cloud editing system then performs the clipping operation on the input live stream according to those start and end times and obtains the highlight segments of the live stream. The cloud editing system is an editing system built on cloud editing technology, which enables real-time online clipping of live streams: while the broadcast is in progress, the cloud editing system clips the live stream online in real time according to the highlight-segment start and end times output by the model, and when the broadcast ends, the cloud editing system has synchronously completed the clipping of the entire live stream, having clipped out all the highlight segments in it.
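The application does not disclose the internals of the cloud editing system. Purely as an illustration of the final clipping step, a recorded copy of the live stream could be cut at the model's start and end times by generating stream-copy ffmpeg commands (the file names and numbering scheme here are hypothetical):

```python
def ffmpeg_cut_commands(source, intervals):
    # Build one stream-copy ffmpeg command per highlight interval.
    # source: path to the recorded live stream; intervals: (start_sec,
    # end_sec) pairs as output by the recognition model.
    cmds = []
    for n, (start, end) in enumerate(intervals, 1):
        cmds.append(
            f"ffmpeg -ss {start} -i {source} -t {end - start} "
            f"-c copy highlight_{n:02d}.mp4"
        )
    return cmds
```

For the 5'32" to 6'49" example above, this yields a single cut starting at second 332 with a duration of 77 seconds.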
It should be understood that the clipping method of live stream provided by the embodiments of the present application is in the start/stop time for determining bloom segment Afterwards, live streaming can also be flowed into using the editing system constructed based on other editing technologies according to the start/stop time of bloom segment Row editing operation, editing system involved in the clipping method to live stream provided by the embodiments of the present application does not do specific limit herein It is fixed.
It should be noted that, in addition to outputting the start and end times of the highlight segments in the live stream synchronized with the input commentary audio stream, the highlight segment identification model may also output a title category for each highlight segment; the title category generally reflects the content of that segment.
In a specific implementation, the highlight segment identification model may intelligently analyze the semantic features of the input commentary audio stream to determine the keywords it involves, and then determine the title category of the highlight segment from those keywords. For example, upon analyzing the semantic features of a section of commentary audio, the model determines that the keyword involved is "overtime", and accordingly determines that the title category of the corresponding highlight segment is "overtime highlights". It should be noted that a section of commentary audio may involve one keyword or several; no limitation is placed here on the number of keywords.
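The keyword-to-title step can be sketched as a simple lookup; the table, keywords, and fallback below are hypothetical stand-ins for a mapping the model would learn.

```python
# Hypothetical keyword-to-category table; a trained model would learn this mapping.
TITLE_CATEGORIES = {
    "overtime": "overtime highlights",
    "three-pointer": "three-point highlights",
    "dunk": "dunk highlights",
}

def title_for_keywords(keywords):
    """Return the category of the first recognized keyword, else a generic title."""
    for kw in keywords:
        if kw in TITLE_CATEGORIES:
            return TITLE_CATEGORIES[kw]
    return "highlights"
```

A segment whose commentary yields the keyword "overtime" would thus be titled "overtime highlights".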
In one possible implementation, the server may obtain related information about the live stream in advance; accordingly, when determining the title category of a highlight segment, the highlight segment identification model may combine the semantic features of the commentary audio stream with that pre-obtained information. For example, when clipping the live stream of a basketball game, the server learns in advance that the game is between the Warriors and the Cavaliers, and correspondingly obtains player information for both teams; the model analyzes the semantic features of a section of the commentary audio and determines its meaning to be "James hits a three-pointer at the buzzer to complete the comeback", and then, combining the determined meaning with the pre-obtained game information, determines the title category of the highlight segment to be "Cavaliers' James hits a buzzer-beating three-pointer to come back over the Warriors".
After the server clips the live stream according to the highlight start and end times output by the identification model and obtains the highlight segments, it may send them to the client so that the client publishes them on the live broadcast interface; in addition, the server may send the corresponding title categories to the client, so that the client publishes each title category alongside its highlight segment. A user can thus learn the gist of each highlight segment from its title category and decide whether to click and watch it.
To facilitate understanding of the publication of highlight segments described above, the publication process is illustrated below with reference to the accompanying drawings. Referring to Fig. 3, Fig. 3 is a schematic diagram of a live broadcast interface for a basketball game.
As shown in Fig. 3, the video playback area 301 plays the current live basketball game video or the video of a highlight segment selected by the user. After the client receives the highlight segments of the basketball live stream and their title categories from the server, it publishes the received segments in the highlight publication column 302 and marks each published segment with its title category; the segments in column 302 are updated in real time as the game proceeds. The user can thus learn the gist of each segment from the title categories shown in column 302 and decide whether to watch it. When the user wants to watch a highlight segment, the user clicks it in column 302; in response to the click operation, the client obtains the segment from the server and plays it in the video playback area 301.
To further improve the experience of watching highlight segments, the server may clip out any advertisements included in a highlight segment, so that the user can watch it without interruption by advertising.
In one possible implementation, the server knows in advance the start and end times of each advertisement slot in the live stream, and clips out the advertisements in the highlight segments accordingly. In a specific implementation, after the highlight segment identification model determines the start and end times of a highlight segment from the commentary audio stream, the server checks, against the known advertisement slots, whether the interval between the segment's start time and end time contains an advertisement slot. If it does, the advertisement is deleted according to the slot's start and end times while the highlight segment is being clipped from the live stream, yielding a highlight segment that contains no advertising; if it does not, no advertisement deletion is needed, and the highlight segment is obtained directly by clipping the live stream at the segment's start and end times.
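Deleting known advertisement slots amounts to interval subtraction; a minimal sketch, assuming segments and advertisements are given as (start, end) pairs in seconds:

```python
def remove_ads(segment, ads):
    """Split a highlight (start, end) interval around any overlapping ad intervals."""
    parts = [segment]
    for a_start, a_end in ads:
        next_parts = []
        for s, e in parts:
            if a_end <= s or a_start >= e:   # no overlap: keep the part as-is
                next_parts.append((s, e))
                continue
            if s < a_start:                  # keep the piece before the ad
                next_parts.append((s, a_start))
            if a_end < e:                    # keep the piece after the ad
                next_parts.append((a_end, e))
        parts = next_parts
    return parts
```

A highlight spanning 0–100 s with an ad at 40–60 s splits into two ad-free pieces.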
It should be understood that the server may also first clip the highlight segments from the live stream and then check each of them against the known advertisement slots; if a segment contains an advertisement, the advertisement it contains is deleted according to the slot's start and end times, and if not, the clipped segment can be published on the live broadcast interface directly.
In another possible implementation, the server does not know in advance the start and end times of the advertisement slots in the live stream. The server may then locate the portion of the stream containing the advertisement using a fuzzy localization algorithm, determine the advertisement's start and end times within that portion, and delete the advertisement from the highlight segment of the live stream according to the determined times.
Specifically, the server may roughly locate the portion containing the advertisement from the content of the live stream, and then, within that portion, determine the advertisement's start time by moving left one frame at a time and its end time by moving right one frame at a time; determining the start and end times through this two-way frame-by-frame search improves localization efficiency. Of course, the server may also search frame by frame in a single direction, left to right or right to left: it inspects each frame's content to judge whether it corresponds to the advertisement's start frame or end frame, thereby determining those frames and, from them, the advertisement's start and end times, and finally deletes the advertisement clip from the highlight segment according to those times. The specific implementation of deleting the advertisement here is similar to the deletion operation in the previous implementation; refer to the description of that process above, which is not repeated here.
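The two-way frame-by-frame boundary search can be sketched as follows, assuming a hypothetical `is_ad` frame classifier and a hint index known to fall inside the advertisement:

```python
def locate_ad(frames, is_ad, hint):
    """From a frame index inside the ad, move left frame by frame to the first
    ad frame and right frame by frame to the last one; return (start, end)."""
    start = hint
    while start > 0 and is_ad(frames[start - 1]):
        start -= 1
    end = hint
    while end + 1 < len(frames) and is_ad(frames[end + 1]):
        end += 1
    return start, end

# Hypothetical frame labels: 'c' = content, 'a' = advertisement.
frames = list("cccaaaaccc")
bounds = locate_ad(frames, lambda f: f == "a", hint=4)
```

On the toy frame sequence above, the search finds the ad spanning indices 3 through 6.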
The clipping method for a live stream provided by the embodiments of this application uses an end-to-end highlight segment identification model to determine, from the commentary audio stream, the start and end times of the highlight segments in the corresponding live stream, and then clips the highlight segments out of the live stream based on the determined times. The identification model is an end-to-end neural network that takes the commentary audio stream corresponding to the live stream as input and outputs the start and end times of the highlight segments in the live stream; it intelligently analyzes features of the commentary audio such as semantics, speech rate, and intonation to determine those times. Consequently, clipping staff no longer need to determine highlight boundaries by manually analyzing the live stream and the commentary audio, which greatly reduces the labor cost of live-stream clipping, improves clipping efficiency, guarantees stable clipping quality, and meets the business demands placed on live-stream applications in the Internet big-data environment.
As described above, the clipping method provided by the embodiments of this application relies on the highlight segment identification model to determine, from the input commentary audio stream, the start and end times of the highlight segments in the synchronized live stream. To facilitate further understanding of the method's implementation, the model is introduced in detail below with reference to the accompanying drawings.
Referring to Fig. 4, Fig. 4 is an architecture diagram of the highlight segment identification model 400 provided by the embodiments of this application. As shown in Fig. 4, the model includes a cascaded speech recognition network 401 and localization network 402.
The speech recognition network 401 is a neural network that takes the commentary audio stream corresponding to the live stream as input and outputs commentary text. It is built on speech recognition technology and is mainly responsible for understanding and recognizing the semantics of the commentary audio, thereby obtaining the commentary text corresponding to it.
When the speech recognition network 401 performs speech recognition on the input commentary audio stream, it relies on the acoustic feature extraction model, acoustic model, pronunciation dictionary, language model, and decoder running within it, so as to obtain highly accurate commentary text corresponding to the commentary audio. These components of the speech recognition network 401 are introduced in turn below:
The acoustic feature extraction model is mainly used to extract acoustic feature vectors from the input commentary audio stream, providing the decoder with data to process. In a specific implementation, the model may extract the acoustic features of the commentary audio based on Mel-frequency cepstral coefficients (MFCC). Since MFCC is derived from the characteristics of human hearing, it has a nonlinear correspondence with sound frequency, and this nonlinear correspondence can be used to compute the acoustic feature vectors of the commentary audio stream.
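The nonlinear frequency correspondence underlying MFCC is the mel scale; one common formulation (several variants exist) is:

```python
import math

def hz_to_mel(f_hz):
    """Standard mel-scale mapping used in MFCC pipelines. The logarithmic
    nonlinearity mirrors human pitch perception: equal mel steps sound
    equally spaced, while the Hz spacing grows with frequency."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)
```

For instance, 700 Hz maps to roughly 781 mel, and the curve compresses higher frequencies relative to lower ones.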
In addition, the acoustic feature extraction model may be combined with signal processing techniques that process the input commentary audio stream, so as to reduce as much as possible the influence on the acoustic features of factors such as background noise and channel interference in the audio.
The decoder is one of the cores of the speech recognition network 401. It uses a weighted finite-state transducer (WFST) to effectively integrate the acoustic model, pronunciation dictionary, and language model, searching and matching over the input acoustic feature vectors in the most efficient way to find the statistically best-matching word sequence as the recognition result, i.e., as the commentary text.
The decoding process is broadly divided into two stages. The first is the speech recognition stage, in which the acoustic model in the speech recognition network 401 converts the acoustic feature vectors output by the feature extraction model into acoustic features in syllable form. The second is the speech understanding stage, in which the pronunciation dictionary in the speech recognition network 401 converts the syllable-form acoustic features into Chinese characters, and the language model then interprets the converted characters to generate the corresponding commentary text.
The acoustic model, pronunciation dictionary, and language model in the speech recognition network 401 are introduced separately below:
The acoustic model is one of the most important parts of the speech recognition network 401. Its input is the acoustic feature vectors output by the feature extraction model, and its output is the syllable-form acoustic features computed from each frame's feature vector. At present, acoustic models are mostly built on hidden Markov models with Gaussian mixture distributions (GMM-HMM); more advanced acoustic models are built on deep neural network hidden Markov models (DNN-HMM). Conceptually, a hidden Markov model is a discrete-time finite-state automaton whose internal states are invisible to the outside world; only the model's output value at each moment can be observed.
The pronunciation dictionary contains the mappings from syllables to words; its main function is to connect the acoustic model and the language model. It contains the set of all words the speech recognition network 401 can handle, together with each word's pronunciation. From the correspondence between words and their pronunciations, the pronunciation dictionary constructs the mapping between the acoustic model and the language model, through which the two are connected: according to its stored words and their pronunciations, the dictionary converts the syllable-form acoustic features output by the acoustic model into the corresponding words and inputs those words to the language model.
The language model applies knowledge of grammar to interpret each word recognized through the pronunciation dictionary, guaranteeing that the recognized sentences conform to grammatical logic and thereby improving recognition accuracy; that is, it guarantees that the recognized commentary text is grammatical and corresponds closely to the commentary audio. Common language models fall into two kinds. One is the statistical language model based on a large-scale corpus, which is suited to processing large amounts of real corpus data and offers good consistency of data preparation. The other is the rule-based language model, which is grounded in the grammatical and semantic classification of the vocabulary system and achieves sentence recognition by determining the morphology, grammar, and semantic relations of natural language.
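A statistical language model in its simplest form scores word sequences with corpus counts; a maximum-likelihood bigram sketch over a tiny hypothetical corpus:

```python
from collections import Counter

def train_bigram(corpus):
    """Count unigram and bigram frequencies over tokenized sentences."""
    uni, bi = Counter(), Counter()
    for sent in corpus:
        toks = ["<s>"] + sent           # sentence-start marker
        uni.update(toks)
        bi.update(zip(toks, toks[1:]))
    return uni, bi

def bigram_prob(uni, bi, prev, word):
    """Maximum-likelihood P(word | prev); 0 if prev was never seen."""
    return bi[(prev, word)] / uni[prev] if uni[prev] else 0.0

corpus = [["three", "pointer"], ["three", "pointer"], ["three", "seconds"]]
uni, bi = train_bigram(corpus)
```

On this corpus, "pointer" follows "three" with probability 2/3, so the model prefers that continuation when resolving acoustically ambiguous words.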
The localization network 402 takes the output of the speech recognition network 401 and the commentary audio stream as input, and outputs the start and end times of the highlight segments in the corresponding live stream. It is mainly responsible for locating those times in the live stream synchronized with the commentary audio, based on the commentary text output by the speech recognition network together with the commentary audio itself.
In determining the start and end times of the highlight segments in the live stream, the localization network 402 relies on the feature extraction model running within it to analyze the speech-rate and intonation features of the commentary audio and the semantic features of the commentary text. Having obtained the semantics, speech rate, and intonation of the commentary audio through this analysis, it then relies on the location model running within it to locate the start and end times of the highlight segments in the live stream according to the semantic, speech-rate, and intonation features determined by the feature extraction model.
For ease of understanding, the feature extraction model and the location model in the localization network 402 are introduced below:
The feature extraction model takes the commentary text output by the speech recognition network 401 and the commentary audio stream as input, and outputs the semantic, speech-rate, and intonation features of the commentary audio. In a specific implementation, it performs semantic analysis on the input commentary text to determine the semantic features throughout the commentary audio, and analyzes the speech rate and intonation of the input commentary audio to determine the speech-rate and intonation features throughout it.
While analyzing the semantics, speech rate, and intonation of the commentary audio, the feature extraction model correspondingly marks these features at each position in the audio. For example, through feature analysis of the commentary text and audio, the model finds that the semantic features of the commentary from 40'51" to 43'20" indicate the decisive scoring period of a basketball game, that the speech rate of this section is faster than that at other positions in the commentary audio, and that its intonation is higher than that at other positions; the model accordingly marks the semantic, speech-rate, and intonation features of this section at 40'51" to 43'20" in the commentary audio.
After the location model receives the semantic, speech-rate, and intonation features from the feature extraction model, it determines, from the received features, whether each portion of the live stream corresponding to the commentary audio belongs to a highlight segment. For example, from the features output by the feature extraction model, the location model finds that the semantics of the commentary section from 40'51" to 43'20" indicate the decisive scoring period of the game, that this section's speech rate is faster than elsewhere in the commentary audio, and that its intonation is higher than elsewhere; accordingly, the location model treats the live stream synchronized with this section as a highlight segment and takes the section's start and end times as the highlight segment's start and end times.
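The location model's decision can be sketched as thresholding per-window features; the window fields, thresholds, and values below are all hypothetical stand-ins for what a trained model would produce.

```python
def locate_highlights(windows, speed_thresh, pitch_thresh):
    """Flag commentary windows whose semantics were tagged as decisive play and
    whose speech rate and pitch both exceed stream-wide thresholds."""
    return [(w["start"], w["end"]) for w in windows
            if w["decisive"] and w["speed"] > speed_thresh and w["pitch"] > pitch_thresh]

# Hypothetical feature windows (times in seconds, speed in words/s, pitch in Hz).
windows = [
    {"start": 0,    "end": 120,  "decisive": False, "speed": 3.1, "pitch": 180},
    {"start": 2451, "end": 2600, "decisive": True,  "speed": 5.2, "pitch": 260},
]
highlights = locate_highlights(windows, speed_thresh=4.0, pitch_thresh=220)
```

Only the second window, with decisive semantics, fast speech, and raised pitch, is reported as a highlight interval.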
It should be noted that the feature extraction model may also analyze only the commentary text output by the speech recognition network, determine only the semantic features of the commentary audio, and output these to the location model, so that the location model determines the highlight start and end times from the semantic features alone. Alternatively, having determined the semantic features, the feature extraction model may further determine either the speech-rate features or the intonation features of the commentary audio by analyzing it, and send the semantic features together with the speech-rate features, or together with the intonation features, to the location model, which then determines the highlight start and end times from that pair of features.
It should also be noted that the feature extraction model in the localization network 402 may determine, by analyzing the semantic features of the commentary text, the meaning expressed by each section of commentary audio, and from that meaning determine each section's title category; accordingly, if the location model determines that a section of commentary corresponds to a highlight segment, that section's title category serves as the title category of the highlight segment.
The highlight segment identification model described above determines, through its cascaded speech recognition network and localization network, the start and end times of the highlight segments in the live stream synchronized with the input commentary audio stream. Accordingly, when this model is used to identify highlight start and end times, those times can be determined in one pass from the input commentary audio alone.
It can be understood that whether the highlight segment identification model can accurately determine, from the input commentary audio, the start and end times of the highlight segments in the corresponding live stream depends on the model's performance, and the model's performance in turn depends on how the model was trained.
The process of training the highlight segment identification model is introduced below with reference to Fig. 5.
Referring to Fig. 5, Fig. 5 is an architecture diagram of the training process of the highlight segment identification model. As shown in Fig. 5, a highlight segment identification initial model 510 is built in advance; it includes an initial speech recognition network 511 and an initial localization network 512. A sample commentary audio stream from a training sample is input to the initial model 510, and processing by the initial model yields prediction data for the highlight segments in the sample live stream corresponding to the sample commentary audio, including the predicted start and end times of the highlight segments in that live stream. A loss function is constructed from this prediction data and the known highlight segment annotation data, where the annotation data includes the actual start and end times of the highlight segments in the sample live stream corresponding to the sample commentary audio. The loss function is then used to optimize the model parameters of each network in the initial model 510, thereby optimizing the initial model.
When the initial model 510 meets the preset training termination condition, the highlight segment identification model 520 that can be put into practical use is built from the current model parameters and network structure of the initial model. The model 520 includes the speech recognition network 521 obtained by training and optimizing the initial speech recognition network 511, and the localization network 522 obtained by training and optimizing the initial localization network 512.
To further understand the training process of the highlight segment identification model, a specific implementation of the training process shown in Fig. 5 is introduced below with reference to Fig. 6. Referring to Fig. 6, Fig. 6 is a flow diagram of the training method for the highlight segment identification model, which includes the following steps:
Step 601: Obtain training samples. Each training sample includes a sample commentary audio stream and the highlight segment annotation data of the live stream corresponding to that audio stream; the annotation data includes the start and end times of the highlight segments in the sample live stream corresponding to the sample commentary audio.
Before the highlight segment identification initial model is trained, training samples need to be obtained, so that the acquired samples can be used to train the initial model. In actual training, multiple training samples may be obtained and used together as the sample data for model training.
Since the input of the highlight segment identification model is a commentary audio stream and its output is the start and end times of the highlight segments in the corresponding live stream, training the initial model requires samples matching that input and output: each acquired training sample must include a sample commentary audio stream and the annotation data of the highlight segments in the sample live stream corresponding to it, where the annotation data includes those segments' start and end times in the sample live stream. This guarantees that the model obtained from the training samples can meet the input and output demands placed on the highlight segment identification model in practical applications.
It should be noted that the highlight segment identification initial model is a pre-built neural network structure comprising an initial speech recognition network and an initial localization network; the initial speech recognition network includes an initial acoustic feature extraction model, an initial decoder, an acoustic model, a pronunciation dictionary, and a language model, while the initial localization network includes an initial feature extraction model and an initial location model.
Step 602: Iteratively train the highlight segment identification initial model using the training samples, obtaining a highlight segment identification model that meets the preset training termination condition.
When iteratively training the initial model with the training samples obtained in step 601, the sample commentary audio stream of each training sample is input to the initial model. The initial speech recognition network in the initial model performs speech recognition on the input sample audio to obtain the predicted commentary text corresponding to it; the predicted text and the sample audio are then input to the initial localization network, which determines from them the prediction data of the highlight segments in the live stream corresponding to the sample commentary audio, including the predicted start and end times of those segments in the sample live stream.
In a specific implementation, the sample commentary audio stream is input to the initial speech recognition network of the initial model; the acoustic feature extraction model in that network extracts acoustic features from the input sample audio, and the initial decoder, in combination with the acoustic model, pronunciation dictionary, and language model, generates the predicted commentary text corresponding to the sample audio from the extracted acoustic features.
The initial speech recognition network further inputs the generated predicted commentary text to the initial localization network, into which the sample commentary audio stream is also input. The initial feature extraction model in the initial localization network extracts the semantic, speech-rate, and intonation features of the sample commentary audio from the input predicted text and sample audio; these features are then input to the initial location model, which determines from them the prediction data of the highlight segments in the live stream corresponding to the sample commentary audio, including those segments' predicted start and end times in the sample live stream.
A loss function is then constructed from the prediction data output by the bloom segment identification initial model and the known bloom segment labeled data, and the model parameters of the bloom segment identification initial model are adjusted according to this loss function, thereby optimizing the bloom segment identification initial model. When the bloom segment identification initial model meets a preset training termination condition, the bloom segment identification model can be determined from the current model parameters of the initial model and the network structure of the bloom segment identification model.
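The iterative optimization described above — compute a loss from the predicted and labeled data, adjust parameters, stop when a termination condition is met — can be sketched as follows. The absolute-error loss, the toy one-parameter model and the halving update rule are illustrative assumptions only, not the patent's actual training procedure.

```python
# Hedged sketch of loss construction and parameter adjustment for the
# bloom segment identification initial model.

def span_loss(predicted, labeled):
    # Error between predicted and labeled start/end times of a bloom segment.
    return abs(predicted[0] - labeled[0]) + abs(predicted[1] - labeled[1])

def train(samples, predict, update, max_rounds=100, tol=0.01):
    """Iterate until the total loss meets the preset termination condition."""
    total = float("inf")
    for _ in range(max_rounds):
        total = sum(span_loss(predict(x), y) for x, y in samples)
        if total < tol:   # preset training termination condition
            break
        update(total)     # adjust model parameters from the loss
    return total

# A one-parameter "model": a time offset that should shrink toward zero.
params = {"offset": 1.0}
predict = lambda span: (span[0] + params["offset"], span[1] + params["offset"])
update = lambda loss_value: params.update(offset=params["offset"] * 0.5)
final_loss = train([((2.0, 5.0), (2.0, 5.0))], predict, update)
print(final_loss < 0.01)  # True
```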
When specifically judging whether the bloom segment identification initial model meets the preset training termination condition, a test sample can be used to verify a first model, where the first model is the model obtained by one round of training optimization of the bloom segment identification initial model with the training samples. Specifically, the test explanation audio stream in the test sample is input into the first model, and the first model determines the test data of the bloom segment in the test live stream corresponding to the test explanation audio stream; the test data of the bloom segment includes the test start and end times of the bloom segment in the test live stream corresponding to the test explanation audio stream. A bloom segment recognition accuracy is then calculated from the test data of the bloom segment and the labeled data of the bloom segment. When this recognition accuracy exceeds a preset threshold, the first model is considered to perform well enough to meet the preset training termination condition, and the bloom segment identification model can be determined directly from the model parameters and network structure of the first model.
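One plausible way to compute the bloom segment recognition accuracy mentioned above is overlap-based matching of predicted intervals against labeled intervals. The patent does not fix a metric, so the intersection-over-union scheme below is an assumption for illustration.

```python
# Assumed accuracy metric: a prediction counts as correct when its
# time interval overlaps the labeled interval strongly enough (IoU).

def interval_iou(pred, label):
    """Intersection-over-union of two (start, end) time intervals."""
    inter = max(0.0, min(pred[1], label[1]) - max(pred[0], label[0]))
    union = (pred[1] - pred[0]) + (label[1] - label[0]) - inter
    return inter / union if union > 0 else 0.0

def recognition_accuracy(preds, labels, iou_threshold=0.5):
    hits = sum(interval_iou(p, l) >= iou_threshold for p, l in zip(preds, labels))
    return hits / len(labels)

preds = [(10.0, 20.0), (40.0, 44.0), (70.0, 80.0)]
labels = [(11.0, 21.0), (50.0, 55.0), (70.0, 79.0)]
acc = recognition_accuracy(preds, labels)
print(acc)  # 2/3: the first and third predictions overlap their labels enough
```

Training would stop once this accuracy exceeds the preset threshold described in the text.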
It should be noted that the above preset threshold can be set according to the actual situation, and is not specifically limited herein.
Moreover, when judging whether the bloom segment identification initial model meets the preset training termination condition, the multiple models obtained over several rounds of training can also be compared to decide whether to continue training, so as to obtain the bloom segment identification model with the best performance. Specifically, the test sample can be used to verify each of the models obtained over the rounds of training, and it is judged whether the recognition accuracy of the models is still improving from round to round. If the gap in recognition accuracy between the models of successive rounds is small, the model performance is considered to have little room for further improvement; the model with the highest recognition accuracy can then be selected, and the bloom segment identification model determined from its model parameters and network structure. If there is a large gap in recognition accuracy between the models of successive rounds, the model performance is considered to still have room for improvement through training, and training can continue until a model with stable, optimal performance is obtained.
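The round-to-round comparison described above — continue training while accuracy still improves noticeably, stop once successive rounds differ little — can be sketched as a plateau check. The margin and patience values below are assumed for illustration.

```python
# Plateau check over per-round recognition accuracies: stop once the
# last few round-to-round gains are all below a small margin.

def should_stop(accuracy_history, margin=0.005, patience=3):
    """True when the last `patience` round-to-round gains are below `margin`."""
    if len(accuracy_history) <= patience:
        return False
    gains = [b - a for a, b in zip(accuracy_history[:-1], accuracy_history[1:])]
    return all(g < margin for g in gains[-patience:])

history = [0.61, 0.72, 0.79, 0.792, 0.793, 0.794]
print(should_stop(history))             # True: last three gains are tiny
print(should_stop([0.61, 0.72, 0.79]))  # False: accuracy still improving
```

When the check returns True, the highest-accuracy model among the rounds would be selected as the bloom segment identification model.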
By training the bloom segment identification initial model with the above model training method, a loss function is constructed from the error between the prediction data of the bloom segment and the labeled data of the bloom segment, and the model parameters of the bloom segment identification initial model are adjusted according to this loss function. When the bloom segment identification initial model meets the preset training termination condition, the bloom segment identification model is determined from the current model parameters and model structure of the bloom segment identification initial model, thereby obtaining a bloom segment identification model with good performance.
To facilitate further understanding of the clipping method of a live stream provided by the embodiments of the present application, the method is introduced below with reference to the application scenario of a live basketball match.
Referring to Fig. 7, Fig. 7 is a schematic diagram of an application scenario of the clipping method of a live stream provided by the embodiments of the present application. The scenario includes: a video camera 701, a commentary microphone 702, a live broadcast server 703, a live stream editing server 704 and a client 705.
At the basketball match site, the video camera 701 sends the match pictures captured in real time to the live broadcast server 703 as an on-site signal; at the same time, the commentary microphone 702 captures in real time the commentary audio signal generated as the commentator explains the current state of the game, and sends this commentary audio signal to the live broadcast server 703.
After the live broadcast server 703 receives the on-site signal sent by the video camera 701 and the commentary audio signal sent by the commentary microphone 702, it encodes the received on-site signal and commentary audio signal in real time to obtain a synchronized live stream and corresponding explanation audio stream, and then sends the live stream and the explanation audio stream to the live stream editing server 704.
After the live stream editing server 704 obtains the live stream sent by the live broadcast server 703 and the explanation audio stream corresponding to the live stream, it inputs the received explanation audio stream into the bloom segment identification model running on the live stream editing server 704. The bloom segment identification model intelligently analyzes the semantic, speech rate and intonation features of the input explanation audio stream, and determines the start and end times of the bloom segment in the live stream synchronized with the explanation audio stream. The bloom segment identification model is an end-to-end neural network: directly from the input explanation audio stream, it can determine the start and end times of the bloom segment in the live stream corresponding to that explanation audio stream.
In the process of determining the start and end times of the bloom segment based on the bloom segment identification model, the live stream editing server 704 first inputs the explanation audio stream into the speech recognition network in the bloom segment identification model. The acoustic feature extraction model in the speech recognition network extracts acoustic feature vectors from the explanation audio stream and outputs them to the decoder; the decoder, combining the acoustic model, pronunciation dictionary and language model, processes the acoustic feature vectors to obtain the explanation text corresponding to the input explanation audio stream, and outputs this explanation text to the positioning network in the bloom segment identification model. At the same time, the live stream editing server 704 also inputs the explanation audio stream into the positioning network. The feature extraction model of the positioning network processes the explanation text and the explanation audio stream to obtain the semantic features, speech rate features and intonation features of the explanation audio stream, and outputs them to the positioning model in the positioning network, so that the positioning model determines, from these features, the start and end times of the bloom segment in the live stream corresponding to the explanation audio stream.
After the bloom segment identification model determines the start and end times of the bloom segment, the cloud editing system running on the live stream editing server 704 can perform a clipping operation on the acquired live stream according to those start and end times, so as to obtain the bloom segment of the live stream.
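As one concrete possibility, a cloud editing system could cut the bloom segment out of a recorded stream with ffmpeg, driven by the start and end times the model outputs. The command layout below (output-side seeking with stream copy) and the file names are assumptions for illustration, not the patent's specified implementation; note that stream copy clips quickly but only at keyframe granularity.

```python
# Assumed ffmpeg invocation for clipping a bloom segment out of a
# recorded live stream; builds the argv without executing it.

def build_clip_command(source, start_s, end_s, output):
    """Return an ffmpeg argv that clips [start_s, end_s] from source."""
    return [
        "ffmpeg",
        "-i", source,
        "-ss", f"{start_s:.3f}",  # bloom segment start time (seconds)
        "-to", f"{end_s:.3f}",    # bloom segment end time (seconds)
        "-c", "copy",             # stream copy: clip without re-encoding
        output,
    ]

cmd = build_clip_command("match_live.ts", 754.0, 802.5, "bloom_001.ts")
print(" ".join(cmd))
# ffmpeg -i match_live.ts -ss 754.000 -to 802.500 -c copy bloom_001.ts
```

In a real deployment the command would be run per detected bloom segment, e.g. via `subprocess.run(cmd, check=True)`.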
It should be noted that the bloom segment identification model can also determine the title category corresponding to each bloom segment according to the semantic features of the input explanation audio stream; after the live stream is clipped to obtain the bloom segments, the corresponding title category can accordingly be marked for each bloom segment.
It should also be noted that, in the process of clipping the live stream to obtain the bloom segments, the live stream editing server 704 can delete the advertising clips in each bloom segment according to the start and end times of the advertisement play periods obtained in advance, so that users are not disturbed by advertisements when watching the bloom segments.
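Deleting advertisement periods from a bloom segment, as described above, amounts to subtracting the known advertisement intervals from the segment's time interval. The sketch below assumes intervals are (start, end) pairs in seconds; the function name and representation are illustrative.

```python
# Subtract known advertisement intervals from a bloom segment interval,
# returning the ad-free parts that remain.

def remove_ads(segment, ad_intervals):
    """Subtract each ad interval from the bloom segment interval."""
    parts = [segment]
    for ad_start, ad_end in sorted(ad_intervals):
        next_parts = []
        for start, end in parts:
            if ad_end <= start or ad_start >= end:  # no overlap
                next_parts.append((start, end))
                continue
            if start < ad_start:                    # keep part before the ad
                next_parts.append((start, ad_start))
            if ad_end < end:                        # keep part after the ad
                next_parts.append((ad_end, end))
        parts = next_parts
    return parts

kept = remove_ads((100.0, 160.0), [(120.0, 130.0), (150.0, 155.0)])
print(kept)  # [(100.0, 120.0), (130.0, 150.0), (155.0, 160.0)]
```

Each kept part could then be clipped and, if desired, concatenated back into a single ad-free bloom segment.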
After the live stream editing server 704 clips the live stream to obtain its bloom segments, it sends each bloom segment of the live stream and the corresponding title category of each bloom segment to the client. The client accordingly publishes the received bloom segments in the bloom segment publishing column of the live broadcast interface, and marks the corresponding title category of each bloom segment. In this way, in the course of broadcasting the basketball match, the client can synchronously update the bloom segment publishing column with the bloom segments of the match and their corresponding title categories, so that users can decide whether to watch each bloom segment according to its title category.
For the clipping method of a live stream described above, the present application also provides a corresponding editing device of a live stream, so as to facilitate the application and realization of these methods in practice.
Referring to Fig. 8, Fig. 8 is a structural schematic diagram of an editing device 800 of a live stream corresponding to the method shown in Fig. 2 above; the device includes:
an obtaining module 801, for obtaining a live stream and an explanation audio stream corresponding to the live stream;
a processing module 802, for inputting the explanation audio stream into a bloom segment identification model and obtaining the start and end times of the bloom segment output by the bloom segment identification model, the bloom segment identification model being an end-to-end neural network model;
an editing module 803, for clipping the bloom segment of the live stream from the live stream according to the start and end times corresponding to the bloom segment.
Optionally, in the editing device of a live stream shown in Fig. 8, the bloom segment identification model includes: a cascaded speech recognition network and positioning network; wherein,
the speech recognition network is a neural network taking the explanation audio stream corresponding to the live stream as input and the explanation text as output;
the positioning network is a neural network taking the output of the speech recognition network and the explanation audio stream as input, and the start and end times of the bloom segment corresponding to the live stream as output.
Optionally, the speech recognition network includes: an acoustic model, a pronunciation dictionary, a language model and a decoder.
Optionally, the positioning network includes: a feature extraction model and a positioning model;
the feature extraction model is a neural network taking the output of the speech recognition network and the explanation audio stream as input, and the semantic features, speech rate features and intonation features as output;
the positioning model is a neural network taking the output of the feature extraction model as input and the start and end times of the bloom segment corresponding to the live stream as output.
Optionally, on the basis of the editing device of a live stream shown in Fig. 8, referring to Fig. 9, Fig. 9 is a structural schematic diagram of another editing device 900 of a live stream provided by the embodiments of the present application; the device further includes:
a sample acquisition module 901, for obtaining training samples, each training sample including a sample explanation audio stream and labeled data of the bloom segment in the live stream corresponding to the sample explanation audio stream, the labeled data of the bloom segment including the start and end times of the bloom segment in the sample live stream corresponding to the sample explanation audio stream;
a training module 902, for iteratively training a bloom segment identification initial model with each training sample to obtain a bloom segment identification model meeting a preset training termination condition.
Optionally, in the editing device of a live stream shown in Fig. 8, the bloom segment identification model takes the explanation audio stream as input, and the start and end times of the corresponding bloom segment together with the title category it belongs to as output; then, on the basis of the editing device of a live stream shown in Fig. 8, referring to Fig. 10, Fig. 10 is a structural schematic diagram of another editing device 1000 of a live stream provided by the embodiments of the present application; the device further includes:
a title category obtaining module 1001, for obtaining the title category corresponding to the bloom segment output by the bloom segment identification model.
Optionally, on the basis of the editing device of a live stream shown in Fig. 8, referring to Fig. 11, Fig. 11 is a structural schematic diagram of another editing device 1100 of a live stream provided by the embodiments of the present application; the device further includes:
a first advertisement removing module 1101, for deleting advertisements from the bloom segment of the live stream according to the start and end times of the advertisement play periods.
Optionally, on the basis of the editing device of a live stream shown in Fig. 8, referring to Fig. 12, Fig. 12 is a structural schematic diagram of another editing device 1200 of a live stream provided by the embodiments of the present application; the device further includes:
an advertisement fragment locating module 1201, for locating the fragment where the advertisement is located according to a fuzzy location algorithm;
an advertisement locating module 1202, for determining the advertisement start and end times from the fragment;
a second advertisement removing module 1203, for deleting the advertisement from the bloom segment of the live stream according to the advertisement start and end times.
Optionally, on the basis of the editing device of a live stream shown in Fig. 8, referring to Fig. 13, Fig. 13 is a structural schematic diagram of another editing device 1300 of a live stream provided by the embodiments of the present application; the device further includes:
a bloom segment release module 1301, for publishing the bloom segment of the live stream on the live broadcast interface.
Optionally, in the editing device of a live stream shown in Fig. 8, the live stream includes a video live stream or an audio live stream.
Optionally, in the editing device of a live stream shown in Fig. 8, the obtaining module 801 is specifically configured to obtain, in real time from a live broadcast server, the live stream and the explanation audio stream corresponding to the live stream.
Optionally, in the editing device of a live stream shown in Fig. 8, the editing module 803 is specifically configured to input the start and end times corresponding to the bloom segment into a cloud editing system, and the cloud editing system clips the bloom segment of the live stream from the live stream.
The editing device of a live stream provided by the embodiments of the present application uses an end-to-end bloom segment identification model to determine, from the explanation audio stream corresponding to the live stream to be clipped, the start and end times of the bloom segment in the live stream, and then clips the bloom segment out of the live stream according to those start and end times. The end-to-end bloom segment identification model can determine the start and end times of the bloom segment in the live stream corresponding to the explanation audio stream by intelligently analyzing features such as semantics, speech rate and intonation in the explanation audio stream, without editing personnel having to determine the start and end times of bloom segments by manually analyzing the live stream and the content of the explanation audio stream. This greatly reduces the labor cost of live stream clipping and improves editing efficiency; moreover, in an Internet big-data environment, the bloom segment identification model of the embodiments of the present application can rapidly determine the start and end times of the bloom segment from the explanation audio stream and clip the live stream accordingly, meeting the present demand for live stream clipping services.
The present application also provides a film editing equipment of a live stream, which may specifically be a server. Referring to Fig. 14, Fig. 14 is a structural schematic diagram of a film editing equipment of a live stream provided by the embodiments of the present application. The server 1400 may vary considerably in configuration or performance, and may include one or more central processing units (CPU) 1422 (for example, one or more processors), a memory 1432, and one or more storage media 1430 (such as one or more mass storage devices) storing application programs 1442 or data 1444. The memory 1432 and the storage media 1430 may provide transient or persistent storage. The programs stored in a storage medium 1430 may include one or more modules (not marked in the figure), each of which may include a series of instruction operations on the server. Further, the central processing unit 1422 may be configured to communicate with the storage medium 1430 and execute on the server 1400 the series of instruction operations in the storage medium 1430.
The server 1400 may also include one or more power supplies 1426, one or more wired or wireless network interfaces 1450, one or more input/output interfaces 1458, and/or one or more operating systems 1441, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ and so on.
The steps performed by the server in the above embodiments can be based on the server structure shown in Fig. 14.
Wherein, the CPU 1422 is configured to execute the following steps:
obtaining a live stream and an explanation audio stream corresponding to the live stream;
inputting the explanation audio stream into a bloom segment identification model, and obtaining the start and end times of the bloom segment output by the bloom segment identification model, the bloom segment identification model being an end-to-end neural network model;
clipping the bloom segment of the live stream from the live stream according to the start and end times of the bloom segment.
Optionally, the CPU 1422 can also execute the method steps of any specific implementation of the clipping method of a live stream in the embodiments of the present application.
The embodiments of the present application also provide another film editing equipment of a live stream, which may be a terminal device. As shown in Fig. 15, for ease of description, only the parts relevant to the embodiments of the present application are shown; for specific technical details not disclosed, please refer to the method part of the embodiments of the present application. The terminal may be any terminal device including a computer, a mobile phone, a tablet computer, a personal digital assistant (PDA), a point-of-sale terminal (POS), a vehicle-mounted computer and so on. Taking a mobile phone as an example:
Fig. 15 shows a block diagram of part of the structure of a mobile phone related to the terminal provided by the embodiments of the present application. Referring to Fig. 15, the mobile phone includes: a radio frequency (RF) circuit 1510, a memory 1520, an input unit 1530, a display unit 1540, a sensor 1550, an audio circuit 1560, a wireless fidelity (WiFi) module 1570, a processor 1580, a power supply 1590 and other components. Those skilled in the art will appreciate that the mobile phone structure shown in Fig. 15 does not constitute a limitation on the mobile phone, which may include more or fewer components than illustrated, combine certain components, or adopt a different component arrangement.
Each component part of the mobile phone is specifically introduced below with reference to Fig. 15:
The RF circuit 1510 can be used for receiving and sending signals in the course of sending/receiving information or a call; in particular, after receiving downlink information from a base station, it passes the information to the processor 1580 for processing, and it sends uplink data to the base station. In general, the RF circuit 1510 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer and the like. In addition, the RF circuit 1510 can also communicate with the network and other devices by wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to Global System of Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), E-mail, Short Messaging Service (SMS) and so on.
The memory 1520 can be used to store software programs and modules; the processor 1580 executes the various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 1520. The memory 1520 may mainly include a program storage area and a data storage area, where the program storage area may store the operating system, application programs required for at least one function (such as a sound playing function, an image playing function, etc.), and the data storage area may store data created according to the use of the mobile phone (such as audio data, a phone directory, etc.). In addition, the memory 1520 may include high-speed random access memory, and may also include nonvolatile memory, for example at least one disk memory, flash memory device, or other volatile solid-state memory device.
The input unit 1530 can be used to receive input number or character information, and generate key signal input related to the user settings and function control of the mobile phone. Specifically, the input unit 1530 may include a touch panel 1531 and other input devices 1532. The touch panel 1531, also referred to as a touch screen, collects touch operations of the user on or near it (such as operations by the user with a finger, a stylus or any suitable object or attachment on or near the touch panel 1531), and drives the corresponding connecting device according to a preset program. Optionally, the touch panel 1531 may include two parts: a touch detecting apparatus and a touch controller. The touch detecting apparatus detects the touch orientation of the user, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives touch information from the touch detecting apparatus, converts it into contact coordinates, sends them to the processor 1580, and receives and executes commands sent by the processor 1580. In addition, the touch panel 1531 can be realized in multiple types such as resistive, capacitive, infrared and surface acoustic wave. Besides the touch panel 1531, the input unit 1530 may also include other input devices 1532, which specifically may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control buttons, a switch key, etc.), a trackball, a mouse, a joystick and so on.
The display unit 1540 can be used to display information input by the user or provided to the user, and the various menus of the mobile phone. The display unit 1540 may include a display panel 1541, which optionally may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display or the like. Further, the touch panel 1531 may cover the display panel 1541; after the touch panel 1531 detects a touch operation on or near it, it transmits the operation to the processor 1580 to determine the type of the touch event, and the processor 1580 then provides a corresponding visual output on the display panel 1541 according to the type of the touch event. Although in Fig. 15 the touch panel 1531 and the display panel 1541 realize the input and output functions of the mobile phone as two independent components, in some embodiments the touch panel 1531 and the display panel 1541 may be integrated to realize the input and output functions of the mobile phone.
The mobile phone may also include at least one sensor 1550, such as an optical sensor, a motion sensor and other sensors. Specifically, the optical sensor may include an ambient light sensor and a proximity sensor, where the ambient light sensor can adjust the brightness of the display panel 1541 according to the light and shade of the ambient light, and the proximity sensor can close the display panel 1541 and/or the backlight when the mobile phone is moved to the ear. As a kind of motion sensor, an accelerometer sensor can detect the magnitude of acceleration in all directions (generally three axes), can detect the magnitude and direction of gravity when static, and can be used for applications that identify the mobile phone posture (such as horizontal/vertical screen switching, related games, magnetometer pose calibration), vibration identification related functions (such as a pedometer, tapping), etc.; the mobile phone may also be configured with other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer and an infrared sensor, details of which are not described herein.
The audio circuit 1560, a loudspeaker 1561 and a microphone 1562 can provide the audio interface between the user and the mobile phone. The audio circuit 1560 can transfer the electrical signal converted from the received audio data to the loudspeaker 1561, which converts it into a sound signal for output; on the other hand, the microphone 1562 converts the collected sound signal into an electrical signal, which the audio circuit 1560 receives and converts into audio data; after the audio data is output to the processor 1580 for processing, it is sent through the RF circuit 1510 to, for example, another mobile phone, or output to the memory 1520 for further processing.
WiFi belongs to short-range wireless transmission technology; through the WiFi module 1570 the mobile phone can help the user send and receive e-mail, browse web pages, access streaming media and so on, providing the user with wireless broadband Internet access. Although Fig. 15 shows the WiFi module 1570, it is understood that the module is not an essential component of the mobile phone and can be omitted as needed within a scope that does not change the essence of the invention.
The processor 1580 is the control center of the mobile phone; it uses various interfaces and lines to connect the various parts of the whole mobile phone, and executes the various functions and processes the data of the mobile phone by running or executing the software programs and/or modules stored in the memory 1520 and calling the data stored in the memory 1520, so as to monitor the mobile phone as a whole. Optionally, the processor 1580 may include one or more processing units; preferably, the processor 1580 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs and so on, and the modem processor mainly handles wireless communication. It is understood that the above modem processor may also not be integrated into the processor 1580.
The mobile phone also includes a power supply 1590 (such as a battery) that supplies power to the various components; preferably, the power supply can be logically connected to the processor 1580 through a power management system, so as to realize functions such as managing charging, discharging and power consumption through the power management system.
Although not shown, the mobile phone may also include a camera, a Bluetooth module and so on, details of which are not described herein.
In the embodiments of the present application, the processor 1580 included in the terminal device also has the following functions:
obtaining a live stream and an explanation audio stream corresponding to the live stream;
inputting the explanation audio stream into a bloom segment identification model, and obtaining the start and end times of the bloom segment output by the bloom segment identification model, the bloom segment identification model being an end-to-end neural network model;
clipping the bloom segment of the live stream from the live stream according to the start and end times of the bloom segment.
Optionally, the processor 1580 can also execute the method steps of any specific implementation of the clipping method of a live stream in the embodiments of the present application.
The embodiments of the present application also provide a computer readable storage medium for storing program code, the program code being used for executing any one embodiment of the clipping method of a live stream described in the foregoing individual embodiments.
The embodiments of the present application also provide a computer program product including instructions which, when run on a computer, cause the computer to execute any one embodiment of the clipping method of a live stream described in the foregoing individual embodiments.
It is apparent to those skilled in the art that, for convenience and simplicity of description, the specific working processes of the system, device and units described above can refer to the corresponding processes in the foregoing method embodiments, and details are not described herein.
In the several embodiments provided herein, it should be understood that the disclosed system, device and method can be realized in other ways. For example, the device embodiments described above are merely exemplary; the division of units is only a division by logical function, and there may be other division manners in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units illustrated as separate members may or may not be physically separated, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to realize the purpose of the scheme of this embodiment.
In addition, each functional unit in each embodiment of the application may be integrated in one processing unit, or each unit may physically exist alone, or two or more units may be integrated in one unit. The above integrated unit can be realized either in the form of hardware or in the form of a software functional unit.
If the integrated unit is realized in the form of SFU software functional unit and sells or use as independent product When, it can store in a computer readable storage medium.Based on this understanding, the technical solution of the application is substantially The all or part of the part that contributes to existing technology or the technical solution can be in the form of software products in other words It embodies, which is stored in a storage medium, including some instructions are used so that a computer Equipment (can be personal computer, server or the network equipment etc.) executes the complete of each embodiment the method for the application Portion or part steps.And storage medium above-mentioned includes:USB flash disk, mobile hard disk, read-only memory (full name in English:Read-Only Memory, english abbreviation:ROM), random access memory (full name in English:Random Access Memory, english abbreviation: RAM), the various media that can store program code such as magnetic or disk.
The above, above embodiments are only to illustrate the technical solution of the application, rather than its limitations;Although referring to before Embodiment is stated the application is described in detail, those skilled in the art should understand that:It still can be to preceding Technical solution documented by each embodiment is stated to modify or equivalent replacement of some of the technical features;And these It modifies or replaces, the spirit and scope of each embodiment technical solution of the application that it does not separate the essence of the corresponding technical solution.

Claims (15)

1. A clipping method for a live stream, characterized in that it comprises:
Obtaining a live stream and a commentary audio stream corresponding to the live stream;
Inputting the commentary audio stream into a highlight segment recognition model to obtain the start and end times of the highlight segment output by the highlight segment recognition model; the highlight segment recognition model being an end-to-end neural network model;
Clipping, according to the start and end times of the highlight segment, the highlight segment of the live stream from the live stream.
2. The method according to claim 1, characterized in that the highlight segment recognition model comprises: a speech recognition network and a positioning network;
The speech recognition network is a neural network that takes the commentary audio stream corresponding to the live stream as input and commentary text as output;
The positioning network is a neural network that takes the output of the speech recognition network and the commentary audio stream as input, and the start and end times of the highlight segment corresponding to the live stream as output.
3. The method according to claim 2, characterized in that the speech recognition network comprises: an acoustic model, a pronunciation dictionary, a language model, and a decoder.
4. The method according to claim 2, characterized in that the positioning network comprises: a feature extraction model and a location model;
The feature extraction model is a neural network that takes the output of the speech recognition network and the commentary audio stream as input, and semantic features, speech-rate features, and intonation features as output;
The location model is a neural network that takes the output of the feature extraction model as input, and the start and end times of the highlight segment corresponding to the live stream as output.
5. The method according to claim 1, characterized in that the method further comprises:
Obtaining training samples, the training samples comprising: a sample commentary audio stream and labeled data of highlight segments in a sample live stream corresponding to the sample commentary audio stream, the labeled data of a highlight segment comprising the start and end times of the highlight segment in the sample live stream corresponding to the sample commentary audio stream;
Iteratively training an initial highlight segment recognition model using the training samples to obtain a highlight segment recognition model that satisfies a preset training termination condition.
6. The method according to any one of claims 1 to 5, characterized in that the highlight segment recognition model takes the commentary audio stream as input, and the start and end times of the corresponding highlight segment and the title category to which it belongs as output;
The method then further comprises:
Obtaining the title category, output by the highlight segment recognition model, corresponding to the highlight segment.
7. The method according to any one of claims 1 to 5, characterized in that the method further comprises:
Deleting, according to the start and end times of advertisement playback, advertisements from the highlight segment of the live stream.
8. The method according to any one of claims 1 to 5, characterized in that the method further comprises:
Locating, by a fuzzy positioning algorithm, the slice where an advertisement is located;
Determining the advertisement start and end times from the slice;
Deleting, according to the advertisement start and end times, the advertisement from the highlight segment of the live stream.
9. The method according to any one of claims 1 to 5, characterized in that the method further comprises:
Publishing the highlight segment of the live stream on a live streaming interface.
10. The method according to any one of claims 1 to 5, characterized in that the live stream comprises a live video stream or a live audio stream.
11. The method according to any one of claims 1 to 5, characterized in that the obtaining a live stream and a commentary audio stream corresponding to the live stream comprises:
Obtaining the live stream and the commentary audio stream corresponding to the live stream from a live streaming server in real time.
12. The method according to any one of claims 1 to 5, characterized in that the clipping, according to the start and end times corresponding to the highlight segment, the highlight segment of the live stream from the live stream comprises:
Inputting the start and end times corresponding to the highlight segment into a cloud clipping system, and clipping, by the cloud clipping system, the highlight segment of the live stream from the live stream.
13. A clipping device for a live stream, characterized in that it comprises:
An obtaining module, configured to obtain a live stream and a commentary audio stream corresponding to the live stream;
A processing module, configured to input the commentary audio stream into a highlight segment recognition model to obtain the start and end times of the highlight segment output by the highlight segment recognition model; the highlight segment recognition model being an end-to-end neural network model;
A clipping module, configured to clip, according to the start and end times corresponding to the highlight segment, the highlight segment of the live stream from the live stream.
14. A device, characterized in that the device comprises a processor and a memory:
The memory is configured to store program code and transfer the program code to the processor;
The processor is configured to execute, according to instructions in the program code, the clipping method for a live stream according to any one of claims 1 to 12.
15. A computer-readable storage medium, characterized in that the computer-readable storage medium is configured to store program code, the program code being used to execute the clipping method for a live stream according to any one of claims 1 to 12.
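As an illustration of the speech-rate and intonation cues that the feature extraction model of claim 4 learns from the recognized commentary, the sketch below computes simple hand-crafted stand-ins from a transcript with word timestamps and a pitch track. The function name, its inputs, and the use of voiced-pitch standard deviation as an intonation measure are assumptions for illustration, not taken from the patent:

```python
import numpy as np

def commentary_features(words, word_times, pitch_track):
    """Hand-crafted stand-ins for claim 4's speech-rate / intonation features.

    words:       list of recognized words (assumed non-empty)
    word_times:  start time of each word, in seconds
    pitch_track: per-frame fundamental-frequency estimates in Hz
                 (0 marks unvoiced frames)
    """
    # Speech rate: words per second over the spanned interval.
    duration = (word_times[-1] - word_times[0]) or 1.0
    speech_rate = len(words) / duration

    # Intonation: variability of the voiced pitch; excited commentary
    # tends to show both a higher rate and larger pitch swings.
    pitch = np.asarray(pitch_track, dtype=float)
    voiced = pitch[pitch > 0]
    intonation = float(voiced.std()) if voiced.size else 0.0
    return speech_rate, intonation
```

In the patented design these cues are produced by a learned network rather than fixed formulas; the point of the sketch is only why fast, pitch-varied commentary is a usable highlight signal for the location model.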
CN201810689302.2A 2018-06-28 2018-06-28 A kind of clipping method of live stream, device and equipment Pending CN108833969A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810689302.2A CN108833969A (en) 2018-06-28 2018-06-28 A kind of clipping method of live stream, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810689302.2A CN108833969A (en) 2018-06-28 2018-06-28 A kind of clipping method of live stream, device and equipment

Publications (1)

Publication Number Publication Date
CN108833969A true CN108833969A (en) 2018-11-16

Family

ID=64134655

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810689302.2A Pending CN108833969A (en) 2018-06-28 2018-06-28 A kind of clipping method of live stream, device and equipment

Country Status (1)

Country Link
CN (1) CN108833969A (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110312162A (en) * 2019-06-27 2019-10-08 北京字节跳动网络技术有限公司 Selected stage treatment method, device, electronic equipment and readable medium
CN110460868A (en) * 2019-08-13 2019-11-15 中国联合网络通信集团有限公司 Advertisement placement method and advertisement delivery system
CN110519610A (en) * 2019-08-14 2019-11-29 咪咕文化科技有限公司 Method for processing resource and system, server and client side's equipment is broadcast live
CN110933459A (en) * 2019-11-18 2020-03-27 咪咕视讯科技有限公司 Event video clipping method, device, server and readable storage medium
CN111263234A (en) * 2020-01-19 2020-06-09 腾讯科技(深圳)有限公司 Video clipping method, related device, equipment and storage medium
CN111385642A (en) * 2018-12-29 2020-07-07 阿里巴巴集团控股有限公司 Media information processing method, device, server, equipment and storage medium
CN111447489A (en) * 2020-04-02 2020-07-24 北京字节跳动网络技术有限公司 Video processing method and device, readable medium and electronic equipment
CN112040253A (en) * 2020-08-03 2020-12-04 中国人民解放军海军军医大学 Portable multi-source live broadcast interaction device and method
CN112383794A (en) * 2020-12-01 2021-02-19 咪咕互动娱乐有限公司 Live broadcast method, live broadcast system, server and computer storage medium
CN112532897A (en) * 2020-11-25 2021-03-19 腾讯科技(深圳)有限公司 Video clipping method, device, equipment and computer readable storage medium
CN113691864A (en) * 2021-07-13 2021-11-23 北京百度网讯科技有限公司 Video clipping method, video clipping device, electronic equipment and readable storage medium
CN114222196A (en) * 2022-01-04 2022-03-22 阿里巴巴新加坡控股有限公司 Method and device for generating short video of plot commentary and electronic equipment
CN114401417A (en) * 2022-01-28 2022-04-26 广州方硅信息技术有限公司 Live stream object tracking method and device, equipment and medium thereof
CN114697741A (en) * 2020-12-30 2022-07-01 腾讯科技(深圳)有限公司 Multimedia information playing control method and related equipment
CN115979350A (en) * 2023-03-20 2023-04-18 北京航天华腾科技有限公司 Data acquisition system of ocean monitoring equipment

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101329867A (en) * 2007-06-21 2008-12-24 西门子(中国)有限公司 Method and device for playing speech on demand
CN104199933A (en) * 2014-09-04 2014-12-10 华中科技大学 Multi-modal information fusion football video event detection and semantic annotation method
CA2959865A1 (en) * 2014-10-09 2016-04-14 Thuuz, Inc. Generating a customized highlight sequence depicting one or more events
CN105721811A (en) * 2015-05-15 2016-06-29 乐视云计算有限公司 Live video recording method and system
CN105763884A (en) * 2014-12-18 2016-07-13 广州市动景计算机科技有限公司 Video processing method, device and apparatus
CN105912560A (en) * 2015-02-24 2016-08-31 泽普实验室公司 Detect sports video highlights based on voice recognition
CN106134216A (en) * 2014-04-11 2016-11-16 三星电子株式会社 Broadcast receiver and method for clip Text service
CN106782615A (en) * 2016-12-20 2017-05-31 科大讯飞股份有限公司 Speech data emotion detection method and apparatus and system

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101329867A (en) * 2007-06-21 2008-12-24 西门子(中国)有限公司 Method and device for playing speech on demand
CN106134216A (en) * 2014-04-11 2016-11-16 三星电子株式会社 Broadcast receiver and method for clip Text service
CN104199933A (en) * 2014-09-04 2014-12-10 华中科技大学 Multi-modal information fusion football video event detection and semantic annotation method
CA2959865A1 (en) * 2014-10-09 2016-04-14 Thuuz, Inc. Generating a customized highlight sequence depicting one or more events
CN105763884A (en) * 2014-12-18 2016-07-13 广州市动景计算机科技有限公司 Video processing method, device and apparatus
CN105912560A (en) * 2015-02-24 2016-08-31 泽普实验室公司 Detect sports video highlights based on voice recognition
CN105721811A (en) * 2015-05-15 2016-06-29 乐视云计算有限公司 Live video recording method and system
CN106782615A (en) * 2016-12-20 2017-05-31 科大讯飞股份有限公司 Speech data emotion detection method and apparatus and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yan Lelin (闫乐林): "Research on Video Semantic Analysis and Retrieval Technology Based on Audiovisual Information", China National Knowledge Infrastructure (CNKI) *

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111385642A (en) * 2018-12-29 2020-07-07 阿里巴巴集团控股有限公司 Media information processing method, device, server, equipment and storage medium
WO2020259130A1 (en) * 2019-06-27 2020-12-30 北京字节跳动网络技术有限公司 Selected clip processing method and device, electronic equipment and readable medium
CN110312162A (en) * 2019-06-27 2019-10-08 北京字节跳动网络技术有限公司 Selected stage treatment method, device, electronic equipment and readable medium
CN110460868A (en) * 2019-08-13 2019-11-15 中国联合网络通信集团有限公司 Advertisement placement method and advertisement delivery system
CN110460868B (en) * 2019-08-13 2021-07-13 中国联合网络通信集团有限公司 Advertisement delivery method and advertisement delivery system
CN110519610A (en) * 2019-08-14 2019-11-29 咪咕文化科技有限公司 Method for processing resource and system, server and client side's equipment is broadcast live
CN110519610B (en) * 2019-08-14 2021-08-06 咪咕文化科技有限公司 Live broadcast resource processing method and system, server and client device
CN110933459B (en) * 2019-11-18 2022-04-26 咪咕视讯科技有限公司 Event video clipping method, device, server and readable storage medium
CN110933459A (en) * 2019-11-18 2020-03-27 咪咕视讯科技有限公司 Event video clipping method, device, server and readable storage medium
CN111263234A (en) * 2020-01-19 2020-06-09 腾讯科技(深圳)有限公司 Video clipping method, related device, equipment and storage medium
WO2021196903A1 (en) * 2020-04-02 2021-10-07 北京字节跳动网络技术有限公司 Video processing method and device, readable medium and electronic device
CN111447489A (en) * 2020-04-02 2020-07-24 北京字节跳动网络技术有限公司 Video processing method and device, readable medium and electronic equipment
CN112040253A (en) * 2020-08-03 2020-12-04 中国人民解放军海军军医大学 Portable multi-source live broadcast interaction device and method
CN112532897A (en) * 2020-11-25 2021-03-19 腾讯科技(深圳)有限公司 Video clipping method, device, equipment and computer readable storage medium
CN112383794A (en) * 2020-12-01 2021-02-19 咪咕互动娱乐有限公司 Live broadcast method, live broadcast system, server and computer storage medium
CN112383794B (en) * 2020-12-01 2023-10-20 咪咕互动娱乐有限公司 Live broadcast method, live broadcast system, server and computer storage medium
CN114697741A (en) * 2020-12-30 2022-07-01 腾讯科技(深圳)有限公司 Multimedia information playing control method and related equipment
CN114697741B (en) * 2020-12-30 2023-06-30 腾讯科技(深圳)有限公司 Multimedia information playing control method and related equipment
CN113691864A (en) * 2021-07-13 2021-11-23 北京百度网讯科技有限公司 Video clipping method, video clipping device, electronic equipment and readable storage medium
CN114222196A (en) * 2022-01-04 2022-03-22 阿里巴巴新加坡控股有限公司 Method and device for generating short video of plot commentary and electronic equipment
CN114401417A (en) * 2022-01-28 2022-04-26 广州方硅信息技术有限公司 Live stream object tracking method and device, equipment and medium thereof
CN114401417B (en) * 2022-01-28 2024-02-06 广州方硅信息技术有限公司 Live stream object tracking method, device, equipment and medium thereof
CN115979350A (en) * 2023-03-20 2023-04-18 北京航天华腾科技有限公司 Data acquisition system of ocean monitoring equipment

Similar Documents

Publication Publication Date Title
CN108833969A (en) A kind of clipping method of live stream, device and equipment
CN110288077B (en) Method and related device for synthesizing speaking expression based on artificial intelligence
CN110853618B (en) Language identification method, model training method, device and equipment
CN110503942A (en) A kind of voice driven animation method and device based on artificial intelligence
TWI619114B (en) Method and system of environment-sensitive automatic speech recognition
CN110381389A (en) A kind of method for generating captions and device based on artificial intelligence
CN111261144B (en) Voice recognition method, device, terminal and storage medium
CN110531860A (en) A kind of animating image driving method and device based on artificial intelligence
US20130211826A1 (en) Audio Signals as Buffered Streams of Audio Signals and Metadata
CN110570840B (en) Intelligent device awakening method and device based on artificial intelligence
CN110853617B (en) Model training method, language identification method, device and equipment
CN110019961A Video processing method and device, and device for video processing
CN110544488A (en) Method and device for separating multi-person voice
CN108735209A (en) Wake up word binding method, smart machine and storage medium
CN110890093A (en) Intelligent device awakening method and device based on artificial intelligence
CN108735216A A kind of semantic-recognition-based voice question-search method and tutoring device
CN114401417B (en) Live stream object tracking method, device, equipment and medium thereof
CN109429078A Video processing method and device, and device for video processing
CN110322760A (en) Voice data generation method, device, terminal and storage medium
CN109801618A (en) A kind of generation method and device of audio-frequency information
CN109302528A (en) A kind of photographic method, mobile terminal and computer readable storage medium
CN108322770A (en) Video frequency program recognition methods, relevant apparatus, equipment and system
CN108648754A (en) Sound control method and device
CN109429077A Video processing method and device, and device for video processing
CN114360510A (en) Voice recognition method and related device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181116

RJ01 Rejection of invention patent application after publication