CN110166650A - Method and apparatus for generating a video compilation, computer device and readable medium - Google Patents
- Publication number
- CN110166650A (application CN201910355708.1A)
- Authority
- CN
- China
- Prior art keywords
- video
- specified action
- action detection
- training
- detection model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/845—Structuring of content, e.g. decomposing content into time segments
- H04N21/8456—Structuring of content by decomposing the content in the time domain, e.g. in time segments
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/262—Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects; Cameras specially adapted for the electronic generation of special effects
- H04N5/265—Mixing
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Studio Circuits (AREA)
- Processing Or Creating Images (AREA)
- Television Signal Processing For Recording (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a method and apparatus for generating a video compilation, a computer device, and a readable medium. The method includes: obtaining, based on a pre-established knowledge graph, multiple videos related to a specified entity; clipping, from the multiple videos, multiple video segments containing a specified action according to a pre-trained action detection model; and splicing the multiple video segments containing the specified action together to obtain the video compilation. By adopting the above technical solution, the present invention provides an efficient scheme for automatically generating video compilations. Because the technical solution of the present invention generates the compilation on the basis of a knowledge graph and AI, the accuracy of the clipped video segments and the precision of the generated compilation can be effectively guaranteed; and since the generation process requires no manual clipping, the generation efficiency is very high.
Description
[Technical field]
The present invention relates to the field of computer application technology, and in particular to a method and apparatus for generating a video compilation, a computer device, and a readable medium.
[Background technique]
With the rapid development of multimedia and the Internet, video has become an indispensable way for users to acquire information. Users can not only learn new knowledge through videos, but also watch travel videos, entertainment videos, and the like anytime and anywhere, enjoying their leisure time.
In the prior art, video resources are very rich and each video carries a large amount of information; a single actor, for example, may have appeared in many film and television works. If a user wants to clip, from the massive videos in a video library, a compilation containing a common specified action, he or she has to browse each video, manually clip the segments containing that action from each video that contains it, and then manually splice the clipped segments together to generate the compilation.
It can be seen from the above that generating a video compilation in the existing manner is very time-consuming and laborious. Moreover, because the clipping is manual, its precision is not high, so the compilation may contain segments outside the specified action, or segments in which the specified action is clipped incompletely. Therefore, an efficient scheme for generating video compilations is urgently needed.
[Summary of the invention]
The present invention provides a method and apparatus for generating a video compilation, a computer device, and a readable medium, so as to provide an efficient scheme for generating video compilations.
The present invention provides a method for generating a video compilation, the method comprising:
obtaining, based on a pre-established knowledge graph, multiple videos related to a specified entity;
clipping, from the multiple videos, multiple video segments containing a specified action according to a pre-trained action detection model; and
splicing the multiple video segments containing the specified action together to obtain the video compilation.
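The three claimed steps can be sketched as a pipeline. This is a minimal illustration only: the knowledge-graph lookup and the action detection model are replaced by toy stand-ins, and every name (`videos_for_entity`, `appears_in`, `actor_A`) is invented for the example.

```python
# Minimal sketch of the claimed three-step pipeline. Only the control
# flow mirrors the claim; the graph and detector are toy stand-ins.

def videos_for_entity(knowledge_graph, entity):
    # Step 1: fetch all videos the knowledge graph links to the entity.
    return [v for s, rel, v in knowledge_graph
            if s == entity and rel == "appears_in"]

def clip_action_segments(video, detect):
    # Step 2: the detection model yields (start, end) pairs per video.
    return [(video, start, end) for start, end in detect(video)]

def generate_compilation(knowledge_graph, entity, detect):
    # Step 3: splice every clipped segment into one compilation.
    compilation = []
    for video in videos_for_entity(knowledge_graph, entity):
        compilation.extend(clip_action_segments(video, detect))
    return compilation

# Toy knowledge graph and a detector that "finds" one segment per video.
kg = [("actor_A", "appears_in", "movie1"),
      ("actor_A", "appears_in", "movie2")]
result = generate_compilation(kg, "actor_A", lambda v: [(10.0, 12.5)])
```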
The present invention provides an apparatus for generating a video compilation, the apparatus comprising:
an obtaining module, configured to obtain, based on a pre-established knowledge graph, multiple videos related to a specified entity;
a clipping module, configured to clip, from the multiple videos, multiple video segments containing a specified action according to a pre-trained action detection model; and
a splicing module, configured to splice the multiple video segments containing the specified action together to obtain the video compilation.
The present invention also provides a computer device, the device comprising:
one or more processors; and
a memory for storing one or more programs;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method for generating a video compilation as described above.
The present invention also provides a computer-readable medium on which a computer program is stored, the program, when executed by a processor, implementing the method for generating a video compilation as described above.
By adopting the above technical solution, the method and apparatus for generating a video compilation, the computer device, and the readable medium of the present invention overcome the prior-art problems that manually generating a video compilation is time-consuming and laborious and that the precision of the generated compilation is not high, and provide an efficient scheme for automatically generating video compilations. Because the technical solution of the present invention generates the compilation on the basis of a knowledge graph and AI, the accuracy of the clipped video segments and the precision of the generated compilation can be effectively guaranteed; and since the generation process requires no manual clipping, the generation efficiency is very high.
[Description of drawings]
Fig. 1 is a flowchart of Embodiment 1 of the method for generating a video compilation according to the present invention.
Fig. 2 is a flowchart of Embodiment 2 of the method for generating a video compilation according to the present invention.
Fig. 3 is a flowchart of Embodiment 3 of the method for generating a video compilation according to the present invention.
Fig. 4 is a schematic structural diagram of a temporal convolutional network provided by the present invention.
Fig. 5 is a structural diagram of Embodiment 1 of the apparatus for generating a video compilation according to the present invention.
Fig. 6 is a structural diagram of Embodiment 2 of the apparatus for generating a video compilation according to the present invention.
Fig. 7 is a structural diagram of Embodiment 3 of the apparatus for generating a video compilation according to the present invention.
Fig. 8 is a structural diagram of Embodiment 4 of the apparatus for generating a video compilation according to the present invention.
Fig. 9 is a structural diagram of an embodiment of the computer device of the present invention.
Fig. 10 is an exemplary diagram of a computer device provided by the present invention.
[Detailed description]
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in detail below with reference to the drawings and specific embodiments.
Artificial intelligence (AI) is a new technical science that studies and develops theories, methods, techniques, and application systems for simulating, extending, and expanding human intelligence. As a branch of computer science, it attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a manner similar to human intelligence; research in this field includes robotics, speech recognition, image recognition, natural language processing, and expert systems. Based on AI technology, various applications can be realized using various neural network models.
A knowledge graph (Knowledge Graph), also known in library and information science as a knowledge domain visualization map, is a series of diagrams that display the development process of knowledge and the structural relationships within it. It uses visualization technology to describe knowledge resources and their carriers, and to mine, analyze, construct, draw, and display knowledge and the interconnections among knowledge items. By combining theories and methods from applied mathematics, graphics, information visualization, and information science with methods such as citation analysis and co-occurrence analysis, a knowledge graph visually displays the core structure, development history, frontier fields, and overall knowledge architecture of a discipline, serving the purpose of multidisciplinary integration and providing a practical, valuable reference for disciplinary research. Data in various existing fields can be organized by constructing a knowledge graph that stores the relationships between entities and the attributes and attribute values of each entity, so as to serve business needs.
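As a small illustration of the storage just described, entity relationships can be kept as subject-predicate-object triples alongside an attribute table. The entity names below ("Movie X", "Actor A") and the predicate names are invented for the example.

```python
# Illustrative only: one possible shape for a knowledge graph that stores
# relationships between entities plus per-entity attributes and values.

triples = [
    ("Movie X", "actor", "Actor A"),
    ("Movie X", "director", "Director B"),
    ("Movie X", "theme_song", "Song C"),
]
attributes = {"Movie X": {"type": "feature film", "year": 2019}}

def related(entity, predicate):
    """All objects linked to `entity` by `predicate`."""
    return [o for s, p, o in triples if s == entity and p == predicate]

actors = related("Movie X", "actor")
```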
To realize the generation of video compilations, the present invention provides, based on AI and a knowledge graph, an efficient generation scheme with which batch, large-scale generation of video compilations can be achieved.
Fig. 1 is a flowchart of Embodiment 1 of the method for generating a video compilation according to the present invention. As shown in Fig. 1, the method of this embodiment may specifically include the following steps:
S100: based on a pre-established knowledge graph, obtain multiple videos related to a specified entity.
The executing subject of the method of this embodiment may be an apparatus for generating video compilations; the apparatus may be a physical electronic device, or it may be an application implemented by software integration.
In this embodiment, a knowledge graph of the video domain may be pre-established with reference to existing construction methods. The structure of the knowledge graph may include multiple entities, the relationships between entities, and entity attributes and attribute values. In a knowledge graph of the video domain, the main entity may be the title of a video; for a video featuring people, the related entities may include the names of the actors, roles, director, and producer, as well as the theme song; for an animal video, the related entities may include the types of animals in the video.
The specified entity of this embodiment may be a person entity, an animal entity, or another type of entity. If the specified entity corresponds to certain designated-person information, the specified action of this embodiment may be any detectable action, for example kissing, fighting, or a car chase. If the specified entity corresponds to certain designated-animal information, the specified action may be any action of an animal, such as running, walking a single-plank bridge, jumping through a hoop of fire, or eating. Similarly, the specified entity may also be a celestial body such as the Earth, the Moon, or the Sun, and the specified action may be a total solar eclipse, a partial solar eclipse, a lunar eclipse, and so on. The specified entity may likewise be any other entity, and the specified action any other action that the entity can complete; these are not enumerated one by one here.
Before step S100, the generating apparatus may receive a generation request triggered by the user; the request may carry the specified entity and the specified action, requesting generation of a compilation about the specified entity that contains the specified action.
Accordingly, in step S100, after receiving the generation request, the generating apparatus may, according to the relationships between entities in the pre-established knowledge graph, obtain multiple pieces of video information (such as video titles) corresponding to the specified entity, and then fetch the video corresponding to each piece of video information from a video library, thereby obtaining multiple videos.
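Step S100 can be sketched under assumed data shapes: a knowledge graph mapping an entity to video titles, and a video library mapping titles to files. All names, paths, and the library layout below are hypothetical.

```python
# Sketch of S100: knowledge-graph lookup of titles, then fetch from a
# "video library". Both structures are toy dictionaries for illustration.

knowledge_graph = {
    ("Actor A", "performs_in"): ["Movie 1", "Movie 2", "Movie 3"],
}
video_library = {"Movie 1": "/videos/m1.mp4", "Movie 2": "/videos/m2.mp4"}

def fetch_entity_videos(entity):
    titles = knowledge_graph.get((entity, "performs_in"), [])
    # Only titles actually present in the library can be clipped later.
    return [video_library[t] for t in titles if t in video_library]

paths = fetch_entity_videos("Actor A")
```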
S101: according to a pre-trained action detection model, clip multiple video segments containing the specified action from the multiple videos.
In this embodiment, the pre-trained action detection model can identify, in the multiple videos, the segments containing the specified action, so that those segments can be clipped out. Specifically, some of the videos may contain no segment with the specified action, while others may contain one, two, or more such segments; in total, multiple segments containing the specified action can be clipped from the multiple videos. The number of segments may therefore be greater or smaller than the number of videos.
In this embodiment, one action detection model can identify only one kind of specified action; to detect another specified action, another action detection model needs to be trained.
Specifically, when clipping the segments containing the specified action, each video may be input into the action detection model, and the model directly outputs the segments of that video containing the specified action. It should be noted that before this step is executed, the action detection model must first be trained. Before training, several training videos, together with the segments containing the specified action in each training video, need to be collected. During training, each collected training video is input into the action detection model, which outputs the predicted segment containing the specified action; a loss function is then constructed from the start and end points of the predicted segment and those of the true segment containing the specified action, and it is judged whether the loss function converges (for example, whether it is less than a preset threshold). If it does not converge, the parameters of the model are adjusted so that the predicted segment approaches the true segment. Using the several training videos and their annotated segments, the model is trained continuously in this manner until the loss function converges, at which point the parameters of the model are determined and training is complete.
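The training loop just described — predict boundaries, build a loss against the annotated start and end points, adjust parameters until the loss converges below a threshold — can be rendered with a toy two-parameter "model". A real action detection model would be a neural network; only the loop structure is faithful here, and all numbers are invented.

```python
# Toy rendering of the described training procedure: squared error on the
# predicted (start, end) boundaries, gradient steps until convergence.

def train(samples, lr=0.1, threshold=1e-4, max_steps=5000):
    start_bias, end_bias = 0.0, 0.0          # the "model parameters"
    for _ in range(max_steps):
        loss, g_start, g_end = 0.0, 0.0, 0.0
        for true_start, true_end in samples:
            ds, de = start_bias - true_start, end_bias - true_end
            loss += ds * ds + de * de        # boundary loss function
            g_start += 2 * ds
            g_end += 2 * de
        if loss < threshold:                 # "loss function converges"
            break
        n = len(samples)
        start_bias -= lr * g_start / n       # adjust model parameters
        end_bias -= lr * g_end / n
    return start_bias, end_bias

start, end = train([(4.0, 9.0), (4.0, 9.0)])
```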
With the action detection model trained as in the above embodiment, in use, a video is input into the model; if the video contains a segment with the specified action, the model can directly output that segment. By processing the multiple videos in this way, multiple segments containing the specified action can be clipped from them.
In the above technical solution, the action detection model identifies the specified action with the video as the granularity, and the segments containing the specified action are clipped from the multiple videos accordingly. In this embodiment, each video segment contains only one specified action.
S102: splice the multiple video segments containing the specified action together to obtain the video compilation.
Specifically, the multiple segments containing the specified action are spliced head to tail to obtain the compilation, about the specified entity and containing the specified action, that the user requested. The segments may be spliced in random order, without a fixed sequence.
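The splicing step can be sketched as follows. Real splicing would concatenate and re-encode video (for example with a tool such as ffmpeg); here each segment is just a placeholder label, so only the head-to-tail ordering logic, including the random order mentioned above, is shown.

```python
# Head-to-tail splicing of clipped segments; optional random order.

import random

def splice(segments, shuffle=False):
    order = list(segments)
    if shuffle:
        random.shuffle(order)   # "spliced at random, without fixed order"
    return order                # head-to-tail concatenation

clips = ["m1_00:10-00:25", "m2_01:02-01:20", "m3_00:00-00:08"]
compilation = splice(clips)
```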
By adopting the above technical solution, the method of this embodiment overcomes the prior-art problems that manually generating a video compilation is time-consuming and laborious and that the precision of the generated compilation is not high, and provides an efficient scheme for automatically generating video compilations. Because the technical solution of this embodiment generates the compilation on the basis of a knowledge graph and AI, the accuracy of the clipped segments and the precision of the generated compilation can be effectively guaranteed; and since the generation process requires no manual clipping, the generation efficiency is very high.
Fig. 2 is a flowchart of Embodiment 2 of the method for generating a video compilation according to the present invention. As shown in Fig. 2, this embodiment describes the technical solution of the present invention in more detail on the basis of the embodiment shown in Fig. 1. The method of this embodiment may specifically include the following steps:
S200: receive a generation request, input by the user through a human-machine interface module, that carries the specified entity and the specified action.
For example, through the human-machine interface module the user may send the generating apparatus a request carrying actor A and a specified action, to request a compilation of that action completed by actor A; or a request carrying director B and a specified action, to request a compilation of that action in films directed by that director; or a request carrying "dog" and "gnawing a bone", to request a compilation of dogs gnawing bones.
The human-machine interface module of this embodiment may be a mouse, a keyboard, or a touch screen, or it may be a microphone or the like capable of receiving user requests in speech form.
S201: according to the relationships between entities in the pre-established knowledge graph, obtain multiple pieces of video information corresponding to the specified entity.
S202: obtain the video corresponding to each piece of video information from a video library, obtaining multiple videos.
Steps S201 and S202 are a specific implementation of step S100 of the embodiment shown in Fig. 1; for details, reference may be made to the description of that embodiment, which is not repeated here.
S203: for each video, extract each frame image in chronological order to obtain a group of image sequences.
S204: according to each group of image sequences and the pre-trained action detection model, predict the starting point and ending point of the specified action in the corresponding video.
S205: according to the starting point and ending point of each specified action, clip the corresponding segment containing the specified action from the corresponding video, obtaining multiple video segments.
Steps S203-S205 are an implementation of step S101 of the embodiment shown in Fig. 1.
Specifically, the action detection model in this embodiment identifies the specified action in a video with each frame image as the granularity, and the segments containing the specified action are then clipped from the multiple videos. In this scheme, the model realizes recognition of the specified action by detecting images. Since a specified action has a certain duration, in this embodiment the starting point and ending point of the specified action in a video can be predicted by recognizing each frame image in the video's image sequence, and the segment containing the specified action can then be clipped based on those points.
For example, in a specific implementation, each image in each group of image sequences may be input into the action detection model, which predicts the probability that the image is the starting point of the specified action and the probability that it is the ending point. The moment corresponding to an image whose starting-point probability exceeds a preset probability threshold is taken as the starting point of the specified action in the corresponding video, and the moment corresponding to an image whose ending-point probability exceeds the threshold is taken as the ending point. The preset probability threshold of this embodiment may be set empirically, for example to a probability value greater than 0.5 and less than 1.
That is, for the image sequence of each video, one image at a time is input into the action detection model in chronological order. The model outputs two probability values for that image: the probability that it is the starting point of the specified action and the probability that it is the ending point. If either probability exceeds the preset threshold, the image is taken as the corresponding starting or ending point. In practice, one image cannot be both a starting point and an ending point, so the case where both probabilities exceed the threshold at once does not arise. In this way, by feeding the images of each video into the model one by one from front to back, the image corresponding to the starting point of the specified action is first predicted, and the starting point is determined from that image's moment; detection then continues in a similar manner to determine the ending point. During detection, starting points and ending points occur in pairs; a video may contain only one pair of starting and ending points, or multiple pairs.
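The per-frame decoding just described can be sketched as follows: each frame receives a starting-point probability and an ending-point probability from the model, frames exceeding the threshold are taken as boundaries, and starting and ending points are paired in temporal order. The probability values below are made up for the example.

```python
# Decode per-frame (start, end) probabilities into paired segments.

def find_segments(frame_probs, threshold=0.5, fps=25.0):
    segments, open_start = [], None
    for idx, (p_start, p_end) in enumerate(frame_probs):
        t = idx / fps                          # frame index -> timestamp
        if open_start is None and p_start > threshold:
            open_start = t                     # starting point found
        elif open_start is not None and p_end > threshold:
            segments.append((open_start, t))   # pair with next ending point
            open_start = None
    return segments

probs = [(0.1, 0.0), (0.9, 0.0), (0.2, 0.1),
         (0.0, 0.8), (0.7, 0.0), (0.1, 0.9)]
segments = find_segments(probs)
```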
According to the above embodiment, the starting points and ending points of multiple specified actions can be obtained. For each pair of starting and ending points obtained in each video, the corresponding segment containing the specified action can be clipped from that video, so that multiple segments containing the specified action are obtained in total.
Optionally, in practical application, if the user requests a compilation of actor A completing the specified action, then although all the videos in which actor A performs can be obtained through steps S201 and S202, those videos may also contain other actors completing the same action, so what step S205 obtains may not all be segments of actor A completing the action. Therefore, after step S203 and before step S204 of the above embodiment, the following may also be included:
if the specified entity is designated-person information, perform face detection on the images in each group of image sequences based on the designated-person information, and delete from each group the images that do not include the designated person. This ensures that the subsequently obtained segments of the specified action all include the designated person, which can improve the accuracy of the obtained segments.
Similarly, if the specified entity is designated-animal information, perform feature detection on the images in each group of image sequences based on the designated-animal information, and delete from each group the images that do not include the designated animal; this likewise ensures that the subsequently obtained segments of the specified action all include the designated animal, improving their accuracy.
Further optionally, the filtering may instead be performed after step S205, once the multiple video segments have been obtained: based on the designated-person information, perform face detection on the multiple segments containing the specified action obtained by clipping, and delete the segments that do not include the designated person; the specified action in the retained segments can then be regarded as completed by the person corresponding to the designated-person information.
Similarly, if the specified entity is designated-animal information, feature detection may be performed on the clipped segments containing the specified action based on the designated-animal information, and the segments that do not include the designated animal deleted; the specified action in the retained segments can be regarded as completed by the animal corresponding to the designated-animal information.
The face detection of this embodiment may proceed as follows: multiple face templates corresponding to the designated-person information are preset; during detection, the face in each frame image of the video is extracted and matched against the preset face templates of the designated person, and if the similarity is greater than a preset threshold, the face in the video is deemed to be that of the designated person; otherwise it is not. Alternatively, in this embodiment, detection may be realized by training a face detection model corresponding to the designated person: face images of the person are collected in advance and the model is trained so that it can accurately recognize the person's face. In use, each frame image of the video is input into the face detection model, which predicts whether the video includes the designated person's face.
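The template-matching mode can be reduced to vector comparison: a face embedding is assumed per frame, compared to the preset templates of the designated person by cosine similarity, and accepted above a threshold. The tiny embeddings and the 0.9 threshold are invented for the illustration; real embeddings would come from a face recognition model.

```python
# Template matching by cosine similarity against preset face templates.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

def matches_person(face_vec, templates, threshold=0.9):
    # "if the similarity is greater than a preset threshold, the face is
    # deemed to be that of the designated person"
    return any(cosine(face_vec, t) > threshold for t in templates)

person_templates = [[1.0, 0.0, 0.2], [0.9, 0.1, 0.3]]
kept = matches_person([0.95, 0.05, 0.25], person_templates)
dropped = matches_person([0.0, 1.0, 0.0], person_templates)
```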
In addition, if the specified entity is designated-animal information, detection is performed in the manner of feature detection. It should be noted that, since animals differ, the collected features may also differ; specifically, the images chosen as feature templates should contain features that clearly distinguish the designated animal from other animals. For example, in practical application some animals can be distinguished by the head, in which case multiple head images are collected as preset feature templates. If the head alone is not distinctive enough, other characteristic information of the animal's body can be added, in which case the collected template images should include not only the head but also that other characteristic information. The specific detection process follows the same principle as the face detection above. Similarly, feature detection may also be realized with a feature detection model for the designated animal, whose realization principle is the same as that of the face detection model above; for details, reference may be made to the description of the above embodiment, which is not repeated here.
S206: according to a preset splicing rule, splice the multiple video segments containing the specified action together to obtain the video compilation.
In this embodiment, a splicing rule may be preset: for example, the segments may be ordered by duration from short to long or from long to short, or by the release date of the corresponding video from far to near or from near to far, or according to another preset splicing rule, or spliced at random according to the rule of the embodiment shown in Fig. 1 above. The multiple segments containing the specified action are spliced together accordingly to obtain the video compilation.
By adopting the above technical solution, the method of this embodiment provides an efficient scheme for automatically generating video compilations. Because the technical solution of this embodiment generates the compilation on the basis of a knowledge graph and AI, the accuracy of the clipped segments and the precision of the generated compilation can be effectively guaranteed; and since the generation process requires no manual clipping, the generation efficiency is very high.
Fig. 3 is a flowchart of Embodiment 3 of the method for generating a video compilation according to the present invention. As shown in Fig. 3, this embodiment describes in detail, on the basis of the technical solution of the embodiment shown in Fig. 2, the training process of the action detection model used in that embodiment. The method of this embodiment may specifically include the following steps:
S300: collect several training video segments containing the specified action, and annotate the true starting point and true ending point of the specified action in each training video segment.
S301: train the action detection model according to the several training video segments and the true starting point and true ending point of the specified action annotated in each of them.
The executing body of the video set generation method of this embodiment may be the same as in Figs. 1 and 2 above, namely the video set generating apparatus. The generating apparatus first trains the motion detection model, and then, based on the trained motion detection model and the knowledge graph, generates the video set using the technical solution of the embodiment shown in Fig. 2.
Alternatively, the executing body of this embodiment may differ from that of the embodiments shown in Figs. 1 and 2: it may be a training apparatus for the motion detection model that is independent of the video set generating apparatus. In specific use, the training apparatus first trains the motion detection model; then, when generating a video set, the video set generating apparatus directly calls the trained motion detection model and the pre-established knowledge graph and generates the video set using the technical solution of the embodiment shown in Fig. 2.
Before the motion detection model of this embodiment is trained, several training video segments containing the specified action need to be collected, and the true starting point and true end point of the specified action in each training video segment need to be annotated, to serve as the reference for subsequently adjusting the parameters of the motion detection model.
In specific training, step S301 may include the following steps:
(a) for each training video segment, extracting the image of each frame in chronological order to obtain a group of training image sequences;
(b) predicting, according to each group of training image sequences and the motion detection model, the predicted starting point and predicted end point of the specified action in the corresponding training video segment;
The implementations of steps (a) and (b) are identical to those of steps S203 and S204 in the embodiment shown in Fig. 2; refer to the relevant description of the foregoing embodiment for details, which are not repeated here.
(c) calculating a mean-squared-error loss function according to the true starting point and predicted starting point, and the true end point and predicted end point, of the specified action in each training video segment;
(d) judging whether the mean-squared-error loss function has converged; if not, executing step (e);
In specific implementation, for example, a very small preset threshold may be set, and whether the value of the mean-squared-error loss function is smaller than this preset threshold is judged; if so, the loss function is considered to have converged; otherwise it is considered not to have converged.
(e) updating the parameters of the motion detection model by gradient descent, then executing step (f);
In this embodiment, updating the parameters of the motion detection model by gradient descent brings the model's predicted starting point closer to the true starting point and its predicted end point closer to the true end point, so that the value of the mean-squared-error loss function tends to converge.
(f) repeating the above training of the motion detection model with each training video segment, i.e. repeating steps (b)-(e), until the mean-squared-error loss function tends to converge; the parameters of the motion detection model are then determined, and thus the motion detection model itself is determined. At this point the training of the motion detection model is finished.
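The loop of steps (b)-(f) can be sketched numerically as follows. This is a toy illustration only: a linear model over synthetic features stands in for the real motion detection network, and the feature dimensions, learning rate and threshold are assumptions for the sketch; the structure (predict, compute MSE, test convergence, gradient step, repeat) mirrors the steps above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the annotated training set: each row of X is a feature
# vector for one training video segment, each row of Y is the annotated
# (true starting point, true end point) pair for that segment.
X = rng.normal(size=(64, 8))
true_W = rng.normal(size=(8, 2))
Y = X @ true_W

W = np.zeros((8, 2))      # parameters of the toy "detection model"
lr, eps = 0.05, 1e-6      # learning rate and convergence threshold

for step in range(2000):
    pred = X @ W                              # step (b): predicted (start, end)
    resid = pred - Y
    loss = float(np.mean(resid ** 2))         # step (c): mean-squared-error loss
    if loss < eps:                            # step (d): convergence test
        break
    W -= lr * (2.0 / len(X)) * X.T @ resid    # step (e): gradient-descent update

print("converged:", loss < eps)
```

Once the loss falls below the threshold, the parameters are fixed and the trained model can be reused by the generation steps of the Fig. 2 embodiment.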
In this embodiment, a three-layer temporal convolutional network may be built as the motion detection model. The input of each layer of the temporal convolutional network is a sequence, and its output is a sequence of equal length; the value of the output sequence at each time point is determined by the input data at the current, previous and next time points of the input sequence. The input of the motion detection model is a feature sequence extracted from consecutive frames of the video using a convolutional neural network. At its top layer, the model predicts for each output moment the probability that it is the starting point or end point of the action. In the training stage, the starting points and end points are all known, so the model can be trained in a supervised manner. Fig. 4 is a schematic structural diagram of a temporal convolutional network provided by the present invention. As shown in Fig. 4, the temporal convolutional network may include the multiple layers shown in the figure.
By adopting the above technical solution, the video set generation method of this embodiment can train an efficient motion detection model, so that a video set can subsequently be generated based on the motion detection model and the pre-established knowledge graph, thereby effectively guaranteeing both the accuracy of each video clip in the video set and the precision of the generated video set.
The video set generation process of the present application is described below by taking as an example a user request to generate a video set of a well-known actor Q performing fighting actions.
Specifically, the video set generating apparatus receives the user's video set generation request carrying the well-known actor Q and the fighting-action information. The apparatus then obtains, from the correspondence in the pre-established knowledge graph between the well-known actor Q and the titles of the film and television dramas the actor has appeared in, all such titles, and may further fetch from the video library the videos of all the film and television dramas the well-known actor Q has appeared in, obtaining, for example, multiple videos in total.
In this embodiment, a motion detection model for identifying the starting point and end point of a fighting action in a video may be trained in advance. Next, for each video obtained, the image of each frame is extracted in chronological order to obtain a group of image sequences, and the images in each image sequence are input in order into the pre-trained motion detection model, which predicts, for each image, the probability that it is the starting point of a fighting action and the probability that it is the end point of a fighting action. Among the prediction results for the images in an image sequence, the moment corresponding to an image whose starting-point probability exceeds a preset probability threshold such as 0.5 is taken as the starting point of a fighting action; correspondingly, the moment corresponding to the nearest subsequent image in the sequence whose end-point probability exceeds the preset probability threshold such as 0.5 is taken as the end point of that fighting action. Since the starting point and end point of a fighting action occur in pairs, the corresponding video clip can be clipped from the video based on the starting point and end point of the fighting action. If a video contains several fighting-action segments, every video clip containing a fighting action in each video can be clipped in the above manner, finally obtaining multiple video clips of the well-known actor Q that contain fighting actions.
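The thresholding-and-pairing step just described can be sketched as follows. The function name `clip_boundaries`, the frame rate and the example probabilities are assumptions for this sketch; the logic is the one described above: a frame whose start probability exceeds the threshold is paired with the nearest subsequent frame whose end probability exceeds it.

```python
def clip_boundaries(start_probs, end_probs, fps=25.0, thresh=0.5):
    """Pair each above-threshold starting frame with the nearest subsequent
    above-threshold ending frame, yielding (start_time, end_time) pairs in
    seconds. The probabilities would come from the motion detection model."""
    pairs, t = [], 0
    n = len(start_probs)
    while t < n:
        if start_probs[t] > thresh:
            # scan forward for the nearest adjacent end point
            e = next((j for j in range(t + 1, n) if end_probs[j] > thresh), None)
            if e is None:
                break                  # unpaired starting point: discard it
            pairs.append((t / fps, e / fps))
            t = e + 1                  # continue after this segment
        else:
            t += 1
    return pairs

start_p = [0.1, 0.9, 0.2, 0.1, 0.8, 0.1]
end_p   = [0.0, 0.1, 0.1, 0.7, 0.0, 0.9]
print(clip_boundaries(start_p, end_p, fps=1.0))  # [(1.0, 3.0), (4.0, 5.0)]
```

Each returned pair gives the clipping boundaries of one fighting-action video clip, so a video with several such segments yields several pairs, as in the scenario above.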
However, to prevent the multiple fighting-action video clips from including clips that do not contain the well-known actor Q, after the multiple video clips are obtained, face detection may also be performed on them using a pre-established face template of the well-known actor Q. Video clips not containing the well-known actor Q are deleted, and only those containing the actor are retained; the fighting actions in the retained clips are considered to be performed by the well-known actor Q. Finally, the multiple retained video clips are spliced together according to the preset splicing rule to generate the video set of the well-known actor Q performing fighting actions.
The above scenario is only one application scenario of this embodiment. In practice, this embodiment may also generate, in other scenarios, video sets of specified actions related to other specified entities, which are not enumerated here one by one.
Fig. 5 is a structural diagram of a first embodiment of the video set generating apparatus of the present invention. As shown in Fig. 5, the video set generating apparatus of this embodiment may specifically include:
an obtaining module 10, configured to obtain multiple videos related to a specified entity based on a pre-established knowledge graph;
a clipping module 11, configured to clip, according to a pre-trained motion detection model, multiple video clips containing a specified action from the multiple videos obtained by the obtaining module 10; and
a splicing module 12, configured to splice together the multiple video clips containing the specified action clipped by the clipping module 11, to obtain a video set.
Still further optionally, the obtaining module 10 is specifically configured to:
obtain, according to the relationships between entities in the knowledge graph, multiple pieces of video information corresponding to the specified entity; and
obtain the video corresponding to each piece of video information from a video library, to obtain the multiple videos.
By using the above modules, the video set generating apparatus of this embodiment realizes the generation of a video set with the same realization principle and technical effect as the related method embodiments above; for details, refer to the description of the related method embodiments, which is not repeated here.
Fig. 6 is a structural diagram of a second embodiment of the video set generating apparatus of the present invention. As shown in Fig. 6, in the video set generating apparatus of this embodiment, on the basis of the technical solution of the embodiment shown in Fig. 5, the clipping module 11 may further include:
an extracting unit 111, configured to extract, for each video obtained by the obtaining module 10, the image of each frame in chronological order, to obtain a group of image sequences;
a predicting unit 112, configured to predict the starting point and end point of the specified action in the corresponding video according to each group of image sequences extracted by the extracting unit 111 and the motion detection model; and
a clipping unit 113, configured to clip, according to the starting point and end point of each specified action predicted by the predicting unit 112, the corresponding video clip containing the specified action from the corresponding video, to obtain the multiple video clips.
Still further optionally, the predicting unit 112 is specifically configured to:
input each image in each group of image sequences into the motion detection model, and predict, by the motion detection model, the probability that the corresponding image is the starting point of the specified action and the probability that the corresponding image is the end point of the specified action; and
take the moment corresponding to an image in each group of image sequences whose starting-point probability is greater than a preset probability threshold as the starting point of the specified action in the corresponding video, and the moment corresponding to an image whose end-point probability is greater than the preset probability threshold as the end point of the specified action in the corresponding video.
Still further optionally, as shown in Fig. 6, in the video set generating apparatus of this embodiment the clipping module 11 further includes a detection unit 114, configured to:
if the specified entity is specified-person information, perform face detection on the images in each group of image sequences extracted by the extracting unit 111 based on the specified-person information, and delete the images in each group of image sequences that do not contain the specified-person information; and
if the specified entity is specified-animal information, perform feature detection on the images in each group of image sequences extracted by the extracting unit 111 based on the specified-animal information, and delete the images in each group of image sequences that do not contain the specified-animal information.
At this point, correspondingly, the predicting unit 112 is configured to predict the starting point and end point of the specified action in the corresponding video according to each group of image sequences processed by the detection unit 114 and the motion detection model.
By using the above modules, the video set generating apparatus of this embodiment realizes the generation of a video set with the same realization principle and technical effect as the related method embodiments above; for details, refer to the description of the related method embodiments, which is not repeated here.
Fig. 7 is a structural diagram of a third embodiment of the video set generating apparatus of the present invention. As shown in Fig. 7, on the basis of the technical solution of the embodiment shown in Fig. 5, the video set generating apparatus of this embodiment may further include the following technical solution.
As shown in Fig. 7, the video set generating apparatus of this embodiment further includes a detection module 13, configured to:
if the specified entity is specified-person information, perform face detection on the multiple video clips obtained by the clipping module 11 based on the specified-person information, and delete the video clips among the multiple video clips that do not contain the specified-person information; and
if the specified entity is specified-animal information, perform feature detection on the multiple video clips obtained by the clipping module 11 based on the specified-animal information, and delete the video clips among the multiple video clips that do not contain the specified-animal information.
Correspondingly, the splicing module 12 is configured to splice together the multiple video clips containing the specified action that remain after processing by the detection module 13, to obtain the video set.
By using the above modules, the video set generating apparatus of this embodiment realizes the generation of a video set with the same realization principle and technical effect as the related method embodiments above; for details, refer to the description of the related method embodiments, which is not repeated here.
Fig. 8 is a structural diagram of a fourth embodiment of the video set generating apparatus of the present invention. As shown in Fig. 8, the video set generating apparatus of this embodiment may specifically include:
a collection module 14, configured to collect several training video segments containing the specified action, and annotate the true starting point and true end point of the specified action in each training video segment; and
a training module 15, configured to train the motion detection model according to the several training video segments collected by the collection module 14 and the true starting point and true end point of the specified action annotated in each training video segment.
Still further optionally, the training module 15 is specifically configured to:
extract, for each training video segment, the image of each frame in chronological order, to obtain a group of training image sequences;
predict, according to each group of training image sequences and the motion detection model, the predicted starting point and predicted end point of the specified action in the corresponding training video segment;
calculate a mean-squared-error loss function according to the true starting point and predicted starting point, and the true end point and predicted end point, of the specified action in each training video segment;
if the mean-squared-error loss function has not converged, update the parameters of the motion detection model by gradient descent; and
repeat the above training of the motion detection model with each training video segment until the mean-squared-error loss function tends to converge, whereupon the parameters of the motion detection model are determined and thus the motion detection model is determined.
The video set generating apparatus of this embodiment may exist independently, or may be combined with the apparatuses of Figs. 5, 6 and 7 above to form an alternative embodiment of the present invention.
By using the above modules, the video set generating apparatus of this embodiment realizes the generation of a video set with the same realization principle and technical effect as the related method embodiments above; for details, refer to the description of the related method embodiments, which is not repeated here.
Fig. 9 is a structural diagram of a computer device embodiment of the present invention. As shown in Fig. 9, the computer device of this embodiment includes one or more processors 30 and a memory 40, the memory 40 being used for storing one or more programs. When the one or more programs stored in the memory 40 are executed by the one or more processors 30, the one or more processors 30 are caused to implement the video set generation method of the embodiments shown in Figs. 1-3 above. The embodiment shown in Fig. 9 takes multiple processors 30 as an example.
For example, Fig. 10 is an exemplary diagram of a computer device provided by the present invention. Fig. 10 shows a block diagram of an exemplary computer device 12a suitable for implementing embodiments of the present invention. The computer device 12a shown in Fig. 10 is only an example and should not impose any restriction on the functions and scope of use of the embodiments of the present invention.
As shown in Fig. 10, the computer device 12a takes the form of a general-purpose computing device. The components of the computer device 12a may include, but are not limited to: one or more processors 16a, a system memory 28a, and a bus 18a connecting the different system components (including the system memory 28a and the processors 16a).
The bus 18a represents one or more of several classes of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor bus or local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus and the Peripheral Component Interconnect (PCI) bus.
The computer device 12a typically comprises a variety of computer-system-readable media. These media may be any usable media that can be accessed by the computer device 12a, including volatile and non-volatile media and removable and non-removable media.
The system memory 28a may include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 30a and/or a cache memory 32a. The computer device 12a may further include other removable/non-removable, volatile/non-volatile computer-system storage media. By way of example only, the storage system 34a may be used for reading and writing non-removable, non-volatile magnetic media (not shown in Fig. 10, commonly referred to as a "hard disk drive"). Although not shown in Fig. 10, a magnetic disk drive for reading and writing a removable non-volatile magnetic disk (such as a "floppy disk"), and an optical disk drive for reading and writing a removable non-volatile optical disk (such as a CD-ROM, a DVD-ROM or other optical media), may be provided. In these cases, each drive may be connected to the bus 18a through one or more data media interfaces. The system memory 28a may include at least one program product having a group of (e.g., at least one) program modules configured to perform the functions of the embodiments of Figs. 1-8 of the present invention described above.
A program/utility 40a having a group of (at least one) program modules 42a may be stored, for example, in the system memory 28a. Such program modules 42a include, but are not limited to, an operating system, one or more application programs, other program modules and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. The program modules 42a generally perform the functions and/or methods of the embodiments of Figs. 1-8 described in the present invention.
The computer device 12a may also communicate with one or more external devices 14a (such as a keyboard, a pointing device, a display 24a, etc.), with one or more devices that enable a user to interact with the computer device 12a, and/or with any device (such as a network card, a modem, etc.) that enables the computer device 12a to communicate with one or more other computing devices. Such communication may occur through input/output (I/O) interfaces 22a. Moreover, the computer device 12a may communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network, such as the Internet) through a network adapter 20a. As shown in the figure, the network adapter 20a communicates with the other modules of the computer device 12a through the bus 18a. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the computer device 12a, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, data backup storage systems, etc.
By running the programs stored in the system memory 28a, the processors 16a execute various function applications and data processing, for example implementing the video set generation shown in the above embodiments.
The present invention also provides a computer-readable medium on which a computer program is stored; when executed by a processor, the program implements the video set generation shown in the above embodiments.
The computer-readable medium of this embodiment may include the RAM 30a, and/or the cache memory 32a, and/or the storage system 34a in the system memory 28a of the embodiment shown in Fig. 10 above.
With the development of technology, the propagation channels of computer programs are no longer limited to tangible media: a program may also be downloaded directly from a network or obtained in other ways. Therefore, the computer-readable medium in this embodiment may include not only tangible media but also intangible media.
The computer-readable medium of this embodiment may employ any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take various forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, capable of sending, propagating or transmitting a program for use by or in connection with an instruction execution system, apparatus or device.
The program code contained on a computer-readable medium may be transmitted by any suitable medium, including, but not limited to, wireless, wireline, optical cable, RF, etc., or any suitable combination thereof.
Computer program code for carrying out the operations of the present invention may be written in one or more programming languages or a combination thereof; said programming languages include object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
In the several embodiments provided by the present invention, it should be understood that the disclosed systems, apparatuses and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely exemplary; for instance, the division into units is only a division by logical function, and there may be other division manners in actual implementation.
The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of an embodiment.
In addition, the functional units in the various embodiments of the present invention may be integrated in one processing unit, or each unit may physically exist alone, or two or more units may be integrated in one unit. The above integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
The above integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. Such a software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute part of the steps of the methods of the embodiments of the present invention. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
The foregoing describes merely preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.
Claims (18)
1. A method for generating a video set, characterized in that the method comprises:
obtaining, based on a pre-established knowledge graph, multiple videos related to a specified entity;
clipping, according to a pre-trained motion detection model, multiple video clips containing a specified action from the multiple videos; and
splicing the multiple video clips containing the specified action together to obtain the video set.
2. The method according to claim 1, characterized in that clipping, according to the pre-trained motion detection model, the multiple video clips containing the specified action from the multiple videos comprises:
extracting, for each video, the image of each frame in chronological order, to obtain a group of image sequences;
predicting, according to each group of the image sequences and the motion detection model, the starting point and end point of the specified action in the corresponding video; and
clipping, according to the starting point and end point of each specified action, the corresponding video clip containing the specified action from the corresponding video, to obtain the multiple video clips.
3. The method according to claim 2, characterized in that predicting, according to each group of the image sequences and the motion detection model, the starting point and end point of the specified action in the corresponding video comprises:
inputting each image in each group of the image sequences into the motion detection model, and predicting, by the motion detection model, the probability that the corresponding image is the starting point of the specified action and the probability that the corresponding image is the end point of the specified action; and
taking the moment corresponding to an image in each group of the image sequences whose starting-point probability is greater than a preset probability threshold as the starting point of the specified action in the corresponding video, and the moment corresponding to an image whose end-point probability is greater than the preset probability threshold as the end point of the specified action in the corresponding video.
4. The method according to claim 2, characterized in that, after extracting the image of each frame from each video in chronological order to obtain a group of image sequences, and before obtaining the starting point and ending point of the required movement in each group of image sequences according to each group of image sequences and the motion detection model, the method further comprises:
if the designated entities are designated person information, performing face detection on the images in each group of image sequences based on the designated person information, and deleting the images in each group of image sequences that do not contain the designated person information; and
if the designated entities are specified animal information, performing feature detection on the images in each group of image sequences based on the specified animal information, and deleting the images in each group of image sequences that do not contain the specified animal information.
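The filtering step of claim 4 can be sketched as below, reading it as retaining only the frames in which a detector finds the designated entity; the `contains_entity` predicate is a hypothetical placeholder for the face or feature detector, whose concrete implementation the claim leaves open.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")  # a frame, in whatever representation the detector accepts


def filter_sequence(frames: List[T], contains_entity: Callable[[T], bool]) -> List[T]:
    """Drop every frame for which the (assumed) entity detector returns False,
    keeping the chronological order of the surviving frames."""
    return [frame for frame in frames if contains_entity(frame)]
```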
5. The method according to claim 1, characterized in that, after clipping the multiple video clips containing the required movement from the multiple videos according to the pre-trained motion detection model, and before splicing together the multiple video clips containing the required movement to obtain the video set, the method further comprises:
if the designated entities are designated person information, performing face detection on the multiple video clips based on the designated person information, and deleting the video clips among the multiple video clips that do not contain the designated person information; and
if the designated entities are specified animal information, performing feature detection on the multiple video clips based on the specified animal information, and deleting the video clips among the multiple video clips that do not contain the specified animal information.
6. The method according to claim 1, characterized in that, before clipping the multiple video clips containing the required movement from the multiple videos according to the pre-trained motion detection model, the method further comprises:
collecting several training video segments containing the required movement, and marking the true starting point and true ending point of the required movement in each training video segment; and
training the motion detection model according to the several training video segments and the true starting point and true ending point of the required movement marked in each training video segment.
7. The method according to claim 6, characterized in that training the motion detection model according to the several training video segments and the true starting point and true ending point of the required movement marked in each training video segment comprises:
extracting the image of each frame from each training video segment in chronological order to obtain a group of training image sequences;
predicting, according to each group of training image sequences and the motion detection model, the predicted starting point and predicted ending point of the required movement in the corresponding training video segment;
calculating a mean-square-error loss function according to the true starting point and predicted starting point, and the true ending point and predicted ending point, of the required movement in each training video segment;
if the mean-square-error loss function has not converged, updating the parameters of the motion detection model by gradient descent; and
repeating the above training of the motion detection model with each training video segment until the mean-square-error loss function converges, thereby determining the parameters of the motion detection model and hence the motion detection model itself.
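The training loop of claim 7 (mean-square-error loss, gradient-descent updates, repeat until the loss converges) can be illustrated with a deliberately tiny stand-in model whose only two parameters are the predicted starting and ending points themselves; a real motion detection network has many parameters, but the skeleton below is the same.

```python
from typing import List, Tuple


def train_offsets(
    segments: List[Tuple[float, float]],  # (true_start, true_end) labels per training segment
    lr: float = 0.1,                      # gradient-descent learning rate (assumed value)
    tol: float = 1e-6,                    # convergence tolerance on the loss change
    max_iter: int = 10000,
) -> Tuple[float, float]:
    """Fit scalar 'model parameters' (s, e) by gradient descent on the
    mean-squared error against the labelled start/end points."""
    s, e = 0.0, 0.0
    prev_loss = float("inf")
    n = len(segments)
    for _ in range(max_iter):
        # mean-square-error loss over all labelled segments
        loss = sum((s - ts) ** 2 + (e - te) ** 2 for ts, te in segments) / n
        if abs(prev_loss - loss) < tol:  # loss has converged; stop training
            break
        prev_loss = loss
        # analytic gradients of the loss w.r.t. each parameter
        grad_s = sum(2 * (s - ts) for ts, _ in segments) / n
        grad_e = sum(2 * (e - te) for _, te in segments) / n
        s -= lr * grad_s
        e -= lr * grad_e
    return s, e
```

With labels (2, 5) and (4, 7), the minimiser of the squared error is the per-coordinate mean, so the loop should settle near (3, 6).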
8. The method according to claim 1, characterized in that obtaining the multiple videos relevant to the designated entities based on the pre-established knowledge mapping comprises:
obtaining, according to the relationships between entities in the knowledge mapping, multiple pieces of video information corresponding to the designated entities; and
obtaining the video corresponding to each piece of video information from a video library, so as to obtain the multiple videos.
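A rough sketch of claim 8's lookup, assuming the knowledge mapping is a dict from an entity to its related entities and the video library is indexed by entity; both data structures are illustrative assumptions of this sketch.

```python
from typing import Dict, List


def related_videos(
    graph: Dict[str, List[str]],        # entity -> related entities (knowledge mapping)
    entity: str,                        # the designated entity
    video_index: Dict[str, List[str]],  # entity -> video identifiers in the library
) -> List[str]:
    """Collect the entity itself plus its graph neighbours, then gather each
    one's videos from the library index, preserving graph order."""
    entities = [entity] + list(graph.get(entity, []))
    videos: List[str] = []
    for e in entities:
        videos.extend(video_index.get(e, []))
    return videos
```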
9. A device for generating a video set, characterized in that the device comprises:
an obtaining module, configured to obtain multiple videos relevant to designated entities based on a pre-established knowledge mapping;
a clipping module, configured to clip, from the multiple videos, multiple video clips containing a required movement according to a pre-trained motion detection model; and
a splicing module, configured to splice together the multiple video clips containing the required movement to obtain the video set.
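The splicing module's job reduces, in the simplest reading, to concatenating the clipped segments in order to form the video set; the frame-list representation below is an assumption of this sketch, standing in for whatever container format a real implementation would write.

```python
from typing import List, TypeVar

T = TypeVar("T")  # one frame of a clip


def splice(clips: List[List[T]]) -> List[T]:
    """Concatenate the clipped frame sequences, in order, into one video set."""
    video_set: List[T] = []
    for clip in clips:
        video_set.extend(clip)
    return video_set
```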
10. The device according to claim 9, characterized in that the clipping module comprises:
an extracting unit, configured to extract the image of each frame from each video in chronological order to obtain a group of image sequences;
a predicting unit, configured to predict, according to each group of image sequences and the motion detection model, the starting point and ending point of the required movement in the corresponding video; and
a clipping unit, configured to clip, according to the starting point and ending point of each required movement, the corresponding video clip containing the required movement from the corresponding video, so as to obtain the multiple video clips.
11. The device according to claim 10, characterized in that the predicting unit is configured to:
input each image in each group of image sequences into the motion detection model, and predict, by the motion detection model, the probability that the image is the starting point of the required movement and the probability that the image is the ending point of the required movement; and
take the moment corresponding to the image whose starting-point probability exceeds a preset probability threshold as the starting point of the required movement in the corresponding video, and the moment corresponding to the image whose ending-point probability exceeds the preset probability threshold as the ending point of the required movement in the corresponding video.
12. The device according to claim 10, characterized in that the clipping module further comprises a detecting unit, configured to:
if the designated entities are designated person information, perform face detection on the images in each group of image sequences based on the designated person information, and delete the images in each group of image sequences that do not contain the designated person information; and
if the designated entities are specified animal information, perform feature detection on the images in each group of image sequences based on the specified animal information, and delete the images in each group of image sequences that do not contain the specified animal information.
13. The device according to claim 9, characterized in that the device further comprises a detecting module, configured to:
if the designated entities are designated person information, perform face detection on the multiple video clips based on the designated person information, and delete the video clips among the multiple video clips that do not contain the designated person information; and
if the designated entities are specified animal information, perform feature detection on the multiple video clips based on the specified animal information, and delete the video clips among the multiple video clips that do not contain the specified animal information.
14. The device according to claim 9, characterized in that the device further comprises:
a collecting module, configured to collect several training video segments containing the required movement, and mark the true starting point and true ending point of the required movement in each training video segment; and
a training module, configured to train the motion detection model according to the several training video segments and the true starting point and true ending point of the required movement marked in each training video segment.
15. The device according to claim 14, characterized in that the training module is configured to:
extract the image of each frame from each training video segment in chronological order to obtain a group of training image sequences;
predict, according to each group of training image sequences and the motion detection model, the predicted starting point and predicted ending point of the required movement in the corresponding training video segment;
calculate a mean-square-error loss function according to the true starting point and predicted starting point, and the true ending point and predicted ending point, of the required movement in each training video segment;
if the mean-square-error loss function has not converged, update the parameters of the motion detection model by gradient descent; and
repeat the above training of the motion detection model with each training video segment until the mean-square-error loss function converges, thereby determining the parameters of the motion detection model and hence the motion detection model itself.
16. The device according to claim 9, characterized in that the obtaining module is configured to:
obtain, according to the relationships between entities in the knowledge mapping, multiple pieces of video information corresponding to the designated entities; and
obtain the video corresponding to each piece of video information from a video library, so as to obtain the multiple videos.
17. A computer device, characterized in that the device comprises:
one or more processors; and
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-8.
18. A computer-readable medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910355708.1A CN110166650B (en) | 2019-04-29 | 2019-04-29 | Video set generation method and device, computer equipment and readable medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110166650A true CN110166650A (en) | 2019-08-23 |
CN110166650B CN110166650B (en) | 2022-08-23 |
Family
ID=67633207
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910355708.1A Active CN110166650B (en) | 2019-04-29 | 2019-04-29 | Video set generation method and device, computer equipment and readable medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110166650B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107066621A (en) * | 2017-05-11 | 2017-08-18 | 腾讯科技(深圳)有限公司 | A kind of search method of similar video, device and storage medium |
WO2018194814A1 (en) * | 2017-04-18 | 2018-10-25 | Amazon Technologies, Inc. | Object analysis in live video content |
CN108769733A (en) * | 2018-06-22 | 2018-11-06 | 三星电子(中国)研发中心 | Video clipping method and video clipping device |
US20180367972A1 (en) * | 2014-01-23 | 2018-12-20 | Brian M. Dugan | Methods and apparatus for news delivery |
CN109104642A (en) * | 2018-09-26 | 2018-12-28 | 北京搜狗科技发展有限公司 | A kind of video generation method and device |
CN109598229A (en) * | 2018-11-30 | 2019-04-09 | 李刚毅 | Monitoring system and its method based on action recognition |
CN109635157A (en) * | 2018-10-30 | 2019-04-16 | 北京奇艺世纪科技有限公司 | Model generating method, video searching method, device, terminal and storage medium |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110650368B (en) * | 2019-09-25 | 2022-04-26 | 新东方教育科技集团有限公司 | Video processing method and device and electronic equipment |
CN110650368A (en) * | 2019-09-25 | 2020-01-03 | 新东方教育科技集团有限公司 | Video processing method and device and electronic equipment |
CN113795882A (en) * | 2019-09-27 | 2021-12-14 | 华为技术有限公司 | Emotion-based multimedia content summarization |
CN111246125A (en) * | 2020-01-17 | 2020-06-05 | 广州盈可视电子科技有限公司 | Multi-channel video stream synthesis method and device |
CN111447507A (en) * | 2020-03-20 | 2020-07-24 | 北京百度网讯科技有限公司 | Video production method and device, electronic equipment and storage medium |
CN111447507B (en) * | 2020-03-20 | 2022-03-22 | 北京百度网讯科技有限公司 | Video production method and device, electronic equipment and storage medium |
CN111242110B (en) * | 2020-04-28 | 2020-08-14 | 成都索贝数码科技股份有限公司 | Training method of self-adaptive conditional random field algorithm for automatically breaking news items |
CN111242110A (en) * | 2020-04-28 | 2020-06-05 | 成都索贝数码科技股份有限公司 | Training method of self-adaptive conditional random field algorithm for automatically breaking news items |
CN111784081A (en) * | 2020-07-30 | 2020-10-16 | 南昌航空大学 | Social network link prediction method adopting knowledge graph embedding and time convolution network |
CN111784081B (en) * | 2020-07-30 | 2022-03-01 | 南昌航空大学 | Social network link prediction method adopting knowledge graph embedding and time convolution network |
CN112182289A (en) * | 2020-10-10 | 2021-01-05 | 武汉中科通达高新技术股份有限公司 | Data deduplication method and device based on Flink framework |
CN112182289B (en) * | 2020-10-10 | 2023-04-28 | 武汉中科通达高新技术股份有限公司 | Data deduplication method and device based on Flink frame |
CN112559758A (en) * | 2020-11-30 | 2021-03-26 | 北京百度网讯科技有限公司 | Method, device and equipment for constructing knowledge graph and computer readable storage medium |
CN112801861A (en) * | 2021-01-29 | 2021-05-14 | 恒安嘉新(北京)科技股份公司 | Method, device and equipment for manufacturing film and television works and storage medium |
CN117880588A (en) * | 2023-11-27 | 2024-04-12 | 无锡伙伴智能科技有限公司 | Video editing method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110166650B (en) | 2022-08-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110166650A (en) | Generation method and device, the computer equipment and readable medium of video set | |
JP7123122B2 (en) | Navigating Video Scenes Using Cognitive Insights | |
US11514247B2 (en) | Method, apparatus, computer device and readable medium for knowledge hierarchical extraction of a text | |
CN113709561B (en) | Video editing method, device, equipment and storage medium | |
WO2023197979A1 (en) | Data processing method and apparatus, and computer device and storage medium | |
CN111209440A (en) | Video playing method, device and storage medium | |
CN110517689A (en) | A kind of voice data processing method, device and storage medium | |
JP6902108B2 (en) | Story video production method and story video production system | |
CN110377905A (en) | Semantic expressiveness processing method and processing device, computer equipment and the readable medium of sentence | |
CN109783624A (en) | Answer generation method, device and the intelligent conversational system in knowledge based library | |
CN112015896B (en) | Emotion classification method and device based on artificial intelligence | |
CN110234018A (en) | Multimedia content description generation method, training method, device, equipment and medium | |
CN111800650B (en) | Video dubbing method and device, electronic equipment and computer readable medium | |
CN110516749A (en) | Model training method, method for processing video frequency, device, medium and calculating equipment | |
CN110047121A (en) | Animation producing method, device and electronic equipment end to end | |
CN112818212B (en) | Corpus data acquisition method, corpus data acquisition device, computer equipment and storage medium | |
CN113515669A (en) | Data processing method based on artificial intelligence and related equipment | |
CN108304376A (en) | Determination method, apparatus, storage medium and the electronic device of text vector | |
CN112287168A (en) | Method and apparatus for generating video | |
CN107948730A (en) | Method, apparatus, equipment and storage medium based on picture generation video | |
CN113570689A (en) | Portrait cartoon method, apparatus, medium and computing device | |
WO2022156468A1 (en) | Method and apparatus for processing model data, electronic device, and computer-readable medium | |
CN114661951A (en) | Video processing method and device, computer equipment and storage medium | |
CN113516972B (en) | Speech recognition method, device, computer equipment and storage medium | |
CN112905811A (en) | Teaching audio and video pushing method and system based on student classroom behavior analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||