CN110147722A

CN110147722A - A kind of method for processing video frequency, video process apparatus and terminal device

Info

Publication number: CN110147722A
Application number: CN201910288788.3A
Authority: CN
Inventors: 孟桂国
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2019-04-11
Filing date: 2019-04-11
Publication date: 2019-08-20

Abstract

The present invention is suitable for technical field of image processing, provides a kind of method for processing video frequency, video process apparatus and terminal device, the method for processing video frequency, comprising: obtain video to be processed；Determine the scene type of each video frame in the video to be processed；According to the scene type of each video frame, target identification is carried out to each video frame in the video to be processed by the first deep learning model after training, obtain target identification result, wherein, the target identification result indicates that the target object is corresponding with the scene type whether comprising target object in each video frame；According to the target identification as a result, multiple target frames of the identification video to be processed, wherein include the target object in the target frame；According to chronological order of each target frame in the video to be processed, successively multiple target frames are combined, obtain the target video.

Description

A kind of method for processing video frequency, video process apparatus and terminal device

Technical field

The invention belongs to technical field of image processing more particularly to a kind of method for processing video frequency, video process apparatus and end End equipment.

Background technique

Currently, video image technology is extensive in multiple fields (such as video monitoring, mobile terminal and social platform) Using, and a large amount of video file is produced in application process.These video files usually contain a large amount of redundancy or not Part and parcel, and the duration of these video files is often very long, causing user to need to take a long time could be from described Required key message is inquired in video file.For example, user need from when a length of 24 hours monitor video in inquire To some individual in the video section for executing specified activities, and currently, the mode that user can take is carried out to video file Fast-forward play, and some the individual time corresponding to the video section for executing specified activities that can be will be seen that is marked, To search required video section from the video file according to the corresponding time in subsequent use.But due to video file In a large amount of redundancy or unessential part presence, the mode of the key message in inquiry video in this way takes a long time, use Family is difficult to efficiently extract some key messages in video.

Summary of the invention

In view of this, the embodiment of the invention provides a kind of method for processing video frequency, video process apparatus and terminal device, it can To identify and extract the key message in video (such as identify and extract the video section comprising target individual), to improve User obtains the efficiency of the key message in video.

The first aspect of the embodiment of the present invention provides a kind of method for processing video frequency, comprising:

Obtain video to be processed；

Determine the scene type of each video frame in the video to be processed；

According to the scene type of each video frame, by the first deep learning model after training to the video to be processed In each video frame carry out target identification, obtain target identification result, wherein the target identification result indicates each video It whether include target object in frame, the target object is corresponding with the scene type；

According to the target identification as a result, multiple target frames of the identification video to be processed, wherein in the target frame Include the target object；

According to chronological order of each target frame in the video to be processed, successively by multiple target frames into Row combination, obtains the target video.

The second aspect of the embodiment of the present invention provides a kind of video process apparatus, comprising:

Module is obtained, for obtaining video to be processed；

Determining module, for determining the scene type of each video frame in the video to be processed；

First identification module passes through the first deep learning mould after training for the scene type according to each video frame Type carries out target identification to each video frame in the video to be processed, obtains target identification result, wherein the target is known Other result indicates that the target object is corresponding with the scene type whether comprising target object in each video frame；

Second identification module, for according to the target identification as a result, multiple target frames of the identification video to be processed, It wherein, include the target object in the target frame；

Processing module successively will be more for the chronological order according to each target frame in the video to be processed A target frame is combined, and obtains the target video.

The third aspect of the embodiment of the present invention provides a kind of terminal device, including memory, processor and is stored in In the memory and the computer program that can run on the processor, when the processor executes the computer program The step of realizing method as described above.

The fourth aspect of the embodiment of the present invention provides a kind of computer readable storage medium, the computer-readable storage Media storage has the step of computer program, the computer program realizes method as described above when being executed by processor.

Existing beneficial effect is the embodiment of the present invention compared with prior art: in the embodiment of the present invention, obtaining to be processed Video；Determine the scene type of each video frame in the video to be processed；According to the scene type of each video frame, pass through The first deep learning model after training carries out target identification to each video frame in the video to be processed, obtains target and knows Other result, wherein whether the target identification result indicates in each video frame to include target object, the target object and institute It is corresponding to state scene type；According to the target identification as a result, multiple target frames of the identification video to be processed, wherein institute It states in target frame comprising the target object；According to chronological order of each target frame in the video to be processed, according to It is secondary to be combined multiple target frames, obtain the target video.The embodiment of the present invention is by determining each video frame Scene type, and determined in each video frame by the deep learning model after training with the presence or absence of corresponding to the scene type Target object can carry out targeted target identification according to different scene types, so that target identification result more has needle To property, interference is less；According to the target identification as a result, identifying multiple target frames of the video to be processed, and according to described Target frame obtains target video, and the key message that can accurately and efficiently extract in the video to be processed (is such as identified and mentioned Take out the image section comprising target individual), and the target video being made of key message is obtained, and delete video to be processed Middle redundancy, unessential video content, remains valuable video content, thus such as believing the key of monitor video In the application scenarios such as breath lookup, user can carry out more efficient, more targeted processing according to the target video, be not necessarily to people The operations such as work searches key message, editing synthesizes again.The embodiment of the present invention substantially increases user and obtains in video The efficiency of key message, practicability and ease for use are stronger.

Detailed description of the invention

It to describe the technical solutions in the embodiments of the present invention more clearly, below will be to embodiment or description of the prior art Needed in attached drawing be briefly described, it should be apparent that, the accompanying drawings in the following description is only of the invention some Embodiment for those of ordinary skill in the art without creative efforts, can also be attached according to these Figure obtains other attached drawings.

Fig. 1 is the implementation process schematic diagram for the method for processing video frequency that the embodiment of the present invention one provides；

Fig. 2 is the implementation process schematic diagram of method for processing video frequency provided by Embodiment 2 of the present invention；

Fig. 3 is the schematic diagram for the video process apparatus that the embodiment of the present invention three provides；

Fig. 4 is the schematic diagram for the terminal device that the embodiment of the present invention four provides.

Specific embodiment

In being described below, for illustration and not for limitation, the tool of such as particular system structure, technology etc is proposed Body details, to understand thoroughly the embodiment of the present invention.However, it will be clear to one skilled in the art that there is no these specific The present invention also may be implemented in the other embodiments of details.In other situations, it omits to well-known system, device, electricity The detailed description of road and method, in case unnecessary details interferes description of the invention.

It should be appreciated that ought use in this specification and in the appended claims, term " includes " instruction is described special Sign, entirety, step, operation, the presence of element and/or component, but be not precluded one or more of the other feature, entirety, step, Operation, the presence or addition of element, component and/or its set.

It is also understood that mesh of the term used in this description of the invention merely for the sake of description specific embodiment And be not intended to limit the present invention.As description of the invention and it is used in the attached claims, unless on Other situations are hereafter clearly indicated, otherwise " one " of singular, "one" and "the" are intended to include plural form.

It will be further appreciated that the term "and/or" used in description of the invention and the appended claims is Refer to any combination and all possible combinations of one or more of associated item listed, and including these combinations.

As used in this specification and in the appended claims, term " if " can be according to context quilt Be construed to " when ... " or " once " or " in response to determination " or " in response to detecting ".Similarly, phrase " if it is determined that " or " if detecting [described condition or event] " can be interpreted to mean according to context " once it is determined that " or " in response to true It is fixed " or " once detecting [described condition or event] " or " in response to detecting [described condition or event] ".

In order to illustrate technical solutions according to the invention, the following is a description of specific embodiments.

Fig. 1 is the implementation process schematic diagram for the method for processing video frequency that the embodiment of the present invention one provides, the video as shown in Figure 1 Processing method may comprise steps of:

Step S101 obtains video to be processed.

The video to be processed can be the file for having carried out audio/video coding, and the coding mode of the video to be processed is It is known, also, the coding mode is coding mode included in training data, wherein the training data is for instructing Practice deep learning model.

Certainly, the video to be processed is also possible to the video not encoded, wherein each frame image includes complete Image pixel information.

Step S102 determines the scene type of each video frame in the video to be processed.

In the embodiment of the present invention, the scene type of the video frame can be determined in several ways.For example, can pass through Predetermined depth learning model determines the scene type of the video frame, it should be noted that the predetermined depth learning model can be with It is identical as the first deep learning model after training described in step S103, it can also be different；The predetermined depth learning model The scene type of each video frame in the video to be processed can be detected and determined one by one, it can also be according to a frame or multiframe The scene type of the obtained video frame is detected to determine the subject scenes classification of the video to be processed (e.g., if more than pre- If the scene type of the video frame of ratio is scene type A, it is determined that the subject scenes classification of the video to be processed is field Scape classification A), and using the subject scenes classification as the scene type of each video frame of the video to be processed.In addition, can also To determine each video frame in the video to be processed according to the title of the video to be processed, theme or default label Scene type.

Step S103, according to the scene type of each video frame, by the first deep learning model after training to described Each video frame in video to be processed carries out target identification, obtains target identification result, wherein the target identification result refers to Show that the target object is corresponding with the scene type whether comprising target object in each video frame.

In the embodiment of the present invention, different scene types can correspond to different target objects.Such as, however, it is determined that the view The scene type of frequency frame is airport scene, then target object corresponding to the scene of airport may include that aircraft, people, other machinery are set Standby (small transport device in such as airport), other flying object objects；If it is determined that the scene type of the video frame is animal Garden scene, then target object corresponding to the scene of zoo may include animal, people etc..The scene type and the object The corresponding relationship of body can be pre-set.

Illustratively, the first deep learning model may include such as ResNet model, R-CNN model, Fast R- CNN model etc., the first deep learning model can there are many, be not limited thereto.

Optionally, may include in the target identification result target object included in the video frame type, The information such as the position of target object in the video frame.

Optionally, by training after the first deep learning model to each video frame in the video to be processed into It further include the trained first deep learning model before row target identification.

Specifically, the step of training the first deep learning model may include:

Training data is obtained, the training data includes corresponding at least one training video and the training video Identification information, the identification information may include in training video accurate information on target object (type of such as target object, Position etc.)；

The training data is detected by the first deep learning model, and obtains testing result；

The parameter that the first deep learning model is adjusted according to the testing result, until described by adjusting after The testing result that one deep learning model obtains meets preset condition, and (value of such as corresponding loss function is less than preset threshold Deng), and using the first deep learning model adjusted as the first deep learning model after training.

Step S104, according to the target identification as a result, multiple target frames of the identification video to be processed, wherein institute It states in target frame comprising the target object.

The target frame can indicate the key message in the video to be processed, and the target frame of the video to be processed Specific identification method can be arranged according to the needs of practical application scene.For example, it may be determined that any one frame includes described The video frame of target object is the target frame.Furthermore, it is possible to according to the information on target object, according to chronological order according to Comprising the otherness between the video frame of the target object, (otherness may include the difference of target object for secondary judgement Property, and can ignoring for the otherness of non-targeted object), and determine according to the otherness target of the video to be processed Frame.Illustratively, the otherness between the video frame comprising the target object can be identified by more traditional method.Example Such as, for each video frame, if the image district where the positioning of target object described in the video frame, the target object Domain size etc. changes relative to a upper video frame or next frame video frame；Alternatively, if target described in a certain video frame The positioning of object, described image region the difference degree relative to a upper video frame or next frame video frame such as size Meet specified preset condition (area of such as offset of positioning more than image-region in default bias amount threshold value, two frame video frames The ratio of the difference area that accounts for one of image-region be more than default fractional threshold etc.), it is determined that the video frame is mesh Frame is marked, until traversing all video frames in the video to be processed.It is of course also possible to be judged respectively by default classifier Whether a video frame is target frame.Illustratively, the classifier may include decision tree, logistic regression, naive Bayesian, mind Through network scheduling algorithm.

Illustrate a kind of implementation of step S104 with a specific example below.

Such as, however, it is determined that the scene type of each frame video frame of the video to be processed is airport scene, the machine Target object corresponding to the scene of field is aircraft, it is assumed that includes 20 frame video frames in the video to be processed, wherein preceding 10 frame regards Frequency frame does not include aircraft, it is determined that preceding 10 frame video is non-targeted frame；And the 11st to the 20th frame includes inspection in the video frame of aircraft It measures the aircraft position that aircraft position of the 15th frame into the 20th frame is respectively relative to its former frame video frame and offset occurs, then Determine that the 14th frame to the 20th frame video frame is the target frame of the video to be processed.

Step S105, according to chronological order of each target frame in the video to be processed, successively by multiple institutes It states target frame to be combined, obtains the target video.

It can only include the target frame in the video to be processed in the embodiment of the present invention, in the target video, therefore, It is considered that deleting redundancy in video to be processed, unessential video content in the target video, remain valuable Video content, to search etc. in application scenarios in the key message to monitor video, user can be according to the mesh Mark video and carry out more efficient, more targeted processing, without manually being searched key message, editing the behaviour such as synthesize again Make.In the target video, the target frame corresponding timing node information in video to be processed can be retained, it can also not Retain.

In addition, it is optional, after obtaining the target video, can also include:

The target video is stored into specified memory space.

In the embodiment of the present invention, the memory space, which can be, to be pre-set, and is also possible to obtain the target After video, the specified memory space is determined according to the instruction operation of user, then the target video is stored to specified In memory space.

Due to usually containing a large amount of redundancy or unessential part in video to be processed, and these parts can occupy greatly The space of amount and resource, even if this resource occupation and wave can not be improved well by modes such as existing compressing files The problem of taking.Therefore, in the prior art, directly the video to be processed is stored, a large amount of storage resource can be wasted.And In the embodiment of the present invention, can store the target video so that storage video needed for space greatly reduce, so as to improve due to Storage resource bears excessive problem caused by video is excessive.

Optionally, if including target object in the video frame of the video to be processed, the target identification result also refers to Show the region where the target object and/or characteristic point position；

Correspondingly, it is described according to the target identification as a result, multiple target frames of the identification video to be processed, comprising:

Region and/or characteristic point position according to where the target object indicated by the target identification result, sentence Whether each video frame broken in the video to be processed meets the first preset condition；

The video frame for meeting first preset condition is determined as to the target frame of the video to be processed.

In the embodiment of the present invention, region and/or characteristic point position where the target object may include the target Top left co-ordinate, bottom right angular coordinate, center point coordinate and the target of image-region area, image-region where object At least one of information such as the characteristic point coordinate of object.The characteristic point can be according to the target object and application scenarios It determines, for example, the characteristic point may include the characteristic point of facial face if the target object is face.Described It is pre-set that one preset condition can be user, also, first preset condition can there are many, for example, it may be if Image-region area where the characteristic point coordinate of target object described in the video frame, the target object etc. is relative to upper One video frame or next frame video frame change；Alternatively, if the characteristic point coordinate of target object described in a certain video frame, Described image region area etc. meets specified preset condition relative to the difference degree of a upper video frame or next frame video frame (as the offset of positioning accounts for one of them more than the difference of the area of image-region in default bias amount threshold value, two frame video frames The ratio of the area of image-region is more than default fractional threshold etc.).

Wherein, optionally, the position where the target object according to indicated by the target identification result, judges institute Whether each video frame stated in video to be processed meets the first preset condition, comprising:

For each video frame in the video to be processed including the target object, according to the target identification result Region and/or characteristic point position where indicated target object, judge in the video frame, the target object relative to The difference in the region and/or characteristic point position where target object in a upper video frame for the video frame or in next video frame Whether off course degree meets the first preset condition；

Correspondingly, the target that the video frame for meeting first preset condition is determined as to the video to be processed Frame, comprising:

If in the video frame, the target object is relative in a upper video frame for the video frame or next video frame In target object where region and/or the difference degree of characteristic point position meet the first preset condition, then by the video frame Target frame as the video to be processed.

Illustratively, in the embodiment of the present invention, the difference journey in region and/or characteristic point position where the target object Degree may include one of characteristic point position degrees of offset, image-region variation degree etc. or a variety of, wherein described image Regional change degree may include the variation degree of the area in described image region, or may include in described image region, The part that changing occurs in pixel value accounts for the ratio size in described image region.At this point, illustrative, first preset condition can The offset of characteristic point coordinate to be such as target object is more than default bias amount threshold value, image-region in two frame video frames The ratio that the difference of area accounts for the area of one of image-region is more than default fractional threshold etc..First preset condition It can be configured, be not limited thereto according to concrete application scene.

Optionally, the scene type of each video frame in the determination video to be processed, comprising:

Scene knowledge is carried out to each video frame in the video to be processed by the second deep learning model after training Not, with the scene type of each video frame in the determination video to be processed；

Alternatively,

According to the title of the video to be processed, theme or default label, determine each in the video to be processed The scene type of video frame.

In the embodiment of the present invention, the second deep learning model can be identical as the first deep learning model, It can be different.Illustratively, if the second deep learning model is identical as the first deep learning model, then described Two deep learning models (the first deep learning model in other words) may include two cascade deep learning submodels, and respectively The scene type of each video frame in the video to be processed is determined by cascade deep learning submodel and to described Each video frame in video to be processed carries out target identification.Assuming that the deep learning submodel is respectively the first deep learning Submodel and the second deep learning submodel, then can by the first deep learning submodel described in the video input to be processed, The first output of the first deep learning submodel is obtained as a result, output result instruction comes in the video to be processed Each video frame scene Recognition, then the first of the first deep learning submodel the output result is inputted into the second depth Submodel is practised, obtains the second output of the second deep learning submodel as a result, simultaneously can be according to the second output result Obtain target identification result.Alternatively, the second deep learning model can also only include a convolutional neural networks model, and The scene type and target identification result of each video frame are obtained by the convolutional neural networks model.

Illustratively, the second deep learning model may include such as ResNet model, R-CNN model, Fast R- CNN model etc., the second deep learning model can there are many, be not limited thereto.

Optionally, after the multiple target frames for identifying the video to be processed, further includes:

The target frame is identified in the video to be processed.

Illustratively, timing node information, field corresponding to the target frame can be identified in the video to be processed One of scape classification information, information on target object (type of such as target object, characteristic point, positioning, image-region) information Or it is a variety of.The mode of the mark can there are many, for example, the target can be identified by rectangle frame of true-to-shape etc. The image-region of object can identify timing node corresponding to the target frame in the time schedule item of the video to be processed Etc..

Wherein, optionally, it when the continuous multiple frames video frame of the video to be processed is target frame, can identify described First frame and last frame in continuous multiple frames video frame.

The embodiment of the present invention passes through the scene type for determining each video frame, and true by the deep learning model after training With the presence or absence of target object corresponding to the scene type in fixed each video frame, can be had according to different scene types Targetedly target identification, so that target identification result is more targeted, interference is less；According to the target identification as a result, knowing Multiple target frames of the not described video to be processed, and target video is obtained according to the target frame, it can accurately and efficiently extract To the key message (such as identify and extract the image section comprising target individual) in the video to be processed, and obtain by closing The target video of key information composition, and redundancy in video to be processed, unessential video content are deleted, it remains valuable Video content, to search etc. in application scenarios in the key message to monitor video, user can be according to the mesh Mark video and carry out more efficient, more targeted processing, without manually being searched key message, editing the behaviour such as synthesize again Make.The embodiment of the present invention substantially increases the efficiency that user obtains the key message of video, and practicability and ease for use are stronger.

On the basis of the above embodiments, Fig. 2 is the implementation process of method for processing video frequency provided by Embodiment 2 of the present invention Schematic diagram, the method for processing video frequency as shown in Figure 2 may comprise steps of:

Step S201 obtains video to be processed；

Step S202 determines the scene type of each video frame in the video to be processed；

Step S203, according to the scene type of each video frame, by the first deep learning model after training to described Each video frame in video to be processed carries out target identification, obtains target identification result, wherein the target identification result refers to Show that the target object is corresponding with the scene type whether comprising target object in each video frame；

Step S204, according to the target identification as a result, multiple target frames of the identification video to be processed, wherein institute It states in target frame comprising the target object；

Step S205, according to chronological order of each target frame in the video to be processed, successively by multiple institutes It states target frame to be combined, obtains the target video；

The present embodiment step S201, S202, S203, S204, S205 and above-mentioned steps S101, S102, S103, S104, S105 is identical, and for details, reference can be made to step S101, S102, S103, S104, S105 associated descriptions, and details are not described herein.

Step S206 establishes information index according to the target frame.

In the embodiment of the present invention, the information index may include being directed toward the target frame and/or being directed toward and the target The link of the relevant information of frame or the inventory of pointer.The information index can be provided in direction specified data or node etc. The pointer of appearance or link, to quickly access information relevant to the target frame by the information index.The information Index can be set in interface associated and/or associated with the target video with the video to be processed, also, described Information index can be directed toward the video to be processed and/or the target video.In addition, it should be noted that, the information rope Draw can be embedded in the video to be processed perhaps in the target video or, or the video to be processed and Information interface (such as database, table, text etc.) except target video, display mode can there are many.

Optionally, described according to the target frame, establish information index, comprising:

The information index list about the target frame is established, the information index list includes the target frame described In the scene type information of temporal information, the target frame and the information on target object of the target frame in video to be processed It is at least one.

In the embodiment of the present invention, the temporal information can indicate that the target frame is corresponding in the video to be processed Timing node；Pass through the temporal information, it will be appreciated that the target frame is locating in the time shaft of the video to be processed Time location so that user can understand the target frame in different time referential by the temporal information Situation, without because by the target frame after being extracted in the video to be processed, segmentum intercalaris when being just lost original Point information.The scene type information can indicate that the scene type of the target frame, the information on target object may include The type of the target object, positioning, image-region etc. information relevant to the target object.

In the embodiment of the present invention, by establishing information index according to the target frame, it may be implemented to the target frame The efficient retrieval of relevant information greatly improves user to the lookup of the key message in the video to be processed, editor and divides The treatment effeciency of the operations such as analysis.

It should be understood that the size of the serial number of each step is not meant that the order of the execution order in above-described embodiment, each process Execution sequence should be determined by its function and internal logic, the implementation process without coping with the embodiment of the present invention constitutes any limit It is fixed.

Fig. 3 is the schematic diagram for the video process apparatus that the embodiment of the present invention three provides.For ease of description, illustrate only with The relevant part of the embodiment of the present invention.

The video process apparatus 300 includes:

Module 301 is obtained, for obtaining video to be processed；

Determining module 302, for determining the scene type of each video frame in the video to be processed；

First identification module 303 passes through the first deep learning after training for the scene type according to each video frame Model carries out target identification to each video frame in the video to be processed, obtains target identification result, wherein the target Recognition result indicates that the target object is corresponding with the scene type whether comprising target object in each video frame；

Second identification module 304, for according to the target identification as a result, multiple targets of the identification video to be processed Frame, wherein include the target object in the target frame；

Processing module 305 successively will for the chronological order according to each target frame in the video to be processed Multiple target frames are combined, and obtain the target video.

Correspondingly, second identification module 304 specifically includes:

Judging unit, for where the target object according to indicated by the target identification result region and/or Characteristic point position, judges whether each video frame in the video to be processed meets the first preset condition；

Determination unit, for the video frame for meeting first preset condition to be determined as to the target of the video to be processed Frame.

Optionally, the judging unit is specifically used for:

Correspondingly, the determination unit is specifically used for:

Optionally, the determining module 302 is specifically used for:

Alternatively,

Optionally, the video process apparatus 300 further include:

Mark module, for identifying the target frame in the video to be processed.

Optionally, the video process apparatus 300 further include:

Index module, for establishing information index according to the target frame.

Optionally, the index module is specifically used for:

The embodiment of the present invention passes through the scene type for determining each video frame, and true by the deep learning model after training With the presence or absence of target object corresponding to the scene type in fixed each video frame, can be had according to different scene types Targetedly target identification, so that target identification result is more targeted, interference is less；According to the target identification as a result, knowing Multiple target frames of the not described video to be processed, and target video is obtained according to the target frame, it can accurately and efficiently extract To the key message (such as identify and extract the image section comprising target individual) in the video to be processed, and obtain by closing The target video of key information composition, and redundancy in video to be processed, unessential video content are deleted, it remains valuable Video content, to search etc. in application scenarios in the key message to monitor video, user can be according to the mesh Mark video and carry out more efficient, more targeted processing, without manually being searched key message, editing the behaviour such as synthesize again Make.The embodiment of the present invention substantially increases the efficiency for the key message that user obtains in video, and practicability and ease for use are stronger.

Fig. 4 is the schematic diagram for the terminal device that the embodiment of the present invention four provides.As shown in figure 4, the terminal of the embodiment is set Standby 4 include: processor 40, memory 41 and are stored in the meter that can be run in the memory 41 and on the processor 40 Calculation machine program 42.The processor 40 is realized when executing the computer program 42 in above-mentioned each method for processing video frequency embodiment The step of, such as step 101 shown in FIG. 1 is to 105.Alternatively, realization when the processor 40 executes the computer program 42 The function of each module/unit in above-mentioned each Installation practice, such as the function of module 301 to 305 shown in Fig. 3.

Illustratively, the computer program 42 can be divided into one or more module/units, it is one or Multiple module/units are stored in the memory 41, and are executed by the processor 40, to complete the present invention.Described one A or multiple module/units can be the series of computation machine program instruction section that can complete specific function, which is used for Implementation procedure of the computer program 42 in the terminal device 4 is described.For example, the computer program 42 can be divided It is cut into and obtains module, determining module, the first identification module, the second identification module, processing module, each module concrete function is as follows:

Module is obtained, for obtaining video to be processed；

The terminal device 4 can be the calculating such as desktop PC, notebook, palm PC and cloud server and set It is standby.The terminal device may include, but be not limited only to, processor 40, memory 41.It will be understood by those skilled in the art that Fig. 4 The only example of terminal device 4 does not constitute the restriction to terminal device 4, may include than illustrating more or fewer portions Part perhaps combines certain components or different components, such as the terminal device can also include input-output equipment, net Network access device, bus etc..

Alleged processor 40 can be central processing unit (Central Processing Unit, CPU), can also be Other general processors, digital signal processor (Digital Signal Processor, DSP), specific integrated circuit (Application Specific Integrated Circuit, ASIC), ready-made programmable gate array (Field- Programmable Gate Array, FPGA) either other programmable logic device, discrete gate or transistor logic, Discrete hardware components etc..General processor can be microprocessor or the processor is also possible to any conventional processor Deng.

The memory 41 can be the internal storage unit of the terminal device 4, such as the hard disk or interior of terminal device 4 It deposits.The memory 41 is also possible to the External memory equipment of the terminal device 4, such as be equipped on the terminal device 4 Plug-in type hard disk, intelligent memory card (Smart Media Card, SMC), secure digital (Secure Digital, SD) card dodge Deposit card (Flash Card) etc..Further, the memory 41 can also both include the storage inside list of the terminal device 4 Member also includes External memory equipment.The memory 41 is for storing needed for the computer program and the terminal device Other programs and data.The memory 41 can be also used for temporarily storing the data that has exported or will export.

It is apparent to those skilled in the art that for convenience of description and succinctly, only with above-mentioned each function Can unit, module division progress for example, in practical application, can according to need and by above-mentioned function distribution by different Functional unit, module are completed, i.e., the internal structure of described device is divided into different functional unit or module, more than completing The all or part of function of description.Each functional unit in embodiment, module can integrate in one processing unit, can also To be that each unit physically exists alone, can also be integrated in one unit with two or more units, it is above-mentioned integrated Unit both can take the form of hardware realization, can also realize in the form of software functional units.In addition, each function list Member, the specific name of module are also only for convenience of distinguishing each other, the protection scope being not intended to limit this application.Above system The specific work process of middle unit, module, can refer to corresponding processes in the foregoing method embodiment, and details are not described herein.

In the above-described embodiments, it all emphasizes particularly on different fields to the description of each embodiment, is not described in detail or remembers in some embodiment The part of load may refer to the associated description of other embodiments.

Those of ordinary skill in the art may be aware that list described in conjunction with the examples disclosed in the embodiments of the present disclosure Member and algorithm steps can be realized with the combination of electronic hardware or computer software and electronic hardware.These functions are actually It is implemented in hardware or software, the specific application and design constraint depending on technical solution.Professional technician Each specific application can be used different methods to achieve the described function, but this realization is it is not considered that exceed The scope of the present invention.

In embodiment provided by the present invention, it should be understood that disclosed device/terminal device and method, it can be with It realizes by another way.For example, device described above/terminal device embodiment is only schematical, for example, institute The division of module or unit is stated, only a kind of logical function partition, there may be another division manner in actual implementation, such as Multiple units or components can be combined or can be integrated into another system, or some features can be ignored or not executed.Separately A bit, shown or discussed mutual coupling or direct-coupling or communication connection can be through some interfaces, device Or the INDIRECT COUPLING or communication connection of unit, it can be electrical property, mechanical or other forms.

The unit as illustrated by the separation member may or may not be physically separated, aobvious as unit The component shown may or may not be physical unit, it can and it is in one place, or may be distributed over multiple In network unit.It can select some or all of unit therein according to the actual needs to realize the mesh of this embodiment scheme 's.

It, can also be in addition, the functional units in various embodiments of the present invention may be integrated into one processing unit It is that each unit physically exists alone, can also be integrated in one unit with two or more units.Above-mentioned integrated list Member both can take the form of hardware realization, can also realize in the form of software functional units.

If the integrated module/unit be realized in the form of SFU software functional unit and as independent product sale or In use, can store in a computer readable storage medium.Based on this understanding, the present invention realizes above-mentioned implementation All or part of the process in example method, can also instruct relevant hardware to complete, the meter by computer program Calculation machine program can be stored in a computer readable storage medium, the computer program when being executed by processor, it can be achieved that on The step of stating each embodiment of the method.Wherein, the computer program includes computer program code, the computer program Code can be source code form, object identification code form, executable file or certain intermediate forms etc..Computer-readable Jie Matter may include: can carry the computer program code any entity or device, recording medium, USB flash disk, mobile hard disk, Magnetic disk, CD, computer storage, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), electric carrier signal, telecommunication signal and software distribution medium etc..It should be noted that described The content that computer-readable medium includes can carry out increasing appropriate according to the requirement made laws in jurisdiction with patent practice Subtract, such as does not include electric carrier signal and electricity according to legislation and patent practice, computer-readable medium in certain jurisdictions Believe signal.

Embodiment described above is merely illustrative of the technical solution of the present invention, rather than its limitations；Although referring to aforementioned reality Applying example, invention is explained in detail, those skilled in the art should understand that: it still can be to aforementioned each Technical solution documented by embodiment is modified or equivalent replacement of some of the technical features；And these are modified Or replacement, the spirit and scope for technical solution of various embodiments of the present invention that it does not separate the essence of the corresponding technical solution should all It is included within protection scope of the present invention.

Claims

1. a kind of method for processing video frequency characterized by comprising

Obtain video to be processed；

Determine the scene type of each video frame in the video to be processed；

According to the scene type of each video frame, by the first deep learning model after training in the video to be processed Each video frame carries out target identification, obtains target identification result, wherein the target identification result indicates in each video frame It whether include target object, the target object is corresponding with the scene type；

According to the target identification as a result, multiple target frames of the identification video to be processed, wherein include in the target frame The target object；

According to chronological order of each target frame in the video to be processed, multiple target frames are successively subjected to group It closes, obtains the target video.

2. method for processing video frequency as described in claim 1, which is characterized in that if including in the video frame of the video to be processed Target object, then the target identification result also indicates the region where the target object and/or characteristic point position；

Region and/or characteristic point position according to where the target object indicated by the target identification result, judge institute Whether each video frame stated in video to be processed meets the first preset condition；

3. method for processing video frequency as claimed in claim 2, which is characterized in that described according to indicated by the target identification result Target object where position, judge whether each video frame in the video to be processed meets the first preset condition, wrap It includes:

It is signified according to the target identification result for each video frame in the video to be processed including the target object Region and/or characteristic point position where the target object shown judge in the video frame, the target object is relative to the view The difference journey in the region and/or characteristic point position where target object in a upper video frame for frequency frame or in next video frame Whether degree meets the first preset condition；

Correspondingly, the target frame that the video frame for meeting first preset condition is determined as to the video to be processed, packet It includes:

If in the video frame, the target object is relative in a upper video frame for the video frame or in next video frame The difference degree in region and/or characteristic point position where target object meets the first preset condition, then using the video frame as The target frame of the video to be processed.

4. method for processing video frequency as described in claim 1, which is characterized in that each in the determination video to be processed The scene type of video frame, comprising:

Scene Recognition is carried out to each video frame in the video to be processed by the second deep learning model after training, with Determine the scene type of each video frame in the video to be processed；

Alternatively,

According to the title of the video to be processed, theme or default label, each video in the video to be processed is determined The scene type of frame.

5. method for processing video frequency as described in claim 1, which is characterized in that in the multiple targets for identifying the video to be processed After frame, further includes:

The target frame is identified in the video to be processed.

6. the method for processing video frequency as described in claim 1 to 5 any one, which is characterized in that identifying the view to be processed After multiple target frames of frequency, further includes:

According to the target frame, information index is established.

7. method for processing video frequency as claimed in claim 6, which is characterized in that it is described according to the target frame, establish information rope Draw, comprising:

Establish information index list about the target frame, the information index list include the target frame described wait locate In the information on target object of temporal information, the scene type information of the target frame and the target frame in reason video at least It is a kind of.

8. a kind of video process apparatus characterized by comprising

Module is obtained, for obtaining video to be processed；

First identification module passes through the first deep learning model pair after training for the scene type according to each video frame Each video frame in the video to be processed carries out target identification, obtains target identification result, wherein the target identification knot Fruit indicates that the target object is corresponding with the scene type whether comprising target object in each video frame；

Second identification module, for according to the target identification as a result, multiple target frames of the identification video to be processed, In, it include the target object in the target frame；

Processing module, for the chronological order according to each target frame in the video to be processed, successively by multiple institutes It states target frame to be combined, obtains the target video.

9. a kind of terminal device, including memory, processor and storage are in the memory and can be on the processor The computer program of operation, which is characterized in that the processor realizes such as claim 1 to 7 when executing the computer program The step of any one method for processing video frequency.

10. a kind of computer readable storage medium, the computer-readable recording medium storage has computer program, and feature exists In the step of realization method for processing video frequency as described in any one of claim 1 to 7 when the computer program is executed by processor Suddenly.