CN108830208A - Video processing method and apparatus, electronic device, computer-readable storage medium - Google Patents

Video processing method and apparatus, electronic device, computer-readable storage medium

Info

Publication number
CN108830208A
CN108830208A (application CN201810588001.0A)
Authority
CN
China
Prior art keywords
image
video
target
scene
electronic equipment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810588001.0A
Other languages
Chinese (zh)
Inventor
陈岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN201810588001.0A priority Critical patent/CN108830208A/en
Publication of CN108830208A publication Critical patent/CN108830208A/en
Priority to PCT/CN2019/087553 priority patent/WO2019233262A1/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

This application relates to a video processing method and apparatus, an electronic device, and a computer-readable storage medium. The method includes: performing scene recognition on the images in a video to obtain scene tags for the images; obtaining target images whose scene tags contain an input key label; and generating a target video from the target images. Because the target images can be selected according to the input key label and then assembled into the target video, the method simplifies video clipping.

Description

Video processing method and apparatus, electronic device, computer-readable storage medium
Technical field
This application relates to the field of computer technology, and in particular to a video processing method and apparatus, an electronic device, and a computer-readable storage medium.
Background technique
With the development of computer technology, video has become an important form of entertainment in daily life. More and more people edit videos in applications and share the resulting short videos on social networking sites. When users want to upload a previously recorded video, they need to clip it into a short video of about 10 s.
Conventionally, users inspect every image in the video one by one to screen out the frames from which a short video of about 10 s is clipped. This manual clipping process is cumbersome.
Summary of the invention
Embodiments of the present application provide a video processing method, apparatus, electronic device, and computer-readable storage medium that simplify video clipping.
A video processing method, including:
performing scene recognition on the images in a video to obtain scene tags for the images;
obtaining target images whose scene tags contain an input key label;
generating a target video according to the target images.
A video processing apparatus, including:
a scene recognition module, configured to perform scene recognition on the images in a video to obtain scene tags for the images;
a target image acquisition module, configured to obtain target images whose scene tags contain an input key label;
a target video generation module, configured to generate a target video according to the target images.
An electronic device, including a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the following steps:
performing scene recognition on the images in a video to obtain scene tags for the images;
obtaining target images whose scene tags contain an input key label;
generating a target video according to the target images.
A computer-readable storage medium on which a computer program is stored, the computer program implementing the following steps when executed by a processor:
performing scene recognition on the images in a video to obtain scene tags for the images;
obtaining target images whose scene tags contain an input key label;
generating a target video according to the target images.
With the above video processing method and apparatus, electronic device, and computer-readable storage medium, scene recognition is performed on the images in a video to obtain scene tags, the target images whose scene tags contain an input key label are obtained, and a target video is generated from the target images. Because the target images can be selected according to the input key label and assembled into the target video, video clipping is simplified.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the accompanying drawings required for the description of the embodiments or the prior art are briefly introduced below. Evidently, the drawings in the following description show only some embodiments of the application; those of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the internal structure of an electronic device in one embodiment;
Fig. 2 is a flowchart of a video processing method in one embodiment;
Fig. 3 is a flowchart of generating a target video in one embodiment;
Fig. 4 is a flowchart of generating a target video in another embodiment;
Fig. 5 is a flowchart of generating a target video in yet another embodiment;
Fig. 6 is a flowchart of generating a target video in one embodiment;
Fig. 7 is an architecture diagram of a neural network in one embodiment;
Fig. 8 is a structural block diagram of a video processing apparatus in one embodiment;
Fig. 9 is a schematic diagram of an information processing circuit in one embodiment.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the present application clearer, the application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are intended only to explain the application, not to limit it.
It will be understood that the terms "first", "second", and the like used in this application may describe various elements herein, but these elements are not limited by these terms. These terms are used only to distinguish one element from another. For example, without departing from the scope of the application, a first client could be termed a second client, and similarly a second client could be termed a first client. The first client and the second client are both clients, but they are not the same client.
Fig. 1 is a schematic diagram of the internal structure of an electronic device in one embodiment. As shown in Fig. 1, the electronic device includes a processor, a memory, and a network interface connected by a system bus. The processor provides computing and control capability and supports the operation of the entire electronic device. The memory stores data, programs, and the like; at least one computer program is stored in the memory and can be executed by the processor to implement the wireless network communication method applicable to the electronic device provided in the embodiments of the present application. The memory may include a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The computer program can be executed by the processor to implement the video processing method provided in each of the following embodiments. The internal memory provides a cached running environment for the operating system and the computer program in the non-volatile storage medium. The network interface may be an Ethernet card, a wireless network card, or the like, and is used to communicate with external electronic devices. The electronic device may be a mobile phone, a tablet computer, a personal digital assistant, a wearable device, or the like.
Fig. 2 is a flowchart of a video processing method in one embodiment. The video processing method in this embodiment is described as running on the electronic device of Fig. 1. As shown in Fig. 2, the video processing method includes steps 202 to 206.
Step 202: perform scene recognition on the images in the video to obtain scene tags for the images.
The video may be any video on the electronic device: a video captured by the device's camera, a video stored locally on the device, or a video downloaded from a network. A video is a sequence of consecutive still-image frames. The electronic device performs scene recognition on the images in the video. Specifically, it may pick at least one frame at random for scene recognition, or it may pick frames according to a preset rule, for example one frame every preset number of frames, or one frame every preset time interval, without being limited thereto.
Specifically, the electronic device may perform scene recognition using a scene recognition model trained with algorithms such as VGG (Visual Geometry Group), CNN (Convolutional Neural Network), decision tree, or random forest. The scene of an image may be landscape, beach, blue sky, grass, snow, fireworks, spotlight, text, portrait, baby, cat, dog, food, and so on. The scene tag of an image is its scene classification label; specifically, the scene recognition result can be used directly as the scene tag. For example, when the scene recognition result of an image is blue sky, its scene tag is blue sky. The electronic device can perform scene recognition on multiple frames of the video and obtain the scene tag corresponding to each frame.
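A minimal sketch of the per-frame tagging in step 202, assuming a trained scene classifier is available. The patent does not specify a concrete model interface, so `scene_model` below is a purely illustrative stub standing in for a VGG/CNN recognizer:

```python
def tag_frames(frames, scene_model, sample_interval=1):
    """Run scene recognition on every `sample_interval`-th frame and
    return a mapping from frame index to its scene tag."""
    tags = {}
    for idx in range(0, len(frames), sample_interval):
        tags[idx] = scene_model(frames[idx])
    return tags

# Stub "model": tags bright frames as blue sky, dark ones as night.
demo_model = lambda frame: "blue sky" if frame["brightness"] > 128 else "night"
frames = [{"brightness": 200}, {"brightness": 50}, {"brightness": 180}]
print(tag_frames(frames, demo_model))
# {0: 'blue sky', 1: 'night', 2: 'blue sky'}
```

Setting `sample_interval` greater than 1 corresponds to recognizing one frame every preset number of frames, as described above.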
Step 204: obtain the target images whose scene tags contain the input key label.
The key label is the main scene tag of the target video the electronic device is to produce. Specifically, the device can receive an input key label; the input may be text, or it may be an image, audio, or video, which the device converts into text to serve as the key label. According to the input key label, the device obtains the target images whose scene tags contain that key label. For example, when the input key label is blue sky, the device obtains the images of the video whose scene tags include the blue-sky scene tag and uses them as target images. In one embodiment, the device may also store in advance a mapping from key labels to scene tags, determine the corresponding scene tag from the input key label, and obtain the images containing that scene tag. For example, the key labels for the snow scene tag may be "snow", "snowing", "snow scene", and so on; when the received key label is "snow" or "snowing", the device obtains the images whose scene tags include the snow scene as target images. One or more key labels may be input; the device may take as target images only those images containing all the input key labels simultaneously, or those containing at least one of them.
In short, the electronic device obtains the input key label, searches the video for images containing that key label, and takes the images whose scene tags contain the key label as target images.
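A sketch of the selection in step 204, assuming each frame already carries its scene tags from step 202. The `match_all` flag switches between requiring every input key label and requiring at least one, both behaviors described above:

```python
def select_target_images(frame_tags, key_labels, match_all=False):
    """Return the indices of frames whose scene tags contain the key label(s)."""
    wanted = set(key_labels)
    picked = []
    for idx, tags in frame_tags.items():
        tags = set(tags)
        hit = wanted <= tags if match_all else bool(wanted & tags)
        if hit:
            picked.append(idx)
    return sorted(picked)

frame_tags = {0: ["blue sky", "beach"], 1: ["portrait"], 2: ["blue sky"]}
print(select_target_images(frame_tags, ["blue sky"]))            # [0, 2]
print(select_target_images(frame_tags, ["blue sky", "beach"],
                           match_all=True))                      # [0]
```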
Step 206: generate the target video according to the target images.
The target video is the video obtained after clipping is complete. The electronic device may compose the target images into the target video directly. When a duration limit applies, the device may compress the target video to the limit by raising its frame rate, or generate the target video from the target images that are sharper or in which the key-label region is larger. The device may also present the obtained target images to the user and generate the target video from the images the user selects; or it may treat runs of consecutive target images in the video as preselected segments, present those segments to the user, and generate the target video from the segments the user selects.
With the video processing method provided by the embodiments of the present application, scene recognition is performed on the images in a video to obtain scene tags, the target images whose scene tags contain the input key label are obtained, and a target video is generated from them. Because the target images containing the input key label can be obtained automatically, there is no need to inspect every frame of the video manually, which simplifies video clipping.
As shown in Fig. 3, in one embodiment, the video processing method further includes steps 302 to 306.
Step 302: extract the timestamps of the target images in the video.
A timestamp is the time point of a target image within the video. Specifically, the electronic device can compute it from the frame rate of the video and the index of the target image. For example, at a frame rate of 20 frames per second, the timestamp of the 2nd frame is 0.05 s, that of the 10th frame is 0.45 s, and that of the 100th frame is 4.95 s. From each extracted target image, the device obtains its corresponding timestamp in the video.
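The timestamp rule implied by the example above: the N-th frame (1-based) of a video at `fps` frames per second sits at (N - 1) / fps seconds. A one-line sketch:

```python
def frame_timestamp(frame_index, fps):
    """Timestamp in seconds of a 1-based frame index at the given frame rate."""
    return (frame_index - 1) / fps

print(frame_timestamp(2, 20))    # 0.05
print(frame_timestamp(10, 20))   # 0.45
```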
Step 304: when the difference between the timestamps of two adjacent target images is less than a threshold, determine that the two adjacent target images belong to the same segment.
The threshold can be set according to the frame rate of the video and actual demand. For example, for a video at 20 frames per second with a threshold of 0.2 s, if the timestamps of two adjacent target frames differ by less than 0.2 s, the two frames belong to the same segment; if they differ by more than 0.2 s, they belong to different segments. The threshold of a video may also be 0.1 s, 0.15 s, 0.3 s, and so on, without being limited thereto. When the timestamp difference of two adjacent target images is less than the threshold, the two frames are determined to be in the same segment, which includes the two adjacent target frames and any other images between them.
Step 306: take the segments whose duration exceeds a first preset duration as preselected segments, and generate the target video from the preselected segments.
The first preset duration can be set according to actual needs, for example 1 s, 2 s, or 3 s, without being limited thereto. Within a segment, the timestamp difference of any two adjacent target frames is less than the threshold; the segment duration is the difference between the timestamps of the first and last target frames of the segment. A preselected segment is a segment that can be used to generate the target video. The electronic device can measure the duration of each segment and keep the segments longer than the first preset duration as preselected segments. To generate the target video from the preselected segments, the device may present them to the user, who edits them into the target video, or the device may combine them into the target video automatically.
By extracting the timestamps of the target images and deciding from the timestamp differences of adjacent target images whether they belong to the same segment, the electronic device avoids segments containing many invalid frames caused by jumps during shooting. Taking the segments longer than the first preset duration as preselected segments, and either offering them to the user for editing or generating the target video directly, simplifies video clipping and improves its efficiency.
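A sketch of steps 304 and 306: adjacent target-image timestamps closer than the threshold are merged into one segment, and only segments longer than the first preset duration are kept as preselected segments. The threshold and duration values are the examples from the text:

```python
def group_segments(timestamps, threshold=0.2):
    """Split a sorted list of timestamps into segments at gaps >= threshold."""
    segments = [[timestamps[0]]]
    for t in timestamps[1:]:
        if t - segments[-1][-1] < threshold:
            segments[-1].append(t)
        else:
            segments.append([t])
    return segments

def preselect(segments, first_preset_duration=1.0):
    """Keep segments whose time span exceeds the first preset duration."""
    return [s for s in segments if s[-1] - s[0] > first_preset_duration]

ts = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.15, 5.2, 9.0]
segs = group_segments(ts)
print(len(segs))              # 3
print(preselect(segs, 0.25))  # [[0.0, 0.1, 0.2, 0.3]]
```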
As shown in Fig. 4, in one embodiment, the process of generating the target video from the preselected segments further includes:
Step 402: receive a selection instruction for the preselected segments.
Specifically, the selection instruction may be generated by the user clicking a button on the display, or by the user pressing a control on a touch screen. The electronic device can receive selection instructions for at least one preselected segment. On receiving a selection instruction for a preselected segment, the device marks that segment as selected.
Step 404: generate the target video from the preselected segments chosen by the selection instruction.
One or more preselected segments may be chosen by the selection instruction, and the electronic device composes the chosen segment or segments into the target video. By presenting the preselected segments to the user for editing, receiving the user's selection instruction, and generating the target video from the chosen segments, the operation of video clipping is simplified and its efficiency improved.
As shown in Fig. 5, in one embodiment, the process of generating the target video from the preselected segments may also include steps 502 to 506.
Step 502: when a preselected segment is longer than a second preset duration, detect whether the target images in the preselected segment carry a portrait label.
The second preset duration is the duration limit of the target video. Specifically, it may be a target-video duration set by the user, or it may be determined by the electronic device from the application scenario of the target video. For example, on a video website that limits videos to 10 s, when the user selects a video longer than 10 s to upload, the device can obtain the 10 s limit from the website. A target image may carry one or more portrait labels. A portrait label indicates that a face or a person is present in the image; the face may be frontal or in profile, and the person may be shot from the front, the side, or the back, without being limited thereto. Because the electronic device already obtained the scene tags of the target images when selecting them, it can detect whether the target images of a preselected segment carry a portrait label simply by looking up the scene tags of each target image; when at least one of those scene tags is a portrait label, the preselected segment is determined to contain portrait target images.
Step 504: when the target images in the preselected segment carry portrait labels, traverse the preselected segment with a sliding window of the second preset duration, and count the occurrences of the portrait label in each sub-segment covered by the window.
Specifically, the step of the traversal can be set according to actual demand, for example 0.5 s, 1 s, or 2 s, without being limited thereto. The occurrence count of the portrait label is the number of target images in the sub-segment that carry it. For example, for a 20 s preselected segment, a second preset duration of 10 s, and a 1 s step, the device obtains the sub-segments covered by the 10 s window and counts the occurrences of the portrait label in each of them.
Step 506: take the sub-segment with the most portrait-label occurrences as the target video.
In daily life, people are an important scene in the videos users shoot. Selecting, from a preselected segment with portrait labels, the sub-segment in which the portrait label occurs most often better captures the intended content of the target video. Meanwhile, when the preselected segments are shorter than the second preset duration, the device can automatically compose one or more of them into a target video of the second preset duration; when a preselected segment is longer than the second preset duration, the device extracts from it a sub-segment of the second preset duration as the target video, simplifying video clipping.
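A sketch of steps 502 to 506: slide a window of the second preset duration over the per-frame scene tags of a preselected segment and keep the window containing the most portrait-labelled frames. The frame rate, window, and step values below are illustrative, not from the patent:

```python
def best_portrait_window(frame_tags, fps, window_s, step_s=1.0):
    """Return (start_time_s, portrait_count) of the window with the most
    portrait-labelled frames."""
    win = int(window_s * fps)
    step = int(step_s * fps)
    portrait = [int("portrait" in tags) for tags in frame_tags]
    best_start, best_count = 0.0, -1
    for start in range(0, len(portrait) - win + 1, step):
        count = sum(portrait[start:start + win])
        if count > best_count:
            best_start, best_count = start / fps, count
    return best_start, best_count

# 5 s of footage at 2 fps; portraits appear in frames 4-6 (2.0 s - 3.0 s).
tags = [[], [], [], [], ["portrait"], ["portrait"], ["portrait"], [], [], []]
print(best_portrait_window(tags, fps=2, window_s=2))  # (2.0, 3)
```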
As shown in Fig. 6, in one embodiment, the process of generating the target video from the preselected segments may also include steps 602 to 606.
Step 602: when a preselected segment is longer than the second preset duration, traverse it with a sliding window of the second preset duration, and build a color histogram for each sub-segment covered by the window.
A color histogram is a chart built from the distribution of colors. When a preselected segment is longer than the second preset duration, the electronic device traverses it with a window of that duration and builds, for each sub-segment covered by the window, a color histogram from the color distribution of the target images in that sub-segment. Specifically, the device can extract the color parameters of every pixel of the target images in the sub-segment, determine the color of each pixel from its parameters, count the number of pixels of each color in each target image, and build the histogram from the colors and their pixel counts; alternatively it can divide each color's pixel count by the total number of pixels in the target image to obtain the color's frequency of occurrence, and build the histogram from the colors and their frequencies. The color parameters of a pixel can be expressed in the RGB (red, green, blue) color space, the HSB (hue, saturation, brightness) color space, or the HSL (hue, saturation, lightness) color space.
In one embodiment, the electronic device determines pixel colors in the HSB color space. The device can store in advance the HSB parameter range of each color, for example yellow: 30<H<90, 0.3<S<1, 50<B<230; green: 90<H<180, 0.3<S<1, 50<B<230; blue: 180<H<270, 0.3<S<1, 50<B<230; and so on. The device can then determine the color of a pixel from its color parameters. For example, when pixel A of a target image has the HSB parameters H=95, S=0.5, B=60, its parameters fall within the green parameter range, so the color of pixel A is green.
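A sketch of the HSB color lookup, using the example parameter ranges quoted above (yellow, green, blue). The ranges are the patent's examples, not a complete palette, so any pixel outside them falls through to "other":

```python
def classify_hsb(h, s, b):
    """Map an HSB pixel to a color name via the example parameter ranges."""
    hue_ranges = {"yellow": (30, 90), "green": (90, 180), "blue": (180, 270)}
    if 0.3 < s < 1 and 50 < b < 230:
        for name, (lo, hi) in hue_ranges.items():
            if lo < h < hi:
                return name
    return "other"

print(classify_hsb(95, 0.5, 60))   # 'green'  (within the green range)
print(classify_hsb(95, 0.1, 60))   # 'other'  (saturation out of range)
```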
Specifically, the electronic device can take the colors as the abscissa of the color histogram and the pixel counts of those colors in the sub-segment as the ordinate; the histogram then shows the color distribution of the sub-segment.
Step 604: measure the dispersion of each color histogram.
The dispersion of a color histogram is the degree of difference between the pixel counts of its colors. The smaller the dispersion, the smaller the differences between the per-color pixel counts of the sub-segment, and the more evenly the colors are distributed. The larger the dispersion, the larger those differences, meaning the areas of the various colors differ widely and large single-color regions appear in the sub-segment. The electronic device can measure the dispersion of a color histogram in various ways, such as the range, mean deviation, standard deviation, or variance of its per-color pixel counts. By measuring dispersion, the device obtains a dispersion value for the color histogram of every sub-segment of the video.
Step 606: take the sub-segment whose color histogram has the smallest dispersion as the target video.
The sub-segment whose color histogram has the smallest dispersion has the most even color distribution of all sub-segments, that is, the richest colors. Thus, when a preselected segment is longer than the second preset duration, traversing it with a window of the second preset duration, building a color histogram for each covered sub-segment, and taking the sub-segment with the smallest dispersion as the target video yields a target video of the second preset duration that both contains the input scene tag and has the richest color distribution in the video.
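A sketch of steps 602 to 606: build a color histogram per sub-segment and pick the one with the smallest dispersion, here measured as the variance of the per-color pixel counts (one of the measures listed above). Frames are represented as lists of per-pixel color names for simplicity:

```python
from collections import Counter

def color_histogram(sub_segment):
    """Pixel count per color name over all frames of a sub-segment."""
    hist = Counter()
    for frame in sub_segment:     # frame: iterable of per-pixel color names
        hist.update(frame)
    return hist

def dispersion(hist):
    """Variance of the per-color pixel counts; lower means more even colors."""
    counts = list(hist.values())
    mean = sum(counts) / len(counts)
    return sum((c - mean) ** 2 for c in counts) / len(counts)

def richest_sub_segment(sub_segments):
    """Sub-segment whose color histogram has the smallest dispersion."""
    return min(sub_segments, key=lambda s: dispersion(color_histogram(s)))

uneven = [["red"] * 9 + ["blue"] * 1]   # one color dominates
even = [["red"] * 5 + ["blue"] * 5]     # colors evenly spread
print(richest_sub_segment([uneven, even]) is even)  # True
```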
In one embodiment, the process of performing scene recognition on the images in the video to obtain their scene tags further includes: performing scene recognition on the images in the video to obtain multiple scene tags for each image.
The electronic device can train a neural network capable of outputting multiple scene tags. Specifically, during training, a training image containing multiple training labels is input into the neural network; the network extracts features from the training image and detects each extracted feature to obtain its predicted confidence; a loss function is computed from the predicted confidence and the true confidence of each feature, and the network's parameters are adjusted according to the loss function, so that the trained network can subsequently identify the scene tags corresponding to multiple features of an image simultaneously. This yields a neural network that outputs multiple scene tags. Confidence measures the credibility of a measured parameter value; the true confidence represents the confidence that a pre-labeled feature in the training image belongs to the specified scene category.
The electronic device can also train a neural network that performs scene classification and object detection at the same time. Specifically, during training, a training image containing at least one background training target and one foreground training target is input into the network. The network extracts features from the background and foreground training targets; it detects the background training target to obtain a first predicted confidence and computes a first loss function from the first predicted confidence and the first true confidence; it detects the foreground training target to obtain a second predicted confidence and computes a second loss function from the second predicted confidence and the second true confidence; a target loss function is obtained from the first and second loss functions, and the network's parameters are adjusted accordingly, so that the trained network can identify the scene classification and the target classification simultaneously and use both as scene tags of the image. This yields a neural network that detects the foreground region and the background region of an image at the same time. Confidence measures the credibility of a parameter's magnitude. The first true confidence represents the confidence that the pre-labeled background image in the training image belongs to the specified image category; the second true confidence represents the confidence that the pre-labeled foreground target in the training image belongs to the specified target category.
In one embodiment, the neural network comprises at least an input layer, a base network layer, a classification network layer, a target detection network layer and two output layers: a first output layer cascaded with the classification network layer and a second output layer cascaded with the target detection network layer. In the training stage, the input layer receives the training image; the first output layer outputs the first predicted confidence, detected by the classification network layer, of the specified scene category to which the background image belongs; the second output layer outputs, for each preselected default bounding box detected by the target detection network layer, the offset parameters relative to the ground-truth bounding box of the specified target and the second predicted confidence of the specified target category to which it belongs.

Fig. 7 is an architecture diagram of the neural network in one embodiment. As shown in Fig. 7, the input layer of the neural network receives a training image carrying image category labels; a base network (such as a VGG network) performs feature extraction and outputs the extracted image features to the feature layers. The feature layers classify the image to obtain the first loss function, perform target detection on the foreground targets according to the image features to obtain the second loss function, and perform position detection on the foreground targets to obtain a position loss function; the first loss function, the second loss function and the position loss function are then combined by weighted summation into the target loss function.

The network consists of a data input layer, a base network layer, a scene classification network layer, a target detection network layer and two output layers. The data input layer receives the raw image data. The base network layer preprocesses the input image and extracts features. Preprocessing may include mean subtraction, normalization, dimensionality reduction and whitening. Mean subtraction centres every dimension of the input data at 0, so that the centre of the samples moves to the coordinate origin; normalization scales the amplitudes to a common range; whitening normalizes the amplitude along each feature axis. Feature extraction can, for example, apply the first 5 convolutional blocks of VGG16 to the original image, after which the extracted features are fed to the scene classification network layer and the target detection network layer.

The classification network layer may use depthwise and pointwise convolutions, as in a Mobilenet network, to detect the features and pass them to the output layer, which produces the first predicted confidence of the specified image category of the image scene; the first loss function is then derived from the difference between the first predicted confidence and the first true confidence. The target detection network layer may use an SSD-style network: convolutional feature layers are cascaded after the first 5 convolutional blocks of VGG16, and a set of convolutional filters on those feature layers predicts, for the preselected default bounding boxes corresponding to each specified target category, the offset parameters relative to the ground-truth bounding box and the second predicted confidence of the specified target category. The region of interest is the region covered by a preselected default bounding box. A position loss function is constructed from the offset parameters, and the second loss function from the difference between the second predicted confidence and the second true confidence. The first loss function, the second loss function and the position loss function are combined by weighted summation into the target loss function, and the network parameters are adjusted with back-propagation according to the target loss function to train the network.
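A hedged sketch of the weighted summation just described, assuming the smooth-L1 position loss common in SSD-style detectors (the patent only says a position loss is constructed from the offset parameters) and illustrative weights:

```python
def smooth_l1(offsets):
    """Position loss over predicted box offsets (dx, dy, dw, dh) relative to
    the ground-truth box, in the smooth-L1 form used by SSD-style detectors."""
    total = 0.0
    for d in offsets:
        a = abs(d)
        total += 0.5 * d * d if a < 1.0 else a - 0.5
    return total

def target_loss(first_loss, second_loss, offsets, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the scene-classification loss, the target-classification
    loss and the position loss; the weights are illustrative hyper-parameters,
    not values given by the patent."""
    w1, w2, w3 = weights
    return w1 * first_loss + w2 * second_loss + w3 * smooth_l1(offsets)
```

The scalar returned by `target_loss` is what back-propagation would minimize when adjusting the network parameters.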
When the trained neural network is used to recognize an image, the input layer of the network receives the input image and the extracted image features are fed to the classification network layer for image scene recognition. In the first output layer, a softmax classifier outputs the confidence of each specified scene category to which the background image may belong; the scene category with the highest confidence that also exceeds the confidence threshold is taken as the scene classification label of the background image. The extracted features are also fed to the target detection network layer for foreground target detection. In the second output layer, a softmax classifier outputs the confidence and the corresponding position of each specified target category to which the foreground targets may belong; the target category with the highest confidence that also exceeds the confidence threshold is taken as the target classification label of the foreground target, and the position corresponding to that label is output as well. The scene classification label and the target classification label obtained in this way together form the scene labels of the image.
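A small sketch of this confidence-threshold selection step; `softmax` and `pick_label` are invented helper names, and the threshold value is illustrative:

```python
import math

def softmax(logits):
    """Convert raw output-layer scores into per-category confidences."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    s = sum(exps.values())
    return {k: v / s for k, v in exps.items()}

def pick_label(confidences, threshold=0.5):
    """Return the highest-confidence category if it exceeds the confidence
    threshold, otherwise None (no label is assigned)."""
    label = max(confidences, key=confidences.get)
    return label if confidences[label] > threshold else None
```

The same selection is applied twice: once to the scene categories of the background and once to the target categories of each foreground detection.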
In one embodiment of the provided video processing method, performing scene recognition on the images in the video to obtain their scene labels further includes: extracting one frame from the video every preset number of frames, performing scene recognition on the extracted frame, and obtaining the scene label of that frame.
Specifically, the preset interval can be 1 frame, 2 frames, 3 frames and so on, without limitation. For example, for a 10 s video with a frame rate of 20 frames per second, running scene recognition on every frame means extracting and processing 200 images; if instead one frame is extracted per preset interval, then with an interval of 1 frame the electronic device extracts every other frame, i.e. 100 images, for scene recognition. Extracting one frame per preset interval in this way greatly reduces the workload of the electronic device.
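The frame-sampling arithmetic in the example above can be sketched as follows; `sampled_indices` is an invented helper name:

```python
def sampled_indices(total_frames, preset_interval):
    """Frame indices kept when one frame is taken every `preset_interval`
    frames: an interval of 1 keeps every other frame, halving the work."""
    return list(range(0, total_frames, preset_interval + 1))
```

For the 10 s, 20 fps video of the example (200 frames), an interval of 1 frame leaves 100 frames to recognize.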
Moreover, the electronic device can obtain target images from the detection results, determine from the timestamp difference of adjacent target images that they belong to the same segment, and take segments whose duration exceeds a first preset duration as preselected fragments from which the target video is generated. Target images containing the input scene label are not filtered out by the reduced per-image detection, which both simplifies the video clipping process and improves the working efficiency of the electronic device.
In one embodiment, a video processing method is provided; the concrete steps to implement the method are described as follows.
First, the electronic device performs scene recognition on the images in a video to obtain the scene labels of the images. A video is a sequence of consecutive still-image frames. Specifically, the electronic device may take at least one frame of the video at random for scene recognition, or obtain images from the video according to a preset condition. For the scene recognition itself, it can train a scene recognition model with algorithms such as VGG, CNN, decision trees or random forests, and recognize the scene of an image with that model.
Optionally, the electronic device performs scene recognition on the images in the video and obtains multiple scene labels for each image. It can train a neural network that outputs multiple scene labels, and in particular a network that performs scene classification and target detection simultaneously. When the trained network recognizes an image, the input layer receives the image, the extracted features are fed to the classification network layer for image scene recognition, and the output layer produces the confidence and the corresponding position of the specified category of each image feature; the category with the highest confidence is taken as the category of the feature, and the resulting feature classifications are used as the scene labels of the image.
Optionally, the electronic device extracts one frame from the video every preset number of frames and performs scene recognition on the extracted frame to obtain its scene label, which greatly reduces the workload of the electronic device. Moreover, the device can obtain target images from the detection results, determine from the timestamp difference of adjacent target images that they belong to the same segment, and take segments longer than the first preset duration as preselected fragments from which the target video is generated. Target images containing the input scene label are not filtered out by the reduced per-image detection, which both simplifies the video clipping process and improves the working efficiency of the electronic device.
Next, the electronic device obtains the target images whose scene labels contain the input key label. The key label is the main scene label of the target video that the electronic device is to clip. Specifically, the device can receive a key label as input; the input can be text, or an image, audio or video that the device converts into text to use as the key label. With the input key label, the device searches the video for images whose scene labels contain the key label and takes those images as the target images.
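A minimal sketch of this key-label search, assuming the scene labels of each frame are already available as a mapping (all names are hypothetical):

```python
def target_images(scene_tags, key_label):
    """scene_tags maps frame index -> set of scene labels recognized for that
    frame; the frames whose labels contain the key label are the target
    images, returned in frame order."""
    return [idx for idx in sorted(scene_tags) if key_label in scene_tags[idx]]
```

A non-text input (image, audio, video) would first be converted into a text key label before being passed to such a search.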
Then, the electronic device generates the target video from the target images. The target video is the video obtained after clipping is completed. The device can assemble the target images directly into the target video; it can also present the obtained target images to the user and generate the target video from the images the user selects; or it can treat consecutive target images in the video as preselected fragments, present the fragments to the user, and generate the target video from the fragments the user selects.
Optionally, the electronic device extracts the timestamp of each target image in the video; when the timestamp difference of two adjacent target images is below a threshold, the two images are judged to belong to the same segment, segments whose duration exceeds the first preset duration are taken as preselected fragments, and the target video is generated from the preselected fragments. By extracting the timestamps of the target images and deciding from the timestamp difference of adjacent target images whether they belong to the same segment, the device avoids segments containing many invalid frames caused by jumps during video capture. Taking segments longer than the first preset duration as preselected fragments and presenting them to the user for direct clipping into the target video simplifies the clipping operation and improves clipping efficiency.
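The timestamp-based grouping can be sketched as follows; the gap threshold and the first preset duration are parameters the patent leaves unspecified, and the function name is invented:

```python
def preselect_fragments(timestamps, gap_threshold, first_preset_duration):
    """Group sorted target-image timestamps (in seconds) into segments:
    adjacent images whose timestamp difference is below gap_threshold stay in
    one segment; segments longer than first_preset_duration become preselected
    fragments, returned as (start, end) pairs."""
    if not timestamps:
        return []
    fragments, start, prev = [], timestamps[0], timestamps[0]
    for t in timestamps[1:]:
        if t - prev >= gap_threshold:  # a jump closes the current segment
            fragments.append((start, prev))
            start = t
        prev = t
    fragments.append((start, prev))
    return [(s, e) for s, e in fragments if e - s > first_preset_duration]
```

A capture jump (a large timestamp gap) splits the run of target images, so a fragment never spans invalid in-between frames.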
Optionally, the electronic device receives a selection instruction for the preselected fragments and generates the target video from the selected fragments according to that instruction. The device can receive selection instructions for one or more preselected fragments at a time; the instruction may select one fragment or several, and the device combines the selected fragments into the target video. Presenting the preselected fragments to the user for clipping, receiving the user's selection instruction, and generating the target video from the selected fragments simplifies the clipping operation and improves clipping efficiency.
Optionally, when a preselected fragment is longer than a second preset duration, the electronic device detects whether the target images in the fragment carry a portrait label. If they do, it traverses the fragment with a traversal window of the second preset duration, counts the occurrences of the portrait label in the sub-fragment covered by each window position, and takes the sub-fragment with the most portrait-label occurrences as the target video.
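A sketch of this portrait-counting traversal, assuming the traversal window is expressed in frames and per-frame portrait flags are already available (names are hypothetical):

```python
def best_portrait_subfragment(portrait_flags, window):
    """portrait_flags[i] is True when frame i of the preselected fragment
    carries a portrait label; window is the second preset duration in frames.
    Returns the start index of the sub-fragment with the most portrait
    occurrences."""
    best_start, best_count = 0, -1
    for start in range(len(portrait_flags) - window + 1):
        count = sum(portrait_flags[start:start + window])
        if count > best_count:  # keep the earliest window on ties
            best_start, best_count = start, count
    return best_start
```

The frames `[best_start, best_start + window)` would then be cut out as the target video.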
Optionally, when a preselected fragment is longer than the second preset duration, the electronic device traverses it with a traversal window of the second preset duration, builds the colour histogram of the sub-fragment covered by each window position, detects the dispersion of each colour histogram, and takes the sub-fragment with the smallest histogram dispersion as the target video. By traversing an over-long fragment with a window of the second preset duration and selecting, from the colour histograms, the sub-fragment with the smallest dispersion, the resulting target video of the second preset duration is the segment of the video that both contains the input scene label and has the richest colour distribution.
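A sketch of the histogram-dispersion selection, assuming dispersion is measured as the variance of the histogram bin counts (the patent does not fix a particular dispersion measure, and the names are invented):

```python
def dispersion(histogram):
    """Dispersion of a colour histogram, measured here as the variance of its
    bin counts; a flat histogram (colours evenly spread) has dispersion 0."""
    mean = sum(histogram) / len(histogram)
    return sum((h - mean) ** 2 for h in histogram) / len(histogram)

def least_dispersed_subfragment(histograms):
    """Index of the sub-fragment whose colour histogram has the smallest
    dispersion, i.e. the most evenly spread colours."""
    return min(range(len(histograms)), key=lambda i: dispersion(histograms[i]))
```

Under this measure, picking the smallest dispersion favours the sub-fragment whose colours are spread most evenly across the bins, i.e. the richest colour distribution.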
It should be understood that although the steps in the flowcharts of Figs. 1-6 are shown in the order indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict ordering constraint on their execution, and the steps may be executed in other orders. Moreover, at least some of the steps in Figs. 1-6 may comprise multiple sub-steps or stages; these sub-steps or stages are not necessarily completed at the same moment but may be executed at different times, and their execution order is not necessarily sequential: they may be executed in turn or alternately with at least part of the other steps, or of the sub-steps or stages of the other steps.
Fig. 8 is a structural block diagram of the video processing apparatus of one embodiment. As shown in Fig. 8, the provided video processing apparatus includes a scene recognition module 802, a target image obtaining module 804 and a target video generation module 806, wherein:
the scene recognition module 802 is configured to perform scene recognition on the images in a video and obtain the scene labels of the images;
the target image obtaining module 804 is configured to obtain the target images whose scene labels contain the input key label; and
the target video generation module 806 is configured to generate the target video from the target images.
In one embodiment, the target video generation module 806 may further be configured to extract the timestamps of the target images in the video; when the timestamp difference of two adjacent target images is below a threshold, the two adjacent target images are judged to belong to the same segment; segments whose duration exceeds the first preset duration are taken as preselected fragments, and the target video is generated from the preselected fragments.
In one embodiment, the target video generation module 806 may further be configured to receive a selection instruction for the preselected fragments and generate the target video from the selected fragments according to that instruction.
In one embodiment, the target video generation module 806 may further be configured to, when a preselected fragment is longer than the second preset duration, detect whether the target images in the fragment carry a portrait label; if they do, traverse the fragment with a traversal window of the second preset duration, detect the occurrences of the portrait label in the sub-fragment covered by each window position, and take the sub-fragment with the most portrait-label occurrences as the target video.
In one embodiment, the target video generation module 806 may further be configured to, when a preselected fragment is longer than the second preset duration, traverse it with a traversal window of the second preset duration, build the colour histogram of the sub-fragment covered by each window position, detect the dispersion of the colour histograms, and take the sub-fragment corresponding to the histogram with the smallest dispersion as the target video.
In one embodiment, the scene recognition module 802 may further be configured to perform scene recognition on the images in the video and obtain multiple scene labels for each image.
In one embodiment, the scene recognition module 802 may further be configured to extract one frame from the video every preset number of frames, perform scene recognition on the extracted frame, and obtain its scene label.
The division of the modules in the above video processing apparatus is only for illustration; in other embodiments, the video processing apparatus can be divided into different modules as required to complete all or part of its functions.
For the specific limitations of the video processing apparatus, refer to the limitations of the video processing method above, which are not repeated here. Each module of the above video processing apparatus may be implemented wholly or partly in software, hardware or a combination of both. The modules may be embedded, in hardware form, in or independently of a processor of a computer device, or stored, in software form, in a memory of the computer device, so that the processor can invoke them and perform the operations corresponding to each module.
The modules of the video processing apparatus provided in the embodiments of this application may be implemented in the form of a computer program. The computer program may run on a terminal or a server, and the program modules it constitutes may be stored in the memory of the terminal or server. When the computer program is executed by a processor, the steps of the methods described in the embodiments of this application are realized.
The embodiments of this application also provide a computer-readable storage medium: one or more non-volatile computer-readable storage media containing computer-executable instructions which, when executed by one or more processors, cause the processors to perform the steps of the video processing method.
A computer program product containing instructions is also provided which, when run on a computer, causes the computer to perform the video processing method.
The embodiments of this application also provide an electronic device. The electronic device includes an image processing circuit, which can be implemented with hardware and/or software components and may include various processing units defining an ISP (Image Signal Processing) pipeline. Fig. 9 is a schematic diagram of the image processing circuit in one embodiment. As shown in Fig. 9, for ease of illustration, only the aspects of the image processing technique relevant to the embodiments of this application are shown.
As shown in Fig. 9, the image processing circuit includes an ISP processor 940 and a control logic device 950. Image data captured by the imaging device 910 is first processed by the ISP processor 940, which analyses the image data to capture image statistics that can be used to determine one or more control parameters of the imaging device 910. The imaging device 910 may include a camera with one or more lenses 912 and an image sensor 914. The image sensor 914 may include a colour filter array (such as a Bayer filter); it can obtain the light intensity and wavelength information captured by each of its imaging pixels and provide a set of raw image data that can be processed by the ISP processor 940. A sensor 920 (such as a gyroscope) can supply acquired image processing parameters (such as stabilization parameters) to the ISP processor 940 based on the interface type of the sensor 920. The sensor 920 interface may be an SMIA (Standard Mobile Imaging Architecture) interface, another serial or parallel camera interface, or a combination of these.
In addition, the image sensor 914 can also send the raw image data to the sensor 920, which can then supply it to the ISP processor 940 based on the sensor 920 interface type, or store it in the image memory 930.
The ISP processor 940 processes the raw image data pixel by pixel in a variety of formats. For example, each image pixel may have a bit depth of 8, 10, 12 or 14 bits; the ISP processor 940 can perform one or more image processing operations on the raw image data and collect statistics about the image data, and the image processing operations may be performed with the same or different bit-depth precision.
The ISP processor 940 can also receive image data from the image memory 930. For example, the sensor 920 interface sends the raw image data to the image memory 930, from which it is supplied to the ISP processor 940 for processing. The image memory 930 may be part of a memory device, a storage device, or a separate dedicated memory within the electronic device, and may include DMA (Direct Memory Access) features.
On receiving raw image data from the image sensor 914 interface, the sensor 920 interface or the image memory 930, the ISP processor 940 can perform one or more image processing operations, such as temporal filtering. The processed image data can be sent to the image memory 930 for further processing before being displayed. The ISP processor 940 receives the processing data from the image memory 930 and processes it in the raw domain and in the RGB and YCbCr colour spaces. The processed image data output by the ISP processor 940 can be sent to the display 970 for viewing by the user and/or further processing by a graphics engine or GPU (Graphics Processing Unit). The output of the ISP processor 940 can also be sent to the image memory 930, from which the display 970 can read the image data. In one embodiment, the image memory 930 can be configured to implement one or more frame buffers. In addition, the output of the ISP processor 940 can be sent to the encoder/decoder 960 to encode/decode the image data; the encoded image data can be saved and decompressed before being shown on the display 970. The encoder/decoder 960 can be implemented by a CPU, a GPU or a coprocessor.
The statistics determined by the ISP processor 940 can be sent to the control logic device 950. The statistics may include image sensor 914 information such as auto exposure, auto white balance, auto focus, flicker detection, black level compensation and lens 912 shading correction. The control logic device 950 may include a processor and/or microcontroller executing one or more routines (such as firmware) that determine, from the received statistics, the control parameters of the imaging device 910 and of the ISP processor 940. For example, the control parameters of the imaging device 910 may include sensor 920 control parameters (such as gain, the integration time of exposure control, or stabilization parameters), camera flash control parameters, lens 912 control parameters (such as the focal length for focusing or zooming), or combinations of these parameters. The ISP control parameters may include gain levels and colour correction matrices for auto white balance and colour adjustment (for example during RGB processing), and lens 912 shading correction parameters.
An electronic device implementing the above image processing technique can realize the video processing method described in the embodiments of this application.
Any reference to memory, storage, a database or another medium used in this application may include non-volatile and/or volatile memory. Suitable non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM), used as external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
The above embodiments express only several implementations of the application; their description is relatively specific and detailed, but must not be interpreted as limiting the patent scope of the application. It should be pointed out that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of this application, and these fall within the scope of protection of the application. The scope of protection of this application patent is therefore subject to the appended claims.

Claims (10)

1. A video processing method, characterized by comprising:
performing scene recognition on images in a video to obtain scene labels of the images;
obtaining target images whose scene labels contain an input key label; and
generating a target video from the target images.
2. The method according to claim 1, characterized in that the method further comprises:
extracting timestamps of the target images in the video;
when the difference between the timestamps of two adjacent target images is below a threshold, judging the two adjacent target images to belong to the same segment; and
taking segments whose duration exceeds a first preset duration as preselected fragments, and generating the target video from the preselected fragments.
3. The method according to claim 2, characterized in that generating the target video from the preselected fragments comprises:
receiving a selection instruction for the preselected fragments; and
generating the target video from the selected preselected fragments according to the selection instruction.
4. The method according to claim 2, characterized in that generating the target video from the preselected fragments comprises:
when a preselected fragment is longer than a second preset duration, detecting whether the target images in the preselected fragment carry a portrait label;
when the target images in the preselected fragment carry a portrait label, traversing the preselected fragment with a traversal window of the second preset duration and detecting the occurrences of the portrait label in the sub-fragment covered by each window position; and
taking the sub-fragment with the most occurrences of the portrait label as the target video.
5. The method according to claim 2, characterized in that generating the target video from the preselected fragments comprises:
when a preselected fragment is longer than a second preset duration, traversing the preselected fragment with a traversal window of the second preset duration and building the colour histogram of the sub-fragment covered by each window position;
detecting the dispersion of the colour histograms; and
taking the sub-fragment corresponding to the colour histogram with the smallest dispersion as the target video.
6. The method according to claim 1, characterized in that performing scene recognition on the images in the video to obtain the scene labels of the images comprises:
performing scene recognition on the images in the video to obtain multiple scene labels for each image in the video.
7. The method according to claim 1, characterized in that performing scene recognition on the images in the video to obtain the scene labels of the images comprises:
extracting one frame from the video every preset number of frames, and performing scene recognition on the extracted frame to obtain its scene label.
8. A video processing apparatus, characterized by comprising:
a scene recognition module configured to perform scene recognition on images in a video to obtain scene labels of the images;
a target image obtaining module configured to obtain target images whose scene labels contain an input key label; and
a target video generation module configured to generate a target video from the target images.
9. An electronic device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the video processing method according to any one of claims 1 to 7.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, realizes the steps of the method according to any one of claims 1 to 7.
CN201810588001.0A 2018-06-08 2018-06-08 Method for processing video frequency and device, electronic equipment, computer readable storage medium Pending CN108830208A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810588001.0A CN108830208A (en) 2018-06-08 2018-06-08 Method for processing video frequency and device, electronic equipment, computer readable storage medium
PCT/CN2019/087553 WO2019233262A1 (en) 2018-06-08 2019-05-20 Video processing method, electronic device, and computer readable storage medium


Publications (1)

Publication Number Publication Date
CN108830208A true CN108830208A (en) 2018-11-16

Family

ID=64143497

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810588001.0A Pending CN108830208A (en) 2018-06-08 2018-06-08 Video processing method and device, electronic equipment, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN108830208A (en)
WO (1) WO2019233262A1 (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109922373A (en) * 2019-03-14 2019-06-21 上海极链网络科技有限公司 Video processing method, device and storage medium
CN110139158A (en) * 2019-06-21 2019-08-16 上海摩象网络科技有限公司 Video and sub-video generation method and device, and electronic equipment
CN110147711A (en) * 2019-02-27 2019-08-20 腾讯科技(深圳)有限公司 Video scene recognition method, device, storage medium and electronic device
CN110390315A (en) * 2019-07-29 2019-10-29 深兰科技(上海)有限公司 Image processing method and device
WO2019233262A1 (en) * 2018-06-08 2019-12-12 Oppo广东移动通信有限公司 Video processing method, electronic device, and computer readable storage medium
CN110996169A (en) * 2019-07-12 2020-04-10 北京达佳互联信息技术有限公司 Method, device, electronic equipment and computer-readable storage medium for clipping video
CN111291692A (en) * 2020-02-17 2020-06-16 咪咕文化科技有限公司 Video scene recognition method and device, electronic equipment and storage medium
CN111405197A (en) * 2020-03-19 2020-07-10 北京海益同展信息科技有限公司 Video clipping method, image processing method and device
CN111432138A (en) * 2020-03-16 2020-07-17 Oppo广东移动通信有限公司 Video splicing method and device, computer readable medium and electronic equipment
CN111597381A (en) * 2020-04-16 2020-08-28 国家广播电视总局广播电视科学研究院 Content generation method, device and medium
CN111738107A (en) * 2020-06-08 2020-10-02 Oppo(重庆)智能科技有限公司 Video generation method, video generation device, storage medium, and electronic apparatus
CN112560698A (en) * 2020-12-18 2021-03-26 北京百度网讯科技有限公司 Image processing method, apparatus, device and medium
CN113079420A (en) * 2020-01-03 2021-07-06 北京三星通信技术研究有限公司 Video generation method and device, electronic equipment and computer readable storage medium
CN113395542A (en) * 2020-10-26 2021-09-14 腾讯科技(深圳)有限公司 Video generation method and device based on artificial intelligence, computer equipment and medium
CN113542777A (en) * 2020-12-25 2021-10-22 腾讯科技(深圳)有限公司 Live video editing method and device and computer equipment
EP3886446A4 (en) * 2018-11-27 2021-12-08 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Video enhancement control method, device, electronic equipment and storage medium
CN113852858A (en) * 2021-08-19 2021-12-28 阿里巴巴(中国)有限公司 Video processing method and electronic equipment

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111581433B (en) * 2020-05-18 2023-10-10 Oppo广东移动通信有限公司 Video processing method, device, electronic equipment and computer readable medium
CN111951244B (en) * 2020-08-11 2024-03-01 北京百度网讯科技有限公司 Method and device for detecting single-color screen in video file
CN115022710B (en) * 2022-05-30 2023-09-19 咪咕文化科技有限公司 Video processing method, device and readable storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102332001A (en) * 2011-07-26 2012-01-25 深圳市万兴软件有限公司 Video thumbnail generation method and device
CN102819528A (en) * 2011-06-10 2012-12-12 中国电信股份有限公司 Method and device for generating a video summary
US20140056479A1 (en) * 2012-08-21 2014-02-27 International Business Machines Corporation Determination of train presence and motion state in railway environments
CN104284241A (en) * 2014-09-22 2015-01-14 北京奇艺世纪科技有限公司 Video editing method and device
CN104811787A (en) * 2014-10-27 2015-07-29 深圳市腾讯计算机系统有限公司 Game video recording method and game video recording device
US9313556B1 (en) * 2015-09-14 2016-04-12 Logitech Europe S.A. User interface for video summaries
US9721165B1 (en) * 2015-11-13 2017-08-01 Amazon Technologies, Inc. Video microsummarization
CN107483843A (en) * 2017-08-16 2017-12-15 成都品果科技有限公司 Audio frequency and video match clipping method and device
CN107566907A (en) * 2017-09-20 2018-01-09 广东欧珀移动通信有限公司 video clipping method, device, storage medium and terminal

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108830208A (en) * 2018-06-08 2018-11-16 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Video processing method and device, electronic equipment, and computer-readable storage medium

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019233262A1 (en) * 2018-06-08 2019-12-12 Oppo广东移动通信有限公司 Video processing method, electronic device, and computer readable storage medium
US11490157B2 (en) 2018-11-27 2022-11-01 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method for controlling video enhancement, device, electronic device and storage medium
EP3886446A4 (en) * 2018-11-27 2021-12-08 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Video enhancement control method, device, electronic equipment and storage medium
CN110147711A (en) * 2019-02-27 2019-08-20 腾讯科技(深圳)有限公司 Video scene recognition method, device, storage medium and electronic device
CN110147711B (en) * 2019-02-27 2023-11-14 腾讯科技(深圳)有限公司 Video scene recognition method and device, storage medium and electronic device
CN109922373A (en) * 2019-03-14 2019-06-21 上海极链网络科技有限公司 Video processing method, device and storage medium
CN109922373B (en) * 2019-03-14 2021-09-28 上海极链网络科技有限公司 Video processing method, device and storage medium
CN110139158B (en) * 2019-06-21 2021-04-02 上海摩象网络科技有限公司 Video and sub-video generation method and device, and electronic equipment
CN110139158A (en) * 2019-06-21 2019-08-16 上海摩象网络科技有限公司 Video and sub-video generation method and device, and electronic equipment
CN110996169A (en) * 2019-07-12 2020-04-10 北京达佳互联信息技术有限公司 Method, device, electronic equipment and computer-readable storage medium for clipping video
CN110390315A (en) * 2019-07-29 2019-10-29 深兰科技(上海)有限公司 Image processing method and device
CN113079420A (en) * 2020-01-03 2021-07-06 北京三星通信技术研究有限公司 Video generation method and device, electronic equipment and computer readable storage medium
CN111291692A (en) * 2020-02-17 2020-06-16 咪咕文化科技有限公司 Video scene recognition method and device, electronic equipment and storage medium
CN111291692B (en) * 2020-02-17 2023-10-20 咪咕文化科技有限公司 Video scene recognition method and device, electronic equipment and storage medium
CN111432138B (en) * 2020-03-16 2022-04-26 Oppo广东移动通信有限公司 Video splicing method and device, computer readable medium and electronic equipment
CN111432138A (en) * 2020-03-16 2020-07-17 Oppo广东移动通信有限公司 Video splicing method and device, computer readable medium and electronic equipment
CN111405197A (en) * 2020-03-19 2020-07-10 北京海益同展信息科技有限公司 Video clipping method, image processing method and device
CN111597381A (en) * 2020-04-16 2020-08-28 国家广播电视总局广播电视科学研究院 Content generation method, device and medium
CN111738107A (en) * 2020-06-08 2020-10-02 Oppo(重庆)智能科技有限公司 Video generation method, video generation device, storage medium, and electronic apparatus
CN113395542A (en) * 2020-10-26 2021-09-14 腾讯科技(深圳)有限公司 Video generation method and device based on artificial intelligence, computer equipment and medium
CN112560698A (en) * 2020-12-18 2021-03-26 北京百度网讯科技有限公司 Image processing method, apparatus, device and medium
CN112560698B (en) * 2020-12-18 2024-01-16 北京百度网讯科技有限公司 Image processing method, device, equipment and medium
CN113542777A (en) * 2020-12-25 2021-10-22 腾讯科技(深圳)有限公司 Live video editing method and device and computer equipment
CN113852858A (en) * 2021-08-19 2021-12-28 阿里巴巴(中国)有限公司 Video processing method and electronic equipment

Also Published As

Publication number Publication date
WO2019233262A1 (en) 2019-12-12

Similar Documents

Publication Publication Date Title
CN108830208A (en) Video processing method and device, electronic equipment, and computer-readable storage medium
CN108875619A (en) Video processing method and device, electronic equipment, and computer-readable storage medium
CN108777815B (en) Video processing method and device, electronic equipment and computer readable storage medium
CN108805103A (en) Image processing method and device, electronic equipment, computer readable storage medium
CN110473185B (en) Image processing method and device, electronic equipment and computer readable storage medium
CN108810413A (en) Image processing method and device, electronic equipment, computer readable storage medium
CN108764208A (en) Image processing method and device, storage medium, electronic equipment
CN108875820A (en) Information processing method and device, electronic equipment, computer readable storage medium
CN110334635A (en) Subject tracking method, device, electronic equipment and computer-readable storage medium
CN108984657A (en) Image recommendation method and apparatus, terminal, and readable storage medium
CN110298862A (en) Video processing method, device, computer-readable storage medium and computer equipment
CN108805198A (en) Image processing method, device, computer readable storage medium and electronic equipment
CN108734676A (en) Image processing method and device, electronic equipment, computer readable storage medium
WO2022160895A1 (en) Image processing method, image processing apparatus, electronic system and readable storage medium
CN109712177A (en) Image processing method, device, electronic equipment and computer readable storage medium
CN108959462A (en) Image processing method and device, electronic equipment, computer readable storage medium
CN107911625A (en) Light metering method, device, readable storage medium and computer equipment
CN108897786A (en) Application program recommendation method, device, storage medium and mobile terminal
CN108717530A (en) Image processing method, device, computer readable storage medium and electronic equipment
CN108734214A (en) Image recognition method and device, electronic equipment, storage medium
CN109360254A (en) Image processing method and device, electronic equipment, computer readable storage medium
CN108616700A (en) Image processing method and device, electronic equipment, computer readable storage medium
CN110121031A (en) Image-pickup method and device, electronic equipment, computer readable storage medium
CN108804658A (en) Image processing method and device, storage medium, electronic equipment
CN108848306A (en) Image processing method and device, electronic equipment, computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 2018-11-16