CN110351597A - Video clipping method, apparatus, and electronic device - Google Patents

Video clipping method, apparatus, and electronic device

Info

Publication number
CN110351597A
CN110351597A (application CN201810308302.3A)
Authority
CN
China
Prior art keywords
video
classification
image
image set
chosen
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201810308302.3A
Other languages
Chinese (zh)
Inventor
刘兆艳
肖其虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp filed Critical ZTE Corp
Priority to CN201810308302.3A
Priority to PCT/CN2019/081749
Publication of CN110351597A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F 18/2135 Feature extraction based on approximation criteria, e.g. principal component analysis
    • G06F 18/24 Classification techniques
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V 20/48 Matching video sequences
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation
    • G06V 40/172 Classification, e.g. identification
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N 21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/83 Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N 21/845 Structuring of content, e.g. decomposing content into time segments
    • H04N 21/8456 Structuring of content by decomposing the content in the time domain, e.g. in time segments

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Signal Processing (AREA)
  • Evolutionary Computation (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

Embodiments of the invention disclose a video clipping method and apparatus and an electronic device. The method comprises: obtaining a video image of a video to be clipped at each of multiple time points to form an image set; classifying all the video images in the image set; and obtaining the video images of a chosen classification from the image set and generating a corresponding classification video from the video images of the chosen classification. Through the embodiments of the present invention, the video clipping operation is made convenient, personalized classification videos can be clipped out, the user experience is improved, and users' individual needs are met.

Description

Video clipping method, apparatus, and electronic device
Technical field
This application relates to the field of video processing, and in particular to a video clipping method and apparatus and an electronic device.
Background technique
Video clipping is the non-linear editing of a video source: materials such as added pictures, background music, special effects, and scenes are remixed with the video, the video source is cut and merged, and a new video with different expressive power is generated through secondary encoding.
With the rapid development of the Internet, users share videos through applications on electronic devices. At present, the social platforms do not support long videos, so if a video shot by a user is too long, it must be cut. People also increasingly pursue personalization and prefer the videos they share to be distinctive, which likewise requires personalized cutting and editing of video.
The usual practice of mainstream video editing software on the market is to trim the head and tail of a video: the video content between one chosen time point and another is selected, and the video and audio data of this segment are packaged into a new video file according to the rules of a certain video format container.
This method meets users' basic need to cut video, but it has drawbacks. Editing software on a PC (Personal Computer) can repeat the method to select multiple segments for cutting, but on a mobile phone the interaction design cannot be too complex, so most applications only support selecting one continuous piece of video and cannot select an arbitrary number of segments, which restricts the user's operations and degrades the user experience.
Summary of the invention
Embodiments of the invention provide a video clipping method and apparatus and an electronic device, to improve the user experience.
An embodiment of the invention provides a video clipping method, comprising:
obtaining a video image of a video to be clipped at each of multiple time points to form an image set;
classifying all the video images in the image set;
obtaining the video images of a chosen classification from the image set, and generating a corresponding classification video from the video images of the chosen classification.
An embodiment of the invention further provides a video clipping apparatus, comprising:
an image set obtaining module, configured to obtain a video image of a video to be clipped at each of multiple time points to form an image set;
a classification module, configured to classify all the video images in the image set;
a clipping module, configured to obtain the video images of a chosen classification from the image set and generate a corresponding classification video from the video images of the chosen classification.
An embodiment of the invention further provides an electronic device, comprising:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to perform the following operations:
obtaining a video image of a video to be clipped at each of multiple time points to form an image set;
classifying all the video images in the image set;
obtaining the video images of a chosen classification from the image set, and generating a corresponding classification video from the video images of the chosen classification.
The embodiments of the present invention comprise: obtaining a video image of a video to be clipped at each of multiple time points to form an image set; classifying all the video images in the image set; obtaining the video images of a chosen classification from the image set, and generating a corresponding classification video from the video images of the chosen classification. Through the embodiments of the present invention, the video clipping operation is made convenient, personalized classification videos can be clipped out, the user experience is improved, and users' individual needs are met.
Other features and advantages of the present invention will be set forth in the following description and will in part become apparent from the description or be understood by practicing the invention. The objectives and other advantages of the invention can be realized and obtained by the structures particularly pointed out in the description, the claims, and the accompanying drawings.
Brief description of the drawings
The accompanying drawings provide a further understanding of the technical solution of the present invention and form a part of the specification. Together with the embodiments of the application, they explain the technical solution of the present invention and do not limit it.
Fig. 1 is a flowchart of the video clipping method of an embodiment of the present invention;
Fig. 2 is a schematic diagram of the video clipping method of an embodiment of the present invention;
Fig. 3 is a flowchart of the video clipping method of an application example of the present invention;
Fig. 4 is a schematic diagram of the GUI (Graphical User Interface) of the electronic device of an application example of the present invention;
Fig. 5 is a schematic diagram of the GUI after the user selects classification A;
Fig. 6 is a schematic diagram of the new video of classification A obtained after the user selects clipping;
Fig. 7 is a schematic diagram of the video clipping apparatus of an embodiment of the present invention;
Fig. 8 is a schematic diagram of the video clipping apparatus of another embodiment of the present invention.
Detailed description of embodiments
Embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be noted that, in the case of no conflict, the embodiments of the application and the features in the embodiments may be combined with each other arbitrarily.
The steps shown in the flowcharts of the accompanying drawings may be executed in a computer system such as a set of computer-executable instructions. Moreover, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in an order different from that herein.
With the rapid development of the Internet, users who share videos through applications on electronic devices want personalized ways of cutting and editing. In addition, with the spread of surveillance equipment, massive amounts of video are produced; finding video content of interest in massive video is increasingly difficult, and searching manually is time-consuming and laborious. Embodiments of the present invention propose recognizing and classifying video content so that the video of a classification of interest can be extracted.
As shown in Fig. 1 and Fig. 2, the video clipping method of the embodiment of the present invention comprises:
Step 101: obtain a video image of the video to be clipped at each of multiple time points to form an image set.
The video to be clipped may be recorded for a certain scene by the user with a video recording application on the electronic device, may be a video the user obtained through other channels, such as downloading from the network, or may be obtained by automatic recording of surveillance equipment.
The multiple time points may be consecutive time points, for example every n seconds, where n is a number greater than 0 and may be set differently according to the time length of the video to be clipped. For example, to obtain as many video images as possible for image recognition, n may equal 1, that is, one video image is obtained every second.
The video images, also called video frame images, may be key frames (I frames), P frames, or B frames of the video, determined according to the compression of the video to be clipped; video recorded by recording software on an electronic device usually uses inter-frame compression, and the video image taken at a time point is usually a key frame image (I frame).
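As an illustration of step 101, the following is a minimal sketch of sampling one frame every n seconds with OpenCV; the patent does not prescribe a particular library, and the function and variable names here are illustrative assumptions.

```python
import cv2

def sample_frames(video_path, interval_s=1.0):
    """Step 101 sketch: grab one frame every interval_s seconds to form the image set."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    duration = total / fps if fps else 0.0
    frames, stamps = [], []
    t = 0.0
    while t < duration:
        cap.set(cv2.CAP_PROP_POS_MSEC, t * 1000)  # seek to the time point
        ok, frame = cap.read()                     # decode the frame at that position
        if not ok:
            break
        frames.append(frame)
        stamps.append(t)
        t += interval_s
    cap.release()
    return frames, stamps
```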
In an embodiment, after step 101, the method may further include: generating a thumbnail corresponding to each video image in the image set. And it may further include: displaying the thumbnail corresponding to each video image in the image set.
In an embodiment of the present invention, the user can learn the content of the video to be clipped through the corresponding thumbnails. By generating a thumbnail corresponding to each image of the multi-frame video images, storage space of the electronic device can be saved while the thumbnails are displayed, the user is guaranteed to see accurate video content, and the user experience is improved.
Step 102: classify all the video images in the image set.
In an embodiment, before the classification, the method may further include: preprocessing all the video images in the image set.
The preprocessing may include: inputting the image data of the video images, converting the data type of the image data, and performing data normalization and whitening.
The image data may be input by image decoding or by reading the image file;
Converting the data type of the image data means converting the image data to a type better suited to the classification algorithm, for example converting an int type (integer) to a float type (floating point);
For data normalization, common methods include simple rescaling, per-sample mean subtraction, and feature standardization (giving every feature in the data set zero mean and unit variance). Simple rescaling readjusts the values of each dimension of the data so that the final data vector falls in the interval [0, 1] or [-1, 1]. For color images there is no smoothness across color channels, so feature scaling is usually applied to the data: the pixel values obtained lie in the interval [0, 255], and the common treatment is to divide these pixel values by 255 so that they are scaled into [0, 1];
Whitening generally includes PCA (Principal Component Analysis) whitening and ZCA (Zero-phase Component Analysis) whitening. PCA whitening makes the variance of each dimension of the data equal to 1, while ZCA whitening makes the variances of all dimensions identical. PCA whitening can be used for dimensionality reduction as well as for decorrelation, while ZCA whitening is mainly used for decorrelation and keeps the whitened data as close as possible to the original input. In PCA/ZCA whitening the covariance matrix becomes the identity matrix; the features must first be zero-centered, and a suitable epsilon (a regularization term with a low-pass filtering effect on the data) must be chosen; for color images a sufficiently large epsilon is generally selected for PCA/ZCA whitening.
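The preprocessing above can be made concrete with a short NumPy sketch; this is a rough illustration under stated assumptions (the frames are downscaled first, since whitening full-resolution frames is costly), not the patent's exact procedure.

```python
import numpy as np

def preprocess(images, eps=0.1):
    """Scale uint8 frames to [0,1], zero-center, then PCA/ZCA-whiten (step 102 prep)."""
    x = np.stack([im.astype(np.float32) / 255.0 for im in images])  # int -> float, [0,1]
    x = x.reshape(len(images), -1)          # one row per image
    x -= x.mean(axis=0)                     # zero-center each feature
    cov = np.cov(x, rowvar=False)
    u, s, _ = np.linalg.svd(cov)
    x_pca = x @ u                           # rotate onto principal components
    x_white = x_pca / np.sqrt(s + eps)      # PCA whitening: unit variance per dimension
    x_zca = x_white @ u.T                   # ZCA whitening: rotate back toward the input
    return x_zca
```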
In addition, to enlarge the data set for deep learning, image preprocessing may also include one or more of the following:
Image flipping: performing a series of flips on the input picture, such as left-right flips, up-down flips, and diagonal flips, to expand the amount of data so that images from all angles are available, which can also alleviate recognition errors;
Color transformation: for example adjusting the brightness, contrast, saturation, and hue of the image.
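A hedged sketch of these augmentations with OpenCV follows; the particular flips and jitter amounts are illustrative choices.

```python
import cv2

def augment(im):
    """Return simple flipped and color-jittered variants of one frame."""
    return [
        cv2.flip(im, 1),                              # left-right flip
        cv2.flip(im, 0),                              # up-down flip
        cv2.flip(im, -1),                             # flip on both axes
        cv2.convertScaleAbs(im, alpha=1.2, beta=10),  # contrast/brightness jitter
    ]
```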
In embodiments of the present invention, all the video images in the image set may be classified by a deep learning algorithm.
Embodiments of the present invention introduce artificial intelligence methods and classify the frame images of the video by deep learning. Deep learning here means learning based on deep neural networks, that is, neural networks with multiple hidden layers. By building machine learning models with many hidden layers and massive training data, more useful features are learned, which ultimately improves the accuracy of classification or prediction. Feature learning transforms the feature representation of a sample from the original space into a new feature space through layer-by-layer feature transformation, making classification or prediction easier. Deep learning can automatically learn feature representations from big data, and a model may contain thousands of parameters or more. For example, the convolutional network model used by Hinton's research group in the 2012 ImageNet ILSVRC competition contained 60 million parameters in its feature representation, learned from millions of samples.
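As one possible concrete form of such a classifier, the sketch below labels the sampled frames with a CNN pre-trained on ImageNet, assuming PyTorch/torchvision; the patent does not fix a model or framework, and in practice the network would be trained or fine-tuned on the target classifications.

```python
import torch
from torchvision import models, transforms

model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1).eval()
prep = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize(256), transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

def classify_frames(frames):
    """Return one predicted class id per frame, used as its classification marker."""
    rgb = [f[:, :, ::-1].copy() for f in frames]  # OpenCV frames are BGR; convert to RGB
    batch = torch.stack([prep(f) for f in rgb])
    with torch.no_grad():
        logits = model(batch)
    return logits.argmax(dim=1).tolist()
```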
The deep learning training process may include:
1) Bottom-up unsupervised learning, which is a feature learning process.
Each layer's parameters are trained in order using unlabeled data (or labeled data); this step can be regarded as an unsupervised training process and is the biggest difference from traditional neural networks.
First, the first layer is trained with unlabeled or labeled data, learning the parameters of the first layer (this layer can be regarded as the hidden layer of a three-layer neural network that minimizes the difference between output and input);
After layer n-1 has been learned, the output of layer n-1 is used as the input of layer n; training layer n in this way yields the parameters of each layer.
2) Top-down supervised learning, which is a tuning process for the whole network.
Based on the layer parameters obtained in the first step, the network is trained with labeled data, and errors are propagated top-down to further fine-tune the parameters of the entire multi-layer network model; this step is a supervised training process.
Feature representations learned from ImageNet have strong generalization ability and can be successfully applied to other data sets and tasks. Many applications have only small training sets; in that case deep learning can be used in three ways:
(1) The model trained on ImageNet can be used as a starting point and further trained with the target training set and backpropagation, adapting the model to the specific application; ImageNet then plays a pre-training role.
(2) If the target training set is not large enough, the network parameters of the lower layers can be fixed, keeping the results trained on ImageNet, and only the upper layers are updated (see the sketch after this list). This is because the lower-layer parameters are the hardest to update, and the bottom-level filters learned from ImageNet often describe a variety of local edges and textures; these filters generalize well to images in general.
(3) The model trained on ImageNet can be used directly, taking the output of the highest hidden layer as the feature representation in place of commonly hand-designed features.
The ImageNet data set is currently the largest image recognition data set in the world and is applied in very many fields of deep learning for images. It contains more than 14 million pictures covering more than 20,000 categories, of which over a million pictures carry explicit category annotations. The data set is maintained by dedicated staff, and the annotations are updated every year.
In an embodiment, after step 102, the method may further include: setting a classification marker on the video images of each classification.
The classification marker indicates the classification to which a video image belongs.
In an embodiment, after step 102, the method may further include: selecting one video image from the video images of each classification and generating a thumbnail.
In the embodiment of the present invention, one video image may be selected from all the video frame images of each classification as the representative video image of that classification, and the thumbnail corresponding to each classification's video image is generated. The selection may follow a preset rule, for example taking the first, a middle, or the last of the video frame images of each classification, or a video image may be selected at random from the video frame images of each classification.
In an embodiment, the method may further include: displaying the thumbnail corresponding to each classification.
By displaying the thumbnail corresponding to each classification, the user can see which classifications of content the video to be clipped contains and can understand the video to be clipped from different perspectives, improving the user experience.
Step 103: obtain the video images of the chosen classification from the image set, and generate the corresponding classification video from the video images of the chosen classification.
In an embodiment, before step 103, the method may further include: receiving a first user instruction, the first user instruction being used to indicate the chosen classification.
The first user instruction may be a selection instruction of the user; for example, the user may select a classification through an input tool such as a touch screen, mouse, or keyboard.
In an embodiment, obtaining the video images of the chosen classification from the image set comprises: obtaining the video images marked as the chosen classification from the image set.
That is, the video images of the chosen classification can be obtained according to the classification markers.
In an embodiment, after the video images of the chosen classification are obtained from the image set, the method may further include: generating the thumbnails corresponding to all the video images of the chosen classification; and it may further include: displaying the thumbnails corresponding to all the video images of the classification.
In an embodiment of the present invention, through the thumbnails of all the video images under the chosen classification, the user can learn what content the selected classification contains and can understand and view the video from the perspective of classifications, enhancing the personalized experience of the user.
In an embodiment, before the corresponding classification video is generated from the video images of the chosen classification, the method may further include: receiving a second user instruction, the second user instruction being used to indicate that the classification video is to be generated.
The second user instruction may be a confirmation instruction of the user; upon receiving the user's confirmation, generation of the classification video starts.
In an embodiment, generating the corresponding classification video from the video images of the chosen classification comprises: merging and encoding the video clips corresponding to the video images of the classification to generate the classification video.
Each video image corresponds to the video clip of its time-point interval; for example, in step 101 one video image is obtained every second, so each video image corresponds to a 1-second video clip.
In this step, the video clips corresponding to the video images of the classification are merged and, according to the required video format, secondarily encoded to generate the classification video, as in the sketch below.
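A minimal sketch of this merge-and-encode step, assuming moviepy (1.x API); `stamps` and `labels` are the per-frame timestamps and classification markers produced earlier, `interval_s` matches the sampling interval of step 101, and the output filename is illustrative.

```python
from moviepy.editor import VideoFileClip, concatenate_videoclips

def clip_classification(video_path, stamps, labels, chosen, interval_s=1.0):
    """Merge the clips whose frames carry the chosen marker, then re-encode (step 103)."""
    src = VideoFileClip(video_path)
    clips = [src.subclip(t, min(t + interval_s, src.duration))
             for t, lab in zip(stamps, labels) if lab == chosen]
    merged = concatenate_videoclips(clips)
    merged.write_videofile("classification_video.mp4")  # secondary encoding
```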
Through the embodiments of the present invention, the video clipping operation is made convenient, personalized classification videos can be clipped out, the user experience is improved, and users' individual needs are met.
An application example is described below.
Fig. 3 shows the video clipping method of an application example of the present invention; Fig. 4 to Fig. 6 are schematic diagrams of the GUI of the electronic device of this application example.
As shown in Fig. 4 and Fig. 5, the GUI includes a video display area 401, a video-frame thumbnail display area 402 for the video to be clipped, a classification thumbnail display area 403, a thumbnail display area 404 for all images under a classification, the current playback time 405, the total duration 406 of the video, and a clip control 407.
The video display area 401 is used to display the playing video.
The video-frame thumbnail display area 402 of the video to be clipped is used to show the thumbnails of the video frame images of the video to be clipped; the user can view all the thumbnails completely by sliding them.
The classification thumbnail display area 403 is used to show the thumbnail corresponding to the image of each classification.
The thumbnail display area 404 for all images under a classification is used to show all the thumbnails of that classification.
The current playback time 405 shows the current progress while the video plays.
The total duration 406 shows the total duration of the video; after a classification has been selected, it shows the duration of the new video obtained by merging the video clips corresponding to all the images of that classification.
The clip control 407 is a clip button; after it is clicked, the video clips corresponding to all the images of the classification are merged and encoded, generating the new classification video.
Referring to Fig. 3, the video clipping method of the application example of the present invention comprises:
Step 301: obtain the video to be clipped and start the clipping operation.
The executing subject of this application example may be a video clipping application in an electronic device.
The application may be a software program running on an electronic device, and the electronic device may be a personal computer, a cloud device, or a mobile device such as a smartphone or a tablet computer.
The video to be clipped is the video that needs clipping. It may be recorded for some scene by the user with video recording software on the electronic device, or it may be a video the user obtained through other channels, for example downloaded from the network or obtained from surveillance equipment.
The user triggers the clipping operation on the video to be clipped; the trigger may be selecting a video of any length from a folder of the electronic device and importing it into the video clipping application, or invoking an additional processing function, such as clipping, on a video of any length.
Step 302: obtain a video image of the video to be clipped at each of multiple time points, forming an image set.
In an embodiment of the present invention, the user can obtain the content of the video to be clipped through the video images at each of multiple time points. The obtained video frame images may be key frames (I frames), P frames, or B frames of the video, as determined by the background operating system of the video clipping application according to the compression of the video to be clipped; video recorded by recording software on an electronic device usually uses inter-frame compression, and the video image taken at a time point is usually a key frame image (I frame).
The multiple time points may be consecutive time points, for example every n seconds, where n is a number greater than 0 and may be set differently by the background operating system of the video clipping application according to the time length of the video to be clipped.
Step 310: for the obtained image set, generate the thumbnail corresponding to each video image in the image set.
In an embodiment of the present invention, the user can learn the content of the video to be clipped through the corresponding thumbnails. By generating a thumbnail corresponding to each image of the multi-frame video images, storage space of the electronic device can be saved while the thumbnails are displayed, the user is guaranteed to see accurate video content, and the user experience is improved.
Step 311: display all the thumbnails.
In the application example of the present invention, the user can view all the thumbnails completely by sliding them. The display area is the thumbnail display area 402 of the video to be clipped in Fig. 4.
Step 303: preprocess every video image in the image set obtained in step 302 (which may include converting the data type of the image data, data normalization, whitening, etc.) in preparation for image recognition.
Step 304: classify all the video images in the image set by a deep learning algorithm, recognizing the video images into different classifications.
Deep learning here includes feature learning and classification model learning, and the actual implementation differs according to the application scenario.
For example, for the surveillance video of a garage or a highway, the license plate number of each vehicle can be chosen as the feature to extract from the recognized images, and intelligent recognition and classification is performed according to the license plate number. License plate recognition mainly has 3 parts, including:
first, license plate location, generally using color-based location, feature-based location, and so on;
second, license plate segmentation, generally using the projection method;
third, character recognition, for which convolutional neural networks give relatively good recognition results. The number of samples used when training the network is large, and the characters in the training set include all digits and letters and the Chinese characters of some provinces. The trained network can then be used for character recognition.
For another example, for the surveillance video of a public place such as a kindergarten, facial features are extracted from the recognized images, and faces are intelligently recognized and classified using a face recognition algorithm.
A deep learning model can learn a layered feature representation of face images, which may include:
the bottom layer, which learns filters from raw pixels that capture local edge and texture features;
the middle layers, whose filters, obtained by combining the various edge filters, can describe different types of facial parts;
the top layer, which describes the global features of the whole face.
Deep learning provides a distributed feature representation. In the highest hidden layer, each neuron represents an attribute classifier, for example male/female, ethnicity, or hair color.
In the face recognition field, Labeled Faces in the Wild (LFW), created in 2007, is currently the best-known face recognition test set. LFW collected face photos of more than 5,000 celebrities from the Internet to evaluate the performance of face recognition algorithms under unconstrained conditions. These photos often exhibit complex variations in lighting, expression, pose, age, and occlusion. The LFW test set contains 6,000 pairs of face images: 3,000 pairs are positive samples, the two images in each pair belonging to the same person; the remaining 3,000 pairs are negative samples, the two images in each pair belonging to different people. The accuracy of random guessing is 50%. The classic face recognition algorithm Eigenface achieves only a 60% recognition rate on this test set. Among non-deep-learning algorithms the best recognition rate is 96.33%, while deep learning can currently reach a 99.47% recognition rate.
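As an illustration of how verification on such image pairs is scored, the sketch below compares two face embeddings by cosine similarity; the embedding model and the threshold are assumptions, since the patent does not specify a particular face recognition algorithm.

```python
import numpy as np

def same_person(emb_a, emb_b, threshold=0.6):
    """Decide whether two face embeddings (from an assumed model) match one identity."""
    cos = float(emb_a @ emb_b / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b)))
    return cos >= threshold
```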
Step 305: set classification markers on the video images according to the image recognition results.
Step 312: generate the thumbnail of the video image corresponding to each classification.
One image may be selected from all the video images of each classification as the representative video image of that classification, and the thumbnail corresponding to each classification's video image is generated.
In addition, since step 310 has already generated the thumbnails of all the video images, in this step the thumbnail corresponding to each classification's video image can also be taken directly from the thumbnails of all the video images.
Step 313: display the classification thumbnails.
Referring to Fig. 4, the classification thumbnail display area is the classification thumbnail display area 403 in Fig. 4. In this application example, the video images are divided into three classifications: A, B, and C.
Through the corresponding classification thumbnails, the user can see which classifications of content the video to be clipped contains and can understand the video from different perspectives, improving the user experience.
Step 306: receive the first user instruction and select a classification.
The first user instruction may be a selection instruction of the user; in the application example of the invention, the user can select a classification from the classification thumbnail display area 403 of Fig. 4.
Step 307: extract all the video images marked as that classification.
Step 314: generate the thumbnails corresponding to all the video images of that classification.
Since step 310 has already generated the thumbnails of all the video images, in this step all the thumbnails of the classification can also be taken directly from the thumbnails of all the video images.
Step 315: display all the thumbnails of the classification.
In the application example of the present invention, the display area is the thumbnail display area 404 for all images under a classification in Fig. 5; after the user selects classification A, all the thumbnails 404 marked A are shown.
Through the thumbnails of all images under a classification, the user can learn what content the selected classification contains and can understand and view the video from the perspective of classifications, enhancing the personalized experience of the user.
Step 308: receive the second user instruction and confirm clipping.
The second user instruction may be a confirmation instruction of the user; in the application example of the present invention, the user can confirm clipping all the video clips under the selected classification by clicking the clip icon 407 in Fig. 5.
Step 309: merge and encode the video clips corresponding to all the images of the classification, generating the new classification video.
As shown in Fig. 6, the user selects clipping and obtains the new video of classification A.
An embodiment of the present invention further provides a video clipping apparatus, which is used to implement the above embodiments and implementation modes; what has already been described will not be repeated. As used below, the term "module" may implement a combination of software and/or hardware of a predetermined function. Although the apparatus described in the following embodiments may be implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and conceivable.
As shown in Fig. 7, the video clipping apparatus of the embodiment of the present invention comprises:
an image set obtaining module 701, configured to obtain a video image of the video to be clipped at each of multiple time points, forming an image set;
a classification module 702, configured to classify all the video images in the image set;
a clipping module 703, configured to obtain the video images of the chosen classification from the image set and generate the corresponding classification video from the video images of the chosen classification.
Through the embodiments of the present invention, the video clipping operation is made convenient, personalized classification videos can be clipped out, the user experience is improved, and users' individual needs are met.
As shown in Fig. 8, in an embodiment the apparatus further comprises:
a first generation module 704, configured to generate the thumbnail corresponding to each video image in the image set.
In an embodiment, the apparatus further comprises:
a first display module 705, configured to display the thumbnail corresponding to each video image in the image set.
In an embodiment, the classification module 702 is configured to:
classify all the video images in the image set by a deep learning algorithm.
In an embodiment, the classification module 702 is further configured to preprocess all the video images in the image set before classifying them by the deep learning algorithm.
In an embodiment, the classification module 702 is further configured to: after classifying all the video images in the image set, set a classification marker on the video images of each classification;
and the clipping module is configured to obtain the video images marked as the chosen classification from the image set.
In an embodiment, the apparatus further comprises:
a second generation module 706, configured to select one video image from the video images of each classification and generate a thumbnail.
In an embodiment, the apparatus further comprises:
a second display module 707, configured to display the thumbnail corresponding to each classification.
In an embodiment, the apparatus further comprises:
a first receiving module 708, configured to receive the first user instruction, the first user instruction being used to indicate the chosen classification.
In an embodiment, the apparatus further comprises:
a third generation module 709, configured to generate the thumbnails corresponding to all the video images of the chosen classification.
In an embodiment, the apparatus further comprises:
a third display module 710, configured to display the thumbnails corresponding to all the video images of the chosen classification.
In an embodiment, the apparatus further comprises:
a second receiving module 711, configured to receive the second user instruction, the second user instruction being used to indicate generation of the classification video.
In an embodiment, the clipping module 703 is configured to:
merge and encode the video clips corresponding to the video images of the classification to generate the classification video.
An embodiment of the present invention further provides an electronic device. The electronic device may be a PC, a cloud device, or a mobile device such as a smartphone or a tablet computer. The electronic device comprises:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to perform the following operations:
obtaining a video image of the video to be clipped at each of multiple time points to form an image set;
classifying all the video images in the image set;
obtaining the video images of the chosen classification from the image set, and generating the corresponding classification video from the video images of the chosen classification.
In an embodiment, the processor is configured to perform the following operation:
after obtaining the video image of the video to be clipped at each of multiple time points and forming the image set, generating the thumbnail corresponding to each video image in the image set.
In an embodiment, the processor is configured to perform the following operation:
after generating the thumbnail corresponding to each video image in the image set, displaying the thumbnail corresponding to each video image in the image set.
In an embodiment, the processor is configured to perform the following operation:
classifying all the video images in the image set by a deep learning algorithm.
In an embodiment, the processor is configured to perform the following operation:
before classifying all the video images in the image set by the deep learning algorithm, preprocessing all the video images in the image set.
In an embodiment, the processor is configured to perform the following operations:
inputting the image data of the video images, converting the data type of the image data, and performing data normalization and whitening.
In an embodiment, the processor is configured to perform the following operations:
after classifying all the video images in the image set, setting a classification marker on the video images of each classification;
obtaining the video images marked as the chosen classification from the image set.
In an embodiment, the processor is configured to perform the following operation:
after classifying all the video images in the image set, selecting one video image from the video images of each classification and generating a thumbnail.
In an embodiment, the processor is configured to perform the following operation:
after selecting one video image from the video images of each classification and generating the thumbnail, displaying the thumbnail corresponding to each classification.
In an embodiment, the processor is configured to perform the following operation:
before obtaining the video images of the chosen classification from the image set, receiving the first user instruction, the first user instruction being used to indicate the chosen classification.
In an embodiment, the processor is configured to perform the following operation:
after obtaining the video images of the chosen classification from the image set, generating the thumbnails corresponding to all the video images of the chosen classification.
In an embodiment, the processor is configured to perform the following operation:
after generating the thumbnails corresponding to all the video images of the chosen classification, displaying the thumbnails corresponding to all the video images of the chosen classification.
In an embodiment, the processor is configured to perform the following operation:
before generating the corresponding classification video from the video images of the chosen classification, receiving the second user instruction, the second user instruction being used to indicate generation of the classification video.
In an embodiment, the processor is configured to perform the following operations:
merging and encoding the video clips corresponding to the video images of the classification to generate the classification video.
An embodiment of the present invention further provides a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to execute the above video clipping method.
In this embodiment, the above storage medium may include but is not limited to: a USB flash disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a removable hard disk, a magnetic disk, an optical disk, or any other medium that can store program code.
Obviously, those skilled in the art should understand that the modules or steps of the above embodiments of the present invention may be implemented by a general-purpose computing device; they may be concentrated on a single computing device or distributed over a network composed of multiple computing devices. Optionally, they may be implemented with program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be executed in an order different from that herein, or they may be made into individual integrated circuit modules, or multiple modules or steps among them may be made into a single integrated circuit module. Thus, the embodiments of the present invention are not limited to any specific combination of hardware and software.
Although the embodiments disclosed herein are as above, the content is only an implementation adopted to facilitate understanding of the present invention and is not intended to limit the invention. Any person skilled in the field of the present invention may make modifications and changes in the form and details of the implementation without departing from the spirit and scope disclosed by the present invention, but the patent protection scope of the present invention shall still be subject to the scope defined by the appended claims.

Claims (30)

1. A video clipping method, comprising:
obtaining a video image of a video to be clipped at each of multiple time points to form an image set;
classifying all the video images in the image set;
obtaining the video images of a chosen classification from the image set, and generating a corresponding classification video from the video images of the chosen classification.
2. The method according to claim 1, characterized in that, after obtaining the video image of the video to be clipped at each of multiple time points and forming the image set, the method further comprises:
generating a thumbnail corresponding to each video image in the image set.
3. The method according to claim 2, characterized in that, after generating the thumbnail corresponding to each video image in the image set, the method further comprises:
displaying the thumbnail corresponding to each video image in the image set.
4. The method according to claim 1, characterized in that classifying all the video images in the image set comprises:
classifying all the video images in the image set by a deep learning algorithm.
5. The method according to claim 4, characterized in that, before classifying all the video images in the image set by the deep learning algorithm, the method further comprises:
preprocessing all the video images in the image set.
6. The method according to claim 5, characterized in that preprocessing all the video images in the image set comprises:
inputting the image data of the video images, converting the data type of the image data, and performing data normalization and whitening.
7. The method according to claim 1, characterized in that:
after classifying all the video images in the image set, the method further comprises: setting a classification marker on the video images of each classification;
obtaining the video images of the chosen classification from the image set comprises: obtaining the video images marked as the chosen classification from the image set.
8. The method according to claim 1, characterized in that, after classifying all the video images in the image set, the method further comprises:
selecting one video image from the video images of each classification and generating a thumbnail.
9. The method according to claim 8, characterized in that, after selecting one video image from the video images of each classification and generating the thumbnail, the method further comprises:
displaying the thumbnail corresponding to each classification.
10. The method according to claim 1, characterized in that, before obtaining the video images of the chosen classification from the image set, the method further comprises:
receiving a first user instruction, the first user instruction being used to indicate the chosen classification.
11. The method according to claim 1, characterized in that, after obtaining the video images of the chosen classification from the image set, the method further comprises:
generating thumbnails corresponding to all the video images of the chosen classification.
12. The method according to claim 11, characterized in that, after generating the thumbnails corresponding to all the video images of the chosen classification, the method further comprises:
displaying the thumbnails corresponding to all the video images of the chosen classification.
13. The method according to claim 1, characterized in that, before generating the corresponding classification video from the video images of the chosen classification, the method further comprises:
receiving a second user instruction, the second user instruction being used to indicate generation of the classification video.
14. The method according to claim 1, characterized in that generating the corresponding classification video from the video images of the chosen classification comprises:
merging and encoding the video clips corresponding to the video images of the classification to generate the classification video.
15. The method according to any one of claims 1 to 14, characterized in that:
the multiple time points are consecutive time points.
16. The method according to any one of claims 1 to 14, characterized in that:
the video images are key frame images.
17. A video clipping apparatus, characterized by comprising:
an image set obtaining module, configured to obtain a video image of a video to be clipped at each of multiple time points to form an image set;
a classification module, configured to classify all the video images in the image set;
a clipping module, configured to obtain the video images of a chosen classification from the image set and generate a corresponding classification video from the video images of the chosen classification.
18. The apparatus according to claim 17, characterized by further comprising:
a first generation module, configured to generate a thumbnail corresponding to each video image in the image set.
19. The apparatus according to claim 18, characterized by further comprising:
a first display module, configured to display the thumbnail corresponding to each video image in the image set.
20. The apparatus according to claim 17, characterized in that the classification module is configured to:
classify all the video images in the image set by a deep learning algorithm.
21. The apparatus according to claim 20, characterized in that:
the classification module is further configured to preprocess all the video images in the image set before classifying them by the deep learning algorithm.
22. The apparatus according to claim 17, characterized in that:
the classification module is further configured to: after classifying all the video images in the image set, set a classification marker on the video images of each classification;
the clipping module is configured to obtain the video images marked as the chosen classification from the image set.
23. The apparatus according to claim 17, characterized by further comprising:
a second generation module, configured to select one video image from the video images of each classification and generate a thumbnail.
24. The apparatus according to claim 23, characterized by further comprising:
a second display module, configured to display the thumbnail corresponding to each classification.
25. The apparatus according to claim 17, characterized by further comprising:
a first receiving module, configured to receive a first user instruction, the first user instruction being used to indicate the chosen classification.
26. The apparatus according to claim 17, characterized by further comprising:
a third generation module, configured to generate thumbnails corresponding to all the video images of the chosen classification.
27. The apparatus according to claim 26, characterized by further comprising:
a third display module, configured to display the thumbnails corresponding to all the video images of the chosen classification.
28. The apparatus according to claim 17, characterized by further comprising:
a second receiving module, configured to receive a second user instruction, the second user instruction being used to indicate generation of the classification video.
29. The apparatus according to claim 17, characterized in that the clipping module is configured to:
merge and encode the video clips corresponding to the video images of the classification to generate the classification video.
30. An electronic device, characterized by comprising:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to perform the following operations:
obtaining a video image of a video to be clipped at each of multiple time points to form an image set;
classifying all the video images in the image set;
obtaining the video images of a chosen classification from the image set, and generating a corresponding classification video from the video images of the chosen classification.
CN201810308302.3A 2018-04-08 2018-04-08 A kind of method, apparatus and electronic equipment of video clipping Withdrawn CN110351597A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810308302.3A CN110351597A (en) 2018-04-08 2018-04-08 A kind of method, apparatus and electronic equipment of video clipping
PCT/CN2019/081749 WO2019196795A1 (en) 2018-04-08 2019-04-08 Video editing method, device and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810308302.3A CN110351597A (en) 2018-04-08 2018-04-08 A kind of method, apparatus and electronic equipment of video clipping

Publications (1)

Publication Number Publication Date
CN110351597A true CN110351597A (en) 2019-10-18

Family

ID=68163482

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810308302.3A Withdrawn CN110351597A (en) 2018-04-08 2018-04-08 A kind of method, apparatus and electronic equipment of video clipping

Country Status (2)

Country Link
CN (1) CN110351597A (en)
WO (1) WO2019196795A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115460446A (en) * 2022-08-19 2022-12-09 上海爱奇艺新媒体科技有限公司 Alignment method and device for multiple paths of video signals and electronic equipment

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW520602B (en) * 2001-06-28 2003-02-11 Ulead Systems Inc Device and method of editing video program
US7954065B2 (en) * 2006-12-22 2011-05-31 Apple Inc. Two-dimensional timeline display of media items
CN105763884A (en) * 2014-12-18 2016-07-13 广州市动景计算机科技有限公司 Video processing method, device and apparatus
CN104796781B (en) * 2015-03-31 2019-01-18 小米科技有限责任公司 Video clip extracting method and device
CN107295377B (en) * 2017-07-14 2020-12-01 程工 Film production method, device and system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102819528A (en) * 2011-06-10 2012-12-12 中国电信股份有限公司 Method and device for generating video abstraction
CN103079117A (en) * 2012-12-30 2013-05-01 信帧电子技术(北京)有限公司 Video abstract generation method and video abstract generation device
CN106937120A (en) * 2015-12-29 2017-07-07 北京大唐高鸿数据网络技术有限公司 Object-based monitor video method for concentration
CN107566907A (en) * 2017-09-20 2018-01-09 广东欧珀移动通信有限公司 video clipping method, device, storage medium and terminal

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110910470A (en) * 2019-11-11 2020-03-24 广联达科技股份有限公司 Method and device for generating high-quality thumbnail
CN110910470B (en) * 2019-11-11 2023-07-07 广联达科技股份有限公司 Method and device for generating high-quality thumbnail
CN110856037A (en) * 2019-11-22 2020-02-28 北京金山云网络技术有限公司 Video cover determination method and device, electronic equipment and readable storage medium
WO2022081081A1 (en) * 2020-10-15 2022-04-21 脸萌有限公司 Video distribution system and method, computing device, and user equipment
US11838576B2 (en) 2020-10-15 2023-12-05 Lemon Inc. Video distribution system, method, computing device and user equipment
CN113395542A (en) * 2020-10-26 2021-09-14 腾讯科技(深圳)有限公司 Video generation method and device based on artificial intelligence, computer equipment and medium
CN114302224A (en) * 2021-12-23 2022-04-08 新华智云科技有限公司 Intelligent video editing method, device, equipment and storage medium
CN114302224B (en) * 2021-12-23 2023-04-07 新华智云科技有限公司 Intelligent video editing method, device, equipment and storage medium
CN117177006A (en) * 2023-09-01 2023-12-05 湖南广播影视集团有限公司 CNN algorithm-based short video intelligent manufacturing method

Also Published As

Publication number Publication date
WO2019196795A1 (en) 2019-10-17

Similar Documents

Publication Publication Date Title
CN110351597A (en) A kind of method, apparatus and electronic equipment of video clipping
Deng et al. Image aesthetic assessment: An experimental survey
Williams et al. Images as data for social science research: An introduction to convolutional neural nets for image classification
Botha et al. Fake news and deepfakes: A dangerous threat for 21st century information security
Karayev et al. Recognizing image style
KR102290419B1 (en) Method and Appratus For Creating Photo Story based on Visual Context Analysis of Digital Contents
Höferlin et al. Inter-active learning of ad-hoc classifiers for video visual analytics
CN110889672B (en) Student card punching and class taking state detection system based on deep learning
US8548249B2 (en) Information processing apparatus, information processing method, and program
JP2011215963A (en) Electronic apparatus, image processing method, and program
Li et al. Videography-based unconstrained video analysis
Li et al. Data-driven affective filtering for images and videos
Zhang et al. A comprehensive survey on computational aesthetic evaluation of visual art images: Metrics and challenges
Vonikakis et al. A probabilistic approach to people-centric photo selection and sequencing
Ismail et al. Deepfake video detection: YOLO-Face convolution recurrent approach
Yang et al. A comprehensive survey on image aesthetic quality assessment
Verma et al. Age prediction using image dataset using machine learning
Somaini On the altered states of machine vision: Trevor Paglen, Hito Steyerl, Grégory Chatonsky
CN106657817A (en) Processing method applied to mobile phone platform for automatically making album MV
CN112040273B (en) Video synthesis method and device
Dao et al. Robust event discovery from photo collections using Signature Image Bases (SIBs)
Debnath et al. Computational approaches to aesthetic quality assessment of digital photographs: state of the art and future research directives
Yang et al. Learning the synthesizability of dynamic texture samples
Tian et al. Relative aesthetic quality ranking
Sasireka Comparative analysis on video retrieval technique using machine learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20191018