CN110351597A - Video clipping method, apparatus, and electronic device - Google Patents
Video clipping method, apparatus, and electronic device
- Publication number
- CN110351597A CN110351597A CN201810308302.3A CN201810308302A CN110351597A CN 110351597 A CN110351597 A CN 110351597A CN 201810308302 A CN201810308302 A CN 201810308302A CN 110351597 A CN110351597 A CN 110351597A
- Authority
- CN
- China
- Prior art keywords
- video
- classification
- image
- image set
- chosen
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2135—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/48—Matching video sequences
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/845—Structuring of content, e.g. decomposing content into time segments
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/845—Structuring of content, e.g. decomposing content into time segments
- H04N21/8456—Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- Signal Processing (AREA)
- Evolutionary Computation (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Biology (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Television Signal Processing For Recording (AREA)
Abstract
The embodiments of the invention disclose a video clipping method, apparatus, and electronic device. The method comprises: obtaining a video image of a video to be clipped at each of multiple time points to form an image set; classifying all video images in the image set; and obtaining the video images of a chosen classification from the image set and generating a corresponding classification video from the video images of the chosen classification. Through the embodiments of the invention, the video clipping operation can be made convenient, personalized classification videos can be clipped, user experience is improved, and users' individualized requirements are met.
Description
Technical field
This application relates to the field of video processing, and in particular to a video clipping method, apparatus, and electronic device.
Background technique
Video clipping is non-linear editing of a video source: materials such as added pictures, background music, special effects, and scenes are remixed with the video, the video source is cut and merged, and, through secondary encoding, a new video with different expressive power is generated.
With the rapid development of the Internet, users share videos through applications on electronic devices. At present, the social platforms do not support videos of long duration, so if a video shot by a user is too long, the video must be cut. Moreover, people increasingly pursue personalization and prefer the videos they share to be distinctive, which also requires personalized cutting and editing of the video.
The normal practice of the current mainstream video editing software on the market is to trim the beginning and end of a video: the user chooses the video content segment between one time point and another, and the video and audio data of this segment are used to generate a new video file according to the container rules of a certain video format.
This method meets the basic need of users to cut videos, but it also has certain drawbacks. Editing software on a PC (Personal Computer) can apply the method repeatedly to choose and cut multiple segments, but on a mobile phone the operation cannot be too complicated because it is limited by the interaction design; most mobile software only supports selecting one continuous section of the video and cannot select an arbitrary number of segments, which limits the user's operations and results in a poor user experience.
Summary of the invention
The embodiments of the invention provide a video processing method, apparatus, and electronic device to improve user experience.
An embodiment of the invention provides a video clipping method, comprising:
obtaining a video image of a video to be clipped at each of multiple time points to form an image set;
classifying all video images in the image set;
obtaining the video images of a chosen classification from the image set, and generating a corresponding classification video from the video images of the chosen classification.
An embodiment of the invention also provides a video clipping apparatus, comprising:
an image set obtaining module, configured to obtain a video image of a video to be clipped at each of multiple time points to form an image set;
a classification module, configured to classify all video images in the image set;
a clipping module, configured to obtain the video images of a chosen classification from the image set and generate a corresponding classification video from the video images of the chosen classification.
An embodiment of the invention also provides an electronic device, comprising:
a processor; and
a memory for storing instructions executable by the processor;
wherein the processor is configured to perform the following operations:
obtaining a video image of a video to be clipped at each of multiple time points to form an image set;
classifying all video images in the image set;
obtaining the video images of a chosen classification from the image set, and generating a corresponding classification video from the video images of the chosen classification.
The embodiments of the invention include: obtaining a video image of a video to be clipped at each of multiple time points to form an image set; classifying all video images in the image set; and obtaining the video images of a chosen classification from the image set and generating a corresponding classification video from the video images of the chosen classification. Through the embodiments of the invention, the video clipping operation can be made convenient, personalized classification videos can be clipped, user experience is improved, and users' individualized requirements are met.
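The claimed steps can be illustrated with a minimal end-to-end sketch. Everything here is an assumption for illustration only: the one-second sampling, the stub classify() rule, and all function names are invented stand-ins, not the patent's implementation.

```python
# Minimal sketch of the claimed pipeline. Frames are represented by their
# timestamps in seconds; classify() is a stand-in for the deep learning
# classifier described later in the specification.

def build_image_set(duration, n=1):
    """Step 1: one sampling time point every n seconds."""
    return [t for t in range(0, duration, n)]

def classify(t):
    """Stub classifier: pretend even seconds show class 'A', odd ones 'B'."""
    return "A" if t % 2 == 0 else "B"

def clip_classification(duration, chosen):
    """Steps 2-3: classify every sampled frame, keep the chosen class."""
    image_set = build_image_set(duration)
    labeled = [(t, classify(t)) for t in image_set]
    return [t for t, label in labeled if label == chosen]
```

With the stub rule, clip_classification(6, "A") keeps the frames sampled at seconds 0, 2, and 4; a real system would replace classify() with the trained model of step 102.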
Other features and advantages of the invention will be set forth in the following description and will in part become apparent from the description or be understood by implementing the invention. The objectives and other advantages of the invention can be achieved and obtained by the structures particularly pointed out in the description, the claims, and the accompanying drawings.
Detailed description of the invention
The accompanying drawings are provided for a further understanding of the technical solution of the invention and constitute part of the specification; together with the embodiments of the application, they serve to explain the technical solution of the invention and do not constitute a limitation of it.
Fig. 1 is a flowchart of the video clipping method of an embodiment of the invention;
Fig. 2 is a schematic diagram of the video clipping method of an embodiment of the invention;
Fig. 3 is a flowchart of the video clipping method of an application example of the invention;
Fig. 4 is a schematic diagram of the GUI (Graphical User Interface) of the electronic device of an application example of the invention;
Fig. 5 is a schematic diagram of the GUI after the user selects classification A;
Fig. 6 is a schematic diagram of the user selecting clipping and obtaining a new video of classification A;
Fig. 7 is a schematic diagram of a video clipping apparatus of an embodiment of the invention;
Fig. 8 is a schematic diagram of a video clipping apparatus of another embodiment of the invention.
Specific embodiment
The embodiments of the invention are described in detail below with reference to the accompanying drawings. It should be noted that, in the absence of conflict, the embodiments of the application and the features in the embodiments may be combined with one another in any manner.
The steps shown in the flowcharts of the accompanying drawings may be executed in a computer system such as a set of computer-executable instructions. Moreover, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in an order different from the one herein.
With the rapid development of the Internet, users who share videos through applications on electronic devices want personalized ways of cutting and editing them. In addition, with the spread of monitoring equipment, massive amounts of video are produced, finding video content of interest in massive video becomes more and more difficult, and searching manually is time-consuming and laborious. The embodiments of the invention propose recognizing and classifying video content, so that the video of a classification of interest can then be extracted.
As shown in Fig. 1 and Fig. 2, the video clipping method of the embodiment of the invention comprises:
Step 101: obtaining a video image of the video to be clipped at each of multiple time points to form an image set.
Wherein, the video to be clipped may be a scene recorded by the user through a video recording application on the electronic device, may be a video the user obtained through other channels, such as by downloading from the network, or may be obtained by automatic recording on monitoring equipment.
The multiple time points may be consecutive time points, for example one every n seconds, where n is a number greater than 0 and may be set differently according to the time length of the video to be clipped. For example, in order to obtain as many video images as possible for image recognition, n may be set to 1, i.e., one video image is obtained every second.
The video image, also called a video frame image, may be a key frame (I frame), a P frame, or a B frame of the video, determined according to the compression of the video to be clipped. The video recorded by video recording software on an electronic device usually uses inter-frame compression, and the video image taken at a time point is usually a key frame image (I frame).
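The sampling of time points described here (one every n seconds, n > 0) can be sketched as follows; the helper name, and the idea of handing each point to a decoder such as OpenCV's VideoCapture to fetch the nearest decodable frame, are illustrative assumptions rather than part of the specification.

```python
def sample_points(duration_s, n=1.0):
    """Time points at which to grab a frame: one every n seconds (n > 0),
    as described for step 101. A decoder (e.g. OpenCV's VideoCapture)
    would then be asked to seek each point and return the nearest
    decodable frame, typically the key frame (I frame)."""
    if n <= 0:
        raise ValueError("n must be greater than 0")
    points, t = [], 0.0
    while t < duration_s:
        points.append(round(t, 3))  # keep timestamps readable
        t += n
    return points
```

For a 10-second video with n = 2.5, this yields the four time points 0.0, 2.5, 5.0, and 7.5.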
In one embodiment, after step 101, the method may further include: generating the thumbnail corresponding to each video image in the image set; and it may further include: displaying the thumbnail corresponding to each video image in the image set.
In an embodiment of the invention, the user can understand the content of the video to be clipped through the corresponding thumbnails. By generating a thumbnail corresponding to each of the multiple video frame images, storage space of the electronic device is saved when the thumbnails are displayed, the user is guaranteed to see accurate video content, and user experience is improved.
Step 102: classifying all video images in the image set.
In one embodiment, before the classification, the method may further include: preprocessing all video images in the image set.
Wherein, the preprocessing may include: inputting the image data of the video image, performing data type conversion on the image data, and performing data normalization and whitening.
Wherein, the image data may be input by image decoding or by reading the image file;
performing data type conversion on the image data means converting the type of the image data into a data type better suited to the classification algorithm, for example converting an int type (integer) into a float type (floating point);
for data normalization, common methods include simple rescaling, per-sample mean subtraction, and feature standardization (making all features in the data set have zero mean and unit variance). Simple rescaling readjusts the value of each dimension of the data so that the final data vector falls in the interval [0,1] or [-1,1]. For color images, the color channels have broadly similar statistics, so when processing color images, feature scaling is usually applied to the data: the pixel values obtained lie in the interval [0,255], and the common treatment is to divide these pixel values by 255 so that they are scaled into the interval [0,1];
whitening generally includes PCA (Principal Component Analysis) whitening and ZCA (Zero-phase Component Analysis) whitening. PCA whitening makes the variance of each dimension of the data equal to 1, while ZCA whitening makes the variances of all dimensions identical. PCA whitening can be used for dimensionality reduction as well as for decorrelation, whereas ZCA whitening is mainly used for decorrelation while keeping the whitened data as close as possible to the original input data. In PCA/ZCA whitening, the covariance matrix becomes the identity matrix; the features must first be zero-centered, and then a suitable epsilon (a regularization term with a low-pass filtering effect on the data) must be chosen; for color images, a sufficiently large epsilon is generally chosen for PCA/ZCA whitening.
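The rescaling and whitening steps above can be sketched with NumPy. This is a generic implementation of the textbook technique, not the patent's code; the eps default and the function names are assumptions.

```python
import numpy as np

def simple_rescale(pixels):
    """Simple rescaling: scale [0, 255] pixel values into [0, 1]."""
    return np.asarray(pixels, dtype=np.float64) / 255.0

def pca_whiten(X, eps=1e-5):
    """PCA whitening of an (n_samples, n_features) matrix: zero-center,
    rotate into the eigenbasis of the covariance, and rescale every
    dimension to (near) unit variance. eps is the regularization term
    described above."""
    X = X - X.mean(axis=0)                   # zero-center each feature
    cov = X.T @ X / X.shape[0]               # sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # ascending eigenvalues
    X_pca = (X @ eigvecs) / np.sqrt(eigvals + eps)
    # ZCA whitening rotates back so the result stays close to the input:
    X_zca = X_pca @ eigvecs.T
    return X_pca, X_zca
```

After whitening, the sample covariance of the output is (up to eps) the identity matrix, which is the property the text describes.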
In addition, in order to enlarge the data set for deep learning, image preprocessing may also include one or more of the following:
image flipping: performing a series of flips on the input picture, such as left-right flips, up-down flips, and diagonal flips, to expand the amount of data so that images of all angles are present, which can also alleviate recognition errors;
color transformation: for example, adjusting the brightness, contrast, saturation, and hue of the image.
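The flip and color transformations can be sketched on a toy image represented as nested lists of pixel values; the clamping range and function names are illustrative assumptions.

```python
def flip_horizontal(image):
    """Left-right flip: reverse each row of pixels."""
    return [row[::-1] for row in image]

def flip_vertical(image):
    """Up-down flip: reverse the order of the rows."""
    return image[::-1]

def adjust_brightness(image, delta):
    """Simple brightness shift, clamped to the valid [0, 255] range."""
    return [[max(0, min(255, p + delta)) for p in row] for row in image]

def augment(image):
    """One original image expands into several variants for training."""
    return [image, flip_horizontal(image), flip_vertical(image),
            adjust_brightness(image, 30)]
```

Each source image thus contributes four training samples; contrast, saturation, and hue adjustments would extend the list the same way.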
In the embodiments of the invention, all video images in the image set may be classified by a deep learning algorithm.
In an embodiment of the invention, an artificial intelligence method is introduced, and the frame images of the video are classified through deep learning. Deep learning is learning based on deep neural networks, a deep neural network being a neural network with multiple hidden layers. By constructing a machine learning model with many hidden layers and massive training data, more useful features are learned, ultimately improving the accuracy of classification or prediction. Wherein, feature learning transforms the feature representation of a sample in the original space, through layer-by-layer feature transformation, into a new feature space, making classification or prediction easier. Deep learning can automatically learn feature expressions, which may contain thousands of parameters, from big data. For example, the convolutional network model used by Hinton's research group in the 2012 ImageNet ILSVRC competition had a feature representation containing 60 million parameters learned from over a million samples.
The deep learning training process may include:
1) Bottom-up unsupervised learning, which is a feature learning process.
The parameters of each layer are trained in order using unlabeled data (or labeled data); this step can be regarded as an unsupervised training process and is what most distinguishes deep learning from a traditional neural network.
Wherein, the first layer is first trained with unlabeled or labeled data, learning the parameters of the first layer (this layer can be regarded as the hidden layer of a three-layer neural network that minimizes the difference between output and input);
after layer n-1 has been learned, the output of layer n-1 is used as the input of layer n, and layer n is trained, thereby obtaining the parameters of each layer.
2) Top-down supervised learning, which is a process of tuning the whole network.
Based on the parameters of each layer obtained in the first step, the network is trained with labeled data, errors are propagated top-down, and the parameters of the entire multi-layer network model are further fine-tuned; this step is a supervised training process.
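The bottom-up, layer-by-layer training of step 1) can be sketched with tied-weight linear autoencoders in NumPy. Using linear layers (rather than the deep nonlinear networks the text envisions) is a simplifying assumption purely to keep the sketch short; each layer is trained to minimize the difference between its reconstruction and its input, and the next layer is then trained on its output.

```python
import numpy as np

rng = np.random.default_rng(1)

def train_layer(data, hidden, steps=400, lr=0.005):
    """Greedily train one tied-weight linear autoencoder layer so that
    decoding its output reproduces its input (the 'smallest difference
    between output and input' criterion described above)."""
    n = len(data)
    W = rng.normal(scale=0.1, size=(data.shape[1], hidden))
    for _ in range(steps):
        err = data @ W @ W.T - data                      # reconstruction error
        grad = 2 * (data.T @ err @ W + err.T @ data @ W) / n
        W -= lr * grad                                   # gradient descent
    return W

def recon_loss(data, W):
    return float(((data @ W @ W.T - data) ** 2).mean())

X = rng.normal(size=(128, 16))
W1 = train_layer(X, 8)    # layer 1 is learned on the raw input
H1 = X @ W1               # layer 1's output becomes layer 2's input
W2 = train_layer(H1, 4)   # layer 2 is learned on layer 1's output
```

Step 2) would follow by fine-tuning all layers jointly with labeled data and backpropagated errors.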
The feature representations learned from ImageNet have very strong generalization ability and can be successfully applied to tasks on other data sets. The training sets of many applications are small; in this case, deep learning can be used in three ways:
(1) The model trained on ImageNet can be used as a starting point, and the target training set and backpropagation can be used to continue training it, adapting the model to the specific application; ImageNet plays the role of pre-training.
(2) If the target training set is not big enough, the network parameters of the lower layers can be fixed, keeping the result of training on ImageNet, and only the upper layers are updated. This is because the network parameters of the bottom layers are the hardest to update, and the bottom-layer filters learned from ImageNet usually describe a variety of different local edges and textures that generalize well to images in general.
(3) The model trained on ImageNet can be used directly, taking the output of the highest hidden layer as the feature representation instead of commonly hand-designed features.
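Method (2) — freezing the lower layers and updating only the upper layers — can be sketched in NumPy. A fixed random projection stands in for the filters pretrained on ImageNet; that substitution, and every name below, is an assumption purely for illustration, since a real system would load actual pretrained weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Pretrained" lower layers: a frozen random projection standing in for
# bottom-layer filters learned on ImageNet (illustrative assumption).
W_frozen = rng.normal(size=(8, 4))

def features(x):
    return np.maximum(x @ W_frozen, 0.0)  # frozen ReLU features

X = rng.normal(size=(64, 8))
y = (X[:, 0] > 0).astype(float).reshape(-1, 1)
W_top = np.zeros((4, 1))  # only this top layer is trainable

def log_loss(W):
    p = 1 / (1 + np.exp(-(features(X) @ W)))
    return float(-np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9)))

loss_before = log_loss(W_top)
F = features(X)
for _ in range(200):  # gradient descent on the top layer only
    p = 1 / (1 + np.exp(-(F @ W_top)))
    W_top -= 0.1 * F.T @ (p - y) / len(X)
loss_after = log_loss(W_top)
```

W_frozen is never touched by the update loop, which is exactly the division of labor method (2) describes.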
Wherein, the ImageNet data set is currently the largest data set for image recognition in the world and one of the most widely applied in the field of deep learning for images. The ImageNet data set has more than 14 million pictures covering more than 20,000 categories, of which more than a million pictures are annotated with a specific category; the data set is maintained by dedicated staff, and the annotations are updated every year.
In one embodiment, after step 102, the method may further include: setting a classification marker for the video images of each classification. The classification marker can indicate the classification to which a video image belongs.
In one embodiment, after step 102, the method may further include: selecting one video image from the video images of each classification to generate a thumbnail.
In the embodiment of the invention, one video image may be selected from all video frame images of a classification as the video image of that classification, and the thumbnail corresponding to the video image of each classification is generated. The selection may follow a preset rule, for example selecting the first, a middle, or the last of the video frame images of each classification; a video image may also be selected at random from the video frame images of each classification.
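The preset selection rule can be sketched as follows; the (timestamp, label) frame representation and the function name are assumptions for illustration.

```python
def representative_frames(labeled_frames, rule="first"):
    """Pick one frame per classification for its thumbnail, using a
    preset rule: 'first', 'middle', or 'last' (random selection would
    work the same way)."""
    by_class = {}
    for timestamp, label in labeled_frames:
        by_class.setdefault(label, []).append(timestamp)
    pick = {"first": 0, "middle": None, "last": -1}[rule]
    return {label: frames[len(frames) // 2] if pick is None else frames[pick]
            for label, frames in by_class.items()}
```

Given frames at seconds 0, 2, 3 labeled "A" and 1, 4 labeled "B", the default rule picks second 0 for "A" and second 1 for "B".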
In one embodiment, the method may further include: displaying the thumbnail corresponding to each classification.
By displaying the thumbnail corresponding to each classification, the user can see which classifications of content the video to be clipped contains and can understand the video to be clipped from different sides, improving user experience.
Step 103: obtaining the video images of the chosen classification from the image set, and generating the corresponding classification video from the video images of the chosen classification.
In one embodiment, before step 103, the method may further include: receiving a first user instruction, the first user instruction being used to indicate the chosen classification.
Wherein, the first user instruction may be a selection instruction of the user; for example, the user may select a classification through an input tool such as a touch screen, a mouse, or a keyboard.
In one embodiment, obtaining the video images of the chosen classification from the image set comprises: obtaining, from the image set, the video images marked with the chosen classification.
Wherein, the video images of the chosen classification can be obtained according to the classification markers.
In one embodiment, after the video images of the chosen classification are obtained from the image set, the method may further include: generating the thumbnails corresponding to all video images of the chosen classification; and it may further include: displaying the thumbnails corresponding to all video images of the classification.
In an embodiment of the invention, the user can learn, through the thumbnails of all video images under the chosen classification, what content the selected classification contains, so the user can understand and check the video from the angle of the classification, enhancing the user's personalized experience.
In one embodiment, before the corresponding classification video is generated from the video images of the chosen classification, the method may further include: receiving a second user instruction, the second user instruction being used to indicate generating the classification video.
Wherein, the second user instruction may be a confirmation instruction of the user; upon receiving the user's confirmation instruction, generation of the classification video begins.
In one embodiment, generating the corresponding classification video from the video images of the chosen classification comprises: merging and encoding the video clips corresponding to the video images of the classification to generate the classification video.
Wherein, each video image corresponds to the video clip of its time point interval; for example, if in step 101 one video image is obtained every second, then each video image corresponds to a one-second video clip.
In this step, the video clips corresponding to the video images of the classification are merged and secondarily encoded in the video format generated as needed, generating the classification video.
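Merging the per-time-point clips of the chosen classification into maximal segments can be sketched as follows; in practice each (start, end) segment would then be cut, concatenated, and secondarily encoded with a tool such as ffmpeg, which this sketch deliberately leaves out.

```python
def merge_segments(timestamps, clip_len=1):
    """Merge the per-frame clips of the chosen classification into
    maximal (start, end) segments. Frames at seconds 3, 4, 5 and 9
    become the segments (3, 6) and (9, 10)."""
    segments = []
    for t in sorted(timestamps):
        if segments and segments[-1][1] == t:
            # contiguous with the previous segment: extend it
            segments[-1] = (segments[-1][0], t + clip_len)
        else:
            segments.append((t, t + clip_len))
    return segments
```

Producing whole segments rather than one-second slices keeps the number of cut points, and therefore the re-encoding work, to a minimum.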
Through the embodiments of the invention, the video clipping operation can be made convenient, personalized classification videos can be clipped, user experience is improved, and users' individualized requirements are met.
An application example is described below.
Fig. 3 shows the video clipping method of an application example of the invention, and Figs. 4-6 are schematic diagrams of the GUI of the electronic device of the application example of the invention.
As shown in Figs. 4-5, the GUI includes a video display area 401, a video frame thumbnail display area 402 of the video to be clipped, a classification thumbnail display area 403, a thumbnail display area 404 of all images under a classification, a current playback moment 405 of the video, a total duration 406 of the video, and a clip control 407.
Wherein, the video display area 401 is used for displaying the played video.
The video frame thumbnail display area 402 of the video to be clipped is used to show the thumbnails of the video frame images of the video to be clipped; the user can be shown all thumbnails completely by sliding them.
The classification thumbnail display area 403 is used to show the thumbnail corresponding to the images of each classification.
The thumbnail display area 404 of all images is used to show all thumbnails under a classification.
The current playback moment 405 is used to show the current progress moment during video playback.
The total duration 406 of the video is used to show the total duration of the video; after a classification has been selected, it shows the duration of the new video obtained by merging the video clips corresponding to all images of that classification.
Clip 407 is the clip button; after it is clicked, the video clips corresponding to all images of the classification are merged and encoded to generate the new classification video.
Referring to Fig. 3, the video clipping method of the application example of the invention comprises:
Step 301: obtaining the video to be clipped and starting to execute the clipping operation.
The executing subject of this application example may be a video clipping application in an electronic device.
Wherein, the application may be a software program running on the electronic device, and the electronic device may be a personal computer, a cloud device, or a mobile device such as a smartphone or a tablet computer.
Wherein, the video to be clipped is a video that needs to be clipped; it may be a scene recorded by the user through video recording software on the electronic device, or a video obtained by the user through other channels, such as downloaded from the network or obtained from monitoring equipment.
Wherein, the user triggers the clipping operation on the video to be clipped; the triggering manner may be selecting a video of any length from the file folder of the electronic device and importing it into the video clipping application, or selecting a video of any length and invoking its additional processing function such as clipping.
Step 302: obtaining a video image of the video to be clipped at each of multiple time points to form an image set.
In the embodiment of the invention, the user can obtain the content of the video to be clipped through the video images at each of the multiple time points. The obtained video frame image may be a key frame (I frame), a P frame, or a B frame of the video, determined by the background operating system of the video clipping application according to the compression of the video to be clipped; the video recorded by video recording software on an electronic device usually uses inter-frame compression, and the video image taken at a time point is usually a key frame image (I frame).
Wherein, the multiple time points may be consecutive time points, for example one every n seconds, where n is a number greater than 0 and may be set differently by the background operating system of the video clipping application according to the time length of the video to be clipped.
Step 310: for the obtained image set, generating the thumbnail corresponding to each video image in the image set.
In the embodiment of the invention, the user can understand the content of the video to be clipped through the corresponding thumbnails. By generating a thumbnail corresponding to each of the multiple video frame images, storage space of the electronic device is saved when the thumbnails are displayed, the user is guaranteed to see accurate video content, and user experience is improved.
Step 311: displaying all the thumbnails.
In the application example of the invention, the user can be shown all thumbnails completely by sliding them; the display region is the thumbnail display area 402 of the video to be clipped in Fig. 4.
Step 303: pre-process each video image in the image set obtained in step 302 (which may include converting the data type of the image data, and performing data normalization and whitening), in preparation for image recognition.
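As a minimal sketch of the pre-processing in step 303, the following standardizes one image: the raw (e.g. 8-bit integer) pixel values are cast to float, then normalized to zero mean and unit variance. Full whitening (PCA/ZCA decorrelation across a batch of images) is omitted here; the function name and flat-list representation are assumptions for the sketch:

```python
import math

def preprocess(pixels):
    """Standardize one image for recognition: data type conversion
    followed by data normalization (zero mean, unit variance)."""
    data = [float(p) for p in pixels]          # data type conversion
    mean = sum(data) / len(data)
    var = sum((p - mean) ** 2 for p in data) / len(data)
    std = math.sqrt(var) or 1.0                # guard against flat images
    return [(p - mean) / std for p in data]    # data normalization

normed = preprocess([0, 128, 255])
# the normalized values sum to ~0 and have unit variance
```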
Step 304: classify all video images in the image set by a deep learning algorithm, recognizing the video images into different categories.
Here, deep learning covers both feature learning and classification-model learning, and the concrete implementation differs with the application scenario.
For example, for surveillance video of a garage or a highway, feature extraction of the recognized images may target the license plate numbers of passing vehicles, with intelligent recognition and classification performed by plate number. License plate recognition consists mainly of three parts:
first, plate localization, generally using color-based localization, feature-based localization, and the like;
next, plate segmentation, generally using the projection method;
finally, character recognition, for which convolutional neural networks give relatively good results. The number of training samples is large, and the characters in the training set include all digits and letters as well as the Chinese characters of some province abbreviations. The trained network can then be used for character recognition.
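The projection method mentioned for plate segmentation can be sketched as follows: sum the ink in each column of the binarized plate image and cut wherever the vertical projection drops to zero. This is an illustrative sketch only (names, the 0/1 row-list representation, and the toy plate are assumptions); localization and the CNN character recognizer are not shown:

```python
def segment_by_projection(binary):
    """Split a binarized plate image (list of rows of 0/1) into
    character column spans using the vertical projection method."""
    cols = len(binary[0])
    # vertical projection: ink count per column
    proj = [sum(row[c] for row in binary) for c in range(cols)]
    spans, start = [], None
    for c, v in enumerate(proj):
        if v and start is None:
            start = c                      # character begins
        elif not v and start is not None:
            spans.append((start, c))       # character ends at blank column
            start = None
    if start is not None:
        spans.append((start, cols))        # character runs to the edge
    return spans

# Two 2-column "characters" separated by a blank column.
plate = [[1, 1, 0, 1, 1],
         [1, 0, 0, 0, 1]]
print(segment_by_projection(plate))  # [(0, 2), (3, 5)]
```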
As another example, for surveillance video of a public place such as a kindergarten, facial features are extracted from the recognized images, and faces are intelligently recognized and classified using a face recognition algorithm.
A deep learning model can learn a layered feature representation of face images, which may include:
the bottom layer, which learns filters from raw pixels that capture local edge and texture features;
the middle layers, whose filters, built by combining the various edge filters, describe different types of facial parts;
and the top layer, whose global features describe the entire face.
Deep learning thus provides a distributed feature representation: in the highest hidden layer, each neuron represents an attribute classifier, for example for gender, ethnicity, or hair color.
In the field of face recognition, Labeled Faces in the Wild (LFW), created in 2007, is a well-known face recognition test set. LFW collects face photos of more than 5,000 celebrities from the internet to evaluate the performance of face recognition algorithms under unconstrained conditions. The photos vary widely in lighting, expression, pose, age, occlusion, and so on. The LFW test set contains 6,000 image pairs: 3,000 positive pairs, in which the two images belong to the same person, and 3,000 negative pairs, in which the two images belong to different people. Random guessing achieves 50% accuracy; the classic Eigenface algorithm achieves only about 60% on this test set; among non-deep-learning algorithms the best recognition rate is 96.33%; deep learning currently reaches 99.47%.
Step 305: label the video images by category according to the image recognition results.
Step 312: generate the thumbnails of the video images corresponding to each category.
Here, one image may be selected from all video images of each category as the representative video image of that category, and the thumbnail corresponding to each category's representative video image generated.
In addition, since step 310 has already generated thumbnails for all video images, in this step the thumbnail for each category's video image can also be taken directly from the existing thumbnails.
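Steps 305 and 312 together can be sketched as follows: given the per-frame category labels and the per-frame thumbnails already produced in step 310, pick one frame per category (here, the earliest) as its representative thumbnail. All names are assumptions for this sketch:

```python
def thumbnails_by_category(labels, thumbs):
    """Map each category label to a representative thumbnail,
    taking the first frame seen for each category."""
    reps = {}
    for label, thumb in zip(labels, thumbs):
        reps.setdefault(label, thumb)  # keep the earliest frame per class
    return reps

labels = ["A", "B", "A", "C", "B"]
thumbs = ["t0", "t1", "t2", "t3", "t4"]
print(thumbnails_by_category(labels, thumbs))  # {'A': 't0', 'B': 't1', 'C': 't3'}
```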
Step 313: display the category thumbnails.
Referring to Fig. 4, the category-thumbnail display region is region 403 in Fig. 4. In this application example, the video images are divided into three categories: A, B, and C.
From the corresponding category thumbnails the user can see which categories of content the video to be clipped contains, allowing the user to understand the video from different perspectives and improving the user experience.
Step 306: receive a first user instruction selecting a category.
The first user instruction may be a selection instruction from the user. In this application example, the user selects a category from the category-thumbnail display region 403 of Fig. 4.
Step 307: extract all video images labeled with the selected category.
Step 314: generate the thumbnails corresponding to all video images of the selected category.
Since step 310 has already generated thumbnails for all video images, in this step all thumbnails of the selected category can also be taken directly from the existing thumbnails.
Step 315: display all thumbnails of the selected category.
In this application example, the display region is region 404 in Fig. 5, which shows the thumbnails of all images under the selected category; after the user selects category A, region 404 shows all thumbnails labeled A.
From the thumbnails of all images under a category, the user can see what content the selected category contains, allowing the user to understand and browse the video from the perspective of categories and enhancing the user's personalized experience.
Step 308: receive a second user instruction confirming the clip.
The second user instruction may be a confirmation instruction from the user. In this application example, the user can tap the clip icon 407 in Fig. 5 to confirm clipping all video segments under the selected category.
Step 309: merge and encode the video segments corresponding to all images of the selected category, generating a new category video.
As shown in Fig. 6, the user selects clipping and obtains the new video for category A.
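A minimal sketch of the merging logic in step 309: turn the per-time-point labels into the (start, end) second ranges to cut for one category, merging consecutive sampled points of that category into a single segment. The actual cut/concatenate/encode step (e.g. an ffmpeg-style pipeline) is outside this sketch, and all names are assumptions:

```python
def clip_ranges(labels, interval_s, category):
    """Return the (start, end) second ranges covering all sampled
    time points labeled with `category`, merging consecutive hits."""
    ranges, start = [], None
    for i, label in enumerate(labels + [None]):  # sentinel flushes the tail
        if label == category and start is None:
            start = i * interval_s               # segment begins
        elif label != category and start is not None:
            ranges.append((start, i * interval_s))  # segment ends
            start = None
    return ranges

# Frames sampled every 2 s, labeled A, A, B, A: category A gives two segments.
print(clip_ranges(["A", "A", "B", "A"], 2, "A"))  # [(0, 4), (6, 8)]
```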
The embodiment of the present invention also provides a video clipping device, which implements the above embodiments and implementation modes; descriptions already given are not repeated. As used below, the term "module" may refer to software, hardware, or a combination of the two that realizes a predetermined function. Although the devices described in the following embodiments can be realized in software, realizations in hardware, or in a combination of software and hardware, are also possible and contemplated.
As shown in Fig. 7, the video clipping device of the embodiment of the present invention comprises:
an image-set acquisition module 701 for obtaining a video image of the video to be clipped at each of multiple time points to form an image set;
a classification module 702 for classifying all video images in the image set; and
a clipping module 703 for obtaining, from the image set, the video images of a selected category and generating a corresponding category video according to the video images of the selected category.
Through the embodiment of the present invention, the video clipping operation is made convenient, personalized category videos can be clipped out, the user experience is improved, and users' personalized needs are met.
As shown in Fig. 8, in one embodiment, the device further comprises:
a first generation module 704 for generating a thumbnail corresponding to each video image in the image set.
In one embodiment, the device further comprises:
a first display module 705 for displaying the thumbnail corresponding to each video image in the image set.
In one embodiment, the classification module 702 is configured to:
classify all video images in the image set by a deep learning algorithm.
In one embodiment, the classification module 702 is further configured to pre-process all video images in the image set before they are classified by the deep learning algorithm.
In one embodiment, the classification module 702 is further configured to set a classification label on the video images of each category after all video images in the image set have been classified;
the clipping module is then configured to obtain, from the image set, the video images labeled with the selected category.
In one embodiment, the device further comprises:
a second generation module 706 for selecting one video image from the video images of each category and generating its thumbnail.
In one embodiment, the device further comprises:
a second display module 707 for displaying the thumbnail corresponding to each category.
In one embodiment, the device further comprises:
a first receiving module 708 for receiving a first user instruction, the first user instruction indicating the selected category.
In one embodiment, the device further comprises:
a third generation module 709 for generating the thumbnails corresponding to all video images of the selected category.
In one embodiment, the device further comprises:
a third display module 710 for displaying the thumbnails corresponding to all video images of the selected category.
In one embodiment, the device further comprises:
a second receiving module 711 for receiving a second user instruction, the second user instruction indicating generation of the category video.
In one embodiment, the clipping module 703 is configured to:
merge and encode the video segments corresponding to the video images of the category to generate the category video.
The embodiment of the present invention also provides an electronic device, which may be a PC, a cloud device, or a mobile device such as a smartphone or a tablet computer. The electronic device comprises:
a processor; and
a memory for storing instructions executable by the processor;
wherein the processor is configured to perform the following operations:
obtaining a video image of the video to be clipped at each of multiple time points to form an image set;
classifying all video images in the image set; and
obtaining, from the image set, the video images of a selected category, and generating a corresponding category video according to the video images of the selected category.
In one embodiment, the processor is configured to perform the following operations:
after obtaining the video image of the video to be clipped at each of the multiple time points and forming the image set, generating a thumbnail corresponding to each video image in the image set.
In one embodiment, the processor is configured to perform the following operations:
after generating the thumbnail corresponding to each video image in the image set, displaying the thumbnail corresponding to each video image in the image set.
In one embodiment, the processor is configured to perform the following operations:
classifying all video images in the image set by a deep learning algorithm.
In one embodiment, the processor is configured to perform the following operations:
before classifying all video images in the image set by the deep learning algorithm, pre-processing all video images in the image set.
In one embodiment, the processor is configured to perform the following operations:
inputting the image data of the video images, converting the data type of the image data, and performing data normalization and whitening on the image data.
In one embodiment, the processor is configured to perform the following operations:
after classifying all video images in the image set, setting a classification label on the video images of each category;
obtaining, from the image set, the video images labeled with the selected category.
In one embodiment, the processor is configured to perform the following operations:
after classifying all video images in the image set, selecting one video image from the video images of each category and generating its thumbnail.
In one embodiment, the processor is configured to perform the following operations:
after selecting one video image from the video images of each category and generating its thumbnail, displaying the thumbnail corresponding to each category.
In one embodiment, the processor is configured to perform the following operations:
before obtaining the video images of the selected category from the image set, receiving a first user instruction, the first user instruction indicating the selected category.
In one embodiment, the processor is configured to perform the following operations:
after obtaining the video images of the selected category from the image set, generating the thumbnails corresponding to all video images of the selected category.
In one embodiment, the processor is configured to perform the following operations:
after generating the thumbnails corresponding to all video images of the selected category, displaying the thumbnails corresponding to all video images of the selected category.
In one embodiment, the processor is configured to perform the following operations:
before generating the corresponding category video according to the video images of the selected category, receiving a second user instruction, the second user instruction indicating generation of the category video.
In one embodiment, the processor is configured to perform the following operations:
merging and encoding the video segments corresponding to the video images of the category to generate the category video.
The embodiment of the present invention also provides a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to execute the video clipping method.
In this embodiment, the storage medium may include, but is not limited to, a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, or any other medium that can store program code.
Obviously, those skilled in the art should understand that the modules or steps of the embodiments of the present invention described above can be realized by a general-purpose computing device; they can be concentrated on a single computing device or distributed over a network composed of multiple computing devices. Optionally, they can be realized by program code executable by a computing device, and thus can be stored in a storage device and executed by the computing device. In some cases, the steps shown or described can be performed in an order different from that given here, or the modules or steps can be fabricated as separate integrated circuit modules, or multiple of them can be fabricated as a single integrated circuit module. Thus, the embodiments of the present invention are not limited to any specific combination of hardware and software.
Although the embodiments disclosed herein are as above, their content is only for ease of understanding the present invention and is not intended to limit it. Any person skilled in the art of the present invention may make modifications and variations in the form and details of implementation without departing from the spirit and scope disclosed by the present invention, but the scope of patent protection of the present invention shall still be subject to the scope defined by the appended claims.
Claims (30)
1. A video clipping method, comprising:
obtaining a video image of a video to be clipped at each of multiple time points to form an image set;
classifying all video images in the image set; and
obtaining, from the image set, the video images of a selected category, and generating a corresponding category video according to the video images of the selected category.
2. The method of claim 1, wherein, after obtaining the video image of the video to be clipped at each of the multiple time points to form the image set, the method further comprises:
generating a thumbnail corresponding to each video image in the image set.
3. The method of claim 2, wherein, after generating the thumbnail corresponding to each video image in the image set, the method further comprises:
displaying the thumbnail corresponding to each video image in the image set.
4. The method of claim 1, wherein classifying all video images in the image set comprises:
classifying all video images in the image set by a deep learning algorithm.
5. The method of claim 4, wherein, before classifying all video images in the image set by the deep learning algorithm, the method further comprises:
pre-processing all video images in the image set.
6. The method of claim 5, wherein pre-processing all video images in the image set comprises:
inputting the image data of the video images, converting the data type of the image data, and performing data normalization and whitening on the image data.
7. The method of claim 1, wherein:
after classifying all video images in the image set, the method further comprises setting a classification label on the video images of each category; and
obtaining, from the image set, the video images of the selected category comprises obtaining, from the image set, the video images labeled with the selected category.
8. The method of claim 1, wherein, after classifying all video images in the image set, the method further comprises:
selecting one video image from the video images of each category and generating its thumbnail.
9. The method of claim 8, wherein, after selecting one video image from the video images of each category and generating its thumbnail, the method further comprises:
displaying the thumbnail corresponding to each category.
10. The method of claim 1, wherein, before obtaining the video images of the selected category from the image set, the method further comprises:
receiving a first user instruction, the first user instruction indicating the selected category.
11. The method of claim 1, wherein, after obtaining the video images of the selected category from the image set, the method further comprises:
generating the thumbnails corresponding to all video images of the selected category.
12. The method of claim 11, wherein, after generating the thumbnails corresponding to all video images of the selected category, the method further comprises:
displaying the thumbnails corresponding to all video images of the selected category.
13. The method of claim 1, wherein, before generating the corresponding category video according to the video images of the selected category, the method further comprises:
receiving a second user instruction, the second user instruction indicating generation of the category video.
14. The method of claim 1, wherein generating the corresponding category video according to the video images of the selected category comprises:
merging and encoding the video segments corresponding to the video images of the category to generate the category video.
15. The method of any one of claims 1 to 14, wherein the multiple time points are consecutive time points.
16. The method of any one of claims 1 to 14, wherein the video images are key frame images.
17. A video clipping device, comprising:
an image-set acquisition module for obtaining a video image of a video to be clipped at each of multiple time points to form an image set;
a classification module for classifying all video images in the image set; and
a clipping module for obtaining, from the image set, the video images of a selected category and generating a corresponding category video according to the video images of the selected category.
18. The device of claim 17, further comprising:
a first generation module for generating a thumbnail corresponding to each video image in the image set.
19. The device of claim 18, further comprising:
a first display module for displaying the thumbnail corresponding to each video image in the image set.
20. The device of claim 17, wherein the classification module is configured to:
classify all video images in the image set by a deep learning algorithm.
21. The device of claim 20, wherein:
the classification module is further configured to pre-process all video images in the image set before they are classified by the deep learning algorithm.
22. The device of claim 17, wherein:
the classification module is further configured to set a classification label on the video images of each category after all video images in the image set have been classified; and
the clipping module is configured to obtain, from the image set, the video images labeled with the selected category.
23. The device of claim 17, further comprising:
a second generation module for selecting one video image from the video images of each category and generating its thumbnail.
24. The device of claim 23, further comprising:
a second display module for displaying the thumbnail corresponding to each category.
25. The device of claim 17, further comprising:
a first receiving module for receiving a first user instruction, the first user instruction indicating the selected category.
26. The device of claim 17, further comprising:
a third generation module for generating the thumbnails corresponding to all video images of the selected category.
27. The device of claim 26, further comprising:
a third display module for displaying the thumbnails corresponding to all video images of the selected category.
28. The device of claim 17, further comprising:
a second receiving module for receiving a second user instruction, the second user instruction indicating generation of the category video.
29. The device of claim 17, wherein the clipping module is configured to:
merge and encode the video segments corresponding to the video images of the category to generate the category video.
30. An electronic device, comprising:
a processor; and
a memory for storing instructions executable by the processor;
wherein the processor is configured to perform the following operations:
obtaining a video image of a video to be clipped at each of multiple time points to form an image set;
classifying all video images in the image set; and
obtaining, from the image set, the video images of a selected category, and generating a corresponding category video according to the video images of the selected category.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810308302.3A CN110351597A (en) | 2018-04-08 | 2018-04-08 | A kind of method, apparatus and electronic equipment of video clipping |
PCT/CN2019/081749 WO2019196795A1 (en) | 2018-04-08 | 2019-04-08 | Video editing method, device and electronic device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810308302.3A CN110351597A (en) | 2018-04-08 | 2018-04-08 | A kind of method, apparatus and electronic equipment of video clipping |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110351597A true CN110351597A (en) | 2019-10-18 |
Family
ID=68163482
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810308302.3A Withdrawn CN110351597A (en) | 2018-04-08 | 2018-04-08 | A kind of method, apparatus and electronic equipment of video clipping |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110351597A (en) |
WO (1) | WO2019196795A1 (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110856037A (en) * | 2019-11-22 | 2020-02-28 | 北京金山云网络技术有限公司 | Video cover determination method and device, electronic equipment and readable storage medium |
CN110910470A (en) * | 2019-11-11 | 2020-03-24 | 广联达科技股份有限公司 | Method and device for generating high-quality thumbnail |
CN113395542A (en) * | 2020-10-26 | 2021-09-14 | 腾讯科技(深圳)有限公司 | Video generation method and device based on artificial intelligence, computer equipment and medium |
CN114302224A (en) * | 2021-12-23 | 2022-04-08 | 新华智云科技有限公司 | Intelligent video editing method, device, equipment and storage medium |
WO2022081081A1 (en) * | 2020-10-15 | 2022-04-21 | 脸萌有限公司 | Video distribution system and method, computing device, and user equipment |
CN117177006A (en) * | 2023-09-01 | 2023-12-05 | 湖南广播影视集团有限公司 | CNN algorithm-based short video intelligent manufacturing method |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115460446A (en) * | 2022-08-19 | 2022-12-09 | 上海爱奇艺新媒体科技有限公司 | Alignment method and device for multiple paths of video signals and electronic equipment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102819528A (en) * | 2011-06-10 | 2012-12-12 | 中国电信股份有限公司 | Method and device for generating video abstraction |
CN103079117A (en) * | 2012-12-30 | 2013-05-01 | 信帧电子技术(北京)有限公司 | Video abstract generation method and video abstract generation device |
CN106937120A (en) * | 2015-12-29 | 2017-07-07 | 北京大唐高鸿数据网络技术有限公司 | Object-based monitor video method for concentration |
CN107566907A (en) * | 2017-09-20 | 2018-01-09 | 广东欧珀移动通信有限公司 | video clipping method, device, storage medium and terminal |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TW520602B (en) * | 2001-06-28 | 2003-02-11 | Ulead Systems Inc | Device and method of editing video program |
US7954065B2 (en) * | 2006-12-22 | 2011-05-31 | Apple Inc. | Two-dimensional timeline display of media items |
CN105763884A (en) * | 2014-12-18 | 2016-07-13 | 广州市动景计算机科技有限公司 | Video processing method, device and apparatus |
CN104796781B (en) * | 2015-03-31 | 2019-01-18 | 小米科技有限责任公司 | Video clip extracting method and device |
CN107295377B (en) * | 2017-07-14 | 2020-12-01 | 程工 | Film production method, device and system |
- 2018-04-08: CN CN201810308302.3A patent CN110351597A (en), not active (Withdrawn)
- 2019-04-08: WO PCT/CN2019/081749 patent WO2019196795A1 (en), active (Application Filing)
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102819528A (en) * | 2011-06-10 | 2012-12-12 | 中国电信股份有限公司 | Method and device for generating video abstraction |
CN103079117A (en) * | 2012-12-30 | 2013-05-01 | 信帧电子技术(北京)有限公司 | Video abstract generation method and video abstract generation device |
CN106937120A (en) * | 2015-12-29 | 2017-07-07 | 北京大唐高鸿数据网络技术有限公司 | Object-based monitor video method for concentration |
CN107566907A (en) * | 2017-09-20 | 2018-01-09 | 广东欧珀移动通信有限公司 | video clipping method, device, storage medium and terminal |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110910470A (en) * | 2019-11-11 | 2020-03-24 | 广联达科技股份有限公司 | Method and device for generating high-quality thumbnail |
CN110910470B (en) * | 2019-11-11 | 2023-07-07 | 广联达科技股份有限公司 | Method and device for generating high-quality thumbnail |
CN110856037A (en) * | 2019-11-22 | 2020-02-28 | 北京金山云网络技术有限公司 | Video cover determination method and device, electronic equipment and readable storage medium |
WO2022081081A1 (en) * | 2020-10-15 | 2022-04-21 | 脸萌有限公司 | Video distribution system and method, computing device, and user equipment |
US11838576B2 (en) | 2020-10-15 | 2023-12-05 | Lemon Inc. | Video distribution system, method, computing device and user equipment |
CN113395542A (en) * | 2020-10-26 | 2021-09-14 | 腾讯科技(深圳)有限公司 | Video generation method and device based on artificial intelligence, computer equipment and medium |
CN114302224A (en) * | 2021-12-23 | 2022-04-08 | 新华智云科技有限公司 | Intelligent video editing method, device, equipment and storage medium |
CN114302224B (en) * | 2021-12-23 | 2023-04-07 | 新华智云科技有限公司 | Intelligent video editing method, device, equipment and storage medium |
CN117177006A (en) * | 2023-09-01 | 2023-12-05 | 湖南广播影视集团有限公司 | CNN algorithm-based short video intelligent manufacturing method |
Also Published As
Publication number | Publication date |
---|---|
WO2019196795A1 (en) | 2019-10-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110351597A (en) | A kind of method, apparatus and electronic equipment of video clipping | |
Deng et al. | Image aesthetic assessment: An experimental survey | |
Williams et al. | Images as data for social science research: An introduction to convolutional neural nets for image classification | |
Botha et al. | Fake news and deepfakes: A dangerous threat for 21st century information security | |
Karayev et al. | Recognizing image style | |
KR102290419B1 (en) | Method and Appratus For Creating Photo Story based on Visual Context Analysis of Digital Contents | |
Höferlin et al. | Inter-active learning of ad-hoc classifiers for video visual analytics | |
CN110889672B (en) | Student card punching and class taking state detection system based on deep learning | |
US8548249B2 (en) | Information processing apparatus, information processing method, and program | |
JP2011215963A (en) | Electronic apparatus, image processing method, and program | |
Li et al. | Videography-based unconstrained video analysis | |
Li et al. | Data-driven affective filtering for images and videos | |
Zhang et al. | A comprehensive survey on computational aesthetic evaluation of visual art images: Metrics and challenges | |
Vonikakis et al. | A probabilistic approach to people-centric photo selection and sequencing | |
Ismail et al. | Deepfake video detection: YOLO-Face convolution recurrent approach | |
Yang et al. | A comprehensive survey on image aesthetic quality assessment | |
Verma et al. | Age prediction using image dataset using machine learning | |
Somaini | On the altered states of machine vision: Trevor Paglen, Hito Steyerl, Grégory Chatonsky | |
CN106657817A (en) | Processing method applied to mobile phone platform for automatically making album MV | |
CN112040273B (en) | Video synthesis method and device | |
Dao et al. | Robust event discovery from photo collections using Signature Image Bases (SIBs) | |
Debnath et al. | Computational approaches to aesthetic quality assessment of digital photographs: state of the art and future research directives | |
Yang et al. | Learning the synthesizability of dynamic texture samples | |
Tian et al. | Relative aesthetic quality ranking | |
Sasireka | Comparative analysis on video retrieval technique using machine learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | Application publication date: 20191018 ||