Summary of the invention
Embodiments of the present disclosure provide a method and apparatus for generating a video tag model, and a method and apparatus for generating a set of class labels for a video.
In a first aspect, embodiments of the present disclosure provide a method for generating a video tag model. The method includes: acquiring at least two sample video sets, where each sample video set corresponds to a preset video category and includes positive sample videos belonging to the corresponding video category and negative sample videos not belonging to the corresponding video category, each positive sample video corresponding to pre-annotated positive classification information and each negative sample video corresponding to pre-annotated negative classification information; selecting a sample video set from the at least two sample video sets and performing, using the selected sample video set, the following training steps: training an initial model using a machine learning method, with the positive sample videos included in the sample video set as inputs and the positive classification information corresponding to each input positive sample video as the desired output, and with the negative sample videos in the sample video set as inputs and the negative classification information corresponding to each input negative sample video as the desired output; determining whether the at least two sample video sets include an unselected sample video set; and in response to determining that they do not, determining that the initial model obtained after the last round of training is the video tag model.
In some embodiments, the method further includes: in response to determining that the at least two sample video sets include an unselected sample video set, reselecting a sample video set from the unselected sample video sets, and continuing to perform the training steps using the reselected sample video set and the initial model obtained after the last round of training.
In some embodiments, the positive classification information and the negative classification information are each a vector including a preset number of elements. A target element in the vector corresponding to a positive sample video characterizes that the positive sample video belongs to the corresponding video category, and a target element in the vector corresponding to a negative sample video characterizes that the negative sample video does not belong to the corresponding video category. The target element is the element located at the element position, among the element positions in the vector, for which a correspondence has been pre-established with the video category of the sample video set to which the sample video corresponding to the vector belongs.
In some embodiments, the initial model is a convolutional neural network including a feature extraction layer and a classification layer. The classification layer includes a preset number of pieces of weight data, each piece of weight data corresponding to a preset video category and used to determine the probability that an input video belongs to the video category corresponding to that piece of weight data.
In some embodiments, training the initial model includes: training the initial model by fixing, among the preset number of pieces of weight data, the pieces of weight data other than the piece corresponding to the sample video set, and adjusting the piece of weight data corresponding to the sample video set.
In some embodiments, the initial model further includes a video frame extraction layer; and taking the positive sample videos included in the sample video set as inputs with the corresponding positive classification information as the desired output, and taking the negative sample videos in the sample video set as inputs with the corresponding negative classification information as the desired output, to train the initial model, includes: inputting the positive sample videos included in the sample video set into the video frame extraction layer to obtain sets of positive sample video frames; taking the obtained sets of positive sample video frames as inputs of the feature extraction layer, with the positive classification information corresponding to each input positive sample video as the desired output of the initial model; inputting the negative sample videos included in the sample video set into the video frame extraction layer to obtain sets of negative sample video frames; and taking the obtained sets of negative sample video frames as inputs of the feature extraction layer, with the negative classification information corresponding to each input negative sample video as the desired output of the initial model, to train the initial model.
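The frame-extraction / feature-extraction / classification pipeline described above can be sketched end to end. All three stages here are hypothetical stand-ins (every-other-frame sampling, mean-pixel features, a two-score classifier), chosen only to make the data flow concrete:

```python
# Sketch of the forward path: video -> video frame extraction layer
# -> feature extraction layer -> classification layer.
def extract_frames(video, step=2):
    return video[::step]                      # keep every `step`-th frame

def extract_features(frames):
    # stand-in feature: mean pixel value per frame
    return [sum(f) / len(f) for f in frames]

def classify(features):
    # stand-in classifier: two "category" scores derived from the features
    avg = sum(features) / len(features)
    return [avg, 1 - avg]

video = [[0.2, 0.4], [0.6, 0.8], [0.1, 0.3]]  # 3 toy "frames" of 2 pixels
scores = classify(extract_features(extract_frames(video)))
```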
In a second aspect, embodiments of the present disclosure provide a method for generating a set of class labels for a video. The method includes: acquiring a video to be classified; and inputting the video to be classified into a pre-trained video tag model to generate a set of class labels, where each class label corresponds to a preset video category and characterizes that the video to be classified belongs to that video category, and the video tag model is generated according to the method described in any embodiment of the first aspect.
In a third aspect, embodiments of the present disclosure provide an apparatus for generating a video tag model. The apparatus includes: an acquiring unit configured to acquire at least two sample video sets, where each sample video set corresponds to a preset video category and includes positive sample videos belonging to the corresponding video category and negative sample videos not belonging to the corresponding video category, each positive sample video corresponding to pre-annotated positive classification information and each negative sample video corresponding to pre-annotated negative classification information; and a training unit configured to select a sample video set from the at least two sample video sets and perform, using the selected sample video set, the following training steps: training an initial model using a machine learning method, with the positive sample videos included in the sample video set as inputs and the corresponding positive classification information as the desired output, and with the negative sample videos in the sample video set as inputs and the corresponding negative classification information as the desired output; determining whether the at least two sample video sets include an unselected sample video set; and in response to determining that they do not, determining that the initial model obtained after the last round of training is the video tag model.
In some embodiments, the apparatus further includes: a selecting unit configured to, in response to determining that the at least two sample video sets include an unselected sample video set, reselect a sample video set from the unselected sample video sets and continue to perform the training steps using the reselected sample video set and the initial model obtained after the last round of training.
In some embodiments, the positive classification information and the negative classification information are each a vector including a preset number of elements. A target element in the vector corresponding to a positive sample video characterizes that the positive sample video belongs to the corresponding video category, and a target element in the vector corresponding to a negative sample video characterizes that the negative sample video does not belong to the corresponding video category. The target element is the element located at the element position, among the element positions in the vector, for which a correspondence has been pre-established with the video category of the sample video set to which the sample video corresponding to the vector belongs.
In some embodiments, the initial model is a convolutional neural network including a feature extraction layer and a classification layer. The classification layer includes a preset number of pieces of weight data, each piece of weight data corresponding to a preset video category and used to determine the probability that an input video belongs to the video category corresponding to that piece of weight data.
In some embodiments, the training unit is further configured to: train the initial model by fixing, among the preset number of pieces of weight data, the pieces of weight data other than the piece corresponding to the sample video set, and adjusting the piece of weight data corresponding to the sample video set.
In some embodiments, the initial model further includes a video frame extraction layer; and the training unit is further configured to: input the positive sample videos included in the sample video set into the video frame extraction layer to obtain sets of positive sample video frames; take the obtained sets of positive sample video frames as inputs of the feature extraction layer, with the positive classification information corresponding to each input positive sample video as the desired output of the initial model; input the negative sample videos included in the sample video set into the video frame extraction layer to obtain sets of negative sample video frames; and take the obtained sets of negative sample video frames as inputs of the feature extraction layer, with the negative classification information corresponding to each input negative sample video as the desired output of the initial model, to train the initial model.
In a fourth aspect, embodiments of the present disclosure provide an apparatus for generating a set of class labels for a video. The apparatus includes: an acquiring unit configured to acquire a video to be classified; and a generating unit configured to input the video to be classified into a pre-trained video tag model to generate a set of class labels, where each class label corresponds to a preset video category and characterizes that the video to be classified belongs to that video category, and the video tag model is generated according to the method described in any embodiment of the first aspect.
In a fifth aspect, embodiments of the present disclosure provide an electronic device including: one or more processors; and a storage apparatus storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the method described in any implementation of the first or second aspect.
In a sixth aspect, embodiments of the present disclosure provide a computer-readable medium storing a computer program that, when executed by a processor, implements the method described in any implementation of the first or second aspect.
The method and apparatus for generating a video tag model provided by embodiments of the present disclosure acquire at least two sample video sets, where each sample video set corresponds to a preset video category and includes positive sample videos corresponding to positive classification information and negative sample videos corresponding to negative classification information; an initial model is then trained with the positive sample videos as inputs and the positive classification information as the desired output, and with the negative sample videos as inputs and the negative classification information as the desired output, finally obtaining a video tag model. Because the positive and negative classification information each concerns a single category, embodiments of the present disclosure can obtain a video tag model capable of multi-label classification by training only with classification information annotated as single labels, without requiring multi-label annotation. The simplicity and specificity of single-label annotation thereby improve the flexibility of model training and help improve the accuracy of classifying videos with the video tag model.
Detailed description
The present disclosure is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are used only to explain the relevant disclosure, not to limit it. It should also be noted that, for ease of description, only the parts relevant to the disclosure are shown in the drawings.
It should be noted that, in the absence of conflict, the embodiments in the present disclosure and the features in the embodiments may be combined with one another. The present disclosure is described in detail below with reference to the drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 in which embodiments of the method or apparatus for generating a video tag model, or of the method or apparatus for generating a set of class labels for a video, of the present disclosure may be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102 and 103, a network 104 and a server 105. The network 104 serves as a medium providing communication links between the terminal devices 101, 102 and 103 and the server 105, and may include various connection types, such as wired or wireless communication links, or fiber-optic cables.
A user may use the terminal devices 101, 102 and 103 to interact with the server 105 through the network 104 to receive or send messages. Various communication client applications, such as video playing applications, video processing applications, web browser applications and social platform software, may be installed on the terminal devices 101, 102 and 103.
The terminal devices 101, 102 and 103 may be hardware or software. When they are hardware, they may be various electronic devices. When they are software, they may be installed in the above electronic devices, and may be implemented as multiple pieces of software or software modules (for example, software or software modules for providing distributed services) or as a single piece of software or software module, which is not specifically limited here.
The server 105 may be a server providing various services, for example a background model server that performs model training using sample video sets uploaded by the terminal devices 101, 102 and 103. The background model server may perform model training using the acquired at least two sample video sets to generate a video tag model, and may also send the video tag model to the terminal devices, or process a video to be classified using the video tag model to obtain labels for the video to be classified.
It should be noted that the method for generating a video tag model provided by embodiments of the present disclosure may be performed by the server 105 or by the terminal devices 101, 102 and 103; correspondingly, the apparatus for generating a video tag model may be provided in the server 105 or in the terminal devices 101, 102 and 103. Likewise, the method for generating a set of class labels for a video provided by embodiments of the present disclosure may be performed by the server 105 or by the terminal devices 101, 102 and 103; correspondingly, the apparatus for generating a set of class labels for a video may be provided in the server 105 or in the terminal devices 101, 102 and 103.
It should be noted that the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (for example, software or software modules for providing distributed services) or as a single piece of software or software module, which is not specifically limited here.
It should be understood that the numbers of terminal devices, networks and servers in Fig. 1 are merely illustrative; there may be any number of terminal devices, networks and servers as required by the implementation. In the case where the sample video sets needed to train the model, or the video to be classified, need not be obtained remotely, the above system architecture may include no network, and only a server or a terminal device is needed.
With continued reference to Fig. 2, a flow 200 of one embodiment of the method for generating a video tag model according to the present disclosure is shown. The method for generating a video tag model includes the following steps:
Step 201: acquire at least two sample video sets.
In this embodiment, an executing body of the method for generating a video tag model (for example, the server or terminal device shown in Fig. 1) may acquire the at least two sample video sets remotely through a wired or wireless connection, or acquire them locally. Each sample video set corresponds to a preset video category and includes positive sample videos belonging to the corresponding video category and negative sample videos not belonging to the corresponding video category; each positive sample video corresponds to pre-annotated positive classification information, and each negative sample video corresponds to pre-annotated negative classification information.
Specifically, the positive classification information and the negative classification information may include information in at least one of the following forms: text, numbers, symbols and the like. For example, the positive classification information may be "seaside" and the negative classification information may be "non-seaside".
It should be noted that the positive sample videos and negative sample videos used in this embodiment include image sequences containing at least two images.
In some optional implementations of this embodiment, the positive classification information and the negative classification information are each a vector including a preset number of elements. A target element in the vector corresponding to a positive sample video characterizes that the positive sample video belongs to the corresponding video category, and a target element in the vector corresponding to a negative sample video characterizes that the negative sample video does not belong to the corresponding video category. The target element is the element located at the element position for which a correspondence with the video category corresponding to the vector has been pre-established, where the video category corresponding to a vector is the video category of the sample video set to which the sample video corresponding to the vector belongs.
As an example, suppose the preset number is 100 and, for a given sample video set, the corresponding video category is the seaside category. Then the positive classification information corresponding to a positive sample video in that sample video set may be the vector (1, 0, 0, 0, ..., 0), which includes 100 elements and in which the first element corresponds to the seaside category. Here, the number 1 indicates that the video belongs to the seaside category, and each other element 0 indicates that the video does not belong to the video category corresponding to that element position. Correspondingly, the negative classification information may be the vector (0, 0, 0, 0, ..., 0). It should be noted that the other elements may also take values other than 0. Classification information annotated in vector form is commonly used to train multi-label classification models; since each vector here characterizes only whether a single video belongs to a single video category, the classification information of this implementation can be regarded as a single label. When training on a given sample video set, the training approach of a single-label model can therefore be used, which helps simplify the training steps.
By characterizing classification information with vectors, the set of video categories recognized by the video tag model can also be extended flexibly. For example, suppose the preset number is 100, i.e., the model can recognize at most 100 categories of video, but in practice only 10 video categories need to be recognized, with the 1st to 10th elements of the vector corresponding to the preset video categories. When the video tag model needs to recognize videos of more categories, it is only necessary to assign video categories to the other elements, so that the recognition capability of the video tag model can be extended flexibly.
Step 202: select a sample video set from the at least two sample video sets and, using the selected sample video set, perform the following training steps: training an initial model using a machine learning method, with the positive sample videos included in the sample video set as inputs and the positive classification information corresponding to each input positive sample video as the desired output, and with the negative sample videos in the sample video set as inputs and the negative classification information corresponding to each input negative sample video as the desired output; determining whether the at least two sample video sets include an unselected sample video set; and in response to determining that they do not, determining that the initial model obtained after the last round of training is the video tag model.
In this embodiment, the above executing body may perform the following sub-steps:
Step 2021: select a sample video set from the at least two sample video sets.
Specifically, the above executing body may select a sample video set in various manners, for example randomly, or according to a pre-set numbering order of the sample video sets.
Then, using the selected sample video set, the following training steps (including steps 2022 to 2024) are performed.
Step 2022: train an initial model using a machine learning method, with the positive sample videos included in the sample video set as inputs and the positive classification information corresponding to each input positive sample video as the desired output, and with the negative sample videos in the sample video set as inputs and the negative classification information corresponding to each input negative sample video as the desired output.
Specifically, the initial model may be a model of any of various types, such as a recurrent neural network model or a convolutional neural network model. During training of the initial model, for each positive sample video or negative sample video input for training, an actual output can be obtained, where the actual output is data actually output by the initial model and characterizes classification information. The above executing body may then use gradient descent to adjust the parameters of the initial model based on the actual output and the desired output, take the model obtained after each parameter adjustment as the initial model for the next round of training, and end the training for a given sample video set when a preset termination condition is met. It should be noted that the preset training termination condition here may include, but is not limited to, at least one of the following: the training time exceeds a preset duration; the number of training iterations exceeds a preset count; or the loss value calculated using a preset loss function (for example, a cross-entropy loss function) is less than a preset loss threshold.
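The three termination conditions listed above can be sketched as a single predicate; the threshold values here are hypothetical examples, not values from the disclosure:

```python
# Sketch of the preset termination condition: stop when any of the three
# criteria is met (time limit, iteration limit, or loss below target).
def should_stop(elapsed_s, iterations, loss,
                max_seconds=3600, max_iters=10000, loss_target=0.01):
    return (elapsed_s > max_seconds      # training time exceeds preset duration
            or iterations > max_iters    # iteration count exceeds preset count
            or loss < loss_target)       # loss under preset loss threshold

stop = should_stop(elapsed_s=10, iterations=50, loss=0.005)  # loss target met
keep = should_stop(elapsed_s=10, iterations=50, loss=0.5)    # keep training
```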
As an example, the initial model may include at least two binary classification models, each corresponding to one sample video set. A given binary classification model may be obtained by training on the positive sample videos and negative sample videos included in the corresponding sample video set. The finally trained binary classification model can determine whether an input video belongs to the video category corresponding to that binary classification model and, if so, generate a label characterizing that video category. Thus, when the finally trained video tag model is used to classify a video, at least one label characterizing a video category can be generated, achieving the effect of multi-label classification.
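The per-category binary-classifier view described above can be sketched as follows; the two lambda classifiers are hypothetical stand-ins that "classify" a toy set of keywords rather than a real video:

```python
# Sketch of multi-label classification via one binary classifier per
# category: the output label set is every category whose classifier
# answers "yes" for the input.
def multi_label(video, binary_classifiers):
    return [cat for cat, clf in binary_classifiers.items() if clf(video)]

classifiers = {
    "seaside": lambda v: "water" in v,  # stand-in binary classifier
    "hotel":   lambda v: "room" in v,   # stand-in binary classifier
}
labels = multi_label({"water", "room"}, classifiers)  # both categories match
```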
In some optional implementations of this embodiment, the above initial model may be a convolutional neural network including a feature extraction layer and a classification layer. The classification layer includes a preset number of pieces of weight data, each corresponding to a preset video category and used to determine the probability that an input video belongs to the video category corresponding to that piece of weight data. Generally, the feature extraction layer may include convolutional layers, pooling layers and the like, and is used to generate feature data of a video; the feature data may characterize features of the images in the video such as color and shape. The classification layer includes a fully connected layer that generates a feature vector (for example, a 2048-dimensional vector) from the feature data output by the feature extraction layer. Each piece of weight data includes weight coefficients, which can be multiplied with the feature data, and may also include a bias; using the weight coefficients and the bias, a probability value corresponding to the piece of weight data can be obtained, characterizing the probability that the input video belongs to the video category corresponding to that piece of weight data.
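One piece of classification-layer weight data as described above can be sketched as a weight vector plus a bias applied to the feature vector. The sigmoid squashing is an assumption (the disclosure only says the weight data yields a probability), and a 3-dimensional feature vector stands in for the 2048-dimensional one:

```python
import math

# Sketch of one piece of "weight data": weight coefficients multiplied
# with the feature vector, plus a bias, squashed to a probability.
def category_probability(features, weights, bias):
    score = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-score))  # assumed sigmoid, in (0, 1)

p = category_probability([0.5, -1.0, 2.0], [1.0, 0.2, 0.3], bias=0.1)
# score = 0.5 - 0.2 + 0.6 + 0.1 = 1.0
```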
In some optional implementations of this embodiment, the above executing body may train the initial model as follows: fixing, among the above preset number of pieces of weight data, the pieces of weight data other than the piece corresponding to the sample video set, and adjusting the piece of weight data corresponding to the sample video set, to train the initial model.
Specifically, for a given sample video set, the pieces of weight data other than the piece corresponding to that sample video set are fixed, and the training approach of a binary classification model can be used to adjust the piece of weight data corresponding to that sample video set, so that this piece of weight data is optimized. It should be noted that methods of training binary classification models are well-known techniques that are currently widely researched and applied, and are not described in detail here. Through this implementation, the pieces of weight data included in the video tag model can be made independent of one another: training with one sample video set does not affect the other pieces of weight data, so that the finally obtained video tag model classifies videos more accurately. Moreover, because multiple pieces of weight data are used, the finally obtained video tag model can assign an input video to multiple video categories, achieving the effect of multi-label classification.
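The fix-others/adjust-one training step described above can be sketched as follows. The one-step gradient update and learning rate are hypothetical; the point is only that the weight row of the current sample video set's category changes while every other row stays fixed:

```python
# Sketch of per-set training: update only the weight row belonging to
# the current sample set's category; all other rows are left untouched.
def update_one_category(weight_rows, category_index, gradient, lr=0.1):
    new_rows = [row[:] for row in weight_rows]       # copy all rows
    new_rows[category_index] = [
        w - lr * g for w, g in zip(new_rows[category_index], gradient)
    ]                                                 # adjust only this row
    return new_rows

rows = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
updated = update_one_category(rows, 1, gradient=[1.0, -1.0])
# rows 0 and 2 unchanged; row 1 moves against the gradient
```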
In some optional implementations of this embodiment, the initial model further includes a video frame extraction layer. The above executing body may train the initial model as follows: inputting the positive sample videos included in the sample video set into the video frame extraction layer to obtain sets of positive sample video frames; taking the obtained sets of positive sample video frames as inputs of the feature extraction layer, with the positive classification information corresponding to each input positive sample video as the desired output of the initial model, to train the initial model; inputting the negative sample videos included in the sample video set into the video frame extraction layer to obtain sets of negative sample video frames; and taking the obtained sets of negative sample video frames as inputs of the feature extraction layer, with the negative classification information corresponding to each input negative sample video as the desired output of the initial model, to train the initial model.
Specifically, the video frame extraction layer may extract video frames according to any of various preset video frame extraction modes. For example, the key frames of an input sample video may be extracted as sample video frames according to an existing method for extracting key frames of a video, or video frames may be extracted as sample video frames at a preset playback time interval. Through this implementation, a certain number of video frames can be extracted from a sample video for classifying it, which can reduce the amount of computation of the model and improve the efficiency of model training.
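Interval-based frame extraction as described above can be sketched as follows; the frame rate and interval values are hypothetical examples:

```python
# Sketch of extraction at a preset playback time interval: take one
# frame every `interval_s` seconds of playback.
def sample_frames(num_frames, fps, interval_s):
    step = int(fps * interval_s)
    return list(range(0, num_frames, step))  # indices of sampled frames

# 300 frames at 30 fps (10 s of video), one frame every 2 s -> 5 frames
indices = sample_frames(300, fps=30, interval_s=2)
```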
Step 2023: determine whether the at least two sample video sets include an unselected sample video set.
Step 2024: in response to determining that they do not, determine that the initial model obtained after the last round of training is the video tag model.
In some optional implementations of this embodiment, the above executing body may, in response to determining that the at least two sample video sets include an unselected sample video set, reselect a sample video set from the unselected sample video sets and continue to perform the above training steps (i.e., steps 2022 to 2024) using the reselected sample video set and the initial model obtained after the last round of training. The manner of reselecting a sample video set from the unselected sample video sets may be random selection or selection according to the numbering order of the sample video sets, and is not limited here.
The video tag model obtained by training through the above steps can be used to determine the probability values that an input video belongs to each preset video category; if a probability value is greater than or equal to a preset probability threshold, a class label characterizing that the input video belongs to the video category corresponding to that probability value is generated. In practical applications, the video tag model may output a set of class labels, each corresponding to a preset video category and characterizing that the video input to the video tag model belongs to that video category. The video tag model obtained by training is thus a multi-label classification model.
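The threshold-based label-set generation described above can be sketched as follows; the category names and threshold value are hypothetical examples:

```python
# Sketch of label-set generation: emit one class label per category
# whose probability meets or exceeds the preset probability threshold.
def label_set(probabilities, categories, threshold=0.5):
    return {c for c, p in zip(categories, probabilities) if p >= threshold}

labels = label_set([0.9, 0.2, 0.7], ["seaside", "hotel", "beach"], 0.5)
```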
With continued reference to Fig. 3, Fig. 3 is a schematic diagram of an application scenario of the method for generating a video tag model according to the present embodiment. In the application scenario of Fig. 3, an electronic device 301 first acquires at least two sample video sets 302. Each sample video set corresponds to a preset video category and includes positive sample videos belonging to the corresponding video category and negative sample videos not belonging to it; each positive sample video corresponds to pre-annotated positive classification information, and each negative sample video corresponds to pre-annotated negative classification information. For example, sample video set 3021 corresponds to the video category "seashore", and sample video set 3022 corresponds to the video category "hotel". The positive classification information corresponding to each positive sample video included in sample video set 3021 is the vector (1, 0, 0, ...), and the negative classification information corresponding to each negative sample video included therein is the vector (0, 0, 0, ...). The positive classification information corresponding to each positive sample video included in sample video set 3022 is the vector (0, 1, 0, ...), and the negative classification information corresponding to each negative sample video included therein is the vector (0, 0, 0, ...). Each element position in a vector corresponds to one video category.
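The label vectors above can be constructed mechanically. The sketch below assumes three preset video categories for illustration; the helper names are not from the disclosure.

```python
NUM_CATEGORIES = 3  # assumed number of preset video categories


def positive_classification_info(category_index, num_categories=NUM_CATEGORIES):
    """Vector with a 1 at the element position of the sample set's own
    category: the positive sample belongs to that video category."""
    vector = [0] * num_categories
    vector[category_index] = 1
    return vector


def negative_classification_info(num_categories=NUM_CATEGORIES):
    """All-zero vector: the negative sample does not belong to the category."""
    return [0] * num_categories


# Sample video set 3021 ("seashore", position 0) and 3022 ("hotel", position 1):
print(positive_classification_info(0))  # [1, 0, 0]
print(positive_classification_info(1))  # [0, 1, 0]
print(negative_classification_info())   # [0, 0, 0]
```

Note that a negative label is all zeros regardless of which sample set it comes from: it only asserts that the video does not belong to that set's category, saying nothing about the other categories.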
Then, the electronic device 301 selects sample video sets one by one from the above at least two sample video sets 302, in a preset numbered order of the sample video sets, and executes the following training step with each selected sample video set: using a machine learning method, taking the positive sample videos included in the sample video set as input and the positive classification information corresponding to each input positive sample video as desired output, taking the negative sample videos in the sample video set as input and the negative classification information corresponding to each input negative sample video as desired output, and training an initial model 303. As shown in the figure, sample video set 3021 is used to train the initial model 303. After each round of training with a sample video set, the initial model 303 retains its adjusted parameters, which are carried into training with the remaining sample video sets. After each round, the electronic device 301 determines whether the at least two sample video sets 302 still include an unselected sample video set; if not, i.e., all sample video sets have been used for training, the initial model obtained after the most recent training is determined to be the video tag model 304.
In the method provided by the above embodiment of the present disclosure, at least two sample video sets are acquired, where each sample video set corresponds to a preset video category and includes positive sample videos and negative sample videos, the positive sample videos corresponding to positive classification information and the negative sample videos corresponding to negative classification information. The positive sample videos are then taken as input with the positive classification information as desired output, and the negative sample videos are taken as input with the negative classification information as desired output, to train an initial model and finally obtain a video tag model. Because the positive and negative classification information each targets a single category, the embodiment of the present disclosure can train using only classification information annotated with a single label, without requiring classification information annotated with multiple labels, and still obtain a video tag model capable of multi-label classification. This exploits the simplicity and specificity of single-label annotation, improves the flexibility of model training, and helps improve the accuracy of video classification performed with the video tag model.
With further reference to Fig. 4, a flow 400 of an embodiment of the method for generating a class label set of a video according to the present disclosure is shown. The method for generating a class label set of a video includes the following steps:
Step 401: acquire a to-be-classified video.
In the present embodiment, the executing body of the method for generating a class label set of a video (e.g., the server or terminal device shown in Fig. 1) may acquire the to-be-classified video locally or remotely. The to-be-classified video is the video on which classification is to be performed. It should be noted that the to-be-classified video used in the present embodiment includes an image sequence containing at least two images.
Step 402: input the to-be-classified video into a pre-trained video tag model to generate a class label set.
In the present embodiment, the above executing body may input the to-be-classified video into the pre-trained video tag model to generate a class label set. Each class label corresponds to a preset video category and characterizes that the to-be-classified video belongs to the video category corresponding to that class label. A class label may take various forms, including but not limited to at least one of the following: text, numbers, symbols, etc.
In the present embodiment, the video tag model is generated according to the method described in the embodiment corresponding to Fig. 2 above; for details, reference may be made to the steps described in that embodiment, which are not repeated here.
In general, the generated class label set may be stored in association with the to-be-classified video. For example, the class label set may be stored, as attribute information of the to-be-classified video, into the attribute information set of the to-be-classified video, thereby making the attributes characterizing the to-be-classified video more comprehensive. The attribute information set may include, but is not limited to, at least one of the following items of attribute information: the title, size, generation time, etc. of the to-be-classified video.
Optionally, the generated class label set may be output in various manners, for example displayed on a display screen included in the above executing body, or sent to another electronic device communicatively connected to the above executing body.
In the method provided by the above embodiment of the present disclosure, the to-be-classified video is classified using the video tag model generated in the embodiment corresponding to Fig. 2 to produce the class label set of the to-be-classified video. A video tag model trained with single-label samples is thereby used for multi-label classification, improving the accuracy and efficiency of video classification.
With further reference to Fig. 5, as an implementation of the method shown in Fig. 2 above, the present disclosure provides an embodiment of an apparatus for generating a video tag model. This apparatus embodiment corresponds to the method embodiment shown in Fig. 2, and the apparatus may be applied in various electronic devices.
As shown in Fig. 5, the apparatus 500 for generating a video tag model of the present embodiment includes: an acquiring unit 501, configured to acquire at least two sample video sets, where each sample video set corresponds to a preset video category and includes positive sample videos belonging to the corresponding video category and negative sample videos not belonging to the corresponding video category, the positive sample videos corresponding to pre-annotated positive classification information and the negative sample videos corresponding to pre-annotated negative classification information; and a training unit 502, configured to select a sample video set from the at least two sample video sets and, using the selected sample video set, execute the following training step: using a machine learning method, taking the positive sample videos included in the sample video set as input and the positive classification information corresponding to each input positive sample video as desired output, taking the negative sample videos in the sample video set as input and the negative classification information corresponding to each input negative sample video as desired output, and training an initial model; determining whether the at least two sample video sets include an unselected sample video set; and in response to determining that they do not, determining the initial model obtained after the most recent training to be the video tag model.
In the present embodiment, the acquiring unit 501 may acquire the at least two sample video sets remotely through a wired or wireless connection, or acquire them locally. Each sample video set corresponds to a preset video category and includes positive sample videos belonging to the corresponding video category and negative sample videos not belonging to it; each positive sample video corresponds to pre-annotated positive classification information, and each negative sample video corresponds to pre-annotated negative classification information.
Specifically, the positive classification information and the negative classification information may include information in at least one of the following forms: text, numbers, symbols, etc. For example, positive classification information may be "seashore" and negative classification information may be "non-seashore".
It should be noted that the positive sample videos and negative sample videos used in the present embodiment each include an image sequence containing at least two images.
In the present embodiment, the training unit 502 may execute the following sub-steps:
Step 5021: select a sample video set from the at least two sample video sets.
Specifically, the above training unit 502 may select the sample video set in various manners, for example at random, or in the numbered order of the sample video sets.
Then, using the selected sample video set, the following training step (including steps 5022 to 5024) is executed.
Step 5022: using a machine learning method, take the positive sample videos included in the sample video set as input and the positive classification information corresponding to each input positive sample video as desired output, take the negative sample videos in the sample video set as input and the negative classification information corresponding to each input negative sample video as desired output, and train an initial model.
Specifically, the initial model may be a model of various types, such as a recurrent neural network model or a convolutional neural network model. During training of the initial model, an actual output may be obtained for each positive or negative sample video input for training; the actual output is the data actually output by the initial model and characterizes classification information. The above training unit 502 may then use gradient descent to adjust the parameters of the initial model based on the actual output and the desired output, take the model obtained after each parameter adjustment as the initial model for the next round of training, and end the training for one sample video set when a preset termination condition is met. The preset training termination condition here may include, but is not limited to, at least one of the following: the training time exceeds a preset duration; the number of training iterations exceeds a preset count; the loss value calculated with a preset loss function (e.g., a cross-entropy loss function) is less than a preset loss threshold.
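The per-sample-set loop described above (repeated gradient-descent parameter adjustment until a preset termination condition is met) can be sketched as follows. The toy quadratic objective stands in for the real cross-entropy loss, and the concrete thresholds are illustrative assumptions, not the patented training procedure.

```python
import time


def train_on_sample_set(params, step_fn, max_seconds=60.0, max_steps=1000,
                        loss_threshold=1e-3):
    """Repeatedly adjust the parameters (one gradient-descent update per call
    to step_fn) and stop as soon as any preset termination condition is met:
    training time exceeded, iteration count exceeded, or loss below threshold."""
    start = time.monotonic()
    steps = 0
    loss = float("inf")
    while (time.monotonic() - start <= max_seconds
           and steps < max_steps
           and loss >= loss_threshold):
        loss = step_fn(params)  # one parameter adjustment; returns the loss
        steps += 1
    return params, loss, steps


# Toy objective (w - 3)^2 minimised by gradient descent, standing in for the
# loss over one sample video set:
def quadratic_step(params, target=3.0, lr=0.1):
    params[0] -= lr * 2.0 * (params[0] - target)  # gradient of (w - target)^2
    return (params[0] - target) ** 2


params, loss, steps = train_on_sample_set([0.0], quadratic_step, max_steps=500)
print(round(params[0], 2), steps < 500)  # converges near 3.0 before 500 steps
```

The returned parameters would then serve as the initial model for the next sample video set, as the embodiment describes.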
As an example, the initial model may include at least two binary classification models, each corresponding to one sample video set. A given binary classification model may be trained on the positive sample videos and negative sample videos included in its corresponding sample video set. The finally trained binary classification model can determine whether an input video belongs to the video category corresponding to that binary classification model and, if it does, generate a label characterizing that video category. Accordingly, when the finally trained video tag model is used for video classification, at least one label characterizing a video category can be generated, achieving the effect of multi-label classification.
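The one-binary-classifier-per-category decomposition above amounts to taking the union of the categories whose classifier answers "belongs". A minimal sketch, with trivial hand-written classifiers standing in for the trained models (the feature names and decision rules are assumptions for illustration):

```python
def multi_label_classify(features, binary_classifiers):
    """binary_classifiers maps each preset video category to a function that
    answers whether the input is judged to belong to that category; the class
    label set is the union of the categories answered in the affirmative."""
    return {name for name, belongs in binary_classifiers.items()
            if belongs(features)}


# Hypothetical per-category classifiers keyed by preset video category:
classifiers = {
    "seashore": lambda f: f.get("water", 0.0) > 0.5,
    "hotel": lambda f: f.get("rooms", 0.0) > 0.5,
}

print(sorted(multi_label_classify({"water": 0.9, "rooms": 0.7}, classifiers)))
# ['hotel', 'seashore']
```

Each classifier decides independently, so one video can trigger several categories at once, which is the multi-label effect the example describes.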
Step 5023: determine whether the at least two sample video sets include an unselected sample video set.
Step 5024: in response to determining that they do not, determine the initial model obtained after the most recent training to be the video tag model.
In some optional implementations of the present embodiment, the apparatus 500 may further include a selecting unit (not shown in the figure), configured to, in response to determining that the at least two sample video sets include an unselected sample video set, reselect a sample video set from the unselected sample video sets and continue to execute the training step using the reselected sample video set and the initial model obtained after the most recent training.
In some optional implementations of the present embodiment, the positive classification information and the negative classification information each include a vector with a preset number of elements. A target element in the vector corresponding to a positive sample video characterizes that the positive sample video belongs to the corresponding video category, and a target element in the vector corresponding to a negative sample video characterizes that the negative sample video does not belong to the corresponding video category. The target element is the element whose position in the vector has been pre-associated with the video category corresponding to the vector; the video category corresponding to a vector is the video category corresponding to the sample video set to which the vector's sample video belongs.
In some optional implementations of the present embodiment, the initial model is a convolutional neural network including a feature extraction layer and a classification layer. The classification layer includes a preset number of items of weight data, each item of weight data corresponding to a preset video category and used to determine the probability that an input video belongs to the video category corresponding to that weight data.
In some optional implementations of the present embodiment, the training unit 502 may be further configured to fix, among the preset number of items of weight data, the weight data other than the weight data corresponding to the current sample video set, and adjust the weight data corresponding to the sample video set, so as to train the initial model.
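The per-category freezing strategy above can be sketched with plain Python lists standing in for the classification layer's weight data; this is an illustrative assumption about the update rule, not the disclosed implementation.

```python
def adjust_only_category(weights, category_index, gradients, lr=0.1):
    """weights[i] is the weight data of preset video category i; only the
    entry for category_index is updated, all other entries are held fixed."""
    updated = list(weights)
    updated[category_index] = [
        w - lr * g
        for w, g in zip(weights[category_index], gradients[category_index])
    ]
    return updated


before = [[0.5, 0.5], [0.2, 0.2]]  # two categories, two weights each
after = adjust_only_category(before, 0, [[1.0, 1.0], [9.9, 9.9]])
print(after)  # [[0.4, 0.4], [0.2, 0.2]] -- only category 0 moved
```

Freezing the other categories' weight data means a sample set annotated for a single category cannot disturb what the classification layer has already learned for the other categories.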
In some optional implementations of the present embodiment, the initial model further includes a video frame extraction layer, and the training unit 502 may be further configured to: input the positive sample videos included in the sample video set into the video frame extraction layer to obtain positive sample video frame sets; take the obtained positive sample video frame sets as input to the feature extraction layer, with the positive classification information corresponding to each input positive sample video as the desired output of the initial model; input the negative sample videos included in the sample video set into the video frame extraction layer to obtain negative sample video frame sets; and take the obtained negative sample video frame sets as input to the feature extraction layer, with the negative classification information corresponding to each input negative sample video as the desired output of the initial model, to train the initial model.
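The three-layer forward path described above (frame extraction, then feature extraction, then classification) can be sketched as a pipeline. All three layers here are toy stand-ins invented for illustration; the disclosed layers are neural-network layers, not these functions.

```python
def extract_frames(video, every_n=2):
    """Video frame extraction layer stand-in: keep every n-th image of the
    video's image sequence."""
    return video[::every_n]


def extract_features(frames):
    """Feature extraction layer stand-in: mean pixel value of the frames."""
    pixels = [p for frame in frames for p in frame]
    return sum(pixels) / len(pixels)


def classification_layer(feature, weights):
    """Classification layer stand-in: one score per preset video category,
    clamped to [0, 1] so it can be read as a probability."""
    return [min(1.0, max(0.0, w * feature)) for w in weights]


video = [[0.2, 0.4], [0.9, 0.9], [0.6, 0.8]]  # three 2-pixel "frames"
scores = classification_layer(extract_features(extract_frames(video)),
                              [1.0, 0.5])
print([round(s, 3) for s in scores])  # [0.5, 0.25]
```

Sampling frames before feature extraction keeps the per-video cost bounded regardless of the video's length, which is the practical motivation for a separate frame extraction layer.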
In the apparatus 500 provided by the above embodiment of the present disclosure, at least two sample video sets are acquired, where each sample video set corresponds to a preset video category and includes positive sample videos and negative sample videos, the positive sample videos corresponding to positive classification information and the negative sample videos corresponding to negative classification information. The positive sample videos are then taken as input with the positive classification information as desired output, and the negative sample videos are taken as input with the negative classification information as desired output, to train an initial model and finally obtain a video tag model. Because the positive and negative classification information each targets a single category, the embodiment of the present disclosure can train using only classification information annotated with a single label, without requiring classification information annotated with multiple labels, and still obtain a video tag model capable of multi-label classification. This exploits the simplicity and specificity of single-label annotation, improves the flexibility of model training, and helps improve the accuracy of video classification performed with the video tag model.
With further reference to Fig. 6, as an implementation of the method shown in Fig. 4 above, the present disclosure provides an embodiment of an apparatus for generating a class label set of a video. This apparatus embodiment corresponds to the method embodiment shown in Fig. 4, and the apparatus may be applied in various electronic devices.
As shown in Fig. 6, the apparatus 600 for generating a class label set of a video of the present embodiment includes: an acquiring unit 601, configured to acquire a to-be-classified video; and a generation unit 602, configured to input the to-be-classified video into a pre-trained video tag model to generate a class label set, where each class label corresponds to a preset video category and characterizes that the to-be-classified video belongs to the video category corresponding to that class label, and the video tag model is generated according to the method described in any embodiment of the above first aspect.
In the present embodiment, the acquiring unit 601 may acquire the to-be-classified video locally or remotely. The to-be-classified video is the video on which classification is to be performed. It should be noted that the to-be-classified video used in the present embodiment includes an image sequence containing at least two images.
In the present embodiment, the generation unit 602 may input the to-be-classified video into the pre-trained video tag model to generate a class label set. Each class label corresponds to a preset video category and characterizes that the to-be-classified video belongs to the video category corresponding to that class label. A class label may take various forms, including but not limited to at least one of the following: text, numbers, symbols, etc.
In the present embodiment, the video tag model is generated according to the method described in the embodiment corresponding to Fig. 2 above; for details, reference may be made to the steps described in that embodiment, which are not repeated here.
In general, the generated class label set may be stored in association with the to-be-classified video. For example, the class label set may be stored, as attribute information of the to-be-classified video, into the attribute information set of the to-be-classified video, thereby making the attributes characterizing the to-be-classified video more comprehensive. The attribute information set may include, but is not limited to, at least one of the following items of attribute information: the title, size, generation time, etc. of the to-be-classified video.
In the apparatus 600 provided by the above embodiment of the present disclosure, the to-be-classified video is classified using the video tag model generated in the embodiment corresponding to Fig. 2 to produce the class label set of the to-be-classified video. A video tag model trained with single-label samples is thereby used for multi-label classification, improving the accuracy and efficiency of video classification.
Referring now to Fig. 7, a structural schematic diagram of an electronic device 700 (e.g., the server or terminal device shown in Fig. 1) suitable for implementing embodiments of the present disclosure is shown. Terminal devices in embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, laptop computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (e.g., vehicle navigation terminals), and fixed terminals such as digital TVs and desktop computers. The electronic device shown in Fig. 7 is merely an example and should not impose any limitation on the functions and scope of use of embodiments of the present disclosure.
As shown in Fig. 7, the electronic device 700 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 701, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage device 708 into a random access memory (RAM) 703. The RAM 703 also stores various programs and data required for the operation of the electronic device 700. The processing device 701, the ROM 702, and the RAM 703 are connected to one another through a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
In general, the following devices may be connected to the I/O interface 705: input devices 706 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 707 including, for example, a liquid crystal display (LCD), speaker, vibrator, etc.; storage devices 708 including, for example, a magnetic tape, hard disk, etc.; and a communication device 709. The communication device 709 may allow the electronic device 700 to communicate wirelessly or by wire with other devices to exchange data. Although Fig. 7 shows the electronic device 700 with various devices, it should be understood that not all of the devices shown are required to be implemented or provided; more or fewer devices may alternatively be implemented or provided. Each box shown in Fig. 7 may represent one device or, as needed, multiple devices.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded from a network and installed through the communication device 709, installed from the storage device 708, or installed from the ROM 702. When the computer program is executed by the processing device 701, the above functions defined in the methods of the embodiments of the present disclosure are executed.
It should be noted that the computer-readable medium described in embodiments of the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In embodiments of the present disclosure, the computer-readable storage medium may be any tangible medium that contains or stores a program, which may be used by or in combination with an instruction execution system, apparatus, or device. In embodiments of the present disclosure, the computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. Program code contained on the computer-readable medium may be transmitted with any suitable medium, including but not limited to: an electric wire, an optical cable, RF (radio frequency), etc., or any suitable combination of the above.
The above computer-readable medium may be included in the above electronic device, or may exist separately without being assembled into the electronic device. The above computer-readable medium carries one or more programs that, when executed by the electronic device, cause the electronic device to: acquire at least two sample video sets, where each sample video set corresponds to a preset video category and includes positive sample videos belonging to the corresponding video category and negative sample videos not belonging to the corresponding video category, the positive sample videos corresponding to pre-annotated positive classification information and the negative sample videos corresponding to pre-annotated negative classification information; select a sample video set from the at least two sample video sets and, using the selected sample video set, execute the following training step: using a machine learning method, take the positive sample videos included in the sample video set as input and the positive classification information corresponding to each input positive sample video as desired output, take the negative sample videos in the sample video set as input and the negative classification information corresponding to each input negative sample video as desired output, and train an initial model; determine whether the at least two sample video sets include an unselected sample video set; and in response to determining that they do not, determine the initial model obtained after the most recent training to be the video tag model.
In addition, when the above one or more programs are executed by the electronic device, the electronic device may also be caused to: acquire a to-be-classified video; and input the to-be-classified video into a pre-trained video tag model to generate a class label set.
The computer program code for executing the operations of embodiments of the present disclosure may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each box in a flowchart or block diagram may represent a module, program segment, or portion of code, which contains one or more executable instructions for implementing a specified logical function. It should also be noted that, in some alternative implementations, the functions annotated in the boxes may occur in an order different from that annotated in the drawings. For example, two boxes shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each box in the block diagrams and/or flowcharts, and combinations of boxes in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that executes the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in embodiments of the present disclosure may be implemented by means of software or by means of hardware. The described units may also be provided in a processor; for example, a processor may be described as including an acquiring unit and a training unit. The names of these units do not, under certain circumstances, constitute a limitation on the units themselves; for example, the acquiring unit may also be described as "a unit that acquires at least two sample video sets".
The above description is merely of preferred embodiments of the present disclosure and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in embodiments of the present disclosure is not limited to technical solutions formed by the specific combinations of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above inventive concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) embodiments of the present disclosure.