Summary of the invention
Embodiments of the present application provide a method and an apparatus for generating information.
In a first aspect, an embodiment of the present application provides a method for generating information, the method comprising: sequentially extracting consecutive multiple frames from a target video, and inputting each set of extracted consecutive frames into a pre-trained first detection model to obtain candidate categories of the input consecutive frames, wherein the first detection model is used to determine candidate categories of an image sequence; determining, as a candidate frame, a designated frame in consecutive frames whose candidate categories include a target category; inputting the determined candidate frames into a pre-trained second detection model to obtain category detection results of the input candidate frames, wherein the second detection model is used to determine whether the category of an image is the target category; and determining a category detection result of the target video based on the category detection results of the candidate frames.
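The two-stage flow of the first aspect can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: `first_model` and `second_model` are hypothetical stand-ins for the two detection models, and the window size, designated-frame position (middle frame) and decision threshold are chosen only for the example.

```python
def detect_video_category(frames, first_model, second_model,
                          target_category, window=3, threshold=0):
    """Two-stage sketch: a cheap first model screens sliding windows of
    consecutive frames; the second model verifies only the designated
    (middle) frame of each suspicious window."""
    candidates = []
    for start in range(len(frames) - window + 1):
        clip = frames[start:start + window]
        if target_category in first_model(clip):   # candidate categories of the window
            candidates.append(clip[window // 2])   # designated frame: the middle one
    hits = sum(1 for f in candidates if second_model(f))
    return hits > threshold  # video is target category if enough frames confirmed
```

Only the candidate frames reach `second_model`, which is the efficiency gain the text describes.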
In some embodiments, the consecutive frames are three consecutive frames, and the designated frame among the consecutive frames is the middle frame of the three.
In some embodiments, determining the category detection result of the target video based on the category detection results of the candidate frames comprises: determining the number of candidate frames whose category detection results indicate the target category; and, in response to determining that the number is greater than a first preset value, determining that the category of the target video is the target category.
In some embodiments, determining the category detection result of the target video based on the category detection results of the candidate frames comprises: determining the number of candidate frames whose category detection results indicate the target category, and determining the total number of candidate frames; and, in response to determining that the ratio of the above number to the total number is greater than a second preset value, determining that the category of the target video is the target category.
In some embodiments, the first detection model is trained as follows: obtaining a first training sample set, wherein a training sample in the first training sample set includes a first sample image and annotation information indicating the category of the first sample image; and, using a machine learning method, training a pre-established first convolutional neural network by taking the first sample images in the first training sample set as input and the annotation information of the input first sample images as output, and taking the trained first convolutional neural network as the first detection model.
In some embodiments, the second detection model is trained as follows: obtaining a second training sample set, wherein a training sample in the second training sample set includes a second sample image and annotation information indicating whether the category of the second sample image is the target category; and, using a machine learning method, training a pre-established second convolutional neural network by taking the second sample images in the second training sample set as input and the annotation information of the input second sample images as output, and taking the trained second convolutional neural network as the second detection model.
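The training procedure described for both detection models (sample images as input, annotation information as output) can be sketched as below. The embodiments name convolutional neural networks; as an assumption for a short self-contained example, a minimal numpy logistic-regression classifier stands in for the network, since the supervised input/output training loop has the same shape.

```python
import numpy as np

def train_detector(samples, labels, lr=0.5, epochs=200, seed=0):
    """Supervised training sketch: sample images (flattened to vectors)
    as input, annotation labels (1 = target category, 0 = not) as output.
    A logistic-regression classifier stands in for the patent's CNN."""
    rng = np.random.default_rng(seed)
    X = np.asarray(samples, dtype=float)
    y = np.asarray(labels, dtype=float)
    w = rng.normal(scale=0.01, size=X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probability
        grad = p - y                            # gradient of the cross-entropy loss
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    # the trained model maps a new sample to a target-category probability
    return lambda x: 1.0 / (1.0 + np.exp(-(np.asarray(x, dtype=float) @ w + b)))
```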
In a second aspect, an embodiment of the present application provides an apparatus for generating information, the apparatus comprising: a first input unit, configured to sequentially extract consecutive multiple frames from a target video and input each set of extracted consecutive frames into a pre-trained first detection model to obtain candidate categories of the input consecutive frames, wherein the first detection model is used to determine candidate categories of an image sequence; a first determination unit, configured to determine, as a candidate frame, a designated frame in consecutive frames whose candidate categories include a target category; a second input unit, configured to input the determined candidate frames into a pre-trained second detection model to obtain category detection results of the input candidate frames, wherein the second detection model is used to determine whether the category of an image is the target category; and a second determination unit, configured to determine a category detection result of the target video based on the category detection results of the candidate frames.
In some embodiments, the consecutive frames are three consecutive frames, and the designated frame among the consecutive frames is the middle frame of the three.
In some embodiments, the second determination unit comprises: a first determining module, configured to determine the number of candidate frames whose category detection results indicate the target category; and a second determining module, configured to, in response to determining that the number is greater than a first preset value, determine that the category of the target video is the target category.
In some embodiments, the second determination unit comprises: a third determining module, configured to determine the number of candidate frames whose category detection results indicate the target category, and to determine the total number of candidate frames; and a fourth determining module, configured to, in response to determining that the ratio of the above number to the total number is greater than a second preset value, determine that the category of the target video is the target category.
In some embodiments, the first detection model is trained as follows: obtaining a first training sample set, wherein a training sample in the first training sample set includes a first sample image and annotation information indicating the category of the first sample image; and, using a machine learning method, training a pre-established first convolutional neural network by taking the first sample images in the first training sample set as input and the annotation information of the input first sample images as output, and taking the trained first convolutional neural network as the first detection model.
In some embodiments, the second detection model is trained as follows: obtaining a second training sample set, wherein a training sample in the second training sample set includes a second sample image and annotation information indicating whether the category of the second sample image is the target category; and, using a machine learning method, training a pre-established second convolutional neural network by taking the second sample images in the second training sample set as input and the annotation information of the input second sample images as output, and taking the trained second convolutional neural network as the second detection model.
In a third aspect, an embodiment of the present application provides an electronic device, comprising: one or more processors; and a storage device on which one or more programs are stored, the one or more programs, when executed by the one or more processors, causing the one or more processors to implement the method of any embodiment of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable medium on which a computer program is stored, the program, when executed by a processor, implementing the method of any embodiment of the first aspect.
In the method and apparatus for generating information provided by the embodiments of the present application, consecutive multiple frames are sequentially extracted from a target video, and each set of extracted consecutive frames is input into a pre-trained first detection model, so that the candidate categories corresponding to each input set of consecutive frames can be obtained. Here, if a single frame were input each time, the accuracy of category detection would be low when the frame is blurry; by using consecutive frames as input, features can be extracted from the consecutive frames jointly, improving the accuracy of category detection. Then, a designated frame in consecutive frames whose candidate categories include the target category is determined as a candidate frame, so that frames suspected of belonging to the target category can be extracted from the target video. Next, the candidate frames are input into a second detection model used to determine whether an image belongs to the target category, yielding further category detection results for the candidate frames. Based on the category detection results of the candidate frames, the category detection result of the target video can be determined. Therefore, there is no need to input every frame into the second detection model for category detection, which improves the efficiency of category detection on the target video.
Detailed description of the embodiments
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the related invention, and are not intended to limit the invention. It should also be noted that, for ease of description, only the parts relevant to the related invention are shown in the accompanying drawings.
It should be noted that, in the absence of conflict, the embodiments of the present application and the features therein may be combined with each other. The present application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which the method for generating information or the apparatus for generating information of the present application may be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102 and 103, a network 104 and a server 105. The network 104 serves as a medium providing communication links between the terminal devices 101, 102 and 103 and the server 105, and may include various connection types, such as wired or wireless communication links, or fiber optic cables.
A user may use the terminal devices 101, 102 and 103 to interact with the server 105 via the network 104 to receive or send messages. Various communication client applications may be installed on the terminal devices 101, 102 and 103, such as video recording applications, video playback applications, voice interaction applications, search applications, instant messaging tools, e-mail clients and social platform software.
The terminal devices 101, 102 and 103 may be hardware or software. When they are hardware, they may be various electronic devices with display screens, including but not limited to smartphones, tablet computers, laptop portable computers and desktop computers. When they are software, they may be installed in the above-listed electronic devices, and may be implemented as multiple software programs or software modules (for example, for providing distributed services) or as a single software program or software module. No specific limitation is made here.
When the terminal devices 101, 102 and 103 are hardware, an image capture device may also be installed thereon. The image capture device may be any device capable of capturing images, such as a camera or a sensor. The user may use the image capture device on the terminal devices 101, 102 and 103 to capture video.
The server 105 may be a server providing various services, for example a video processing server that stores, manages or analyzes videos uploaded by the terminal devices 101, 102 and 103. The video processing server may store a first detection model for detecting image categories, and may also store a second detection model for determining whether the category of an image is a target category. Here, the target category may be one or more categories, and each target category may correspond to one second detection model used to determine whether an image belongs to that target category. The video processing server may use the first detection model to determine the categories corresponding to consecutive frames in a target video, and may use the second detection model to determine whether the category of a frame is the target category, thereby obtaining a category detection result of the video.
In this way, after a user uploads a video using the terminal devices 101, 102 and 103, the server 105 can determine whether the uploaded video belongs to the target category, and may then apply processing to the target video such as forbidding pushing or forwarding, or may push relevant information (such as the category detection result of the target video).
It should be noted that the server 105 may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers, or as a single server. When the server is software, it may be implemented as multiple software programs or software modules (for example, for providing distributed services), or as a single software program or software module. No specific limitation is made here.
It should be noted that the method for generating information provided by the embodiments of the present application is generally performed by the server 105; accordingly, the apparatus for generating information is generally provided in the server 105.
It should be understood that the numbers of terminal devices, networks and servers in Fig. 1 are merely illustrative. Any number of terminal devices, networks and servers may be provided according to implementation requirements.
With continued reference to Fig. 2, a flow 200 of one embodiment of the method for generating information according to the present application is shown. The method for generating information comprises the following steps:
Step 201: sequentially extract consecutive multiple frames from a target video, and input each set of extracted consecutive frames into a pre-trained first detection model to obtain candidate categories of the input consecutive frames.
In this embodiment, the executing body of the method for generating information (such as the server 105 shown in Fig. 1) may sequentially extract consecutive multiple frames (for example, at least two frames) from a target video, and input the consecutive frames extracted each time into a pre-trained first detection model, to obtain the candidate categories corresponding to the consecutive frames input each time.
Here, the target video may be a video whose category is to be detected. In one scenario, the target video may be pre-stored in the above executing body, which may extract consecutive frames directly from the locally stored target video. In another scenario, the target video may be received by the executing body from a terminal device (such as the terminal devices 101, 102 and 103 shown in Fig. 1) through a wired or wireless connection; after receiving the target video, the executing body may sequentially extract consecutive frames from it. It should be pointed out that the wireless connection may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (ultra wideband) connection, and other now known or future-developed wireless connections.
Here, the sets of consecutive frames may be extracted sequentially, with each frame of the target video serving in turn as the start frame of a set. As an example, the first extraction may take the first and second frames of the target video, the second extraction may take the second and third frames, and so on. As another example, the first extraction may take the first to fifth frames of the target video, the second extraction may take the second to sixth frames, and so on. Here, if a single frame were input each time, the accuracy of category detection would be low when the frame is blurry; by using consecutive frames as input, features can be extracted from the consecutive frames jointly, improving the accuracy of category detection.
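The extraction schedule of the two examples above (each frame serving in turn as the start frame, i.e. a sliding window with stride one) might be written as:

```python
def sliding_windows(frames, size=2):
    """Return every set of `size` consecutive frames, where each frame of
    the video serves in turn as the start frame of a set."""
    return [frames[i:i + size] for i in range(len(frames) - size + 1)]
```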
Here, the first detection model may be used to determine the candidate categories of an image sequence. Image sequences and images may be divided into multiple categories in advance, which may include, but are not limited to, at least one of the following: a pornographic category, a violence category, a weapon category and a normal category. The first detection model may characterize the correspondence between image sequences and candidate categories. As an example, the first detection model may be a mapping table characterizing the correspondence between image sequences (which may be characterized in the form of pixel matrices) and candidate categories.
As another example, the first detection model may be obtained by performing supervised training on an existing convolutional neural network (such as DenseBox, VGGNet, ResNet or SegNet) using a machine learning method. In practice, a convolutional neural network (CNN) is a feedforward neural network whose artificial neurons respond to surrounding units within part of their coverage, and which performs excellently in image processing; therefore, a convolutional neural network can be used to extract frame features from sample videos. In this example, the established convolutional neural network may include convolutional layers, pooling layers, fully connected layers and the like. The convolutional layers may be used to extract image features, the pooling layers may be used to downsample the input information, and the fully connected layers may be used to classify the obtained features.
Here, the first detection model obtained after training the convolutional neural network may extract features separately from each image in the input image sequence (for example, each frame in the consecutive frames input each time). The extracted features may then be fused. For example, the features of each frame may be represented in the form of a feature matrix or a feature vector, and the feature values at the same position in the feature matrices of the frames may be averaged to obtain a fused feature matrix. After the extracted features are fused, the fused feature matrix may be classified, determining the probability that the input image sequence (the consecutive frames) belongs to each preset category. Finally, the preset categories whose computed probabilities are greater than a preset value (such as 0.3) may be taken as the candidate categories of the input image sequence (the consecutive frames). In practice, if none of the computed probabilities is greater than the preset value, the input image sequence (the consecutive frames) has no corresponding candidate category; in this case, the next image sequence (consecutive frames) may be extracted and input into the first detection model. If one or more of the computed probabilities are greater than the preset value, the preset category corresponding to each such probability may be taken as a candidate category of the input image sequence (the consecutive frames); the next image sequence (consecutive frames) is then extracted and input into the first detection model.
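The fusion-and-threshold procedure just described (position-wise averaging of per-frame feature matrices, then keeping every preset category whose probability exceeds the preset value such as 0.3) can be sketched as follows; `class_probs`, the mapping from a fused feature vector to per-category probabilities, is a hypothetical stand-in for the model's classifier head.

```python
import numpy as np

def candidate_categories(frame_features, class_probs, threshold=0.3):
    """Fuse per-frame features by averaging values at the same position,
    then keep each preset category whose probability exceeds the
    threshold; an empty set means no candidate category."""
    fused = np.mean(np.stack([np.asarray(f, dtype=float)
                              for f in frame_features]), axis=0)
    probs = class_probs(fused)  # {category: probability}
    return {c for c, p in probs.items() if p > threshold}
```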
It should be noted that the first detection model may also be another model with image feature extraction and classification functions, and is not limited to the above examples; the specific model structure may be set according to actual needs, and is not limited here. In practice, the first detection model may be trained using a lightweight convolutional neural network structure. The purpose of using the first detection model to determine the categories of consecutive frames is to rapidly extract frames suspected of belonging to the target category, and a lightweight convolutional neural network structure is relatively simple: for example, it may consist of a convolutional layer, a pooling layer, a convolutional layer, a pooling layer and a fully connected layer in sequence. Since the lightweight structure is relatively simple, the amount of computation is small and the processing speed is fast. Here, the target category may be a pre-designated category, such as the pornographic category, the violence category or the weapon category.
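A forward pass through the lightweight structure just listed (convolutional layer, pooling layer, convolutional layer, pooling layer, fully connected layer) might look as follows in numpy; single-channel input, single filters per convolutional layer and a softmax head are simplifying assumptions made only for illustration.

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D convolution (single channel, single filter) with ReLU."""
    kh, kw = kernel.shape
    h, w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return np.maximum(out, 0.0)

def maxpool2d(x, size=2):
    """Non-overlapping max pooling (downsampling)."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

def lightweight_forward(img, k1, k2, fc_w, fc_b):
    """conv -> pool -> conv -> pool -> fully connected, as in the text."""
    x = maxpool2d(conv2d(img, k1))
    x = maxpool2d(conv2d(x, k2))
    logits = x.reshape(-1) @ fc_w + fc_b
    e = np.exp(logits - logits.max())
    return e / e.sum()  # probability per preset category
```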
In some optional implementations of this embodiment, each set of extracted consecutive frames may be three consecutive frames. As an example, the first extraction takes the first to third frames of the target video, the second extraction takes the second to fourth frames, and so on.
In some optional implementations of this embodiment, the first detection model may be trained as follows. In a first step, a first training sample set may be obtained, wherein a training sample in the first training sample set may include a first sample image and annotation information indicating the category of the first sample image. Here, a first sample image may be a single image or an image sequence, and may be an image collected from the Internet or a frame extracted from a video uploaded by a user. In a second step, using a machine learning method, a pre-established first convolutional neural network is trained by taking the first sample images in the first training sample set as input and the annotation information of the input first sample images as output, and the trained first convolutional neural network is taken as the first detection model. Here, the first convolutional neural network may use various existing lightweight convolutional neural network structures.
Step 202: determine, as a candidate frame, a designated frame in consecutive frames whose candidate categories include the target category.
In this embodiment, for each set of extracted consecutive frames, the executing body may first determine whether the candidate categories of the set include the target category. If so, the designated frame in the set may be determined as a candidate frame. Here, the designated frame may be the frame at a designated position in the consecutive frames: for example, any frame other than the first and last frames, or the first or last frame of the consecutive frames. In this way, frames suspected of belonging to the target category can be extracted from the target video, and the category of the target video can then be determined from the analysis of the candidate frames, without performing category detection on every frame separately. This improves the efficiency of category detection on the target video.
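Step 202 might be written as below, under the assumption of three-frame windows with the middle frame as the designated frame; `window_categories` holds the candidate categories already produced by the first detection model for each window, indexed by the window's start frame.

```python
def collect_candidate_frames(frames, window_categories, target_category, window=3):
    """For each consecutive-frame window whose candidate categories
    include the target category, keep its designated (middle) frame."""
    candidates = []
    for start, cats in enumerate(window_categories):
        if target_category in cats:
            candidates.append(frames[start + window // 2])
    return candidates
```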
In some optional implementations of this embodiment, each set of extracted consecutive frames may be three consecutive frames, and the designated frame may be the middle frame of the three.
Step 203: input the determined candidate frames into a pre-trained second detection model to obtain category detection results of the input candidate frames.
In this embodiment, the executing body may input the determined candidate frames into a pre-trained second detection model to obtain the category detection results of the input candidate frames. The second detection model may be used to determine whether the category of an image is the target category, and a category detection result may indicate whether the category of an input candidate frame is the target category.
As an example, the second detection model may be a mapping table characterizing the correspondence between images (which may be characterized in the form of pixel matrices) and category detection results.
As another example, the second detection model may be obtained by performing supervised training on an existing model with a classification function using a machine learning method. The model with a classification function may be a convolutional neural network of various existing structures (such as DenseBox, VGGNet, ResNet or SegNet), or a model such as a support vector machine (SVM). The second detection model may extract features from an input candidate frame, analyze the extracted features, and compute the probability that the category of the input candidate frame is the target category. When the computed probability is greater than a preset probability (such as 0.5), the category of the input candidate frame may be determined to be the target category; when the computed probability does not exceed the preset probability, the category of the input candidate frame may be determined not to be the target category.
In this way, only the candidate frames need to be input into the second detection model for category detection; there is no need to input every frame, which improves the efficiency of category detection on the target video.
In some optional implementations of this embodiment, the second detection model may be trained as follows. In a first step, a second training sample set may be obtained, wherein a training sample in the second training sample set includes a second sample image and annotation information indicating whether the category of the second sample image is the target category. Here, a second sample image may be an image collected from the Internet or a frame extracted from a video uploaded by a user. In a second step, using a machine learning method, a pre-established second convolutional neural network is trained by taking the second sample images in the second training sample set as input and the annotation information of the input second sample images as output, and the trained second convolutional neural network is taken as the second detection model. Here, the second convolutional neural network may use various existing convolutional neural network structures.
Step 204: determine the category detection result of the target video based on the category detection results of the candidate frames.
In this embodiment, the executing body may determine the category detection result of the target video based on the category detection results of the candidate frames in various ways. For example, in response to determining that there is a candidate frame whose category is the target category, the category of the target video may be determined to be the target category.
In some optional implementations of this embodiment, the executing body may determine the number of candidate frames whose category detection results indicate the target category, and, in response to determining that the number is greater than a first preset value, determine that the category of the target video is the target category. Here, the first preset value may be set according to actual needs.
In some optional implementations of this embodiment, the executing body may first determine the number of candidate frames whose category detection results indicate the target category, and determine the total number of candidate frames. The ratio of this number to the total number may then be determined. In response to determining that the ratio is greater than a second preset value, the category of the target video is determined to be the target category. Here, the second preset value may be set according to actual needs.
In this way, the category detection result of the target video can be determined from the category detection results of the candidate frames, without inputting every frame into the second detection model for category detection, which improves the efficiency of category detection on the target video.
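The two optional decision rules (absolute count greater than a first preset value, or ratio greater than a second preset value) can be sketched together as follows; treating the rules as alternatives selected by which preset is supplied is an assumption made for the example, not a feature stated in the text.

```python
def video_is_target(results, first_preset=None, second_preset=None):
    """Decide the video-level result from per-candidate-frame results
    (True = target category). Applies the count rule if `first_preset`
    is given, the ratio rule if `second_preset` is given, and otherwise
    requires at least one confirmed candidate frame."""
    hits = sum(results)
    if first_preset is not None:
        return hits > first_preset
    if second_preset is not None:
        return bool(results) and hits / len(results) > second_preset
    return hits > 0
```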
With continued reference to Fig. 3, Fig. 3 is a schematic diagram of an application scenario of the method for generating information according to the present embodiment. In the application scenario of Fig. 3, a user sends, through a terminal device 301, a video upload request containing a target video 303 to be uploaded to a server 302. After receiving the target video, the server 302 may successively extract consecutive frames in the target video 303 and input each group of extracted consecutive frames into a first detection model trained in advance, obtaining the candidate categories of the input consecutive frames. Then, the server 302 may determine as candidate frames the designated frames in those groups of consecutive frames whose candidate categories include the pornographic category, and input the candidate frames into a second detection model trained in advance, obtaining the category detection results of the input candidate frames. Afterwards, the server 302 may determine the category detection result of the target video based on the category detection results of the candidate frames. When the category detection result indicates that the category of the target video is the pornographic category, the server 302 may push prompt information to the terminal device 301. In this case, the prompt information may be used to prompt the user that the uploaded target video involves pornographic content and to prompt the user to replace the video and upload again. When the category detection result indicates that the category of the target video is not the pornographic category, the target video may be stored, and prompt information may be pushed to the terminal device 301. In this case, the prompt information may be used to prompt the user that the target video has been uploaded successfully.
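The server-side decision in this scenario can be sketched as follows; this is a minimal illustration, and the function name, message strings, and return shape are hypothetical rather than part of the described embodiment. The detection itself is passed in as a plain callable standing in for the two-stage method.

```python
def handle_upload(video, detect_category):
    """Decide the server's response to a video upload request.

    `detect_category` stands in for the two-stage detection described
    above; it returns the detected category of the video.
    """
    if detect_category(video) == "pornographic":
        # Reject: prompt the user that the uploaded video involves
        # pornographic content and should be replaced before re-uploading.
        return {"stored": False,
                "prompt": "upload rejected: please replace the video"}
    # Otherwise store the video and confirm a successful upload.
    return {"stored": True, "prompt": "upload successful"}
```

The point of the branch is only that storage happens exclusively on the non-pornographic path, while both paths push prompt information back to the terminal device.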
In the method provided by the above embodiment of the present application, consecutive frames in the target video are successively extracted, and each group of extracted consecutive frames is input into the first detection model trained in advance, so that the candidate categories corresponding to each group of input consecutive frames can be obtained. Here, if a single frame were input each time, the accuracy of category detection would be low whenever that frame is blurry. By instead using consecutive frames as the input, features can be extracted from the consecutive frames simultaneously, which improves the accuracy of category detection. Then, the designated frames in those groups of consecutive frames whose candidate categories include the target category are determined as candidate frames, so that frames suspected of belonging to the target category can be extracted from the target video. Next, the candidate frames are input into the second detection model, which is used to determine whether an image belongs to the target category, obtaining further category detection results for the candidate frames. Based on the category detection results of the candidate frames, the category detection result of the target video can be determined. It is therefore not necessary to input every frame into the second detection model for category detection, which improves the efficiency of performing category detection on the target video.
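The two-stage method summarized above can be sketched end to end as follows. This is a minimal sketch under stated assumptions: it uses three-frame stride-1 windows (one variant described in this application; the stride is not fixed by the text), the count-based decision with a first preset value, and plain callables in place of the trained convolutional models. All names are illustrative.

```python
def detect_video_category(frames, first_model, second_model,
                          first_preset=1, target="target"):
    """Two-stage sketch: scan three-frame windows with the first
    model, re-check only the flagged middle frames with the second
    model, and decide the video category from the candidate-frame
    results (count greater than a first preset value)."""
    candidates = []
    for i in range(len(frames) - 2):
        window = frames[i:i + 3]
        if target in first_model(window):   # stage one: candidate categories
            candidates.append(window[1])    # designated frame: the middle one
    hits = sum(1 for f in candidates if second_model(f) == target)
    return target if hits > first_preset else "non-target"
```

The efficiency claim is visible in the structure: the (presumably heavier) second model is only ever called on the candidate middle frames, never on every frame of the video.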
With further reference to Fig. 4, a process 400 of another embodiment of the method for generating information is illustrated. The process 400 of the method for generating information comprises the following steps:
Step 401: successively extract groups of three consecutive frames in the target video, and input each extracted group of three consecutive frames into a first detection model trained in advance, obtaining the candidate categories of the three input consecutive frames.
In the present embodiment, the executing subject of the method for generating information (for example, the server 105 shown in Fig. 1) may successively extract groups of three consecutive frames in the target video and input each extracted group into the first detection model trained in advance, obtaining the candidate categories corresponding to each group of three input consecutive frames.
Step 402: determine as candidate frames the middle frames of those groups of three consecutive frames whose candidate categories include the target category.
In the present embodiment, for each extracted group of three consecutive frames, the aforementioned executing subject may first determine whether the candidate categories of that group include the target category. If so, the middle frame of that group may be determined as a candidate frame. In this way, candidate frames suspected of belonging to the target category can be extracted from the target video, and the category of the target video can then be determined by analyzing the candidate frames, without performing category detection on each frame separately. This improves the efficiency of performing category detection on the target video.
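Steps 401 and 402 can be sketched together as follows, assuming that "successively extracting" three consecutive frames means a stride-1 sliding window (the text does not fix the stride) and representing the first detection model as a callable that returns a set of candidate categories:

```python
def select_candidate_frames(frames, first_model, target="target"):
    """Slide a three-frame window over the video; whenever the first
    model's candidate categories for a window include the target
    category, keep the window's middle frame as a candidate frame."""
    candidates = []
    for i in range(len(frames) - 2):
        window = frames[i:i + 3]
        if target in first_model(window):
            candidates.append(window[1])  # middle frame of the three
    return candidates
```

A frame can be the middle of at most one window, so each frame is kept at most once; only these candidates go on to the second detection model.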
Step 403: input the determined candidate frames into a second detection model trained in advance, obtaining the category detection results of the input candidate frames.
In the present embodiment, the aforementioned executing subject may input the determined candidate frames into the second detection model trained in advance, obtaining the category detection results of the input candidate frames. Here, the second detection model is used to determine whether the category of an image is the target category, and a category detection result may be used to indicate whether the category of an input candidate frame is the target category.
Step 404: determine the quantity of candidate frames whose category detection results indicate the target category, and determine the total quantity of candidate frames.
In the present embodiment, the aforementioned executing subject may first determine the quantity of candidate frames whose category detection results indicate the target category, and determine the total quantity of candidate frames.
Step 405: in response to determining that the ratio of the aforementioned quantity to the aforementioned total quantity is greater than a second preset value, determine that the category of the target video is the target category.
In the present embodiment, the ratio of the quantity of candidate frames whose category is the target category to the total quantity of candidate frames may be determined. In response to determining that this ratio is greater than the second preset value, the category of the target video is determined to be the target category. Here, the second preset value may be set according to actual needs.
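Steps 404 and 405 amount to a simple ratio test over the candidate-frame results. A sketch follows; the result labels and the default threshold are chosen here for illustration only, since the application leaves the second preset value to actual needs:

```python
def decide_by_ratio(detection_results, second_preset=0.5,
                    target="target"):
    """Count the candidate frames whose detection result is the
    target category, take the ratio of that quantity to the total
    quantity of candidate frames, and compare it with the second
    preset value."""
    total = len(detection_results)
    if total == 0:
        return "non-target"  # no candidate frames were flagged at all
    quantity = sum(1 for r in detection_results if r == target)
    return target if quantity / total > second_preset else "non-target"
```

Compared with the first variant (an absolute count against a first preset value), the ratio makes the decision independent of video length.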
As can be seen from Fig. 4, compared with the embodiment corresponding to Fig. 2, the process 400 of the method for generating information in the present embodiment highlights the step of determining the category detection result of the target video according to the proportion of candidate frames belonging to the target category. The scheme described in the present embodiment can thus determine the category detection result of the target video from the category detection results of the candidate frames, without determining one by one whether each frame in the target video belongs to the target category, which improves the efficiency of performing category detection on the target video.
With further reference to Fig. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an apparatus for generating information. This apparatus embodiment corresponds to the method embodiment shown in Fig. 2, and the apparatus may specifically be applied in various electronic devices.
As shown in Fig. 5, the apparatus 500 for generating information described in the present embodiment comprises: a first input unit 501, configured to successively extract consecutive frames in a target video and input each group of extracted consecutive frames into a first detection model trained in advance, obtaining the candidate categories of the input consecutive frames, wherein the first detection model is used to determine the candidate categories of an image sequence; a first determination unit 502, configured to determine as candidate frames the designated frames in those groups of consecutive frames whose candidate categories include a target category; a second input unit 503, configured to input the determined candidate frames into a second detection model trained in advance, obtaining the category detection results of the input candidate frames, wherein the second detection model is used to determine whether the category of an image is the target category; and a second determination unit 504, configured to determine the category detection result of the target video based on the category detection results of the candidate frames.
In some optional implementations of the present embodiment, the aforementioned consecutive frames may be three consecutive frames, and the designated frame in the consecutive frames may be the middle frame of the three consecutive frames.
In some optional implementations of the present embodiment, the aforementioned second determination unit 504 may include a first determining module and a second determining module (not shown). The first determining module may be configured to determine the quantity of candidate frames whose category detection results indicate the target category. The second determining module may be configured to determine, in response to determining that the quantity is greater than a first preset value, that the category of the target video is the target category.
In some optional implementations of the present embodiment, the aforementioned second determination unit 504 may include a third determining module and a fourth determining module (not shown). The third determining module may be configured to determine the quantity of candidate frames whose category detection results indicate the target category, and to determine the total quantity of candidate frames. The fourth determining module may be configured to determine, in response to determining that the ratio of the quantity to the total quantity is greater than a second preset value, that the category of the target video is the target category.
In some optional implementations of the present embodiment, the aforementioned first detection model may be obtained by training as follows: obtain a first training sample set, wherein each training sample in the first training sample set includes a first sample image and annotation information used to indicate the category of the first sample image; then, using a machine learning method, take the first sample images in the first training sample set as the input of a pre-established first convolutional neural network and the annotation information of the input first sample images as the output of the first convolutional neural network, train the first convolutional neural network, and use the trained first convolutional neural network as the first detection model.
In some optional implementations of the present embodiment, the aforementioned second detection model may be obtained by training as follows: obtain a second training sample set, wherein each training sample in the second training sample set includes a second sample image and annotation information used to indicate whether the category of the second sample image is the target category; then, using a machine learning method, take the second sample images in the second training sample set as the input of a pre-established second convolutional neural network and the annotation information of the input second sample images as the output of the second convolutional neural network, train the second convolutional neural network, and use the trained second convolutional neural network as the second detection model.
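Both models are trained with the same supervised recipe, differing only in their samples and labels. As a stand-in for the convolutional networks (which the sketch does not attempt to reproduce), the loop below fits a tiny perceptron, so the shape of the procedure is visible: present an input, compare the prediction with the annotation information, update. All names and the choice of model are illustrative assumptions.

```python
def train_detector(samples, lr=0.1, epochs=50):
    """Supervised training sketch: each sample pairs an input with
    annotation information (a 0/1 label) that serves as the desired
    output. A real first/second detection model would be a CNN over
    image sequences or single frames; a perceptron stands in here."""
    dim = len(samples[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in samples:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = y - pred  # zero once the prediction matches the annotation
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
```

For the first detection model the labels would encode the candidate category of a frame sequence; for the second, whether a single frame's category is the target category.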
In the apparatus provided by the above embodiment of the present application, the first input unit 501 successively extracts consecutive frames in the target video and inputs each group of extracted consecutive frames into the first detection model trained in advance, so that the candidate categories corresponding to each group of input consecutive frames can be obtained. Here, if a single frame were input each time, the accuracy of category detection would be low whenever that frame is blurry. By instead using consecutive frames as the input, features can be extracted from the consecutive frames simultaneously, which improves the accuracy of category detection. Then, the first determination unit 502 determines as candidate frames the designated frames in those groups of consecutive frames whose candidate categories include the target category, so that frames suspected of belonging to the target category can be extracted from the target video. Next, the second input unit 503 inputs the candidate frames into the second detection model, which is used to determine whether an image belongs to the target category, obtaining further category detection results for the candidate frames. Based on the category detection results of the candidate frames, the second determination unit 504 can determine the category detection result of the target video. It is therefore not necessary to input every frame into the second detection model for category detection, which improves the efficiency of performing category detection on the target video.
Referring now to Fig. 6, a structural schematic diagram of a computer system 600 of an electronic device suitable for implementing the embodiments of the present application is illustrated. The electronic device shown in Fig. 6 is merely an example and should not impose any restriction on the function or scope of use of the embodiments of the present application.
As shown in Fig. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage portion 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the system 600. The CPU 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, as well as a speaker; a storage portion 608 including a hard disk and the like; and a communication portion 609 including a network interface card such as a LAN card or a modem. The communication portion 609 performs communication processing via a network such as the Internet. A driver 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the driver 610 as needed, so that a computer program read therefrom can be installed into the storage portion 608 as needed.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flow chart may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 609, and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above-mentioned functions defined in the method of the present application are executed. It should be noted that the computer-readable medium described herein may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in connection with an instruction execution system, apparatus, or device. In the present application, a computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and such a computer-readable medium can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted by any suitable medium, including but not limited to: wireless, wire, optical cable, RF, and the like, or any suitable combination of the above.
The flow charts and block diagrams in the accompanying drawings illustrate the possible architecture, functions, and operations of the systems, methods, and computer program products according to the various embodiments of the present application. In this regard, each box in a flow chart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the boxes may occur in an order different from that noted in the drawings. For example, two boxes shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each box in the block diagrams and/or flow charts, and combinations of boxes in the block diagrams and/or flow charts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or by hardware. The described units may also be provided in a processor, which may, for example, be described as: a processor comprising a first input unit, a first determination unit, a second input unit, and a second determination unit. The names of these units do not, in some cases, constitute a limitation on the units themselves. For example, the first input unit may also be described as "a unit that successively extracts consecutive frames in a target video and inputs each group of extracted consecutive frames into a first detection model trained in advance".
As another aspect, the present application further provides a computer-readable medium, which may be included in the apparatus described in the above embodiment, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: successively extract consecutive frames in a target video, and input each group of extracted consecutive frames into a first detection model trained in advance, obtaining the candidate categories of the input consecutive frames; determine as candidate frames the designated frames in those groups of consecutive frames whose candidate categories include a target category; input the determined candidate frames into a second detection model trained in advance, obtaining the category detection results of the input candidate frames; and determine the category detection result of the target video based on the category detection results of the candidate frames.
The above description is merely an illustration of the preferred embodiments of the present application and of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combinations of the above technical features, and should also cover, without departing from the above inventive concept, other technical solutions formed by any combination of the above technical features or their equivalent features, for example, technical solutions formed by mutually replacing the above features with (but not limited to) technical features with similar functions disclosed in the present application.