Summary of the invention
Embodiments of the present application provide a method and an apparatus for generating information.
In a first aspect, an embodiment of the present application provides a method for generating information, the method comprising: sequentially extracting consecutive multiple frames from a target video, and inputting each set of extracted consecutive frames into a pre-trained first detection model to obtain candidate categories of the input consecutive frames, wherein the first detection model is used to determine candidate categories of an image sequence; determining, as a candidate frame, a designated frame in consecutive frames whose candidate categories include a target category; inputting the determined candidate frames into a pre-trained second detection model to obtain category detection results of the input candidate frames, wherein the second detection model is used to determine whether the category of an image is the target category; and determining a category detection result of the target video based on the category detection results of the candidate frames.
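The two-stage flow of the first aspect can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: `first_model` and `second_model` are hypothetical stand-ins for the two detection models, and the window size, designated-frame position (middle frame) and decision threshold are chosen only for the example.

```python
def detect_video_category(frames, first_model, second_model,
                          target_category, window=3, threshold=0):
    """Two-stage sketch: a cheap first model screens sliding windows of
    consecutive frames; the second model verifies only the designated
    (middle) frame of each suspicious window."""
    candidates = []
    for start in range(len(frames) - window + 1):
        clip = frames[start:start + window]
        if target_category in first_model(clip):   # candidate categories of the window
            candidates.append(clip[window // 2])   # designated frame: the middle one
    hits = sum(1 for f in candidates if second_model(f))
    return hits > threshold  # video is target category if enough frames confirmed
```

Only the candidate frames reach `second_model`, which is the efficiency gain the text describes.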
In some embodiments, the consecutive frames are three consecutive frames, and the designated frame among the consecutive frames is the middle frame of the three.
In some embodiments, determining the category detection result of the target video based on the category detection results of the candidate frames comprises: determining the number of candidate frames whose category detection results indicate the target category; and, in response to determining that the number is greater than a first preset value, determining that the category of the target video is the target category.
In some embodiments, determining the category detection result of the target video based on the category detection results of the candidate frames comprises: determining the number of candidate frames whose category detection results indicate the target category, and determining the total number of candidate frames; and, in response to determining that the ratio of the above number to the total number is greater than a second preset value, determining that the category of the target video is the target category.
In some embodiments, the first detection model is trained as follows: obtaining a first training sample set, wherein a training sample in the first training sample set includes a first sample image and annotation information indicating the category of the first sample image; and, using a machine learning method, training a pre-established first convolutional neural network by taking the first sample images in the first training sample set as input and the annotation information of the input first sample images as output, and taking the trained first convolutional neural network as the first detection model.
In some embodiments, the second detection model is trained as follows: obtaining a second training sample set, wherein a training sample in the second training sample set includes a second sample image and annotation information indicating whether the category of the second sample image is the target category; and, using a machine learning method, training a pre-established second convolutional neural network by taking the second sample images in the second training sample set as input and the annotation information of the input second sample images as output, and taking the trained second convolutional neural network as the second detection model.
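The training procedure described for both detection models (sample images as input, annotation information as output) can be sketched as below. The embodiments name convolutional neural networks; as an assumption for a short self-contained example, a minimal numpy logistic-regression classifier stands in for the network, since the supervised input/output training loop has the same shape.

```python
import numpy as np

def train_detector(samples, labels, lr=0.5, epochs=200, seed=0):
    """Supervised training sketch: sample images (flattened to vectors)
    as input, annotation labels (1 = target category, 0 = not) as output.
    A logistic-regression classifier stands in for the patent's CNN."""
    rng = np.random.default_rng(seed)
    X = np.asarray(samples, dtype=float)
    y = np.asarray(labels, dtype=float)
    w = rng.normal(scale=0.01, size=X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probability
        grad = p - y                            # gradient of the cross-entropy loss
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    # the trained model maps a new sample to a target-category probability
    return lambda x: 1.0 / (1.0 + np.exp(-(np.asarray(x, dtype=float) @ w + b)))
```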
In a second aspect, an embodiment of the present application provides an apparatus for generating information, the apparatus comprising: a first input unit, configured to sequentially extract consecutive multiple frames from a target video and input each set of extracted consecutive frames into a pre-trained first detection model to obtain candidate categories of the input consecutive frames, wherein the first detection model is used to determine candidate categories of an image sequence; a first determination unit, configured to determine, as a candidate frame, a designated frame in consecutive frames whose candidate categories include a target category; a second input unit, configured to input the determined candidate frames into a pre-trained second detection model to obtain category detection results of the input candidate frames, wherein the second detection model is used to determine whether the category of an image is the target category; and a second determination unit, configured to determine a category detection result of the target video based on the category detection results of the candidate frames.
In some embodiments, the consecutive frames are three consecutive frames, and the designated frame among the consecutive frames is the middle frame of the three.
In some embodiments, the second determination unit comprises: a first determining module, configured to determine the number of candidate frames whose category detection results indicate the target category; and a second determining module, configured to, in response to determining that the number is greater than a first preset value, determine that the category of the target video is the target category.
In some embodiments, the second determination unit comprises: a third determining module, configured to determine the number of candidate frames whose category detection results indicate the target category, and to determine the total number of candidate frames; and a fourth determining module, configured to, in response to determining that the ratio of the above number to the total number is greater than a second preset value, determine that the category of the target video is the target category.
In some embodiments, the first detection model is trained as follows: obtaining a first training sample set, wherein a training sample in the first training sample set includes a first sample image and annotation information indicating the category of the first sample image; and, using a machine learning method, training a pre-established first convolutional neural network by taking the first sample images in the first training sample set as input and the annotation information of the input first sample images as output, and taking the trained first convolutional neural network as the first detection model.
In some embodiments, the second detection model is trained as follows: obtaining a second training sample set, wherein a training sample in the second training sample set includes a second sample image and annotation information indicating whether the category of the second sample image is the target category; and, using a machine learning method, training a pre-established second convolutional neural network by taking the second sample images in the second training sample set as input and the annotation information of the input second sample images as output, and taking the trained second convolutional neural network as the second detection model.
In a third aspect, an embodiment of the present application provides an electronic device, comprising: one or more processors; and a storage device on which one or more programs are stored, the one or more programs, when executed by the one or more processors, causing the one or more processors to implement the method of any embodiment of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable medium on which a computer program is stored, the program, when executed by a processor, implementing the method of any embodiment of the first aspect.
In the method and apparatus for generating information provided by the embodiments of the present application, consecutive multiple frames are sequentially extracted from a target video, and each set of extracted consecutive frames is input into a pre-trained first detection model, so that the candidate categories corresponding to each input set of consecutive frames can be obtained. Here, if a single frame were input each time, the accuracy of category detection would be low when the frame is blurry; by using consecutive frames as input, features can be extracted from the consecutive frames jointly, improving the accuracy of category detection. Then, a designated frame in consecutive frames whose candidate categories include the target category is determined as a candidate frame, so that frames suspected of belonging to the target category can be extracted from the target video. Next, the candidate frames are input into a second detection model used to determine whether an image belongs to the target category, yielding further category detection results for the candidate frames. Based on the category detection results of the candidate frames, the category detection result of the target video can be determined. Therefore, there is no need to input every frame into the second detection model for category detection, which improves the efficiency of category detection on the target video.
Detailed description of the embodiments
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the related invention, and are not intended to limit the invention. It should also be noted that, for ease of description, only the parts relevant to the related invention are shown in the accompanying drawings.
It should be noted that, in the absence of conflict, the embodiments of the present application and the features therein may be combined with each other. The present application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 to which the method for generating information or the apparatus for generating information of the present application may be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102 and 103, a network 104 and a server 105. The network 104 serves as a medium providing communication links between the terminal devices 101, 102 and 103 and the server 105, and may include various connection types, such as wired or wireless communication links, or fiber optic cables.
A user may use the terminal devices 101, 102 and 103 to interact with the server 105 via the network 104 to receive or send messages. Various communication client applications may be installed on the terminal devices 101, 102 and 103, such as video recording applications, video playback applications, voice interaction applications, search applications, instant messaging tools, e-mail clients and social platform software.
The terminal devices 101, 102 and 103 may be hardware or software. When they are hardware, they may be various electronic devices with display screens, including but not limited to smartphones, tablet computers, laptop portable computers and desktop computers. When they are software, they may be installed in the above-listed electronic devices, and may be implemented as multiple software programs or software modules (for example, for providing distributed services) or as a single software program or software module. No specific limitation is made here.
When the terminal devices 101, 102 and 103 are hardware, an image capture device may also be installed thereon. The image capture device may be any device capable of capturing images, such as a camera or a sensor. The user may use the image capture device on the terminal devices 101, 102 and 103 to capture video.
The server 105 may be a server providing various services, for example a video processing server that stores, manages or analyzes videos uploaded by the terminal devices 101, 102 and 103. The video processing server may store a first detection model for detecting image categories, and may also store a second detection model for determining whether the category of an image is a target category. Here, the target category may be one or more categories, and each target category may correspond to one second detection model used to determine whether an image belongs to that target category. The video processing server may use the first detection model to determine the categories corresponding to consecutive frames in a target video, and may use the second detection model to determine whether the category of a frame is the target category, thereby obtaining a category detection result of the video.
In this way, after a user uploads a video using the terminal devices 101, 102 and 103, the server 105 can determine whether the uploaded video belongs to the target category, and may then apply processing to the target video such as forbidding pushing or forwarding, or may push relevant information (such as the category detection result of the target video).
It should be noted that the server 105 may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers, or as a single server. When the server is software, it may be implemented as multiple software programs or software modules (for example, for providing distributed services), or as a single software program or software module. No specific limitation is made here.
It should be noted that the method for generating information provided by the embodiments of the present application is generally performed by the server 105; accordingly, the apparatus for generating information is generally provided in the server 105.
It should be understood that the numbers of terminal devices, networks and servers in Fig. 1 are merely illustrative. Any number of terminal devices, networks and servers may be provided according to implementation requirements.
With continued reference to Fig. 2, a flow 200 of one embodiment of the method for generating information according to the present application is shown. The method for generating information comprises the following steps:
Step 201: sequentially extract consecutive multiple frames from a target video, and input each set of extracted consecutive frames into a pre-trained first detection model to obtain candidate categories of the input consecutive frames.
In this embodiment, the executing body of the method for generating information (such as the server 105 shown in Fig. 1) may sequentially extract consecutive multiple frames (for example, at least two frames) from a target video, and input the consecutive frames extracted each time into a pre-trained first detection model, to obtain the candidate categories corresponding to the consecutive frames input each time.
Here, the target video may be a video whose category is to be detected. In one scenario, the target video may be pre-stored in the above executing body, which may extract consecutive frames directly from the locally stored target video. In another scenario, the target video may be received by the executing body from a terminal device (such as the terminal devices 101, 102 and 103 shown in Fig. 1) through a wired or wireless connection; after receiving the target video, the executing body may sequentially extract consecutive frames from it. It should be pointed out that the wireless connection may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (ultra wideband) connection, and other now known or future-developed wireless connections.
Here, the sets of consecutive frames may be extracted sequentially, with each frame of the target video serving in turn as the start frame of a set. As an example, the first extraction may take the first and second frames of the target video, the second extraction may take the second and third frames, and so on. As another example, the first extraction may take the first to fifth frames of the target video, the second extraction may take the second to sixth frames, and so on. Here, if a single frame were input each time, the accuracy of category detection would be low when the frame is blurry; by using consecutive frames as input, features can be extracted from the consecutive frames jointly, improving the accuracy of category detection.
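The extraction schedule of the two examples above (each frame serving in turn as the start frame, i.e. a sliding window with stride one) might be written as:

```python
def sliding_windows(frames, size=2):
    """Return every set of `size` consecutive frames, where each frame of
    the video serves in turn as the start frame of a set."""
    return [frames[i:i + size] for i in range(len(frames) - size + 1)]
```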
Here, the first detection model may be used to determine the candidate categories of an image sequence. Image sequences and images may be divided into multiple categories in advance, which may include, but are not limited to, at least one of the following: a pornographic category, a violence category, a weapon category and a normal category. The first detection model may characterize the correspondence between image sequences and candidate categories. As an example, the first detection model may be a mapping table characterizing the correspondence between image sequences (which may be characterized in the form of pixel matrices) and candidate categories.
As another example, the first detection model may be obtained by performing supervised training on an existing convolutional neural network (such as DenseBox, VGGNet, ResNet or SegNet) using a machine learning method. In practice, a convolutional neural network (CNN) is a feedforward neural network whose artificial neurons respond to surrounding units within part of their coverage, and which performs excellently in image processing; therefore, a convolutional neural network can be used to extract frame features from sample videos. In this example, the established convolutional neural network may include convolutional layers, pooling layers, fully connected layers and the like. The convolutional layers may be used to extract image features, the pooling layers may be used to downsample the input information, and the fully connected layers may be used to classify the obtained features.
Here, the first detection model obtained after training the convolutional neural network may extract features separately from each image in the input image sequence (for example, each frame in the consecutive frames input each time). The extracted features may then be fused. For example, the features of each frame may be represented in the form of a feature matrix or a feature vector, and the feature values at the same position in the feature matrices of the frames may be averaged to obtain a fused feature matrix. After the extracted features are fused, the fused feature matrix may be classified, determining the probability that the input image sequence (the consecutive frames) belongs to each preset category. Finally, the preset categories whose computed probabilities are greater than a preset value (such as 0.3) may be taken as the candidate categories of the input image sequence (the consecutive frames). In practice, if none of the computed probabilities is greater than the preset value, the input image sequence (the consecutive frames) has no corresponding candidate category; in this case, the next image sequence (consecutive frames) may be extracted and input into the first detection model. If one or more of the computed probabilities are greater than the preset value, the preset category corresponding to each such probability may be taken as a candidate category of the input image sequence (the consecutive frames); the next image sequence (consecutive frames) is then extracted and input into the first detection model.
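The fusion-and-threshold procedure just described (position-wise averaging of per-frame feature matrices, then keeping every preset category whose probability exceeds the preset value such as 0.3) can be sketched as follows; `class_probs`, the mapping from a fused feature vector to per-category probabilities, is a hypothetical stand-in for the model's classifier head.

```python
import numpy as np

def candidate_categories(frame_features, class_probs, threshold=0.3):
    """Fuse per-frame features by averaging values at the same position,
    then keep each preset category whose probability exceeds the
    threshold; an empty set means no candidate category."""
    fused = np.mean(np.stack([np.asarray(f, dtype=float)
                              for f in frame_features]), axis=0)
    probs = class_probs(fused)  # {category: probability}
    return {c for c, p in probs.items() if p > threshold}
```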
It should be noted that the first detection model may also be another model with image feature extraction and classification functions, and is not limited to the above examples; the specific model structure may be set according to actual needs, and is not limited here. In practice, the first detection model may be trained using a lightweight convolutional neural network structure. The purpose of using the first detection model to determine the categories of consecutive frames is to rapidly extract frames suspected of belonging to the target category, and a lightweight convolutional neural network structure is relatively simple: for example, it may consist of a convolutional layer, a pooling layer, a convolutional layer, a pooling layer and a fully connected layer in sequence. Since the lightweight structure is relatively simple, the amount of computation is small and the processing speed is fast. Here, the target category may be a pre-designated category, such as the pornographic category, the violence category or the weapon category.
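A forward pass through the lightweight structure just listed (convolutional layer, pooling layer, convolutional layer, pooling layer, fully connected layer) might look as follows in numpy; single-channel input, single filters per convolutional layer and a softmax head are simplifying assumptions made only for illustration.

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D convolution (single channel, single filter) with ReLU."""
    kh, kw = kernel.shape
    h, w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return np.maximum(out, 0.0)

def maxpool2d(x, size=2):
    """Non-overlapping max pooling (downsampling)."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

def lightweight_forward(img, k1, k2, fc_w, fc_b):
    """conv -> pool -> conv -> pool -> fully connected, as in the text."""
    x = maxpool2d(conv2d(img, k1))
    x = maxpool2d(conv2d(x, k2))
    logits = x.reshape(-1) @ fc_w + fc_b
    e = np.exp(logits - logits.max())
    return e / e.sum()  # probability per preset category
```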
In some optional implementations of this embodiment, each set of extracted consecutive frames may be three consecutive frames. As an example, the first extraction takes the first to third frames of the target video, the second extraction takes the second to fourth frames, and so on.
In some optional implementations of this embodiment, the first detection model may be trained as follows. In a first step, a first training sample set may be obtained, wherein a training sample in the first training sample set may include a first sample image and annotation information indicating the category of the first sample image. Here, a first sample image may be a single image or an image sequence, and may be an image collected from the Internet or a frame extracted from a video uploaded by a user. In a second step, using a machine learning method, a pre-established first convolutional neural network is trained by taking the first sample images in the first training sample set as input and the annotation information of the input first sample images as output, and the trained first convolutional neural network is taken as the first detection model. Here, the first convolutional neural network may use various existing lightweight convolutional neural network structures.
Step 202: determine, as a candidate frame, a designated frame in consecutive frames whose candidate categories include the target category.
In this embodiment, for each set of extracted consecutive frames, the executing body may first determine whether the candidate categories of the set include the target category. If so, the designated frame in the set may be determined as a candidate frame. Here, the designated frame may be the frame at a designated position in the consecutive frames: for example, any frame other than the first and last frames, or the first or last frame of the consecutive frames. In this way, frames suspected of belonging to the target category can be extracted from the target video, and the category of the target video can then be determined from the analysis of the candidate frames, without performing category detection on every frame separately. This improves the efficiency of category detection on the target video.
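Step 202 might be written as below, under the assumption of three-frame windows with the middle frame as the designated frame; `window_categories` holds the candidate categories already produced by the first detection model for each window, indexed by the window's start frame.

```python
def collect_candidate_frames(frames, window_categories, target_category, window=3):
    """For each consecutive-frame window whose candidate categories
    include the target category, keep its designated (middle) frame."""
    candidates = []
    for start, cats in enumerate(window_categories):
        if target_category in cats:
            candidates.append(frames[start + window // 2])
    return candidates
```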
In some optional implementations of this embodiment, each set of extracted consecutive frames may be three consecutive frames, and the designated frame may be the middle frame of the three.
Step 203: input the determined candidate frames into a pre-trained second detection model to obtain category detection results of the input candidate frames.
In this embodiment, the executing body may input the determined candidate frames into a pre-trained second detection model to obtain the category detection results of the input candidate frames. The second detection model may be used to determine whether the category of an image is the target category, and a category detection result may indicate whether the category of an input candidate frame is the target category.
As an example, the second detection model may be a mapping table characterizing the correspondence between images (which may be characterized in the form of pixel matrices) and category detection results.
As another example, the second detection model may be obtained by performing supervised training on an existing model with a classification function using a machine learning method. The model with a classification function may be a convolutional neural network of various existing structures (such as DenseBox, VGGNet, ResNet or SegNet), or a model such as a support vector machine (SVM). The second detection model may extract features from an input candidate frame, analyze the extracted features, and compute the probability that the category of the input candidate frame is the target category. When the computed probability is greater than a preset probability (such as 0.5), the category of the input candidate frame may be determined to be the target category; when the computed probability does not exceed the preset probability, the category of the input candidate frame may be determined not to be the target category.
In this way, only the candidate frames need to be input into the second detection model for category detection; there is no need to input every frame, which improves the efficiency of category detection on the target video.
In some optional implementations of this embodiment, the second detection model may be trained as follows. In a first step, a second training sample set may be obtained, wherein a training sample in the second training sample set includes a second sample image and annotation information indicating whether the category of the second sample image is the target category. Here, a second sample image may be an image collected from the Internet or a frame extracted from a video uploaded by a user. In a second step, using a machine learning method, a pre-established second convolutional neural network is trained by taking the second sample images in the second training sample set as input and the annotation information of the input second sample images as output, and the trained second convolutional neural network is taken as the second detection model. Here, the second convolutional neural network may use various existing convolutional neural network structures.
Step 204: determine the category detection result of the target video based on the category detection results of the candidate frames.
In this embodiment, the executing body may determine the category detection result of the target video based on the category detection results of the candidate frames in various ways. For example, in response to determining that there is a candidate frame whose category is the target category, the category of the target video may be determined to be the target category.
In some optional implementations of this embodiment, the executing body may determine the number of candidate frames whose category detection results indicate the target category, and, in response to determining that the number is greater than a first preset value, determine that the category of the target video is the target category. Here, the first preset value may be set according to actual needs.
In some optional implementations of this embodiment, the executing body may first determine the number of candidate frames whose category detection results indicate the target category, and determine the total number of candidate frames. The ratio of this number to the total number may then be determined. In response to determining that the ratio is greater than a second preset value, the category of the target video is determined to be the target category. Here, the second preset value may be set according to actual needs.
In this way, the category detection result of the target video can be determined from the category detection results of the candidate frames, without inputting every frame into the second detection model for category detection, which improves the efficiency of category detection on the target video.
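The two optional decision rules (absolute count greater than a first preset value, or ratio greater than a second preset value) can be sketched together as follows; treating the rules as alternatives selected by which preset is supplied is an assumption made for the example, not a feature stated in the text.

```python
def video_is_target(results, first_preset=None, second_preset=None):
    """Decide the video-level result from per-candidate-frame results
    (True = target category). Applies the count rule if `first_preset`
    is given, the ratio rule if `second_preset` is given, and otherwise
    requires at least one confirmed candidate frame."""
    hits = sum(results)
    if first_preset is not None:
        return hits > first_preset
    if second_preset is not None:
        return bool(results) and hits / len(results) > second_preset
    return hits > 0
```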
With continued reference to Fig. 3, Fig. 3 is a schematic diagram of an application scenario of the method for generating information according to the present embodiment. In the application scenario of Fig. 3, a user sends, through a terminal device 301, a video upload request containing a target video 303 to be uploaded to a server 302. After receiving the target video, the server 302 may successively extract consecutive frames in the target video 303 and input each group of extracted consecutive frames into a first detection model trained in advance, obtaining the candidate categories of the input consecutive frames. Then, the server 302 may determine as candidate frames the designated frames in those groups of consecutive frames whose candidate categories include the pornographic category, and input the candidate frames into a second detection model trained in advance, obtaining the category detection results of the input candidate frames. Afterwards, the server 302 may determine the category detection result of the target video based on the category detection results of the candidate frames. When the category detection result indicates that the category of the target video is the pornographic category, the server 302 may push prompt information to the terminal device 301. In this case, the prompt information may be used to prompt the user that the uploaded target video involves pornographic content and to prompt the user to replace the video and upload again. When the category detection result indicates that the category of the target video is not the pornographic category, the target video may be stored, and prompt information may be pushed to the terminal device 301. In this case, the prompt information may be used to prompt the user that the target video has been uploaded successfully.
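The server-side decision in this scenario can be sketched as follows; this is a minimal illustration, and the function name, message strings, and return shape are hypothetical rather than part of the described embodiment. The detection itself is passed in as a plain callable standing in for the two-stage method.

```python
def handle_upload(video, detect_category):
    """Decide the server's response to a video upload request.

    `detect_category` stands in for the two-stage detection described
    above; it returns the detected category of the video.
    """
    if detect_category(video) == "pornographic":
        # Reject: prompt the user that the uploaded video involves
        # pornographic content and should be replaced before re-uploading.
        return {"stored": False,
                "prompt": "upload rejected: please replace the video"}
    # Otherwise store the video and confirm a successful upload.
    return {"stored": True, "prompt": "upload successful"}
```

The point of the branch is only that storage happens exclusively on the non-pornographic path, while both paths push prompt information back to the terminal device.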
In the method provided by the above embodiment of the present application, consecutive frames in the target video are successively extracted, and each group of extracted consecutive frames is input into the first detection model trained in advance, so that the candidate categories corresponding to each group of input consecutive frames can be obtained. Here, if a single frame were input each time, the accuracy of category detection would be low whenever that frame is blurry. By instead using consecutive frames as the input, features can be extracted from the consecutive frames simultaneously, which improves the accuracy of category detection. Then, the designated frames in those groups of consecutive frames whose candidate categories include the target category are determined as candidate frames, so that frames suspected of belonging to the target category can be extracted from the target video. Next, the candidate frames are input into the second detection model, which is used to determine whether an image belongs to the target category, obtaining further category detection results for the candidate frames. Based on the category detection results of the candidate frames, the category detection result of the target video can be determined. It is therefore not necessary to input every frame into the second detection model for category detection, which improves the efficiency of performing category detection on the target video.
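The two-stage method summarized above can be sketched end to end as follows. This is a minimal sketch under stated assumptions: it uses three-frame stride-1 windows (one variant described in this application; the stride is not fixed by the text), the count-based decision with a first preset value, and plain callables in place of the trained convolutional models. All names are illustrative.

```python
def detect_video_category(frames, first_model, second_model,
                          first_preset=1, target="target"):
    """Two-stage sketch: scan three-frame windows with the first
    model, re-check only the flagged middle frames with the second
    model, and decide the video category from the candidate-frame
    results (count greater than a first preset value)."""
    candidates = []
    for i in range(len(frames) - 2):
        window = frames[i:i + 3]
        if target in first_model(window):   # stage one: candidate categories
            candidates.append(window[1])    # designated frame: the middle one
    hits = sum(1 for f in candidates if second_model(f) == target)
    return target if hits > first_preset else "non-target"
```

The efficiency claim is visible in the structure: the (presumably heavier) second model is only ever called on the candidate middle frames, never on every frame of the video.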
With further reference to Fig. 4, a process 400 of another embodiment of the method for generating information is illustrated. The process 400 of the method for generating information comprises the following steps:
Step 401: successively extract groups of three consecutive frames in the target video, and input each extracted group of three consecutive frames into a first detection model trained in advance, obtaining the candidate categories of the three input consecutive frames.
In the present embodiment, the executing subject of the method for generating information (for example, the server 105 shown in Fig. 1) may successively extract groups of three consecutive frames in the target video and input each extracted group into the first detection model trained in advance, obtaining the candidate categories corresponding to each group of three input consecutive frames.
Step 402: determine as candidate frames the middle frames of those groups of three consecutive frames whose candidate categories include the target category.
In the present embodiment, for each extracted group of three consecutive frames, the aforementioned executing subject may first determine whether the candidate categories of that group include the target category. If so, the middle frame of that group may be determined as a candidate frame. In this way, candidate frames suspected of belonging to the target category can be extracted from the target video, and the category of the target video can then be determined by analyzing the candidate frames, without performing category detection on each frame separately. This improves the efficiency of performing category detection on the target video.
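Steps 401 and 402 can be sketched together as follows, assuming that "successively extracting" three consecutive frames means a stride-1 sliding window (the text does not fix the stride) and representing the first detection model as a callable that returns a set of candidate categories:

```python
def select_candidate_frames(frames, first_model, target="target"):
    """Slide a three-frame window over the video; whenever the first
    model's candidate categories for a window include the target
    category, keep the window's middle frame as a candidate frame."""
    candidates = []
    for i in range(len(frames) - 2):
        window = frames[i:i + 3]
        if target in first_model(window):
            candidates.append(window[1])  # middle frame of the three
    return candidates
```

A frame can be the middle of at most one window, so each frame is kept at most once; only these candidates go on to the second detection model.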
Step 403: input the determined candidate frames into a second detection model trained in advance, obtaining the category detection results of the input candidate frames.
In the present embodiment, the aforementioned executing subject may input the determined candidate frames into the second detection model trained in advance, obtaining the category detection results of the input candidate frames. Here, the second detection model is used to determine whether the category of an image is the target category, and a category detection result may be used to indicate whether the category of an input candidate frame is the target category.
Step 404: determine the quantity of candidate frames whose category detection results indicate the target category, and determine the total quantity of candidate frames.
In the present embodiment, the aforementioned executing subject may first determine the quantity of candidate frames whose category detection results indicate the target category, and determine the total quantity of candidate frames.
Step 405: in response to determining that the ratio of the aforementioned quantity to the aforementioned total quantity is greater than a second preset value, determine that the category of the target video is the target category.
In the present embodiment, the ratio of the quantity of candidate frames whose category is the target category to the total quantity of candidate frames may be determined. In response to determining that this ratio is greater than the second preset value, the category of the target video is determined to be the target category. Here, the second preset value may be set according to actual needs.
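Steps 404 and 405 amount to a simple ratio test over the candidate-frame results. A sketch follows; the result labels and the default threshold are chosen here for illustration only, since the application leaves the second preset value to actual needs:

```python
def decide_by_ratio(detection_results, second_preset=0.5,
                    target="target"):
    """Count the candidate frames whose detection result is the
    target category, take the ratio of that quantity to the total
    quantity of candidate frames, and compare it with the second
    preset value."""
    total = len(detection_results)
    if total == 0:
        return "non-target"  # no candidate frames were flagged at all
    quantity = sum(1 for r in detection_results if r == target)
    return target if quantity / total > second_preset else "non-target"
```

Compared with the first variant (an absolute count against a first preset value), the ratio makes the decision independent of video length.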
As can be seen from Fig. 4, compared with the embodiment corresponding to Fig. 2, the process 400 of the method for generating information in the present embodiment highlights the step of determining the category detection result of the target video according to the proportion of candidate frames belonging to the target category. The scheme described in the present embodiment can thus determine the category detection result of the target video from the category detection results of the candidate frames, without determining one by one whether each frame in the target video belongs to the target category, which improves the efficiency of performing category detection on the target video.
With further reference to Fig. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an apparatus for generating information. This apparatus embodiment corresponds to the method embodiment shown in Fig. 2, and the apparatus may specifically be applied in various electronic devices.
As shown in Fig. 5, the apparatus 500 for generating information described in the present embodiment comprises: a first input unit 501, configured to successively extract consecutive frames in a target video and input each group of extracted consecutive frames into a first detection model trained in advance, obtaining the candidate categories of the input consecutive frames, wherein the first detection model is used to determine the candidate categories of an image sequence; a first determination unit 502, configured to determine as candidate frames the designated frames in those groups of consecutive frames whose candidate categories include a target category; a second input unit 503, configured to input the determined candidate frames into a second detection model trained in advance, obtaining the category detection results of the input candidate frames, wherein the second detection model is used to determine whether the category of an image is the target category; and a second determination unit 504, configured to determine the category detection result of the target video based on the category detection results of the candidate frames.
In some optional implementations of the present embodiment, the aforementioned consecutive frames may be three consecutive frames, and the designated frame in the consecutive frames may be the middle frame of the three consecutive frames.
In some optional implementations of the present embodiment, the aforementioned second determination unit 504 may include a first determining module and a second determining module (not shown). The first determining module may be configured to determine the quantity of candidate frames whose category detection results indicate the target category. The second determining module may be configured to determine, in response to determining that the quantity is greater than a first preset value, that the category of the target video is the target category.
In some optional implementations of the present embodiment, the aforementioned second determination unit 504 may include a third determining module and a fourth determining module (not shown). The third determining module may be configured to determine the quantity of candidate frames whose category detection results indicate the target category, and to determine the total quantity of candidate frames. The fourth determining module may be configured to determine, in response to determining that the ratio of the quantity to the total quantity is greater than a second preset value, that the category of the target video is the target category.
In some optional implementations of the present embodiment, the aforementioned first detection model may be obtained by training as follows: obtain a first training sample set, wherein each training sample in the first training sample set includes a first sample image and annotation information used to indicate the category of the first sample image; then, using a machine learning method, take the first sample images in the first training sample set as the input of a pre-established first convolutional neural network and the annotation information of the input first sample images as the output of the first convolutional neural network, train the first convolutional neural network, and use the trained first convolutional neural network as the first detection model.
In some optional implementations of the present embodiment, the aforementioned second detection model may be obtained by training as follows: obtain a second training sample set, wherein each training sample in the second training sample set includes a second sample image and annotation information used to indicate whether the category of the second sample image is the target category; then, using a machine learning method, take the second sample images in the second training sample set as the input of a pre-established second convolutional neural network and the annotation information of the input second sample images as the output of the second convolutional neural network, train the second convolutional neural network, and use the trained second convolutional neural network as the second detection model.
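Both models are trained with the same supervised recipe, differing only in their samples and labels. As a stand-in for the convolutional networks (which the sketch does not attempt to reproduce), the loop below fits a tiny perceptron, so the shape of the procedure is visible: present an input, compare the prediction with the annotation information, update. All names and the choice of model are illustrative assumptions.

```python
def train_detector(samples, lr=0.1, epochs=50):
    """Supervised training sketch: each sample pairs an input with
    annotation information (a 0/1 label) that serves as the desired
    output. A real first/second detection model would be a CNN over
    image sequences or single frames; a perceptron stands in here."""
    dim = len(samples[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in samples:
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = y - pred  # zero once the prediction matches the annotation
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
```

For the first detection model the labels would encode the candidate category of a frame sequence; for the second, whether a single frame's category is the target category.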
In the apparatus provided by the above embodiment of the present application, the first input unit 501 successively extracts consecutive frames in the target video and inputs each group of extracted consecutive frames into the first detection model trained in advance, so that the candidate categories corresponding to each group of input consecutive frames can be obtained. Here, if a single frame were input each time, the accuracy of category detection would be low whenever that frame is blurry. By instead using consecutive frames as the input, features can be extracted from the consecutive frames simultaneously, which improves the accuracy of category detection. Then, the first determination unit 502 determines as candidate frames the designated frames in those groups of consecutive frames whose candidate categories include the target category, so that frames suspected of belonging to the target category can be extracted from the target video. Next, the second input unit 503 inputs the candidate frames into the second detection model, which is used to determine whether an image belongs to the target category, obtaining further category detection results for the candidate frames. Based on the category detection results of the candidate frames, the second determination unit 504 can determine the category detection result of the target video. It is therefore not necessary to input every frame into the second detection model for category detection, which improves the efficiency of performing category detection on the target video.
Referring now to Fig. 6, a structural schematic diagram of a computer system 600 of an electronic device suitable for implementing the embodiments of the present application is illustrated. The electronic device shown in Fig. 6 is merely an example and should not impose any restriction on the function or scope of use of the embodiments of the present application.
As shown in Fig. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage portion 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the system 600. The CPU 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, as well as a speaker; a storage portion 608 including a hard disk and the like; and a communication portion 609 including a network interface card such as a LAN card or a modem. The communication portion 609 performs communication processing via a network such as the Internet. A driver 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the driver 610 as needed, so that a computer program read therefrom can be installed into the storage portion 608 as needed.
In particular, according to embodiments of the present disclosure, the process described above with reference to the flow chart may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication portion 609, and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above-mentioned functions defined in the method of the present application are executed. It should be noted that the computer-readable medium described herein may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present application, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by or in connection with an instruction execution system, apparatus, or device. In the present application, a computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and such a computer-readable medium can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted by any suitable medium, including but not limited to: wireless, wire, optical cable, RF, and the like, or any suitable combination of the above.
The flow charts and block diagrams in the accompanying drawings illustrate the possible architecture, functions, and operations of the systems, methods, and computer program products according to the various embodiments of the present application. In this regard, each box in a flow chart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the boxes may occur in an order different from that noted in the drawings. For example, two boxes shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each box in the block diagrams and/or flow charts, and combinations of boxes in the block diagrams and/or flow charts, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or by hardware. The described units may also be provided in a processor, which may, for example, be described as: a processor comprising a first input unit, a first determination unit, a second input unit, and a second determination unit. The names of these units do not, in some cases, constitute a limitation on the units themselves. For example, the first input unit may also be described as "a unit that successively extracts consecutive frames in a target video and inputs each group of extracted consecutive frames into a first detection model trained in advance".
As another aspect, the present application further provides a computer-readable medium, which may be included in the apparatus described in the above embodiment, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: successively extract consecutive frames in a target video, and input each group of extracted consecutive frames into a first detection model trained in advance, obtaining the candidate categories of the input consecutive frames; determine as candidate frames the designated frames in those groups of consecutive frames whose candidate categories include a target category; input the determined candidate frames into a second detection model trained in advance, obtaining the category detection results of the input candidate frames; and determine the category detection result of the target video based on the category detection results of the candidate frames.
The above description is merely an illustration of the preferred embodiments of the present application and of the technical principles applied. Those skilled in the art should understand that the scope of the invention involved in the present application is not limited to technical solutions formed by the specific combinations of the above technical features, and should also cover, without departing from the above inventive concept, other technical solutions formed by any combination of the above technical features or their equivalent features, for example, technical solutions formed by mutually replacing the above features with (but not limited to) technical features with similar functions disclosed in the present application.