Specific Embodiments
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the relevant invention and do not limit it. It should also be noted that, for ease of description, only the parts relevant to the invention are shown in the drawings.
It should be noted that, in the absence of conflict, the embodiments of the present application and the features in those embodiments may be combined with one another. The present application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary system architecture 100 in which embodiments of the method for generating information or of the apparatus for generating information of the present application may be applied.
As shown in Fig. 1, the system architecture 100 may include terminal devices 101, 102 and 103, a network 104, and a server 105. The network 104 serves as a medium providing communication links between the terminal devices 101, 102, 103 and the server 105, and may include various connection types, such as wired links, wireless communication links, or fiber-optic cables.
The terminal devices 101, 102 and 103 interact with the server 105 through the network 104 to receive or send messages and the like. Various client applications, such as camera applications, image-processing applications, and video applications, may be installed on the terminal devices 101, 102 and 103.
The terminal devices 101, 102 and 103 may be hardware or software. When they are hardware, they may be various electronic devices, including but not limited to smartphones, tablet computers, e-book readers, laptop computers, desktop computers, and the like. When they are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
The server 105 may be a server providing various services, for example a processing server that analyzes and processes a to-be-processed video sent by the terminal devices 101, 102 and 103. The processing server may extract an image sequence corresponding to the to-be-processed video and process the images in the image sequence. Further, the processing server may feed the processing result back to the terminal devices 101, 102 and 103.
It should be noted that the to-be-processed video may also be stored directly on the server 105 itself, in which case the server 105 may directly extract and process the locally stored to-be-processed video; the terminal devices 101, 102, 103 and the network 104 may then be absent.
It should be noted that the method for generating information provided by the embodiments of the present application is generally performed by the server 105; correspondingly, the apparatus for generating information is generally disposed in the server 105.
It should also be noted that a video-processing application may be installed on the terminal devices 101, 102 and 103, and the terminal devices 101, 102 and 103 may process the to-be-processed video on the basis of that application. In that case, the method for generating information may also be performed by the terminal devices 101, 102 and 103, and correspondingly the apparatus for generating information may be disposed in the terminal devices 101, 102 and 103. The server 105 and the network 104 may then be absent from the exemplary system architecture 100.
It should be noted that the server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster composed of multiple servers, or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative. Any number of terminal devices, networks, and servers may be provided according to implementation needs.
With continued reference to Fig. 2, a flow 200 of one embodiment of the method for generating information according to the present application is shown. The method for generating information includes the following steps:
Step 201: obtain a to-be-processed video.
In this embodiment, the executing body of the method for generating information (such as the server 105 shown in Fig. 1) may obtain the to-be-processed video from local storage or from another storage device by means of a wired or wireless connection. The to-be-processed video may be a video of any type and any content; it may also be a video designated by relevant personnel.
Step 202: extract an image sequence corresponding to the to-be-processed video.
In this embodiment, the image sequence corresponding to the to-be-processed video may further be extracted from the to-be-processed video. In general, a video is composed of a series of images; the series of images composing the to-be-processed video therefore constitutes the image sequence corresponding to it. Specifically, various existing video-processing applications may be used to process the to-be-processed video to obtain the corresponding image sequence. Obtaining a corresponding image sequence by processing a video is a technique that is currently widely studied and applied, and is not described in detail here.
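As a purely illustrative sketch (not part of the claimed method), one common way to obtain an image sequence is to sample frames at a fixed stride; the function name and stride below are assumptions for illustration only:

```python
def frame_indices(total_frames, step):
    """Return the indices of the frames to extract, one every `step` frames."""
    if step <= 0:
        raise ValueError("step must be positive")
    return list(range(0, total_frames, step))

# e.g. a 100-frame video sampled every 25 frames
print(frame_indices(100, 25))  # → [0, 25, 50, 75]
```

In practice a video-processing library would decode the frames at these indices; only the index selection is sketched here.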
Step 203: in response to determining that the images in the image sequence do not include a person image, obtain the audio corresponding to the to-be-processed video, and determine category information of the to-be-processed video according to the audio.
In this embodiment, a person image may refer to an image showing a person. Specifically, an image showing all or part of a person (for example, only the face) may be regarded as a person image. In general, the audio corresponding to a video may refer to the sound carried by the video. The sound of a video may be the sound emitted by the recorded subjects and the recording environment at the time of recording; it may also be sound obtained by first recording the video and then processing and/or dubbing its sound afterwards.
Specifically, various existing multimedia-processing software may be used to process the to-be-processed video to obtain the corresponding audio. Obtaining corresponding audio by processing a video is likewise a technique that is currently widely studied and applied, and is not described in detail here.
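One common way to separate the audio track is to invoke a tool such as ffmpeg. The sketch below only builds the command line; the `-i` (input) and `-vn` (drop video stream) flags are standard ffmpeg usage, while the helper function itself is a hypothetical illustration:

```python
def build_audio_extract_cmd(video_path, audio_path):
    """Build an ffmpeg command that drops the video stream (-vn) and keeps the audio."""
    return ["ffmpeg", "-y", "-i", video_path, "-vn", audio_path]

cmd = build_audio_extract_cmd("input.mp4", "audio.wav")
print(" ".join(cmd))  # → ffmpeg -y -i input.mp4 -vn audio.wav
```

The resulting list could be passed to `subprocess.run` when ffmpeg is installed.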
In this embodiment, the category information may be used to indicate whether the to-be-processed video is related to a target group. The target group may be a group composed of people designated in advance by a technician, or a group of people meeting a preset condition. The preset condition may include, for example, that everyone in the group shares one or more attributes (or attribute values).
For example, the preset condition may be an age of less than 12 years, in which case the group of people under 12 may be regarded as the target group. As another example, the preset condition may be an age of less than 12 years and a gender of female, in which case the group of girls under 12 may be regarded as the target group.
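The two example preset conditions above can be sketched as simple predicates; the attribute names `age` and `gender` are assumptions for illustration:

```python
def meets_condition_age(person):
    """Preset condition: age less than 12."""
    return person["age"] < 12

def meets_condition_age_gender(person):
    """Preset condition: age less than 12 and gender female."""
    return person["age"] < 12 and person["gender"] == "female"

print(meets_condition_age({"age": 9, "gender": "male"}))         # → True
print(meets_condition_age_gender({"age": 9, "gender": "male"}))  # → False
```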
In this embodiment, when the information of the to-be-processed video is related to the target group, the to-be-processed video may be considered related to the target group. The information of the to-be-processed video includes, but is not limited to, the content the video contains (such as the people, objects, and environments appearing in it) and the content corresponding to its audio (such as the semantics of the text corresponding to the audio). For example, when a person belonging to the target group appears in the to-be-processed video, the video may be considered related to the target group. As another example, when the person corresponding to a voice in the audio of the to-be-processed video belongs to the target group, the to-be-processed video may likewise be considered related to the target group.
In this embodiment, the category information of the to-be-processed video may be determined by various methods according to the specific application demand (for example, according to which group of people the target group refers to).
Optionally, the category information of the to-be-processed video may be determined from the corresponding audio as follows:
Step one: obtain a first audio set and a second audio set. The audios in the first audio set may be audios of people in the target group, and the audios in the second audio set may be audios of people outside the target group. Specifically, audios of some people in the target group may be collected in advance, or generated using an audio application, to form the first audio set; audios of some people outside the target group may likewise be collected or generated to form the second audio set.
Step two: select a target number of audios from the first audio set and from the second audio set, respectively, to form a first sample audio set and a second sample audio set. The target number may be designated in advance by relevant personnel, or determined according to actual conditions (for example, one tenth of the total number of audios in the audio set). The audios may be selected from an audio set at random, or extracted according to specific rules. For example, the audios in an audio set may be divided in advance into one or more audio subsets by length, such that the lengths of the audios in the same subset fall within the same length interval.
Step three: calculate the similarity between the audio corresponding to the to-be-processed video and each audio in the first sample audio set, and select the maximum similarity as a first similarity. Similarly, calculate the similarity between the audio corresponding to the to-be-processed video and each audio in the second sample audio set, and select the maximum similarity as a second similarity.
Specifically, the similarity between two audios may be calculated using existing audio applications, and different calculation methods may be chosen according to different application demands. For example, to obtain the semantic similarity of two audios, the texts corresponding to the two audios may be obtained separately and the similarity between the two texts calculated. As another example, to obtain the similarity of the frequencies of two audios, the waveforms corresponding to the two audios may first be obtained and the similarity between the two waveforms then calculated.
Step four: select the greater of the first similarity and the second similarity as a target similarity, and determine the category information of the group corresponding to the target similarity as the category information of the to-be-processed video. When the first similarity is greater than the second similarity, the to-be-processed video may be determined to be related to the target group; when the second similarity is greater than the first similarity, the to-be-processed video may be determined to be unrelated to the target group.
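Steps three and four above can be sketched as follows. `similarity` is a placeholder for whichever audio-similarity measure is chosen (semantic or waveform-based); here a toy absolute-difference measure over scalar features stands in for it:

```python
def similarity(a, b):
    """Toy stand-in for an audio similarity measure (higher = more similar)."""
    return 1.0 / (1.0 + abs(a - b))

def classify_by_similarity(audio, target_samples, non_target_samples):
    """Take the max similarity against each sample set, then compare (steps three and four)."""
    first = max(similarity(audio, s) for s in target_samples)       # first similarity
    second = max(similarity(audio, s) for s in non_target_samples)  # second similarity
    return "related" if first > second else "unrelated"

print(classify_by_similarity(5.0, [4.9, 10.0], [1.0, 2.0]))  # → related
```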
Optionally, the category information of the to-be-processed video may also be determined from the corresponding audio as follows:
Step one: input the audio into a pre-trained category detection model to obtain the probability that the person corresponding to the voice in the audio belongs to the target group. The category detection model may be used to characterize the correspondence between an audio and the probability that the person corresponding to the voice in the audio belongs to the target group.
Step two: determine the magnitude relation between the obtained probability and a target probability threshold. In response to determining that the obtained probability is greater than the target probability threshold, category information indicating that the to-be-processed video is related to the target group is determined as the category information of the to-be-processed video. In response to determining that the obtained probability is less than the target probability threshold, category information indicating that the to-be-processed video is unrelated to the target group is determined as the category information of the to-be-processed video. The target probability threshold may be preset by relevant personnel, or may be determined dynamically during actual processing (for example, calculated according to a preset formula).
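The threshold decision of step two can be sketched as follows; the default threshold of 0.65 is only an example value:

```python
def category_info(probability, threshold=0.65):
    """Map the model's output probability to category information (step two)."""
    if probability > threshold:
        return "related to target group"
    return "unrelated to target group"

print(category_info(0.8))  # → related to target group
print(category_info(0.3))  # → unrelated to target group
```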
The category detection model in step one above may be obtained in advance by various training methods. Optionally, the category detection model may be trained as follows:
Step one: obtain a training sample set. Each training sample includes an audio and a probability corresponding to the audio, the probability indicating how likely it is that the person corresponding to the voice in the audio belongs to the target group. Specifically, the probability in a training sample may be obtained by manual annotation according to the audio.
Step two: determine an initial category detection model. The initial category detection model may be any of various types of untrained or incompletely trained artificial neural networks, such as a deep learning model. It may also be a model combining multiple untrained or incompletely trained artificial neural networks, for example a model combining an untrained convolutional neural network, an untrained recurrent neural network, and an untrained fully connected layer.
Step three: using a machine learning method, take the audio in each training sample of the training sample set as the input of the initial category detection model and the probability corresponding to the input audio as the desired output, and train the model to obtain the category detection model described above.
Specifically, the initial category detection model may be trained on the basis of a preset loss function, whose value may be used to indicate the degree of difference between the actual output of the initial category detection model and the probability in the training sample. The parameters of the initial category detection model may then be adjusted by back-propagation based on the value of the loss function, and training may be terminated when a preset training termination condition is met. After training is completed, the trained initial category detection model may be determined as the category detection model described above.
The preset training termination condition may include, but is not limited to, at least one of the following: the training time exceeds a preset duration; the number of training iterations exceeds a preset number; the value of the loss function is less than a preset difference threshold.
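The "at least one of" termination condition can be sketched as a simple predicate; the parameter names and default limits are illustrative assumptions:

```python
def should_stop(elapsed, iterations, loss,
                max_elapsed=3600.0, max_iterations=10000, loss_threshold=0.01):
    """Stop training when any one of the preset termination conditions is met."""
    return (elapsed > max_elapsed
            or iterations > max_iterations
            or loss < loss_threshold)

print(should_stop(elapsed=10.0, iterations=50, loss=0.005))  # → True (loss below threshold)
print(should_stop(elapsed=10.0, iterations=50, loss=0.5))    # → False
```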
Optionally, the category detection model may also be trained as follows:
Step one: obtain an initial category determination model. The initial category determination model includes an initial category detection model and an initial classification model connected to the initial category detection model. The initial classification model takes the output of the initial category detection model as its input, and outputs annotation information indicating whether the person corresponding to the voice in the audio belongs to the target group. Specifically, the annotation information may be obtained by manual annotation in advance.
The initial category detection model may be any of various types of untrained or incompletely trained artificial neural networks, such as a deep learning model, or a model combining multiple untrained or incompletely trained artificial neural networks, for example a model combining an untrained convolutional neural network, an untrained recurrent neural network, and an untrained fully connected layer. The initial classification model may be a classifier for classifying the input information.
Step two: obtain a training sample set. A training sample may include an audio and annotation information indicating whether the person corresponding to the voice in the audio belongs to the target group. Specifically, the annotation information in a training sample may be obtained by manual annotation according to the audio.
Step three: using a machine learning method, take the audio in each training sample of the training sample set as the input of the initial category determination model and the annotation information corresponding to the input audio as the desired output of the initial category determination model, and train the model to obtain a trained category determination model.
Specifically, the initial category determination model may be trained on the basis of a preset loss function, whose value may be used to indicate whether the actual output of the initial category determination model is consistent with the annotation information in the training sample. For example, "0" may indicate that the actual output is consistent with the annotation information, and "1" that it is inconsistent. The parameters of the initial category determination model may then be adjusted by back-propagation based on the value of the loss function, and training may be terminated when a preset training termination condition is met, yielding a trained category determination model.
The preset training termination condition may include, but is not limited to, at least one of the following: the training time exceeds a preset duration; the number of training iterations exceeds a preset number; the value of the loss function is less than a preset difference threshold.
Step four: determine the trained initial category detection model included in the trained category determination model as the category detection model described above.
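The structure of step four — training a combined model and then keeping only its detection part — can be sketched with placeholder classes. Nothing below corresponds to a real library; it only mirrors the wiring described above:

```python
class CategoryDetectionModel:
    """Placeholder: maps an audio to a probability of belonging to the target group."""
    def forward(self, audio):
        return 0.5  # untrained stub

class ClassificationHead:
    """Placeholder: maps the detector's probability to a yes/no annotation."""
    def forward(self, probability):
        return probability > 0.5

class CategoryDeterminationModel:
    """Detection model and classification model chained together, as in step one."""
    def __init__(self):
        self.detector = CategoryDetectionModel()
        self.classifier = ClassificationHead()

    def forward(self, audio):
        return self.classifier.forward(self.detector.forward(audio))

# After training the combined model, step four keeps only the detection part.
combined = CategoryDeterminationModel()
category_detection_model = combined.detector
print(type(category_detection_model).__name__)  # → CategoryDetectionModel
```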
With continued reference to Fig. 3, Fig. 3 is a schematic diagram of an application scenario of the method for generating information according to the present embodiment. In the application scenario of Fig. 3, the executing body described above may obtain a to-be-processed video 301 and then extract the image sequence 302 corresponding to the to-be-processed video 301. Next, image detection may be performed on each image in the image sequence 302 to determine whether each image contains a person image. If no image contains a person image, the audio 303 corresponding to the to-be-processed video 301 may further be obtained.
Afterwards, the audio 303 is input into a pre-trained category detection model 304 to obtain an output result 305. The output result 305 may be used to indicate the probability that the person corresponding to the voice in the audio 303 is a child. As shown by reference numeral 305 in the figure, the output result includes an output probability of 0.8.
Further, the output probability may be compared with a probability threshold 306. As shown by reference numeral 306 in the figure, the probability threshold is 0.65. Since the probability 0.8 output by the category detection model 304 is greater than the probability threshold 0.65, the person corresponding to the voice in the input audio 303 may be considered a child. A detection result 307 of the category detection model 304 is obtained and taken as the detection result of the to-be-processed video 301 corresponding to the audio 303; that is, the to-be-processed video 301 may be considered related to children.
The method provided by the above embodiment of the present application determines, when the images in the image sequence corresponding to the to-be-processed video contain no person image, whether the to-be-processed video is related to the target group according to its audio. This effectively avoids the situation in which image-based video-processing methods cannot handle a to-be-processed video containing no person image, and thereby helps improve the stability and robustness of the processing of to-be-processed videos.
With further reference to Fig. 4, a flow 400 of another embodiment of the method for generating information is shown. The flow 400 of the method for generating information includes the following steps:
Step 401: obtain a to-be-processed video.
Step 402: extract an image sequence corresponding to the to-be-processed video.
For the specific implementation of steps 401 and 402, reference may be made to the related descriptions of steps 201 and 202 in the embodiment corresponding to Fig. 2, which are not repeated here.
Step 403: determine whether the images in the image sequence include a person image. If they do, perform the following step 404; if they do not, perform the following step 405.
In this step, existing image detection methods (such as various pedestrian detection algorithms) may be used to detect the images in the image sequence, so as to determine whether the images in the image sequence include a person image.
Step 404: determine whether the person images included in the images of the image sequence meet a preset condition. If they do, perform the following step 4041; if they do not, perform step 405.
In this step, the preset condition may be set by relevant personnel according to the specific application scenario and demand. The preset condition may be a restriction on the quality of the person images included in the images of the image sequence. The quality of a person image may be characterized by the degree to which the image facilitates extraction of the desired image features; for example, it may be indicated by information such as the resolution of the person image and the size of the image region of the person the image shows.
In some cases, the resolution of the person images included in the images of the image sequence is too low, or the person shown offers few extractable features (for example, only a foot is shown). In such cases, the accuracy of analyzing, on the basis of the person image, whether the person shown belongs to the target group would be greatly reduced.
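A minimal sketch of such a quality-based preset condition, using image-region size only; the threshold values and field names are assumptions:

```python
def meets_quality_condition(person_image, min_width=64, min_height=128):
    """Preset condition: the person-image region must be large enough to analyze."""
    return person_image["width"] >= min_width and person_image["height"] >= min_height

print(meets_quality_condition({"width": 120, "height": 240}))  # → True
print(meets_quality_condition({"width": 20, "height": 40}))    # → False
```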
Step 4041: extract the person images included in the images of the image sequence corresponding to the to-be-processed video to obtain a person image set, and determine the category information of the to-be-processed video according to the person image set.
In this step, existing image detection methods may first be used to determine the image region showing a person in each image of the image sequence, and the person image corresponding to each image showing a person may then be extracted.
Further, each person image in the person image set may be analyzed in turn to determine whether the person shown by each person image belongs to the target group. If none of the person images shows a person belonging to the target group, the to-be-processed video may be considered unrelated to the target group; if at least one image shows a person belonging to the target group, the to-be-processed video may be considered related to the target group.
Specifically, various existing person-image-based image processing methods may be used to determine the category information of the to-be-processed video. For example, each person image may be analyzed directly to determine whether the person it shows belongs to the target group, thereby obtaining the category information of the to-be-processed video.
For each person image, whether the person image includes a face image may also be analyzed first. If it does, various existing face-detection-based methods (such as face-based attribute recognition) may be used to process the face image included in the person image, so as to determine whether the face shown is related to the target group and thereby obtain the category information of the to-be-processed video. If the person image does not include a face image, various existing human-body-detection-based methods (such as body-based attribute recognition) may be used to analyze the body image included in the person image, so as to determine whether the body shown is related to the target group and thereby obtain the category information of the to-be-processed video.
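The face-or-body branch for a single person image can be sketched as follows; `analyze_face` and `analyze_body` are stand-ins for the face-based and body-based attribute recognition methods mentioned above, and the dictionary fields are assumptions:

```python
def analyze_face(image):
    return image.get("face_is_target", False)   # stand-in for face attribute recognition

def analyze_body(image):
    return image.get("body_is_target", False)   # stand-in for body attribute recognition

def person_image_related(image):
    """Prefer face-based analysis when a face image is present, else fall back to body."""
    if image.get("has_face"):
        return analyze_face(image)
    return analyze_body(image)

print(person_image_related({"has_face": True, "face_is_target": True}))    # → True
print(person_image_related({"has_face": False, "body_is_target": False}))  # → False
```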
Step 405: obtain the audio corresponding to the to-be-processed video, and determine the category information of the to-be-processed video according to the audio.
For the specific implementation of step 405, reference may be made to the related description of step 203 in the embodiment corresponding to Fig. 2, which is not repeated here.
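Putting steps 403 to 405 together, the overall branching of flow 400 can be sketched as a dispatch function; `classify_by_audio` and `classify_by_person_images` are stand-ins for steps 405 and 4041:

```python
def classify_by_audio(video):
    return "audio-based result"          # stand-in for step 405

def classify_by_person_images(video):
    return "image-based result"          # stand-in for step 4041

def flow_400(video, has_person_image, meets_preset_condition):
    """Steps 403-405: choose the processing path for the to-be-processed video."""
    if not has_person_image:                    # step 403 → step 405
        return classify_by_audio(video)
    if not meets_preset_condition:              # step 404 → step 405
        return classify_by_audio(video)
    return classify_by_person_images(video)     # step 404 → step 4041

print(flow_400("v", has_person_image=True, meets_preset_condition=True))    # → image-based result
print(flow_400("v", has_person_image=False, meets_preset_condition=False))  # → audio-based result
```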
As can be seen from Fig. 4, compared with the embodiment corresponding to Fig. 2, the flow 400 of the method for generating information in the present embodiment applies different processing to the images in the image sequence corresponding to the to-be-processed video depending on whether they include a person image, further subdivides the person-image case according to whether the preset condition is met, and subdivides it one layer further according to whether a face image is included. This makes it possible to apply different processing to various types of to-be-processed videos and, while processing them effectively, further improves the accuracy of the processing results for the to-be-processed video.
With further reference to Fig. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an apparatus for generating information. This apparatus embodiment corresponds to the method embodiment shown in Fig. 2, and the apparatus may specifically be applied to various electronic devices.
As shown in Fig. 5, the apparatus 500 for generating information provided by the present embodiment includes a video acquisition unit 501, an image sequence extraction unit 502, and a determination unit 503. The video acquisition unit 501 is configured to obtain a to-be-processed video; the image sequence extraction unit 502 is configured to extract an image sequence corresponding to the to-be-processed video; and the determination unit 503 is configured to, in response to determining that the images in the image sequence do not include a person image, obtain the audio corresponding to the to-be-processed video and determine category information of the to-be-processed video according to the audio, the category information indicating whether the to-be-processed video is related to a target group.
In the present embodiment, for the specific processing of the video acquisition unit 501, the image sequence extraction unit 502, and the determination unit 503 of the apparatus 500 for generating information, and the technical effects they bring, reference may be made to the related descriptions of steps 201, 202, and 203 in the embodiment corresponding to Fig. 2, which are not repeated here.
In some optional implementations of the present embodiment, the determination unit 503 is further configured to: in response to determining that the images in the image sequence include person images and that the included person images do not meet the preset condition, obtain the audio corresponding to the to-be-processed video, and determine the category information of the to-be-processed video according to the audio.
In some optional implementations of the present embodiment, the determination unit 503 is further configured to: in response to determining that the images in the image sequence include person images and that the included person images meet the preset condition, extract the person images included in the images of the image sequence corresponding to the to-be-processed video to obtain a person image set, and determine the category information of the to-be-processed video according to the person image set.
In some optional implementations of the present embodiment, the determination unit 503 is further configured to: input the audio into a pre-trained category detection model to obtain the probability that the person corresponding to the voice in the audio belongs to the target group, the category detection model being used to characterize the correspondence between an audio and the probability that the person corresponding to the voice in the audio belongs to the target group; and, in response to determining that the obtained probability is greater than a target probability threshold, determine category information indicating that the to-be-processed video is related to the target group as the category information of the to-be-processed video.
In some optional implementations of the present embodiment, the determination unit 503 is further configured to: in response to determining that the obtained probability is less than the target probability threshold, determine category information indicating that the to-be-processed video is unrelated to the target group as the category information of the to-be-processed video.
In some optional implementations of the present embodiment, the classification detection model is obtained through training as follows: acquiring an initial category determination model, wherein the initial category determination model includes an initial category detection model and an initial classification model connected to the initial category detection model, and wherein the initial classification model takes the output of the initial category detection model as input and takes, as output, annotation information indicating whether the person corresponding to the voice in the audio belongs to the target group; acquiring a training sample set, wherein a training sample includes audio and annotation information indicating whether the person corresponding to the voice in the audio belongs to the target group; using a machine learning method, taking the audio in a training sample of the training sample set as the input of the initial category determination model and taking the annotation information corresponding to the input audio as the desired output of the initial category determination model, to obtain a trained initial category determination model; and determining the initial category detection model included in the trained initial category determination model as the classification detection model.
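The two-stage training procedure above (train the composite model end-to-end, then keep only the detection sub-model) can be sketched with a deliberately tiny toy model. Everything here is an assumption for illustration: the one-weight logistic "detection model", the scalar audio features, and the plain gradient loop stand in for what would, in practice, be a neural-network framework and real audio features.

```python
import math

class InitialCategoryDetectionModel:
    """Toy detection sub-model: maps a scalar audio feature to a probability."""
    def __init__(self):
        self.w = 0.0

    def forward(self, x):
        return 1.0 / (1.0 + math.exp(-self.w * x))

class InitialCategoryDeterminationModel:
    """Detection model followed by a classification head that thresholds the
    detection model's output into annotation information (0 or 1)."""
    def __init__(self):
        self.detection = InitialCategoryDetectionModel()

    def forward(self, x):
        return 1 if self.detection.forward(x) > 0.5 else 0

def train(model, samples, lr=0.1, epochs=200):
    # Gradient step on the logistic loss of the detection sub-model,
    # supervised by each training sample's annotation information.
    for _ in range(epochs):
        for x, label in samples:
            p = model.detection.forward(x)
            model.detection.w += lr * (label - p) * x
    return model

# Toy training set: (audio feature, annotation: belongs to target group?)
samples = [(2.0, 1), (1.5, 1), (-1.0, 0), (-2.5, 0)]
trained = train(InitialCategoryDeterminationModel(), samples)

# Only the trained detection sub-model is kept as the classification
# detection model, as described in the text above.
classification_detection_model = trained.detection
print(round(classification_detection_model.forward(2.0), 2))
```

The key structural point is the last step: after end-to-end training, the classification head is discarded and the inner detection model alone is retained as the classification detection model.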
In the device provided by the above embodiment of the present application, the video acquisition unit acquires a to-be-processed video; the image sequence extraction unit extracts an image sequence corresponding to the to-be-processed video; and the determination unit, in response to determining that the images in the image sequence do not include a person image, acquires audio corresponding to the to-be-processed video and determines classification information of the to-be-processed video according to the audio, wherein the classification information is used to indicate whether the to-be-processed video is related to the target group. In this way, when person-related information cannot be extracted from the images of the to-be-processed video, whether the to-be-processed video is related to the target group can still be determined according to the corresponding audio, which helps to increase the diversity and flexibility of processing modes for to-be-processed videos.
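The unit pipeline summarized above can be sketched as follows. The helper functions are hypothetical stand-ins for the video acquisition, image sequence extraction, and determination units; a real implementation would rely on media-processing and person-detection libraries rather than the dict-based placeholders used here.

```python
# Illustrative sketch of the device pipeline: extract frames, check for a
# person image, and fall back to audio-based classification if none is found.
def contains_person_image(image_sequence):
    # Placeholder person detector: each frame is a dict with a flag.
    return any(frame.get("has_person", False) for frame in image_sequence)

def classify_by_audio(audio, threshold=0.5):
    # Placeholder for the audio-based classification detection model.
    probability = audio.get("target_group_probability", 0.0)
    return "related" if probability > threshold else "unrelated"

def process_video(video):
    image_sequence = video["frames"]          # image sequence extraction
    if not contains_person_image(image_sequence):
        # No person image found: use the corresponding audio instead.
        return classify_by_audio(video["audio"])
    return "classified_by_images"             # image-based path (not shown)

video = {"frames": [{"has_person": False}],
         "audio": {"target_group_probability": 0.9}}
print(process_video(video))  # related
```

The audio path is only entered when no frame contains a person image, matching the "in response to determining" condition in the embodiment.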
Referring now to Fig. 6, a schematic structural diagram of a computer system 600 of an electronic device suitable for implementing the embodiments of the present application is shown. The electronic device shown in Fig. 6 is only an example and should not impose any restriction on the functions and scope of use of the embodiments of the present application.
As shown in Fig. 6, the computer system 600 includes a central processing unit (CPU) 601, which can execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage portion 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required by the operation of the system 600. The CPU 601, the ROM 602 and the RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, etc.; an output portion 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, etc.; a storage portion 608 including a hard disk, etc.; and a communication portion 609 including a network interface card such as a LAN card, a modem, etc. The communication portion 609 performs communication processing via a network such as the Internet. A driver 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the driver 610 as needed, so that a computer program read therefrom is installed into the storage portion 608 as needed.
In particular, according to the embodiments of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a computer-readable medium, the computer program including program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 609, and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above-mentioned functions defined in the method of the present application are executed.
It should be noted that the computer-readable medium of the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any appropriate combination of the above. In the present application, the computer-readable storage medium may be any tangible medium containing or storing a program, and the program may be used by, or used in combination with, an instruction execution system, apparatus, or device. In the present application, the computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take various forms, including, but not limited to, an electromagnetic signal, an optical signal, or any appropriate combination of the above. The computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium; such a computer-readable medium can send, propagate, or transmit a program for use by, or in combination with, an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted with any suitable medium, including but not limited to: wireless, wires, optical cables, RF, etc., or any appropriate combination of the above.
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of the systems, methods, and computer program products according to the various embodiments of the present application. In this regard, each box in a flowchart or block diagram may represent a module, a program segment, or a portion of code, the module, program segment, or portion of code comprising one or more executable instructions for implementing specified logic functions. It should also be noted that, in some alternative implementations, the functions denoted by the boxes may occur in a sequence different from that indicated in the drawings. For example, two boxes shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should further be noted that each box in the block diagrams and/or flowcharts, and any combination of boxes in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system executing the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units involved in the embodiments of the present application may be implemented by means of software or by means of hardware. The described units may also be provided in a processor, which may, for example, be described as: a processor comprising a video acquisition unit, an image sequence acquiring unit, and a determination unit. The names of these units do not, in some cases, constitute a limitation to the units themselves; for example, the video acquisition unit may also be described as "a unit for acquiring a to-be-processed video."
In another aspect, the present application further provides a computer-readable medium. The computer-readable medium may be included in the electronic device described in the above embodiments, or may exist alone without being assembled into the electronic device. The above computer-readable medium carries one or more programs. When the one or more programs are executed by the electronic device, the electronic device is caused to: acquire a to-be-processed video; extract an image sequence corresponding to the to-be-processed video; in response to determining that the images in the image sequence do not include a person image, acquire audio corresponding to the to-be-processed video; and determine classification information of the to-be-processed video according to the audio, wherein the classification information is used to indicate whether the to-be-processed video is related to a target group.
The above description is merely a preferred embodiment of the present application and an explanation of the applied technical principles. It should be appreciated by those skilled in the art that the inventive scope of the present application is not limited to technical solutions formed by the specific combination of the above-described technical features, and should also cover other technical solutions formed by any combination of the above-described technical features or their equivalent features without departing from the above inventive concept, for example, technical solutions formed by the mutual replacement of the above-described features with (but not limited to) technical features with similar functions disclosed in the present application.