CN110149531A - Method and apparatus for identifying video scenes in video data - Google Patents

Method and apparatus for identifying video scenes in video data

Info

Publication number
CN110149531A
CN110149531A
Authority
CN
China
Prior art keywords
video
scene
video scene
video data
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910522913.2A
Other languages
Chinese (zh)
Inventor
彭浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yingpu Technology Co Ltd
Original Assignee
Beijing Yingpu Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yingpu Technology Co Ltd
Priority to CN201910522913.2A
Publication of CN110149531A
Priority to PCT/CN2019/108434 (published as WO2020252975A1)
Legal status: Pending (current)

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23418Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present invention disclose a method and apparatus for identifying video scenes in video data. The method includes: determining, by a preset image recognition model, the types of video scenes contained in target video data, and obtaining the confidences of the video frames contained in each type of video scene; normalizing the confidences of the video frames contained in each category of video scene, respectively, to obtain the weight of each category of video scene in the target video data; and obtaining parameter information input by a user, determining a target video scene from among the video scenes according to the parameter information and the magnitudes of the weights, and returning the target video scene to a client. With this method, the target video scene corresponding to the target video data can be identified quickly by comparing the weights of the video scenes in the target video data, improving the recognition efficiency and accuracy for the target video scene.

Description

Method and apparatus for identifying video scenes in video data
Technical field
Embodiments of the present invention relate to the field of video data processing, and in particular to a method and apparatus for identifying video scenes in video data; they further relate to an electronic device and a storage device.
Background technique
With the rapid development of science and technology, video data of all kinds has become increasingly abundant, and a complete piece of video data often contains multiple video scenes. Identifying the video scenes in video data is a fairly common problem, yet accurately identifying those scenes is relatively difficult. How to improve the accuracy of video scene recognition and reduce the scene misclassification rate for a target video has therefore become a technical problem that those skilled in the art urgently need to solve.
To solve this problem, the video scene recognition method commonly used in the prior art extracts image features from the video frames contained in the video data to characterize it, and then uses a preset classifier to recognize and classify those image features, with different classes corresponding to different video scenes, thereby identifying the video scenes in the video data. However, this recognition method is easily affected by classifier performance, so its accuracy in classifying video scenes remains low and fails to meet the needs of current users.
Summary of the invention
To this end, embodiments of the present invention provide a method and apparatus for identifying video scenes in video data, to solve the problem in the prior art that target scenes are extracted inaccurately because current video scene segmentation techniques are not yet mature.
To achieve the above goal, embodiments of the present invention provide the following technical solutions:
According to an embodiment of the present invention, a method for identifying video scenes in video data includes: segmenting complete video data to be detected according to the differences between the video scenes it contains, to obtain target video data, where the target video data includes video frames of at least one video scene; determining, by a preset image recognition model, the types of video scenes contained in the target video data, and obtaining the confidences of the video frames contained in each type of video scene, where the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to the feature information of those video scenes; normalizing the confidences of the video frames contained in each category of video scene, respectively, to obtain the weight of each category of video scene in the target video data; and obtaining parameter information input by a user, obtaining a target video scene from among the video scenes according to the parameter information and the magnitudes of the weights, and returning the target video scene to a client.
Further, obtaining a target video scene from among the video scenes according to the parameter information and the magnitudes of the weights specifically includes: extracting a target parameter value from the parameter information; and, according to the target parameter value and the magnitudes of the weights, obtaining from among the video scenes a first video scene whose weight reaches or exceeds a preset weight threshold, and using the first video scene as the target video scene, where the first video scene includes at least one of the video scenes.
Further, the video scene recognition method also includes: extracting a target parameter value from the parameter information; and determining whether the target parameter value is greater than the number of video scene types contained in the target video data, and if so, returning an alarm prompt to the client.
Further, the confidence of a video frame refers to the probability that the video frame is a video frame corresponding to the video scene.
Further, segmenting the complete video data according to the differences between the video scenes contained in the complete video data to be detected, to obtain the target video data, specifically includes: obtaining the complete video data to be detected; obtaining the color features of the video scenes in the complete video data by a feature extraction algorithm; obtaining the color features of the video frames in the complete video data by the feature extraction algorithm; determining the switching positions between adjacent video scenes according to the first color feature differences between adjacent video scenes in the complete video data and the second color feature differences between adjacent video frames in the complete video data; and segmenting the complete video data according to the switching positions.
Correspondingly, the present application also provides an apparatus for identifying video scenes in video data, including: a segmentation processing unit, configured to segment complete video data to be detected according to the differences between the video scenes it contains, to obtain target video data, where the target video data includes video frames of at least one video scene; a video scene recognition unit, configured to determine, by a preset image recognition model, the types of video scenes contained in the target video data, and to obtain the confidences of the video frames contained in each type of video scene, where the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to the feature information of those video scenes; a video scene weight analysis unit, configured to normalize the confidences of the video frames contained in each category of video scene, respectively, to obtain the weight of each category of video scene in the target video data; and a target video data obtaining unit, configured to obtain parameter information input by a user, to obtain a target video scene from among the video scenes according to the parameter information and the magnitudes of the weights, and to return the target video scene to a client.
Further, the segmentation processing unit is specifically configured to: extract a target parameter value from the parameter information; and, according to the target parameter value and the magnitudes of the weights, obtain from among the video scenes a first video scene whose weight reaches or exceeds a preset weight threshold, and use the first video scene as the target video scene, where the first video scene includes at least one of the video scenes.
Further, the target video data obtaining unit is specifically configured to: obtain the complete video data to be detected; obtain the color features of the video scenes in the complete video data by a feature extraction algorithm; obtain the color features of the video frames in the complete video data by the feature extraction algorithm; determine the switching positions between adjacent video scenes according to the first color feature differences between adjacent video scenes in the complete video data and the second color feature differences between adjacent video frames in the complete video data; and segment the complete video data according to the switching positions.
Correspondingly, the present application also provides an electronic device, including a processor and a memory, where the memory is configured to store a program for the method of identifying video scenes in video data; after the device is powered on and the program for the method of identifying video scenes in video data is run by the processor, the following steps are executed:
segmenting complete video data to be detected according to the differences between the video scenes it contains, to obtain target video data, where the target video data includes video frames of at least one video scene; determining, by a preset image recognition model, the types of video scenes contained in the target video data, and obtaining the confidences of the video frames contained in each type of video scene, where the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to the feature information of those video scenes; normalizing the confidences of the video frames contained in each category of video scene, respectively, to obtain the weight of each category of video scene in the target video data; and obtaining parameter information input by a user, obtaining a target video scene from among the video scenes according to the parameter information and the magnitudes of the weights, and returning the target video scene to a client.
Correspondingly, the present application also provides a storage device storing a program for the method of identifying video scenes in video data; when the program is run by a processor, the following steps are executed:
segmenting complete video data to be detected according to the differences between the video scenes it contains, to obtain target video data, where the target video data includes video frames of at least one video scene; determining, by a preset image recognition model, the types of video scenes contained in the target video data, and obtaining the confidences of the video frames contained in each type of video scene, where the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to the feature information of those video scenes; normalizing the confidences of the video frames contained in each category of video scene, respectively, to obtain the weight of each category of video scene in the target video data; and obtaining parameter information input by a user, obtaining a target video scene from among the video scenes according to the parameter information and the magnitudes of the weights, and returning the target video scene to a client.
With the method for identifying video scenes in video data of the present invention, the confidences of the video frames contained in each type of video scene can be obtained by a preset image recognition model, and normalization then yields the weight of each video scene in the target video data; by comparing the magnitudes of the weights, the target video scene corresponding to the target video data is determined quickly, improving the acquisition efficiency and accuracy for the target video scene and thus improving the user experience.
Detailed description of the invention
To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Evidently, the drawings in the following description are merely exemplary; those of ordinary skill in the art can derive other implementation drawings from the provided drawings without creative effort.
The structures, proportions, sizes, and the like depicted in this specification are only intended to accompany the content disclosed in the specification, for the understanding and reading of those skilled in the art, and are not intended to limit the conditions under which the present invention can be implemented; they therefore carry no essential technical significance. Any structural modification, change of proportional relationship, or adjustment of size shall still fall within the scope that the technical content disclosed by the present invention can cover, provided it does not affect the effects the present invention can produce and the purposes it can achieve.
Fig. 1 is a flowchart of a method for identifying video scenes in video data according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of an apparatus for identifying video scenes in video data according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of an electronic device according to an embodiment of the present invention.
Specific embodiment
The embodiments of the present invention are illustrated below by specific examples, and those skilled in the art can easily understand other advantages and effects of the present invention from the content disclosed in this specification. Evidently, the described embodiments are only some embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
The method for identifying video scenes in video data of the present invention is described in detail below through an embodiment. As shown in Fig. 1, a flowchart of a method for identifying video scenes in video data according to an embodiment of the present invention, the specific implementation process includes the following steps.
Step S101: segment the complete video data to be detected according to the differences between the video scenes it contains, to obtain target video data; the target video data includes video frames of at least one of the video scenes.
In embodiments of the present invention, the complete video data may contain at least two video scenes. Correspondingly, segmenting the complete video data according to the differences between the video scenes it contains may yield at least two segments of video data (that is, target video data), each of which may contain several video frames. Because current video segmentation is immature, the segments obtained after segmentation may be contaminated with video frames that do not belong to the same video scene; the target video data therefore includes video frames of at least one video scene, and identifying it with an existing neural network at this point produces a high misclassification rate, so further processing is needed.
It should be noted that segmenting the complete video data according to the differences between the video scenes contained in the complete video data to be detected, and thereby obtaining the target video data, can specifically be accomplished as follows:
obtain the complete video data to be detected; obtain the color features of the video scenes in the complete video data by a feature extraction algorithm; obtain the color features of the video frames in the complete video data by the feature extraction algorithm; determine the switching positions between adjacent video scenes according to the first color feature differences between adjacent video scenes in the complete video data and the second color feature differences between adjacent video frames in the complete video data; and segment the complete video data according to the switching positions.
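For illustration only, and not as the patented implementation, the following Python sketch shows one way such color-feature-based switching-position detection could be realized; OpenCV (cv2), the HSV histogram feature, and the Bhattacharyya-distance threshold of 0.5 are assumptions of this sketch rather than details taken from the patent:

```python
import cv2

def color_hist(frame, bins=16):
    # HSV color histogram as a simple per-frame color feature
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, [bins, bins], [0, 180, 0, 256])
    return cv2.normalize(hist, hist)

def find_switch_positions(video_path, threshold=0.5):
    # Flag a scene switch wherever adjacent frames differ strongly in color
    cap = cv2.VideoCapture(video_path)
    positions, prev_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hist = color_hist(frame)
        if prev_hist is not None:
            # Bhattacharyya distance: 0 = identical color content, 1 = disjoint
            diff = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA)
            if diff > threshold:
                positions.append(idx)  # frame index where a new scene may start
        prev_hist, idx = hist, idx + 1
    cap.release()
    return positions
```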
Step S102: determine, by a preset image recognition model, the types of video scenes contained in the target video data, and obtain the confidences of the video frames contained in each type of video scene; the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to the feature information of those video scenes.
Segmenting the complete video data to be identified into target video data in step S101 prepares the data for analyzing the video scenes in this step. In step S102, by analyzing the video scenes with the preset image recognition model, the types of video scenes contained in the target video data can be determined, and the confidences of the video frames contained in each type of video scene can be obtained.
In an embodiment of the present invention, the confidence of a video frame refers to the probability that the video frame is a video frame corresponding to the video scene.
In actual implementation, because current video data segmentation is immature, the target video data obtained by segmenting the complete video data according to the differences between the video scenes contained in the complete video data to be detected includes video frames of at least one video scene. Therefore, recognizing and classifying the video scenes contained in the target video data with the preset image recognition model may yield at least one video scene category, together with the confidences of the video frames contained in each category of video scene.
It should be noted that the preset image recognition model of the present invention may refer to a neural-network-based video scene classifier. Such a classifier can be trained by collecting sample images of different types, extracting image features from them, and then training the classifier with the extracted image features and the sample image types; common video scene classifiers include support vector machines (Support Vector Machine, SVM), Bayesian classifiers, and logistic regression classifiers.
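As a minimal sketch of this step, assuming a hypothetical pretrained classifier object that exposes a scikit-learn-style predict_proba method over per-frame feature vectors (the interface and the label set are assumptions, not the patent's model), the per-frame scene types and confidences could be obtained as follows:

```python
import numpy as np

SCENE_LABELS = ["A", "B", "C"]  # assumed scene types for illustration

def classify_frames(frames, model):
    # model: any trained classifier (SVM, Bayes, logistic regression, or a
    # deep network wrapper) exposing a scikit-learn-style predict_proba
    # method over preprocessed frame feature vectors (an assumed interface)
    results = []
    for frame in frames:
        probs = model.predict_proba(frame.reshape(1, -1))[0]
        k = int(np.argmax(probs))
        results.append((SCENE_LABELS[k], float(probs[k])))  # (type, confidence)
    return results
```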
Step S103: normalize the confidences of the video frames contained in each category of video scene, respectively, to obtain the weight of each category of video scene in the target video data.
Obtaining the confidences of the video frames contained in each type of video scene in step S102 prepares the data for the normalization in this step. In step S103, the confidences of the video frames contained in each category of video scene can be normalized separately to obtain the weight of each category of video scene in the target video data.
As an example of a specific implementation, suppose that segmenting the complete video data according to the differences between the video scenes it contains yields target video data that may include video frames of three different video scenes A, B, and C. Recognizing and classifying the video scenes contained in the target video data with the preset image recognition model may then yield three different video scene types A, B, and C. Suppose that scene type A contains 5 video frames with confidences 0.6, 0.5, 0.4, 0.4, and 0.6; scene type B contains 3 video frames with confidences 0.6, 0.1, and 0.5; and scene type C contains 2 video frames with confidences 0.3 and 0.3.
The confidences of the video frames contained in each category of video scene are normalized separately using a weighted-average algorithm to obtain the weighted average of each video scene. Specifically, the confidences of the 5 frames of scene A (0.6, 0.5, 0.4, 0.4, and 0.6) are summed and divided by the frame count 5, giving a weighted average of 0.5. Similarly, the weighted average of scene type B is 0.4 and that of scene type C is 0.3. These weighted averages are the weights of the different categories of video scene in the target video data described in the present invention. That is, the weight of scene A in the target video data is 0.5, the weight of scene B is 0.4, and the weight of scene C is 0.3.
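The weighted-average normalization of this example can be reproduced with a short sketch; the list-of-(label, confidence)-tuples input format is an assumption carried over from the previous sketch:

```python
from collections import defaultdict

def scene_weights(frame_results):
    # Average the confidences of each scene's frames (the normalization step)
    conf = defaultdict(list)
    for label, c in frame_results:
        conf[label].append(c)
    return {label: sum(cs) / len(cs) for label, cs in conf.items()}

# The A/B/C example above:
frames = [("A", 0.6), ("A", 0.5), ("A", 0.4), ("A", 0.4), ("A", 0.6),
          ("B", 0.6), ("B", 0.1), ("B", 0.5),
          ("C", 0.3), ("C", 0.3)]
print(scene_weights(frames))  # {'A': 0.5, 'B': 0.4, 'C': 0.3}, up to float rounding
```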
In actual implementation, a video frame with too low a confidence may be the result of interference from other factors. Therefore, in a concrete procedure a confidence threshold can be set first, and only frames whose confidence reaches or exceeds that threshold enter the next calculation. For example, if the confidence threshold is set to 0.5, the weighted average of each video scene is computed from the frames meeting the threshold: scene A then considers only the frames with confidences 0.6, 0.5, and 0.6; scene B considers only the frames with confidences 0.6 and 0.5; and scene C has no frame above the threshold. The weighted average of scene A is (0.6+0.5+0.6)/3, that of scene B is (0.6+0.5)/2, and that of scene C is 0. To reduce computation, when a category contains many frames at or above the confidence threshold, the 3 to 6 top-scoring frames of each category can be selected for the calculation.
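A variant of the same sketch, assuming the confidence threshold of 0.5 and the cap of 6 top-scoring frames per category described above, might look as follows:

```python
from collections import defaultdict

def scene_weights_filtered(frame_results, conf_threshold=0.5, max_frames=6):
    # Keep only frames at or above the confidence threshold, and at most
    # the top-scoring max_frames per scene, before averaging
    kept = defaultdict(list)
    for label, c in frame_results:
        if c >= conf_threshold:
            kept[label].append(c)
    weights = {}
    for label, cs in kept.items():
        top = sorted(cs, reverse=True)[:max_frames]
        weights[label] = sum(top) / len(top)
    return weights  # scenes with no qualifying frames are omitted (weight 0)

frames = [("A", 0.6), ("A", 0.5), ("A", 0.4), ("A", 0.4), ("A", 0.6),
          ("B", 0.6), ("B", 0.1), ("B", 0.5),
          ("C", 0.3), ("C", 0.3)]
print(scene_weights_filtered(frames))  # {'A': 0.566..., 'B': 0.55}, C omitted
```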
Step S104: obtain parameter information input by a user, obtain a target video scene from among the video scenes according to the parameter information and the magnitudes of the weights, and return the target video scene to a client.
Obtaining the weights of the different categories of video scene in the target video data in step S103 prepares the data for obtaining the target video scene in this step. In step S104, the target video scene can be obtained from among the video scenes according to the parameter information input by the user and the magnitudes of the weights, and returned to the client.
In embodiments of the present invention, a target parameter value can be extracted from the parameter information input by the user; then, according to the target parameter value and the magnitudes of the weights, a first video scene whose weight reaches or exceeds a preset weight threshold can be obtained from among the video scenes and used as the target video scene, where the first video scene includes at least one of the video scenes.
Specifically, suppose the target parameter value N input by the user is 2. It is first determined whether the target parameter value N = 2 is greater than the number of video scene types contained in the target video data, which is 3. Since it is not, the video scenes are ordered by their weights in the target video data from largest to smallest: scene A > scene B > scene C. The preset weight threshold can then be set to 0.4, and the first video scenes whose weights reach or exceed 0.4 (namely, scene A and scene B) are obtained from among the video scenes; scene A and scene B are used as the target video scenes identified from the video data. When the target parameter value N input by the user is 1, the preset weight threshold can be set to 0.5, and the first video scene whose weight reaches or exceeds 0.5 (namely, scene A) is obtained; scene A is used as the target video scene identified from the video data.
In embodiments of the present invention it can also be determined whether the target parameter value is greater than the number of video scene types contained in the target video data, and if so, an alarm prompt is returned to the client. That is, when the target parameter value N input by the user is 5, N = 5 is greater than the number of video scene types contained in the target video data, which is 3, so an alarm prompt is returned to the client to remind the user.
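The selection logic of step S104, including the alarm for an oversized target parameter value N, can be sketched as follows; the dictionary-based return values and the concrete threshold numbers are assumptions for illustration:

```python
def select_target_scenes(weights, n, weight_threshold):
    # weights: {scene label: weight}; n: the user's target parameter value N
    if n > len(weights):
        return {"alarm": "N exceeds the number of scene types"}
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    # First video scene(s): every scene whose weight reaches the threshold
    targets = [label for label, w in ranked if w >= weight_threshold]
    return {"targets": targets}

weights = {"A": 0.5, "B": 0.4, "C": 0.3}
print(select_target_scenes(weights, 2, 0.4))  # {'targets': ['A', 'B']}
print(select_target_scenes(weights, 1, 0.5))  # {'targets': ['A']}
print(select_target_scenes(weights, 5, 0.4))  # {'alarm': ...}
```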
With the method for identifying video scenes in video data of the present invention, the confidences of the video frames contained in each type of video scene can be obtained by a preset image recognition model, and normalization then yields the weight of each video scene in the target video data; by comparing the magnitudes of the weights, the target video scene corresponding to the target video data is determined quickly, improving the acquisition efficiency and accuracy for the target video scene and thus improving the user experience.
Corresponding to the method for identifying video scenes in video data provided above, the present invention also provides an apparatus for identifying video scenes in video data. Since the embodiment of the apparatus is similar to the method embodiment above, its description is relatively brief; for related details, refer to the explanation of the method embodiment. The embodiment of the apparatus for identifying video scenes in video data described below is merely illustrative. Refer to Fig. 2, which is a schematic diagram of an apparatus for identifying video scenes in video data according to an embodiment of the present invention.
The apparatus for identifying video scenes in video data of the present invention includes the following parts:
A segmentation processing unit 201, configured to segment the complete video data to be detected according to the differences between the video scenes it contains, to obtain target video data; the target video data includes video frames of at least one of the video scenes.
In embodiments of the present invention, the complete video data may contain at least two video scenes. Correspondingly, segmenting the complete video data according to the differences between the video scenes it contains may yield at least two segments of video data (that is, target video data), each of which may contain several video frames. Because current video segmentation is immature, the segments obtained after segmentation may be contaminated with video frames that do not belong to the same video scene; the target video data therefore includes video frames of at least one video scene, and identifying it with an existing neural network at this point produces a high misclassification rate, so further processing is needed.
It should be noted that segmenting the complete video data according to the differences between the video scenes contained in the complete video data to be detected, and thereby obtaining the target video data, can specifically be accomplished as follows:
obtain the complete video data to be detected; obtain the color features of the video scenes in the complete video data by a feature extraction algorithm; obtain the color features of the video frames in the complete video data by the feature extraction algorithm; determine the switching positions between adjacent video scenes according to the first color feature differences between adjacent video scenes in the complete video data and the second color feature differences between adjacent video frames in the complete video data; and segment the complete video data according to the switching positions.
A video scene recognition unit 202, configured to determine, by a preset image recognition model, the types of video scenes contained in the target video data, and to obtain the confidences of the video frames contained in each type of video scene; the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to the feature information of those video scenes.
In an embodiment of the present invention, the confidence of a video frame refers to the probability that the video frame is a video frame corresponding to the video scene.
In actual implementation, because current video data segmentation is immature, the target video data obtained by segmenting the complete video data according to the differences between the video scenes contained in the complete video data to be detected includes video frames of at least one video scene. Therefore, recognizing and classifying the video scenes contained in the target video data with the preset image recognition model may yield at least one video scene category, together with the confidences of the video frames contained in each category of video scene.
It should be noted that the preset image recognition model of the present invention may refer to a neural-network-based video scene classifier. Such a classifier can be trained by collecting sample images of different types, extracting image features from them, and then training the classifier with the extracted image features and the sample image types; common video scene classifiers include support vector machines (Support Vector Machine, SVM), Bayesian classifiers, and logistic regression classifiers.
A video scene weight analysis unit 203, configured to normalize the confidences of the video frames contained in each category of video scene, respectively, to obtain the weight of each category of video scene in the target video data.
For example, in a specific implementation, suppose that segmenting the complete video data according to the differences between the video scenes it contains yields target video data that may include video frames of three different video scenes A, B, and C. Recognizing and classifying the video scenes contained in the target video data with the preset image recognition model may then yield three different video scene types A, B, and C. Suppose that scene type A contains 5 video frames with confidences 0.6, 0.5, 0.4, 0.4, and 0.6; scene type B contains 3 video frames with confidences 0.6, 0.1, and 0.5; and scene type C contains 2 video frames with confidences 0.3 and 0.3. The confidences of the video frames contained in each category of video scene are normalized separately using a weighted-average algorithm to obtain the weighted average of each video scene. Specifically, the confidences of the 5 frames of scene A (0.6, 0.5, 0.4, 0.4, and 0.6) are summed and divided by the frame count 5, giving a weighted average of 0.5. Similarly, the weighted average of scene type B is 0.4 and that of scene type C is 0.3. These weighted averages are the weights of the different categories of video scene in the target video data described in the present invention. That is, the weight of scene A in the target video data is 0.5, the weight of scene B is 0.4, and the weight of scene C is 0.3.
In actual implementation, a video frame with too low a confidence may be the result of interference from other factors. Therefore, in a concrete procedure a confidence threshold can be set first, and only frames whose confidence reaches or exceeds that threshold enter the next calculation. For example, if the confidence threshold is set to 0.5, the weighted average of each video scene is computed from the frames meeting the threshold: scene A then considers only the frames with confidences 0.6, 0.5, and 0.6; scene B considers only the frames with confidences 0.6 and 0.5; and scene C has no frame above the threshold. The weighted average of scene A is (0.6+0.5+0.6)/3, that of scene B is (0.6+0.5)/2, and that of scene C is 0. To reduce computation, when a category contains many frames at or above the confidence threshold, the 3 to 6 top-scoring frames of each category can be selected for the calculation.
A target video data obtaining unit 204, configured to obtain parameter information input by a user, to obtain a target video scene from among the video scenes according to the parameter information and the magnitudes of the weights, and to return the target video scene to a client.
In embodiments of the present invention, a target parameter value can be extracted from the parameter information input by the user; then, according to the target parameter value and the magnitudes of the weights, a first video scene whose weight reaches or exceeds a preset weight threshold can be obtained from among the video scenes and used as the target video scene, where the first video scene includes at least one of the video scenes.
Specifically, suppose the target parameter value N input by the user is 2. It is first determined whether the target parameter value N = 2 is greater than the number of video scene types contained in the target video data, which is 3. Since it is not, the video scenes are ordered by their weights in the target video data from largest to smallest: scene A > scene B > scene C. The preset weight threshold can then be set to 0.4, and the first video scenes whose weights reach or exceed 0.4 (namely, scene A and scene B) are obtained from among the video scenes; scene A and scene B are used as the target video scenes identified from the video data. When the target parameter value N input by the user is 1, the preset weight threshold can be set to 0.5, and the first video scene whose weight reaches or exceeds 0.5 (namely, scene A) is obtained; scene A is used as the target video scene identified from the video data.
In embodiments of the present invention it can also be determined whether the target parameter value is greater than the number of video scene types contained in the target video data, and if so, an alarm prompt is returned to the client. That is, when the target parameter value N input by the user is 5, N = 5 is greater than the number of video scene types contained in the target video data, which is 3, so an alarm prompt is returned to the client to remind the user.
With the apparatus for identifying video scenes in video data of the present invention, the confidences of the video frames contained in each type of video scene can be obtained by a preset image recognition model, and normalization then yields the weight of each video scene in the target video data; by comparing the magnitudes of the weights, the target video scene corresponding to the target video data is determined quickly, improving the acquisition efficiency and accuracy for the target video scene and thus improving the user experience.
Corresponding to the method for identifying video scenes in video data provided above, the present invention also provides an electronic device. Since the embodiment of the electronic device is similar to the method embodiment above, its description is relatively brief; for related details, refer to the explanation of the method embodiment. The electronic device described below is merely illustrative. Refer to Fig. 3, which is a schematic diagram of an electronic device according to an embodiment of the present invention.
The electronic device provided by the present invention specifically includes a processor 301 and a memory 302; the memory 302 is configured to store a program for the method of identifying video scenes in video data, and after the device is powered on and the program for the method of identifying video scenes in video data is run by the processor 301, the following steps are executed:
segmenting complete video data to be detected according to the differences between the video scenes it contains, to obtain target video data, where the target video data includes video frames of at least one video scene; determining, by a preset image recognition model, the types of video scenes contained in the target video data, and obtaining the confidences of the video frames contained in each type of video scene, where the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to the feature information of those video scenes; normalizing the confidences of the video frames contained in each category of video scene, respectively, to obtain the weight of each category of video scene in the target video data; and obtaining parameter information input by a user, obtaining a target video scene from among the video scenes according to the parameter information and the magnitudes of the weights, and returning the target video scene to a client.
Correspondingly, the present invention also provides a storage device storing a program for the method of identifying video scenes in video data; when the program is run by a processor, the following steps are executed:
segmenting complete video data to be detected according to the differences between the video scenes it contains, to obtain target video data, where the target video data includes video frames of at least one video scene; determining, by a preset image recognition model, the types of video scenes contained in the target video data, and obtaining the confidences of the video frames contained in each type of video scene, where the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to the feature information of those video scenes; normalizing the confidences of the video frames contained in each category of video scene, respectively, to obtain the weight of each category of video scene in the target video data; and obtaining parameter information input by a user, obtaining a target video scene from among the video scenes according to the parameter information and the magnitudes of the weights, and returning the target video scene to a client.
Although the present invention has been described in detail above with general explanations and specific embodiments, some modifications or improvements can be made on the basis of the present invention, as will be apparent to those skilled in the art. Therefore, such modifications or improvements made without departing from the spirit of the present invention all fall within the scope of protection claimed by the present invention.

Claims (10)

1. A method for identifying video scenes in video data, comprising:
segmenting complete video data to be detected according to differences between the video scenes contained in the complete video data, to obtain target video data, wherein the target video data comprises video frames of at least one video scene;
determining, by a preset image recognition model, the types of video scenes contained in the target video data, and obtaining the confidences of the video frames contained in each type of video scene, wherein the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to the feature information of the video scenes contained in the target video data;
normalizing the confidences of the video frames contained in each category of video scene, respectively, to obtain the weight of each category of video scene in the target video data; and
obtaining parameter information input by a user, obtaining a target video scene from among the video scenes according to the parameter information and the magnitudes of the weights, and returning the target video scene to a client.
2. The method for identifying video scenes in video data according to claim 1, wherein segmenting the complete video data to be detected according to the differences between the video scenes contained in the complete video data, to obtain the target video data, specifically comprises:
obtaining the complete video data to be detected;
obtaining the color features of the video scenes in the complete video data by a feature extraction algorithm;
obtaining the color features of the video frames in the complete video data by the feature extraction algorithm;
determining the switching positions between adjacent video scenes according to the first color feature differences between adjacent video scenes in the complete video data and the second color feature differences between adjacent video frames in the complete video data; and
segmenting the complete video data according to the switching positions.
3. The method for identifying video scenes in video data according to claim 1, wherein obtaining the target video scene from among the video scenes according to the parameter information and the magnitudes of the weights specifically comprises:
extracting a target parameter value from the parameter information; and
according to the target parameter value and the magnitudes of the weights, obtaining from among the video scenes a first video scene whose weight reaches or exceeds a preset weight threshold, and using the first video scene as the target video scene, wherein the first video scene comprises at least one of the video scenes.
4. The method for identifying video scenes in video data according to claim 1, further comprising:
extracting a target parameter value from the parameter information; and
determining whether the target parameter value is greater than the number of video scene types contained in the target video data, and if so, returning an alarm prompt to the client.
5. The method for identifying video scenes in video data according to claim 1, wherein the confidence of a video frame refers to the probability that the video frame is a video frame corresponding to the video scene.
6. An apparatus for identifying video scenes in video data, comprising:
a segmentation processing unit, configured to segment complete video data to be detected according to differences between the video scenes contained in the complete video data, to obtain target video data, wherein the target video data comprises video frames of at least one of the video scenes;
a video scene recognition unit, configured to determine, by a preset image recognition model, the types of video scenes contained in the target video data, and to obtain the confidences of the video frames contained in each type of video scene, wherein the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to the feature information of the video scenes contained in the target video data;
a video scene weight analysis unit, configured to normalize the confidences of the video frames contained in each category of video scene, respectively, to obtain the weight of each category of video scene in the target video data; and
a target video data obtaining unit, configured to obtain parameter information input by a user, to obtain a target video scene from among the video scenes according to the parameter information and the magnitudes of the weights, and to return the target video scene to a client.
7. The apparatus for identifying video scenes in video data according to claim 6, wherein the target video data obtaining unit is specifically configured to:
obtain the complete video data to be detected;
obtain the color features of the video scenes in the complete video data by a feature extraction algorithm;
obtain the color features of the video frames in the complete video data by the feature extraction algorithm;
determine the switching positions between adjacent video scenes according to the first color feature differences between adjacent video scenes in the complete video data and the second color feature differences between adjacent video frames in the complete video data; and
segment the complete video data according to the switching positions.
8. The apparatus for identifying video scenes in video data according to claim 6, wherein the segmentation processing unit is specifically configured to:
extract a target parameter value from the parameter information; and
according to the target parameter value and the magnitudes of the weights, obtain from among the video scenes a first video scene whose weight reaches or exceeds a preset weight threshold, and use the first video scene as the target video scene, wherein the first video scene comprises at least one of the video scenes.
9. An electronic device, comprising:
a processor; and
a memory, configured to store a program for the method of identifying video scenes in video data, wherein after the device is powered on and the program for the method of identifying video scenes in video data is run by the processor, the following steps are executed:
segmenting complete video data to be detected according to differences between the video scenes contained in the complete video data, to obtain target video data, wherein the target video data comprises video frames of at least one video scene;
determining, by a preset image recognition model, the types of video scenes contained in the target video data, and obtaining the confidences of the video frames contained in each type of video scene, wherein the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to the feature information of the video scenes contained in the target video data;
normalizing the confidences of the video frames contained in each category of video scene, respectively, to obtain the weight of each category of video scene in the target video data; and
obtaining parameter information input by a user, obtaining a target video scene from among the video scenes according to the parameter information and the magnitudes of the weights, and returning the target video scene to a client.
10. A storage device, storing a program for the method of identifying video scenes in video data, wherein when the program is run by a processor, the following steps are executed:
segmenting complete video data to be detected according to differences between the video scenes contained in the complete video data, to obtain target video data, wherein the target video data comprises video frames of at least one video scene;
determining, by a preset image recognition model, the types of video scenes contained in the target video data, and obtaining the confidences of the video frames contained in each type of video scene, wherein the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to the feature information of the video scenes contained in the target video data;
normalizing the confidences of the video frames contained in each category of video scene, respectively, to obtain the weight of each category of video scene in the target video data; and
obtaining parameter information input by a user, obtaining a target video scene from among the video scenes according to the parameter information and the magnitudes of the weights, and returning the target video scene to a client.
CN201910522913.2A 2019-06-17 2019-06-17 Method and apparatus for identifying video scenes in video data Pending CN110149531A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910522913.2A CN110149531A (en) Method and apparatus for identifying video scenes in video data
PCT/CN2019/108434 WO2020252975A1 (en) 2019-06-17 2019-09-27 Method and apparatus for recognizing video scene in video data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910522913.2A CN110149531A (en) 2019-06-17 2019-06-17 Method and apparatus for recognizing a video scene in video data

Publications (1)

Publication Number Publication Date
CN110149531A 2019-08-20

Family

ID=67591546

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910522913.2A Pending CN110149531A (en) 2019-06-17 2019-06-17 Method and apparatus for recognizing a video scene in video data

Country Status (2)

Country Link
CN (1) CN110149531A (en)
WO (1) WO2020252975A1 (en)


Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8316301B2 (en) * 2005-08-04 2012-11-20 Samsung Electronics Co., Ltd. Apparatus, medium, and method segmenting video sequences based on topic
CN102207966B * 2011-06-01 2013-07-10 华南理工大学 Fast video content retrieval method based on object tags
CN109213895A * 2017-07-05 2019-01-15 合网络技术(北京)有限公司 A method and device for generating a video summary
CN108848422B * 2018-04-19 2020-06-02 清华大学 Video summary generation method based on object detection
CN110149531A * 2019-06-17 2019-08-20 北京影谱科技股份有限公司 Method and apparatus for recognizing a video scene in video data

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9754351B2 (en) * 2015-11-05 2017-09-05 Facebook, Inc. Systems and methods for processing content using convolutional neural networks
CN108053420A * 2018-01-05 2018-05-18 昆明理工大学 A segmentation method for dynamic scenes based on class-independent attributes under limited spatio-temporal resolution
CN108537134A * 2018-03-16 2018-09-14 北京交通大学 A video semantic scene segmentation and annotation method
CN109145840A * 2018-08-29 2019-01-04 北京字节跳动网络技术有限公司 Video scene classification method, device, equipment and storage medium

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020252975A1 (en) * 2019-06-17 2020-12-24 北京影谱科技股份有限公司 Method and apparatus for recognizing video scene in video data
CN110933462A (en) * 2019-10-14 2020-03-27 咪咕文化科技有限公司 Video processing method, system, electronic device and storage medium
CN110933462B (en) * 2019-10-14 2022-03-25 咪咕文化科技有限公司 Video processing method, system, electronic device and storage medium
CN113177603A (en) * 2021-05-12 2021-07-27 中移智行网络科技有限公司 Training method of classification model, video classification method and related equipment
CN113177603B (en) * 2021-05-12 2022-05-06 中移智行网络科技有限公司 Training method of classification model, video classification method and related equipment
CN115334351A (en) * 2022-08-02 2022-11-11 Vidaa国际控股(荷兰)公司 Display device and adaptive image quality adjusting method
CN115334351B (en) * 2022-08-02 2023-10-31 Vidaa国际控股(荷兰)公司 Display equipment and self-adaptive image quality adjusting method

Also Published As

Publication number Publication date
WO2020252975A1 (en) 2020-12-24

Similar Documents

Publication Publication Date Title
CN110149531A (en) Method and apparatus for recognizing a video scene in video data
CN110175549B (en) Face image processing method, device, equipment and storage medium
US11176418B2 (en) Model test methods and apparatuses
CN110378235B (en) Blurred face image recognition method and device, and terminal equipment
CN110633745B (en) Image classification training method and device based on artificial intelligence and storage medium
CN104143079B (en) Method and system for face attribute recognition
KR101725651B1 (en) Identification apparatus and method for controlling identification apparatus
WO2021238455A1 (en) Data processing method and device, and computer-readable storage medium
US9613296B1 (en) Selecting a set of exemplar images for use in an automated image object recognition system
CN104463128B (en) Glasses detection method and system for face recognition
CN104992148A (en) Random forest based detection method for partially occluded face key points at ATM terminals
CN113111690B (en) Facial expression analysis method and system and satisfaction analysis method and system
CN112883902B (en) Video detection method and device, electronic equipment and storage medium
Gunasekar et al. Face detection on distorted images augmented by perceptual quality-aware features
CN107180056A (en) Method and device for matching segments in video
CN109635643A (en) A fast face recognition method based on deep learning
CN112182269B (en) Training of image classification model, image classification method, device, equipment and medium
CN113723157B (en) Crop disease identification method and device, electronic equipment and storage medium
CN103177266A (en) Intelligent stock pest identification system
CN110222582A (en) An image processing method and camera
CN110874835B (en) Crop leaf disease resistance identification method and system, electronic equipment and storage medium
CN111401343A (en) Method for identifying attributes of people in image and training method and device for identification model
CN107292218A (en) A facial expression recognition method and device
RU2612608C2 (en) Social circle formation system and method and computer data carrier
Egorov et al. Some cases of optimizing face detection methods on images (using the Viola-Jones method as an example)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190820