WO2020252975A1 - Method and apparatus for recognizing video scene in video data - Google Patents
Method and apparatus for recognizing video scene in video data
- Publication number
- WO2020252975A1 (PCT/CN2019/108434, CN2019108434W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- video
- video data
- scene
- target
- scenes
- Prior art date
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
- H04N21/23418—Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
- H04N21/44008—Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
Definitions
- the embodiments of the present application relate to the technical field of video data processing, and in particular to a method and apparatus for identifying video scenes in video data, as well as to a related electronic device and storage device.
- the video scene recognition method commonly used in the prior art extracts image features from the video frames contained in the video data to characterize the video data, and then identifies and classifies those image features with a preset classifier; different categories correspond to different video scenes, thereby realizing the recognition of video scenes in the video data.
- however, the above video scene recognition method is easily affected by the performance of the classifier, so the classification accuracy for video scenes remains low and cannot meet the needs of current users.
- the embodiments of the present application provide a method and device for recognizing video scenes in video data, to address the problem in the prior art that current video scene segmentation technology is not mature enough and target scene extraction is not sufficiently accurate.
- a method for identifying video scenes in video data includes: segmenting the complete video data to be detected according to the different video scenes it contains, to obtain target video data, where the target video data includes at least one video frame of a video scene; determining, through a preset image recognition model, the categories of the video scenes contained in the target video data and the confidence levels of the video frames contained in the video scenes of each category, where the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to their feature information; normalizing the confidence levels of the video frames contained in the video scenes of each category to obtain the weight values of the different categories of video scenes in the target video data; and obtaining parameter information input by the user, obtaining a target video scene from the video scenes according to the parameter information and the weight values, and returning the target video scene to the client.
- optionally, obtaining a target video scene from the video scenes according to the parameter information and the weight values specifically includes: extracting a target parameter value from the parameter information; and, according to the target parameter value and the weight values, obtaining from the video scenes a first video scene whose weight value reaches or exceeds a preset weight threshold, and taking the first video scene as the target video scene, where the first video scene includes at least one of the video scenes.
- optionally, the video scene recognition method further includes: extracting a target parameter value from the parameter information; and judging whether the target parameter value is greater than the number of video scene categories contained in the target video data, and if so, returning alarm prompt information to the client.
- the confidence level of a video frame refers to the probability that the video frame belongs to the corresponding video scene.
- optionally, segmenting the complete video data to be detected according to the different video scenes it contains to obtain target video data specifically includes: obtaining the complete video data to be detected; obtaining the color features of the video scenes in the complete video data through a feature extraction algorithm; obtaining the color features of the video frames in the complete video data through the same feature extraction algorithm; determining the switching positions between adjacent video scenes according to the first color feature difference between two adjacent video scenes and the second color feature difference between two adjacent video frames in the complete video data; and segmenting the complete video data according to the switching positions.
- correspondingly, the present application also provides a device for identifying video scenes in video data, including: a segmentation processing unit, configured to segment the complete video data to be detected according to the different video scenes it contains, to obtain target video data, where the target video data includes at least one video frame of a video scene; a video scene recognition unit, configured to determine, through a preset image recognition model, the categories of the video scenes contained in the target video data and to obtain the confidence levels of the video frames contained in the video scenes of each category, where the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to their feature information; a video scene weight analysis unit, configured to normalize the confidence levels of the video frames contained in the video scenes of each category, to obtain the weight values of the different categories of video scenes in the target video data; and a target video data obtaining unit, configured to obtain parameter information input by the user, obtain a target video scene from the video scenes according to the parameter information and the weight values, and return the target video scene to the client.
- optionally, the target video data obtaining unit is specifically configured to: extract a target parameter value from the parameter information; and, according to the target parameter value and the weight values, obtain from the video scenes a first video scene whose weight value reaches or exceeds a preset weight threshold, and use the first video scene as the target video scene, where the first video scene includes at least one of the video scenes.
- optionally, the segmentation processing unit is specifically configured to: obtain the complete video data to be detected; obtain the color features of the video scenes in the complete video data through a feature extraction algorithm; obtain the color features of the video frames in the complete video data through the same feature extraction algorithm; determine the switching positions between adjacent video scenes according to the first color feature difference between two adjacent video scenes and the second color feature difference between two adjacent video frames in the complete video data; and segment the complete video data according to the switching positions.
- correspondingly, the present application also provides an electronic device, including a processor and a memory, where the memory is used to store a program of the method for identifying video scenes in video data; after the device is powered on and the program is run by the processor, the following steps are performed:
- segmenting the complete video data to obtain target video data, where the target video data includes at least one video frame of a video scene;
- determining, through a preset image recognition model, the categories of the video scenes contained in the target video data and obtaining the confidence levels of the video frames contained in the video scenes of each category, where the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to their feature information;
- normalizing the confidence levels of the video frames contained in the video scenes of each category to obtain the weight values of the different categories of video scenes in the target video data; and
- obtaining parameter information input by the user, obtaining a target video scene from the video scenes according to the parameter information and the weight values, and returning the target video scene to the client.
- correspondingly, the present application also provides a storage device storing a program of the method for identifying video scenes in video data; when the program is run by a processor, the following steps are performed:
- segmenting the complete video data to obtain target video data, where the target video data includes at least one video frame of a video scene;
- determining, through a preset image recognition model, the categories of the video scenes contained in the target video data and obtaining the confidence levels of the video frames contained in the video scenes of each category, where the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to their feature information;
- normalizing the confidence levels of the video frames contained in the video scenes of each category to obtain the weight values of the different categories of video scenes in the target video data; and
- obtaining parameter information input by the user, obtaining a target video scene from the video scenes according to the parameter information and the weight values, and returning the target video scene to the client.
- in the embodiments of the present application, the confidence levels of the video frames contained in the different categories of video scenes can be obtained through a preset image recognition model and then normalized to obtain the weight values of those video scenes in the target video data; by comparing the weight values, the target video scene corresponding to the target video data can be determined quickly, which improves the efficiency and accuracy of target video scene collection and thereby improves the user experience.
- FIG. 1 is a flowchart of a method for identifying video scenes in video data provided by an embodiment of the application
- FIG. 2 is a schematic diagram of a device for identifying video scenes in video data provided by an embodiment of the application
- FIG. 3 is a schematic diagram of an electronic device provided by an embodiment of the application.
- as shown in FIG. 1, it is a flowchart of a method for identifying a video scene in video data provided by an embodiment of the present application.
- the specific implementation process includes the following steps:
- Step S101: Segment the complete video data to be detected according to the different video scenes it contains, to obtain target video data; the target video data includes at least one video frame of a video scene.
- the complete video data may include at least two video scenes. After the segmentation process, at least two pieces of video data (i.e., target video data) are obtained, and each piece can contain several video frames. Because current video data segmentation is not yet mature, the pieces of video data obtained after segmentation may still be mixed with some video frames belonging to the same video scene; therefore, the target video data includes at least one video frame of a video scene. If an existing neural network is used directly to identify such target video data, the misrecognition rate is relatively high, and further processing is required.
- segmenting the complete video data to be detected according to the different video scenes it contains to obtain the target video data can be implemented as follows: obtain the complete video data to be detected; obtain the color features of the video scenes in the complete video data through a feature extraction algorithm; obtain the color features of the video frames in the complete video data through the same feature extraction algorithm; determine the switching positions between adjacent video scenes according to the first color feature difference between two adjacent video scenes and the second color feature difference between two adjacent video frames in the complete video data; and segment the complete video data according to the switching positions.
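The segmentation steps above can be sketched as follows. This is an illustrative sketch only: the histogram color feature, the L1 distance, the 0.4 threshold, and all function names are assumptions for illustration, not details taken from the patent.

```python
def color_histogram(frame, bins=16):
    """Normalized color histogram over all pixels of a frame; each pixel is an
    (r, g, b) tuple with values in 0..255. A simple stand-in for the color
    features described above."""
    width = 256 // bins
    hist = [0.0] * (bins * 3)
    for r, g, b in frame:
        hist[r // width] += 1
        hist[bins + g // width] += 1
        hist[2 * bins + b // width] += 1
    total = sum(hist)
    return [h / total for h in hist]

def find_cut_positions(frames, threshold=0.4):
    """Mark a scene switch wherever the color-feature difference (L1 distance
    between histograms) of two adjacent frames exceeds the threshold."""
    feats = [color_histogram(f) for f in frames]
    cuts = []
    for i in range(1, len(feats)):
        diff = sum(abs(a - b) for a, b in zip(feats[i], feats[i - 1]))
        if diff > threshold:
            cuts.append(i)
    return cuts

def split_at_cuts(frames, cuts):
    """Segment the complete video data at the detected switch positions."""
    bounds = [0] + cuts + [len(frames)]
    return [frames[a:b] for a, b in zip(bounds[:-1], bounds[1:])]
```

For example, four dark frames followed by four bright frames would yield a single cut at position 4 and two segments of four frames each.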
- Step S102: Determine the categories of the video scenes contained in the target video data through a preset image recognition model, and obtain the confidence levels of the video frames contained in the video scenes of each category; the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to their feature information.
- after the complete video data to be identified is segmented in step S101 to obtain the target video data, the data is prepared for analyzing the video scenes in this step.
- in step S102, the video scenes are analyzed through the preset image recognition model: the categories of the video scenes contained in the target video data can be determined, and the confidence levels of the video frames contained in the video scenes of each category can be obtained.
- the confidence level of a video frame refers to the probability that the video frame belongs to the corresponding video scene. The higher the confidence, the greater the probability that the frame belongs to that video scene; conversely, the lower the confidence, the smaller that probability.
- because the complete video data to be detected (such as a video recording) is segmented according to the different video scenes it contains, the obtained target video data usually contains at least one video frame of a video scene; that is, the target video data obtained through segmentation may include video frames from multiple shots. The confidence level of each video frame is the probability that the frame belongs to the video scene of the corresponding category.
- the preset image recognition model described in this application may be a video scene classifier based on a neural network. The video scene classifier may be trained by collecting sample images of different categories, extracting image features from the sample images, and then training the classifier with the extracted image features and the sample image categories. Common video scene classifiers include Support Vector Machine (SVM) classifiers, Bayesian classifiers, and logistic regression classifiers, which are not specifically limited here.
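A minimal sketch of how such a classifier can yield per-frame confidences, assuming it outputs one raw score (logit) per scene category; the softmax step and all names here are illustrative assumptions rather than the patent's specified mechanism.

```python
import math

def softmax(logits):
    """Convert raw classifier scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def frame_confidences(per_frame_logits, categories):
    """For each frame, take the classifier's highest output probability as
    that frame's confidence, paired with the predicted scene category."""
    results = []
    for logits in per_frame_logits:
        probs = softmax(logits)
        idx = max(range(len(probs)), key=probs.__getitem__)
        results.append((categories[idx], probs[idx]))
    return results
```

A frame scored [2.0, 0.0, 0.0] over categories ["A", "B", "C"] would be assigned to category A with a confidence of roughly 0.79.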
- Step S103: Normalize the confidence levels of the video frames contained in the video scenes of each category, to obtain the weight values of the different categories of video scenes in the target video data.
- after the confidence levels of the video frames contained in the different categories of video scenes are obtained in step S102, the data is prepared for the normalization process in this step.
- in step S103, the confidence levels of the video frames contained in the video scenes of each category may be normalized separately, so as to obtain the weight values of the different categories of video scenes in the target video data.
- for example, suppose the complete video data is segmented according to the different video scenes it contains, and the obtained target video data includes video frames of three different video scenes: A, B, and C. After recognition, three video scene categories A, B, and C are obtained. Assume the category-A video scene contains 5 video frames whose confidences are, in turn, 0.6, 0.5, 0.4, 0.4, and 0.6; the category-B video scene contains 3 video frames whose confidences are 0.6, 0.1, and 0.5; and the category-C video scene contains 2 video frames whose confidences are 0.3 and 0.3.
- a weighted average algorithm is used to normalize the confidence levels of the video frames contained in the video scenes of each category, to obtain the weighted average of each video scene. Specifically, the confidences 0.6, 0.5, 0.4, 0.4, and 0.6 of the 5 category-A frames are added together and divided by the number of frames, 5, giving a weighted average of 0.5. Similarly, the weighted average of the category-B video scene is 0.4, and that of the category-C video scene is 0.3.
- the weighted average is the weight value of the corresponding category of video scene in the target video data described in this application. That is: the weight value of the category-A video scene in the target video data is 0.5, that of the category-B video scene is 0.4, and that of the category-C video scene is 0.3.
- considering that a video frame with too low a confidence may be the result of interference from other factors, a confidence threshold may also be applied. For example, with the confidence threshold set to 0.5, the weighted average of each video scene is calculated only from the video frames whose confidence reaches or exceeds the threshold: the category-A video scene then considers only the frames with confidences 0.6, 0.5, and 0.6; the category-B video scene considers only the frames with confidences 0.6 and 0.5; and the category-C video scene has no frames that reach the threshold.
- the weighted average of the category-A video scene is then (0.6+0.5+0.6)/3;
- the weighted average of the category-B video scene is (0.6+0.5)/2;
- the weighted average of the category-C video scene is 0.
- alternatively, the 3 to 6 video frames with the highest confidences can be selected from each category for the calculation.
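The normalization described above can be sketched as follows, reproducing the worked example; the function name and keyword arguments are illustrative assumptions, not from the patent.

```python
def scene_weights(scene_confidences, threshold=None, top_k=None):
    """Average the per-frame confidences of each scene category to get its
    weight in the target video data. Optionally ignore frames below a
    confidence threshold, or keep only the top_k highest-scoring frames."""
    weights = {}
    for category, confs in scene_confidences.items():
        kept = sorted(confs, reverse=True)
        if threshold is not None:
            kept = [c for c in kept if c >= threshold]
        if top_k is not None:
            kept = kept[:top_k]
        # A scene with no qualifying frames gets weight 0, as in the example.
        weights[category] = sum(kept) / len(kept) if kept else 0.0
    return weights
```

With the confidences from the example, `scene_weights` gives weights 0.5, 0.4, and 0.3 for A, B, and C; with `threshold=0.5` it gives (0.6+0.5+0.6)/3 for A, (0.6+0.5)/2 for B, and 0 for C.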
- Step S104: Obtain parameter information input by the user, obtain a target video scene from the video scenes according to the parameter information and the weight values, and return the target video scene to the client.
- after the weight values of the different categories of video scenes in the target video data are obtained in step S103, the data is prepared for obtaining the target video scene in this step.
- in this step, the target video scene may be obtained from the video scenes according to the parameter information input by the user and the weight values, and the target video scene is returned to the client.
- specifically, a target parameter value can be extracted from the parameter information input by the user; then, according to the target parameter value and the weight values, a first video scene whose weight value reaches or exceeds a preset weight threshold is obtained from the video scenes, and the first video scene is used as the target video scene; the first video scene includes at least one of the video scenes.
- for example, suppose the target parameter value N input by the user is 2, and the weight values in the target video data in descending order are: category-A video scene > category-B video scene > category-C video scene. The preset weight threshold may be set to 0.4, and the first video scene whose weight value reaches or exceeds 0.4 (i.e., the category-A and category-B video scenes) is obtained from the video scenes; the category-A and category-B video scenes are used as the target video scenes for identifying the video data. Alternatively, the preset weight threshold can be set to 0.5, and the first video scene whose weight value reaches or exceeds 0.5 (i.e., the category-A video scene) is obtained from the video scenes; the category-A video scene is used as the target video scene for identifying the video data.
- in addition, it is judged whether the target parameter value is greater than the number of video scene categories contained in the target video data; if so, alarm prompt information is returned to the client. That is, when the target parameter value N input by the user is 5, N (5) is greater than the number of video scene categories contained in the target video data (3), and at this time an alarm prompt is returned to the client to remind the user.
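A hedged sketch of the selection step above, combining the weight-threshold check with the alarm for an over-large target parameter value; the function name, the 0.4/0.5 thresholds, and the returned dictionary shape are illustrative assumptions.

```python
def pick_target_scenes(weights, target_n, weight_threshold):
    """Select the target video scenes whose weight reaches or exceeds the
    preset threshold; if the requested count exceeds the number of detected
    scene categories, return an alarm prompt instead."""
    if target_n > len(weights):
        return {"alert": "requested more scenes than the %d categories detected" % len(weights)}
    # Rank scenes by weight in descending order, then keep those that
    # reach or exceed the threshold, up to the requested count.
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    selected = [cat for cat, w in ranked if w >= weight_threshold]
    return {"target_scenes": selected[:target_n]}
```

With weights {A: 0.5, B: 0.4, C: 0.3}, a request for 2 scenes at threshold 0.4 returns A and B; a request for 5 scenes triggers the alarm, since only 3 categories exist.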
- in the embodiments of the present application, the confidence levels of the video frames contained in the different categories of video scenes can be obtained through a preset image recognition model and then normalized to obtain the weight values of those video scenes in the target video data; by comparing the weight values, the target video scene corresponding to the target video data can be determined quickly, which improves the efficiency and accuracy of target video scene collection and thereby improves the user experience.
- corresponding to the above method, this application also provides a device for identifying video scenes in video data. Since the device embodiment is similar to the above method embodiment, its description is relatively brief; for related details, please refer to the description of the method embodiment above.
- the embodiment of the device for identifying video scenes in video data described below is merely illustrative. Please refer to FIG. 2, which is a schematic diagram of a device for identifying video scenes in video data provided by an embodiment of the present application.
- the device for identifying video scenes in video data described in this application includes the following parts:
- the segmentation processing unit 201 is configured to segment the complete video data to be detected according to the different video scenes it contains, to obtain target video data; the target video data includes at least one video frame of a video scene.
- the complete video data may include at least two video scenes. After the segmentation process, at least two pieces of video data (i.e., target video data) are obtained, and each piece can contain several video frames. Because current video data segmentation is not yet mature, the pieces of video data obtained after segmentation may still be mixed with some video frames belonging to the same video scene; therefore, the target video data includes at least one video frame of a video scene. If an existing neural network is used directly to identify such target video data, the misrecognition rate is relatively high, and further processing is required.
- segmenting the complete video data to be detected according to the different video scenes it contains to obtain the target video data can be implemented as follows: obtain the complete video data to be detected; obtain the color features of the video scenes in the complete video data through a feature extraction algorithm; obtain the color features of the video frames in the complete video data through the same feature extraction algorithm; determine the switching positions between adjacent video scenes according to the first color feature difference between two adjacent video scenes and the second color feature difference between two adjacent video frames in the complete video data; and segment the complete video data according to the switching positions.
- the video scene recognition unit 202 is configured to determine the categories of the video scenes contained in the target video data through a preset image recognition model, and to obtain the confidence levels of the video frames contained in the video scenes of each category; the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to their feature information.
- the confidence level of a video frame refers to the probability that the video frame belongs to the corresponding video scene.
- because the complete video data to be detected is segmented according to the different video scenes it contains, the obtained target video data includes at least one video frame of a video scene. Therefore, by identifying and classifying the video scenes contained in the target video data through the preset image recognition model, at least one video scene category can be obtained, together with the confidence levels of the video frames contained in the video scenes of each category.
- the preset image recognition model described in this application may be a video scene classifier based on a neural network. The video scene classifier may be trained by collecting sample images of different categories, extracting image features from the sample images, and then training the classifier with the extracted image features and the sample image categories. Common video scene classifiers include Support Vector Machine (SVM) classifiers, Bayesian classifiers, and logistic regression classifiers.
- the video scene weight analysis unit 203 is configured to normalize the confidence levels of the video frames included in the video scenes of different categories to obtain the weight values of the video scenes of different categories in the target video data.
- for example, suppose the complete video data is segmented according to the different video scenes it contains, and the obtained target video data includes video frames of three different video scenes: A, B, and C. After recognition, three video scene categories A, B, and C are obtained. Assume the category-A video scene contains 5 video frames whose confidences are, in turn, 0.6, 0.5, 0.4, 0.4, and 0.6; the category-B video scene contains 3 video frames whose confidences are 0.6, 0.1, and 0.5; and the category-C video scene contains 2 video frames whose confidences are 0.3 and 0.3.
- a weighted average algorithm is used to normalize the confidence levels of the video frames contained in the video scenes of each category, to obtain the weighted average of each video scene. Specifically, the confidences 0.6, 0.5, 0.4, 0.4, and 0.6 of the 5 category-A frames are added together and divided by the number of frames, 5, giving a weighted average of 0.5. Similarly, the weighted average of the category-B video scene is 0.4, and that of the category-C video scene is 0.3.
- the weighted average is the weight value of the corresponding category of video scene in the target video data described in this application. That is: the weight value of the category-A video scene in the target video data is 0.5, that of the category-B video scene is 0.4, and that of the category-C video scene is 0.3.
- considering that a video frame with too low a confidence may be the result of interference from other factors, a confidence threshold may also be applied. For example, with the confidence threshold set to 0.5, the weighted average of each video scene is calculated only from the video frames whose confidence reaches or exceeds the threshold: the category-A video scene then considers only the frames with confidences 0.6, 0.5, and 0.6; the category-B video scene considers only the frames with confidences 0.6 and 0.5; and the category-C video scene has no frames that reach the threshold.
- the weighted average of the category-A video scene is then (0.6+0.5+0.6)/3;
- the weighted average of the category-B video scene is (0.6+0.5)/2;
- the weighted average of the category-C video scene is 0.
- alternatively, the 3 to 6 video frames with the highest confidences can be selected from each category for the calculation.
- the target video data obtaining unit 204 is configured to obtain parameter information input by the user, obtain a target video scene from the video scenes according to the parameter information and the weight values, and return the target video scene to the client.
- the target parameter value can be extracted from the parameter information input by the user.
- the weight value can be obtained from the video scene.
- the first video scene whose weight value reaches or exceeds the preset weight threshold is obtained from the video scenes, and the first video scene is used as the target video scene.
- the first video scene includes at least one of the video scenes.
- the target parameter value N input by the user is 2
- the weight values in the target video data in descending order are: category A video scene > category B video scene > category C video scene.
- the preset weight threshold may be set to 0.4, and the first video scenes whose weight values reach or exceed 0.4 (i.e., the category A video scene and the category B video scene) are obtained from the video scenes.
- the category A video scene and the category B video scene are used as the target video scene for identifying the video data.
- alternatively, the preset weight threshold can be set to 0.5, and the first video scene whose weight value reaches or exceeds 0.5 (i.e., the category A video scene) is obtained from the video scenes.
- the category A video scene is then used as the target video scene for identifying the video data.
- it is determined whether the target parameter value is greater than the number of categories of video scenes contained in the target video data; if so, alarm prompt information is returned to the client. That is, when the target parameter value N input by the user is 5, N (5) is greater than the number of video scene categories contained in the target video data (3), so an alarm message is returned to the client to remind the user.
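A minimal sketch of the selection and validation logic described above. The weight thresholds of 0.4 and 0.5 are the example values assumed here, and raising `ValueError` stands in for the alarm prompt returned to the client:

```python
def select_target_scenes(weights, n, weight_threshold):
    """Return up to n scene categories, ranked by weight, whose weight
    reaches or exceeds the threshold; reject an n larger than the
    number of available categories (the 'alarm prompt' case)."""
    if n > len(weights):
        raise ValueError(
            f"requested {n} scenes but only {len(weights)} categories exist"
        )
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return [cat for cat, w in ranked if w >= weight_threshold][:n]

weights = {"A": 0.5, "B": 0.4, "C": 0.3}
print(select_target_scenes(weights, 2, 0.4))  # ['A', 'B']
print(select_target_scenes(weights, 1, 0.5))  # ['A']
```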
- the confidence levels of the video frames contained in the different categories of video scenes can be obtained through a preset image recognition model and then normalized to obtain the weight value of each video scene in the target video data; comparing the sizes of these weight values makes it possible to quickly determine the target video scene corresponding to the target video data, which improves the collection efficiency and accuracy of the target video scene, thereby improving the user experience.
- the present application also provides an electronic device. Since the embodiment of the electronic device is similar to the foregoing method embodiment, the description is relatively simple. For related details, please refer to the description of the foregoing method embodiment section.
- the electronic device described below is only illustrative. Please refer to FIG. 3, which is a schematic diagram of an electronic device provided by an embodiment of this application.
- the present application provides an electronic device that specifically includes a processor 301 and a memory 302, wherein the memory 302 is used to store a program of the method for identifying video scenes in video data; after the device is powered on and the processor 301 runs the program for identifying video scenes in video data, the following steps are performed:
- the complete video data to be detected is segmented according to the different video scenes it contains to obtain target video data; wherein the target video data includes at least one video frame of the video scene;
- through a preset image recognition model, it is determined which categories of video scenes the target video data contains, and the confidence levels of the video frames contained in the different categories of video scenes are obtained; wherein the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to the feature information of the video scenes contained in the target video data; the confidence levels of the video frames contained in the different categories of video scenes are normalized respectively to obtain the weight values of the different categories of video scenes in the target video data; parameter information input by the user is obtained, a target video scene is obtained from the video scenes according to the parameter information and the size of the weight values, and the target video scene is returned to the client.
- this application also provides a storage device, including a program storing the method for identifying video scenes in video data; when the program is run by a processor, the following steps are performed:
- the complete video data to be detected is segmented according to the different video scenes it contains to obtain target video data; wherein the target video data includes at least one video frame of the video scene;
- through a preset image recognition model, it is determined which categories of video scenes the target video data contains, and the confidence levels of the video frames contained in the different categories of video scenes are obtained; wherein the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to the feature information of the video scenes contained in the target video data; the confidence levels of the video frames contained in the different categories of video scenes are normalized respectively to obtain the weight values of the different categories of video scenes in the target video data; parameter information input by the user is obtained, a target video scene is obtained from the video scenes according to the parameter information and the size of the weight values, and the target video scene is returned to the client.
- the processor or the processor module may be an integrated circuit chip with signal processing capability.
- the processor can be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
- the methods, steps, and logical block diagrams disclosed in the embodiments of the present application can be implemented or executed.
- the general-purpose processor may be a microprocessor, or it may be any conventional processor or the like.
- the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
- the software module can be located in a storage medium mature in the field, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers.
- the processor reads the information in the storage medium and completes the steps of the above method in combination with its hardware.
- the storage medium may be a memory, for example, may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.
- the non-volatile memory can be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or flash memory.
- the volatile memory may be a random access memory (Random Access Memory, RAM for short), which is used as an external cache.
- many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM), and direct rambus random access memory (DRRAM).
- the storage media described in the embodiments of the present application are intended to include, but are not limited to, these and any other suitable types of memory.
- the functions described in this application can be implemented by a combination of hardware and software.
- the corresponding function can be stored in a computer-readable medium or transmitted as one or more instructions or codes on the computer-readable medium.
- the computer-readable medium includes a computer storage medium and a communication medium, where the communication medium includes any medium that facilitates the transfer of a computer program from one place to another.
- the storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Image Analysis (AREA)
Abstract
Description
Claims (10)
- 1. A method for identifying video scenes in video data, characterized in that it comprises: segmenting complete video data to be detected according to the different video scenes contained in the complete video data to obtain target video data, wherein the target video data contains at least one video frame of the video scene; determining, through a preset image recognition model, the categories of video scenes contained in the target video data, and obtaining the confidence levels of the video frames respectively contained in the different categories of video scenes, wherein the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to the feature information of the video scenes contained in the target video data; normalizing the confidence levels of the video frames contained in the different categories of video scenes respectively to obtain the weight values of the different categories of video scenes in the target video data; and obtaining parameter information input by a user, obtaining a target video scene from the video scenes according to the parameter information and the size of the weight values, and returning the target video scene to a client.
- 2. The method for identifying video scenes in video data according to claim 1, characterized in that segmenting the complete video data according to the different video scenes contained in the complete video data to be detected to obtain the target video data specifically comprises: obtaining the complete video data to be detected; obtaining the color features of the video scenes in the complete video data through a feature extraction algorithm; obtaining the color features of the video frames in the complete video data through the feature extraction algorithm; determining the switching positions between adjacent video scenes according to the first color feature difference between two adjacent video scenes in the complete video data and the second color feature difference between two adjacent video frames in the complete video data; and segmenting the complete video data according to the switching positions.
- 3. The method for identifying video scenes in video data according to claim 1, characterized in that obtaining the target video scene from the video scenes according to the parameter information and the size of the weight values specifically comprises: extracting a target parameter value from the parameter information; and according to the target parameter value and the size of the weight values, obtaining from the video scenes a first video scene whose weight value reaches or exceeds a preset weight threshold, and using the first video scene as the target video scene, wherein the first video scene contains at least one of the video scenes.
- 4. The method for identifying video scenes in video data according to claim 1, characterized in that it further comprises: extracting a target parameter value from the parameter information; and determining whether the target parameter value is greater than the number of categories of video scenes contained in the target video data, and if so, returning alarm prompt information to the client.
- 5. The method for identifying video scenes in video data according to claim 1, characterized in that the confidence level of a video frame refers to the probability value that the video frame is a video frame corresponding to the video scene.
- 6. A device for identifying video scenes in video data, characterized in that it comprises: a segmentation processing unit, configured to segment complete video data to be detected according to the different video scenes contained in the complete video data to obtain target video data, wherein the target video data contains at least one video frame of the video scene; a video scene recognition unit, configured to determine, through a preset image recognition model, the categories of video scenes contained in the target video data and obtain the confidence levels of the video frames respectively contained in the different categories of video scenes, wherein the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to the feature information of the video scenes contained in the target video data; a video scene weight analysis unit, configured to normalize the confidence levels of the video frames contained in the different categories of video scenes respectively to obtain the weight values of the different categories of video scenes in the target video data; and a target video data obtaining unit, configured to obtain parameter information input by a user, obtain a target video scene from the video scenes according to the parameter information and the size of the weight values, and return the target video scene to a client.
- 7. The device for identifying video scenes in video data according to claim 6, characterized in that the segmentation processing unit is specifically configured to: obtain the complete video data to be detected; obtain the color features of the video scenes in the complete video data through a feature extraction algorithm; obtain the color features of the video frames in the complete video data through the feature extraction algorithm; determine the switching positions between adjacent video scenes according to the first color feature difference between two adjacent video scenes in the complete video data and the second color feature difference between two adjacent video frames in the complete video data; and segment the complete video data according to the switching positions.
- 8. The device for identifying video scenes in video data according to claim 6, characterized in that the target video data obtaining unit is specifically configured to: extract a target parameter value from the parameter information; and according to the target parameter value and the size of the weight values, obtain from the video scenes a first video scene whose weight value reaches or exceeds a preset weight threshold, and use the first video scene as the target video scene, wherein the first video scene contains at least one of the video scenes.
- 9. An electronic device, characterized in that it comprises: a processor; and a memory for storing a program of a method for identifying video scenes in video data, wherein after the device is powered on and the program of the method for identifying video scenes in video data is run by the processor, the following steps are performed: segmenting complete video data to be detected according to the different video scenes contained in the complete video data to obtain target video data, wherein the target video data contains at least one video frame of the video scene; determining, through a preset image recognition model, the categories of video scenes contained in the target video data, and obtaining the confidence levels of the video frames respectively contained in the different categories of video scenes, wherein the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to the feature information of the video scenes contained in the target video data; normalizing the confidence levels of the video frames contained in the different categories of video scenes respectively to obtain the weight values of the different categories of video scenes in the target video data; and obtaining parameter information input by a user, obtaining a target video scene from the video scenes according to the parameter information and the size of the weight values, and returning the target video scene to a client.
- 10. A storage device, characterized in that it stores a program of a method for identifying video scenes in video data, wherein when the program is run by a processor, the following steps are performed: segmenting complete video data to be detected according to the different video scenes contained in the complete video data to obtain target video data, wherein the target video data contains at least one video frame of the video scene; determining, through a preset image recognition model, the categories of video scenes contained in the target video data, and obtaining the confidence levels of the video frames respectively contained in the different categories of video scenes, wherein the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to the feature information of the video scenes contained in the target video data; normalizing the confidence levels of the video frames contained in the different categories of video scenes respectively to obtain the weight values of the different categories of video scenes in the target video data; and obtaining parameter information input by a user, obtaining a target video scene from the video scenes according to the parameter information and the size of the weight values, and returning the target video scene to a client.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910522913.2A CN110149531A (en) | 2019-06-17 | 2019-06-17 | The method and apparatus of video scene in a kind of identification video data |
CN201910522913.2 | 2019-06-17 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020252975A1 true WO2020252975A1 (en) | 2020-12-24 |
Family
ID=67591546
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/108434 WO2020252975A1 (en) | 2019-06-17 | 2019-09-27 | Method and apparatus for recognizing video scene in video data |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110149531A (en) |
WO (1) | WO2020252975A1 (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110149531A (en) * | 2019-06-17 | 2019-08-20 | 北京影谱科技股份有限公司 | The method and apparatus of video scene in a kind of identification video data |
CN110933462B (en) * | 2019-10-14 | 2022-03-25 | 咪咕文化科技有限公司 | Video processing method, system, electronic device and storage medium |
CN113177603B (en) * | 2021-05-12 | 2022-05-06 | 中移智行网络科技有限公司 | Training method of classification model, video classification method and related equipment |
CN115334351B (en) * | 2022-08-02 | 2023-10-31 | Vidaa国际控股(荷兰)公司 | Display equipment and self-adaptive image quality adjusting method |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070030391A1 (en) * | 2005-08-04 | 2007-02-08 | Samsung Electronics Co., Ltd. | Apparatus, medium, and method segmenting video sequences based on topic |
CN102207966A (en) * | 2011-06-01 | 2011-10-05 | 华南理工大学 | Video content quick retrieving method based on object tag |
CN108537134A (en) * | 2018-03-16 | 2018-09-14 | 北京交通大学 | A kind of video semanteme scene cut and mask method |
CN108848422A (en) * | 2018-04-19 | 2018-11-20 | 清华大学 | A kind of video abstraction generating method based on target detection |
CN109213895A (en) * | 2017-07-05 | 2019-01-15 | 合网络技术(北京)有限公司 | A kind of generation method and device of video frequency abstract |
CN110149531A (en) * | 2019-06-17 | 2019-08-20 | 北京影谱科技股份有限公司 | The method and apparatus of video scene in a kind of identification video data |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9754351B2 (en) * | 2015-11-05 | 2017-09-05 | Facebook, Inc. | Systems and methods for processing content using convolutional neural networks |
CN108053420B (en) * | 2018-01-05 | 2021-11-02 | 昆明理工大学 | Partition method based on finite space-time resolution class-independent attribute dynamic scene |
CN109145840B (en) * | 2018-08-29 | 2022-06-24 | 北京字节跳动网络技术有限公司 | Video scene classification method, device, equipment and storage medium |
-
2019
- 2019-06-17 CN CN201910522913.2A patent/CN110149531A/en active Pending
- 2019-09-27 WO PCT/CN2019/108434 patent/WO2020252975A1/en active Application Filing
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070030391A1 (en) * | 2005-08-04 | 2007-02-08 | Samsung Electronics Co., Ltd. | Apparatus, medium, and method segmenting video sequences based on topic |
CN102207966A (en) * | 2011-06-01 | 2011-10-05 | 华南理工大学 | Video content quick retrieving method based on object tag |
CN109213895A (en) * | 2017-07-05 | 2019-01-15 | 合网络技术(北京)有限公司 | A kind of generation method and device of video frequency abstract |
CN108537134A (en) * | 2018-03-16 | 2018-09-14 | 北京交通大学 | A kind of video semanteme scene cut and mask method |
CN108848422A (en) * | 2018-04-19 | 2018-11-20 | 清华大学 | A kind of video abstraction generating method based on target detection |
CN110149531A (en) * | 2019-06-17 | 2019-08-20 | 北京影谱科技股份有限公司 | The method and apparatus of video scene in a kind of identification video data |
Also Published As
Publication number | Publication date |
---|---|
CN110149531A (en) | 2019-08-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020252975A1 (en) | Method and apparatus for recognizing video scene in video data | |
CN110163114B (en) | Method and system for analyzing face angle and face blurriness and computer equipment | |
US10896349B2 (en) | Text detection method and apparatus, and storage medium | |
CN109697416B (en) | Video data processing method and related device | |
WO2020252917A1 (en) | Fuzzy face image recognition method and apparatus, terminal device, and medium | |
CN106557726B (en) | Face identity authentication system with silent type living body detection and method thereof | |
EP3882809A1 (en) | Face key point detection method, apparatus, computer device and storage medium | |
US20200410212A1 (en) | Fast side-face interference resistant face detection method | |
TWI497422B (en) | A system and method for recognizing license plate image | |
US8358837B2 (en) | Apparatus and methods for detecting adult videos | |
US8867828B2 (en) | Text region detection system and method | |
KR101615254B1 (en) | Detecting facial expressions in digital images | |
TW202004637A (en) | Risk prediction method and apparatus, storage medium, and server | |
US10867182B2 (en) | Object recognition method and object recognition system thereof | |
KR20210110823A (en) | Image recognition method, training method of recognition model, and related devices and devices | |
CN111326139B (en) | Language identification method, device, equipment and storage medium | |
CN111552837A (en) | Animal video tag automatic generation method based on deep learning, terminal and medium | |
WO2018192245A1 (en) | Automatic scoring method for photo based on aesthetic assessment | |
US9111130B2 (en) | Facilitating face detection with user input | |
CN111368632A (en) | Signature identification method and device | |
WO2022062028A1 (en) | Wine label recognition method, wine information management method and apparatus, device, and storage medium | |
CN111401343A (en) | Method for identifying attributes of people in image and training method and device for identification model | |
CN114639155A (en) | Emotion recognition method, emotion recognition device, storage medium and processor | |
CN113947209A (en) | Integrated learning method, system and storage medium based on cloud edge cooperation | |
JP4749879B2 (en) | Face discrimination method, apparatus, and program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19933925 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 19933925 Country of ref document: EP Kind code of ref document: A1 |
|
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 28.03.2022) |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 19933925 Country of ref document: EP Kind code of ref document: A1 |