WO2020252975A1 - Method and apparatus for recognizing video scene in video data - Google Patents

Method and apparatus for recognizing video scene in video data

Info

Publication number
WO2020252975A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
video data
scene
target
scenes
Application number
PCT/CN2019/108434
Other languages
French (fr)
Chinese (zh)
Inventor
彭浩
Original Assignee
北京影谱科技股份有限公司
Application filed by 北京影谱科技股份有限公司
Publication of WO2020252975A1 publication Critical patent/WO2020252975A1/en


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23: Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/234: Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N 21/23418: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N 21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N 21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N 21/44008: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream

Definitions

  • the embodiments of the present application relate to the technical field of video data processing, in particular to a method and apparatus for identifying video scenes in video data, and in addition, to an electronic device and a storage device.
  • In order to solve this problem, the video scene recognition method commonly used in the prior art extracts image features of the video frames contained in the video data to characterize the video data, and then identifies and classifies those image features with a preset classifier; different categories correspond to different video scenes, thereby recognizing the video scenes in the video data.
  • However, this method is easily affected by the performance of the classifier, so the classification accuracy of video scenes remains low and cannot meet the needs of current users.
  • The embodiments of the present application provide a method and device for recognizing video scenes in video data, to solve the problem in the prior art that target scene extraction is insufficiently accurate because current video scene segmentation technology is not mature enough.
  • A method for identifying video scenes in video data includes: segmenting the complete video data to be detected according to the different video scenes it contains, to obtain target video data, where the target video data includes at least one video frame of a video scene; determining, through a preset image recognition model, the types of video scenes contained in the target video data, and obtaining the confidence levels of the video frames contained in each type of video scene, where the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to their feature information; normalizing the confidence levels of the video frames contained in each category of video scene to obtain the weight values of the different categories of video scenes in the target video data; and obtaining parameter information input by a user, obtaining a target video scene from the video scenes according to the parameter information and the weight values, and returning the target video scene to the client.
  • Further, obtaining a target video scene from the video scenes according to the parameter information and the weight values specifically includes: extracting a target parameter value from the parameter information; and, according to the target parameter value and the weight values, obtaining from the video scenes a first video scene whose weight value reaches or exceeds a preset weight threshold and taking the first video scene as the target video scene, where the first video scene includes at least one of the video scenes.
  • Further, the video scene recognition method includes: extracting a target parameter value from the parameter information; and judging whether the target parameter value is greater than the number of types of video scenes contained in the target video data, and if so, returning alarm prompt information to the client.
  • the confidence of the video frame refers to the probability value that the video frame is a video frame corresponding to the video scene.
  • Further, segmenting the complete video data to be detected according to the different video scenes it contains, to obtain target video data, specifically includes: obtaining the complete video data to be detected; obtaining the color features of the video scenes in the complete video data through a feature extraction algorithm; obtaining the color features of the video frames in the complete video data through the same algorithm; determining the switching positions between adjacent video scenes according to the first color-feature difference between two adjacent video scenes in the complete video data and the second color-feature difference between two adjacent video frames in the complete video data; and segmenting the complete video data according to the switching positions.
  • Correspondingly, the present application also provides a device for identifying video scenes in video data, including: a segmentation processing unit, configured to segment the complete video data to be detected according to the different video scenes it contains, to obtain target video data, where the target video data includes at least one video frame of a video scene; a video scene recognition unit, configured to determine, through a preset image recognition model, the types of video scenes contained in the target video data and obtain the confidence levels of the video frames contained in each type of video scene, where the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to their feature information; a video scene weight analysis unit, configured to normalize the confidence levels of the video frames contained in each category of video scene to obtain the weight values of the different categories of video scenes in the target video data; and a target video data obtaining unit, configured to obtain parameter information input by the user, obtain the target video scene from the video scenes according to the parameter information and the weight values, and return the target video scene to the client.
  • Further, the target video data obtaining unit is specifically configured to: extract a target parameter value from the parameter information; and, according to the target parameter value and the weight values, obtain from the video scenes a first video scene whose weight value reaches or exceeds a preset weight threshold and take the first video scene as the target video scene, where the first video scene includes at least one of the video scenes.
  • Further, the segmentation processing unit is specifically configured to: obtain the complete video data to be detected; obtain the color features of the video scenes in the complete video data through a feature extraction algorithm; obtain the color features of the video frames through the same algorithm; determine the switching positions between adjacent video scenes according to the first color-feature difference between two adjacent video scenes and the second color-feature difference between two adjacent video frames in the complete video data; and segment the complete video data according to the switching positions.
  • Correspondingly, the present application also provides an electronic device, including a processor and a memory, where the memory is used to store a program of a method for identifying video scenes in video data; after the device is powered on and the program is run by the processor, the following steps are performed:
  • segmenting the complete video data to be detected according to the different video scenes it contains, to obtain target video data, where the target video data includes at least one video frame of a video scene; determining, through a preset image recognition model, the types of video scenes contained in the target video data, and obtaining the confidence levels of the video frames contained in each type of video scene, where the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to their feature information; normalizing the confidence levels of the video frames contained in each category of video scene to obtain the weight values of the different categories of video scenes in the target video data; and obtaining parameter information input by the user, obtaining the target video scene from the video scenes according to the parameter information and the weight values, and returning the target video scene to the client.
  • Correspondingly, the present application also provides a storage device storing a program of a method for identifying video scenes in video data; when the program is run by a processor, the following steps are performed:
  • segmenting the complete video data to be detected according to the different video scenes it contains, to obtain target video data, where the target video data includes at least one video frame of a video scene; determining, through a preset image recognition model, the types of video scenes contained in the target video data, and obtaining the confidence levels of the video frames contained in each type of video scene, where the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to their feature information; normalizing the confidence levels of the video frames contained in each category of video scene to obtain the weight values of the different categories of video scenes in the target video data; and obtaining parameter information input by the user, obtaining the target video scene from the video scenes according to the parameter information and the weight values, and returning the target video scene to the client.
  • By adopting the method for identifying video scenes in video data described in this application, the confidence levels of the video frames contained in different types of video scenes can be obtained through a preset image recognition model and then normalized to obtain the weight values of the video scenes in the target video data; by comparing these weight values, the target video scene corresponding to the target video data can be determined quickly, which improves the collection efficiency and accuracy of the target video scene, thereby improving the user experience.
  • FIG. 1 is a flowchart of a method for identifying video scenes in video data provided by an embodiment of the application
  • FIG. 2 is a schematic diagram of a device for identifying video scenes in video data provided by an embodiment of the application
  • FIG. 3 is a schematic diagram of an electronic device provided by an embodiment of the application.
  • FIG. 1 is a flowchart of a method for identifying a video scene in video data provided by an embodiment of the application. The specific implementation process includes the following steps:
  • Step S101: Segment the complete video data to be detected according to the different video scenes it contains, to obtain target video data, where the target video data includes at least one video frame of a video scene.
  • In the actual implementation process, the complete video data may include at least two video scenes, and segmentation yields at least two pieces of video data (i.e., target video data), each of which can contain several video frames. Because current video data segmentation is immature, two pieces of video data obtained after segmentation may be mixed with some video frames of the same video scene; therefore, the target video data includes at least one video frame of a video scene. When an existing neural network is used directly to identify such target video data, the misrecognition rate is relatively high, and further processing is required.
  • Segmenting the complete video data to be detected according to the different video scenes it contains, to obtain the target video data, can be implemented as follows: obtain the complete video data to be detected; obtain the color features of the video scenes in the complete video data through a feature extraction algorithm; obtain the color features of the video frames in the complete video data through the same algorithm; determine the switching positions between adjacent video scenes based on the first color-feature difference between two adjacent video scenes and the second color-feature difference between two adjacent video frames in the complete video data; and segment the complete video data according to the switching positions. A minimal sketch of this step is given below.
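  • The patent does not fix a concrete feature extraction algorithm for this segmentation step. The following minimal Python sketch illustrates one plausible reading, assuming OpenCV is available: it uses a per-frame HSV color histogram as the color feature and a fixed difference threshold to detect switching positions. The function names and the threshold value are illustrative assumptions, not taken from the patent.

```python
import cv2
import numpy as np

def color_feature(frame):
    """Color feature of one frame: a normalized HSV color histogram."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, [32, 32], [0, 180, 0, 256])
    return cv2.normalize(hist, hist).flatten()

def find_switch_positions(video_path, threshold=0.5):
    """Return frame indices where adjacent frames differ enough in color
    (the 'second color feature difference') to suggest a scene switch;
    the complete video data is then split at these positions."""
    cap = cv2.VideoCapture(video_path)
    positions, prev_feat, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        feat = color_feature(frame)
        if prev_feat is not None and np.linalg.norm(feat - prev_feat) > threshold:
            positions.append(idx)  # candidate switching position
        prev_feat, idx = feat, idx + 1
    cap.release()
    return positions
```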
  • Step S102: Determine the types of video scenes contained in the target video data through a preset image recognition model, and obtain the confidence levels of the video frames contained in each type of video scene, where the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to their feature information.
  • After the complete video data to be identified is segmented in step S101 to obtain the target video data, the data is ready for analyzing the video scenes in this step. In step S102, the video scenes are analyzed through the preset image recognition model: the types of video scenes contained in the target video data are determined, and the confidence levels of the video frames contained in each type of video scene are obtained.
  • The confidence level of a video frame refers to the probability that the video frame is a frame of the corresponding video scene. The higher the confidence, the greater the probability that the video frame belongs to that video scene; conversely, the lower the confidence, the smaller that probability. Because the complete video data to be detected (such as a video recording) is segmented according to the different video scenes it contains, the obtained target video data usually contains at least one video frame per video scene; that is, target video data obtained through segmentation may include video frames of multiple shots, and the confidence of each frame is the probability that it belongs to the video scene of the corresponding category.
  • In the actual implementation process, the preset image recognition model described in this application may be a video scene classifier based on a neural network. The classifier may be trained by collecting different types of sample images, extracting image features from them, and then training on the extracted features and the sample image types. Common video scene classifiers include support vector machines (SVM), Bayesian classifiers, logistic regression classifiers, and so on, which are not specifically limited here. A minimal sketch of this classification step is given below.
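  • As a concrete illustration of step S102, the sketch below runs an image classifier over the frames of one piece of target video data and records, for each frame, the predicted scene category and its confidence, here taken to be the softmax probability. It assumes PyTorch and a model with a torchvision-style interface; the patent does not specify an architecture, so the model, its class list, and the preprocessing are placeholders.

```python
import torch
import torch.nn.functional as F

def classify_frames(model, frames, class_names):
    """frames: preprocessed tensor of shape (N, 3, H, W).
    Returns one (scene_category, confidence) pair per frame."""
    model.eval()
    with torch.no_grad():
        logits = model(frames)            # (N, num_classes)
        probs = F.softmax(logits, dim=1)  # per-class confidences
        conf, pred = probs.max(dim=1)     # best category per frame
    return [(class_names[int(p)], float(c)) for p, c in zip(pred, conf)]
```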
  • Step S103: Normalize the confidence levels of the video frames contained in each category of video scene to obtain the weight values of the different categories of video scenes in the target video data.
  • After the confidence levels of the video frames contained in each type of video scene are obtained in step S102, the data is ready for the normalization in this step. In step S103, the confidence levels of the video frames contained in the video scenes of different categories are normalized respectively, so as to obtain the weight values of the different categories of video scenes in the target video data.
  • For example, suppose the complete video data is segmented according to the video scenes it contains and the resulting target video data includes video frames of three different video scenes A, B, and C. After classification, three scene categories A, B, and C are obtained. Assume the category-A video scene contains 5 video frames with confidences 0.6, 0.5, 0.4, 0.4 and 0.6; the category-B video scene contains 3 video frames with confidences 0.6, 0.1 and 0.5; and the category-C video scene contains 2 video frames with confidences 0.3 and 0.3.
  • A weighted-average calculation is then used to normalize the confidence levels of the video frames contained in each category of video scene. Specifically, the confidences 0.6, 0.5, 0.4, 0.4 and 0.6 of the 5 category-A frames are added together and divided by the number of frames, 5, giving a weighted average of 0.5. Similarly, the weighted average of the category-B video scene is 0.4, and that of the category-C video scene is 0.3. The weighted average is the weight value of each category of video scene in the target video data described in this application; that is, the weight value of the category-A video scene in the target video data is 0.5, that of the category-B video scene is 0.4, and that of the category-C video scene is 0.3.
  • In addition, considering that a video frame with too low a confidence may be the result of interference from other factors, a confidence threshold can be introduced, and the weighted average of each video scene is then calculated only over the video frames whose confidence reaches or exceeds the threshold. For example, with the confidence threshold set to 0.5, the category-A video scene only considers the frames with confidences 0.6, 0.5 and 0.6, the category-B video scene only considers the frames with confidences 0.6 and 0.5, and the category-C video scene has no frames that reach the threshold. Accordingly, the weighted average of the category-A video scene is (0.6+0.5+0.6)/3 ≈ 0.57, that of the category-B video scene is (0.6+0.5)/2 = 0.55, and that of the category-C video scene is 0.
  • Alternatively, the 3-6 highest-scoring video frames can be selected from each category for the calculation. A minimal sketch combining the confidence threshold and this top-k selection is given below.
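  • The following sketch combines the two refinements above: frames below a confidence threshold are discarded, at most the top-k remaining frames per category are kept, and the per-category average of the kept confidences is taken as the weight value. The function name and the defaults are illustrative, not taken from the patent.

```python
from collections import defaultdict

def scene_weights(frame_results, conf_threshold=0.5, top_k=6):
    """frame_results: iterable of (scene_category, confidence) pairs.
    Returns one weight value per scene category."""
    by_scene = defaultdict(list)
    for scene, conf in frame_results:
        by_scene[scene].append(conf)
    weights = {}
    for scene, confs in by_scene.items():
        # Keep only frames at or above the threshold, then the top-k.
        kept = sorted(c for c in confs if c >= conf_threshold)[-top_k:]
        weights[scene] = sum(kept) / len(kept) if kept else 0.0
    return weights

# The worked example above: A -> (0.6+0.5+0.6)/3, B -> (0.6+0.5)/2, C -> 0.
frames = [("A", 0.6), ("A", 0.5), ("A", 0.4), ("A", 0.4), ("A", 0.6),
          ("B", 0.6), ("B", 0.1), ("B", 0.5), ("C", 0.3), ("C", 0.3)]
print(scene_weights(frames))  # {'A': 0.566..., 'B': 0.55, 'C': 0.0}
```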
  • Step S104: Obtain parameter information input by the user, obtain a target video scene from the video scenes according to the parameter information and the weight values, and return the target video scene to the client.
  • After the weight values of the different categories of video scenes in the target video data are obtained in step S103, the data is ready for obtaining the target video scene in this step. In step S104, the target video scene can be obtained from the video scenes according to the parameter information input by the user and the weight values, and returned to the client. Specifically, a target parameter value can be extracted from the parameter information input by the user; then, according to the target parameter value and the weight values, a first video scene whose weight value reaches or exceeds a preset weight threshold is obtained from the video scenes and used as the target video scene, where the first video scene includes at least one of the video scenes.
  • For example, suppose the target parameter value N input by the user is 2, and the weight values in the target video data in descending order are: category-A video scene > category-B video scene > category-C video scene (0.5, 0.4 and 0.3, as in the basic example above). The preset weight threshold may be set to 0.4, so that the first video scenes whose weight values reach or exceed 0.4 (i.e., the category-A and category-B video scenes) are obtained from the video scenes, and the category-A and category-B video scenes are used as the target video scenes for identifying the video data. Alternatively, the preset weight threshold can be set to 0.5, so that only the first video scene whose weight value reaches or exceeds 0.5 (i.e., the category-A video scene) is obtained from the video scenes and used as the target video scene for identifying the video data.
  • Further, it is judged whether the target parameter value is greater than the number of types of video scenes contained in the target video data, and if so, alarm prompt information is returned to the client. That is, when the target parameter value N input by the user is 5, N exceeds the number of scene types contained in the target video data (3 in the example above), and an alarm message is returned to the client to remind the user. A minimal sketch of this selection step is given below.
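  • This sketch assumes the weight values from step S103. The patent does not spell out exactly how the user's parameter value N interacts with the preset weight threshold, so the sketch makes one plausible choice: it returns the thresholded scenes capped at the N highest-weighted categories, and returns an alarm when N exceeds the number of detected scene types.

```python
def select_target_scenes(weights, n, weight_threshold):
    """weights: dict mapping scene category to weight value."""
    if n > len(weights):
        # Target parameter value exceeds the number of scene types.
        return {"alarm": "N exceeds the number of detected scene types"}
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    # First video scenes: weight reaches or exceeds the threshold,
    # capped at the N highest-weighted categories.
    targets = [scene for scene, w in ranked[:n] if w >= weight_threshold]
    return {"target_scenes": targets}

print(select_target_scenes({"A": 0.5, "B": 0.4, "C": 0.3}, n=2,
                           weight_threshold=0.4))
# -> {'target_scenes': ['A', 'B']}
```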
  • By adopting the method for identifying video scenes in video data described in this application, the confidence levels of the video frames contained in different types of video scenes can be obtained through a preset image recognition model and then normalized to obtain the weight values of the video scenes in the target video data; by comparing these weight values, the target video scene corresponding to the target video data can be determined quickly, which improves the collection efficiency and accuracy of the target video scene, thereby improving the user experience.
  • Corresponding to the above method, this application also provides a device for identifying video scenes in video data. Since the device embodiment is similar to the method embodiment above, the description is relatively brief; for related details, refer to the description of the method embodiment. The device embodiment described below is only illustrative. Please refer to FIG. 2, which is a schematic diagram of a device for identifying video scenes in video data provided by an embodiment of the application.
  • the device for identifying video scenes in video data described in this application includes the following parts:
  • The segmentation processing unit 201 is configured to segment the complete video data to be detected according to the different video scenes it contains, to obtain target video data, where the target video data includes at least one video frame of a video scene.
  • In the actual implementation process, the complete video data may include at least two video scenes, and segmentation yields at least two pieces of video data (i.e., target video data), each of which can contain several video frames. Because current video data segmentation is immature, two pieces of video data obtained after segmentation may be mixed with some video frames of the same video scene; therefore, the target video data includes at least one video frame of a video scene. When an existing neural network is used directly to identify such target video data, the misrecognition rate is relatively high, and further processing is required.
  • Segmenting the complete video data to be detected according to the different video scenes it contains, to obtain the target video data, can be implemented as follows: obtain the complete video data to be detected; obtain the color features of the video scenes in the complete video data through a feature extraction algorithm; obtain the color features of the video frames in the complete video data through the same algorithm; determine the switching positions between adjacent video scenes based on the first color-feature difference between two adjacent video scenes and the second color-feature difference between two adjacent video frames in the complete video data; and segment the complete video data according to the switching positions (see the sketch given after step S101 above).
  • The video scene recognition unit 202 is configured to determine the types of video scenes contained in the target video data through a preset image recognition model and obtain the confidence levels of the video frames contained in each type of video scene, where the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to their feature information.
  • the confidence level of the video frame refers to the probability value that the video frame is a video frame corresponding to the video scene.
  • Because the complete video data to be detected is segmented according to the different video scenes it contains, the obtained target video data includes at least one video frame of each video scene. Therefore, by identifying and classifying the video scenes contained in the target video data through the preset image recognition model, at least one video scene category can be obtained, together with the confidence levels of the video frames contained in each category.
  • In the actual implementation process, the preset image recognition model described in this application may be a video scene classifier based on a neural network. The classifier may be trained by collecting different types of sample images, extracting image features from them, and then training on the extracted features and the sample image types. Common video scene classifiers include support vector machines (SVM), Bayesian classifiers, logistic regression classifiers, and so on (see the sketch given after step S102 above).
  • the video scene weight analysis unit 203 is configured to normalize the confidence levels of the video frames included in the video scenes of different categories to obtain the weight values of the video scenes of different categories in the target video data.
  • For example, suppose the complete video data is segmented according to the video scenes it contains and the obtained target video data includes video frames of three different video scenes A, B, and C. After classification, three scene categories A, B, and C are obtained. Assume the category-A video scene contains 5 video frames with confidences 0.6, 0.5, 0.4, 0.4 and 0.6; the category-B video scene contains 3 video frames with confidences 0.6, 0.1 and 0.5; and the category-C video scene contains 2 video frames with confidences 0.3 and 0.3.
  • A weighted-average calculation is then used to normalize the confidence levels of the video frames contained in each category of video scene. Specifically, the confidences 0.6, 0.5, 0.4, 0.4 and 0.6 of the 5 category-A frames are added together and divided by the number of frames, 5, giving a weighted average of 0.5. Similarly, the weighted average of the category-B video scene is 0.4, and that of the category-C video scene is 0.3. The weighted average is the weight value of each category of video scene in the target video data described in this application; that is, the weight value of the category-A video scene in the target video data is 0.5, that of the category-B video scene is 0.4, and that of the category-C video scene is 0.3.
  • In addition, considering that a video frame with too low a confidence may be the result of interference from other factors, a confidence threshold can be introduced, and the weighted average of each video scene is then calculated only over the video frames whose confidence reaches or exceeds the threshold. For example, with the confidence threshold set to 0.5, the category-A video scene only considers the frames with confidences 0.6, 0.5 and 0.6, the category-B video scene only considers the frames with confidences 0.6 and 0.5, and the category-C video scene has no frames that reach the threshold. Accordingly, the weighted average of the category-A video scene is (0.6+0.5+0.6)/3 ≈ 0.57, that of the category-B video scene is (0.6+0.5)/2 = 0.55, and that of the category-C video scene is 0.
  • Alternatively, the 3-6 highest-scoring video frames can be selected from each category for the calculation.
  • The target video data obtaining unit 204 is configured to obtain parameter information input by the user, obtain a target video scene from the video scenes according to the parameter information and the weight values, and return the target video scene to the client.
  • Specifically, a target parameter value can be extracted from the parameter information input by the user; then, according to the target parameter value and the weight values, a first video scene whose weight value reaches or exceeds a preset weight threshold is obtained from the video scenes and used as the target video scene, where the first video scene includes at least one of the video scenes.
  • For example, suppose the target parameter value N input by the user is 2, and the weight values in the target video data in descending order are: category-A video scene > category-B video scene > category-C video scene (0.5, 0.4 and 0.3, as in the basic example above). The preset weight threshold may be set to 0.4, so that the first video scenes whose weight values reach or exceed 0.4 (i.e., the category-A and category-B video scenes) are obtained from the video scenes, and the category-A and category-B video scenes are used as the target video scenes for identifying the video data. Alternatively, the preset weight threshold can be set to 0.5, so that only the first video scene whose weight value reaches or exceeds 0.5 (i.e., the category-A video scene) is obtained from the video scenes and used as the target video scene for identifying the video data.
  • Further, it is judged whether the target parameter value is greater than the number of types of video scenes contained in the target video data, and if so, alarm prompt information is returned to the client. That is, when the target parameter value N input by the user is 5, N exceeds the number of scene types contained in the target video data (3 in the example above), and an alarm message is returned to the client to remind the user.
  • By adopting the device for identifying video scenes in video data described in this application, the confidence levels of the video frames contained in different types of video scenes can be obtained through a preset image recognition model and then normalized to obtain the weight values of the video scenes in the target video data; by comparing these weight values, the target video scene corresponding to the target video data can be determined quickly, which improves the collection efficiency and accuracy of the target video scene, thereby improving the user experience.
  • Corresponding to the above method, the present application also provides an electronic device. Since the electronic device embodiment is similar to the foregoing method embodiment, the description is relatively brief; for related details, refer to the description of the method embodiment. The electronic device described below is only illustrative. Please refer to FIG. 3, which is a schematic diagram of an electronic device provided by an embodiment of this application.
  • The present application provides an electronic device that specifically includes a processor 301 and a memory 302, where the memory 302 is used to store a program of a method for identifying video scenes in video data; after the device is powered on and the program is run by the processor 301, the following steps are performed:
  • segmenting the complete video data to be detected according to the different video scenes it contains, to obtain target video data, where the target video data includes at least one video frame of a video scene; determining, through a preset image recognition model, the types of video scenes contained in the target video data, and obtaining the confidence levels of the video frames contained in each type of video scene, where the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to their feature information; normalizing the confidence levels of the video frames contained in each category of video scene to obtain the weight values of the different categories of video scenes in the target video data; and obtaining parameter information input by the user, obtaining the target video scene from the video scenes according to the parameter information and the weight values, and returning the target video scene to the client.
  • Correspondingly, this application also provides a storage device storing a program of a method for identifying a video scene in video data; when the program is run by a processor, the following steps are performed:
  • segmenting the complete video data to be detected according to the different video scenes it contains, to obtain target video data, where the target video data includes at least one video frame of a video scene; determining, through a preset image recognition model, the types of video scenes contained in the target video data, and obtaining the confidence levels of the video frames contained in each type of video scene, where the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to their feature information; normalizing the confidence levels of the video frames contained in each category of video scene to obtain the weight values of the different categories of video scenes in the target video data; and obtaining parameter information input by the user, obtaining the target video scene from the video scenes according to the parameter information and the weight values, and returning the target video scene to the client.
  • the processor or the processor module may be an integrated circuit chip with signal processing capability.
  • The processor can be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the methods, steps, and logical block diagrams disclosed in the embodiments of the present application can be implemented or executed.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
  • The software module can be located in a storage medium mature in the field, such as a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or register.
  • the processor reads the information in the storage medium and completes the steps of the above method in combination with its hardware.
  • the storage medium may be a memory, for example, may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.
  • The non-volatile memory can be a read-only memory (ROM), a programmable ROM (PROM), an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), or a flash memory.
  • The volatile memory may be a random access memory (RAM), which is used as an external cache.
  • By way of example but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DR RAM).
  • the storage media described in the embodiments of the present application are intended to include, but are not limited to, these and any other suitable types of memory.
  • the functions described in this application can be implemented by a combination of hardware and software.
  • the corresponding function can be stored in a computer-readable medium or transmitted as one or more instructions or codes on the computer-readable medium.
  • the computer-readable medium includes a computer storage medium and a communication medium, where the communication medium includes any medium that facilitates the transfer of a computer program from one place to another.
  • the storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.


Abstract

Disclosed are a method and apparatus for recognizing a video scene in video data. The method comprises: determining the types of video scenes comprised in target video data by means of a preset image recognition model, and obtaining confidence levels of video frames respectively comprised in different types of the video scenes; respectively performing normalization processing on the confidence levels of the video frames comprised in the different types of the video scenes to obtain weight values of the different types of the video scenes in the target video data; in addition, obtaining parameter information inputted by a user, determining a target video scene from the video scenes according to the parameter information and the weight values, and returning the target video scene to a client. By adopting the method according to the present application, the target video scene corresponding to the target video data can be quickly recognized by comparing the weight values of the video scenes in the target video data, so as to improve the recognition efficiency and accuracy of the target video scene.

Description

Method and apparatus for identifying a video scene in video data
This application claims the priority of the Chinese patent application filed with the Chinese Patent Office on June 17, 2019, with application number CN201910522913.2 and entitled "Method and apparatus for identifying a video scene in video data", the entire content of which is incorporated in this application by reference.
Technical field
The embodiments of the present application relate to the technical field of video data processing, in particular to a method and apparatus for identifying video scenes in video data, and further to an electronic device and a storage device.
Background
With the rapid development of science and technology, video materials of all kinds are becoming increasingly abundant, and a piece of complete video data often contains multiple video scenes. Recognizing the video scenes in video data is a fairly common problem, but recognizing them accurately is difficult. Therefore, how to improve the accuracy of video scene recognition and reduce the misrecognition rate for a target video has become an urgent technical problem for those skilled in the art.
In order to solve this technical problem, the video scene recognition method commonly used in the prior art extracts image features of the video frames contained in the video data to characterize the video data, and then identifies and classifies those image features with a preset classifier; different categories correspond to different video scenes, thereby recognizing the video scenes in the video data. However, this method is easily affected by the performance of the classifier, so the classification accuracy of video scenes remains low and cannot meet the needs of current users.
Summary of the invention
To this end, the embodiments of the present application provide a method and apparatus for identifying video scenes in video data, to solve the problem in the prior art that target scene extraction is insufficiently accurate because current video scene segmentation technology is not mature enough.
In order to achieve the foregoing objectives, the embodiments of the present application provide the following technical solutions:
According to an embodiment of the present application, a method for identifying video scenes in video data includes: segmenting the complete video data to be detected according to the different video scenes it contains, to obtain target video data, where the target video data includes at least one video frame of a video scene; determining, through a preset image recognition model, the types of video scenes contained in the target video data, and obtaining the confidence levels of the video frames contained in each type of video scene, where the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to their feature information; normalizing the confidence levels of the video frames contained in each category of video scene to obtain the weight values of the different categories of video scenes in the target video data; and obtaining parameter information input by a user, obtaining a target video scene from the video scenes according to the parameter information and the weight values, and returning the target video scene to the client.
Further, obtaining a target video scene from the video scenes according to the parameter information and the weight values specifically includes: extracting a target parameter value from the parameter information; and, according to the target parameter value and the weight values, obtaining from the video scenes a first video scene whose weight value reaches or exceeds a preset weight threshold and taking the first video scene as the target video scene, where the first video scene includes at least one of the video scenes.
Further, the video scene recognition method also includes: extracting a target parameter value from the parameter information; and judging whether the target parameter value is greater than the number of types of video scenes contained in the target video data, and if so, returning alarm prompt information to the client.
Further, the confidence of a video frame refers to the probability that the video frame is a video frame corresponding to the video scene.
Further, segmenting the complete video data to be detected according to the different video scenes it contains, to obtain target video data, specifically includes: obtaining the complete video data to be detected; obtaining the color features of the video scenes in the complete video data through a feature extraction algorithm; obtaining the color features of the video frames in the complete video data through the same algorithm; determining the switching positions between adjacent video scenes according to the first color-feature difference between two adjacent video scenes in the complete video data and the second color-feature difference between two adjacent video frames in the complete video data; and segmenting the complete video data according to the switching positions.
Correspondingly, the present application also provides an apparatus for identifying video scenes in video data, including: a segmentation processing unit, configured to segment the complete video data to be detected according to the different video scenes it contains, to obtain target video data, where the target video data includes at least one video frame of a video scene; a video scene recognition unit, configured to determine, through a preset image recognition model, the types of video scenes contained in the target video data and obtain the confidence levels of the video frames contained in each type of video scene, where the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to their feature information; a video scene weight analysis unit, configured to normalize the confidence levels of the video frames contained in each category of video scene to obtain the weight values of the different categories of video scenes in the target video data; and a target video data obtaining unit, configured to obtain parameter information input by the user, obtain the target video scene from the video scenes according to the parameter information and the weight values, and return the target video scene to the client.
Further, the target video data obtaining unit is specifically configured to: extract a target parameter value from the parameter information; and, according to the target parameter value and the weight values, obtain from the video scenes a first video scene whose weight value reaches or exceeds a preset weight threshold and take the first video scene as the target video scene, where the first video scene includes at least one of the video scenes.
Further, the segmentation processing unit is specifically configured to: obtain the complete video data to be detected; obtain the color features of the video scenes in the complete video data through a feature extraction algorithm; obtain the color features of the video frames in the complete video data through the same algorithm; determine the switching positions between adjacent video scenes according to the first color-feature difference between two adjacent video scenes in the complete video data and the second color-feature difference between two adjacent video frames in the complete video data; and segment the complete video data according to the switching positions.
相应的,本申请还提供一种电子设备,包括:处理器和存储器;其中,所述存储器用于存储识别视频数据中视频场景的方法的程序,该设备通电并通过所述处理器运行该识别视频数据中视频场景的方法的程序后,执行下述步骤:Correspondingly, the present application also provides an electronic device, including: a processor and a memory; wherein the memory is used to store a program of a method for identifying video scenes in video data, and the device is powered on and runs the identification through the processor. After the procedure of the method of the video scene in the video data, perform the following steps:
按照待检测的完整视频数据所包含视频场景的不同对所述完整视频数据进行分割处理,获得目标视频数据;其中,所述目标视频数据包含至少一个所述视频场景的视频帧;通过预设的图像识别模型,确定所述目标视频数据包含所述视频场景的种类,并获得不同种类的所述视频场景分别包含的视频帧的置信度;其中,所述图像识别模型为根据所述目标视频数据包含所述视频场景的特征信息对所述目标视频数据包含的所述视频场景进行分类的深度神经网络模型;将不同类别的所述视频场景所包含视频帧的置信度分别做归一化处理,获得不同类别的所述视频场景在所述目标视频数据中的权重值;获得用户输入的参数信息,根据所述参数信息和所述权重值的大小从所述视频场景中获得目标视频场景,并将所述目标视频场景返回至客户端。According to the different video scenes included in the complete video data to be detected, the complete video data is segmented to obtain target video data; wherein, the target video data includes at least one video frame of the video scene; The image recognition model determines that the target video data includes the type of the video scene, and obtains the confidence levels of the video frames contained in the different types of the video scenes; wherein the image recognition model is based on the target video data A deep neural network model that includes the feature information of the video scene to classify the video scenes included in the target video data; and normalizes the confidence levels of the video frames included in the video scenes of different categories, Obtain the weight values of the video scenes of different categories in the target video data; obtain the parameter information input by the user, obtain the target video scene from the video scene according to the parameter information and the size of the weight value, and Return the target video scene to the client.
相应的,本申请还提供一种存储设备,存储有识别视频数据中视频场景的方法的程序,该程序被处理器运行,执行下述步骤:Correspondingly, the present application also provides a storage device that stores a program of a method for identifying video scenes in video data, and the program is run by a processor to perform the following steps:
按照待检测的完整视频数据所包含视频场景的不同对所述完整视频数据进行分割处理,获得目标视频数据;其中,所述目标视频数据包含至少一个所述视频场景的视频帧;通过预设的图像识别模型,确定所述目标视频数据包含所述视频场景的种类,并获得不同种类的所述视频场景分别包含的视频帧的置信度;其中,所述图像识别模型为根据所述目标视频数据包含所述视频场景的特征信息对所述目标视频数据包含的所述视频场景进行分类的深度神经网络模型;将不同类别的所述视频场景所包含视频帧的置信度分别做归一化处理,获得不同类别的所述视频场景在所述目标视频数据中的权重值;获得用户输入的参数信息,根据所 述参数信息和所述权重值的大小从所述视频场景中获得目标视频场景,并将所述目标视频场景返回至客户端。According to the different video scenes included in the complete video data to be detected, the complete video data is segmented to obtain target video data; wherein, the target video data includes at least one video frame of the video scene; The image recognition model determines that the target video data includes the type of the video scene, and obtains the confidence levels of the video frames contained in the different types of the video scenes; wherein the image recognition model is based on the target video data A deep neural network model that includes the feature information of the video scene to classify the video scenes included in the target video data; and normalizes the confidence levels of the video frames included in the video scenes of different categories, Obtain the weight values of the video scenes of different categories in the target video data; obtain the parameter information input by the user, obtain the target video scene from the video scene according to the parameter information and the size of the weight value, and Return the target video scene to the client.
With the method for recognizing video scenes in video data described in the present application, the confidence levels of the video frames contained in the different types of video scenes can be obtained through a preset image recognition model and then normalized to obtain the weight values of the video scenes in the target video data. By comparing the magnitudes of these weight values, the target video scene corresponding to the target video data can be determined quickly, which improves the efficiency and accuracy of target video scene extraction and thereby improves the user experience.
Description of the Drawings
To describe the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings required for the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are merely exemplary; those of ordinary skill in the art may derive other implementation drawings from the provided drawings without creative effort.
The structures, proportions, sizes, and the like shown in this specification are only intended to accompany the content disclosed in the specification for the understanding of those familiar with this technology, and are not intended to limit the conditions under which the present application can be implemented; they therefore have no substantive technical significance. Any structural modification, change in proportional relationship, or adjustment of size that does not affect the effects and objectives achievable by the present application shall still fall within the scope of the technical content disclosed herein.
FIG. 1 is a flowchart of a method for recognizing video scenes in video data according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an apparatus for recognizing video scenes in video data according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following specific embodiments illustrate the implementation of the present application. Those familiar with this technology can easily understand other advantages and effects of the present application from the content disclosed in this specification. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present application.
Embodiments of the method for recognizing video scenes in video data described in the present application are described in detail below. As shown in FIG. 1, which is a flowchart of a method for recognizing video scenes in video data according to an embodiment of the present application, the specific implementation process includes the following steps:
Step S101: segment complete video data to be detected according to the different video scenes contained in the complete video data, to obtain target video data, where the target video data contains video frames of at least one of the video scenes.
In this embodiment of the present application, the complete video data may contain at least two video scenes. Correspondingly, segmenting the complete video data according to the different video scenes it contains may yield at least two video segments (i.e., the target video data), and each segment may contain a number of video frames. Because current video segmentation techniques are still immature, the segments obtained may be contaminated with video frames that do not belong to the same video scene; the target video data therefore contains video frames of at least one of the video scenes. Recognizing such target video data directly with an existing neural network yields a relatively high misrecognition rate, so further processing is required.
It should be noted that segmenting the complete video data to be detected according to the different video scenes it contains, to obtain the target video data, may specifically be implemented as follows:
obtain the complete video data to be detected; obtain the color features of the video scenes in the complete video data through a feature extraction algorithm; obtain the color features of the video frames in the complete video data through the same feature extraction algorithm; determine the switch positions between adjacent video scenes according to the first color feature difference between two adjacent video scenes in the complete video data and the second color feature difference between two adjacent video frames in the complete video data; and segment the complete video data according to the switch positions.
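The application does not specify the feature extraction algorithm or the decision rule for switch positions. The following is a minimal sketch of one plausible implementation, assuming HSV color histograms as the color features and a fixed chi-square distance threshold on adjacent frames as the switch test; the function names, bin counts, and threshold value are illustrative assumptions, not part of the disclosure.

```python
# Minimal sketch: detecting scene switch positions from color-feature
# differences between adjacent frames. HSV histograms, bin counts, and
# the threshold are illustrative assumptions.
import cv2

def frame_color_feature(frame_bgr, bins=(8, 8, 8)):
    """Normalized 3D HSV color histogram used as the frame's color feature."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, bins, [0, 180, 0, 256, 0, 256])
    return cv2.normalize(hist, hist).flatten()

def find_switch_positions(video_path, threshold=0.5):
    """Return frame indices where the color-feature difference between two
    adjacent frames exceeds the threshold (candidate scene switches)."""
    cap = cv2.VideoCapture(video_path)
    switches, prev_feat, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        feat = frame_color_feature(frame)
        if prev_feat is not None:
            # Chi-square distance as the per-frame color feature difference
            diff = cv2.compareHist(prev_feat, feat, cv2.HISTCMP_CHISQR)
            if diff > threshold:
                switches.append(idx)
        prev_feat, idx = feat, idx + 1
    cap.release()
    return switches
```

The returned indices can then be used to cut the complete video data into the target video segments.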
Step S102: determine, through a preset image recognition model, the types of video scenes contained in the target video data, and obtain the confidence levels of the video frames contained in each type of video scene, where the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to feature information of those video scenes.
Segmenting the complete video data to be recognized in step S101 to obtain the target video data prepares the data for the analysis of the video scenes in this step. In step S102, the video scenes are analyzed through the preset image recognition model to determine the types of video scenes contained in the target video data and to obtain the confidence levels of the video frames contained in each type of video scene.
In the embodiments of the present application, the confidence level of a video frame is the probability value that the video frame belongs to the video scene. A higher confidence level indicates a greater probability that the frame belongs to the video scene; conversely, a lower confidence level indicates a smaller probability that it does.
In practice, because current video segmentation is immature, segmenting complete video data to be detected (for example, a video recording) according to the different video scenes it contains usually yields target video data that contains video frames of at least one of the video scenes; that is, the target video data obtained through segmentation may include video frames from multiple shots.
Therefore, recognizing and classifying the video scenes contained in the target video data through the preset image recognition model may yield at least one video scene category, together with the confidence levels of the video frames contained in each category, i.e., the probability values that the video frames belong to the video scene of the corresponding category.
It should be noted that the preset image recognition model described in the present application may be a neural-network-based video scene classifier. Such a classifier may be trained by collecting sample images of different types, extracting image features from them, and then training the classifier on the extracted features and the sample image types. Common video scene classifiers include support vector machines (SVM), Bayesian classifiers, and logistic regression classifiers, which are not specifically limited here.
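As a minimal sketch of one of the classifier choices named above (an SVM), the following trains on extracted image features and returns per-class confidence values. The color-histogram feature extractor is an illustrative stand-in, not the feature defined by the application.

```python
# Sketch: training a video scene classifier (SVM) on extracted image
# features; probability=True enables per-class confidences via predict_proba.
import cv2
import numpy as np
from sklearn.svm import SVC

def extract_image_features(image_bgr, bins=(8, 8, 8)):
    """Illustrative feature: a normalized HSV color histogram."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, bins, [0, 180, 0, 256, 0, 256])
    return cv2.normalize(hist, hist).flatten()

def train_scene_classifier(sample_images, scene_labels):
    """Fit an SVM on (feature, label) pairs built from sample images."""
    features = np.array([extract_image_features(img) for img in sample_images])
    clf = SVC(probability=True)
    clf.fit(features, scene_labels)
    return clf

# Per-frame confidence for each scene category of a new frame:
# clf.predict_proba([extract_image_features(frame)])
```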
Step S103: normalize the confidence levels of the video frames contained in each category of video scene, to obtain the weight values of the different categories of video scenes in the target video data.
Obtaining the confidence levels of the video frames contained in the different types of video scenes in step S102 prepares the data for the normalization in this step. In step S103, the confidence levels of the video frames contained in each category of video scene may be normalized separately, to obtain the weight values of the different categories of video scenes in the target video data.
An example follows. In a specific implementation, the complete video data is segmented according to the video scenes it contains. Suppose the resulting target video data contains video frames of three different video scenes A, B, and C; recognizing and classifying the video scenes contained in the target video data through the preset image recognition model may then yield three video scene categories A, B, and C. Suppose the category-A video scene contains 5 video frames with confidence levels 0.6, 0.5, 0.4, 0.4, and 0.6; the category-B video scene contains 3 video frames with confidence levels 0.6, 0.1, and 0.5; and the category-C video scene contains 2 video frames with confidence levels 0.3 and 0.3.
A weighted average algorithm is used to normalize the confidence levels of the video frames contained in each category of video scene, giving the weighted average of each video scene. Specifically, the confidence levels 0.6, 0.5, 0.4, 0.4, and 0.6 of the 5 category-A frames are summed and divided by the number of frames, 5, giving a weighted average of 0.5. Likewise, the weighted average of the category-B video scene is 0.4 and that of the category-C video scene is 0.3. These weighted averages are the weight values of the different categories of video scenes in the target video data described in the present application; that is, the weight value of the category-A video scene in the target video data is 0.5, that of the category-B video scene is 0.4, and that of the category-C video scene is 0.3.
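A small sketch reproducing this normalization step, assuming the per-frame confidences are already grouped by scene category as in the example above:

```python
# Sketch: the weight of each scene category is the mean of its frames'
# confidence values, exactly as in the worked example.
def scene_weights(confidences_by_category):
    return {cat: sum(vals) / len(vals)
            for cat, vals in confidences_by_category.items()}

weights = scene_weights({
    "A": [0.6, 0.5, 0.4, 0.4, 0.6],  # -> 0.5
    "B": [0.6, 0.1, 0.5],            # -> 0.4
    "C": [0.3, 0.3],                 # -> 0.3
})
```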
In practice, a video frame with a very low confidence level may be the result of interference from other factors. Therefore, in a specific implementation a confidence threshold may first be set, and only the video frames that reach or exceed the threshold are used in the subsequent calculation. For example, with a confidence threshold of 0.5, the weighted average of each video scene is computed from the frames that reach or exceed the threshold: the category-A video scene considers only the frames with confidence levels 0.6, 0.5, and 0.6; the category-B video scene considers only the frames with confidence levels 0.6 and 0.5; and no frame in the category-C video scene exceeds the threshold. The weighted average of the category-A video scene is then (0.6+0.5+0.6)/3, that of the category-B video scene is (0.6+0.5)/2, and that of the category-C video scene is 0. To reduce the amount of computation when a category contains many frames that reach or exceed the threshold, the 3 to 6 highest-scoring frames of each category may be selected for the calculation.
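A sketch of this thresholded variant, assuming the same grouped confidences; the default threshold and top-k cap follow the values given in the paragraph above:

```python
# Sketch: discard frames below the confidence threshold before averaging,
# keeping at most `top_k` of the highest-scoring frames per category.
def scene_weights_thresholded(confidences_by_category, threshold=0.5, top_k=6):
    weights = {}
    for cat, vals in confidences_by_category.items():
        kept = sorted((v for v in vals if v >= threshold), reverse=True)[:top_k]
        weights[cat] = sum(kept) / len(kept) if kept else 0.0
    return weights

# With threshold 0.5: A -> (0.6+0.5+0.6)/3, B -> (0.6+0.5)/2, C -> 0.0
```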
Step S104: obtain parameter information input by the user, obtain the target video scene from the video scenes according to the parameter information and the magnitudes of the weight values, and return the target video scene to the client.
Obtaining the weight values of the different categories of video scenes in the target video data in step S103 prepares the data for obtaining the target video scene in this step. In step S104, the target video scene may be obtained from the video scenes according to the parameter information input by the user and the magnitudes of the weight values, and returned to the client.
In this embodiment of the present application, a target parameter value may be extracted from the parameter information input by the user. According to the target parameter value and the magnitudes of the weight values, first video scenes whose weight values reach or exceed a preset weight threshold may then be obtained from the video scenes and used as the target video scene, where the first video scenes include at least one of the video scenes.
Specifically, suppose the target parameter value N input by the user is 2. It is first determined whether N (2) is greater than the number of video scene types contained in the target video data (3). If not, the video scenes are ordered by their weight values in the target video data from largest to smallest: category-A video scene > category-B video scene > category-C video scene. The preset weight threshold may then be set to 0.4, and the first video scenes whose weight values reach or exceed 0.4 (namely, the category-A and category-B video scenes) are obtained and used as the target video scenes for recognizing the video data. When the target parameter value N input by the user is 1, the preset weight threshold may be set to 0.5, and the first video scene whose weight value reaches or exceeds 0.5 (namely, the category-A video scene) is obtained and used as the target video scene for recognizing the video data.
In the embodiments described in the present application, it may be determined whether the target parameter value is greater than the number of video scene types contained in the target video data, and if so, an alert message is returned to the client. That is, when the target parameter value N input by the user is 5, N (5) is greater than the number of video scene types contained in the target video data (3), and an alert message is returned to the client to remind the user.
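A sketch of this selection step follows. It encodes one plausible reading of the examples above, in which the weight threshold is chosen so that exactly the top-N scenes qualify; the return values and the alert message are illustrative assumptions.

```python
# Sketch of step S104: select target scenes by weight according to the
# user's parameter N, alerting when N exceeds the number of categories.
def select_target_scenes(weights, n):
    if n > len(weights):
        return {"error": "N exceeds the number of detected scene categories"}
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    # Threshold chosen as the N-th largest weight, so the top-N scenes qualify
    # (ties at the threshold would also be included).
    threshold = ranked[n - 1][1]
    return {cat: w for cat, w in ranked if w >= threshold}

# select_target_scenes({"A": 0.5, "B": 0.4, "C": 0.3}, 2) -> {"A": 0.5, "B": 0.4}
```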
With the method for recognizing video scenes in video data described in the present application, the confidence levels of the video frames contained in the different types of video scenes can be obtained through a preset image recognition model and then normalized to obtain the weight values of the video scenes in the target video data. By comparing the magnitudes of these weight values, the target video scene corresponding to the target video data can be determined quickly, which improves the efficiency and accuracy of target video scene extraction and thereby improves the user experience.
Corresponding to the method for recognizing video scenes in video data provided above, the present application further provides an apparatus for recognizing video scenes in video data. Since the apparatus embodiment is similar to the method embodiment above, it is described relatively simply; for relevant details, refer to the description of the method embodiment. The apparatus embodiment described below is merely illustrative. Refer to FIG. 2, which is a schematic diagram of an apparatus for recognizing video scenes in video data according to an embodiment of the present application.
The apparatus for recognizing video scenes in video data described in the present application includes the following parts:
a segmentation processing unit 201, configured to segment complete video data to be detected according to the different video scenes contained in the complete video data, to obtain target video data, where the target video data contains video frames of at least one of the video scenes.
In this embodiment of the present application, the complete video data may contain at least two video scenes. Correspondingly, segmenting the complete video data according to the different video scenes it contains may yield at least two video segments (i.e., the target video data), and each segment may contain a number of video frames. Because current video segmentation techniques are still immature, the segments obtained may be contaminated with video frames that do not belong to the same video scene; the target video data therefore contains video frames of at least one of the video scenes. Recognizing such target video data directly with an existing neural network yields a relatively high misrecognition rate, so further processing is required.
It should be noted that segmenting the complete video data to be detected according to the different video scenes it contains, to obtain the target video data, may specifically be implemented as follows:
obtain the complete video data to be detected; obtain the color features of the video scenes in the complete video data through a feature extraction algorithm; obtain the color features of the video frames in the complete video data through the same feature extraction algorithm; determine the switch positions between adjacent video scenes according to the first color feature difference between two adjacent video scenes in the complete video data and the second color feature difference between two adjacent video frames in the complete video data; and segment the complete video data according to the switch positions.
a video scene recognition unit 202, configured to determine, through a preset image recognition model, the types of video scenes contained in the target video data, and to obtain the confidence levels of the video frames contained in each type of video scene, where the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to feature information of those video scenes.
In the embodiments of the present application, the confidence level of a video frame is the probability value that the video frame is a video frame corresponding to the video scene.
In practice, because current video segmentation is immature, segmenting the complete video data to be detected according to the different video scenes it contains yields target video data that contains video frames of at least one of the video scenes. Therefore, recognizing and classifying the video scenes contained in the target video data through the preset image recognition model may yield at least one video scene category, together with the confidence levels of the video frames contained in each category.
It should be noted that the preset image recognition model described in the present application may be a neural-network-based video scene classifier. Such a classifier may be trained by collecting sample images of different types, extracting image features from them, and then training the classifier on the extracted features and the sample image types. Common video scene classifiers include support vector machines (SVM), Bayesian classifiers, and logistic regression classifiers.
a video scene weight analysis unit 203, configured to normalize the confidence levels of the video frames contained in each category of video scene, to obtain the weight values of the different categories of video scenes in the target video data.
For example, in a specific implementation, the complete video data is segmented according to the video scenes it contains. Suppose the resulting target video data contains video frames of three different video scenes A, B, and C; recognizing and classifying the video scenes contained in the target video data through the preset image recognition model may then yield three video scene categories A, B, and C. Suppose the category-A video scene contains 5 video frames with confidence levels 0.6, 0.5, 0.4, 0.4, and 0.6; the category-B video scene contains 3 video frames with confidence levels 0.6, 0.1, and 0.5; and the category-C video scene contains 2 video frames with confidence levels 0.3 and 0.3. A weighted average algorithm is used to normalize the confidence levels of the video frames contained in each category of video scene, giving the weighted average of each video scene. Specifically, the confidence levels 0.6, 0.5, 0.4, 0.4, and 0.6 of the 5 category-A frames are summed and divided by the number of frames, 5, giving a weighted average of 0.5. Likewise, the weighted average of the category-B video scene is 0.4 and that of the category-C video scene is 0.3. These weighted averages are the weight values of the different categories of video scenes in the target video data described in the present application; that is, the weight value of the category-A video scene in the target video data is 0.5, that of the category-B video scene is 0.4, and that of the category-C video scene is 0.3.
In practice, a video frame with a very low confidence level may be the result of interference from other factors. Therefore, in a specific implementation a confidence threshold may first be set, and only the video frames that reach or exceed the threshold are used in the subsequent calculation. For example, with a confidence threshold of 0.5, the weighted average of each video scene is computed from the frames that reach or exceed the threshold: the category-A video scene considers only the frames with confidence levels 0.6, 0.5, and 0.6; the category-B video scene considers only the frames with confidence levels 0.6 and 0.5; and no frame in the category-C video scene exceeds the threshold. The weighted average of the category-A video scene is then (0.6+0.5+0.6)/3, that of the category-B video scene is (0.6+0.5)/2, and that of the category-C video scene is 0. To reduce the amount of computation when a category contains many frames that reach or exceed the threshold, the 3 to 6 highest-scoring frames of each category may be selected for the calculation.
a target video data obtaining unit 204, configured to obtain parameter information input by the user, obtain the target video scene from the video scenes according to the parameter information and the magnitudes of the weight values, and return the target video scene to the client.
In this embodiment of the present application, a target parameter value may be extracted from the parameter information input by the user. According to the target parameter value and the magnitudes of the weight values, first video scenes whose weight values reach or exceed a preset weight threshold may then be obtained from the video scenes and used as the target video scene, where the first video scenes include at least one of the video scenes.
Specifically, suppose the target parameter value N input by the user is 2. It is first determined whether N (2) is greater than the number of video scene types contained in the target video data (3). If not, the video scenes are ordered by their weight values in the target video data from largest to smallest: category-A video scene > category-B video scene > category-C video scene. The preset weight threshold may then be set to 0.4, and the first video scenes whose weight values reach or exceed 0.4 (namely, the category-A and category-B video scenes) are obtained and used as the target video scenes for recognizing the video data. When the target parameter value N input by the user is 1, the preset weight threshold may be set to 0.5, and the first video scene whose weight value reaches or exceeds 0.5 (namely, the category-A video scene) is obtained and used as the target video scene for recognizing the video data.
In the embodiments described in the present application, it may be determined whether the target parameter value is greater than the number of video scene types contained in the target video data, and if so, an alert message is returned to the client. That is, when the target parameter value N input by the user is 5, N (5) is greater than the number of video scene types contained in the target video data (3), and an alert message is returned to the client to remind the user.
With the apparatus for recognizing video scenes in video data described in the present application, the confidence levels of the video frames contained in the different types of video scenes can be obtained through a preset image recognition model and then normalized to obtain the weight values of the video scenes in the target video data. By comparing the magnitudes of these weight values, the target video scene corresponding to the target video data can be determined quickly, which improves the efficiency and accuracy of target video scene extraction and thereby improves the user experience.
Corresponding to the method for recognizing video scenes in video data provided above, the present application further provides an electronic device. Since the electronic device embodiment is similar to the method embodiment above, it is described relatively simply; for relevant details, refer to the description of the method embodiment. The electronic device described below is merely illustrative. Refer to FIG. 3, which is a schematic diagram of an electronic device according to an embodiment of the present application.
The electronic device provided by the present application specifically includes a processor 301 and a memory 302, where the memory 302 is configured to store a program of the method for recognizing video scenes in video data. After the device is powered on and the processor 301 runs the program, the following steps are performed:
segmenting complete video data to be detected according to the different video scenes contained in the complete video data, to obtain target video data, where the target video data contains video frames of at least one of the video scenes; determining, through a preset image recognition model, the types of video scenes contained in the target video data, and obtaining confidence levels of the video frames contained in each type of video scene, where the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to feature information of those video scenes; normalizing the confidence levels of the video frames contained in each category of video scene, to obtain weight values of the different categories of video scenes in the target video data; and obtaining parameter information input by a user, obtaining a target video scene from the video scenes according to the parameter information and the magnitudes of the weight values, and returning the target video scene to a client.
Correspondingly, the present application further provides a storage device storing a program of the method for recognizing video scenes in video data. When run by a processor, the program performs the following steps:
segmenting complete video data to be detected according to the different video scenes contained in the complete video data, to obtain target video data, where the target video data contains video frames of at least one of the video scenes; determining, through a preset image recognition model, the types of video scenes contained in the target video data, and obtaining confidence levels of the video frames contained in each type of video scene, where the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to feature information of those video scenes; normalizing the confidence levels of the video frames contained in each category of video scene, to obtain weight values of the different categories of video scenes in the target video data; and obtaining parameter information input by a user, obtaining a target video scene from the video scenes according to the parameter information and the magnitudes of the weight values, and returning the target video scene to a client.
In the embodiments of the present application, the processor or processor module may be an integrated circuit chip with signal processing capability. The processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The processor may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in the embodiments of the present application may be executed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The processor reads the information in the storage medium and completes the steps of the above methods in combination with its hardware.
The storage medium may be a memory, for example, a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memories.
The non-volatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory.
The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DRRAM).
The storage media described in the embodiments of the present application are intended to include, but are not limited to, these and any other suitable types of memory.
Those skilled in the art should be aware that, in one or more of the above examples, the functions described in the present application may be implemented by a combination of hardware and software. When software is applied, the corresponding functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include computer storage media and communication media, where communication media include any medium that facilitates the transfer of a computer program from one place to another. A storage medium may be any available medium accessible by a general-purpose or special-purpose computer.
The specific implementations described above further describe the purpose, technical solutions, and beneficial effects of the present application in detail. It should be understood that the above are only specific implementations of the present application and are not intended to limit its protection scope; any modification, equivalent replacement, improvement, or the like made on the basis of the technical solutions of the present application shall fall within its protection scope.

Claims (10)

1. A method for recognizing video scenes in video data, comprising:
    segmenting complete video data to be detected according to the different video scenes contained in the complete video data, to obtain target video data, wherein the target video data contains video frames of at least one of the video scenes;
    determining, through a preset image recognition model, the types of video scenes contained in the target video data, and obtaining confidence levels of the video frames contained in each type of video scene, wherein the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to feature information of those video scenes;
    normalizing the confidence levels of the video frames contained in each category of video scene, to obtain weight values of the different categories of video scenes in the target video data; and
    obtaining parameter information input by a user, obtaining a target video scene from the video scenes according to the parameter information and the magnitudes of the weight values, and returning the target video scene to a client.
2. The method for recognizing video scenes in video data according to claim 1, wherein segmenting the complete video data to be detected according to the different video scenes contained in the complete video data, to obtain the target video data, specifically comprises:
    obtaining the complete video data to be detected;
    obtaining color features of the video scenes in the complete video data through a feature extraction algorithm;
    obtaining color features of the video frames in the complete video data through the feature extraction algorithm;
    determining switch positions between adjacent video scenes according to a first color feature difference between two adjacent video scenes in the complete video data and a second color feature difference between two adjacent video frames in the complete video data; and
    segmenting the complete video data according to the switch positions.
3. The method for recognizing video scenes in video data according to claim 1, wherein obtaining the target video scene from the video scenes according to the parameter information and the magnitudes of the weight values specifically comprises:
    extracting a target parameter value from the parameter information; and
    obtaining, from the video scenes according to the target parameter value and the magnitudes of the weight values, first video scenes whose weight values reach or exceed a preset weight threshold, and using the first video scenes as the target video scene, wherein the first video scenes include at least one of the video scenes.
4. The method for recognizing video scenes in video data according to claim 1, further comprising:
    extracting a target parameter value from the parameter information; and
    determining whether the target parameter value is greater than the number of video scene types contained in the target video data, and if so, returning an alert message to the client.
5. The method for recognizing video scenes in video data according to claim 1, wherein the confidence level of a video frame is the probability value that the video frame is a video frame corresponding to the video scene.
6. An apparatus for recognizing video scenes in video data, comprising:
    a segmentation processing unit, configured to segment complete video data to be detected according to the different video scenes contained in the complete video data, to obtain target video data, wherein the target video data contains video frames of at least one of the video scenes;
    a video scene recognition unit, configured to determine, through a preset image recognition model, the types of video scenes contained in the target video data, and to obtain confidence levels of the video frames contained in each type of video scene, wherein the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to feature information of those video scenes;
    a video scene weight analysis unit, configured to normalize the confidence levels of the video frames contained in each category of video scene, to obtain weight values of the different categories of video scenes in the target video data; and
    a target video data obtaining unit, configured to obtain parameter information input by a user, obtain a target video scene from the video scenes according to the parameter information and the magnitudes of the weight values, and return the target video scene to a client.
7. The apparatus for recognizing video scenes in video data according to claim 6, wherein the segmentation processing unit is specifically configured to:
    obtain the complete video data to be detected;
    obtain color features of the video scenes in the complete video data through a feature extraction algorithm;
    obtain color features of the video frames in the complete video data through the feature extraction algorithm;
    determine switch positions between adjacent video scenes according to a first color feature difference between two adjacent video scenes in the complete video data and a second color feature difference between two adjacent video frames in the complete video data; and
    segment the complete video data according to the switch positions.
8. The apparatus for recognizing video scenes in video data according to claim 6, wherein the target video data obtaining unit is specifically configured to:
    extract a target parameter value from the parameter information; and
    obtain, from the video scenes according to the target parameter value and the magnitudes of the weight values, first video scenes whose weight values reach or exceed a preset weight threshold, and use the first video scenes as the target video scene, wherein the first video scenes include at least one of the video scenes.
9. An electronic device, comprising:
    a processor; and
    a memory, configured to store a program of a method for recognizing video scenes in video data, wherein after the device is powered on and the processor runs the program, the following steps are performed:
    segmenting complete video data to be detected according to the different video scenes contained in the complete video data, to obtain target video data, wherein the target video data contains video frames of at least one of the video scenes;
    determining, through a preset image recognition model, the types of video scenes contained in the target video data, and obtaining confidence levels of the video frames contained in each type of video scene, wherein the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to feature information of those video scenes;
    normalizing the confidence levels of the video frames contained in each category of video scene, to obtain weight values of the different categories of video scenes in the target video data; and
    obtaining parameter information input by a user, obtaining a target video scene from the video scenes according to the parameter information and the magnitudes of the weight values, and returning the target video scene to a client.
10. A storage device storing a program of a method for recognizing video scenes in video data, wherein when run by a processor the program performs the following steps:
    segmenting complete video data to be detected according to the different video scenes contained in the complete video data, to obtain target video data, wherein the target video data contains video frames of at least one of the video scenes;
    determining, through a preset image recognition model, the types of video scenes contained in the target video data, and obtaining confidence levels of the video frames contained in each type of video scene, wherein the image recognition model is a deep neural network model that classifies the video scenes contained in the target video data according to feature information of those video scenes;
    normalizing the confidence levels of the video frames contained in each category of video scene, to obtain weight values of the different categories of video scenes in the target video data; and
    obtaining parameter information input by a user, obtaining a target video scene from the video scenes according to the parameter information and the magnitudes of the weight values, and returning the target video scene to a client.
PCT/CN2019/108434 2019-06-17 2019-09-27 Method and apparatus for recognizing video scene in video data WO2020252975A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910522913.2A CN110149531A (en) 2019-06-17 2019-06-17 The method and apparatus of video scene in a kind of identification video data
CN201910522913.2 2019-06-17

Publications (1)

Publication Number Publication Date
WO2020252975A1 true WO2020252975A1 (en) 2020-12-24

Family

ID=67591546

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/108434 WO2020252975A1 (en) 2019-06-17 2019-09-27 Method and apparatus for recognizing video scene in video data

Country Status (2)

Country Link
CN (1) CN110149531A (en)
WO (1) WO2020252975A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110149531A (en) * 2019-06-17 2019-08-20 北京影谱科技股份有限公司 The method and apparatus of video scene in a kind of identification video data
CN110933462B (en) * 2019-10-14 2022-03-25 咪咕文化科技有限公司 Video processing method, system, electronic device and storage medium
CN113177603B (en) * 2021-05-12 2022-05-06 中移智行网络科技有限公司 Training method of classification model, video classification method and related equipment
CN115334351B (en) * 2022-08-02 2023-10-31 Vidaa国际控股(荷兰)公司 Display equipment and self-adaptive image quality adjusting method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070030391A1 (en) * 2005-08-04 2007-02-08 Samsung Electronics Co., Ltd. Apparatus, medium, and method segmenting video sequences based on topic
CN102207966A (en) * 2011-06-01 2011-10-05 华南理工大学 Video content quick retrieving method based on object tag
CN108537134A (en) * 2018-03-16 2018-09-14 北京交通大学 A kind of video semanteme scene cut and mask method
CN108848422A (en) * 2018-04-19 2018-11-20 清华大学 A kind of video abstraction generating method based on target detection
CN109213895A (en) * 2017-07-05 2019-01-15 合网络技术(北京)有限公司 A kind of generation method and device of video frequency abstract
CN110149531A (en) * 2019-06-17 2019-08-20 北京影谱科技股份有限公司 The method and apparatus of video scene in a kind of identification video data

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9754351B2 (en) * 2015-11-05 2017-09-05 Facebook, Inc. Systems and methods for processing content using convolutional neural networks
CN108053420B (en) * 2018-01-05 2021-11-02 昆明理工大学 Partition method based on finite space-time resolution class-independent attribute dynamic scene
CN109145840B (en) * 2018-08-29 2022-06-24 北京字节跳动网络技术有限公司 Video scene classification method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN110149531A (en) 2019-08-20

Similar Documents

Publication Publication Date Title
WO2020252975A1 (en) Method and apparatus for recognizing video scene in video data
CN110163114B (en) Method and system for analyzing face angle and face blurriness and computer equipment
US10896349B2 (en) Text detection method and apparatus, and storage medium
CN109697416B (en) Video data processing method and related device
WO2020252917A1 (en) Fuzzy face image recognition method and apparatus, terminal device, and medium
CN106557726B (en) Face identity authentication system with silent type living body detection and method thereof
EP3882809A1 (en) Face key point detection method, apparatus, computer device and storage medium
US20200410212A1 (en) Fast side-face interference resistant face detection method
TWI497422B (en) A system and method for recognizing license plate image
US8358837B2 (en) Apparatus and methods for detecting adult videos
US8867828B2 (en) Text region detection system and method
KR101615254B1 (en) Detecting facial expressions in digital images
TW202004637A (en) Risk prediction method and apparatus, storage medium, and server
US10867182B2 (en) Object recognition method and object recognition system thereof
KR20210110823A (en) Image recognition method, training method of recognition model, and related devices and devices
CN111326139B (en) Language identification method, device, equipment and storage medium
CN111552837A (en) Animal video tag automatic generation method based on deep learning, terminal and medium
WO2018192245A1 (en) Automatic scoring method for photo based on aesthetic assessment
US9111130B2 (en) Facilitating face detection with user input
CN111368632A (en) Signature identification method and device
WO2022062028A1 (en) Wine label recognition method, wine information management method and apparatus, device, and storage medium
CN111401343A (en) Method for identifying attributes of people in image and training method and device for identification model
CN114639155A (en) Emotion recognition method, emotion recognition device, storage medium and processor
CN113947209A (en) Integrated learning method, system and storage medium based on cloud edge cooperation
JP4749879B2 (en) Face discrimination method, apparatus, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19933925

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19933925

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 28.03.2022)
