WO2020252975A1

WO2020252975A1 - Method and apparatus for recognizing video scene in video data

Info

Publication number: WO2020252975A1
Application number: PCT/CN2019/108434
Authority: WO
Inventors: 彭浩
Original assignee: 北京影谱科技股份有限公司
Priority date: 2019-06-17
Filing date: 2019-09-27
Publication date: 2020-12-24
Also published as: CN110149531A

Abstract

Disclosed are a method and apparatus for recognizing a video scene in video data. The method comprises: determining the types of video scenes comprised in target video data by means of a preset image recognition model, and obtaining confidence levels of video frames respectively comprised in different types of the video scenes; respectively performing normalization processing on the confidence levels of the video frames comprised in the different types of the video scenes to obtain weight values of the different types of the video scenes in the target video data; in addition, obtaining parameter information inputted by a user, determining a target video scene from the video scenes according to the parameter information and the weight values, and returning the target video scene to a client. By adopting the method according to the present application, the target video scene corresponding to the target video data can be quickly recognized by comparing the weight values of the video scenes in the target video data, so as to improve the recognition efficiency and accuracy of the target video scene.

Description

Method and device for identifying video scene in video data

This application claims the priority of a Chinese patent application filed with the Chinese Patent Office on June 17, 2018. The application number is CN201910522913.2, and the application title is "A method and device for identifying video scenes in video data". The entire content of the Chinese patent application Incorporated in this application by reference.

Technical field

The embodiments of the present application relate to the technical field of video data processing, in particular to a method and apparatus for identifying video scenes in video data, and in addition, to an electronic device and a storage device.

Background technique

With the rapid development of science and technology, various video materials are becoming more and more abundant. A complete video data often contains multiple video scenes. The video scene recognition of video data is a relatively common problem. However, if you want to achieve It is difficult to accurately identify video scenes in video data. Therefore, how to improve the accuracy of video scene recognition and reduce the misrecognition rate of the video scene of the target video has become an urgent technical problem for those skilled in the art.

In order to solve the above technical problems, the video scene recognition method usually used in the prior art is to extract the image features of the video frames contained in the video data to characterize the video data, and then identify and identify the image features of the video data based on a preset classifier Classification, different categories correspond to different video scenes, so as to realize the recognition of video scenes in video data. However, the above video scene recognition method is easily affected by the performance of the classifier, resulting in the classification accuracy of the video scene is still not high, and cannot meet the needs of current users.

Summary of the invention

To this end, the embodiments of the present application provide a method and device for recognizing video scenes in video data to solve the problem in the prior art that the current video scene segmentation technology is not mature enough, and the target scene extraction is not accurate enough.

In order to achieve the foregoing objectives, the embodiments of the present application provide the following technical solutions:

According to an embodiment of the present application, a method for identifying video scenes in video data includes: performing segmentation processing on the complete video data according to different video scenes included in the complete video data to be detected to obtain target video data; wherein, The target video data includes at least one video frame of the video scene; through a preset image recognition model, it is determined that the target video data includes the type of the video scene, and the different types of video scenes respectively included The confidence level of the video frame; wherein the image recognition model is a deep neural network model that classifies the video scene contained in the target video data according to the feature information of the video scene contained in the target video data; The confidence levels of the video frames contained in the video scenes of the categories are respectively normalized to obtain the weight values of the video scenes of different categories in the target video data; the parameter information input by the user is obtained according to the parameters The information and the size of the weight value obtain a target video scene from the video scene, and return the target video scene to the client.

Further, the obtaining a target video scene from the video scene according to the size of the parameter information and the weight value specifically includes: extracting a target parameter value from the parameter information; and according to the target parameter value and the weight value. The size of the weight value, the first video scene whose weight value reaches or exceeds a preset weight threshold is obtained from the video scene, and the first video scene is taken as the target video scene; wherein, the first video scene The video scene includes at least one of the video scenes.

Further, the video scene recognition method further includes: extracting a target parameter value from the parameter information; judging whether the target parameter value is greater than the number of types of the video scenes contained in the target video data, if so, Then, the alarm prompt information is returned to the client.

Further, the confidence of the video frame refers to the probability value that the video frame is a video frame corresponding to the video scene.

Further, the segmentation of the complete video data according to different video scenes included in the complete video data to be detected to obtain target video data specifically includes: obtaining the complete video data to be detected; and using a feature extraction algorithm Obtain the color features of the video scene in the complete video data; Obtain the color features of the video frame in the complete video data through the feature extraction algorithm; According to two adjacent videos in the complete video data The first color feature difference of the scene and the second color feature difference between two adjacent video frames in the complete video data, determine the switching position between the adjacent video scenes; The complete video data is segmented.

Correspondingly, the present application also provides a device for identifying video scenes in video data, including: a segmentation processing unit, configured to perform segmentation processing on the complete video data according to different video scenes contained in the complete video data to be detected, to obtain Target video data; wherein the target video data includes at least one video frame of the video scene; the video scene recognition unit is configured to determine that the target video data includes the type of the video scene through a preset image recognition model , And obtain the confidence levels of the video frames respectively contained in the different types of the video scenes; wherein, the image recognition model is based on the target video data containing the feature information of the video scene to the target video data. The deep neural network model for classifying the video scenes; the video scene weight analysis unit is used to normalize the confidence levels of the video frames contained in the video scenes of different categories to obtain the video scenes of different categories. The weight value in the target video data; the target video data obtaining unit is configured to obtain parameter information input by the user, obtain the target video scene from the video scene according to the parameter information and the size of the weight value, and combine The target video scene is returned to the client.

Further, the segmentation processing unit is specifically configured to: extract a target parameter value from the parameter information; according to the size of the target parameter value and the weight value, obtain the weight value from the video scene to reach or For a first video scene that exceeds a preset weight threshold, use the first video scene as the target video scene; wherein, the first video scene includes at least one of the video scenes.

Further, the target video data obtaining unit is specifically configured to: obtain the complete video data to be detected; obtain the color features of the video scene in the complete video data through a feature extraction algorithm; obtain through the feature extraction algorithm The color feature of the video frame in the complete video data; according to the first color feature difference of two adjacent video scenes in the complete video data and the difference between two adjacent video frames in the complete video data The second color feature difference is determined to determine the switching position between the adjacent video scenes; the complete video data is segmented according to the switching position.

Correspondingly, the present application also provides an electronic device, including: a processor and a memory; wherein the memory is used to store a program of a method for identifying video scenes in video data, and the device is powered on and runs the identification through the processor. After the procedure of the method of the video scene in the video data, perform the following steps:

According to the different video scenes included in the complete video data to be detected, the complete video data is segmented to obtain target video data; wherein, the target video data includes at least one video frame of the video scene; The image recognition model determines that the target video data includes the type of the video scene, and obtains the confidence levels of the video frames contained in the different types of the video scenes; wherein the image recognition model is based on the target video data A deep neural network model that includes the feature information of the video scene to classify the video scenes included in the target video data; and normalizes the confidence levels of the video frames included in the video scenes of different categories, Obtain the weight values of the video scenes of different categories in the target video data; obtain the parameter information input by the user, obtain the target video scene from the video scene according to the parameter information and the size of the weight value, and Return the target video scene to the client.

Correspondingly, the present application also provides a storage device that stores a program of a method for identifying video scenes in video data, and the program is run by a processor to perform the following steps:

Using the method for identifying video scenes in video data described in this application, the confidence levels of the video frames contained in different types of video scenes can be obtained through a preset image recognition model, and then normalized processing is performed to obtain the video scene The weight value in the target video data is compared with the size of the weight value to quickly determine the target video scene corresponding to the target video data, which improves the collection efficiency and accuracy of the target video scene, thereby improving the user experience.

Description of the drawings

In order to more clearly describe the embodiments of the present application or the technical solutions in the prior art, the following will briefly introduce the drawings that need to be used in the description of the embodiments or the prior art. Obviously, the drawings in the following description are only exemplary. For those of ordinary skill in the art, other implementation drawings can be derived from the provided drawings without creative work.

The structure, ratio, size, etc. shown in this manual are only used to match the content disclosed in the manual for people who are familiar with this technology to understand and read. They are not used to limit the implementation conditions of this application, so they are not technical. The substantive meaning of the above, any structural modification, proportional relationship change or size adjustment, without affecting the effects and objectives that can be achieved by this application, should still fall within the technical content disclosed in this application. Can cover the range.

FIG. 1 is a flowchart of a method for identifying video scenes in video data provided by an embodiment of the application;

2 is a schematic diagram of a device for identifying video scenes in video data provided by an embodiment of the application;

FIG. 3 is a schematic diagram of an electronic device provided by an embodiment of the application.

Detailed ways

The following specific examples illustrate the implementation of this application. Those familiar with this technology can easily understand the other advantages and effects of this application from the content disclosed in this specification. Obviously, the described examples are part of this application. , Not all examples. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of this application.

The following describes the embodiments in detail based on the method for identifying video scenes in video data described in this application. As shown in FIG. 1, it is a flowchart of a method for identifying a video scene in video data provided by an embodiment of the application. The specific implementation process includes the following steps:

Step S101: Perform segmentation processing on the complete video data according to different video scenes included in the complete video data to be detected to obtain target video data; wherein the target video data includes at least one video frame of the video scene.

In the embodiment of the present application, the complete video data may include at least two video scenes. Correspondingly, at least two pieces of video data (ie, target video data) can be obtained by performing segmentation processing on the complete video data according to different video scenes included in the complete video data. Each piece of video data can contain several video frames. Due to the immaturity of the current video data segmentation, the two pieces of video data obtained after the segmentation process may be mixed with some video frames in the same video scene. Therefore, the target video data includes at least one of the video scenes. Video frames, at this time, when the existing neural network is used to identify the target video data, the misrecognition rate is relatively high, and further processing is required.

It should be noted that the segmentation of the complete video data according to the different video scenes contained in the complete video data to be detected to obtain the target video data can be specifically implemented in the following ways:

Obtaining the complete video data to be detected, obtaining the color features of the video scene in the complete video data through a feature extraction algorithm, and obtaining the color features of the video frame in the complete video data through the feature extraction algorithm, Determine the adjacent video scenes based on the first color feature difference between two adjacent video scenes in the complete video data and the second color feature difference between two adjacent video frames in the complete video data The switching position between the two, the complete video data is segmented according to the switching position.

Step S102: Determine the type of the video scene included in the target video data through a preset image recognition model, and obtain the confidence levels of the video frames respectively included in the video scenes of different types; wherein, the image recognition model A deep neural network model used to classify the video scene contained in the target video data according to the feature information of the video scene contained in the target video data.

After segmenting the complete video data to be identified in the step 101 to obtain the target video data, data preparation is done for analyzing the video scene in this step. In step 102, the video scene is analyzed through a preset image recognition model, it can be determined that the target video data contains the type of the video scene, and the video frames contained in the different types of video scenes can be obtained. Confidence.

In the embodiment of the present application, the confidence level of the video frame refers to a probability value indicating that the video frame is a video frame in the video scene. The higher the obtained confidence, the greater the probability that the video frame is the video frame in the video scene, that is, the higher the probability; on the contrary, the lower the obtained confidence, the lower the video frame is in the video scene. The smaller the probability value of the video frame, the smaller the probability.

In the actual implementation process, due to the immaturity of current video data segmentation, the complete video data is segmented according to the different video scenes contained in the complete video data (such as a video recording) to be detected, and the target video data obtained Usually contains at least one video frame of the video scene. That is, the target video data obtained through the segmentation process may include video frames of multiple shots.

Therefore, by identifying and classifying the video scenes contained in the target video data through the preset image recognition model, it is possible to obtain at least one video scene category, and the confidence that the video scenes of different categories respectively contain video frames, that is, : Video frame is the probability value of the video frame in the video scene of the corresponding category.

It should be noted that the preset image recognition model described in this application may refer to a video scene classifier based on a neural network, and the training of the video scene classifier may be by collecting different types of sample images, Extract the image features from the sample images, and then use the extracted image features and sample image types to train the video scene classifier. Common video scene classifiers include Support Vector Machine (SVM) and Bayesian classifiers. Logistic regression classifiers, etc., are not specifically limited here.

Step S103: Normalize the confidence levels of the video frames included in the video scenes of different categories to obtain weight values of the video scenes of different categories in the target video data.

After obtaining the confidence levels of the video frames respectively contained in the different types of video scenes in the step 102, data preparation is done for the normalization process in this step. In step 103, the confidence levels of the video frames included in the video scenes of different categories may be normalized respectively, so as to obtain the weight values of the video scenes of different categories in the target video data.

The following example illustrates that in the specific implementation process, the complete video data is segmented according to the different video scenes included. It is assumed that the obtained target video data can include video frames of three different video scenes: A, B, and C. When performing recognition and classification of the video scenes contained in the target video data through the preset image recognition model, three different video scene categories of A, B, and C may be obtained. Assuming that a category A video scene contains 5 video frames, the confidence of each video frame is 0.6, 0.5, 0.4, 0.4, 0.6; the B category video scene contains 3 video frames, and the confidence of each video frame is 0.6, 0.1 , 0.5; C category video scene contains 2 video frames, and the confidence of each video frame is 0.3 and 0.3 in turn.

A weighted average algorithm is used to normalize the confidence levels of the video frames included in the video scenes of different categories to obtain the weighted average of each video scene. Specifically, the confidence levels of 0.6, 0.5, 0.4, 0.4, and 0.6 corresponding to the 5 video frames are added together, and then divided by the number of video frames of 5 to obtain a weighted average value of 0.5. Similarly, it can be obtained that the weighted average value of category B video scenes is 0.4, and the weighted average value of category C video scenes is 0.3. The weighted average value is the weight value of the different types of video scenes in the target video data described in this application. That is: the weight value of the video scene in category A in the target video data is 0.5; the weight value of the video scene in category B in the target video data is 0.4; the video scene in category C is in The weight value in the target video data is 0.3.

In the actual implementation process, because the video frame with too low confidence may be the result of interference from other factors, in the specific operation process, you can first set a confidence threshold, and obtain that the confidence threshold is reached or exceeded. Of the video frame for the next calculation. For example, the confidence threshold is set to 0.5, and then the weighted average value of each video scene is calculated according to the video frames that meet or exceed the confidence threshold. For example: at this time, the video scene of category A only considers the video frames with confidence levels of 0.6, 0.5 and 0.6, the video scene of category B only considers the video frames with confidence levels of 0.6 and 0.5, and the video scene of category C There are no video frames that exceed the confidence threshold. The weighted average of video scenes in category A is (0.6+0.5+0.5)/3, the weighted average of video scenes in category B is (0.6+0.5)/2, and the weighted average of video scenes in category C is 0 . In order to reduce the amount of calculation, when there are many video frames that meet or exceed the confidence threshold in each category, 3-6 video frames with the highest score can be selected from each category for calculation.

Step S104: Obtain parameter information input by the user, obtain a target video scene from the video scene according to the parameter information and the size of the weight value, and return the target video scene to the client.

After obtaining the weight values of the video scenes of different categories in the target video data in the step 103, data preparation is done to obtain the target video scene from the video scenes in this step. In step 103, the target video scene may be obtained from the video scene according to the parameter information input by the user and the size of the weight value, and the target video scene may be returned to the client.

In the embodiment of the present application, the target parameter value can be extracted from the parameter information input by the user. At this time, according to the size of the target parameter value and the weight value, the weight value can be obtained from the video scene. Or the first video scene that exceeds the preset weight threshold, the first video scene is used as the target video scene. Wherein, the first video scene includes at least one of the video scenes.

Specifically, assuming that the target parameter value N input by the user is 2, it is first judged whether the target parameter value N is 2 greater than the number of types of video scenes contained in the target video data 3, if not, the video scene is The weight values in the target video data in descending order are: A category video scene>B category video scene>C category video scene. At this time, the preset weight threshold may be set to 3, and the first video scene (ie, category A video scene and category B video scene) whose weight value reaches or exceeds 3 is obtained from the video scene. The category A video scene and the category B video scene are used as the target video scene for identifying the video data. When the target parameter value N input by the user is 1, at this time, the preset weight threshold can be set to 5, and the first video scene with the weight value reaching or exceeding 5 (ie: category A video) is obtained from the video scene. Scenes). A category video scene is used as the target video scene for identifying the video data.

In the embodiment described in this application, it can be determined whether the target parameter value is greater than the number of types of video scenes included in the target video data, and if so, an alarm prompt information is returned to the client. That is: when the target parameter value N input by the user is 5, the target parameter value N is 5 greater than the number of types of video scenes contained in the target video data 3. At this time, an alert message is returned to the client to remind user.

Corresponding to the method for identifying video scenes in video data provided above, this application also provides a device for identifying video scenes in video data. Since the embodiment of the device is similar to the above method embodiment, the description is relatively simple. For related details, please refer to the description of the above method embodiment section. The embodiment of the device for identifying video scenes in video data described below is only Schematic. Please refer to FIG. 2, which is a schematic diagram of an apparatus for identifying video scenes in video data provided by an embodiment of the application.

The device for identifying video scenes in video data described in this application includes the following parts:

The segmentation processing unit 201 is configured to segment the complete video data according to different video scenes contained in the complete video data to be detected to obtain target video data; wherein, the target video data includes at least one of the video scenes Video frame.

The video scene recognition unit 202 is configured to determine the type of the video scene contained in the target video data through a preset image recognition model, and obtain the confidence levels of the video frames contained in the video scenes of different types; wherein, The image recognition model is a deep neural network model that classifies the video scene included in the target video data according to the feature information of the video scene included in the target video data.

In the embodiment of the present application, the confidence level of the video frame refers to the probability value that the video frame is a video frame corresponding to the video scene.

In the actual implementation process, due to the immaturity of current video data segmentation, the complete video data is segmented according to the different video scenes contained in the complete video data to be detected, and the obtained target video data includes at least one of the videos. The video frame of the scene. Therefore, by identifying and classifying the video scenes contained in the target video data through the preset image recognition model, it is possible to obtain at least one video scene category and the confidence that the video scenes of different categories respectively contain video frames.

It should be noted that the preset image recognition model described in this application may refer to a video scene classifier based on a neural network, and the training of the video scene classifier may be by collecting different types of sample images, Extract the image features from the sample images, and then use the extracted image features and sample image types to train the video scene classifier. Common video scene classifiers include Support Vector Machine (SVM) and Bayesian classifiers. Logistic regression classifier, etc.

The video scene weight analysis unit 203 is configured to normalize the confidence levels of the video frames included in the video scenes of different categories to obtain the weight values of the video scenes of different categories in the target video data.

For example, in the specific implementation process, the complete video data is segmented according to the different video scenes included. It is assumed that the obtained target video data may include video frames of three different video scenes: A, B, and C. In order to identify and classify the video scenes contained in the target video data through the preset image recognition model, three different video scene categories of A, B, and C may be obtained. Assuming that a category A video scene contains 5 video frames, the confidence of each video frame is 0.6, 0.5, 0.2, 0.2, 0.5; the B category video scene contains 3 video frames, and the confidence of each video frame is 0.7, 0.3 , 0.5; C category video scene contains 2 video frames, and the confidence of each video frame is 0.3 and 0.3 in turn. A weighted average algorithm is used to normalize the confidence levels of the video frames included in the video scenes of different categories to obtain the weighted average of each video scene. Specifically, the confidence levels of 0.6, 0.5, 0.4, 0.4, and 0.6 corresponding to the 5 video frames are added together, and then divided by the number of video frames of 5 to obtain a weighted average value of 0.5. Similarly, it can be obtained that the weighted average value of category B video scenes is 0.4, and the weighted average value of category C video scenes is 0.3. The weighted average value is the weight value of the different types of video scenes in the target video data described in this application. That is: the weight value of the video scene in category A in the target video data is 0.5; the weight value of the video scene in category B in the target video data is 0.4; the video scene in category C is in The weight value in the target video data is 0.3.

The target video data obtaining unit 204 is configured to obtain parameter information input by the user, obtain a target video scene from the video scene according to the parameter information and the size of the weight value, and return the target video scene to the client .

In the embodiment described in this application, it can be determined whether the target parameter value is greater than the number of types of video scenes included in the target video data, and if so, an alarm prompt information is returned to the client. That is: when the target parameter value N input by the user is 5, the target parameter value N is 5 greater than the number of types of video scenes contained in the target video data 3. At this time, an alarm message is returned to the client to remind user.

Using the device for recognizing video scenes in video data described in this application, the confidence levels of the video frames contained in different types of video scenes can be obtained through a preset image recognition model, and then normalized processing is performed to obtain the video scene in The weight value in the target video data is compared with the size of the weight value to quickly determine the target video scene corresponding to the target video data, which improves the collection efficiency and accuracy of the target video scene, thereby improving the user experience.

Corresponding to the method for identifying video scenes in video data provided above, the present application also provides an electronic device. Since the embodiment of the electronic device is similar to the foregoing method embodiment, the description is relatively simple. For related details, please refer to the description of the foregoing method embodiment section. The electronic device described below is only illustrative. Please refer to FIG. 3, which is a schematic diagram of an electronic device provided by an embodiment of this application.

The present application provides an electronic device that specifically includes: a processor 301 and a memory 302; wherein the memory 302 is used to store a program of a method for identifying video scenes in video data, and the device is powered on and runs through the processor 301 After identifying the procedure of the video scene in the video data, perform the following steps:

Correspondingly, this application also provides a storage device, including: a program storing a method for identifying a video scene in video data, the program being run by a processor, and performing the following steps:

In the embodiment of the present application, the processor or the processor module may be an integrated circuit chip with signal processing capability. The processor can be a general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or other Programmable logic devices, discrete gate or transistor logic devices, discrete hardware components.

The methods, steps, and logical block diagrams disclosed in the embodiments of the present application can be implemented or executed. The general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like. The steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor. The software module can be located in a mature storage medium in the field such as random access memory, flash memory, read-only memory, programmable read-only memory, or electrically erasable programmable memory, registers. The processor reads the information in the storage medium and completes the steps of the above method in combination with its hardware.

The storage medium may be a memory, for example, may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.

Among them, the non-volatile memory can be a read-only memory (Read-Only Memory, ROM for short), a programmable read-only memory (Programmable ROM, PROM for short), and an erasable programmable read-only memory (Erasable PROM, EPROM for short). , Electrically Erasable Programmable Read-Only Memory (Electrically EPROM, EEPROM for short) or flash memory.

The volatile memory may be a random access memory (Random Access Memory, RAM for short), which is used as an external cache. By way of exemplary but not restrictive description, many forms of RAM are available, such as static random access memory (Static RAM, SRAM for short), dynamic random access memory (Dynamic RAM, DRAM for short), and synchronous dynamic random access memory (Synchronous RAM). DRAM, SDRAM for short), Double Data Rate Synchronous Dynamic Random Access Memory (Double Data Rate SDRAM, for DDRSDRAM), Enhanced Synchronous Dynamic Random Access Memory (Enhanced SDRAM, for ESDRAM), Synchronously Connected Dynamic Random Access Memory ( Synch link DRAM, SLDRAM for short) and Direct Ram bus RAM (DRRAM for short).

The storage media described in the embodiments of the present application are intended to include, but are not limited to, these and any other suitable types of memory.

Those skilled in the art should be aware that in one or more of the above examples, the functions described in this application can be implemented by a combination of hardware and software. When software is applied, the corresponding function can be stored in a computer-readable medium or transmitted as one or more instructions or codes on the computer-readable medium. The computer-readable medium includes a computer storage medium and a communication medium, where the communication medium includes any medium that facilitates the transfer of a computer program from one place to another. The storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.

The specific implementations described above further describe the purpose, technical solutions and beneficial effects of this application in detail. It should be understood that the above are only specific implementations of this application and are not intended to limit the scope of this application. The protection scope, any modification, equivalent replacement, improvement, etc. made on the basis of the technical solution of this application shall be included in the protection scope of this application.

Claims

A method for identifying video scenes in video data, which is characterized in that it includes:

Performing segmentation processing on the complete video data according to different video scenes included in the complete video data to be detected to obtain target video data; wherein the target video data includes at least one video frame of the video scene;

Through a preset image recognition model, it is determined that the target video data includes the type of the video scene, and the confidence levels of the video frames contained in the different types of the video scenes are obtained; wherein, the image recognition model is the The feature information of the video scene included in the target video data is a deep neural network model that classifies the video scene included in the target video data;

Normalizing the confidence levels of the video frames included in the video scenes of different categories to obtain the weight values of the video scenes of the different categories in the target video data;

Obtain parameter information input by the user, obtain a target video scene from the video scene according to the parameter information and the size of the weight value, and return the target video scene to the client.
The method for recognizing video scenes in video data according to claim 1, wherein the complete video data is segmented according to different video scenes included in the complete video data to be detected to obtain the target video data, Specifically:

Obtaining the complete video data to be detected;

Obtaining the color features of the video scene in the complete video data through a feature extraction algorithm;

Obtaining the color feature of the video frame in the complete video data by using the feature extraction algorithm;

According to the first color feature difference between two adjacent video scenes in the complete video data and the second color feature difference between two adjacent video frames in the complete video data, determine the adjacent Switch position between video scenes;

Perform segmentation processing on the complete video data according to the switching position.
The method for recognizing a video scene in video data according to claim 1, wherein the obtaining a target video scene from the video scene according to the parameter information and the size of the weight value specifically comprises:

Extracting the target parameter value from the parameter information;

According to the size of the target parameter value and the weight value, obtain a first video scene whose weight value reaches or exceed a preset weight threshold from the video scene, and use the first video scene as the target video Scene; wherein, the first video scene includes at least one of the video scenes.
The method for identifying video scenes in video data according to claim 1, further comprising:

Extracting the target parameter value from the parameter information;

It is determined whether the target parameter value is greater than the number of types of video scenes included in the target video data, and if so, an alarm prompt message is returned to the client.
The method for identifying a video scene in video data according to claim 1, wherein the confidence of the video frame refers to the probability value of the video frame being a video frame corresponding to the video scene.
A device for identifying video scenes in video data is characterized in that it comprises:

The segmentation processing unit is configured to segment the complete video data according to the different video scenes contained in the complete video data to be detected to obtain target video data; wherein the target video data includes at least one video of the video scene frame;

The video scene recognition unit is used to determine the type of the video scene contained in the target video data through a preset image recognition model, and obtain the confidence levels of the video frames contained in the video scenes of different types; wherein, The image recognition model is a deep neural network model that classifies the video scene included in the target video data according to the feature information of the video scene included in the target video data;

A video scene weight analysis unit, configured to normalize the confidence levels of the video frames included in the video scenes of different categories to obtain the weight values of the video scenes of different categories in the target video data;

The target video data obtaining unit is configured to obtain parameter information input by the user, obtain a target video scene from the video scene according to the parameter information and the size of the weight value, and return the target video scene to the client.
The device for identifying video scenes in video data according to claim 6, wherein the target video data obtaining unit is specifically configured to:

Obtaining the complete video data to be detected;

Obtaining the color features of the video scene in the complete video data through a feature extraction algorithm;

Obtaining the color feature of the video frame in the complete video data by using the feature extraction algorithm;

According to the first color feature difference between two adjacent video scenes in the complete video data and the second color feature difference between two adjacent video frames in the complete video data, determine the adjacent Switch position between video scenes;

Perform segmentation processing on the complete video data according to the switching position.
The device for identifying video scenes in video data according to claim 6, wherein the segmentation processing unit is specifically configured to:

Extracting the target parameter value from the parameter information;

According to the size of the target parameter value and the weight value, obtain a first video scene whose weight value reaches or exceed a preset weight threshold from the video scene, and use the first video scene as the target video Scene; wherein, the first video scene includes at least one of the video scenes.
An electronic device, characterized in that it comprises:

Processor; and

The memory is used to store the program of the method for identifying the video scene in the video data. After the device is powered on and runs the program of the method for identifying the video scene in the video data through the processor, the following steps are executed:

Performing segmentation processing on the complete video data according to different video scenes included in the complete video data to be detected to obtain target video data; wherein the target video data includes at least one video frame of the video scene;

Through a preset image recognition model, the target video data is determined to include the type of the video scene, and the confidence levels of the video frames contained in the different types of video scenes are obtained; wherein, the image recognition model is based on The target video data includes feature information of the video scene, and a deep neural network model that classifies the video scene included in the target video data;

Normalizing the confidence levels of the video frames included in the video scenes of different categories to obtain the weight values of the video scenes of the different categories in the target video data;

Obtain parameter information input by the user, obtain a target video scene from the video scene according to the parameter information and the size of the weight value, and return the target video scene to the client.
A storage device, characterized in that it stores a program of a method for identifying video scenes in video data, and the program is run by a processor to perform the following steps:

Performing segmentation processing on the complete video data according to different video scenes included in the complete video data to be detected to obtain target video data; wherein the target video data includes at least one video frame of the video scene;

Through a preset image recognition model, the target video data is determined to include the type of the video scene, and the confidence levels of the video frames contained in the different types of video scenes are obtained; wherein, the image recognition model is based on The target video data includes feature information of the video scene, and a deep neural network model that classifies the video scene included in the target video data;

Normalizing the confidence levels of the video frames included in the video scenes of different categories to obtain the weight values of the video scenes of the different categories in the target video data;

Obtain parameter information input by the user, obtain a target video scene from the video scene according to the parameter information and the size of the weight value, and return the target video scene to the client.