CN116503640A - Video detection method, device, electronic equipment and storage medium - Google Patents

Video detection method, device, electronic equipment and storage medium Download PDF

Info

Publication number
CN116503640A
CN116503640A CN202310250616.3A
Authority
CN
China
Prior art keywords
image frame
clear image
target detection
video data
detection result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310250616.3A
Other languages
Chinese (zh)
Inventor
张丽
杜悦艺
孙亚生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202310250616.3A priority Critical patent/CN116503640A/en
Publication of CN116503640A publication Critical patent/CN116503640A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Abstract

The disclosure provides a video detection method, a video detection device, an electronic device, and a storage medium, and relates to the field of artificial intelligence, in particular to the technical fields of computer vision, video processing, image recognition, and the like. The specific implementation scheme is as follows: acquiring video data to be detected; classifying the video data to obtain a blurred image frame and a first clear image frame in the video data; detecting the first clear image frame to obtain a target detection result of the first clear image frame; and selecting, from the first clear image frame, a second clear image frame used for replacing the blurred image frame, so as to determine the target detection result of the video data according to the target detection result of the first clear image frame and the target detection result of the second clear image frame.

Description

Video detection method, device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular, to the technical fields of computer vision, video processing, image recognition, and the like.
Background
Video detection, i.e., video object detection, mainly performs object recognition and localization on each image frame of a video. Unlike object detection in still pictures, however, video object detection must also cope with problems specific to video itself, such as motion blur, video defocus, partial occlusion of objects, and deformation of objects in the video images.
Currently, the video detection modes in the related art may include video detection based on network structure optimization, video detection based on knowledge distillation algorithm, and the like.
Disclosure of Invention
The disclosure provides a video detection method, a video detection device, electronic equipment and a storage medium.
According to an aspect of the present disclosure, there is provided a method of video detection, including:
acquiring video data to be detected;
classifying the video data to obtain a blurred image frame and a first clear image frame in the video data;
detecting the first clear image frame to obtain a target detection result of the first clear image frame;
and selecting, from the first clear image frame, a second clear image frame used for replacing the blurred image frame, so as to determine the target detection result of the video data according to the target detection result of the first clear image frame and the target detection result of the second clear image frame.
According to another aspect of the present disclosure, there is provided an apparatus for video detection, including:
an acquisition unit configured to acquire video data to be detected;
the classification unit is used for performing classification processing on the video data to obtain a blurred image frame and a first clear image frame in the video data;
The detection unit is used for carrying out detection processing on the first clear image frame so as to obtain a target detection result of the first clear image frame;
and the determining unit is used for selecting, from the first clear image frame, a second clear image frame used for replacing the blurred image frame, so as to determine the target detection result of the video data according to the target detection result of the first clear image frame and the target detection result of the second clear image frame.
According to still another aspect of the present disclosure, there is provided an electronic apparatus including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the aspects and methods of any one of the possible implementations described above.
According to yet another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the method of the aspects and any possible implementation described above.
According to a further aspect of the present disclosure there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method of the aspects and any one of the possible implementations described above.
As can be seen from the above technical solutions, in the embodiments of the present disclosure, the obtained video data to be detected may be subjected to classification processing to obtain a blurred image frame and a first clear image frame in the video data, and then the first clear image frame may be subjected to detection processing to obtain a target detection result of the first clear image frame, so that a second clear image frame for replacing the blurred image frame may be selected from the first clear image frame, so that the target detection result of the video data may be determined according to the target detection result of the first clear image frame and the target detection result of the second clear image frame.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram according to a first embodiment of the present disclosure;
FIG. 2 is a schematic diagram according to a second embodiment of the present disclosure;
FIG. 3 is a schematic diagram according to a third embodiment of the present disclosure;
fig. 4 is a block diagram of an electronic device for implementing a method of video detection of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It will be apparent that the described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments in this disclosure without inventive faculty, are intended to be within the scope of this disclosure.
It should be noted that, the terminal device in the embodiments of the present disclosure may include, but is not limited to, smart devices such as a mobile phone, a personal digital assistant (Personal Digital Assistant, PDA), a wireless handheld device, and a Tablet Computer (Tablet Computer); the display device may include, but is not limited to, a personal computer, a television, or the like having a display function.
In addition, the term "and/or" herein is merely an association relationship describing an association object, and means that three relationships may exist, for example, a and/or B may mean: a exists alone, A and B exist together, and B exists alone. In addition, the character "/" herein generally indicates that the front and rear associated objects are an "or" relationship.
At present, common video detection approaches mainly include video detection methods based on network structure optimization and video detection methods based on a knowledge distillation algorithm. Specifically, a method based on network structure optimization may, for example, use residual networks (ResNets), so that the target detection model of the video can be made deeper and more expressive, and the effect of video detection becomes better and better. Alternatively, a network based on depthwise separable convolutions can greatly reduce the parameter count and computation of the target detection model of the video, thereby improving the video detection speed.
However, in the video detection method based on network structure optimization, the network structure is complex, so multiple model training experiments are required in practical application, which consumes considerable resources. Moreover, although optimizing the network structure can improve the detection speed to a certain extent, it cannot satisfy the efficiency requirements of practical applications well. In the video detection method based on the knowledge distillation algorithm, since the target detection model is often composed of multiple modules and each module must perform knowledge distillation independently, the training process is complex and the actual gain is poor.
Therefore, it is desirable to provide a video detection method that can efficiently detect target objects in video, thereby improving the reliability of video detection. The method specifically includes the following steps:
fig. 1 is a schematic diagram according to a first embodiment of the present disclosure, as shown in fig. 1.
101. And acquiring video data to be detected.
102. And classifying the video data to obtain a blurred image frame and a first clear image frame in the video data.
103. And detecting the first clear image frame to obtain a target detection result of the first clear image frame.
104. And selecting, from the first clear image frame, a second clear image frame used for replacing the blurred image frame, so as to determine the target detection result of the video data according to the target detection result of the first clear image frame and the target detection result of the second clear image frame.
It should be noted that the blurred image frame may multiplex the target detection result of the second clear image frame; that is, the target detection result of the second clear image frame is taken as the target detection result of the corresponding blurred image frame. In other words, the target detection result of the second clear image frame may serve as the target detection result of the blurred image frame. Combining the target detection result of the first clear image frame with the target detection result of the blurred image frame yields the target detection result of all the image frames of the video data, namely the target detection result of the video data.
The execution bodies of 101 to 104 may be part or all of an application located in the local terminal, or may be functional units such as plug-ins or software development kits (Software Development Kit, SDK) provided in an application located in the local terminal, or may be a processing engine located in a server on the network side, or may be a distributed system located on the network side, for example, a processing engine or distributed system in a video detection platform on the network side, which is not particularly limited in this embodiment.
It will be appreciated that the application may be a native program (native app) installed on the native terminal, or may also be a web page program (webApp) of a browser on the native terminal, which is not limited in this embodiment.
In this way, the embodiment of the disclosure may perform classification processing on the obtained video data to be detected to obtain a blurred image frame and a first clear image frame in the video data, and further may perform detection processing on the first clear image frame to obtain a target detection result of the first clear image frame, so that a second clear image frame for replacing the blurred image frame may be selected from the first clear image frame, so as to determine the target detection result of the video data according to the target detection result of the first clear image frame and the target detection result of the second clear image frame.
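The overall flow of steps 101 to 104 can be sketched as follows. This is an illustrative Python sketch only; `classify_frame` and `detect_frame` are hypothetical stand-ins for the preset classification model and preset target detection model described below, and are not part of the disclosure:

```python
def detect_video(frames, classify_frame, detect_frame):
    """Four-step pipeline sketch: classify frames (102), detect on clear
    frames only (103), and let each blurred frame multiplex the detection
    result of its nearest clear frame (104)."""
    # Step 102: split frame indices into clear and blurred sets.
    clear_idx = [i for i, f in enumerate(frames) if classify_frame(f) == "clear"]
    blurred_idx = [i for i in range(len(frames)) if i not in set(clear_idx)]

    # Step 103: run target detection on clear frames only.
    results = {i: detect_frame(frames[i]) for i in clear_idx}

    # Step 104: each blurred frame reuses the nearest clear frame's result.
    for i in blurred_idx:
        if not clear_idx:
            results[i] = None  # no clear frame available at all
            continue
        nearest = min(clear_idx, key=lambda j: abs(j - i))
        results[i] = results[nearest]
    return [results[i] for i in range(len(frames))]
```

With three frames where the middle one is blurred, the blurred frame simply inherits the result of the nearest clear frame, so no detection is ever run on it.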
Optionally, in one possible implementation manner of this embodiment, in 102, the video data may be specifically classified by using a preset classification model to obtain a classification result, and then the blurred image frame and the first sharp image frame in the video data may be obtained according to the classification result.
In this implementation, the preset classification model may include a lightweight model based on a convolutional neural network. For example, the preset classification model may include, but is not limited to, a lightweight model of the MobileNet family. The preset classification model can be used to analyze the sharpness of an image and classify the image based on its sharpness.
In a specific implementation process of the implementation manner, the video data to be detected can be input into a preset classification model, and the blurred image frame and the first clear image frame in the video data can be output.
In particular, the first clear image frame may be all clear image frames in the video data. The blurred image frames may be all blurred image frames in the video data.
In this way, in this implementation, the video data can be classified by the preset classification model to obtain the blurred image frames and the clear image frames in the video data, improving the accuracy and reliability of classifying all the image frames of the video. It also facilitates the subsequent target detection processing of the blurred and clear image frames, so that the target detection result of the video can be obtained more accurately and effectively, improving the reliability of video detection.
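The patent uses a learned lightweight CNN (e.g., a MobileNet-family model) for this classification. As a model-free stand-in for illustration only, a classical Laplacian-variance sharpness score can also separate clear from blurred frames, since blurred frames have little high-frequency content; the threshold here is an assumed parameter:

```python
import numpy as np

def laplacian_variance(gray):
    """Sharpness proxy: variance of a 4-neighbour Laplacian response
    over the interior of a 2-D grayscale frame."""
    lap = (-4.0 * gray[1:-1, 1:-1]
           + gray[:-2, 1:-1] + gray[2:, 1:-1]
           + gray[1:-1, :-2] + gray[1:-1, 2:])
    return float(lap.var())

def split_frames(gray_frames, threshold):
    """Return (clear_indices, blurred_indices) by thresholding sharpness,
    mimicking the classification step of 102."""
    clear, blurred = [], []
    for i, frame in enumerate(gray_frames):
        (clear if laplacian_variance(frame) >= threshold else blurred).append(i)
    return clear, blurred
```

A frame with strong edges (e.g., a checkerboard) scores high, while a flat or heavily blurred frame scores near zero; a learned classifier as in the disclosure would replace this heuristic in practice.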
Optionally, in one possible implementation manner of this embodiment, in 103, a detection process may be specifically performed on the first clear image frame by using a preset target detection model, so as to obtain a result of the detection process, and further, a target detection result of the first clear image frame may be obtained according to the result of the detection process.
In this implementation, the preset target detection model may include a model based on a target detection algorithm.
The preset object detection model may further include object detection models for different object objects. For example, the preset target detection model may include, but is not limited to, a license plate detection model, a face detection model, and the like.
In a specific implementation process of this implementation manner, the first clear image frame can be input into the preset target detection model, and the target detection result of the first clear image frame can be output.
In particular, the first clear image frame may be all clear image frames in the video data. The number of first clear image frames may be at least one and the number of blurred image frames may be at least one.
For example, the first clear image frame may be a set of clear image frames and the blurred image frame may be a set of blurred image frames.
It will be appreciated that the target detection algorithm may be an existing algorithm capable of performing a target detection function. In the actual service, a corresponding target detection algorithm may be selected according to the service requirement, which may not be specifically limited herein.
In this way, in this implementation, the target detection result of the first clear image frame can be obtained by performing detection processing on the first clear image frame with the preset target detection model. Subsequently, target detection processing only needs to be performed on the clear image frames, which reduces the amount of data processing while also reducing the adverse effect of blurred image frames on the target detection result, thereby further improving the reliability of video detection.
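The step of running the preset target detection model on clear frames only can be sketched as follows. The `Detection` tuple format and the score threshold are assumptions for illustration; `detector` stands in for any concrete model named above (e.g., a license plate or face detection model):

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)
Detection = Tuple[str, float, Box]        # (label, score, box)

def detect_clear_frames(frames, clear_idx, detector, score_thresh=0.5):
    """Run `detector` (frame -> List[Detection]) on clear frames only,
    keeping detections at or above the confidence threshold."""
    results: Dict[int, List[Detection]] = {}
    for i in clear_idx:
        results[i] = [d for d in detector(frames[i]) if d[1] >= score_thresh]
    return results
```

Because blurred frames never reach the detector, the amount of data to be processed shrinks in proportion to the number of blurred frames.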
It should be noted that, based on the various specific implementation procedures provided in the present implementation manner, the video detection method of the present embodiment may be implemented in combination with the various specific implementation procedures provided in the foregoing implementation manner. The detailed description may refer to the relevant content in the foregoing implementation, and will not be repeated here.
Optionally, in one possible implementation manner of the present embodiment, in 104, specifically, a first distance between the blurred image frame and the first clear image frame may be acquired, and then, in response to the first distance meeting a preset distance condition, the first clear image frame may be used as a second clear image frame for replacing the blurred image frame.
In this implementation, the first distance may be a distance of the current blurred image frame from one first clear image frame, i.e. a distance of two image frames.
For example, a first distance of 5 may represent a gap of 5 frames between a blurred image frame and a first clear image frame.
In this implementation, the preset distance condition may include that the first distance reaches a preset distance threshold, or that the first distance is within a preset distance range, or the like.
Specifically, the preset distance threshold may be preconfigured according to the condition of the target object in the video data to be detected.
Illustratively, if the target object moves quickly, a smaller distance threshold may be preconfigured; if the target object moves slowly, a larger distance threshold may be preconfigured.
For example, the target object is a traveling vehicle, and the distance threshold may be preconfigured to be 2 frames. The target object is a pedestrian, and the distance threshold may be preconfigured to be 5 frames.
Alternatively, the preset distance range may be preconfigured according to the condition of the target object in the video data to be detected.
For example, the target object is a traveling vehicle, and the distance range may be preconfigured to be 2 frames. The target object is a pedestrian, and the distance range may be preconfigured to be 5 frames.
It will be appreciated that here the selection is made among the first sharp image frames adjacent to the current blurred image frame. For example, a first clear image frame 2 frames away from the current blurred image frame may be selected.
In a specific implementation process of the implementation manner, for any one current blurred image frame, if a first distance between the current blurred image frame and any one first clear image frame reaches a preset distance threshold, the first clear image frame may be used as a second clear image frame for replacing the current blurred image frame.
It is understood that the second clear image frame for replacing the current blurred image frame may include at least one of a first clear image frame before the current blurred image frame and a first clear image frame after the current blurred image frame. In practical application, the second clear image frame which can be used for replacing the current blurred image frame can be selected according to the practical service requirement.
In this way, in the present implementation, by acquiring the first distance between the blurred image frame and the first clear image frame, a first clear image frame whose first distance satisfies the preset distance condition can be taken as the second clear image frame for replacing the blurred image frame. A second clear image frame better suited to replacing the blurred image frame can thus be selected, so that the blurred image frame can multiplex the target detection result of the second clear image frame and no target detection processing needs to be performed on the blurred image frame, improving data processing efficiency while reducing the adverse effect of blurred image frames on the target detection result, thereby further improving the reliability and accuracy of video detection.
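The selection rule above can be sketched as follows, assuming the preset distance condition is "first distance within a preconfigured threshold" (one of the two readings the text allows); the tie-breaking preference for the earlier frame is an illustrative assumption:

```python
def select_replacement(blur_i, clear_indices, max_dist):
    """Return the index of the nearest first clear image frame within
    `max_dist` frames of blurred frame `blur_i` (before or after it),
    or None if no clear frame satisfies the distance condition."""
    candidates = [j for j in clear_indices if abs(j - blur_i) <= max_dist]
    if not candidates:
        return None
    # Nearest frame wins; on a tie, prefer the earlier frame.
    return min(candidates, key=lambda j: (abs(j - blur_i), j))
```

With a fast-moving target (threshold of 2 frames, as in the running-vehicle example), a clear frame 3 frames away would be rejected, while one 1 frame away qualifies.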
In another specific implementation process of the implementation manner, a second distance between the blurred image frame and a second clear image frame replacing the blurred image frame may be further obtained, and further, according to the second distance, an expansion process may be performed on a target detection result of the second clear image frame, so as to obtain a target detection result of the second clear image frame after the expansion process.
In this implementation, the second distance may be a relative distance between the blurred image frame and a second clear image frame replacing the blurred image frame.
In one case of the specific implementation process, the target detection frame in the target detection result of the second clear image frame can be obtained, and then the amplification factor can be determined according to the second distance, so that the target detection frame can be expanded according to the amplification factor.
Specifically, the magnification corresponding to the second distance may be determined according to the second distance and a preset relationship between the second distance and the magnification.
Here, in the preset relationship between the second distance and the magnification, the larger the second distance, the larger the corresponding magnification may be, and the smaller the second distance, the smaller the corresponding magnification may be.
In addition, the magnification factor can be determined according to the motion condition of the target object in the video data to be detected. The faster the movement speed of the target object, the larger the corresponding magnification may be, and the slower the movement speed of the target object, the smaller the corresponding magnification may be.
Optionally, the target detection frame may be further subjected to an equal-scale expansion process according to the magnification.
For example, if the magnification is 4, the length and width of the target detection frame may be simultaneously extended by 4 times.
In this way, by performing expansion processing on the target detection result of the second clear image frame according to the second distance between the blurred image frame and the second clear image frame replacing it, the expanded target detection result of the second clear image frame can be obtained, which effectively prevents missed detections and thereby further improves the reliability and accuracy of video detection.
And the target detection frame in the target detection results of the second clear image frames can be expanded according to the magnification corresponding to the second distance, so that more comprehensive target detection results can be obtained, missed detection is further avoided, and the reliability and accuracy of video detection are further improved.
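The equal-scale expansion of a target detection frame can be sketched as follows. The linear distance-to-magnification mapping in `scale_for_distance` is a hypothetical example of the preset relationship described above (larger second distance, larger magnification); the actual relationship would be preconfigured:

```python
def expand_box(box, scale):
    """Scale a (x1, y1, x2, y2) detection box about its centre by `scale`
    in each dimension, so a magnification of 4 quadruples width and height."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) * scale / 2.0
    half_h = (y2 - y1) * scale / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)

def scale_for_distance(dist, base=1.1, per_frame=0.1):
    """Hypothetical mapping from second distance to magnification:
    the farther the replacement frame, the larger the expansion."""
    return base + per_frame * dist
```

Expanding the box with distance compensates for how far the target may have moved between the second clear image frame and the blurred frame it replaces.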
Optionally, in one possible implementation manner of this embodiment, in 101, each image frame of the video data to be detected may be specifically acquired, and further, according to a preset reduction multiple, a reduction process may be performed on each image frame, so as to obtain the video data after the reduction process.
In one specific implementation of this implementation, first, each image frame of video data to be detected may be acquired. And secondly, carrying out equal-proportion reduction processing on each image frame according to a preset reduction multiple to obtain each image frame after the reduction processing. And obtaining the video data after the reduction processing according to each image frame after the reduction processing.
In another specific implementation of this implementation, first, each image frame of video data to be detected may be acquired. And secondly, carrying out equal-proportion amplification processing on each image frame according to preset amplification factors to obtain each amplified image frame. And obtaining the video data after the amplification processing according to each image frame after the amplification processing.
It will be appreciated that it may be determined to perform a reduction process or an enlargement process on each image frame of the video data according to the requirements of the actual service scenario.
For example, if the actual traffic scenario requires a high speed of video detection, a downscaling process may be performed on each image frame of the video data. If the accuracy and effect requirements of the actual service scene on the video detection are high, each image frame of the video data can be amplified.
Here, the preset reduction magnification and the preset magnification may be both preconfigured according to the video detection history experience.
In this way, in this implementation, the reduced video data can be obtained by performing reduction processing on each image frame of the video data according to the preset reduction multiple, which effectively reduces the amount of data to be processed in subsequent detection and improves the processing speed of video detection.
In addition, the amplified video data can be obtained by amplifying each image frame of the video data according to the preset amplification factor, and as the image characteristic information in the amplified video data is possibly richer and more comprehensive, a more effective target detection result can be obtained when the target detection is carried out on the amplified video data, so that the effect of video detection is improved.
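The equal-proportion reduction or amplification of each image frame can be sketched as follows. The nearest-neighbour interpolation used here is an illustrative choice; a production system would typically use bilinear or better resampling:

```python
import numpy as np

def rescale_frame(gray, factor):
    """Equal-proportion rescale of a 2-D grayscale frame by `factor`
    (factor < 1 reduces for speed, factor > 1 enlarges for detail),
    using nearest-neighbour sampling."""
    h, w = gray.shape
    new_h = max(1, int(round(h * factor)))
    new_w = max(1, int(round(w * factor)))
    # Map each output pixel back to its nearest source pixel.
    rows = np.minimum((np.arange(new_h) / factor).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / factor).astype(int), w - 1)
    return gray[np.ix_(rows, cols)]
```

Applying this with a preconfigured factor to every frame yields the reduced (or amplified) video data that the subsequent classification and detection steps operate on.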
It should be noted that, based on the various specific implementation procedures provided in the present implementation manner, the video detection method of the present embodiment may be implemented in combination with the various specific implementation procedures provided in the foregoing implementation manner. The detailed description may refer to the relevant content in the foregoing implementation, and will not be repeated here.
In this embodiment, the obtained video data to be detected may be subjected to classification processing to obtain a blurred image frame and a first clear image frame in the video data, so that the first clear image frame may be subjected to detection processing to obtain a target detection result of the first clear image frame, so that a second clear image frame for replacing the blurred image frame may be selected from the first clear image frame, and the target detection result of the video data may be determined according to the target detection result of the first clear image frame and the target detection result of the second clear image frame.
In addition, by adopting the technical scheme provided by this embodiment, the video data can be classified by the preset classification model to obtain the blurred image frames and the clear image frames in the video data, improving the accuracy and reliability of classifying the video image frames. It also facilitates the subsequent target detection processing of the blurred and clear image frames, so that the target detection result of the video can be obtained more accurately and effectively, improving the reliability of video detection.
In addition, by adopting the technical scheme provided by the embodiment, the detection processing can be performed on the first clear image frame by utilizing the preset target detection model, so as to obtain the target detection result of the first clear image frame. Therefore, only the object detection processing is needed for the clear image frames, the data processing amount is reduced, and meanwhile, the adverse effect of the blurred image frames on the object detection result is reduced, so that the reliability of video detection is further improved.
In addition, by adopting the technical scheme provided by this embodiment, by acquiring the first distance between the blurred image frame and the first clear image frame, a first clear image frame whose first distance satisfies the preset distance condition can be taken as the second clear image frame for replacing the blurred image frame. A second clear image frame better suited to replacing the blurred image frame can thus be selected, so that the blurred image frame can multiplex the target detection result of the second clear image frame and no target detection processing needs to be performed on the blurred image frame, improving data processing efficiency while reducing the adverse effect of blurred image frames on the target detection result, thereby further improving the reliability and accuracy of video detection.
In addition, by adopting the technical scheme provided by the embodiment, the target detection result of the second clear image frame can be obtained by performing expansion processing on the target detection result of the second clear image frame according to the second distance between the fuzzy image frame and the second clear image frame replacing the fuzzy image frame, and the occurrence of the omission problem can be effectively prevented, so that the reliability and the accuracy of video detection are further improved.
And the target detection frame in the target detection results of the second clear image frames can be expanded according to the magnification corresponding to the second distance, so that more comprehensive target detection results can be obtained, missed detection is further avoided, and the reliability and accuracy of video detection are further improved.
In addition, by adopting the technical scheme provided by the embodiment, each image frame of the video data can be subjected to reduction processing according to the preset reduction multiple, so that the video data after the reduction processing is obtained, and the data volume to be processed in the subsequent detection can be effectively reduced, so that the processing speed of the video detection is improved.
Fig. 2 is a schematic diagram according to a second embodiment of the present disclosure.
In this embodiment, to aid understanding of the video detection method of the present disclosure, the method is now described in detail with reference to an application example.
201. Acquire each image frame of the video data to be detected.
202. Perform equal-proportion reduction or enlargement processing on each image frame of the video data according to a preset reduction factor or a preset enlargement factor, to obtain the reduced or enlarged video data.
In this embodiment, the video data is composed of a sequence of consecutive image frames, and the resolution of each image frame is generally the same, for example 1920 pixels (px) × 1080 px.
Here, the video detection process can be optimized by applying equal-proportion scaling to each image frame of the video data.
For example, a large image frame, such as one of size 1920 px × 1080 px, can be scaled down proportionally to 640 px × 360 px; performing video detection on the reduced image can increase the detection speed by about 800%.
Conversely, an image of size 640 px × 360 px can be scaled up proportionally to 840 px × 472 px; performing video detection on the enlarged image can improve the detection effect by 20%.
It will be appreciated that whether to reduce or enlarge the images in the video data can be decided according to the actual application scenario: if the scenario is more concerned with detection speed, the image frames can be appropriately scaled down in equal proportion; if it is more concerned with detection quality, they can be appropriately scaled up in equal proportion.
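The equal-proportion scaling step above can be sketched as follows. `rescale_dims` is a hypothetical helper introduced only for illustration, and the scaling factors shown are examples, not values prescribed by the disclosure:

```python
def rescale_dims(width, height, factor):
    """Scale both dimensions by the same factor (equal-proportion scaling)."""
    return round(width * factor), round(height * factor)

# Reduction trades some detection quality for speed; enlargement does the reverse.
reduced = rescale_dims(1920, 1080, 1 / 3)   # 1920x1080 -> (640, 360)
enlarged = rescale_dims(640, 360, 1.3125)   # 640x360   -> (840, 472)
```

In practice the actual pixel resampling would be done by an image library; only the equal-proportion dimension arithmetic is shown here.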
203. Classify the reduced or enlarged video data using the preset classification model to obtain the blurred image frames and the first clear image frames in the video data.
204. Detect the first clear image frame using the preset target detection model to obtain the target detection result of the first clear image frame.
In this embodiment, the preset classification model may include, but is not limited to, a lightweight model of the MobileNet series.
For example, the preset classification model may include a MobileNet-V1-0.25 lightweight network model.
In this embodiment, the preset target detection model may include target detection models for different target objects. For example, the preset target detection model may include, but is not limited to, a license plate detection model, a face detection model, and the like.
Since the target detection model performs poorly on blurred image frames (the target generally cannot be detected correctly, and video detection takes longer), only the first clear image frame may be detected here using the preset target detection model.
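The blurred/clear split can be illustrated with a classical sharpness heuristic. Note this is a stand-in: the disclosure uses a trained MobileNet classifier, whereas the sketch below scores frames by the variance of a simple Laplacian response (low variance suggests blur); the threshold is an assumed parameter:

```python
def blur_score(frame):
    """Variance of a 4-neighbour Laplacian response over a grayscale frame
    (a list of pixel rows). Low variance suggests a blurred frame."""
    h, w = len(frame), len(frame[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            responses.append(frame[y - 1][x] + frame[y + 1][x]
                             + frame[y][x - 1] + frame[y][x + 1]
                             - 4 * frame[y][x])
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)

def split_frames(frames, threshold):
    """Return (sharp_indices, blurred_indices) for a list of frames."""
    sharp, blurred = [], []
    for idx, frame in enumerate(frames):
        (sharp if blur_score(frame) >= threshold else blurred).append(idx)
    return sharp, blurred
```

A learned classifier would replace `blur_score` with a forward pass of the preset classification model; the downstream logic is unchanged.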
205. Acquire a first distance between the blurred image frame and the first clear image frame.
206. If the first distance reaches a preset distance threshold, use the first clear image frame as a second clear image frame for replacing the blurred image frame.
In this embodiment, for any current blurred image frame, if the first distance between it and any first clear image frame reaches the preset distance threshold, that first clear image frame may be used as the second clear image frame for replacing the current blurred image frame.
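Steps 205 and 206 can be sketched as follows. The disclosure leaves the distance measure open; the sketch assumes the frame-index gap as the "first distance" and a maximum gap as the preset distance condition, both of which are illustrative assumptions:

```python
def nearest_sharp(blurred_idx, sharp_indices, max_gap):
    """Pick the first clear (sharp) frame closest to the blurred frame,
    using the frame-index gap as the 'first distance' (an assumption).
    Returns None when no sharp frame satisfies the distance condition,
    in which case the blurred frame may simply be discarded."""
    best = min(sharp_indices, key=lambda s: abs(s - blurred_idx))
    return best if abs(best - blurred_idx) <= max_gap else None
```

For example, with sharp frames at indices 2, 7 and 20 and a maximum gap of 3, blurred frame 5 is matched to frame 7, while blurred frame 12 has no suitable substitute.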
Further, first, a second distance between the blurred image frame and the second clear image frame replacing it may be acquired. Second, the magnification corresponding to the second distance is determined according to the second distance and the preset relationship between the second distance and the magnification. Finally, the target detection frame in the target detection result of the second clear image frame is obtained and expanded according to the magnification, yielding the expanded target detection result of the second clear image frame.
In particular, the second distance may be a relative distance between the blurred image frame and a second clear image frame replacing the blurred image frame.
Specifically, in the preset relationship between the second distance and the magnification, the larger the second distance, the larger the corresponding magnification may be, and the smaller the second distance, the smaller the corresponding magnification may be.
In addition, the magnification factor can be determined according to the motion condition of the target object in the video data to be detected. The faster the movement speed of the target object, the larger the corresponding magnification may be, and the slower the movement speed of the target object, the smaller the corresponding magnification may be.
Optionally, the target detection frame may be further subjected to an equal-scale expansion process according to the magnification.
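The distance-dependent expansion can be sketched as follows. The disclosure only states that a larger second distance corresponds to a larger magnification; the linear ramp, its step, and the cap below are illustrative assumptions, as is the center-based box format:

```python
def magnification_for(gap, base=1.0, step=0.25, cap=2.0):
    """Map the second distance to a magnification: the larger the distance,
    the larger the factor (assumed linear ramp with an assumed cap)."""
    return min(base + step * gap, cap)

def expand_box(box, magnification):
    """Equal-scale expansion of a (cx, cy, w, h) detection box about its
    center, compensating for possible target motion between frames."""
    cx, cy, w, h = box
    return (cx, cy, w * magnification, h * magnification)
```

A box of 40 × 20 pixels reused across a gap of two frames would, under these assumed parameters, be expanded to 60 × 30 pixels, reducing the chance that a moving target slips outside the reused detection frame.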
Further, it will be appreciated that if the current blurred image frame does not meet the needs of the actual application scenario, the blurred image frame may be discarded directly.
207. Acquire the target detection result of the second clear image frame, so that the blurred image frame multiplexes the target detection result of the second clear image frame.
In this embodiment, based on the processing in step 204, the target detection results of all clear image frames are available, and the target detection result of the second clear image frame can be obtained from among them.
Here, the target detection result of the second clear image frame may be used directly as the target detection result of the blurred image frame; that is, the blurred image frame multiplexes the target detection result of the second clear image frame.
208. Determine the target detection result of the video data according to the target detection result of the first clear image frame and the target detection result of the second clear image frame.
In this embodiment, the target detection result of the second clear image frame serves as the target detection result of the blurred image frame.
Here, the target detection result of the video data can be obtained by combining the target detection result of the first clear image frame with that of the blurred image frame.
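Steps 207 and 208 together amount to assembling per-frame results, which can be sketched as follows; `assemble_results` is a hypothetical helper, and the empty-list fallback for blurred frames without a substitute corresponds to discarding them:

```python
def assemble_results(sharp_results, blurred_to_sharp, n_frames):
    """Per-frame target detection results for the whole video: clear frames
    keep their own results; each blurred frame multiplexes (reuses) the
    result of the second clear frame chosen to replace it."""
    results = {}
    for idx in range(n_frames):
        if idx in sharp_results:
            results[idx] = sharp_results[idx]
        elif idx in blurred_to_sharp:
            results[idx] = sharp_results[blurred_to_sharp[idx]]
        else:
            results[idx] = []  # no suitable clear frame: frame is discarded
    return results
```

For example, with detections on clear frames 0 and 3 and blurred frames 1 and 2 mapped to them respectively, every frame of a five-frame clip receives a result without the detector ever running on a blurred frame.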
With the technical solution provided by this embodiment, the blurred image frames and the clear image frames in the video data are first separated, and target detection processing is performed only on the clear image frames; each blurred image frame simply multiplexes the target detection result of an adjacent clear image frame before or after it. Combining the target detection results of the blurred image frames with those of the other clear image frames then yields a more effective video detection result, avoids the adverse effect of blurred image frames on the video detection result, and improves the reliability of video detection.
Moreover, with the technical solution provided by this embodiment, the detection effect on blurred image frames in the video is improved by 90%, and the detection effect on the video as a whole is improved by 10%, with the video detection speed unchanged.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present disclosure is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present disclosure. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all of the preferred embodiments, and that the acts and modules referred to are not necessarily required by the present disclosure.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to related descriptions of other embodiments.
Fig. 3 is a schematic diagram according to a third embodiment of the present disclosure. As shown in fig. 3, the apparatus 300 for video detection of this embodiment may include an acquisition unit 301, a classification unit 302, a detection unit 303 and a determination unit 304. The acquisition unit 301 is configured to acquire video data to be detected; the classification unit 302 is configured to classify the video data to obtain a blurred image frame and a first clear image frame in the video data; the detection unit 303 is configured to detect the first clear image frame to obtain a target detection result of the first clear image frame; and the determination unit 304 is configured to select, from the first clear image frames, a second clear image frame for replacing the blurred image frame, so as to determine the target detection result of the video data according to the target detection result of the first clear image frame and the target detection result of the second clear image frame.
It should be noted that, part or all of the video detection apparatus in this embodiment may be an application located at a local terminal, or may also be a functional unit such as a plug-in unit or a software development kit (Software Development Kit, SDK) disposed in the application located at the local terminal, or may also be a processing engine located in a server on a network side, or may also be a distributed system located on the network side, for example, a processing engine or a distributed system in a video detection platform on the network side, which is not limited in this embodiment.
It will be appreciated that the application may be a native program (native app) installed on the native terminal, or may also be a web page program (webApp) of a browser on the native terminal, which is not limited in this embodiment.
Optionally, in one possible implementation manner of this embodiment, the classifying unit 302 may specifically be configured to perform a classification process on the video data using a preset classification model, to obtain a classification process result, and obtain the blurred image frame and the first sharp image frame in the video data according to the classification process result.
Optionally, in one possible implementation manner of this embodiment, the detecting unit 303 may be specifically configured to perform detection processing on the first clear image frame by using a preset target detection model, to obtain a result of the detection processing, and obtain a target detection result of the first clear image frame according to the result of the detection processing.
Alternatively, in one possible implementation manner of this embodiment, the determining unit 304 may be specifically configured to obtain a first distance between the blurred image frame and the first clear image frame, and in response to the first distance meeting a preset distance condition, take the first clear image frame as a second clear image frame for replacing the blurred image frame.
Optionally, in one possible implementation manner of this embodiment, the determining unit 304 may be further configured to obtain a second distance between the blurred image frame and a second clear image frame that replaces the blurred image frame, and perform expansion processing on the target detection result of the second clear image frame according to the second distance, so as to obtain the target detection result of the expanded second clear image frame.
Optionally, in one possible implementation manner of this embodiment, the determining unit 304 may be further configured to obtain a target detection frame in the target detection result of the second clear image frame, determine a magnification according to the second distance, and perform expansion processing on the target detection frame according to the magnification.
Optionally, in one possible implementation manner of this embodiment, the acquiring unit 301 may be specifically configured to acquire each image frame of video data to be detected, and perform reduction processing on each image frame according to a preset reduction multiple, so as to obtain the video data after the reduction processing.
In this embodiment, the acquisition unit acquires the video data to be detected, the classification unit classifies the video data to obtain the blurred image frame and the first clear image frame in the video data, and the detection unit detects the first clear image frame to obtain the target detection result of the first clear image frame, so that the determination unit can select, from the first clear image frames, a second clear image frame for replacing the blurred image frame, and determine the target detection result of the video data according to the target detection result of the first clear image frame and the target detection result of the second clear image frame.
In addition, with the technical solution provided by this embodiment, the video data can be classified using the preset classification model to obtain the blurred image frames and the clear image frames in the video data, improving the accuracy and reliability of video image frame classification. This also facilitates the subsequent target detection processing of the blurred and clear image frames, so that the target detection result of the video can be obtained more accurately and effectively, improving the reliability of video detection.
In addition, with the technical solution provided by this embodiment, the first clear image frame can be detected using the preset target detection model to obtain the target detection result of the first clear image frame. Thus only the clear image frames need target detection processing, which reduces the amount of data to be processed and the adverse effect of blurred image frames on the target detection result, further improving the reliability of video detection.
In addition, with the technical solution provided by this embodiment, by acquiring the first distance between the blurred image frame and the first clear image frame, a first clear image frame whose first distance meets the preset distance condition can be used as the second clear image frame for replacing the blurred image frame. A second clear image frame better suited to replacing the blurred image frame can thus be selected, so that the blurred image frame can multiplex the target detection result of the second clear image frame without itself undergoing target detection processing. This improves data processing efficiency, reduces the adverse effect of blurred image frames on the target detection result, and further improves the reliability and accuracy of video detection.
In addition, with the technical solution provided by this embodiment, the target detection result of the second clear image frame can be expanded according to the second distance between the blurred image frame and the second clear image frame replacing it, yielding the expanded target detection result of the second clear image frame. This effectively prevents missed detections and further improves the reliability and accuracy of video detection.
Moreover, the target detection frame in the target detection result of the second clear image frame can be expanded according to the magnification corresponding to the second distance, so that a more comprehensive target detection result is obtained, missed detections are further avoided, and the reliability and accuracy of video detection are further improved.
In addition, with the technical solution provided by this embodiment, each image frame of the video data can be reduced according to a preset reduction factor to obtain the reduced video data, which effectively decreases the amount of data to be processed in subsequent detection and thus increases the processing speed of video detection.
In the technical solution of the present disclosure, the handling of the user's personal information, such as the collection, storage, use, processing, transmission, provision and disclosure of the user's images, attribute data and the like, complies with the provisions of relevant laws and regulations and does not violate public order and good morals.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 4 illustrates a schematic block diagram of an example electronic device 400 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 4, the electronic device 400 includes a computing unit 401 that can perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 402 or a computer program loaded from a storage unit 408 into a Random Access Memory (RAM) 403. In the RAM 403, various programs and data required for the operation of the electronic device 400 may also be stored. The computing unit 401, ROM 402, and RAM 403 are connected to each other by a bus 404. An input/output (I/O) interface 405 is also connected to bus 404.
Various components in electronic device 400 are connected to I/O interface 405, including: an input unit 406 such as a keyboard, a mouse, etc.; an output unit 407 such as various types of displays, speakers, and the like; a storage unit 408, such as a magnetic disk, optical disk, etc.; and a communication unit 409 such as a network card, modem, wireless communication transceiver, etc. The communication unit 409 allows the electronic device 400 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 401 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 401 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 401 performs the respective methods and processes described above, such as a method of video detection. For example, in some embodiments, the method of video detection may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 408. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 400 via the ROM 402 and/or the communication unit 409. When a computer program is loaded into RAM 403 and executed by computing unit 401, one or more steps of the method of video detection described above may be performed. Alternatively, in other embodiments, the computing unit 401 may be configured to perform the method of video detection by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs, the one or more computer programs may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (17)

1. A method of video detection, comprising:
acquiring video data to be detected;
classifying the video data to obtain a blurred image frame and a first clear image frame in the video data;
detecting the first clear image frame to obtain a target detection result of the first clear image frame;
and selecting a second clear image frame used for replacing the fuzzy image frame from the first clear image frame so as to determine the target detection result of the video data according to the target detection result of the first clear image frame and the target detection result of the second clear image frame.
2. The method of claim 1, wherein the classifying the video data to obtain blurred image frames and first sharp image frames in the video data comprises:
classifying the video data by using a preset classification model to obtain a classification processing result;
and obtaining the blurred image frame and the first clear image frame in the video data according to the classification processing result.
3. The method according to claim 1 or 2, wherein the detecting the first clear image frame to obtain a target detection result of the first clear image frame includes:
detecting the first clear image frame by using a preset target detection model to obtain a detection processing result;
and obtaining a target detection result of the first clear image frame according to the detection processing result.
4. A method according to any of claims 1-3, wherein said selecting a second clear image frame from said first clear image frames for replacing said blurred image frame comprises:
acquiring a first distance between the blurred image frame and the first clear image frame;
And responding to the first distance meeting a preset distance condition, and taking the first clear image frame as a second clear image frame for replacing the blurred image frame.
5. The method of claim 4, wherein the method further comprises:
acquiring a second distance between the blurred image frame and a second clear image frame replacing the blurred image frame;
and according to the second distance, performing expansion processing on the target detection result of the second clear image frame to obtain the target detection result of the second clear image frame after expansion processing.
6. The method of claim 5, wherein the expanding the object detection result of the second distinct image frame according to the second distance includes:
acquiring a target detection frame in a target detection result of the second clear image frame;
determining the magnification factor according to the second distance;
and according to the magnification, performing expansion processing on the target detection frame.
7. The method of any of claims 1-6, wherein the acquiring video data to be detected comprises:
acquiring each image frame of video data to be detected;
And carrying out reduction processing on each image frame according to a preset reduction multiple so as to obtain the video data after the reduction processing.
8. An apparatus for video detection, comprising:
an acquisition unit configured to acquire video data to be detected;
the classification unit is used for performing classification processing on the video data to obtain a blurred image frame and a first clear image frame in the video data;
the detection unit is used for carrying out detection processing on the first clear image frame so as to obtain a target detection result of the first clear image frame;
and the determining unit is used for selecting a second clear image frame used for replacing the fuzzy image frame from the first clear image frame so as to determine the target detection result of the video data according to the target detection result of the first clear image frame and the target detection result of the second clear image frame.
9. The device according to claim 8, wherein the classification unit is specifically configured to:
classifying the video data by using a preset classification model to obtain a classification processing result;
and obtaining the blurred image frame and the first clear image frame in the video data according to the classification processing result.
10. The device according to claim 8 or 9, wherein the detection unit is specifically configured to:
detecting the first clear image frame by using a preset target detection model to obtain a detection processing result;
and obtaining a target detection result of the first clear image frame according to the detection processing result.
11. The apparatus according to any one of claims 8-10, wherein the determining unit is specifically configured to:
acquiring a first distance between the blurred image frame and the first clear image frame;
and in response to the first distance meeting a preset distance condition, taking the first clear image frame as the second clear image frame for replacing the blurred image frame.
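The "preset distance condition" in claim 11 is not defined in the claims. The sketch below assumes, for illustration only, that the condition is a maximum frame-index gap and that the nearest qualifying clear frame is chosen:

```python
def pick_replacement(blur_idx, clear_indices, max_distance=5):
    """Return the index of the nearest clear frame within `max_distance`
    of the blurred frame, or None if no clear frame qualifies."""
    candidates = [i for i in clear_indices if abs(i - blur_idx) <= max_distance]
    if not candidates:
        return None
    return min(candidates, key=lambda i: abs(i - blur_idx))
```

Blurred frames with no qualifying clear neighbour would simply contribute no detection result, which the enclosing pipeline must tolerate.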
12. The apparatus of claim 11, wherein the determining unit is further configured to:
acquiring a second distance between the blurred image frame and a second clear image frame replacing the blurred image frame;
and according to the second distance, performing expansion processing on the target detection result of the second clear image frame to obtain the target detection result of the second clear image frame after expansion processing.
13. The apparatus of claim 12, wherein the determining unit is further configured to:
acquiring a target detection frame in the target detection result of the second clear image frame;
determining the magnification factor according to the second distance;
and performing expansion processing on the target detection frame according to the magnification factor.
14. The apparatus according to any one of claims 8-13, wherein the acquisition unit is specifically configured to:
acquiring each image frame of video data to be detected;
and performing reduction processing on each image frame according to a preset reduction factor, so as to obtain the video data after the reduction processing.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-7.
17. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-7.
CN202310250616.3A 2023-03-15 2023-03-15 Video detection method, device, electronic equipment and storage medium Pending CN116503640A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310250616.3A CN116503640A (en) 2023-03-15 2023-03-15 Video detection method, device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116503640A true CN116503640A (en) 2023-07-28

Family

ID=87325562

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310250616.3A Pending CN116503640A (en) 2023-03-15 2023-03-15 Video detection method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116503640A (en)

Similar Documents

Publication Publication Date Title
CN113436100B (en) Method, apparatus, device, medium, and article for repairing video
CN112488060B (en) Target detection method, device, equipment and medium
CN112597837A (en) Image detection method, apparatus, device, storage medium and computer program product
CN112528858A (en) Training method, device, equipment, medium and product of human body posture estimation model
CN114004840A (en) Image processing method, training method, detection method, device, equipment and medium
CN113643260A (en) Method, apparatus, device, medium and product for detecting image quality
CN113326766A (en) Training method and device of text detection model and text detection method and device
CN111932530A (en) Three-dimensional object detection method, device and equipment and readable storage medium
CN114724144B (en) Text recognition method, training device, training equipment and training medium for model
CN112861811B (en) Target identification method, device, equipment, storage medium and radar
CN115937039A (en) Data expansion method and device, electronic equipment and readable storage medium
CN113780297B (en) Image processing method, device, equipment and storage medium
CN113627526B (en) Vehicle identification recognition method and device, electronic equipment and medium
CN114821596A (en) Text recognition method and device, electronic equipment and medium
CN116503640A (en) Video detection method, device, electronic equipment and storage medium
CN115101069A (en) Voice control method, device, equipment, storage medium and program product
CN110634155A (en) Target detection method and device based on deep learning
CN111967299B (en) Unmanned aerial vehicle inspection method, unmanned aerial vehicle inspection device, unmanned aerial vehicle inspection equipment and storage medium
CN112541934B (en) Image processing method and device
CN114119990A (en) Method, apparatus and computer program product for image feature point matching
CN113379750A (en) Semi-supervised learning method of semantic segmentation model, related device and product
CN113570607B (en) Target segmentation method and device and electronic equipment
CN113657209B (en) Action recognition method, device, electronic equipment and storage medium
CN112700657B (en) Method and device for generating detection information, road side equipment and cloud control platform
CN114219744B (en) Image generation method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination