CN113792697A - Target detection method and device, electronic equipment and readable storage medium - Google Patents

Target detection method and device, electronic equipment and readable storage medium

Info

Publication number
CN113792697A
CN113792697A
Authority
CN
China
Prior art keywords
detection
frame
detection frame
target
frames
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111117096.6A
Other languages
Chinese (zh)
Other versions
CN113792697B (en)
Inventor
卢欣龙
易曌平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Unisinsight Technology Co Ltd
Original Assignee
Chongqing Unisinsight Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Unisinsight Technology Co Ltd filed Critical Chongqing Unisinsight Technology Co Ltd
Priority to CN202111117096.6A
Publication of CN113792697A
Application granted
Publication of CN113792697B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention provides a target detection method and device, an electronic device, and a readable storage medium, wherein the method comprises: determining, from a video frame sequence, a plurality of detection frames and detection information of a target object in the detection frames, the detection frames representing the video frames on which target detection processing is performed, with all detection frames satisfying a preset frame interval; and, for any non-detection frame in which the target object exists, predicting coordinate information of the target object in the non-detection frame according to the detection information corresponding to the determined target detection frames, the target detection frames being at least one detection frame located before the non-detection frame together with the detection frame located after the non-detection frame and closest to it. The invention effectively solves problems such as discontinuous target tracking frames, unsmooth trajectories, and incomplete privacy occlusion.

Description

Target detection method and device, electronic equipment and readable storage medium
Technical Field
The invention relates to the field of computer imaging, and in particular to a target detection method and device, an electronic device, and a readable storage medium.
Background
With the rapid development of AI technology, target detection is increasingly applied in the security monitoring field. Through target detection, key monitoring targets such as people and vehicles in a video can be quickly identified, enabling structured analysis, perimeter protection, pedestrian and vehicle flow density analysis, real-time display of a tracking frame over a target, or privacy processing such as mosaic occlusion of a privacy target.
At present, in order to detect targets in multiple videos simultaneously, a low-frame-rate analysis and detection scheme is generally adopted. However, the existing low-frame-rate scheme reuses the same detection result across frames for occlusion or blurring, cannot process every frame completely in real time, and locates the processing position inaccurately, resulting in a poor target tracking effect.
Disclosure of Invention
One objective of the present invention is to provide a target detection method and device, an electronic device, and a readable storage medium for improving the target tracking effect and thereby achieving accurate processing of a target.
The technical scheme of the invention can be realized as follows:
in a first aspect, the present invention provides a method for detecting an object, the method comprising: determining a plurality of detection frames and detection information of a target object in the detection frames from a video frame sequence; wherein the detection frame represents a video frame used for target detection processing; all the detection frames meet a preset frame interval; for any non-detection frame with the target object, predicting coordinate information of the target object in the non-detection frame according to the detection information corresponding to the determined target detection frame; wherein the target detection frame is at least one of the detection frames located before the non-detection frame and one of the detection frames located after the non-detection frame and closest to the non-detection frame.
In a second aspect, the present invention provides a target detection apparatus, comprising: a determining module, configured to determine, from a video frame sequence, a plurality of detection frames and detection information of a target object in the detection frames, wherein the detection frames represent video frames used for target detection processing and all the detection frames satisfy a preset frame interval; and a prediction module, configured to predict, for any non-detection frame in which the target object exists, coordinate information of the target object in the non-detection frame according to the detection information corresponding to the determined target detection frames, wherein the target detection frames are at least one detection frame located before the non-detection frame and one detection frame located after the non-detection frame and closest to it.
In a third aspect, the present invention provides an electronic device comprising a processor and a memory, wherein the memory stores a computer program executable by the processor, and the processor can execute the computer program to implement the object detection method of the first aspect.
In a fourth aspect, the present invention provides a readable storage medium having stored thereon a computer program which, when executed by a processor, implements the object detection method of the first aspect.
The invention provides a target detection method and device, an electronic device, and a readable storage medium, wherein the method comprises: determining, from a video frame sequence, a plurality of detection frames and detection information of a target object in the detection frames, wherein the detection frames represent video frames used for target detection processing and all the detection frames satisfy a preset frame interval; and, for any non-detection frame in which the target object exists, predicting coordinate information of the target object in the non-detection frame according to the detection information corresponding to the determined target detection frames, wherein the target detection frames are at least one detection frame located before the non-detection frame and one detection frame located after the non-detection frame and closest to it. Under low-frame-rate target detection, the target detection method provided by the embodiment of the invention predicts the target result of a non-detection frame from the detection information of the detection frames by a prediction algorithm. This differs from the prior art, which reuses the detection result of a detection frame for the business processing of non-detection frames and therefore tends to process targets poorly and inaccurately.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
FIG. 1 is an application environment for object detection;
FIG. 2 is a schematic flow chart of a target detection method according to an embodiment of the present invention;
fig. 3 is a schematic view of a scenario provided by an embodiment of the present invention;
FIG. 4 is a schematic flow chart diagram of another method for detecting an object according to an embodiment of the present invention;
fig. 5 is a schematic flow chart of an implementation manner of step S204 according to an embodiment of the present invention;
fig. 6 is a schematic flow chart of another implementation manner of step S204 according to the embodiment of the present invention;
FIG. 7 is a functional block diagram of an object detection apparatus according to an embodiment of the present invention;
fig. 8 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
In the description of the present invention, it should be noted that terms such as "upper", "lower", "inside", and "outside", if used, indicate an orientation or positional relationship based on that shown in the drawings or on how the product of the invention is normally used. They are used only for convenience and simplicity of description, do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation, and thus should not be construed as limiting the present invention.
Furthermore, the appearances of the terms "first," "second," and the like, if any, are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
It should be noted that the features of the embodiments of the present invention may be combined with each other without conflict.
With the rapid development of AI technology, target detection is increasingly applied in the security monitoring field. Through target detection, key monitoring targets such as people and vehicles in a video can be quickly identified, enabling structured analysis, perimeter protection, pedestrian and vehicle flow density analysis, real-time display of a tracking frame over a target, or privacy processing such as mosaic occlusion of a privacy target.
Referring to fig. 1, fig. 1 shows an application environment for target detection, which may include a network 10, a backend device 20, and at least one camera device 30.
Wherein: network 10 may include various types of connections, such as wire, wireless communication links, or fiber optic cables, to name a few.
The backend device 20 may be, but is not limited to, a streaming media server with AI capability or a server cluster composed of a plurality of such servers; alternatively, the backend device 20 may be a Digital Video Recorder (DVR) or a Network Video Recorder (NVR). The target detection method provided by the embodiment of the present invention may be applied to the backend device 20.
The camera device 30 may be, but is not limited to, a general audio-video camera or a network camera (IPC, IP Camera) that provides captured video frames to the backend device 20, and may be deployed in a variety of public places such as banks, supermarkets, companies, and airports.
In the embodiment of the present invention, after the image capturing device 30 transmits the acquired image to the backend device 20 through the network 10, the backend device 20 executes the target detection method provided in the embodiment of the present invention according to the acquired image.
It should be noted that the embodiment of the present invention is applicable not only to monitoring scenes but also to executing the target detection method on any video acquired by an image acquisition device.
Continuing with the application environment shown in fig. 1 as an example: in the prior art, in order to perform target detection on multiple videos simultaneously, the backend device 20 generally adopts a low-frame-rate analysis and detection scheme. For example, the original frame rate of the video is 25 fps (frames per second) while the frame rate for target detection is 8 fps, which saves hardware computing power and allows multiple channels to be detected simultaneously.
A lower target analysis frame rate can satisfy ordinary monitoring requirements and has little influence on applications such as structured analysis and perimeter protection. For real-time tracking applications (e.g. tracking frame display and privacy occlusion), however, low-frame-rate analysis and detection degrades the application effect.
For example, when the original frame rate is 25 fps and the target detection frame rate is 8 fps, only 1 of every 3 frames undergoes target detection, and all 3 frames are displayed and processed by reusing the same result. For real-time tracking frame display, the end result is that people and vehicles move smoothly but the tracking frame does not. For mosaic occlusion of people and vehicles, the target cannot be completely occluded in real time: in frames without target detection, the occlusion position is inaccurate.
In order to solve this problem, the current solution generally adopts a dynamic detection frame rate, configuring the frame rate dynamically according to the detection capability and the number of detection channels.
For example, suppose the total detection capability of the backend device is 1080P@25fps. When only 1 channel of target detection is enabled, detection runs at 25 fps; with 2 channels, at 12 fps; with 3 channels, at 8 fps. This balances the contradiction between detection effect and computing-power limits to some extent, but still does not solve the problem of inaccurate processing positions.
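The dynamic frame-rate configuration above amounts to dividing the device's total detection capability evenly across the enabled channels. A minimal sketch, assuming integer division as the rounding rule (the function name is a hypothetical helper; the patent does not specify how fractional rates are rounded):

```python
def detection_frame_rate(total_fps: int, num_channels: int) -> int:
    """Split the device's total detection capability across channels.

    Integer division reproduces the 25/12/8 fps progression described
    in the text for 1, 2, and 3 channels at a 25 fps total capability.
    """
    if num_channels < 1:
        raise ValueError("at least one channel is required")
    return total_fps // num_channels
```

With a 1080P@25fps capability, this yields 25, 12, and 8 fps for 1, 2, and 3 channels, matching the example in the text.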
The related art also provides trajectory prediction algorithms that infer future motion trajectories from historical trajectories. However, the predicted trajectories generally deviate somewhat from the actual ones. Such algorithms are usually applied in scenarios such as autonomous driving and automated logistics delivery robots, which need a rough future route of people and vehicles in advance, and are not suitable for scenarios with relatively fine requirements on target coordinates, such as target tracking, trajectory restoration, and privacy occlusion.
In order to solve the above technical problems, embodiments of the present invention provide a target detection method in which, under low-frame-rate target detection, the detection result of a non-detection frame is predicted by a prediction algorithm, and the non-detection frame is then processed correspondingly based on the predicted result. This effectively improves the application effect of low-frame-rate analysis and detection and solves problems such as discontinuous tracking frames for people and vehicles, unsmooth trajectories, and incomplete privacy occlusion.
Referring to fig. 2, fig. 2 is a schematic flowchart of a target detection method according to an embodiment of the present invention, where the method includes:
s201, determining a plurality of detection frames and detection information of the target object in the detection frames from the video frame sequence.
The detection frames represent video frames used for target detection processing, and a preset frame interval is satisfied between detection frames. In the video frame sequence, the remaining video frames other than the detection frames are regarded as non-detection frames, i.e. frames requiring no target detection processing, which realizes the effect of low-frame-rate analysis and detection.
In some possible embodiments, the preset frame interval may be customized according to the actual low-frame-rate analysis requirement. Taking a preset frame interval of 3 frames as an example, assume that the video frame sequence is: F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12, F13, F14, F15, F16, F17, F18, F19, F20, F21, F22. The detection frames then include: F1, F4, F7, F10, F13, F16, F19, F22.
In other possible embodiments, the preset frame interval may also be dynamically configured according to the detection capability and the number of detection paths of the backend device.
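The selection of detection frames at a preset frame interval can be sketched as follows (a hypothetical helper for illustration; 1-based frame indices match the F1...F22 example above):

```python
def select_detection_frames(num_frames: int, interval: int) -> list:
    """Return 1-based indices of detection frames: the first frame,
    then every `interval`-th frame after it."""
    return list(range(1, num_frames + 1, interval))
```

For 22 frames and an interval of 3, this reproduces the set F1, F4, F7, F10, F13, F16, F19, F22 from the example.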
In this embodiment, the target object may be, but is not limited to, a person, a vehicle, or an animal, and the detection information may include, but is not limited to, coordinate information, type information, a unique ID, and a timestamp of the target object. In the implementation process, target detection may be performed on one target object in a detection frame according to the service requirement, or on all target objects in the detection frame.
S204, for any non-detection frame in which the target object exists, predicting the coordinate information of the target object in the non-detection frame according to the detection information corresponding to the determined target detection frames.
The target detection frame is at least one detection frame located before the non-detection frame, and one detection frame located after the non-detection frame and closest to the non-detection frame.
In step S204, each detection frame corresponds to detection information. Therefore, for any non-detection frame, the target detection frames used for predicting its coordinate information can be determined according to the order of the video frame sequence, and the coordinate information of the target object in the non-detection frame can then be predicted from the detection information of the target object in those target detection frames.
It is understood that the predicted coordinate information can be used to guide corresponding business processing of the target object in the non-detection frame, such as overlaying a tracking frame on a person or vehicle, or applying a mosaic or blur.
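As one illustration of such business processing, a mosaic can be applied at the predicted coordinates. The sketch below pixelates a rectangular region of a grayscale frame stored as a list of rows; the function name, block size, and frame representation are assumptions for illustration, not the patent's implementation:

```python
def mosaic(frame, box, block=2):
    """Pixelate the region box = (x1, y1, x2, y2) of a 2-D grayscale
    frame (list of pixel rows) by replacing every block x block tile
    with the value of its top-left pixel."""
    x1, y1, x2, y2 = box
    for y in range(y1, y2, block):
        for x in range(x1, x2, block):
            v = frame[y][x]  # representative pixel for this tile
            for yy in range(y, min(y + block, y2)):
                for xx in range(x, min(x + block, x2)):
                    frame[yy][xx] = v
    return frame
```

In practice the box would be the coordinate information predicted for the non-detection frame, so the occlusion follows the target even on frames that were never detected.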
Under low-frame-rate target detection, the target detection method provided by the embodiment of the invention predicts the target result of a non-detection frame from the detection information of the detection frames by a prediction algorithm. This differs from the prior art, which reuses the detection result of a detection frame for the business processing of non-detection frames and therefore tends to process targets poorly and inaccurately.
To facilitate understanding of the above implementation flow, please refer to a scene diagram shown in fig. 3, which shows a group of video frame sequences, and it is understood that the video frame sequences may be sequences composed of video frames decoded by the backend device every second.
For example, assuming the backend device decodes video at 22 fps, the video frame sequence contains 22 images per second. According to the low-frame-rate detection capability supported by the backend device, a preset frame interval can be defined to determine a plurality of detection frames from the 22 frames of images.
As shown in fig. 3, assume that the video frame sequence is: F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12, F13, F14, F15, F16, F17, F18, F19, F20, F21, F22. The detection frames include: F1, F4, F7, F10, F13, F16, F19, F22; the other video frames in the sequence are non-detection frames. The detection information corresponding to each detection frame may be denoted as A.
In one scenario, assume the non-detection frame is F2 and the target detection frames corresponding to F2 are determined as F1 and F4, with detection information A1 corresponding to F1 and A4 corresponding to F4. The detection information A2 corresponding to F2 can then be predicted from A1 and A4.
In another scenario, assume the non-detection frame is F8; three detection frames F1, F4, and F7 exist before F8. In one implementation, if the target detection frames corresponding to F8 are determined as F7 and F10, the detection information A8 corresponding to F8 can be predicted from A7 and A10. In another implementation, if the target detection frames corresponding to F8 are F1, F4, F7, and F10, A8 can be predicted from A1, A4, A7, and A10.
It is understood that when a plurality of target detection frames are employed to predict detection information of a non-detection frame, the accuracy of the prediction result can be improved.
In a specific implementation process, for step S201, a plurality of detection frames may be determined in the video frame sequence according to a preset frame interval, and then target detection processing is performed on the plurality of detection frames to obtain detection information.
For example, after each video frame is decoded, whether the current video frame is a detection frame is determined according to the preset frame interval. If it is a detection frame, target detection is performed on it, the detection result is cached into the detection result sequence L(A), and the current video frame is cached into the video frame sequence L(F). During prediction, detection results can be read directly from L(A).
For example, with continued reference to fig. 3, the backend device receives video frames from the image capturing device in sequence, determines the first received frame F1 as a detection frame, performs target detection on F1 to obtain A1, caches A1 into L(A), and caches F1 into the video frame sequence L(F).
Further, for each subsequently received video frame, it is determined whether the preset frame interval is satisfied between the current video frame and the last detection frame in L(F). For example, with a preset frame interval of 3 frames: when the current video frame is F2, the interval between F2 and F1 is not satisfied, so F2 is cached into L(F) and the judgment continues with subsequent frames; when the current video frame is F4, the interval between F4 and F1 is satisfied, so F4 is determined as a detection frame, target detection is performed on F4 to obtain A4, A4 is cached into L(A), and F4 is cached into L(F). Detection frames among the subsequent video frames are determined in the same way.
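The buffering logic above — run detection when the preset interval since the last detection frame is reached, otherwise only cache — can be sketched as follows (L(F) and L(A) are modeled as plain lists, and `detect` is a stand-in for the real detector; all names are illustrative assumptions):

```python
def process_stream(frames, interval, detect):
    """Cache every frame into L(F); run `detect` and cache the result
    into L(A) only for frames that satisfy the preset frame interval
    relative to the last detection frame."""
    lf, la = [], []  # L(F): cached frames, L(A): cached detection results
    last_detect_idx = None
    for idx, frame in enumerate(frames, start=1):
        if last_detect_idx is None or idx - last_detect_idx >= interval:
            la.append(detect(frame))  # this is a detection frame
            last_detect_idx = idx
        lf.append(frame)  # every frame is cached for later prediction
    return lf, la
```

With frames F1...F10 and an interval of 3, detection runs exactly on F1, F4, F7, and F10, matching the example.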
In a specific implementation process, continuing with fig. 3 as an example, the implementation process may be executed in the backend device according to the following steps:
step 1, the backend device receives F1, determines F1 as a detection frame, performs target detection on F1 to obtain a detection result a1, buffers F1 into l (F), and buffers a1 into l (a).
And step 2, sequentially receiving F2 and F3, determining that F2 and F3 are non-detection frames, and buffering F2 and F3 to a queue L (F).
Step 3, receiving F4, determining F4 as a detection frame, performing target detection on F4 to obtain a detection result a4, caching F4 into l (F), and caching a4 into l (a).
Step 4: query the detection result queue L(A) and find that one detection result A1 exists; then, from the coordinate change of the corresponding target (the target with the same ID) between A4 and A1, calculate the motion vector V1 of the target per unit time between F1 and F4: V1 = (A4 - A1)/(T(F4) - T(F1)), where A4 and A1 respectively represent the coordinate information of the target in F4 and F1, and T(F4) and T(F1) respectively represent the capture times of the two frames.
Step 5: query the frame queue L(F) and find that non-detection frames F2 and F3 exist; calculate the time difference Δt1 of F2 relative to F1 and the time difference Δt2 of F3 relative to F1, and obtain the predicted coordinates of the target: A2 = Δt1 × V1 + A1 and A3 = Δt2 × V1 + A1.
Step 6: repeat steps 1-5 for subsequent video frames until all non-detection frames have been predicted.
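Steps 4 and 5 are a linear interpolation between two detection results. The sketch below works through them with 2-D coordinates and millisecond timestamps (the concrete numbers are illustrative assumptions, roughly 40 ms per frame at 25 fps; the function name is hypothetical):

```python
def predict_positions(a1, t1, a4, t4, non_detection):
    """Steps 4-5 from the text: compute the per-unit-time motion vector
    V1 = (A4 - A1)/(T(F4) - T(F1)), then predict A = A1 + Δt × V1 for
    each non-detection frame. a1/a4 are (x, y) coordinates, t1/t4 are
    timestamps, and non_detection maps frame name -> timestamp."""
    v = ((a4[0] - a1[0]) / (t4 - t1),
         (a4[1] - a1[1]) / (t4 - t1))  # motion vector per unit time
    preds = {}
    for name, t in non_detection.items():
        dt = t - t1  # time difference relative to the earlier detection frame
        preds[name] = (a1[0] + v[0] * dt, a1[1] + v[1] * dt)
    return preds
```

Because the interpolation uses a detection frame on each side of the gap, the predicted boxes stay anchored to real detections rather than drifting, which is what distinguishes this scheme from pure forward trajectory extrapolation.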
Through the above process, the coordinate information of the target object in each non-detection frame can be predicted, business processing can be performed on the target object in each video frame according to the corresponding coordinate position, and the processed YUV data can be re-encoded and distributed.
In some possible embodiments, after step 6, in order to reduce the buffer pressure of the backend device, the video frames F1, F2, F3 that have undergone traffic processing may also be deleted from l (F).
Optionally, with continued reference to fig. 3, for F2 and F3 there is only one detection frame before them, so F2 and F3 have only 2 corresponding target detection frames, namely F1 and F4. For subsequent non-detection frames (e.g. F5 and F6), at least 2 detection frames exist before them, so their corresponding target detection frames can number more than 2; using more than 2 target detection frames correspondingly improves the prediction accuracy.
Therefore, this embodiment further provides an implementation manner for determining a target detection frame corresponding to a non-detection frame, please refer to fig. 4, and fig. 4 is a schematic flowchart of another target detection method provided in this embodiment of the present invention.
S202, if only one detection frame exists before the non-detection frame, determining that detection frame, together with the detection frame located after the non-detection frame and closest to it, as the target detection frames.
S203, if a plurality of detection frames exist before the non-detection frame, determining a preset number of detection frames located before the non-detection frame, or the detection frame located before the non-detection frame and closest to it together with the detection frame located after the non-detection frame and closest to it, as the target detection frames.
For ease of understanding, continue to refer to fig. 3. For the non-detection frames F2 and F3, only one detection frame F1 exists before them, and the detection frame located after and closest to F2 and F3 is F4, so the target detection frames corresponding to F2 and F3 are only F1 and F4. For the non-detection frame F8, the detection frames existing before it are F1, F4, and F7, and the detection frame located after it and closest to it is F10; the target detection frames corresponding to F8 may then be determined as follows:
in one implementation, F7 and F10 may be determined to be target detection frames corresponding to F8.
In another implementation, at least 2 of the detection frames F1, F4, and F7, together with F10, may be determined as target detection frames. For example, F1, F4, F7, and F10 are all determined as target detection frames corresponding to F8; when predicting the detection result A8 corresponding to F8, the detection information of F1, F4, F7, and F10 can be combined, which helps improve prediction accuracy.
For example, while the backend device receives video frames, the detection results of detection frames are cached into the detection result sequence L(A). Assuming that at most 4 detection results can be stored in L(A), the preset number N satisfies: 2 ≤ N ≤ 4.
Optionally, with continued reference to fig. 3, as described above, one non-detection frame may correspond to only 2 target detection frames or to more than 2. Two embodiments for predicting the target detection result of a non-detection frame are given below for these two scenarios.
In one scenario, when there is only one detection frame before the non-detection frame, an implementation of step S204 may be as shown in fig. 5, where fig. 5 is a schematic flowchart of an implementation of step S204 according to an embodiment of the present invention.
S204-1a, determining a motion vector of the target object according to the coordinate information of the target object in a target detection frame before the non-detection frame and the coordinate information of the target object in a target detection frame after the non-detection frame.
S204-2a, calculating the time difference between the non-detection frame and the target detection frame located before it.
S204-3a, predicting the coordinate information of the target object in the non-detection frame according to the motion vector, the time difference, and the coordinate information of the target detection frame located before it.
For example, with continued reference to fig. 3, for the non-detection frame F2, the corresponding target detection frames are determined as F1 and F4, with detection results A1 and A4 respectively. First, the motion vector V1 of the target object per unit time between F1 and F4 is calculated. Then, the time difference Δt1 of the non-detection frame F2 relative to F1 is calculated, and the predicted coordinate information of the target object in F2 is: A2 = Δt1 × V1 + A1. Similarly, the detection information A3 corresponding to the non-detection frame F3 can be predicted.
In another scenario, when there are a plurality of target detection frames before the non-detection frame, another implementation of step S204 may be as shown in fig. 6, where fig. 6 is a schematic flowchart of another implementation of step S204 provided in the embodiment of the present invention.
S204-1b, obtaining a motion vector sequence from the coordinate information of each pair of adjacent target detection frames, taken in order over the preset number of target detection frames.
In this embodiment, assuming that the preset number is N, for the N target detection frames a motion vector per unit time is calculated for every two adjacent target detection frames, denoted V, as follows: V(n) = (A(n) − A(n−1)) / Δt(n−1), where A(n) represents the detection result of the nth target detection frame, A(n−1) represents the detection result of the (n−1)th target detection frame, and Δt(n−1) represents the timestamp difference between A(n) and A(n−1). N−1 motion vectors can thus be calculated from the N target detection frames, forming the motion vector sequence.
For example, continuing with the example of fig. 3, assuming that the non-detection frame is F8, the corresponding target detection frames are F1, F4, F7 and F10, with detection results A1, A4, A7 and A10. The motion vector V1 between A1 and A4, the motion vector V4 between A4 and A7, and the motion vector V7 between A7 and A10 are calculated, giving the motion vector sequence {V1, V4, V7}.
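The construction of the motion vector sequence in S204-1b can be sketched as follows, using 1-D coordinates and timestamps assumed for illustration (the values mirror the F1/F4/F7/F10 example above):

```python
def motion_vectors(results):
    """results: list of (timestamp, coordinate) pairs for N target detection
    frames, in temporal order. Returns the N-1 motion vectors per unit time:
    V(n) = (A(n) - A(n-1)) / Δt(n-1)."""
    vecs = []
    for (t0, a0), (t1, a1) in zip(results, results[1:]):
        vecs.append((a1 - a0) / (t1 - t0))
    return vecs

# assumed detections for F1, F4, F7, F10
seq = motion_vectors([(1, 100.0), (4, 130.0), (7, 190.0), (10, 220.0)])
print(seq)   # {V1, V4, V7}
```

Each vector captures the average speed of the target between one pair of adjacent target detection frames.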
S204-2b, calculating the time differences between the non-detection frame and each of the plurality of target detection frames located before it, to obtain a time difference sequence.
For example, with continued reference to the above example, the time differences between the non-detection frame F8 and the target detection frames located before it (F1, F4 and F7) are calculated, that is, Δt7 = T(F8) − T(F7), Δt4 = T(F8) − T(F4), Δt1 = T(F8) − T(F1), giving the time difference sequence {Δt1, Δt4, Δt7}.
S204-3b, determining the weight coefficient sequence corresponding to a plurality of target detection frames before the non-detection frame.
In this embodiment, the weight coefficient of each target detection frame may be set on the following principle: the closer a target detection frame is to the non-detection frame, the larger its weight coefficient; and all the weight coefficients sum to 1.
For example, continuing with the above example, the target detection frames located before the non-detection frame F8 are F1, F4 and F7. Assuming the weight coefficients of F1, F4 and F7 are K1, K4 and K7 respectively, and noting that the distances of F7, F4 and F1 from F8 increase in sequence, then K7 > K4 > K1 and K7 + K4 + K1 = 1; for example, K7 = 0.7, K4 = 0.2 and K1 = 0.1.
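A weight generator satisfying the two stated constraints (closer frames weigh more; weights sum to 1) can be sketched as follows. The linear ramp used here is an assumption for illustration; the patent only fixes the constraints, and a concrete assignment such as {0.1, 0.2, 0.7} would also satisfy them.

```python
def weights(n):
    """Return n weight coefficients, increasing from the oldest target
    detection frame to the newest, normalized to sum to 1."""
    raw = list(range(1, n + 1))   # oldest frame -> 1, newest frame -> n
    total = sum(raw)
    return [r / total for r in raw]

k = weights(3)
print(k)          # e.g. weights for {K1, K4, K7}
print(sum(k))
```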
And S204-4b, predicting the coordinate information of the target object in the non-detection frame according to the motion vector sequence, the time difference sequence and the weight coefficient sequence.
In the embodiment of the present invention, the detection information of the non-detection frame can be predicted according to the following relations:

A = K(1) × Δt(1) × V(1) + K(2) × Δt(2) × V(2) + … + K(N−1) × Δt(N−1) × V(N−1)

K(1) + K(2) + … + K(N−1) = 1

where A denotes the prediction result of the non-detection frame, K(i) denotes the ith weight coefficient, V(i) denotes the ith motion vector in the motion vector sequence, and Δt(i) denotes the ith time difference in the time difference sequence.
For example, continuing with the above example, for the non-detection frame F8 the motion vector sequence is {V1, V4, V7}, the time difference sequence is {Δt1, Δt4, Δt7}, and the weight coefficient sequence is {0.1, 0.2, 0.7}, so the prediction result corresponding to the non-detection frame F8 is A8 = K7 × Δt7 × V7 + K4 × Δt4 × V4 + K1 × Δt1 × V1.
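The weighted prediction for F8 can be sketched as follows, using the worked example's weights and 1-D coordinates with assumed timestamps (F1 at t=1, so Δt1 = 7, Δt4 = 4, Δt7 = 1):

```python
def predict_weighted(vectors, dts, ks):
    """A = sum of K(i) * Δt(i) * V(i), following the worked example."""
    return sum(k * dt * v for k, dt, v in zip(ks, dts, vectors))

V  = [10.0, 20.0, 10.0]   # {V1, V4, V7}, assumed motion vectors
dt = [7.0, 4.0, 1.0]      # {Δt1, Δt4, Δt7}: T(F8) minus T(F1), T(F4), T(F7)
K  = [0.1, 0.2, 0.7]      # {K1, K4, K7}, weights summing to 1
a8 = predict_weighted(V, dt, K)
print(a8)
```

Note how the largest weight K7 ties the prediction most strongly to the most recent motion vector, which is what smooths the trajectory.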
In the above example, the predicted coordinates take into account both the future direction of motion of F8 (the motion vector between A7 and A10) and its historical motion vectors (between A4 and A7, and between A1 and A4), so the calculated predicted coordinates are smoother and more accurate.
Optionally, in this embodiment, for both detection frames and non-detection frames, the target object may be subjected to service processing according to the service requirement and the detection or prediction result. A possible implementation is as follows: according to the coordinate information of the target object in the detection frame and the coordinate information of the target object in the non-detection frame, the target object is subjected to occlusion processing, blurring processing, or tracking identification box configuration.
For example, assuming that the target object is a person, the coordinate information of the person is detected in the detection frames and predicted in the non-detection frames, and the face may then be subjected to occlusion processing or mosaic processing, or configured with a tracking identification box.
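The occlusion and mosaic steps can be sketched as follows on a NumPy image array. This is a dependency-light illustration, not the patent's implementation; a production system would more likely apply these operations with an image library such as OpenCV, and the box format (x, y, w, h) is an assumption.

```python
import numpy as np

def occlude(frame, box):
    """Black out the predicted box region (privacy occlusion)."""
    x, y, w, h = box
    out = frame.copy()
    out[y:y + h, x:x + w] = 0
    return out

def mosaic(frame, box, block=8):
    """Pixelate the predicted box region by downsampling then repeating."""
    x, y, w, h = box
    out = frame.copy()
    region = out[y:y + h, x:x + w]
    small = region[::block, ::block]              # keep one pixel per block
    out[y:y + h, x:x + w] = np.repeat(
        np.repeat(small, block, axis=0), block, axis=1)[:h, :w]
    return out

frame = np.arange(64 * 64).reshape(64, 64).astype(np.uint8)
masked = occlude(frame, (16, 16, 16, 16))
print(masked[16:32, 16:32].max())   # region is fully zeroed
```

Because the box coordinates for non-detection frames come from the prediction step rather than a detector, the mask follows the target even on frames that were never run through the detection model, which is what keeps the privacy occlusion continuous.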
In order to implement the steps in the foregoing embodiments and achieve the corresponding technical effects, the object detection method provided in the embodiments of the present invention may be implemented in a hardware device or in the form of a software module. When implemented in the form of a software module, an object detection apparatus is further provided in the embodiments of the present invention. Referring to fig. 7, fig. 7 is a functional block diagram of the object detection apparatus provided in the embodiments of the present invention; the object detection apparatus 300 may include:
a determining module 310, configured to determine, from a sequence of video frames, a plurality of detection frames and detection information of a target object in the detection frames; wherein the detection frame represents a video frame used for target detection processing; all the detection frames meet a preset frame interval;
a predicting module 320, configured to predict, according to the detection information corresponding to the determined target detection frame, coordinate information of the target object in any non-detection frame where the target object exists;
wherein the target detection frame is at least one of the detection frames located before the non-detection frame and one of the detection frames located after the non-detection frame and closest to the non-detection frame.
Optionally, the detection information includes coordinate information and time information of the target object; when there is only one target detection frame before the non-detection frame, the prediction module 320 is specifically configured to: determine a motion vector of the target object according to the coordinate information of the target object in the one target detection frame before the non-detection frame and the coordinate information of the target object in the target detection frame after the non-detection frame; calculate a time difference between the non-detection frame and the one target detection frame located before it; and predict the coordinate information of the target object in the non-detection frame according to the motion vector, the time difference, and the coordinate information of the target detection frame located before it.
Optionally, the detection information includes coordinate information and time information of the target object in a target detection frame; when there are a plurality of target detection frames before the non-detection frame, the prediction module 320 is further specifically configured to: obtain a motion vector sequence from the coordinate information of each pair of adjacent target detection frames, taken in order over the preset number of target detection frames; calculate the time difference sequence between the non-detection frame and each of the plurality of target detection frames located before it; determine the weight coefficient sequence corresponding to the plurality of target detection frames before the non-detection frame; and predict the coordinate information of the target object in the non-detection frame according to the motion vector sequence, the time difference sequence and the weight coefficient sequence.
Optionally, the determining module 310 is further configured to: if only one detection frame exists before the non-detection frame, determining the detection frame and a detection frame which is positioned behind the non-detection frame and is closest to the non-detection frame as the target detection frame; if at least a plurality of detection frames exist before the non-detection frame, determining a preset number of detection frames before the non-detection frame, or determining one detection frame before the non-detection frame and closest to the non-detection frame and one detection frame after the non-detection frame and closest to the non-detection frame as the target detection frame; wherein the preset number is less than or equal to the number of detection frames preceding the non-detection frame.
Optionally, the target detection apparatus 300 further includes a service processing module configured to perform the following processing on the target object according to the coordinate information of the target object in the detection frame and the coordinate information of the target object in the non-detection frame: occlusion processing, blurring processing, or tracking identification box configuration.
Optionally, the determining module 310 is specifically configured to: determining the plurality of detection frames in the video frame sequence according to the preset frame interval; and carrying out target detection processing on the plurality of detection frames to obtain the detection information.
The embodiment of the present invention further provides an electronic device, which may be the backend device 20 shown in fig. 1, or another electronic device with a data processing function, which is not limited in the present invention.
Referring to fig. 8, fig. 8 is a block diagram of an electronic device according to an embodiment of the present invention. The electronic device 50 comprises a communication interface 501, a processor 502 and a memory 503. The processor 502, memory 503 and communication interface 501 are electrically connected to each other, directly or indirectly, to enable the transfer or interaction of data. For example, the components may be electrically connected to each other via one or more communication buses or signal lines. The memory 503 may be used for storing software programs and modules, such as program instructions/modules corresponding to the object detection method provided by the embodiment of the present invention, and the processor 502 executes various functional applications and data processing by executing the software programs and modules stored in the memory 503. The communication interface 501 may be used for communicating signaling or data with other node devices. The electronic device 50 may have a plurality of communication interfaces 501 in the present invention.
The Memory 503 may be, but is not limited to, a Random Access Memory (RAM), a Read Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
The processor 502 may be an integrated circuit chip having signal processing capabilities. The Processor may be a general-purpose Processor including a Central Processing Unit (CPU), a Network Processor (NP), etc.; but may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic, discrete hardware components, etc.
Alternatively, when the object detection method is implemented in the form of a software module, each functional module in the object detection apparatus 300 according to another embodiment of the present invention may be stored in a memory in the form of software or Firmware (Firmware) or be fixed in an Operating System (OS) of the electronic device 50, and may be executed by the processor 502 in the electronic device 50. Meanwhile, data, codes of programs, and the like required to execute the above modules may be stored in the memory 503.
The embodiment of the present invention further provides a readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the object detection method according to any one of the foregoing embodiments. The computer readable storage medium may be, but is not limited to, various media that can store program codes, such as a U disk, a removable hard disk, a ROM, a RAM, a PROM, an EPROM, an EEPROM, a magnetic or optical disk, etc.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (10)

1. A method of object detection, the method comprising:
determining a plurality of detection frames and detection information of a target object in the detection frames from a video frame sequence; wherein the detection frame represents a video frame used for target detection processing; all the detection frames meet a preset frame interval;
for any non-detection frame with the target object, predicting coordinate information of the target object in the non-detection frame according to the detection information corresponding to the determined target detection frame;
wherein the target detection frame is at least one of the detection frames located before the non-detection frame and one of the detection frames located after the non-detection frame and closest to the non-detection frame.
2. The object detection method according to claim 1, characterized in that the detection information includes coordinate information and time information of the target object;
when only one detection frame exists before the non-detection frame, predicting the coordinate information of the target object in the non-detection frame according to the detection information corresponding to the determined target detection frame, wherein the method comprises the following steps:
determining a motion vector of the target object according to the coordinate information of the target object in a target detection frame before the non-detection frame and the coordinate information of the target object in the target detection frame after the non-detection frame;
calculating a time difference between the non-detection frame and one of the target detection frames located before the non-detection frame;
and predicting the coordinate information of the target object in the non-detection frame according to the motion vector, the time difference value and the coordinate information of the target detection frame before the non-detection frame.
3. The object detection method according to claim 1, wherein the detection information includes coordinate information and time information of the object in an object detection frame;
when a plurality of detection frames exist before the non-detection frame, predicting the coordinate information of the target object in the non-detection frame according to the detection information corresponding to the determined target detection frame, wherein the method comprises the following steps:
according to the sequence of a preset number of target detection frames, obtaining a plurality of motion vector sequences according to the respective coordinate information of two adjacent target detection frames in sequence;
respectively calculating time difference value sequences between the non-detection frames and a plurality of target detection frames positioned before the non-detection frames;
determining a weight coefficient sequence corresponding to a plurality of target detection frames before the non-detection frame;
and predicting the coordinate information of the target object in the non-detection frame according to the motion vector sequence, the time difference value sequence and the weight coefficient sequence.
4. The object detection method according to any one of claims 1 to 3, wherein, for any non-detection frame in which the target object exists, before predicting the coordinate information of the target object in the non-detection frame according to the detection information corresponding to the determined target detection frame, the method further comprises:
if only one detection frame exists before the non-detection frame, taking the detection frame and a detection frame which is positioned behind the non-detection frame and is closest to the non-detection frame as the target detection frame;
if at least a plurality of detection frames exist before the non-detection frame, determining a preset number of detection frames before the non-detection frame, or determining one detection frame before the non-detection frame and closest to the non-detection frame and one detection frame after the non-detection frame and closest to the non-detection frame as the target detection frame;
wherein the preset number is less than or equal to the number of detection frames preceding the non-detection frame.
5. The object detection method of claim 1, further comprising:
according to the coordinate information of the target object in the detection frame and the coordinate information of the target object in the non-detection frame, performing the following processing on the target object: occlusion processing or blurring processing or configuration tracking identification box processing.
6. The method of claim 1, wherein determining a plurality of detection frames from the sequence of video frames, and the detection information of the target object in each detection frame comprises:
determining the plurality of detection frames in the video frame sequence according to the preset frame interval;
and carrying out target detection processing on the plurality of detection frames to obtain the detection information.
7. An object detection device, comprising:
the device comprises a determining module, a judging module and a judging module, wherein the determining module is used for determining a plurality of detection frames and detection information of a target object in the detection frames from a video frame sequence; wherein the detection frame represents a video frame used for target detection processing; all the detection frames meet a preset frame interval;
the prediction module is used for predicting the coordinate information of the target object in the non-detection frame according to the detection information corresponding to the determined target detection frame aiming at any non-detection frame with the target object;
wherein the target detection frame is at least one detection frame located before the non-detection frame, and one detection frame located after the non-detection frame and closest to the non-detection frame.
8. The object detection apparatus according to claim 7, characterized in that the detection information includes coordinate information and time information of the target object; when there is only one target detection frame located before the non-detection frame, the prediction module is specifically configured to:
determining a motion vector of the target object according to the coordinate information of the target object in the one target detection frame before the non-detection frame and the coordinate information of the target object in the target detection frame after the non-detection frame;
calculating a time difference between the non-detection frame and one of the target detection frames located before the non-detection frame;
and predicting the coordinate information of the target object in the non-detection frame according to the motion vector, the time difference value and the coordinate information of the target detection frame before the non-detection frame.
9. An electronic device, comprising a processor and a memory, the memory storing a computer program executable by the processor, the processor being operable to execute the computer program to implement the object detection method of any one of claims 1-6.
10. A readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the object detection method according to any one of claims 1-6.
CN202111117096.6A 2021-09-23 2021-09-23 Target detection method, target detection device, electronic equipment and readable storage medium Active CN113792697B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111117096.6A CN113792697B (en) 2021-09-23 2021-09-23 Target detection method, target detection device, electronic equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111117096.6A CN113792697B (en) 2021-09-23 2021-09-23 Target detection method, target detection device, electronic equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN113792697A true CN113792697A (en) 2021-12-14
CN113792697B CN113792697B (en) 2023-09-05

Family

ID=79184212

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111117096.6A Active CN113792697B (en) 2021-09-23 2021-09-23 Target detection method, target detection device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN113792697B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115965657A (en) * 2023-02-28 2023-04-14 安徽蔚来智驾科技有限公司 Target tracking method, electronic device, storage medium, and vehicle

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101996312A (en) * 2009-08-18 2011-03-30 索尼株式会社 Method and device for tracking targets
JP2018037861A (en) * 2016-08-31 2018-03-08 キヤノン株式会社 Display control device and its control method, program, and recording medium
CN108230357A (en) * 2017-10-25 2018-06-29 北京市商汤科技开发有限公司 Critical point detection method, apparatus, storage medium, computer program and electronic equipment
CN109584276A (en) * 2018-12-04 2019-04-05 北京字节跳动网络技术有限公司 Critical point detection method, apparatus, equipment and readable medium
CN109685797A (en) * 2018-12-25 2019-04-26 北京旷视科技有限公司 Bone point detecting method, device, processing equipment and storage medium
CN109726683A (en) * 2018-12-29 2019-05-07 北京市商汤科技开发有限公司 Target object detection method and device, electronic equipment and storage medium
US20200082549A1 (en) * 2018-09-07 2020-03-12 Apple Inc. Efficient object detection and tracking
CN110909712A (en) * 2019-12-05 2020-03-24 重庆紫光华山智安科技有限公司 Moving object detection method and device, electronic equipment and storage medium
CN111242042A (en) * 2020-01-15 2020-06-05 上海眼控科技股份有限公司 Inspection duration judgment method, apparatus, device and medium based on vehicle detection
CN112598707A (en) * 2020-12-23 2021-04-02 南京稻子菱机电设备有限公司 Real-time video stream object detection and tracking method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHOU Jiqiang: "Research on Multi-class Object Detection and Multi-object Tracking Algorithms in Surveillance Video", China Master's Theses Full-text Database (Information Science and Technology), no. 2, pages 138 - 1954 *


Also Published As

Publication number Publication date
CN113792697B (en) 2023-09-05

Similar Documents

Publication Publication Date Title
US11483521B2 (en) Information processing system, information processing method, and program
US10915660B2 (en) Methods and apparatus for using video analytics to detect regions for privacy protection within images from moving cameras
US8922659B2 (en) Dynamically reconfigurable intelligent video surveillance system
US8605946B2 (en) Moving object detection apparatus and moving object detection method
EP1668921B1 (en) Computerized method and apparatus for determining field-of-view relationships among multiple image sensors
US9159137B2 (en) Probabilistic neural network based moving object detection method and an apparatus using the same
US9858486B2 (en) Device and method for detecting circumventing behavior and device and method for processing cause of circumvention
EP1542153A1 (en) Object detection
US9123133B1 (en) Method and apparatus for moving object detection based on cerebellar model articulation controller network
Sengar et al. Motion segmentation-based surveillance video compression using adaptive particle swarm optimization
US20180047171A1 (en) Methods and systems of maintaining object trackers in video analytics
JP2003284076A (en) Device and method for detecting movement in digital image storage utilizing mpeg image compression technique
Ben Hamida et al. Video surveillance system based on a scalable application-oriented architecture
EP2296095B1 (en) Video descriptor generator
CN115760912A (en) Moving object tracking method, device, equipment and computer readable storage medium
CN113792697B (en) Target detection method, target detection device, electronic equipment and readable storage medium
KR101296318B1 (en) Apparatus and method for object tracking by adaptive block partitioning
KR101396838B1 (en) Video stabilization method and system by selection one of various motion models
KR102584708B1 (en) System and Method for Crowd Risk Management by Supporting Under and Over Crowded Environments
KR20100013855A (en) Method for tracking moving object on multiple cameras using probabilistic camera hand-off
US20050128298A1 (en) Method for following at least one object in a scene
KR20170095599A (en) System and method for video searching
KR101248154B1 (en) Method and apparatus for estimating moving object using variable area size, and computer readable record medium with moving estimating program for the same
CN114004861B (en) Target tracking method, related system, storage medium and intelligent driving vehicle
KR102201241B1 (en) Apaptive Object Recognizing Apparatus and Method for Processing Data Real Time In Multi Channel Video

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant