CN113792697B - Target detection method, target detection device, electronic equipment and readable storage medium - Google Patents

Target detection method, target detection device, electronic equipment and readable storage medium

Info

Publication number
CN113792697B
CN113792697B
Authority
CN
China
Prior art keywords
detection
frame
detection frame
target
frames
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111117096.6A
Other languages
Chinese (zh)
Other versions
CN113792697A (en)
Inventor
卢欣龙
易曌平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Unisinsight Technology Co Ltd
Original Assignee
Chongqing Unisinsight Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Unisinsight Technology Co Ltd filed Critical Chongqing Unisinsight Technology Co Ltd
Priority to CN202111117096.6A priority Critical patent/CN113792697B/en
Publication of CN113792697A publication Critical patent/CN113792697A/en
Application granted granted Critical
Publication of CN113792697B publication Critical patent/CN113792697B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a target detection method, a target detection device, an electronic device and a readable storage medium. The method comprises: determining a plurality of detection frames and detection information of a target object in the detection frames from a video frame sequence, wherein a detection frame is a video frame on which target detection processing is performed, and all detection frames satisfy a preset frame interval; and, for any non-detection frame in which the target object exists, predicting coordinate information of the target object in the non-detection frame according to the detection information corresponding to a determined target detection frame, the target detection frame being at least one detection frame located before the non-detection frame plus the detection frame located after and closest to the non-detection frame. The method can effectively solve problems such as discontinuous target tracking frames, unsmooth trajectories and incomplete privacy occlusion.

Description

Target detection method, target detection device, electronic equipment and readable storage medium
Technical Field
The present invention relates to the field of computer images, and in particular, to a target detection method, apparatus, electronic device, and readable storage medium.
Background
With the rapid development of AI technology, target detection is increasingly applied in the security monitoring field. Key monitored targets such as people and vehicles in video can be quickly identified through target detection, enabling structural analysis of the targets, perimeter protection, pedestrian-flow and traffic-density analysis, real-time tracking-frame display, mosaic occlusion and the like.
At present, to enable simultaneous target detection on multiple video channels, a low-frame-rate analysis and detection scheme is generally adopted. However, the existing scheme multiplexes the same detection result for occlusion or blurring, cannot achieve complete real-time processing, and yields inaccurate processing positions, resulting in a poor target tracking effect.
Disclosure of Invention
One of the purposes of the invention is to provide a target detection method, apparatus, electronic device and readable storage medium that optimize the target tracking effect so as to achieve accurate processing of a target.
The technical scheme of the invention can be realized as follows:
in a first aspect, the present invention provides a method of target detection, the method comprising: determining a plurality of detection frames and detection information of a target object in the detection frames from a video frame sequence; wherein the detection frame characterizes a video frame for target detection processing; all the detection frames meet the preset frame interval; predicting coordinate information of the target object in a non-detection frame according to the detection information corresponding to the determined target detection frame aiming at any non-detection frame with the target object; wherein the target detection frame is at least one of the detection frames located before the non-detection frame, and one of the detection frames located after and closest to the non-detection frame.
In a second aspect, the present invention provides an object detection apparatus comprising: the determining module is used for determining a plurality of detection frames and detection information of a target object in the detection frames from the video frame sequence; wherein the detection frame characterizes a video frame for target detection processing; all the detection frames meet the preset frame interval; the prediction module is used for predicting coordinate information of the target object in a non-detection frame according to the detection information corresponding to the determined target detection frame aiming at any non-detection frame in which the target object exists; wherein the target detection frame is at least one detection frame located before the non-detection frame, and one detection frame located after the non-detection frame and closest to the non-detection frame.
In a third aspect, the present invention provides an electronic device comprising a processor and a memory, the memory storing a computer program executable by the processor, the processor being executable by the computer program to implement the object detection method of the first aspect.
In a fourth aspect, the present invention provides a readable storage medium having stored thereon a computer program which, when executed by a processor, implements the object detection method of the first aspect.
The invention provides a target detection method, a target detection device, electronic equipment and a readable storage medium, wherein the method comprises: determining a plurality of detection frames and detection information of a target object in the detection frames from a video frame sequence, wherein a detection frame is a video frame on which target detection processing is performed and all detection frames satisfy the preset frame interval; and, for any non-detection frame in which the target object exists, predicting coordinate information of the target object in the non-detection frame according to the detection information corresponding to the determined target detection frame, the target detection frame being at least one detection frame located before the non-detection frame and the detection frame located after and closest to the non-detection frame. Under low-frame-rate target detection, the method predicts the target result of a non-detection frame from the detection information of detection frames through a prediction algorithm, unlike the prior art, which multiplexes the detection result of a detection frame for the corresponding business processing of non-detection frames and therefore readily suffers from poor and inaccurate processing.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is an application environment for object detection;
FIG. 2 is a schematic flow chart of a target detection method according to an embodiment of the present invention;
FIG. 3 is a schematic view of a scenario provided in an embodiment of the present invention;
FIG. 4 is a schematic flow chart of another object detection method according to an embodiment of the present invention;
FIG. 5 is a schematic flow chart diagram of one implementation of step S204 provided by an embodiment of the present invention;
FIG. 6 is a schematic flow chart diagram of another implementation of step S204 provided in an embodiment of the present invention;
FIG. 7 is a functional block diagram of an object detection device according to an embodiment of the present invention;
fig. 8 is a block schematic diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
In the description of the present invention, it should be noted that, if the terms "upper", "lower", "inner", "outer", and the like indicate an azimuth or a positional relationship based on the azimuth or the positional relationship shown in the drawings, or the azimuth or the positional relationship in which the inventive product is conventionally put in use, it is merely for convenience of describing the present invention and simplifying the description, and it is not indicated or implied that the apparatus or element referred to must have a specific azimuth, be configured and operated in a specific azimuth, and thus it should not be construed as limiting the present invention.
Furthermore, the terms "first," "second," and the like, if any, are used merely for distinguishing between descriptions and not for indicating or implying a relative importance.
It should be noted that the features of the embodiments of the present invention may be combined with each other without conflict.
With the rapid development of AI technology, target detection is increasingly applied in the security monitoring field. Key monitored targets such as people and vehicles in video can be quickly identified through target detection, enabling structural analysis of the targets, perimeter protection, pedestrian-flow and traffic-density analysis, real-time tracking-frame display, mosaic occlusion and the like.
Referring to fig. 1, fig. 1 shows an application environment for target detection, which may include a network 10, a back-end device 20, and at least one image capture device 30.
Wherein: the network 10 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The backend device 20 may be, but not limited to, a streaming media server with AI capability or a server cluster formed by a plurality of streaming media servers with AI capability, or the backend device 20 may also be a digital video recorder (Digital Video Recorder, abbreviated as DVR) or a network video recorder (Network Video Recorder, abbreviated as NVR), and the target detection method provided by the embodiment of the present invention may be applied to the backend device 20.
The image capture device 30 may be, but is not limited to, a general audio/video camera or a network camera IPC (IP Camera) that provides captured video frames to the back-end device 20, and may be deployed in various public places such as banks, supermarkets, companies and airports.
In the embodiment of the present invention, after the image capturing apparatus 30 transmits the acquired image to the backend apparatus 20 through the network 10, the backend apparatus 20 performs the target detection method provided in the embodiment of the present invention according to the acquired image.
It should be noted that the embodiment of the invention is not only suitable for monitoring scenes; the target detection method provided by the embodiment of the invention can be executed on any video acquired by an image capture device.
Continuing with the application environment shown in fig. 1 as an example: in the prior art, to enable simultaneous target detection on multiple video channels, the back-end device 20 generally adopts a low-frame-rate analysis and detection scheme. For example, the original frame rate of the video is 25 fps (frames per second) while target detection is performed at 8 fps, so as to save hardware computing power and detect multiple channels simultaneously.
A lower target analysis frame rate still meets monitoring requirements and has no significant impact on the effectiveness of structuring, perimeter protection and the like. For application scenarios involving real-time tracking (e.g., tracking-box display and privacy occlusion), however, low-frame-rate analysis and detection degrades the application effect.
For example, when the original frame rate is 25 fps and the target detection frame rate is 8 fps, only about 1 frame in every 3 undergoes target detection, and those 3 frames can only be displayed and processed by multiplexing the same result. For real-time tracking-frame display, the final effect is that people and vehicles move smoothly but the tracking frame does not; for mosaic occlusion of people and vehicles, the target cannot be completely occluded in real time, and in frames without target detection the occlusion position is inaccurate.
To solve this problem, the current solution is generally a dynamic detection frame rate, i.e., dynamically configuring the detection frame rate according to the detection capability and the number of detection channels.
For example, if the total detection capability of the back-end device is 1080P@25fps: when only 1 channel of target detection is enabled, detection runs at 25 fps; when 2 channels are enabled, at 12 fps; and when 3 channels are enabled, at 8 fps. This balances the trade-off between detection effect and computing-power limitations to some extent, but does not solve the problem of inaccurate processing positions.
The related art also provides trajectory prediction algorithms that infer future motion trajectories from historical trajectories. However, the predicted trajectory deviates from the actual trajectory to some extent; such algorithms are usually applied in scenarios such as autonomous driving and automated logistics-delivery robots, which need to foresee the approximate future routes of people and vehicles in advance, and are not suitable for scenarios such as target tracking, trajectory restoration and privacy occlusion, which place finer requirements on target coordinates.
To solve the above technical problems, an embodiment of the invention provides a target detection method that, under low-frame-rate target detection, predicts the detection result of each non-detection frame through a prediction algorithm to obtain a target prediction result, and then processes the non-detection frame based on the predicted detection result. This effectively improves the application effect of low-frame-rate detection and solves problems such as discontinuous target tracking frames, unsmooth trajectories and incomplete privacy occlusion for vehicles and people.
Referring to fig. 2, fig. 2 is a schematic flowchart of a target detection method according to an embodiment of the present invention, where the method includes:
s201, determining a plurality of detection frames from a video frame sequence and detection information of a target object in the detection frames.
The detection frames are the video frames on which target detection processing is performed, and the preset frame interval is satisfied between all detection frames. In the video frame sequence, the video frames other than the detection frames are regarded as non-detection frames, i.e., frames on which no target detection processing is required, thereby achieving the effect of low-frame-rate analysis and detection.
In some possible embodiments, the preset frame interval may be customized according to the actual low-frame-rate analysis requirement. Taking a preset frame interval of 3 frames as an example, assuming the video frame sequence is F1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12, F13, F14, F15, F16, F17, F18, F19, F20, F21, F22, the detection frames comprise F1, F4, F7, F10, F13, F16, F19 and F22, as in the sketch below.
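A minimal sketch of this selection (the Python realization and helper name are illustrative assumptions, not part of the patent disclosure):

    # Hedged sketch: take every 3rd frame of a sequence as a detection
    # frame; all other frames are non-detection frames.
    def select_detection_frames(frames, interval=3):
        return [f for i, f in enumerate(frames) if i % interval == 0]

    frames = [f"F{i}" for i in range(1, 23)]  # F1 .. F22
    print(select_detection_frames(frames))
    # ['F1', 'F4', 'F7', 'F10', 'F13', 'F16', 'F19', 'F22']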
In other possible embodiments, the preset frame interval may be dynamically configured according to the detection capability and the detection path number of the back-end device.
In this embodiment, the target object may be, but is not limited to, a person, a vehicle, an animal, etc., and the detection information may include, but is not limited to, coordinate information, type information, a unique ID, a timestamp, etc. of the target object. In implementation, target detection may be performed on a single target object in a detection frame according to the service requirement, or on all target objects in the detection frame.
S204, for any non-detection frame with the target object, predicting the coordinate information of the target object in the non-detection frame according to the detection information corresponding to the determined target detection frame.
The target detection frame is at least one detection frame positioned before the non-detection frame, and one detection frame positioned after the non-detection frame and nearest to the non-detection frame.
In step S204, since each detection frame corresponds to detection information, for any non-detection frame the target detection frames used for predicting its coordinate information can be determined according to the order of the video frame sequence, and the coordinate information of the target object in the non-detection frame can then be predicted from the detection information of the target object in those target detection frames.
It will be appreciated that the predicted coordinate information may be used to guide the corresponding business process to be performed on the target object in the non-detected frame, such as the business process of overlaying a face or a vehicle tracking frame, a mosaic, a blur, etc.
Under low-frame-rate target detection, the target detection method provided by the embodiment of the invention predicts the target result of a non-detection frame from the detection information of detection frames through a prediction algorithm, unlike the prior art, which multiplexes the detection result of a detection frame for the corresponding business processing of non-detection frames and therefore readily suffers from poor and inaccurate processing.
For ease of understanding of the implementation process, refer to the scene diagram shown in fig. 3, which shows a video frame sequence; such a sequence may consist of the video frames decoded by the back-end device each second.
For example, assuming the back-end device decodes at 22 fps, the video frame sequence contains 22 frames of images per second. A preset frame interval may be defined, based on the low-frame-rate detection capability supported by the back-end device, to determine a plurality of detection frames from the 22 frames.
As shown in fig. 3, assume that the video frame sequence is: f1, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12, F13, F14, F15, F16, F17, F18, F19, F20, F21, F22. The detection frame includes: f1, F4, F7, F10, F13, F16, F19, F22, the other video frames in the sequence of video frames being non-detection frames. The detection information corresponding to each detection frame may be denoted as a.
In one scenario, assuming the non-detection frame is F2, the target detection frames corresponding to F2 are determined to be F1 and F4, with detection information A1 and A4 respectively; the detection information A2 corresponding to F2 can then be predicted from A1 and A4.
In another scenario, assuming the non-detection frame is F8, three detection frames F1, F4 and F7 exist before F8. In one implementation, if the target detection frames corresponding to F8 are determined to be F7 and F10, the detection information A8 corresponding to F8 can be predicted according to A7 and A10; in another implementation, if the target detection frames corresponding to F8 are F1, F4, F7 and F10, the detection information A8 can be predicted according to A1, A4, A7 and A10.
It is understood that when a plurality of target detection frames are employed to predict the detection information of the non-detection frames, the accuracy of the prediction result can be improved.
In a specific implementation process, for step S201, a plurality of detection frames may be determined in a video frame sequence according to a preset frame interval, and then target detection processing is performed on the plurality of detection frames to obtain detection information.
For example, after each video frame is decoded, whether the current video frame is a detection frame may be judged according to the preset frame interval; if so, target detection is performed on the current video frame, the detection result is cached in a detection result sequence L(A), and the video frame is cached in a video frame sequence L(F). The detection result can be read directly from L(A) during prediction.
For example, with continued reference to fig. 3, the back-end device sequentially receives video frames transmitted from the image capture device, determines the first received frame F1 to be a detection frame, performs target detection on F1 to obtain A1, caches A1 to L(A), and caches F1 to the video frame sequence L(F).
Further, for each subsequently received video frame, whether the preset frame interval is satisfied is judged. For example, with a preset frame interval of 3 frames: when the current video frame is F2, the interval between F2 and F1 is not satisfied, so F2 is cached to L(F) and judgment continues with subsequent frames; when the current video frame is F4, the preset frame interval between F4 and F1 is satisfied, so F4 is determined to be a detection frame, target detection is performed on F4 to obtain A4, A4 is cached to L(A) and F4 to L(F). Detection frames among subsequent video frames are determined by analogy.
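A minimal sketch of this decode-time bookkeeping (the queue names mirror L(F) and L(A) from the text; the function names and structure are our illustrative assumptions):

    from collections import deque

    PRESET_INTERVAL = 3
    L_F, L_A = deque(), deque()  # video frame queue and detection result queue

    def on_frame_decoded(frame, frame_index, detector):
        # Every decoded frame is cached to L(F); frames satisfying the
        # preset interval (F1, F4, F7, ...) additionally undergo target
        # detection, with the result cached to L(A).
        L_F.append(frame)
        if frame_index % PRESET_INTERVAL == 0:
            L_A.append(detector(frame))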
In the specific implementation process, taking fig. 3 as an example, the implementation process may be executed in the back-end device according to the following steps:
step 1, the back-end equipment receives F1, determines F1 as a detection frame, performs target detection on F1 to obtain a detection result A1, caches F1 into L (F), and caches A1 into L (A).
And step 2, F2 and F3 are received in sequence, F2 and F3 are determined to be non-detection frames, and F2 and F3 are cached to a queue L (F).
And step 3, receiving F4, determining F4 as a detection frame, carrying out target detection on F4 to obtain a detection result A4, caching F4 into L (F), and caching A4 into L (A).
Step 4, query the detection result queue L(A) and find that 1 earlier detection result A1 exists; then, according to the coordinate change of the corresponding target (the target with the same ID) between A4 and A1, calculate the motion vector of the target per unit time between F1 and F4: V1 = (A4 − A1) / (T(F4) − T(F1)), where A4 and A1 represent the coordinate information of the target object in F4 and F1 respectively, and T(F4) and T(F1) represent the timestamps at which F4 and F1 were captured.
Step 5, query the frame queue L(F) and find that non-detection frames F2 and F3 exist; calculate the time difference Δt1 of F2 relative to F1 and the time difference Δt2 of F3 relative to F1 respectively, giving the predicted coordinates of the corresponding target as: A2 = Δt1 × V1 + A1 and A3 = Δt2 × V1 + A1.
And 6, repeating the steps 1-5 for the subsequent video frames until all the non-detection frames complete prediction.
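Steps 4 and 5 amount to a linear extrapolation between two detection results; a minimal sketch under that reading (function and variable names are our assumptions, not the patent's reference implementation):

    def predict_between(A1, T1, A4, T4, ts):
        # Step 4: motion vector per unit time V1 = (A4 - A1) / (T(F4) - T(F1)).
        # Step 5: for each non-detection frame timestamp t in ts, the
        # predicted coordinate is A = A1 + (t - T1) * V1.
        v = tuple((c4 - c1) / (T4 - T1) for c1, c4 in zip(A1, A4))
        return [tuple(c1 + (t - T1) * vi for c1, vi in zip(A1, v)) for t in ts]

    # F1 at t = 0.00 s with A1 = (100, 50); F4 at t = 0.12 s with A4 = (130, 56);
    # F2 and F3 fall in between at t = 0.04 s and t = 0.08 s.
    print(predict_between((100, 50), 0.00, (130, 56), 0.12, [0.04, 0.08]))
    # [(110.0, 52.0), (120.0, 54.0)]  -> predicted A2 and A3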
The coordinate information of the target object in each non-detection frame can be predicted through the above process. Business processing can then be performed on the target object in each video frame according to its corresponding coordinate position, and the processed YUV frames are re-encoded and distributed.
In some possible embodiments, after step 6, in order to reduce the cache pressure of the back-end device, the video frames F1, F2 and F3 that have completed business processing may also be deleted from L(F).
Alternatively, with continued reference to fig. 3, it can be seen that for F2 and F3 only one detection frame exists before them, so F2 and F3 have only 2 target detection frames, namely F1 and F4; for later non-detection frames (for example, F5 and F6), at least 2 detection frames exist before them, so their corresponding target detection frames may number more than 2, and using more than 2 target detection frames can correspondingly improve prediction accuracy.
Therefore, the present embodiment also provides an implementation manner of determining the target detection frame corresponding to the non-detection frame, please refer to fig. 4, and fig. 4 is a schematic flowchart of another target detection method provided in the embodiment of the present invention.
S202, if there is only one detection frame before the non-detection frame, determining the one detection frame and one detection frame located after and closest to the non-detection frame as the target detection frames;
S203, if a plurality of detection frames exist before the non-detection frame, determining a preset number of detection frames before the non-detection frame, or the one detection frame located before and closest to the non-detection frame together with the one detection frame located after and closest to the non-detection frame, as the target detection frames.
for ease of understanding, please continue to refer to fig. 3, for the non-detection frames F2, F3, there is only one detection frame F1 before it, and one detection frame located after F2 and F3 and closest to F4 is detected, then the target detection frames corresponding to F2 and F3 are only F1 and F4. For the non-detection frame F8, there are F1, F4 and F7 in the detection frames existing before it, and the detection frame closest to it after it is F10, then the target detection frame corresponding to F8 may be as follows:
in one implementation, it may be determined that F7 and F10 are the target detection frames corresponding to F8.
In another implementation, at least two of the detection frames F1, F4 and F7, together with F10, may be determined as the target detection frames; for example, each of F1, F4, F7 and F10 is determined as a target detection frame corresponding to F8, and the detection result A8 corresponding to F8 is predicted by combining the detection information of F1, F4, F7 and F10, which is beneficial to improving prediction accuracy.
The preset number is less than or equal to the number of detection frames before the non-detection frame and may be determined according to the caching capability of the back-end device. For example, while receiving video frames, the back-end device caches the detection results of detection frames in the detection result sequence L(A); assuming at most 4 detection results can be stored in L(A), the preset number N satisfies 2 ≤ N ≤ 4.
Alternatively, as can be appreciated from the foregoing description with continued reference to fig. 3, for a given non-detection frame there may be only 2 target detection frames, or more than 2; implementations of predicting the target detection result of a non-detection frame are given below for both scenarios.
In one scenario, when there is only one target detection frame before the non-detection frame, an implementation of step S204 may be as shown in fig. 5, and fig. 5 is a schematic flowchart of an implementation of step S204 provided by an embodiment of the present invention.
S204-1a, determining the motion vector of the target object according to the coordinate information of the target object in the target detection frame positioned before the non-detection frame and the coordinate information of the target object in the target detection frame positioned after the non-detection frame.
S204-2a, calculating a time difference value between the non-detection frame and a target detection frame positioned before the non-detection frame;
S204-3a, predicting the coordinate information of the target object in the non-detection frame according to the motion vector, the time difference value and the coordinate information of the one target detection frame located before the non-detection frame.
For example, with continued reference to fig. 3: for the non-detection frame F2, the corresponding target detection frames are determined to be F1 and F4, with detection results A1 and A4 respectively. First, the motion vector V1 of the target object per unit time between F1 and F4 is calculated. Then, with the time difference Δt1 between the non-detection frame F2 and F1, the coordinate information of the target object in F2 can be predicted as A2 = Δt1 × V1 + A1. By the same token, the detection information A3 corresponding to the non-detection frame F3 can be predicted.
In another scenario, when there are multiple detection frames before the non-detection frame, another implementation of step S204 may be as shown in fig. 6, where fig. 6 is a schematic flow chart of another implementation of step S204 provided by an embodiment of the present invention.
S204-1b, according to the sequence order of the preset number of target detection frames, sequentially obtaining a motion vector sequence from the respective coordinate information of each pair of adjacent target detection frames.
In this embodiment, assuming the preset number is N, for the N target detection frames the motion vector per unit time of every two adjacent target detection frames is calculated as follows: V(n) = (A(n) − A(n−1)) / Δt(n−1), where A(n) represents the detection result of the n-th target detection frame, A(n−1) represents the detection result of the (n−1)-th target detection frame, and Δt(n−1) represents the timestamp difference between A(n) and A(n−1). From the N target detection frames, N−1 motion vectors can be calculated, forming the motion vector sequence.
For example, continuing to take fig. 3 as an example, assuming the non-detection frame is F8, the corresponding target detection frames are F1, F4, F7 and F10, and the detection results corresponding to the respective target detection frames are A1, A4, A7 and A10; the motion vectors V1, V4 and V7 are then calculated from A1 and A4, from A4 and A7, and from A7 and A10, respectively.
S204-2b, calculating the time difference sequence between the non-detection frame and the plurality of target detection frames located before the non-detection frame.
For example, continuing the above example, the time differences between the non-detection frame F8 and F7, F4, F1 are calculated, i.e. Δt7 = T(F8) − T(F7), Δt4 = T(F8) − T(F4) and Δt1 = T(F8) − T(F1), and the time difference sequence is denoted {Δt1, Δt4, Δt7}.
S204-3b, determining a weight coefficient sequence corresponding to a plurality of target detection frames positioned before the non-detection frames.
In this embodiment, the setting principle of the weight coefficient corresponding to each target detection frame may be: the closer to the non-detection frame, the greater the weight corresponding to the target detection frame, and the sum of all the weight coefficients is 1.
For example, continuing the above example, the target detection frames preceding the non-detection frame F8 are F1, F4 and F7, with weight coefficients K1, K4 and K7. Since the distances from F7, F4 and F1 to F8 increase in that order, K7 > K4 > K1 and K7 + K4 + K1 = 1; for example, K7 = 0.7, K4 = 0.2 and K1 = 0.1 may be set.
S204-4b, predicting the coordinate information of the target object in the non-detection frame according to the motion vector sequence, the time difference value sequence and the weight coefficient sequence.
In the embodiment of the present invention, the detection information of the non-detection frame may be predicted according to the following relation: A = Σ K(i) × (A(i) + Δt(i) × V(i)), where A represents the prediction result of the non-detection frame, A(i) and K(i) represent the detection result and weight coefficient of the i-th target detection frame located before the non-detection frame, V(i) represents the i-th motion vector in the motion vector sequence, and Δt(i) represents the i-th time difference in the time difference sequence.
For example, continuing the above example: for the non-detection frame F8, the motion vector sequence is {V1, V4, V7}, the time difference sequence is {Δt1, Δt4, Δt7}, and the weight coefficient sequence is {0.1, 0.2, 0.7}; the prediction result corresponding to F8 is then A8 = K7 × (A7 + Δt7 × V7) + K4 × (A4 + Δt4 × V4) + K1 × (A1 + Δt1 × V1).
In the above example, the predicted coordinates calculated by this algorithm take into account the future running direction of F8, i.e., the motion vector from A7 to A10, as well as the historical motion vectors from A4 to A7 and from A1 to A4, so the calculated predicted coordinates are smoother and more accurate.
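A minimal sketch of the weighted multi-frame prediction of S204-1b to S204-4b follows; reading the prediction relation as a weighted sum of per-frame linear extrapolations, and all names below, are our interpretation rather than a verbatim reference implementation:

    def predict_weighted(coords, times, t_pred, weights):
        # coords/times: detection results A(i) and timestamps T(i) of the
        # target detection frames, oldest first, the last entry being the
        # detection frame after the non-detection frame. weights: K(i) for
        # the frames before the non-detection frame, summing to 1 and
        # larger for frames closer to it.
        n = len(coords) - 1
        # Motion vector sequence: V(i) between adjacent detection frames.
        vs = [tuple((b - a) / (times[i + 1] - times[i])
                    for a, b in zip(coords[i], coords[i + 1]))
              for i in range(n)]
        pred = [0.0, 0.0]
        for i in range(n):
            dt = t_pred - times[i]  # time difference sequence entry
            for d in range(2):
                pred[d] += weights[i] * (coords[i][d] + dt * vs[i][d])
        return tuple(pred)

    # F1, F4, F7 before F8 and F10 after it; K1 = 0.1, K4 = 0.2, K7 = 0.7.
    print(predict_weighted([(100, 50), (130, 56), (160, 62), (190, 68)],
                           [0.00, 0.12, 0.24, 0.36], 0.28,
                           [0.1, 0.2, 0.7]))
    # (170.0, 64.0) -- all three extrapolations agree under constant velocity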
Optionally, in this embodiment, for both detection frames and non-detection frames, corresponding business processing may be performed on the target object according to the detection result or the prediction result, as the service requires. A possible implementation is given below: according to the coordinate information of the target object in the detection frames and the coordinate information of the target object in the non-detection frames, the target object is subjected to occlusion processing, blurring processing, or tracking identification box configuration.
For example, if the target object is a person, the coordinate information of the person in each non-detection frame can be predicted and the face then occluded or mosaicked, or a tracking frame identification configured. Since the coordinate information of non-detection frames is predicted from the detection information of detection frames, the processing effect and accuracy are effectively improved during business processing.
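As an illustrative sketch of one such business process (the mosaic helper uses standard OpenCV resize calls; the function name and box values are made-up examples):

    import cv2
    import numpy as np

    def mosaic(frame, box, block=16):
        # Pixelate the region box = (x, y, w, h): shrink it, then scale
        # it back up with nearest-neighbour interpolation.
        x, y, w, h = box
        roi = frame[y:y + h, x:x + w]
        small = cv2.resize(roi, (max(1, w // block), max(1, h // block)),
                           interpolation=cv2.INTER_LINEAR)
        frame[y:y + h, x:x + w] = cv2.resize(small, (w, h),
                                             interpolation=cv2.INTER_NEAREST)
        return frame

    frame = np.zeros((480, 640, 3), dtype=np.uint8)  # placeholder frame
    frame = mosaic(frame, (110, 52, 80, 120))        # predicted box for F2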
To implement the steps in the foregoing embodiments and achieve the corresponding technical effects, the target detection method provided by the embodiment of the invention may be implemented in a hardware device or as software modules. When implemented as software modules, the embodiment of the invention further provides a target detection apparatus. Referring to fig. 7, a functional block diagram of the target detection apparatus provided by an embodiment of the invention, the target detection apparatus 300 may include:
a determining module 310, configured to determine a plurality of detection frames from a video frame sequence, and detection information of a target object in the detection frames; wherein the detection frame characterizes a video frame for target detection processing; all the detection frames meet the preset frame interval;
a prediction module 320, configured to predict, for any non-detection frame in which the target object exists, coordinate information of the target object in the non-detection frame according to the detection information corresponding to the determined target detection frame;
wherein the target detection frame is at least one of the detection frames located before the non-detection frame, and one of the detection frames located after and closest to the non-detection frame.
Optionally, the detection information includes coordinate information and time information of the target object; when there is only one detected frame before the non-detected frame, the prediction module 320 is specifically configured to: determining a motion vector of the target object based on the coordinate information of the target object in the one target detection frame located before the non-detection frame and the coordinate information of the target object in the target detection frame located after the non-detection frame; calculating a time difference between the non-detection frame and one of the target detection frames located before the non-detection frame; and predicting the coordinate information of the target object in the non-detection frame according to the motion vector, the time difference value and the coordinate information of the target detection frame positioned before the non-detection frame.
Optionally, the detection information includes coordinate information and time information of the target object in a target detection frame; when there are multiple detected frames preceding the non-detected frame, the prediction module 320 is further specifically configured to: according to the sequence order of the preset number of target detection frames, a plurality of motion vector sequences are obtained according to the respective coordinate information of two adjacent target detection frames in sequence; respectively calculating time difference sequences between the non-detection frame and a plurality of target detection frames positioned in front of the non-detection frame; determining a weight coefficient sequence corresponding to a plurality of target detection frames positioned in front of the non-detection frames; and predicting coordinate information of the target object in the non-detection frame according to the motion vector sequence, the time difference value sequence and the weight coefficient sequence.
Optionally, the determining module 310 is further configured to: if only one detection frame exists before the non-detection frame, determining the one detection frame and one detection frame which is positioned behind the non-detection frame and is closest to the non-detection frame as the target detection frame; if at least a plurality of detection frames exist before the non-detection frame, determining a preset number of detection frames before the non-detection frame, or one detection frame which is positioned before the non-detection frame and is closest to the non-detection frame, and one detection frame which is positioned after the non-detection frame and is closest to the non-detection frame as the target detection frame; wherein the preset number is less than or equal to the number of detected frames preceding the non-detected frame.
Optionally, the target detection apparatus 300 further includes a service processing module configured to perform the following processing on the target object according to the coordinate information of the target object in the detection frame and the coordinate information of the target object in the non-detection frame: occlusion processing, blurring processing, or tracking identification box configuration.
Optionally, the determining module 310 is specifically configured to: determining the plurality of detection frames in the video frame sequence according to the preset frame interval; and performing target detection processing on the plurality of detection frames to obtain the detection information.
The embodiment of the present invention further provides an electronic device, which may be the back-end device 20 shown in fig. 1 or other electronic devices with data processing functions, and the present invention is not limited thereto.
Fig. 8 is a schematic block diagram of an electronic device according to an embodiment of the present invention. The electronic device 50 comprises a communication interface 501, a processor 502 and a memory 503. The processor 502, the memory 503 and the communication interface 501 are electrically connected to each other directly or indirectly to realize data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines. The memory 503 may be used to store software programs and modules, such as program instructions/modules corresponding to the object detection method provided in the embodiments of the present invention, and the processor 502 executes the software programs and modules stored in the memory 503, thereby performing various functional applications and data processing. The communication interface 501 may be used for communication of signaling or data with other node devices. The electronic device 50 may have a plurality of communication interfaces 501 in the present invention.
The memory 503 may be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), etc.
The processor 502 may be an integrated circuit chip with signal processing capabilities. The processor may be a general-purpose processor including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; but also digital signal processors (Digital Signal Processing, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
Alternatively, when the object detection method is implemented in the form of a software module, each functional module in the object detection apparatus 300 according to the embodiment of the present invention may be stored in a memory in the form of software or Firmware (Firmware) or be cured in an Operating System (OS) of the electronic device 50, and may be executed by the processor 502 in the electronic device 50. Meanwhile, data, codes of programs, and the like necessary for executing the above-described modules may be stored in the memory 503.
The embodiment of the present invention also provides a readable storage medium having stored thereon a computer program which, when executed by a processor, implements the object detection method according to any of the foregoing embodiments. The computer readable storage medium may be, but is not limited to, a usb disk, a removable hard disk, ROM, RAM, PROM, EPROM, EEPROM, a magnetic disk, or an optical disk, etc. various media capable of storing program codes.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the scope of the present invention should be included in the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (7)

1. A method of target detection, the method comprising:
determining a plurality of detection frames and detection information of a target object in the detection frames from a video frame sequence; wherein the detection frame characterizes a video frame for target detection processing; all the detection frames meet the preset frame interval;
predicting coordinate information of the target object in a non-detection frame according to the detection information corresponding to the determined target detection frame aiming at any non-detection frame with the target object;
wherein the target detection frame is at least one of the detection frames located before the non-detection frame, and one of the detection frames located after and closest to the non-detection frame;
the detection information comprises coordinate information and time information of the target object; when only one detection frame exists before the non-detection frame, predicting coordinate information of the target object in the non-detection frame according to the detection information corresponding to the determined target detection frame, including:
determining a motion vector of the target object according to the coordinate information of the target object in a target detection frame positioned before the non-detection frame and the coordinate information of the target object in the target detection frame positioned after the non-detection frame; calculating a time difference between the non-detection frame and one of the target detection frames located before the non-detection frame; predicting coordinate information of the target object in the non-detection frame according to the motion vector, the time difference value and the coordinate information of the target detection frame positioned before the non-detection frame; or,
the detection information comprises coordinate information and time information of the target object in a target detection frame; when a plurality of detection frames exist before the non-detection frame, predicting coordinate information of the target object in the non-detection frame according to the detection information corresponding to the determined target detection frame, including: according to the sequence order of a preset number of target detection frames, a plurality of motion vector sequences are obtained sequentially according to the respective coordinate information of two adjacent target detection frames; respectively calculating time difference sequences between the non-detection frame and a plurality of target detection frames positioned in front of the non-detection frame; determining a weight coefficient sequence corresponding to a plurality of target detection frames positioned in front of the non-detection frames; and predicting coordinate information of the target object in the non-detection frame according to the motion vector sequence, the time difference value sequence and the weight coefficient sequence.
2. The target detection method according to claim 1, wherein, in a non-detection frame in which the target object exists for any one, the target object is predicted before coordinate information of the non-detection frame based on the detection information corresponding to the determined target detection frame, the method further comprising:
if only one detection frame exists before the non-detection frame, the one detection frame and one detection frame which is positioned behind the non-detection frame and is closest to the non-detection frame are taken as the target detection frame;
if at least a plurality of detection frames exist before the non-detection frame, determining a preset number of detection frames before the non-detection frame, or one detection frame which is positioned before the non-detection frame and is closest to the non-detection frame, and one detection frame which is positioned after the non-detection frame and is closest to the non-detection frame as the target detection frame;
wherein the preset number is less than or equal to the number of detected frames preceding the non-detected frame.
3. The target detection method according to claim 1, characterized in that the method further comprises:
and carrying out the following processing on the target object according to the coordinate information of the target object in the detection frame and the coordinate information of the target object in the non-detection frame: occlusion processing or blurring processing or configuration tracking identification box processing.
4. The method according to claim 1, wherein determining a plurality of detection frames from a sequence of video frames, and detection information of a target object in each of the detection frames, comprises:
determining the plurality of detection frames in the video frame sequence according to the preset frame interval;
and performing target detection processing on the plurality of detection frames to obtain the detection information.
5. An object detection apparatus, comprising:
the determining module is used for determining a plurality of detection frames and detection information of a target object in the detection frames from the video frame sequence; wherein the detection frame characterizes a video frame for target detection processing; all the detection frames meet the preset frame interval;
the prediction module is used for predicting coordinate information of the target object in a non-detection frame according to the detection information corresponding to the determined target detection frame aiming at any non-detection frame in which the target object exists;
wherein the target detection frame is at least one detection frame located before the non-detection frame, and one detection frame located after the non-detection frame and closest to the non-detection frame;
the detection information comprises coordinate information and time information of the target object; when there is only one target detection frame before the non-detection frame, the prediction module is specifically configured to:
determining a motion vector of the target object based on the coordinate information of the target object in the one target detection frame located before the non-detection frame and the coordinate information of the target object in the target detection frame located after the non-detection frame; calculating a time difference between the non-detection frame and one of the target detection frames located before the non-detection frame; predicting coordinate information of the target object in the non-detection frame according to the motion vector, the time difference value and the coordinate information of the target detection frame positioned before the non-detection frame; or,
the detection information comprises coordinate information and time information of the target object in a target detection frame; when a plurality of detection frames exist before the non-detection frame, predicting coordinate information of the target object in the non-detection frame according to the detection information corresponding to the determined target detection frame, including: according to the sequence order of a preset number of target detection frames, a plurality of motion vector sequences are obtained sequentially according to the respective coordinate information of two adjacent target detection frames; respectively calculating time difference sequences between the non-detection frame and a plurality of target detection frames positioned in front of the non-detection frame; determining a weight coefficient sequence corresponding to a plurality of target detection frames positioned in front of the non-detection frames; and predicting coordinate information of the target object in the non-detection frame according to the motion vector sequence, the time difference value sequence and the weight coefficient sequence.
6. An electronic device comprising a processor and a memory, the memory storing a computer program executable by the processor, the processor executable by the computer program to implement the object detection method of any one of claims 1-4.
7. A readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the object detection method according to any one of claims 1-4.
CN202111117096.6A 2021-09-23 2021-09-23 Target detection method, target detection device, electronic equipment and readable storage medium Active CN113792697B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111117096.6A CN113792697B (en) 2021-09-23 2021-09-23 Target detection method, target detection device, electronic equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111117096.6A CN113792697B (en) 2021-09-23 2021-09-23 Target detection method, target detection device, electronic equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN113792697A CN113792697A (en) 2021-12-14
CN113792697B true CN113792697B (en) 2023-09-05

Family

ID=79184212

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111117096.6A Active CN113792697B (en) 2021-09-23 2021-09-23 Target detection method, target detection device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN113792697B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116052038A (en) * 2022-12-14 2023-05-02 合众新能源汽车股份有限公司 Target detection method, target detection device, electronic equipment and storage medium
CN115965657B (en) * 2023-02-28 2023-06-02 安徽蔚来智驾科技有限公司 Target tracking method, electronic device, storage medium and vehicle

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11010905B2 (en) * 2018-09-07 2021-05-18 Apple Inc. Efficient object detection and tracking

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101996312A (en) * 2009-08-18 2011-03-30 索尼株式会社 Method and device for tracking targets
JP2018037861A (en) * 2016-08-31 2018-03-08 キヤノン株式会社 Display control device and its control method, program, and recording medium
CN108230357A (en) * 2017-10-25 2018-06-29 北京市商汤科技开发有限公司 Critical point detection method, apparatus, storage medium, computer program and electronic equipment
CN109584276A (en) * 2018-12-04 2019-04-05 北京字节跳动网络技术有限公司 Critical point detection method, apparatus, equipment and readable medium
CN109685797A (en) * 2018-12-25 2019-04-26 北京旷视科技有限公司 Bone point detecting method, device, processing equipment and storage medium
CN109726683A (en) * 2018-12-29 2019-05-07 北京市商汤科技开发有限公司 Target object detection method and device, electronic equipment and storage medium
CN110909712A (en) * 2019-12-05 2020-03-24 重庆紫光华山智安科技有限公司 Moving object detection method and device, electronic equipment and storage medium
CN111242042A (en) * 2020-01-15 2020-06-05 上海眼控科技股份有限公司 Inspection duration judgment method, apparatus, device and medium based on vehicle detection
CN112598707A (en) * 2020-12-23 2021-04-02 南京稻子菱机电设备有限公司 Real-time video stream object detection and tracking method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on multi-class target detection and multi-target tracking algorithms in surveillance video; Zhou Jiqiang; China Master's Theses Full-text Database (Information Science and Technology), No. 2; I138-1954 *

Also Published As

Publication number Publication date
CN113792697A (en) 2021-12-14

Similar Documents

Publication Publication Date Title
US11483521B2 (en) Information processing system, information processing method, and program
US10915660B2 (en) Methods and apparatus for using video analytics to detect regions for privacy protection within images from moving cameras
CN113792697B (en) Target detection method, target detection device, electronic equipment and readable storage medium
US9524437B2 (en) Automatic extraction of secondary video streams
US11082668B2 (en) System and method for electronic surveillance
US8848053B2 (en) Automatic extraction of secondary video streams
US7822275B2 (en) Method for detecting water regions in video
EP1811457A1 (en) Video signal analysis
US10853949B2 (en) Image processing device
KR20200013585A (en) Method and camera system combining views from plurality of cameras
CN101065968A (en) Target property maps for surveillance systems
CN115760912A (en) Moving object tracking method, device, equipment and computer readable storage medium
US20190027004A1 (en) Method for performing multi-camera automatic patrol control with aid of statistics data in a surveillance system, and associated apparatus
KR101396838B1 (en) Video stabilization method and system by selection one of various motion models
Kim et al. Pre-activated 3D CNN and feature pyramid network for traffic accident detection
KR101296318B1 (en) Apparatus and method for object tracking by adaptive block partitioning
Specker et al. Reidtrack: Reid-only multi-target multi-camera tracking
Rahim et al. Vehicle speed detection using frame differencing for smart surveillance system
KR20170095599A (en) System and method for video searching
KR20140141239A (en) Real Time Object Tracking Method and System using the Mean-shift Algorithm
US7738009B2 (en) Method for following at least one object in a scene
Agarwal et al. Suspicious Activity Detection in Surveillance Applications Using Slow-Fast Convolutional Neural Network
WO2023105800A1 (en) Object detection device, object detection method, and object detection system
Gandhi et al. Video and seismic sensor-based structural health monitoring: Framework, algorithms, and implementation
CN113254702A (en) Video recording retrieval method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant