CN113794886A - Method and device for assisting target detection using video coding information - Google Patents

Method and device for assisting target detection using video coding information

Info

Publication number
CN113794886A
CN113794886A (application CN202110918613.3A)
Authority
CN
China
Prior art keywords
target
window
frame
target detection
detected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110918613.3A
Other languages
Chinese (zh)
Other versions
CN113794886B (en)
Inventor
徐林
周炎钧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Rongming Microelectronics Jinan Co ltd
Original Assignee
Rongming Microelectronics Jinan Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rongming Microelectronics Jinan Co ltd filed Critical Rongming Microelectronics Jinan Co ltd
Priority to CN202110918613.3A priority Critical patent/CN113794886B/en
Publication of CN113794886A publication Critical patent/CN113794886A/en
Application granted granted Critical
Publication of CN113794886B publication Critical patent/CN113794886B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136Incoming video signal characteristics or properties
    • H04N19/137Motion inside a coding unit, e.g. average field, frame or block difference
    • H04N19/139Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/513Processing of motion vectors
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/593Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract


The invention discloses a method and a device for assisting target detection using video coding information. The method includes: determining a target reference frame for a video frame to be detected, the target reference frame comprising several sub-window blocks; for each sub-window block, determining its reference position in the video frame to be detected and a reference value for that reference position; determining an overall reference value for the video frame to be detected based on the reference values of the sub-window blocks; within the constraint range of a target deformation function, searching for a target window in the video frame to be detected based on the overall reference value; and performing target detection based on the target window. The disclosed method effectively reduces the amount of computation, improves the efficiency of target detection, and, overall, saves hardware resources and reduces power consumption and latency.


Description

Method and apparatus for assisting target detection using video coding information
Technical Field
The invention relates to the field of computer technology, and in particular to a method and a device for assisting target detection using video coding information.
Background
Object detection and recognition is a common problem in computer vision, with wide application in many areas of daily life. The purpose of target detection is to distinguish targets of interest from the rest of an image or video, determine whether a target is present, and, if so, determine its position. At present, methods based on deep learning are generally used to detect targets with high precision: a convolutional neural network finds the position and category of the target, with accuracy that can reach or even exceed human-eye level.
The biggest drawback of deep-learning-based target detection is its high computational complexity. Running detection on every frame of a video can require significant computational resources (typically a GPU) and can also increase latency. Target detection is therefore often combined with target tracking; even so, the computational complexity of target tracking remains high. This greatly limits the scenarios in which target detection can be deployed and incurs substantial computation and energy costs.
Another common approach is video object detection using motion information, most often optical flow. Optical flow describes the instantaneous velocity of the pixels of a moving object on the observation imaging plane: it uses the temporal changes of pixels in an image sequence, together with the correlation between adjacent frames, to find correspondences between the previous frame and the current frame and thereby compute the motion of objects between frames. In general, optical flow arises from the movement of foreground objects in the scene, the motion of the camera, or both. However, extracting and using optical flow is itself very computationally expensive.
Meanwhile, almost all video coding protocols perform motion estimation during encoding and decoding. Since existing target detection is computationally expensive, this motion estimation can be reused to track targets efficiently, thereby improving the efficiency and performance of target detection.
Disclosure of Invention
The embodiment of the invention provides a method and equipment for assisting target detection by adopting video coding information, which are used for realizing auxiliary target detection and improving the efficiency of target detection.
In a first aspect, an embodiment of the present invention provides a method for assisting target detection using video coding information, comprising: determining a target reference frame for a video frame to be detected, the target reference frame comprising several sub-window blocks; for each sub-window block, determining the reference position of the sub-window block in the video frame to be detected and determining a reference value for that reference position; determining an overall reference value based on the reference values of the sub-window blocks; searching for a target window in the video frame to be detected, based on the overall reference value and within the constraint range of a target deformation function; and performing target detection based on the target window.
In some embodiments, determining the target reference frame for the video frame to be detected comprises: selecting a target reference frame preceding the video frame to be detected, based on a preset time range and the video frame to be detected.
In some embodiments, the preset time range is determined by a specified time interval, or based on a time correlation function, where the closer a frame is to the video frame to be detected, the higher its confidence as a target reference frame.
In some embodiments, determining the reference value for the reference position comprises: determining the reference value based on the size of the block corresponding to the reference position and the reliability of the reference position, where the reference value increases as the size of the corresponding block and/or the reliability of the reference position increases.
In some embodiments, finding a target window in the video frame to be detected comprises: searching for a target window within the constraint range of the target deformation function, such that the ratio of the overall reference value of the target window to the overall reference value of the video frame to be detected exceeds a first threshold, and the ratio of the sum of the sizes of the reference positions contained in the target window to the size of the target window exceeds a second threshold.
In some embodiments, the method further comprises: pre-configuring a target detection count value; decrementing the count value by 1 each time a target window is found; and performing target detection based on each target window when the count value reaches 0.
In some embodiments, the method further comprises: performing target detection directly when no target window can be found.
In some embodiments, the first threshold and/or the second threshold are adjusted to stabilize the search for the target window.
In a second aspect, an embodiment of the present invention further provides a target detection device for video coding, comprising a processor configured to perform the steps of the method for assisting target detection using video coding information according to the embodiments of the present disclosure.
In a third aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method for assisting target detection using video coding information according to the embodiments of the present disclosure.
According to the embodiments of the invention, a target window is searched for in the video frame to be detected, within the constraint range of the target deformation function and based on the determined overall reference value, and target detection is performed based on that window. This effectively reduces the amount of computation and improves detection efficiency, thereby saving hardware resources and reducing power consumption and latency overall.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
fig. 1 shows a schematic diagram of moving object vector detection.
Fig. 2 shows a basic flowchart of a target detection method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Almost all protocols require motion estimation during video encoding and decoding. The basic idea is to divide each frame of the image sequence into non-overlapping blocks and treat the displacement of all pixels within a block as identical. Then, for each macroblock, within a certain search range in the reference frame and according to a block-matching criterion, the most similar block (the matching block) is found; the relative displacement between the matching block and the current block is the motion vector.
Generally, the displacement of a moving object between two adjacent frames is small, so a matching block can be found by searching a small area around the co-located macroblock in the historical frame. Obviously, the larger the search area, the higher the computational cost, so practical products need various techniques to accelerate the search and reduce computation. For example, the H.264 protocol allows a matching block to be sought in any reconstructed frame in the DPB buffer to find an optimal match. The cost of searching is proportional to the number of reference frames: the time (chip area) and power consumption required to search two reference frames is essentially twice that of the single-reference-frame case. From a cost-saving viewpoint, the simplest scheme is to reference only the previous frame, which saves both reference-frame buffering and encoder computation, at the cost of some rate performance.
After finding the best matching block, motion estimation outputs a motion vector (MV), i.e., the position coordinates of the reference block relative to the current block. If no suitable matching block is found, the macroblock may be intra-predicted. Thus, as shown in Fig. 1, both P-frames and B-frames may contain intra-coded macroblocks in H.264 and H.265.
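The block matching described above can be sketched as a brute-force SAD (sum of absolute differences) search. This is a simplified illustration, not the patent's implementation; real encoders use fast search patterns (e.g., diamond search) to reduce computation, and the function name and interface here are assumptions:

```python
import numpy as np

def motion_estimate(cur_block, ref_frame, block_xy, search_range=8):
    """Full search: return the motion vector (dy, dx) of the best SAD match
    for cur_block around the co-located position block_xy in ref_frame."""
    h, w = cur_block.shape
    y0, x0 = block_xy
    best_sad, best_mv = float("inf"), (0, 0)
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = y0 + dy, x0 + dx
            if y < 0 or x < 0 or y + h > ref_frame.shape[0] or x + w > ref_frame.shape[1]:
                continue  # candidate block falls outside the reference frame
            cand = ref_frame[y:y + h, x:x + w]
            # cast to int to avoid uint8 wrap-around in the difference
            sad = np.abs(cur_block.astype(int) - cand.astype(int)).sum()
            if sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv, best_sad
```

Because the search cost grows quadratically with `search_range`, and linearly with the number of reference frames searched, the cost observations in the text follow directly from this loop structure.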
Reducing the frequency of target detection can work in some application scenarios, but not where detection is required at the full video frame rate. For example, in a region-enhancement application a video encoder needs to adjust the QP for a specific object to improve image quality, which requires target recognition on every frame. Even for license plate recognition in a parking lot, timed snapshots taken at intervals may miss some high-speed vehicles.
In conventional video target detection using motion information, additional computing resources are required to estimate the motion information, which wastes resources and increases hardware and energy costs.
In order to locate and track targets using motion vectors, an embodiment of the present invention provides a method for assisting target detection using video coding information, comprising:
step S201, determining a target reference frame for a video frame to be detected, wherein the target reference frame comprises a plurality of sub-window blocks. In particular, the sub-window block in this embodiment may be a target location found through deep learning or other methods. Due to the continuity of video in many cases, the target reference frame, e.g., the previous frame, can be selected using the previous video frame in the temporal dimension. In step S202, for each sub-window block: determining the reference position of the sub-window block in the video frame to be detected; a reference value for the reference position is determined. That is, for a block in the target window of each target reference frame, a reference position of the sub-window block may be determined in the video frame to be detected, and the reference position may be a corresponding position of the sub-window block in the video frame to be detected. From this reference position, a reference value may be calculated, which may be used to describe the temporal confidence of the sub-window-block in embodiments of the present disclosure. An overall reference value is determined based on the reference values of the respective sub-window blocks in step S203. The overall reference value may be determined by summing, averaging or weighted averaging, for example, and is not limited herein.
Then, in step S204, a target window is searched for in the video frame to be detected, based on the overall reference value and within the constraint range of the target deformation function. Finally, in step S205, target detection is performed based on the target window. In some embodiments, the target detection result for the previous frame may initially be obtained by deep learning, determining the position of each sub-window block containing a moving target. The positions of these sub-window blocks in the frame to be detected can then be found through the motion vectors. These positions may be very concentrated or very dispersed; in the dispersed case, covering all positions would produce a large new target window. To control this, the present application limits the allowed deformation with a target deformation function: the target window must remain similar to that of the previous frame, while a certain amount of variation is still allowed.
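Steps S202–S203 can be sketched as follows. The data layout (each block as a tuple of position, size, motion vector, and frame offset) and the product-form reference value are illustrative assumptions drawn from the examples later in the text, not a prescribed format:

```python
def project_blocks(blocks, f_t):
    """Map each sub-window block into the frame to be detected and score it.
    blocks: iterable of (x, y, w, h, (dx, dy), t), where (dx, dy) is the
    block's motion vector and t its frame offset (negative = earlier frame).
    f_t: time-reliability function mapping offset t to a confidence in [0, 1].
    Returns the per-block reference entries and the overall reference value."""
    refs = []
    for (x, y, w, h, (dx, dy), t) in blocks:
        pos = (x + dx, y + dy)      # reference position in the frame to be detected
        value = w * h * f_t(t)      # reference value: block size x temporal confidence
        refs.append((pos, w, h, value))
    overall = sum(v for (_, _, _, v) in refs)  # overall value by summation
    return refs, overall
```

Summation is used here for the overall value; as the text notes, averaging or weighted averaging would work equally well.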
The method of the present disclosure finds the target's new position in the next frame (the video frame to be detected) through motion estimation, without re-running deep learning. Since motion estimation is a necessary step of video coding anyway, detection efficiency is effectively improved; and because the detection result for each frame is derived from motion vectors and the detection results of other frames, computing resources are saved. Searching for the target window through the target deformation function further reduces the computation of the search.
In some embodiments, determining the target reference frame for the video frame to be detected comprises: selecting a target reference frame preceding the video frame to be detected, based on a preset time range and the video frame to be detected. In some embodiments, the preset time range is determined by a specified time interval, or based on a time correlation function, where the closer a frame is to the video frame to be detected, the higher its confidence as a target reference frame. For example, the video frame one specified interval earlier may be selected as the target reference frame — the previous frame, or the frame before that. The time reliability function F(t) takes as input the time offset from the current frame and outputs a confidence; the shorter the interval, i.e., the closer the target reference frame is to the current frame, the higher its confidence. It can be implemented with a one-dimensional LUT, with a maximum output of 1 and a minimum of 0. For example, to reference only the previous frame: F(t = -1) = 1 and F(t ≠ -1) = 0. To reference the previous two frames: F(t = -1) = 0.8, F(t = -2) = 0.8, and F(t ∉ {-1, -2}) = 0.
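A minimal sketch of the one-dimensional LUT implementation of F(t), under the assumption that any offset not listed in the table maps to confidence 0:

```python
def make_time_confidence(lut):
    """Build the time-reliability function F(t) from a one-dimensional LUT.
    Input t is the frame offset relative to the current frame (negative =
    earlier); output is a confidence in [0, 1]. Unlisted offsets yield 0."""
    def f(t):
        return lut.get(t, 0.0)
    return f

# Reference only the previous frame: F(-1) = 1, F(t) = 0 otherwise.
f_prev = make_time_confidence({-1: 1.0})
# Reference the previous two frames, each with confidence 0.8.
f_two = make_time_confidence({-1: 0.8, -2: 0.8})
```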
In some embodiments, determining the reference value for the reference position comprises: determining it based on the size of the block corresponding to the reference position and the reliability of the reference position, the reference value increasing with block size and/or reliability. As an example, the product size × confidence of the block corresponding to the reference position may be used. In step S203, an overall reference value is determined based on the reference values of the sub-window blocks, e.g., by summation.
In some embodiments, finding a target window in the video frame to be detected comprises: searching, within the constraint range of the target deformation function, for a target window such that the ratio of the target window's overall reference value to the overall reference value of the video frame to be detected exceeds a first threshold, and the ratio of the sum of the sizes of the reference positions contained in the target window to the size of the target window exceeds a second threshold. The search uses the target deformation function Q(X, Y), which specifies the allowed deformation range of the target window in the current frame relative to the size and shape of the target window in the previous frame. For example, Q(0.1, 0.1) allows the width and height of the target window to each change (increase or decrease) by 10%. One may also allow only fixed window sizes: [1.1W, 1.1H], [1.1W, 1.0H], [1.1W, 0.9H], [1.0W, 1.1H], [1.0W, 1.0H], [1.0W, 0.9H], [0.9W, 1.1H], [0.9W, 1.0H], [0.9W, 0.9H]. Q(0, 0) means the target window size must match the previous frame. In some embodiments, the target deformation function is configured to tolerate a small range of deformation: its purpose is to limit the scope of the search and reduce its computational load, while still tolerating some deformation of the target in the video. Since the time between frames is short, the deformation tolerance is set to a small range in most scenes in this example.
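The fixed-size enumeration above can be sketched as follows. The function name, the discrete scale steps, and the way the tolerance filters them are illustrative assumptions; the patent only specifies the resulting set of allowed sizes:

```python
def candidate_sizes(w, h, qx=0.1, qy=0.1, steps=(0.9, 1.0, 1.1)):
    """Enumerate the window sizes allowed by the deformation function Q(qx, qy),
    as scalings of the previous frame's window (w, h). With the default 10%
    tolerance this yields the nine sizes listed in the text; Q(0, 0)
    degenerates to the single size (w, h)."""
    sx = [s for s in steps if abs(s - 1.0) <= qx + 1e-9]  # allowed width scales
    sy = [s for s in steps if abs(s - 1.0) <= qy + 1e-9]  # allowed height scales
    return [(round(w * a), round(h * b)) for a in sx for b in sy]
```

Restricting the search to this small candidate set is what keeps the per-frame matching cost low.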
The first threshold Pc and the second threshold Pd may be preset. A new target window is then searched for in the current frame (the frame to be detected) within the range allowed by the target deformation function; the ratio of the sum of the trusted reference values of the reference positions it contains to the overall trusted reference value must exceed Pc. If a matching target window is found, it is further checked that the ratio of the sum of the sizes of the corresponding reference positions to the size of the target window exceeds Pd.
In some embodiments, the first threshold and/or the second threshold may be adjusted to stabilize the search for the target window. In this example, a large target window more easily achieves high reference-position coverage, i.e., exceeds Pc, while a small target window more easily achieves high coverage density, i.e., exceeds Pd. By adjusting Pc and Pd, the search can be steered to a stable final target window.
In other words, within the range allowed by the target deformation function, the new window must contain a certain proportion of the blocks of the old window (blocks moving into the new window), which pushes the new window to enlarge, since the ratio of its trusted reference values to the overall trusted reference value must exceed Pc. At the same time, the ratio of the sum of the corresponding reference-position sizes to the size of the new window must exceed Pd, so the new window must not contain too many unknown blocks (blocks not coming from the old window), which pushes the new window to shrink. The target window is obtained by matching according to this process: the two factors Pc and Pd together control the position of the target in the current frame and the size of the target window, so that the window fits the actual target.
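The two acceptance criteria — Pc on trusted reference value and Pd on covered size — can be sketched as a predicate over one candidate window. The per-block tuple layout (reference position, width, height, reference value) and the default thresholds are illustrative assumptions:

```python
def window_ok(win, refs, overall, pc=0.6, pd=0.5):
    """Check a candidate window (x, y, w, h) against both thresholds.
    refs: list of ((px, py), bw, bh, value) reference-position entries.
    overall: the overall trusted reference value of the frame to be detected."""
    x, y, w, h = win
    # keep only blocks fully contained in the candidate window
    inside = [(bw, bh, v) for ((px, py), bw, bh, v) in refs
              if x <= px and y <= py and px + bw <= x + w and py + bh <= y + h]
    value = sum(v for (_, _, v) in inside)        # trusted reference value covered
    area = sum(bw * bh for (bw, bh, _) in inside)  # block area covered
    # Pc: enough of the overall value inside; Pd: dense enough coverage
    return value > pc * overall and area > pd * (w * h)
```

A full search would evaluate this predicate over the candidate positions and sizes permitted by the deformation function, keeping a window that passes both checks.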
In some embodiments, the method further comprises: pre-configuring a target detection count value, decrementing it by 1 each time a target window is found, and performing target detection based on each target window when the count reaches 0. In some embodiments, the method further comprises: performing target detection directly when no target window can be found. For example, a target detection count value N = k may be configured in advance in a forced-target-detection counter, which counts down to forced target detection.
When a new frame arrives, if intra prediction is forced — because the frame is an I-frame (key frame) or by some other mechanism — the number of reference frames is 0 and no motion vector can be determined; target detection is then performed on the frame by a preset method, such as deep learning, and the counter is reset: N = k.
If a target window satisfying both Pc and Pd is found, the target window is output and N = N - 1.
If no target window satisfies both Pc and Pd (the target-tracking condition is not met), target detection is run directly (e.g., target localization or detection for the frame is completed through deep learning), and N = k.
If N reaches 0, target detection is run and the counter is reset: N = k.
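The counting rules above can be sketched as a small state machine. This is an illustrative sketch; the patent describes the rules but does not prescribe an implementation:

```python
class ForcedDetectionCounter:
    """Track targets via motion vectors for at most k frames, then force a
    full detection to stop error accumulation. Detection is also forced on
    intra-coded frames and whenever no matching target window is found."""

    def __init__(self, k):
        self.k = k
        self.n = k

    def step(self, is_intra, window_found):
        """Process one frame; return True iff full target detection must run."""
        if is_intra or not window_found:
            self.n = self.k      # no usable motion vectors or no window: detect now
            return True
        self.n -= 1              # tracked window output: N = N - 1
        if self.n == 0:
            self.n = self.k      # periodic forced detection, reset N = k
            return True
        return False             # keep tracking; skip full detection
```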
In summary, the method of the invention uses motion vectors to track targets, reducing the use of computing resources. Because the detection result of each frame is derived from motion vectors and the detection results of other frames, errors can accumulate over time; the forced-target-detection counter is therefore used in this example to prevent accumulated error, ensuring that newly appearing targets are not missed while target tracking continues. The disclosed method cannot by itself find new targets: in a concrete implementation, the target to be tracked is first detected with deep learning, and the disclosed method then performs the tracking. Since deep learning completes target localization only at intervals between frames, the algorithmic complexity while tracking the same target can be 1/30 of full detection or even lower, so the disclosed method effectively saves computing resources. The target deformation function limits the scope of the search, greatly reducing its computation; adjusting Pc and Pd steers the search to a stable target window; and the time reliability function adjusts the weight of the reference frames, yielding a more accurate result. During decoding, the decoder can also obtain the motion vector information, so the disclosed method can likewise accelerate target detection and save computation on the decoding side.
In a second aspect, an embodiment of the present invention further provides a target detection device for video coding, comprising a processor configured to perform the steps of the method for assisting target detection using video coding information according to the embodiments of the present disclosure.
In a third aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method for assisting target detection using video coding information according to the embodiments of the present disclosure.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by "comprising a" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (10)

1. A method for assisting target detection using video coding information, comprising:
determining a target reference frame for a video frame to be detected, the target reference frame comprising several sub-window blocks;
for each sub-window block:
determining a reference position of the sub-window block in the video frame to be detected; and
determining a reference value of the reference position;
determining an overall reference value based on the reference values of the sub-window blocks;
searching for a target window in the video frame to be detected based on the overall reference value, within the constraint range of a target deformation function; and
performing target detection based on the target window.

2. The method for assisting target detection using video coding information as claimed in claim 1, wherein determining a target reference frame for the video frame to be detected comprises:
selecting a target reference frame before the video frame to be detected, based on a preset time range and the video frame to be detected.

3. The method for assisting target detection using video coding information as claimed in claim 2, wherein the preset time range is determined by a specified time interval, or is determined based on a time correlation function, wherein the closer the target reference frame is to the video frame to be detected, the higher the credibility of the target reference frame.

4. The method for assisting target detection using video coding information as claimed in claim 3, wherein determining the reference value of the reference position comprises:
determining the reference value of the reference position based on the size of the block corresponding to the reference position and the credibility of the reference position, wherein the reference value of the reference position increases as the size of the block corresponding to the reference position increases and/or as the credibility of the reference position increases.

5. The method for assisting target detection using video coding information as claimed in claim 4, wherein searching for a target window in the video frame to be detected comprises:
within the constraint range of the target deformation function, searching for a target window in the video frame to be detected, so that the overall reference value of the target window and the overall reference value of the video frame to be detected exceed a first threshold, and the sum of the reference values of the reference positions contained in the target window and the size of the target window exceed a second threshold.

6. The method for assisting target detection using video coding information as claimed in claim 5, further comprising:
pre-configuring a target detection count value, wherein when a target window is found, the target detection count value is decremented by 1, and when the target detection count value is 0, target detection is performed based on each target window.

7. The method for assisting target detection using video coding information as claimed in claim 4, further comprising:
performing target detection directly when no target window can be found.

8. The method for assisting target detection using video coding information as claimed in claim 5, wherein the first threshold and/or the second threshold are adjusted so as to stably find the target window.

9. A device for assisting target detection using video coding information, comprising a processor configured to perform the steps of the method for assisting target detection using video coding information according to any one of claims 1-8.

10. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method for assisting target detection using video coding information according to any one of claims 1 to 8 are implemented.
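The window search of claims 1, 4, and 5 — per-position reference values that grow with block size and credibility, summed over candidate windows restricted to small deformations of the previous window — can be sketched as follows. All names, the deformation limits, and the scoring details are hypothetical illustrations; the patent does not fix a concrete implementation.

```python
# Illustrative sketch of the claimed window search. A window is a tuple
# (x, y, w, h); block_values maps a reference position (px, py) to its
# reference value.

def reference_value(block_area, credibility):
    # Claim 4: the reference value increases with the size of the block
    # and with the credibility of the reference position.
    return block_area * credibility


def deformation_ok(prev, cand, max_shift=16, max_scale=0.2):
    # Hypothetical target deformation constraint: bound the translation
    # and resizing between the previous window and a candidate.
    px, py, pw, ph = prev
    cx, cy, cw, ch = cand
    return (abs(cx - px) <= max_shift and abs(cy - py) <= max_shift
            and abs(cw - pw) <= max_scale * pw
            and abs(ch - ph) <= max_scale * ph)


def inside(window, pos):
    x, y, w, h = window
    px, py = pos
    return x <= px < x + w and y <= py < y + h


def find_target_window(prev_window, candidates, block_values, threshold):
    # Among candidates admissible under the deformation constraint, pick
    # the one with the highest summed reference value, provided it
    # exceeds the threshold; return None if no window qualifies.
    best, best_score = None, threshold
    for cand in candidates:
        if not deformation_ok(prev_window, cand):
            continue
        score = sum(v for pos, v in block_values.items() if inside(cand, pos))
        if score > best_score:
            best, best_score = cand, score
    return best
```

Returning `None` when no candidate qualifies corresponds to claim 7, where target detection is performed directly when no target window can be found.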
CN202110918613.3A 2021-08-11 2021-08-11 Method and device for assisting target detection using video coding information Active CN113794886B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110918613.3A CN113794886B (en) 2021-08-11 2021-08-11 Method and device for assisting target detection using video coding information

Publications (2)

Publication Number Publication Date
CN113794886A true CN113794886A (en) 2021-12-14
CN113794886B CN113794886B (en) 2025-01-14

Family

ID=78875940

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110918613.3A Active CN113794886B (en) 2021-08-11 2021-08-11 Method and device for assisting target detection using video coding information

Country Status (1)

Country Link
CN (1) CN113794886B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007228093A (en) * 2006-02-21 2007-09-06 Toshiba Corp Device and method for detecting motion
CN105654512A (en) * 2015-12-29 2016-06-08 深圳羚羊微服机器人科技有限公司 Target tracking method and device
CN108596109A (en) * 2018-04-26 2018-09-28 济南浪潮高新科技投资发展有限公司 A kind of object detection method and device based on neural network and motion vector
CN110516620A (en) * 2019-08-29 2019-11-29 腾讯科技(深圳)有限公司 Method for tracking target, device, storage medium and electronic equipment
CN112070797A (en) * 2020-08-21 2020-12-11 中国科学院计算技术研究所 A target detection method, system, acceleration device, medium and electronic device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant