CN113794886A - Method and apparatus for assisting target detection using video coding information - Google Patents
- Publication number
- CN113794886A CN113794886A CN202110918613.3A CN202110918613A CN113794886A CN 113794886 A CN113794886 A CN 113794886A CN 202110918613 A CN202110918613 A CN 202110918613A CN 113794886 A CN113794886 A CN 113794886A
- Authority
- CN
- China
- Prior art keywords
- target
- window
- frame
- detected
- video frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
All classifications fall under H—ELECTRICITY; H04—ELECTRIC COMMUNICATION TECHNIQUE; H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION; H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals:
- H04N19/51—Motion estimation or motion compensation (under H04N19/50—predictive coding; H04N19/503—involving temporal prediction)
- H04N19/139—Analysis of motion vectors, e.g. their magnitude, direction, variance or reliability (under H04N19/10—adaptive coding; H04N19/134; H04N19/136—incoming video signal characteristics; H04N19/137—motion inside a coding unit)
- H04N19/176—the coding unit being an image region, the region being a block, e.g. a macroblock (under H04N19/169; H04N19/17)
- H04N19/513—Processing of motion vectors (under H04N19/51—motion estimation or motion compensation)
- H04N19/593—predictive coding involving spatial prediction techniques (under H04N19/50)
Abstract
The invention discloses a method and an apparatus for assisting target detection with video coding information. The method comprises: determining a target reference frame for a video frame to be detected, wherein the target reference frame comprises a plurality of sub-window blocks; for each sub-window block, determining its reference position in the video frame to be detected and a reference value for that reference position; determining an overall reference value for the video frame to be detected from the reference values of the sub-window blocks; searching for a target window in the video frame to be detected, based on the overall reference value, within the constraint range of a target deformation function; and performing target detection based on the target window. The disclosed method effectively reduces the amount of computation and improves the efficiency of target detection, saving hardware resources and reducing power consumption and latency overall.
Description
Technical Field
The invention relates to the field of computer technology, and in particular to a method and a device for assisting target detection with video coding information.
Background
Object detection and recognition is a common problem in computer vision, with wide application in many areas of life. The purpose of target detection is to separate targets of interest from the rest of an image or video, determine whether a target is present, and, if so, determine its position. At present, deep-learning-based methods are generally used for high-accuracy target detection: a convolutional neural network locates and classifies targets, and its accuracy can reach or even exceed that of the human eye.
The biggest drawback of deep-learning-based target detection is its high computational complexity. Running detection on every video frame can require substantial computational resources (typically a GPU) and can also increase latency. Target detection is therefore often combined with target tracking; but even target tracking is computationally expensive. This greatly limits where target detection can be deployed and incurs large computation and energy costs.
Another common approach is video target detection using motion information, most commonly optical flow. Optical flow measures the instantaneous velocity, on the observed imaging plane, of pixels belonging to moving objects in space: it uses the temporal changes of pixels in an image sequence and the correlation between adjacent frames to establish correspondences between the previous frame and the current frame, and from these computes the motion of objects between adjacent frames. In general, optical flow arises from movement of foreground objects in the scene, motion of the camera, or both. However, extracting and using optical flow is itself very computationally expensive.
Given the high computational complexity of existing target detection, and given that almost all video coding protocols already perform motion estimation during encoding and decoding, that motion estimation can be reused to track targets effectively, improving the efficiency and performance of target detection.
Disclosure of Invention
Embodiments of the invention provide a method and a device for assisting target detection with video coding information, which assist target detection and improve its efficiency.
In a first aspect, an embodiment of the present invention provides a method for assisting target detection using video coding information, comprising: determining a target reference frame for a video frame to be detected, wherein the target reference frame comprises a plurality of sub-window blocks; for each sub-window block, determining the reference position of the sub-window block in the video frame to be detected and determining a reference value for that reference position; determining an overall reference value based on the reference values of the respective sub-window blocks; searching for a target window in the video frame to be detected, based on the overall reference value, within the constraint range of the target deformation function; and performing target detection based on the target window.
In some embodiments, determining the target reference frame for the video frame to be detected comprises:
selecting a target reference frame preceding the video frame to be detected, based on a preset time range and the video frame to be detected.
In some embodiments, the preset time range is determined by a specified time interval, or is determined based on a time reliability function, wherein the closer a target reference frame is to the video frame to be detected, the higher its confidence.
In some embodiments, determining the reference value for the reference location comprises:
determining the reference value of the reference position based on the size of the block corresponding to the reference position and the reliability of the reference position, wherein the reference value increases as the size of the corresponding block increases and/or as the reliability of the reference position increases.
In some embodiments, finding a target window in the video frame to be detected comprises:
searching for a target window in the video frame to be detected within the constraint range of the target deformation function, such that the ratio of the overall reference value of the target window to the overall reference value of the video frame to be detected exceeds a first threshold, and the ratio of the sum of the reference values of the reference positions contained in the target window to the size of the target window exceeds a second threshold.
In some embodiments, the method further comprises: preconfiguring a target detection count value, decrementing the count value by 1 each time a target window is found, and performing target detection based on each target window when the count value reaches 0.
In some embodiments, the method further comprises: performing target detection directly when no target window can be found.
In some embodiments, the first threshold and/or the second threshold are adjusted to stabilize the search for the target window.
In a second aspect, an embodiment of the present invention further provides a target detection device for video coding, comprising a processor configured to perform the steps of the method for assisting target detection with video coding information according to the embodiments of the present disclosure.
In a third aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method for assisting target detection with video coding information according to the embodiments of the present disclosure.
According to embodiments of the invention, a target window is searched for in the video frame to be detected, based on the determined overall reference value, within the constraint range of the target deformation function, and target detection is performed based on that target window. This effectively reduces the amount of computation and improves target detection efficiency, saving hardware resources and reducing power consumption and latency overall.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
fig. 1 shows a schematic diagram of moving object vector detection.
Fig. 2 shows a basic flowchart of a target detection method according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Almost all protocols perform motion estimation during video encoding and decoding. The basic idea is to divide each frame of the image sequence into a number of non-overlapping blocks and assume that all pixels within a block share the same displacement. Then, for each macroblock, the reference frame is searched within a certain range, according to a block-matching criterion, for the block most similar to the current block (the matching block). The relative displacement between the matching block and the current block is the motion vector.
Generally, an object does not move far between two adjacent frames, so the search can be restricted to a small area around the macroblock's co-located position in the historical frame. Clearly, the larger the search area, the higher the computational cost, so practical products need techniques to accelerate the search and reduce the amount of computation. For example, the H.264 protocol allows a matching block to be sought in any reconstructed frame in the DPB buffer in order to find the optimal match, but the cost of searching is proportional to the number of reference frames: the time (chip area) and power needed to search 2 reference frames is essentially twice that of the single-reference-frame case. From a cost-saving standpoint, the simplest scheme is therefore to reference only the previous frame, which saves both reference-frame buffering and encoder computation, at the cost of some rate performance.
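The block-matching search described above can be sketched as follows. This is a minimal illustrative example, not part of the patent: it uses an exhaustive search with the sum-of-absolute-differences (SAD) criterion, one of the common block-matching criteria; the function name, block size, and search range are assumptions for illustration.

```python
import numpy as np

def block_match(cur, ref, bx, by, bs=16, search=8):
    """Find the motion vector for the bs x bs block at (bx, by) in `cur`
    by exhaustively searching a +/- `search` pixel window around the
    co-located position in `ref`, using the SAD matching criterion."""
    block = cur[by:by + bs, bx:bx + bs].astype(np.int32)
    best_mv, best_sad = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + bs > ref.shape[0] or x + bs > ref.shape[1]:
                continue  # candidate block falls outside the reference frame
            cand = ref[y:y + bs, x:x + bs].astype(np.int32)
            sad = int(np.abs(block - cand).sum())
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dx, dy)
    return best_mv, best_sad
```

A real encoder uses fast search patterns (diamond, hexagon) rather than this exhaustive scan, which is why the search area matters so much for cost.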
After finding the best matching block, motion estimation outputs a motion vector (MV), i.e., the offset of the reference block relative to the current block. If no suitable matching block is found, the macroblock may be intra predicted. Thus, as shown in fig. 1, both P-frames and B-frames may contain intra-coded macroblocks in H.264 and H.265.
Reducing the frequency of target detection works in some application scenarios, but not in scenarios that require detection at the full video frame rate. For example, in a region-enhancement application, the video encoder must adjust the QP of a specific object to improve image quality, which requires target recognition on every frame. Even for license plate recognition in a parking lot, fast-moving vehicles may be missed between timed snapshots.
Conventional video target detection using motion information requires additional computing resources to estimate that motion information, wasting resources and increasing hardware and energy costs.
In order to detect and track targets using motion vectors, an embodiment of the present invention provides a method for assisting target detection with video coding information, comprising:
step S201, determining a target reference frame for a video frame to be detected, wherein the target reference frame comprises a plurality of sub-window blocks. In particular, the sub-window block in this embodiment may be a target location found through deep learning or other methods. Due to the continuity of video in many cases, the target reference frame, e.g., the previous frame, can be selected using the previous video frame in the temporal dimension. In step S202, for each sub-window block: determining the reference position of the sub-window block in the video frame to be detected; a reference value for the reference position is determined. That is, for a block in the target window of each target reference frame, a reference position of the sub-window block may be determined in the video frame to be detected, and the reference position may be a corresponding position of the sub-window block in the video frame to be detected. From this reference position, a reference value may be calculated, which may be used to describe the temporal confidence of the sub-window-block in embodiments of the present disclosure. An overall reference value is determined based on the reference values of the respective sub-window blocks in step S203. The overall reference value may be determined by summing, averaging or weighted averaging, for example, and is not limited herein.
Then, in step S204, a target window is searched for in the video frame to be detected, based on the overall reference value, within the constraint range of the target deformation function. Finally, in step S205, target detection is performed based on the target window. In some embodiments, the target detection result of the previous frame is initially obtained by deep learning, which determines the position of each sub-window block containing a moving target. The positions of the sub-window blocks of the previous frame's detection result can then be followed into the current frame through the motion vectors. These positions may be highly concentrated or widely dispersed; in the dispersed case, covering all of them would make the new target window large. To control this, the present application limits the deformation through a target deformation function: the target window must remain similar to that of the previous frame, while some variation is still allowed.
The disclosed method can find the target's new position in the next frame (the video frame to be detected) through motion estimation, without reusing deep learning. Since motion estimation is a necessary step of video coding anyway, detection efficiency is improved effectively; and because each frame's detection result is derived from motion vectors and other frames' detection results, computing resources are saved. Searching for the target window through the target deformation function further reduces the computation of the search.
In some embodiments, determining the target reference frame for the video frame to be detected comprises:
selecting a target reference frame preceding the video frame to be detected, based on a preset time range and the video frame to be detected. In some embodiments, the preset time range is determined by a specified time interval, or based on a time reliability function, where the closer a target reference frame is to the video frame to be detected, the higher its confidence. For example, the video frame one specified interval earlier may be selected as the target reference frame: the previous frame, or the frame before that. The time reliability function F(t) takes as input the time interval to the current frame and outputs a reliability; the shorter the interval, i.e. the closer the target reference frame is to the current frame, the higher the reliability. It can be implemented as a one-dimensional LUT, with a maximum output of 1 and a minimum of 0. For example, to reference only the previous frame, F(t=-1)=1 and F(t≠-1)=0; to reference two frames, for example F(t=-1)=0.8, F(t=-2)=0.8, and F(t)=0 for all other intervals.
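The time reliability function F(t) described above can be sketched as a one-dimensional LUT. This is an illustrative sketch, not the patent's implementation; the factory function and the two example weight tables mirror the configurations mentioned in the text.

```python
def make_time_reliability_lut(weights):
    """Build a one-dimensional LUT for the time reliability function F(t):
    the input is the (negative) frame interval to the current frame, the
    output a reliability in [0, 1]; unlisted intervals map to 0."""
    def F(t):
        return weights.get(t, 0.0)
    return F

# Single-reference configuration: only the previous frame, F(-1) = 1.
F_single = make_time_reliability_lut({-1: 1.0})
# Two-reference configuration: the previous two frames, each weighted 0.8.
F_two = make_time_reliability_lut({-1: 0.8, -2: 0.8})
```

With F_single, only sub-window blocks from the previous frame carry any weight; with F_two, blocks from the two preceding frames contribute equally.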
In some embodiments, determining the reference value of the reference position comprises: determining it based on the size of the block corresponding to the reference position and the reliability of the reference position, where the reference value increases with the size of the corresponding block and/or with the reliability of the reference position. As an example, the reference value may be the size of the corresponding block multiplied by its reliability. The overall reference value is then determined from the reference values of the respective sub-window blocks, for example by summation.
In some embodiments, finding a target window in the video frame to be detected comprises: searching within the constraint range of the target deformation function, such that the ratio of the target window's overall reference value to the overall reference value of the video frame to be detected exceeds a first threshold, and the ratio of the sum of the reference values of the reference positions contained in the window to the size of the window exceeds a second threshold. Concretely, the search may use the target deformation function Q(X, Y), which defines the deformation allowed for the target window in the current frame relative to the size and shape of the target window in the previous frame. For example, Q(0.1, 0.1) allows a deformation (increase or decrease) of 10% in each of the width and height of the target window; alternatively, only fixed candidate sizes may be allowed: [1.1W, 1.1H], [1.1W, 1.0H], [1.1W, 0.9H], [1.0W, 1.1H], [1.0W, 1.0H], [1.0W, 0.9H], [0.9W, 1.1H], [0.9W, 1.0H], [0.9W, 0.9H]. Q(0, 0) means the target window size must match the previous frame. In some embodiments, the target deformation function is configured to tolerate a small range of deformation. Its purpose is to limit the scope of the search and reduce the computation of the search, while tolerating some deformation of the target in the video; since the time between frames is short, the deformation tolerance is set to a small range in most scenes in this example.
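The fixed-candidate variant of Q(X, Y) above can be sketched by enumerating the allowed window sizes. This is an illustrative sketch; the function name and the three-point sampling (shrink, keep, enlarge per axis) are assumptions matching the nine sizes listed in the text.

```python
def candidate_window_sizes(w, h, qx=0.1, qy=0.1):
    """Enumerate candidate window sizes allowed by the deformation function
    Q(qx, qy): width scaled by (1-qx), 1, or (1+qx) and height by (1-qy),
    1, or (1+qy). Q(0, 0) yields only the previous frame's size."""
    sizes = set()
    for fx in (1 - qx, 1.0, 1 + qx):
        for fy in (1 - qy, 1.0, 1 + qy):
            sizes.add((round(w * fx), round(h * fy)))
    return sorted(sizes)
```

For a 100x50 window, Q(0.1, 0.1) yields the nine candidates from [0.9W, 0.9H] up to [1.1W, 1.1H]; the search then only needs to place these few shapes rather than try arbitrary window geometries.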
A first threshold Pc and a second threshold Pd may be preset. A new target window is then searched for in the current frame (the frame to be detected) within the range allowed by the target deformation function, such that the ratio of the sum of the reliable reference values of the covered reference positions to the overall reference value exceeds Pc. If a matching target window is found, it is further checked that the ratio of the sum of the sizes of the corresponding reference positions to the size of the target window exceeds Pd.
In some embodiments, the first threshold and/or the second threshold may be adjusted to stabilize the search for the target window. A large target window more easily achieves high reference-position coverage, i.e. exceeds Pc, while a small target window more easily achieves high coverage density. By adjusting Pc and Pd, the search can therefore be steered toward a stable target window.
In other words, within the range allowed by the target deformation function, the new target window must cover reference positions whose summed reliable reference values, as a fraction of the overall reference value, exceed Pc: the new window must contain a sufficient proportion of the blocks of the old window, which tends to enlarge it. At the same time, the ratio of the summed sizes of the covered reference positions to the size of the new window must exceed Pd, so the new window must not contain too many unknown blocks (blocks that do not come from the old window), which tends to shrink it. The target window is obtained by matching under these two constraints. Together, the factors Pd and Pc control the position of the target in the current frame and the size of the target window, so that the window fits the actual requirement.
In some embodiments, the method further comprises: preconfiguring a target detection count value, decrementing it by 1 each time a target window is found, and performing target detection based on each target window when the count reaches 0. In some embodiments, the method further comprises performing target detection directly when no target window can be found. For example, the target detection count value N = k may be configured in advance in a forced-target-detection counter used to schedule forced target detection.
When a new frame is forced to intra prediction, because it is an I-frame (key frame) or due to some other mechanism, the number of reference frames is 0 and no motion vectors can be determined; target detection is then performed on the frame by a preset method, such as deep learning, and N is reset to k.
If a target window satisfying both Pc and Pd is found, the target window is output and N is decremented by 1.
If no target window satisfies both Pc and Pd (the target tracking condition is not met), target detection is run directly (e.g., target localization or target detection for the frame is completed through deep learning), and N is reset to k.
If N reaches 0, target detection is run and N is reset to k.
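The counter logic of the four cases above can be sketched as a small state machine. This is an illustrative sketch, not the patent's implementation; the class and return labels are hypothetical names for the two possible outcomes per frame.

```python
class ForcedDetectionCounter:
    """Forced target-detection counter: track via motion vectors while
    N > 0, and fall back to full detection (e.g. deep learning) when the
    frame is intra-coded, when no window satisfies Pc and Pd, or when the
    counter reaches 0, which prevents error accumulation."""
    def __init__(self, k):
        self.k = k
        self.n = k

    def on_frame(self, is_intra, window_found):
        if is_intra or not window_found:
            self.n = self.k
            return "run_detection"        # no MV available or tracking failed
        self.n -= 1
        if self.n == 0:
            self.n = self.k
            return "run_detection"        # periodic forced detection
        return "output_tracked_window"    # reuse motion-vector tracking
```

With k = 30, full detection runs at most once per 30 tracked frames plus whenever tracking breaks, matching the complexity reduction claimed below.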
In summary, the disclosed method uses motion vectors to track targets, reducing the use of computing resources. Because each frame's detection result is derived from the motion vectors and detection results of other frames, errors can accumulate continuously; the forced-target-detection counter is therefore used in this example to bound the accumulated error, so that newly appearing targets are not missed while target tracking is maintained. The disclosed method by itself cannot find new targets, so in practice the targets to be tracked are first detected with deep learning and then tracked by the disclosed method; since deep learning target localization is only run at intervals between frames, the algorithmic complexity while tracking the same target is 1/30 or even lower, effectively saving computing resources. The target deformation function limits the search range and greatly reduces the computation of the search; adjusting Pc and Pd yields a stable target window; and the time reliability function adjusts the weights of the reference frames for a more accurate result. During decoding, the decoder can also obtain the motion vector information, so the disclosed method can likewise accelerate target detection and save computation there.
In a second aspect, an embodiment of the present invention further provides a target detection device for video coding, comprising a processor configured to perform the steps of the method for assisting target detection with video coding information according to the embodiments of the present disclosure.
In a third aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method for assisting target detection with video coding information according to the embodiments of the present disclosure.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (10)
1. A method for assisting object detection using video coding information, comprising:
determining a target reference frame for a video frame to be detected, wherein the target reference frame comprises a plurality of sub-window blocks;
for each sub-window block:
determining the reference position of the sub-window block in the video frame to be detected;
determining a reference value for the reference position;
determining an overall reference value based on the reference values of the respective sub-window blocks;
searching a target window in the video frame to be detected based on the overall reference value within the constraint range of the target deformation function;
target detection is performed based on the target window.
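The scoring steps of claim 1 can be illustrated by the following sketch. The specific scoring rule (block size multiplied by reliability, following claim 4) and the data layout are assumptions for illustration only; the claim itself does not fix a formula.

```python
# Illustrative sketch of the per-frame scoring steps of claim 1. The product
# rule below is one reading of claim 4 (the reference value grows with both
# block size and reliability); it is an assumption, not the claimed formula.

def reference_value(block_size, reliability):
    # Monotonically increasing in both block size and reliability.
    return block_size * reliability

def overall_reference_value(sub_blocks):
    """sub_blocks: list of (block_size, reliability) pairs for the reference
    positions that a target reference frame projects into the frame to be
    detected; the overall reference value aggregates them."""
    return sum(reference_value(size, rel) for size, rel in sub_blocks)
```

A candidate target window would then be searched, within the constraint range of the target deformation function, for the placement whose contained reference positions best satisfy the thresholds of claim 5.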
2. The method of claim 1, wherein determining the target reference frame for the video frame to be detected comprises:
and selecting a target reference frame in front of the video frame to be detected based on a preset time range and the video frame to be detected.
3. The method as claimed in claim 2, wherein the predetermined time range is determined at a predetermined time interval or based on a time correlation function, wherein the closer the target reference frame is to the video frame to be detected, the higher its confidence level.
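One possible time correlation function satisfying claim 3 is an exponential decay in frame distance, sketched below. The exponential form and the decay constant are assumptions; the claim only requires that confidence rise as the reference frame nears the video frame to be detected.

```python
import math

# Hypothetical time correlation function: confidence decays exponentially
# with the frame gap, so nearer reference frames weigh more. The decay
# constant tau is an illustrative parameter, not specified by the claim.

def temporal_confidence(frame_gap, tau=5.0):
    """Confidence weight of a reference frame that is frame_gap frames
    before the video frame to be detected."""
    return math.exp(-frame_gap / tau)
```

Under this weighting, reference frames closer to the frame to be detected contribute more to the overall reference value, matching the monotonicity the claim requires.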
4. The method of claim 3, wherein determining the reference value of the reference position comprises:
and determining a reference value of the reference position based on the size of the block corresponding to the reference position and the reliability of the reference position, wherein the reference value of the reference position increases with the increase of the size of the block corresponding to the reference position and/or the increase of the reliability of the reference position.
5. The method as claimed in claim 4, wherein said searching for the target window in the video frame to be detected comprises:
and searching a target window in the video frame to be detected within the constraint range of the target deformation function, such that the ratio of the overall reference value of the target window to the overall reference value of the video frame to be detected exceeds a first threshold, and the ratio of the sum of the reference values of the reference positions contained in the target window to the size of the target window exceeds a second threshold.
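One way to read the two conditions of claim 5 is as ratio thresholds, corresponding to the parameters Pc and Pd mentioned in the description. The sketch below is a hypothetical acceptance test under that reading; the names and the ratio form are assumptions.

```python
# Hypothetical acceptance check for a candidate target window under one
# reading of claim 5: both thresholds (Pc and Pd in the description) are
# interpreted as ratio tests. Names and form are illustrative assumptions.

def accept_window(window_ref, frame_ref, ref_sum, window_size, pc, pd):
    """window_ref: overall reference value inside the candidate window.
    frame_ref:  overall reference value of the whole frame to be detected.
    ref_sum:    sum of reference values of positions inside the window.
    window_size: area of the candidate window."""
    # Condition 1: the window captures enough of the frame's reference value.
    # Condition 2: the window's reference density is high enough (no loose fit).
    return (window_ref / frame_ref > pc) and (ref_sum / window_size > pd)
```

Raising Pc or Pd makes the search stricter and the resulting window more stable, as claim 8 suggests.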
6. The method of claim 5, further comprising:
and pre-configuring a target detection count value, decrementing the target detection count value by 1 each time the target window is found, and performing target detection based on each target window when the target detection count value reaches 0.
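The counter logic of claim 6 can be sketched as follows. The reset-on-zero behavior and the interval value are illustrative assumptions; the claim only specifies the decrement and the trigger at zero.

```python
# Sketch of the forced-detection counter of claim 6: tracking proceeds while
# the counter is positive, and a full detection pass is forced when it hits 0.
# The interval (e.g. 30) and the reset behavior are illustrative assumptions.

class ForcedDetectionCounter:
    def __init__(self, interval=30):
        self.interval = interval
        self.count = interval

    def on_window_found(self):
        """Call each time tracking finds the target window; returns True
        when a full target detection pass must be forced."""
        self.count -= 1
        if self.count == 0:
            self.count = self.interval  # re-arm after the forced detection
            return True
        return False
```

This bounds the number of consecutive tracked-only frames, so accumulated propagation error is periodically corrected by a fresh detection.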
7. The method of claim 4, further comprising:
and under the condition that the target window cannot be found, directly executing target detection.
8. The method of claim 5, wherein the first threshold and/or the second threshold are adjusted to stabilize the searched target window.
9. An apparatus for assisting object detection with video coding information, comprising a processor configured to perform the steps of the method for assisting object detection with video coding information according to any of claims 1 to 8.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the method for assisting object detection with video coding information according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110918613.3A CN113794886A (en) | 2021-08-11 | 2021-08-11 | Method and apparatus for assisting target detection using video coding information |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113794886A (en) | 2021-12-14 |
Family
ID=78875940
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110918613.3A Pending CN113794886A (en) | 2021-08-11 | 2021-08-11 | Method and apparatus for assisting target detection using video coding information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113794886A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2007228093A (en) * | 2006-02-21 | 2007-09-06 | Toshiba Corp | Device and method for detecting motion |
CN105654512A (en) * | 2015-12-29 | 2016-06-08 | 深圳羚羊微服机器人科技有限公司 | Target tracking method and device |
CN108596109A (en) * | 2018-04-26 | 2018-09-28 | 济南浪潮高新科技投资发展有限公司 | A kind of object detection method and device based on neural network and motion vector |
CN110516620A (en) * | 2019-08-29 | 2019-11-29 | 腾讯科技(深圳)有限公司 | Method for tracking target, device, storage medium and electronic equipment |
CN112070797A (en) * | 2020-08-21 | 2020-12-11 | 中国科学院计算技术研究所 | Target detection method, system, acceleration device, medium and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||