CN111062975A - Method for accelerating real-time target detection of video frame based on perceptual hash algorithm - Google Patents

Method for accelerating real-time target detection of video frame based on perceptual hash algorithm

Info

Publication number: CN111062975A
Application number: CN201911124925.6A
Authority: CN (China)
Prior art keywords: target detection, picture, target, video, frame
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN111062975B (en)
Inventors: 陈旋, 王冲, 崇传兵
Applicant and assignee (original and current): Jiangsu Aijia Household Products Co Ltd
Priority and filing date: 2019-11-18
Publication of CN111062975A: 2020-04-24
Publication of CN111062975B (grant): 2022-07-08

Classifications

    • G PHYSICS; G06 COMPUTING; CALCULATING OR COUNTING; G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/262: Image analysis; Analysis of motion using transform domain methods, e.g. Fourier domain methods
    • G06T7/223: Image analysis; Analysis of motion using block-matching
    • G06T2207/10016: Indexing scheme for image analysis or image enhancement; Image acquisition modality; Video; Image sequence
    • G06T2207/20052: Indexing scheme for image analysis or image enhancement; Special algorithmic details; Transform domain processing; Discrete cosine transform [DCT]

Abstract

The invention relates to a method for accelerating real-time target detection in video frames based on a perceptual hash algorithm, and belongs to the technical field of video processing. Exploiting the fact that adjacent frames are separated by a very short interval and differ only slightly, the method uses the target detection result of the previous frame together with picture fingerprint information either to reuse the previous frame's detection result or to shrink the detection area of the current frame, thereby accelerating target detection.

Description

Method for accelerating real-time target detection of video frame based on perceptual hash algorithm
Technical Field
The invention relates to a method for accelerating real-time target detection in video frames based on a perceptual hash algorithm, and belongs to the technical field of video processing.
Background
Object detection is a computer vision technique for detecting target objects such as cars, buildings and people, which are usually identified in pictures or videos. Object detection locates an object in the image and draws a bounding box around it. The process is generally divided into two steps: first the object is classified and its type determined, then a box is drawn around it.
The problem video object detection has to solve is the correct identification and localization of objects in every frame of a video. Compared with still-image object detection, video is highly redundant, containing a large amount of temporal locality (similarity at different times) and spatial locality (similar appearance in different scenes), i.e. temporal-context information. Making full use of this temporal context can deal with the heavy redundancy between consecutive frames in a video and increase detection speed; it can also improve detection quality and alleviate problems that video suffers relative to still images, such as motion blur, defocus, partial occlusion and deformation.
Existing target detection is mostly based on deep learning, which is computationally very heavy and requires a powerful GPU; GPU resources are relatively expensive, so the hardware cost is high. Moreover, the scenes processed by video target detection are mostly static background images in which moving targets appear only briefly, so the detector spends most of its time repeatedly processing background frames that contain no target at all, wasting GPU resources.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: to provide a method for accelerating real-time target detection in video frames that exploits the characteristics of video itself, reducing the amount of computation, saving GPU resources, accelerating target detection, releasing limited and expensive system resources, and shortening target detection time.
A method for accelerating real-time target detection of video frames based on a perceptual hash algorithm comprises the following steps:
step 1, obtaining the first frame f1 of the video, calculating the picture fingerprint p1 of f1, and performing target detection to obtain a target box1;
step 2, obtaining the second frame f2 of the video and calculating the picture fingerprint p2 of f2; comparing p1 with p2;
if p1 = p2, the target in f2 is considered to be box1 as well;
if p1 is not equal to p2, box1 is enlarged by a certain ratio to obtain box2;
step 3, deleting the region with the same size and position as box2 from both f1 and f2, and recalculating the picture fingerprints of the remaining regions to obtain p3 and p4;
step 4, if p3 = p4, performing target detection on the region of the second frame with the same size and position as box2 to obtain a target box3; if p3 is not equal to p4, performing target detection on the full, undeleted frame f2 to obtain a target box4.
In one embodiment, the enlargement ratio is not particularly limited; it may be 10%, 30%, 50%, 100%, 150%, or the like, and may be set manually according to the target situation.
In one embodiment, the picture fingerprint is calculated by a hash method, and the specific steps include:
s1, reducing the size of the picture;
s2, simplifying colors;
s3, discrete cosine transform processing;
s4, taking the upper left corner of the matrix after discrete cosine transform processing;
s5, calculating the average value of all values in the matrix obtained in S4; then forming a 64-bit hash of 0s and 1s from that matrix, setting each value greater than or equal to the average to "1" and each value smaller than the average to "0", thereby obtaining the picture fingerprint.
In one embodiment, S1 refers to a reduction to a size of 8 × 8.
In one embodiment, simplifying the color in S2 refers to conversion to 64 levels of gray.
In one embodiment, a 32 × 32 discrete cosine transform is used in S3.
In one embodiment, the upper left corner in S4 is taken as an 8 × 8 matrix.
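By way of illustration only, a minimal Python sketch of the fingerprint calculation of S1-S5 is given below, using OpenCV and NumPy. The function name phash64 is an assumption of this sketch, and the picture is shrunk to 32 × 32 so that the 32 × 32 discrete cosine transform of S3 can be applied, with the 8 × 8 block retained in S4 as the low-frequency corner:

import cv2
import numpy as np

def phash64(frame_bgr):
    """Return a 64-element 0/1 fingerprint of a BGR frame (sketch of S1-S5)."""
    # S1/S2: shrink the picture and simplify the color to 64 gray levels.
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    small = cv2.resize(gray, (32, 32), interpolation=cv2.INTER_AREA)
    small = np.float32(small // 4)          # 256 gray levels -> 64

    # S3: 32 x 32 discrete cosine transform.
    dct = cv2.dct(small)

    # S4: keep only the 8 x 8 upper-left block (lowest frequencies).
    low = dct[:8, :8]

    # S5: compare each coefficient with the block mean -> 64 bits of 0/1.
    return (low >= low.mean()).astype(np.uint8).flatten()

Two frames can then be compared simply by testing whether their 64-bit fingerprints are identical, as in step 2 of the method above.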
Advantageous effects
1. Reducing the amount of computation required for target detection.
2. Reducing repeated detection of video frames.
3. Reducing the system load on the server.
4. Accelerating target detection and shortening detection time.
Drawings
FIG. 1 is the flow of the perceptual hash algorithm;
FIG. 2 is the processing flow of the present invention.
Detailed Description
A video is composed of a succession of static pictures; the continuously changing static pictures form a dynamic video by exploiting the persistence of human vision. A picture is two-dimensional, its data recording pixel values and their positions, whereas a video is three-dimensional, adding time information and therefore being more complex. A frame is the smallest component and most basic unit of a video sequence: a still image is a frame, and a video is an image sequence consisting of consecutive frames, which together carry all of the video's information. A shot consists of a series of consecutive frames that generally describe the continuous motion of the same subject in the same scene. Within the same shot the difference between adjacent frames is small; it usually reflects changes of the same subject, such as continuous actions of translation, zooming and so on.
The invention exploits the fact that adjacent frames are separated by a very short interval and differ only slightly in content: using the target detection result of the previous frame together with the picture fingerprint information, it either reuses the previous frame's detection result directly or shrinks the detection area of the current frame, thereby accelerating target detection.
The technical concept of the method is explained in detail as follows:
1) First, read the video data to obtain the first frame f1, compute the picture fingerprint of f1 to obtain its hash value p1, and perform target detection on f1 to obtain the detected target box1;
2) Read the video data to obtain the next frame f2 and compute the picture fingerprint of f2 to obtain its hash value p2;
3) Compare the picture fingerprints p1 and p2:
If the picture fingerprint p2 of frame f2 is the same as the picture fingerprint p1 of frame f1, the two adjacent pictures are unchanged and the bounding box of frame f2 is that of frame f1, so the target bounding box of frame f2 is also box1;
If the picture fingerprint p2 of frame f2 differs from the picture fingerprint p1 of frame f1, the picture has changed; but because f1 and f2 are adjacent frames, the time interval between them is very short and the target object can only have moved a small distance, so the bounding box box1 of frame f1 is moderately enlarged to box2, for example by 50%;
4) Crop the enlarged box2 out of both frame f1 and frame f2 to obtain the remaining regions f1r and f2r, and recalculate the picture fingerprints of these remaining regions, obtaining p3 and p4 respectively;
5) Compare the picture fingerprints p3 and p4:
If the fingerprints are equal, regions f1r and f2r are considered to contain no target and the target of frame f2 lies inside box2, so target detection is performed only on the small box2 region, yielding the target bounding box box3; this reduces the amount of computation;
If the fingerprints are not equal, part of the target of frame f2 is considered to lie in f2r, and target detection is performed on the whole frame f2 to obtain the target bounding box box4.
It can be seen from the above process that, by enlarging the bounding box box1 to box2 and then computing the hash value of the remaining area, whether and roughly where the target in the video has changed can be determined quickly; only when a change is confirmed is the appropriate region detected again, which avoids a large amount of computation.
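To make the flow concrete, here is a minimal Python sketch of this per-frame decision logic. It reuses the phash64 sketch and NumPy import given earlier; detect stands for any external single-target detector returning a bounding box (x, y, w, h), and the 50% enlargement, the zeroing-out used to approximate "deleting" the box2 region, and all helper names are illustrative assumptions rather than the required implementation:

def expand_box(box, frame_shape, ratio=0.5):
    """Enlarge box (x, y, w, h) by `ratio` overall, clipped to the frame borders."""
    x, y, w, h = box
    dx, dy = int(w * ratio / 2), int(h * ratio / 2)
    H, W = frame_shape[:2]
    x0, y0 = max(0, x - dx), max(0, y - dy)
    x1, y1 = min(W, x + w + dx), min(H, y + h + dy)
    return x0, y0, x1 - x0, y1 - y0

def remaining_region(frame, box):
    """Approximate 'deleting' the box region by zeroing it out in a copy."""
    x, y, w, h = box
    out = frame.copy()
    out[y:y + h, x:x + w] = 0
    return out

def detect_frame(f1, box1, f2, detect):
    """Decide how much of frame f2 actually needs to go through the detector."""
    p1, p2 = phash64(f1), phash64(f2)
    if np.array_equal(p1, p2):
        return box1                                    # 3) fingerprints equal: reuse box1

    box2 = expand_box(box1, f2.shape)                  # 3) picture changed: enlarge box1 -> box2
    p3 = phash64(remaining_region(f1, box2))           # 4) fingerprints of f1r and f2r
    p4 = phash64(remaining_region(f2, box2))
    if np.array_equal(p3, p4):
        x, y, w, h = box2                              # 5) change confined to box2:
        bx, by, bw, bh = detect(f2[y:y + h, x:x + w])  #    detect on the small crop only
        return x + bx, y + by, bw, bh                  #    map back to frame coordinates
    return detect(f2)                                  # 5) otherwise detect on the whole frame

For the mostly static scenes described in the background, the first two branches are the common cases, which is where the computation saving comes from.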
In the above method, the picture fingerprint can be calculated with the perceptual hash algorithm using the following process:
1. The size is reduced.
Shrinking the picture is the fastest way to remove high frequencies and detail while keeping its structure of light and dark. The picture is reduced to a size of 8 × 8, 64 pixels in total, which discards the differences caused by different picture sizes and aspect ratios.
2. The color is simplified.
The reduced picture is converted to 64 levels of gray; that is, every pixel takes one of only 64 values.
3. The DCT (discrete cosine transform) is calculated.
The DCT decomposes the picture into frequency components. Although JPEG uses an 8 × 8 DCT, a 32 × 32 DCT is used here.
4. The DCT is reduced.
Although the result of the DCT is a 32 × 32 matrix, only the 8 × 8 matrix in the upper left corner is retained, since this part represents the lowest frequencies in the picture.
5. The average value is calculated.
The average of all 64 retained values is calculated.
6. The DCT is further reduced.
From the 8 × 8 DCT matrix, a 64-bit hash of 0s and 1s is formed: each coefficient greater than or equal to the DCT mean is set to "1", and each coefficient smaller than the mean is set to "0". The result does not tell us the actual low-frequency values, only how each frequency compares with the mean; as long as the overall structure of the picture remains unchanged, the hash value remains unchanged, so the influence of gamma correction or color histogram adjustment is avoided.
7. A hash value is calculated.
The 64 bits are combined into a 64-bit integer; the order of combination does not matter, as long as the same order is used for all pictures. (The 32 × 32 DCT coefficients can also be visualized as a 32 × 32 image.)
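As a usage note for step 7, the sketch below packs the 64 bits produced by the phash64 sketch above into a single integer and compares two fingerprints; the Hamming-distance helper is a common convention in perceptual hashing generally, while the method described here uses strict equality, i.e. a distance of 0:

def pack_bits(bits):
    """Combine 64 0/1 values into a single 64-bit integer (fixed bit order)."""
    value = 0
    for b in bits:
        value = (value << 1) | int(b)
    return value

def hamming(h1, h2):
    """Number of differing bits between two packed hashes."""
    return bin(h1 ^ h2).count("1")

# Example: identical pictures give distance 0, i.e. the p1 = p2 case above.
# h1, h2 = pack_bits(phash64(f1)), pack_bits(phash64(f2))
# same_picture = hamming(h1, h2) == 0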

Claims (7)

1. A method for accelerating real-time target detection of video frames based on a perceptual hash algorithm is characterized by comprising the following steps:
step 1, obtaining the first frame f1 of the video, calculating the picture fingerprint p1 of f1, and performing target detection to obtain a target box1;
step 2, obtaining the second frame f2 of the video and calculating the picture fingerprint p2 of f2; comparing p1 with p2;
if p1 = p2, the target in f2 is considered to be box1 as well;
if p1 is not equal to p2, box1 is enlarged by a certain ratio to obtain box2;
step 3, deleting the region with the same size and position as box2 from both f1 and f2, and recalculating the picture fingerprints of the remaining regions to obtain p3 and p4;
step 4, if p3 = p4, performing target detection on the region of the second frame with the same size and position as box2 to obtain a target box3; if p3 is not equal to p4, performing target detection on the full, undeleted frame f2 to obtain a target box4.
2. The method for accelerating real-time target detection of video frames based on a perceptual hash algorithm of claim 1, wherein, in an embodiment, the enlargement ratio is not particularly limited; it may be 10%, 30%, 50%, 100%, 150%, etc., and may be set manually according to the target situation.
3. The method for accelerating real-time target detection of video frames based on a perceptual hash algorithm of claim 1, wherein, in an embodiment, the picture fingerprint is calculated by a hash method, and the specific steps include:
s1, reducing the size of the picture;
s2, simplifying colors;
s3, discrete cosine transform processing;
s4, taking the upper left corner of the matrix after discrete cosine transform processing;
s5, calculating the average value of all values in the matrix obtained in S4; then forming a 64-bit hash of 0s and 1s from that matrix, setting each value greater than or equal to the average to "1" and each value smaller than the average to "0", thereby obtaining the picture fingerprint.
4. The method for accelerating real-time target detection of video frames based on a perceptual hash algorithm of claim 3, wherein in one embodiment, S1 refers to a reduction to a size of 8 × 8.
5. The method for accelerating real-time target detection of video frames based on a perceptual hash algorithm of claim 3, wherein in one embodiment, simplifying the color in S2 refers to conversion to 64 levels of gray.
6. The method for accelerating real-time target detection of video frames based on a perceptual hash algorithm of claim 3, wherein in one embodiment, a 32 × 32 discrete cosine transform is used in S3.
7. The method for accelerating real-time target detection of video frames based on a perceptual hash algorithm of claim 3, wherein in one embodiment, the upper left corner in S4 is taken as an 8 × 8 matrix.
CN201911124925.6A (priority date 2019-11-18, filing date 2019-11-18): Method for accelerating real-time target detection of video frame based on perceptual hash algorithm; status Active; granted as CN111062975B (en)

Priority Applications (1)

Application Number: CN201911124925.6A (granted as CN111062975B (en)); Priority Date: 2019-11-18; Filing Date: 2019-11-18; Title: Method for accelerating real-time target detection of video frame based on perceptual hash algorithm

Publications (2)

Publication Number | Publication Date
CN111062975A (en) | 2020-04-24
CN111062975B (en) | 2022-07-08

Family

ID=70297870

Family Applications (1)

Application Number: CN201911124925.6A; Title: Method for accelerating real-time target detection of video frame based on perceptual hash algorithm; Status: Active; Granted as: CN111062975B (en)

Country Status (1)

Country Link
CN (1) CN111062975B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108897775A (en) * 2018-06-01 2018-11-27 昆明理工大学 A kind of rapid image identifying system and method based on perceptual hash
CN110349191A (en) * 2019-06-25 2019-10-18 昆明理工大学 A kind of visual tracking method based on perceptual hash algorithm

Also Published As

Publication number Publication date
CN111062975B (en) 2022-07-08


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
CB02: Change of applicant information
    Address after: 211100 floor 5, block a, China Merchants high speed rail Plaza project, No. 9, Jiangnan Road, Jiangning District, Nanjing, Jiangsu (South Station area)
    Applicant after: JIANGSU AIJIA HOUSEHOLD PRODUCTS Co.,Ltd.
    Address before: 211100 No. 18 Zhilan Road, Science Park, Jiangning District, Nanjing City, Jiangsu Province
    Applicant before: JIANGSU AIJIA HOUSEHOLD PRODUCTS Co.,Ltd.
GR01: Patent grant