CN110942002B - Unmanned aerial vehicle aerial video frame positioning method based on rotation invariant perception hash - Google Patents

Unmanned aerial vehicle aerial video frame positioning method based on rotation invariant perception hash

Info

Publication number
CN110942002B
CN110942002B (application CN201911129923.6A)
Authority
CN
China
Prior art keywords
sequence
rotation
hash
unmanned aerial
aerial vehicle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911129923.6A
Other languages
Chinese (zh)
Other versions
CN110942002A (en
Inventor
印鉴
陈智聪
陈殷齐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN201911129923.6A priority Critical patent/CN110942002B/en
Publication of CN110942002A publication Critical patent/CN110942002A/en
Application granted granted Critical
Publication of CN110942002B publication Critical patent/CN110942002B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/10 Terrestrial scenes
    • G06V 20/13 Satellite images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00 Image coding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application provides an unmanned aerial vehicle aerial video frame positioning method based on a rotation-invariant perceptual hash. The hash has a certain translation invariance (pooling principle); differential coding makes it insensitive to illumination and weather conditions; a start-point-independent circumferential coding sequence gives it rotation invariance; and the corner portions, which may differ significantly from view to view, are excluded from the circumferential coding order, eliminating the varying influence of corner scenery under rotation. Experiments on real unmanned aerial vehicle videos show that, compared with prior perceptual hash methods, the method successfully solves the problem that traditional perceptual hashing lacks rotation invariance.

Description

Unmanned aerial vehicle aerial video frame positioning method based on rotation invariant perception hash
Technical Field
The application relates to the field of video image processing algorithms, in particular to an unmanned aerial vehicle aerial video frame positioning method based on rotation invariant perception hash.
Background
In recent years, with the popularization of unmanned aerial vehicles, using aerial videos to complete tasks has drawn increasing attention from industry. In the task of abnormality cruising along a river channel, however, unmanned aerial vehicle videos of the same river channel from different time periods need to be positioned frame by frame, i.e., for a frame of one video, the frame of the other video taken at the same position must be found. Such tasks have not been studied before.
The related technology is picture search based on deep learning, but it does not fit the application of unmanned aerial vehicle video frame positioning: deep-learning picture search aims to find semantic matches between pictures, while the semantic information of a whole aerial video is basically the same; moreover, deep learning is too time-consuming, and even a short video has a huge number of frames. We therefore employ perceptual hashing (Perceptual Hashing), a simple and fast similar-picture search technique. Perceptual hashing is a one-way mapping from a set of multimedia data to a set of perceptual digests, i.e., a unique mapping of multimedia digital representations with the same perceptual content to a digital digest, satisfying perceptual robustness and security. Perceptual hashing provides safe and reliable technical support for information service modes such as multimedia content identification, retrieval, and authentication.
A hash function (Hash Function) irreversibly extracts a digital digest (Digest) of the original data; it has properties such as unidirectionality and fragility, and can guarantee the uniqueness and tamper-resistance of the original data. Various hash functions have been successfully applied in fields such as information retrieval and management and data authentication. However, a conventional hash function cannot realize frame positioning between two unmanned aerial vehicle aerial videos taken at different times, because even a small difference in lens rotation between videos shot at different times at the same place breaks the hash mapping (the hash value of the same picture after rotation is completely different).
Disclosure of Invention
The application provides an unmanned aerial vehicle aerial video frame positioning method based on a rotation-invariant perceptual hash, which eliminates the varying influence of corner scenery under rotation.
In order to achieve the technical effects, the technical scheme of the application is as follows:
an unmanned aerial vehicle aerial video frame positioning method based on rotation invariant perception hash comprises the following steps:
s1: carrying out rotation-invariant hash value calculation on a video frame-reference frame to be positioned and a video frame-target frame to be retrieved;
s2: and comparing the hash value difference between the reference frame and the target frame and all frames of the unmanned aerial vehicle video at another time to find a frame with the smallest difference.
Further, the specific process of the step S1 is:
s11: performing picture scaling on the reference frame and the target frame;
s12: converting the gray level of the zoomed picture;
s13: the gray level diagram is subjected to a circumferential coding sequence for obtaining hash values;
s14: the binary form of the hash value is obtained for the picture processed in the step S13, and then the hash value which is unchanged in rotation is obtained;
s15: and (5) obtaining an information fingerprint according to the result of the step S14.
Further, the process of step S11 is:
the pictures are uniformly scaled to 8 x 8, and the total of 64 pixels of the pictures are subjected to certain translation invariance, and when the translation amount is in a scaling range, the thumbnails are basically consistent.
Further, the process of step S12 is: the scaled picture is converted into a 256-level gray scale.
Further, the process of step S13 is:
Encode the hash along circumferences to make it rotation invariant. According to the side length H of the picture, take each length r (r = 1 to H/2) as a circle radius, and for each radius use the polar-coordinate form x = r·cosθ + x0, y = r·sinθ + y0 to calculate the coordinate sequence of the circle, where θ takes each value in 0° to 360° and (x0, y0) is the center of the scaled picture. This yields a circumferential sequence (x_r^1, y_r^1)(x_r^2, y_r^2)(x_r^3, y_r^3)..., where (x_r^i, y_r^i) is the i-th non-repeating coordinate on the circle of radius r, calculated at the degrees 0°, 1°, 2°, ..., 360°. Then S_r = I(x_r^1, y_r^1), I(x_r^2, y_r^2), I(x_r^3, y_r^3), ... is the sequence of pixel values indexed by the circumferential coordinates, and r such sequences are obtained, denoted S (the r-th sequence is S_r). Such an encoding order also automatically removes corner effects, because two frames with a rotation difference in the actually captured video can be completely different in the corner portions.
Further, the process of obtaining the binary form of the hash value in the step S14 is:
Encode each sequence S_r by differencing: traverse the pixel values of the sequence, record 1 if the previous value is greater than the current value and 0 otherwise, and compare the first pixel value with the last one so that the sequence is joined end to end into a circle. The resulting r binary sequences are denoted B, where the r-th sequence is B_r.
Further, the step S14 of obtaining the hash value with unchanged rotation is:
handle B r Performing cyclic shift, taking the sequence with the smallest binary sequence value after cyclic shift as the final sequence of the circle, and finally obtaining r such binary sequences which are irrelevant to the starting point and are marked as Z, wherein the r-th sequence is Z r
Further, the process of step S15 is:
z is added in the order of r from 1 to H/2 r Combined to form the final information fingerprint x=z 1 Z 2 Z 3 ...Z H/2
Further, the process of step S2 is:
Calculate the Hamming distance between the fingerprints of the reference frame and each target frame. The larger the Hamming distance, the more the pictures differ; the smaller the distance, the more similar they are; a distance of 0 means the pictures are identical. Compare the reference frame against all frames of the target video, find the 20 frames with the smallest Hamming distance, and take the frame with the highest similarity under a further SSIM comparison as the required frame.
Compared with the prior art, the technical scheme of the application has the beneficial effects that:
the rotation-invariant perceptual hash of the application has a certain translation-invariant (pooling principle) by converting an image into a thumbnail; differential coding is used during coding, so that illumination weather conditions are irrelevant; adding a circumferential coding sequence irrelevant to a starting point to ensure that the rotation invariance exists; the corner portions are not encoded in the circumferential encoding order because the corner portions may differ significantly from view to view, thereby eliminating the different effects of other scenes of the corner upon rotation. Experiments on real unmanned aerial vehicle videos show that compared with the prior perceptual hash method, the method successfully solves the problem that the traditional perceptual hash has no rotation unchanged characteristic.
Drawings
FIG. 1 is a schematic diagram of the algorithm rotation invariant encoding mode of the present application;
FIG. 2 is a schematic diagram of a rotation invariant perceptual hash calculation of the present application;
FIG. 3 is a schematic diagram of a video frame positioning application flow of the present application;
FIG. 4 is a classical Lena diagram;
fig. 5 is a comparison chart of the frame positioning application result of the real shot video.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the present patent;
for the purpose of better illustrating the embodiments, certain elements of the drawings may be omitted, enlarged or reduced and do not represent the actual product dimensions;
it will be appreciated by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical scheme of the application is further described below with reference to the accompanying drawings and examples.
As shown in fig. 1-2, an unmanned aerial vehicle aerial video frame positioning method based on rotation invariant perception hash comprises the following steps:
s1: carrying out rotation-invariant hash value calculation on a video frame-reference frame to be positioned and a video frame-target frame to be retrieved;
s2: and comparing the hash value difference between the reference frame and the target frame and all frames of the unmanned aerial vehicle video at another time to find a frame with the smallest difference.
The specific process of step S1 is:
s11: performing picture scaling on the reference frame and the target frame;
s12: converting the gray level of the zoomed picture;
s13: the gray level diagram is subjected to a circumferential coding sequence for obtaining hash values;
s14: the binary form of the hash value is obtained for the picture processed in the step S13, and then the hash value which is unchanged in rotation is obtained;
s15: and (5) obtaining an information fingerprint according to the result of the step S14.
The process of step S11 is:
the pictures are uniformly scaled to 8 x 8, and the total of 64 pixels of the pictures are subjected to certain translation invariance, and when the translation amount is in a scaling range, the thumbnails are basically consistent.
The process of step S12 is: the scaled picture is converted into a 256-level gray scale.
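As a minimal Python sketch of steps S11 and S12, assuming the input image dimensions are multiples of 8; the block-averaging downscale and the luminosity gray weights are illustrative choices, since the patent only requires an 8 x 8 thumbnail and a 256-level gray conversion:

```python
def to_gray(rgb):
    # Convert an RGB image (rows of (r, g, b) triples) to 256-level gray
    # using common luminosity weights (an assumed choice).
    return [[int(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb]

def downscale(gray, size=8):
    # Block-average downscale to size x size; pooling neighboring pixels is
    # what gives the thumbnail its mild translation invariance.
    h, w = len(gray), len(gray[0])
    bh, bw = h // size, w // size   # assumes h and w are multiples of size
    out = []
    for i in range(size):
        row = []
        for j in range(size):
            block = [gray[y][x]
                     for y in range(i * bh, (i + 1) * bh)
                     for x in range(j * bw, (j + 1) * bw)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out
```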
As shown in fig. 3, the process of step S13 is:
Encode the hash along circumferences to make it rotation invariant. According to the side length H of the picture, take each length r (r = 1 to H/2) as a circle radius, and for each radius use the polar-coordinate form x = r·cosθ + x0, y = r·sinθ + y0 to calculate the coordinate sequence of the circle, where θ takes each value in 0° to 360° and (x0, y0) is the center of the scaled picture. This yields a circumferential sequence (x_r^1, y_r^1)(x_r^2, y_r^2)(x_r^3, y_r^3)..., where (x_r^i, y_r^i) is the i-th non-repeating coordinate on the circle of radius r, calculated at the degrees 0°, 1°, 2°, ..., 360°. Then S_r = I(x_r^1, y_r^1), I(x_r^2, y_r^2), I(x_r^3, y_r^3), ... is the sequence of pixel values indexed by the circumferential coordinates, and r such sequences are obtained, denoted S (the r-th sequence is S_r). Such an encoding order also automatically removes corner effects, because two frames with a rotation difference in the actually captured video can be completely different in the corner portions.
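The circle-walking step above can be sketched as follows. The exact center for an even-sized thumbnail, the rounding of coordinates, and the clamping to the image border are assumptions the patent leaves open:

```python
import math

def circle_indices(size=8):
    # For each radius r = 1 .. size//2, walk the circle in 1-degree steps
    # (0..359; 360 coincides with 0) and collect the non-repeating integer
    # pixel coordinates in order of increasing angle.
    cx = cy = (size - 1) / 2.0          # assumed center of the thumbnail
    rings = []
    for r in range(1, size // 2 + 1):
        seen, ring = set(), []
        for deg in range(360):
            t = math.radians(deg)
            x = round(r * math.cos(t) + cx)
            y = round(r * math.sin(t) + cy)
            x = min(max(x, 0), size - 1)   # clamp to the image border
            y = min(max(y, 0), size - 1)
            if (x, y) not in seen:
                seen.add((x, y))
                ring.append((x, y))
        rings.append(ring)
    return rings
```

Indexing the gray thumbnail with each ring then gives the pixel-value sequences S_r.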
The process of obtaining the binary form of the hash value in step S14 is:
Encode each sequence S_r by differencing: traverse the pixel values of the sequence, record 1 if the previous value is greater than the current value and 0 otherwise, and compare the first pixel value with the last one so that the sequence is joined end to end into a circle. The resulting r binary sequences are denoted B, where the r-th sequence is B_r.
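The differential coding of one ring S_r can be sketched as a circular comparison, so the bit sequence closes end to end:

```python
def ring_to_bits(vals):
    # Differential coding along a closed ring: bit i is 1 when the previous
    # pixel value is greater than the current one; index i-1 wraps around,
    # so the first element is compared against the last.
    n = len(vals)
    return [1 if vals[i - 1] > vals[i] else 0 for i in range(n)]
```

Because only relative brightness order is kept, a uniform illumination change leaves the bits unchanged.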
The process of obtaining the hash value with unchanged rotation in step S14 is as follows:
handle B r Performing cyclic shift, taking the sequence with the smallest binary sequence value after cyclic shift as the final sequence of the circle, and finally obtaining r such binary sequences which are irrelevant to the starting point and are marked as Z, wherein the r-th sequence is Z r
The process of step S15 is:
z is added in the order of r from 1 to H/2 r Combined to form the final information fingerprint x=z 1 Z 2 Z 3 ...Z H/2
The process of step S2 is:
Calculate the Hamming distance between the fingerprints of the reference frame and each target frame. The larger the Hamming distance, the more the pictures differ; the smaller the distance, the more similar they are; a distance of 0 means the pictures are identical. Compare the reference frame against all frames of the target video, find the 20 frames with the smallest Hamming distance, and take the frame with the highest similarity under a further SSIM comparison as the required frame.
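Step S2 can be sketched as a Hamming-distance ranking over all target-video fingerprints. The function names and the `k=20` default follow the description above; the final SSIM re-ranking of the shortlist is omitted here:

```python
def hamming(a, b):
    # Count positions where two equal-length bit fingerprints differ;
    # distance 0 means the fingerprints are identical.
    return sum(x != y for x, y in zip(a, b))

def top_candidates(ref, frames, k=20):
    # Rank every target-video fingerprint by Hamming distance to the
    # reference and keep the k closest for the finer SSIM comparison.
    order = sorted(range(len(frames)), key=lambda i: hamming(ref, frames[i]))
    return order[:k]
```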
The experiment selects the classical Lena image and a rotated version of it (see fig. 4) and compares our method with the conventional hash method. The comparison results are shown in Table 1.
Table 1 comparing the rotated image with the conventional hash method
Experiments show that, on the Lena image, our rotation-invariant hash stays robust under rotation, whereas the conventional method does not.
An actual unmanned aerial vehicle aerial video is also used: a video frame is randomly selected and searched for in a video of another time period with both the traditional perceptual hash and our method. The overall process is the same as in fig. 3. Because there is no ground truth, we make a subjective comparison of the frames finally retrieved. Fig. 5 compares the frame positioning results on the real shot video.
The results show that our method is a considerable improvement over the previous hash method, so that frames with rotation can also be accurately positioned. Since no frame of the target video in fig. 5 has exactly the same view, the pHash search result is very poor, while our method accurately finds the correct frame.
The same or similar reference numerals correspond to the same or similar components;
the positional relationship depicted in the drawings is for illustrative purposes only and is not to be construed as limiting the present patent;
it is to be understood that the above examples of the present application are provided by way of illustration only and not by way of limitation of the embodiments of the present application. Other variations or modifications of the above teachings will be apparent to those of ordinary skill in the art. It is not necessary here nor is it exhaustive of all embodiments. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are desired to be protected by the following claims.

Claims (4)

1. The unmanned aerial vehicle aerial video frame positioning method based on rotation invariant perception hash is characterized by comprising the following steps of:
s1: carrying out rotation-invariant hash value calculation on a video frame-reference frame to be positioned and a video frame-target frame to be retrieved;
the specific process of the step S1 is as follows:
s11: performing picture scaling on the reference frame and the target frame;
s12: converting the gray level of the zoomed picture;
s13: the gray level diagram is subjected to a circumferential coding sequence for obtaining hash values; the process of the step S13 is:
encoding the hash along circumferences to make it rotation invariant: according to the side length H of the picture, taking each length r (r = 1 to H/2) as a circle radius, and for each radius using the polar-coordinate form x = r·cosθ + x0, y = r·sinθ + y0 to calculate the coordinate sequence of the circle, wherein θ takes each value in 0° to 360° and (x0, y0) is the center of the scaled picture, thereby obtaining a circumferential sequence (x_r^1, y_r^1)(x_r^2, y_r^2)(x_r^3, y_r^3)..., wherein (x_r^i, y_r^i) denotes the i-th non-repeating coordinate on the circle of radius r calculated at the degrees 0°, 1°, 2°, ..., 360°; then obtaining S_r = I(x_r^1, y_r^1), I(x_r^2, y_r^2), I(x_r^3, y_r^3), ..., the sequence of pixel values indexed by the circumferential coordinates, r such sequences being obtained and denoted S, wherein the r-th sequence is S_r; such an encoding sequence also automatically removes corner effects, because two frames with a rotation difference in the actually captured video can be completely different in the corner portions;
s14: the binary form of the hash value is obtained for the picture processed in the step S13, and then the hash value which is unchanged in rotation is obtained;
the process of obtaining the binary form of the hash value in step S14 is:
encoding each sequence S_r by differencing: traversing the pixel values of the sequence, recording 1 if the previous value is greater than the current value and 0 otherwise, and comparing the first pixel value with the last one so that the sequence is joined end to end into a circle, finally obtaining r binary sequences denoted B, wherein the r-th sequence is B_r;
The process of obtaining the rotation-invariant hash value in step S14 is as follows:
handlePerforming cyclic shift, taking the sequence with the smallest binary sequence value after cyclic shift as the final sequence of the circle, and finally obtaining r such starting point independent binary sequences as +.>Wherein the r-th sequence is +.>
S15: obtaining an information fingerprint according to the result of the step S14;
the process of step S15 is:
combining the Z_r in order of r from 1 to H/2 to form the final information fingerprint X = Z_1 Z_2 Z_3 ... Z_(H/2);
S2: comparing the hash-value differences between the reference frame and all frames of the unmanned aerial vehicle video taken at another time, and finding the frame with the smallest difference.
2. The unmanned aerial vehicle aerial video frame positioning method based on the rotation invariant perceptual hash of claim 1, wherein the process of step S11 is:
the pictures are uniformly scaled to 8 x 8, and the total of 64 pixels of the pictures are subjected to certain translation invariance, and when the translation amount is in a scaling range, the thumbnails are basically consistent.
3. The unmanned aerial vehicle aerial video frame positioning method based on the rotation invariant perceptual hash of claim 2, wherein the process of step S12 is: the scaled picture is converted into a 256-level gray scale.
4. The unmanned aerial vehicle aerial video frame positioning method based on rotation invariant perceptual hashing of claim 3, wherein the process of step S2 is:
calculating the Hamming distance between the fingerprints of the reference frame and each target frame, wherein the larger the Hamming distance, the more the pictures differ, the smaller the distance, the more similar they are, and a distance of 0 means the pictures are identical; comparing the reference frame against all frames of the target video, finding the 20 frames with the smallest Hamming distance, and taking the frame with the highest similarity under a further SSIM comparison as the required frame.
CN201911129923.6A 2019-11-18 2019-11-18 Unmanned aerial vehicle aerial video frame positioning method based on rotation invariant perception hash Active CN110942002B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911129923.6A CN110942002B (en) 2019-11-18 2019-11-18 Unmanned aerial vehicle aerial video frame positioning method based on rotation invariant perception hash

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911129923.6A CN110942002B (en) 2019-11-18 2019-11-18 Unmanned aerial vehicle aerial video frame positioning method based on rotation invariant perception hash

Publications (2)

Publication Number Publication Date
CN110942002A CN110942002A (en) 2020-03-31
CN110942002B true CN110942002B (en) 2023-11-07

Family

ID=69907730

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911129923.6A Active CN110942002B (en) 2019-11-18 2019-11-18 Unmanned aerial vehicle aerial video frame positioning method based on rotation invariant perception hash

Country Status (1)

Country Link
CN (1) CN110942002B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112434185B (en) * 2020-10-26 2023-07-14 国家广播电视总局广播电视规划院 Method, system, server and storage medium for searching similar video clips
CN113704532B (en) * 2020-11-25 2024-04-26 天翼数字生活科技有限公司 Method and system for improving picture retrieval recall rate
CN112698661B (en) * 2021-03-22 2021-08-24 成都睿铂科技有限责任公司 Aerial survey data acquisition method, device and system for aircraft and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105069042A (en) * 2015-07-23 2015-11-18 北京航空航天大学 Content-based data retrieval methods for unmanned aerial vehicle spying images
CN106126585A (en) * 2016-06-20 2016-11-16 北京航空航天大学 Unmanned plane image search method based on quality grading with the combination of perception Hash feature
CN107040790A (en) * 2017-04-01 2017-08-11 华南理工大学 A kind of video content certification and tampering location method based on many granularity Hash
CN109918537A (en) * 2019-01-18 2019-06-21 杭州电子科技大学 A kind of method for quickly retrieving of the ship monitor video content based on HBase
CN110427895A (en) * 2019-08-06 2019-11-08 李震 A kind of video content similarity method of discrimination based on computer vision and system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105069042A (en) * 2015-07-23 2015-11-18 北京航空航天大学 Content-based data retrieval methods for unmanned aerial vehicle spying images
CN106126585A (en) * 2016-06-20 2016-11-16 北京航空航天大学 Unmanned plane image search method based on quality grading with the combination of perception Hash feature
CN107040790A (en) * 2017-04-01 2017-08-11 华南理工大学 A kind of video content certification and tampering location method based on many granularity Hash
CN109918537A (en) * 2019-01-18 2019-06-21 杭州电子科技大学 A kind of method for quickly retrieving of the ship monitor video content based on HBase
CN110427895A (en) * 2019-08-06 2019-11-08 李震 A kind of video content similarity method of discrimination based on computer vision and system

Also Published As

Publication number Publication date
CN110942002A (en) 2020-03-31

Similar Documents

Publication Publication Date Title
CN110942002B (en) Unmanned aerial vehicle aerial video frame positioning method based on rotation invariant perception hash
Cao et al. Coverless information hiding based on the molecular structure images of material.
Lee et al. Robust video fingerprinting for content-based video identification
JP5175854B2 (en) Image descriptor for image recognition
Lei et al. Video sequence matching based on the invariance of color correlation
Qu et al. A convolutive mixing model for shifted double JPEG compression with application to passive image authentication
Kim et al. Adaptive weighted fusion with new spatial and temporal fingerprints for improved video copy detection
EP2776981A2 (en) Methods and apparatuses for mobile visual search
Sarkar et al. Video fingerprinting: features for duplicate and similar video detection and query-based video retrieval
CN108038488B (en) Robustness image hashing method based on SIFT and LBP mixing
Brasnett et al. Recent developments on standardisation of MPEG-7 Visual Signature Tools
CN105224619B (en) A kind of spatial relationship matching process and system suitable for video/image local feature
Wang et al. Spatial descriptor embedding for near-duplicate image retrieval
CN106952211B (en) Compact image hashing method based on feature point projection
Min et al. Video copy detection using inclined video tomography and bag-of-visual-words
Himeur et al. Joint color and texture descriptor using ring decomposition for robust video copy detection in large databases
Nie et al. Key-frame based robust video hashing using isometric feature mapping
Sujin et al. Copy-Move Geometric Tampering Estimation Through Enhanced SIFT Detector Method.
KR101367821B1 (en) video identification method and apparatus using symmetric information of hierachical image blocks
Hebbar et al. A Deep Learning Framework with Transfer Learning Approach for Image Forgery Localization
JP2018194956A (en) Image recognition dive, method and program
Fan et al. Image tampering detection using noise histogram features
Mengyang et al. Content-based video copy detection using binary object fingerprints
Chao Introduction to video fingerprinting
Venugopalan et al. Copy-Move Forgery Detection-A Study and the Survey

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant