CN110942002B - Unmanned aerial vehicle aerial video frame positioning method based on rotation invariant perception hash - Google Patents

Unmanned aerial vehicle aerial video frame positioning method based on rotation invariant perception hash

Info

Publication number
CN110942002B
CN110942002B (application CN201911129923.6A)
Authority
CN
China
Prior art keywords
sequence
rotation
hash
unmanned aerial
aerial vehicle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911129923.6A
Other languages
Chinese (zh)
Other versions
CN110942002A (en
Inventor
印鉴
陈智聪
陈殷齐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN201911129923.6A priority Critical patent/CN110942002B/en
Publication of CN110942002A publication Critical patent/CN110942002A/en
Application granted granted Critical
Publication of CN110942002B publication Critical patent/CN110942002B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/10 Terrestrial scenes
    • G06V 20/13 Satellite images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00 Image coding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application provides an unmanned aerial vehicle aerial video frame positioning method based on a rotation-invariant perceptual hash. The hash has a certain translation invariance (pooling principle); differential coding makes it insensitive to illumination and weather conditions; a start-point-independent circumferential coding sequence gives it rotation invariance; and the corner portions, which may differ significantly from view to view, are excluded from the circumferential coding order, eliminating the varying influence of corner scenery under rotation. Experiments on real unmanned aerial vehicle videos show that, compared with prior perceptual hash methods, the method successfully solves the problem that traditional perceptual hashing lacks rotation invariance.

Description

Unmanned aerial vehicle aerial video frame positioning method based on rotation invariant perception hash
Technical Field
The application relates to the field of video image processing algorithms, in particular to an unmanned aerial vehicle aerial video frame positioning method based on rotation invariant perception hash.
Background
In recent years, with the popularization of unmanned aerial vehicles, using aerial videos to complete tasks has drawn increasing attention from industry. In the task of abnormality cruising along a river channel, however, unmanned aerial vehicle videos of the same river channel from different time periods need to be positioned frame by frame, i.e., for a frame of one video, the frame of the other video taken at the same position must be found. Such tasks have not been studied before.
The related technology is picture search based on deep learning, but it does not fit the application of unmanned aerial vehicle video frame positioning: deep-learning picture search aims to find semantic matches between pictures, while the semantic information of a whole aerial video is basically the same; moreover, deep learning is too time-consuming, and even a short video has a huge number of frames. We therefore employ perceptual hashing (Perceptual Hashing), a simple and fast similar-picture search technique. Perceptual hashing is a one-way mapping from a set of multimedia data to a set of perceptual digests, i.e., a unique mapping of multimedia digital representations with the same perceptual content to a digital digest, satisfying perceptual robustness and security. Perceptual hashing provides safe and reliable technical support for information service modes such as multimedia content identification, retrieval, and authentication.
A hash function (Hash Function) irreversibly extracts a digital digest (Digest) of the original data; it has properties such as unidirectionality and fragility, and can guarantee the uniqueness and tamper-resistance of the original data. Various hash functions have been successfully applied in fields such as information retrieval and management and data authentication. However, a conventional hash function cannot realize frame positioning between two unmanned aerial vehicle aerial videos taken at different times, because even a small difference in lens rotation between videos shot at different times at the same place breaks the hash mapping (the hash value of the same picture after rotation is completely different).
Disclosure of Invention
The application provides an unmanned aerial vehicle aerial video frame positioning method based on a rotation-invariant perceptual hash, which eliminates the varying influence of corner scenery under rotation.
In order to achieve the technical effects, the technical scheme of the application is as follows:
an unmanned aerial vehicle aerial video frame positioning method based on rotation invariant perception hash comprises the following steps:
s1: carrying out rotation-invariant hash value calculation on a video frame-reference frame to be positioned and a video frame-target frame to be retrieved;
s2: and comparing the hash value difference between the reference frame and the target frame and all frames of the unmanned aerial vehicle video at another time to find a frame with the smallest difference.
Further, the specific process of the step S1 is:
s11: performing picture scaling on the reference frame and the target frame;
s12: converting the gray level of the zoomed picture;
s13: the gray level diagram is subjected to a circumferential coding sequence for obtaining hash values;
s14: the binary form of the hash value is obtained for the picture processed in the step S13, and then the hash value which is unchanged in rotation is obtained;
s15: and (5) obtaining an information fingerprint according to the result of the step S14.
Further, the process of step S11 is:
the pictures are uniformly scaled to 8 x 8, and the total of 64 pixels of the pictures are subjected to certain translation invariance, and when the translation amount is in a scaling range, the thumbnails are basically consistent.
Further, the process of step S12 is: the scaled picture is converted into a 256-level gray scale.
Further, the process of step S13 is:
Encode the hash along circumferences to make it rotation invariant. According to the side length H of the picture, take each length r (r = 1 to H/2) as a circle radius, and for each radius use the polar-coordinate form x = r·cosθ + x0, y = r·sinθ + y0 to calculate the coordinate sequence of the circle, where θ takes each value in 0° to 360° and (x0, y0) is the center of the scaled picture. This yields a circumferential sequence (x_r^1, y_r^1)(x_r^2, y_r^2)(x_r^3, y_r^3)..., where (x_r^i, y_r^i) is the i-th non-repeating coordinate on the circle of radius r, calculated at the degrees 0°, 1°, 2°, ..., 360°. Then S_r = I(x_r^1, y_r^1), I(x_r^2, y_r^2), I(x_r^3, y_r^3), ... is the sequence of pixel values indexed by the circumferential coordinates, and r such sequences are obtained, denoted S (the r-th sequence is S_r). Such an encoding order also automatically removes corner effects, because two frames with a rotation difference in the actually captured video can be completely different in the corner portions.
Further, the process of obtaining the binary form of the hash value in the step S14 is:
Encode each sequence S_r by differencing: traverse the pixel values of the sequence, record 1 if the previous value is greater than the current value and 0 otherwise, and compare the first pixel value with the last one so that the sequence is joined end to end into a circle. The resulting r binary sequences are denoted B, where the r-th sequence is B_r.
Further, the step S14 of obtaining the hash value with unchanged rotation is:
handle B r Performing cyclic shift, taking the sequence with the smallest binary sequence value after cyclic shift as the final sequence of the circle, and finally obtaining r such binary sequences which are irrelevant to the starting point and are marked as Z, wherein the r-th sequence is Z r
Further, the process of step S15 is:
z is added in the order of r from 1 to H/2 r Combined to form the final information fingerprint x=z 1 Z 2 Z 3 ...Z H/2
Further, the process of step S2 is:
Calculate the Hamming distance between the fingerprints of the reference frame and each target frame. The larger the Hamming distance, the more the pictures differ; the smaller the distance, the more similar they are; a distance of 0 means the pictures are identical. Compare the reference frame against all frames of the target video, find the 20 frames with the smallest Hamming distance, and take the frame with the highest similarity under a further SSIM comparison as the required frame.
Compared with the prior art, the technical scheme of the application has the beneficial effects that:
the rotation-invariant perceptual hash of the application has a certain translation-invariant (pooling principle) by converting an image into a thumbnail; differential coding is used during coding, so that illumination weather conditions are irrelevant; adding a circumferential coding sequence irrelevant to a starting point to ensure that the rotation invariance exists; the corner portions are not encoded in the circumferential encoding order because the corner portions may differ significantly from view to view, thereby eliminating the different effects of other scenes of the corner upon rotation. Experiments on real unmanned aerial vehicle videos show that compared with the prior perceptual hash method, the method successfully solves the problem that the traditional perceptual hash has no rotation unchanged characteristic.
Drawings
FIG. 1 is a schematic diagram of the algorithm rotation invariant encoding mode of the present application;
FIG. 2 is a schematic diagram of a rotation invariant perceptual hash calculation of the present application;
FIG. 3 is a schematic diagram of a video frame positioning application flow of the present application;
FIG. 4 is a classical Lena diagram;
fig. 5 is a comparison chart of the frame positioning application result of the real shot video.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the present patent;
for the purpose of better illustrating the embodiments, certain elements of the drawings may be omitted, enlarged or reduced and do not represent the actual product dimensions;
it will be appreciated by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical scheme of the application is further described below with reference to the accompanying drawings and examples.
As shown in fig. 1-2, an unmanned aerial vehicle aerial video frame positioning method based on rotation invariant perception hash comprises the following steps:
s1: carrying out rotation-invariant hash value calculation on a video frame-reference frame to be positioned and a video frame-target frame to be retrieved;
s2: and comparing the hash value difference between the reference frame and the target frame and all frames of the unmanned aerial vehicle video at another time to find a frame with the smallest difference.
The specific process of step S1 is:
s11: performing picture scaling on the reference frame and the target frame;
s12: converting the gray level of the zoomed picture;
s13: the gray level diagram is subjected to a circumferential coding sequence for obtaining hash values;
s14: the binary form of the hash value is obtained for the picture processed in the step S13, and then the hash value which is unchanged in rotation is obtained;
s15: and (5) obtaining an information fingerprint according to the result of the step S14.
The process of step S11 is:
the pictures are uniformly scaled to 8 x 8, and the total of 64 pixels of the pictures are subjected to certain translation invariance, and when the translation amount is in a scaling range, the thumbnails are basically consistent.
The process of step S12 is: the scaled picture is converted into a 256-level gray scale.
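As a minimal Python sketch of steps S11 and S12, assuming the input image dimensions are multiples of 8; the block-averaging downscale and the luminosity gray weights are illustrative choices, since the patent only requires an 8 x 8 thumbnail and a 256-level gray conversion:

```python
def to_gray(rgb):
    # Convert an RGB image (rows of (r, g, b) triples) to 256-level gray
    # using common luminosity weights (an assumed choice).
    return [[int(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
            for row in rgb]

def downscale(gray, size=8):
    # Block-average downscale to size x size; pooling neighboring pixels is
    # what gives the thumbnail its mild translation invariance.
    h, w = len(gray), len(gray[0])
    bh, bw = h // size, w // size   # assumes h and w are multiples of size
    out = []
    for i in range(size):
        row = []
        for j in range(size):
            block = [gray[y][x]
                     for y in range(i * bh, (i + 1) * bh)
                     for x in range(j * bw, (j + 1) * bw)]
            row.append(sum(block) // len(block))
        out.append(row)
    return out
```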
As shown in fig. 3, the process of step S13 is:
Encode the hash along circumferences to make it rotation invariant. According to the side length H of the picture, take each length r (r = 1 to H/2) as a circle radius, and for each radius use the polar-coordinate form x = r·cosθ + x0, y = r·sinθ + y0 to calculate the coordinate sequence of the circle, where θ takes each value in 0° to 360° and (x0, y0) is the center of the scaled picture. This yields a circumferential sequence (x_r^1, y_r^1)(x_r^2, y_r^2)(x_r^3, y_r^3)..., where (x_r^i, y_r^i) is the i-th non-repeating coordinate on the circle of radius r, calculated at the degrees 0°, 1°, 2°, ..., 360°. Then S_r = I(x_r^1, y_r^1), I(x_r^2, y_r^2), I(x_r^3, y_r^3), ... is the sequence of pixel values indexed by the circumferential coordinates, and r such sequences are obtained, denoted S (the r-th sequence is S_r). Such an encoding order also automatically removes corner effects, because two frames with a rotation difference in the actually captured video can be completely different in the corner portions.
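The circle-walking step above can be sketched as follows. The exact center for an even-sized thumbnail, the rounding of coordinates, and the clamping to the image border are assumptions the patent leaves open:

```python
import math

def circle_indices(size=8):
    # For each radius r = 1 .. size//2, walk the circle in 1-degree steps
    # (0..359; 360 coincides with 0) and collect the non-repeating integer
    # pixel coordinates in order of increasing angle.
    cx = cy = (size - 1) / 2.0          # assumed center of the thumbnail
    rings = []
    for r in range(1, size // 2 + 1):
        seen, ring = set(), []
        for deg in range(360):
            t = math.radians(deg)
            x = round(r * math.cos(t) + cx)
            y = round(r * math.sin(t) + cy)
            x = min(max(x, 0), size - 1)   # clamp to the image border
            y = min(max(y, 0), size - 1)
            if (x, y) not in seen:
                seen.add((x, y))
                ring.append((x, y))
        rings.append(ring)
    return rings
```

Indexing the gray thumbnail with each ring then gives the pixel-value sequences S_r.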
The process of obtaining the binary form of the hash value in step S14 is:
Encode each sequence S_r by differencing: traverse the pixel values of the sequence, record 1 if the previous value is greater than the current value and 0 otherwise, and compare the first pixel value with the last one so that the sequence is joined end to end into a circle. The resulting r binary sequences are denoted B, where the r-th sequence is B_r.
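The differential coding of one ring S_r can be sketched as a circular comparison, so the bit sequence closes end to end:

```python
def ring_to_bits(vals):
    # Differential coding along a closed ring: bit i is 1 when the previous
    # pixel value is greater than the current one; index i-1 wraps around,
    # so the first element is compared against the last.
    n = len(vals)
    return [1 if vals[i - 1] > vals[i] else 0 for i in range(n)]
```

Because only relative brightness order is kept, a uniform illumination change leaves the bits unchanged.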
The process of obtaining the hash value with unchanged rotation in step S14 is as follows:
handle B r Performing cyclic shift, taking the sequence with the smallest binary sequence value after cyclic shift as the final sequence of the circle, and finally obtaining r such binary sequences which are irrelevant to the starting point and are marked as Z, wherein the r-th sequence is Z r
The process of step S15 is:
z is added in the order of r from 1 to H/2 r Combined to form the final information fingerprint x=z 1 Z 2 Z 3 ...Z H/2
The process of step S2 is:
Calculate the Hamming distance between the fingerprints of the reference frame and each target frame. The larger the Hamming distance, the more the pictures differ; the smaller the distance, the more similar they are; a distance of 0 means the pictures are identical. Compare the reference frame against all frames of the target video, find the 20 frames with the smallest Hamming distance, and take the frame with the highest similarity under a further SSIM comparison as the required frame.
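Step S2 can be sketched as a Hamming-distance ranking over all target-video fingerprints. The function names and the `k=20` default follow the description above; the final SSIM re-ranking of the shortlist is omitted here:

```python
def hamming(a, b):
    # Count positions where two equal-length bit fingerprints differ;
    # distance 0 means the fingerprints are identical.
    return sum(x != y for x, y in zip(a, b))

def top_candidates(ref, frames, k=20):
    # Rank every target-video fingerprint by Hamming distance to the
    # reference and keep the k closest for the finer SSIM comparison.
    order = sorted(range(len(frames)), key=lambda i: hamming(ref, frames[i]))
    return order[:k]
```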
The experiment selects the classical Lena image and a rotated version of it (see fig. 4) and compares our method with the conventional hash method. The comparison results are shown in Table 1.
Table 1 comparing the rotated image with the conventional hash method
Experiments show that, on the Lena image, our rotation-invariant hash stays robust under rotation, whereas the conventional method does not.
An actual unmanned aerial vehicle aerial video is also used: a video frame is randomly selected and searched for in a video of another time period with both the traditional perceptual hash and our method. The overall process is the same as in fig. 3. Because there is no ground truth, we make a subjective comparison of the frames finally retrieved. Fig. 5 compares the frame positioning results on the real shot video.
The results show that our method is a considerable improvement over the previous hash method, so that frames with rotation can also be accurately positioned. Since no frame of the target video in fig. 5 has exactly the same view, the pHash search result is very poor, while our method accurately finds the correct frame.
The same or similar reference numerals correspond to the same or similar components;
the positional relationship depicted in the drawings is for illustrative purposes only and is not to be construed as limiting the present patent;
it is to be understood that the above examples of the present application are provided by way of illustration only and not by way of limitation of the embodiments of the present application. Other variations or modifications of the above teachings will be apparent to those of ordinary skill in the art. It is not necessary here nor is it exhaustive of all embodiments. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the application are desired to be protected by the following claims.

Claims (4)

1. The unmanned aerial vehicle aerial video frame positioning method based on rotation invariant perception hash is characterized by comprising the following steps of:
s1: carrying out rotation-invariant hash value calculation on a video frame-reference frame to be positioned and a video frame-target frame to be retrieved;
the specific process of the step S1 is as follows:
s11: performing picture scaling on the reference frame and the target frame;
s12: converting the gray level of the zoomed picture;
s13: the gray level diagram is subjected to a circumferential coding sequence for obtaining hash values; the process of the step S13 is:
encoding the hash along circumferences to make it rotation invariant: according to the side length H of the picture, taking each length r (r = 1 to H/2) as a circle radius, and for each radius using the polar-coordinate form x = r·cosθ + x0, y = r·sinθ + y0 to calculate the coordinate sequence of the circle, wherein θ takes each value in 0° to 360° and (x0, y0) is the center of the scaled picture, thereby obtaining a circumferential sequence (x_r^1, y_r^1)(x_r^2, y_r^2)(x_r^3, y_r^3)..., wherein (x_r^i, y_r^i) denotes the i-th non-repeating coordinate on the circle of radius r calculated at the degrees 0°, 1°, 2°, ..., 360°; then obtaining S_r = I(x_r^1, y_r^1), I(x_r^2, y_r^2), I(x_r^3, y_r^3), ..., the sequence of pixel values indexed by the circumferential coordinates, r such sequences being obtained and denoted S, wherein the r-th sequence is S_r; such an encoding sequence also automatically removes corner effects, because two frames with a rotation difference in the actually captured video can be completely different in the corner portions;
s14: the binary form of the hash value is obtained for the picture processed in the step S13, and then the hash value which is unchanged in rotation is obtained;
the process of obtaining the binary form of the hash value in step S14 is:
encoding each sequence S_r by differencing: traversing the pixel values of the sequence, recording 1 if the previous value is greater than the current value and 0 otherwise, and comparing the first pixel value with the last one so that the sequence is joined end to end into a circle, finally obtaining r binary sequences denoted B, wherein the r-th sequence is B_r;
The process of obtaining the rotation-invariant hash value in step S14 is as follows:
handlePerforming cyclic shift, taking the sequence with the smallest binary sequence value after cyclic shift as the final sequence of the circle, and finally obtaining r such starting point independent binary sequences as +.>Wherein the r-th sequence is +.>
S15: obtaining an information fingerprint according to the result of the step S14;
the process of step S15 is:
combining the Z_r in order of r from 1 to H/2 to form the final information fingerprint X = Z_1 Z_2 Z_3 ... Z_(H/2);
S2: comparing the hash-value differences between the reference frame and all frames of the unmanned aerial vehicle video taken at another time, and finding the frame with the smallest difference.
2. The unmanned aerial vehicle aerial video frame positioning method based on the rotation invariant perceptual hash of claim 1, wherein the process of step S11 is:
the pictures are uniformly scaled to 8 x 8, and the total of 64 pixels of the pictures are subjected to certain translation invariance, and when the translation amount is in a scaling range, the thumbnails are basically consistent.
3. The unmanned aerial vehicle aerial video frame positioning method based on the rotation invariant perceptual hash of claim 2, wherein the process of step S12 is: the scaled picture is converted into a 256-level gray scale.
4. The unmanned aerial vehicle aerial video frame positioning method based on rotation invariant perceptual hashing of claim 3, wherein the process of step S2 is:
calculating the Hamming distance between the fingerprints of the reference frame and each target frame, wherein the larger the Hamming distance, the more the pictures differ, the smaller the distance, the more similar they are, and a distance of 0 means the pictures are identical; comparing the reference frame against all frames of the target video, finding the 20 frames with the smallest Hamming distance, and taking the frame with the highest similarity under a further SSIM comparison as the required frame.
CN201911129923.6A 2019-11-18 2019-11-18 Unmanned aerial vehicle aerial video frame positioning method based on rotation invariant perception hash Active CN110942002B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911129923.6A CN110942002B (en) 2019-11-18 2019-11-18 Unmanned aerial vehicle aerial video frame positioning method based on rotation invariant perception hash

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911129923.6A CN110942002B (en) 2019-11-18 2019-11-18 Unmanned aerial vehicle aerial video frame positioning method based on rotation invariant perception hash

Publications (2)

Publication Number Publication Date
CN110942002A CN110942002A (en) 2020-03-31
CN110942002B true CN110942002B (en) 2023-11-07

Family

ID=69907730

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911129923.6A Active CN110942002B (en) 2019-11-18 2019-11-18 Unmanned aerial vehicle aerial video frame positioning method based on rotation invariant perception hash

Country Status (1)

Country Link
CN (1) CN110942002B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112434185B (en) * 2020-10-26 2023-07-14 国家广播电视总局广播电视规划院 Method, system, server and storage medium for searching similar video clips
CN113704532B (en) * 2020-11-25 2024-04-26 天翼数字生活科技有限公司 Method and system for improving picture retrieval recall rate
CN112698661B (en) * 2021-03-22 2021-08-24 成都睿铂科技有限责任公司 Aerial survey data acquisition method, device and system for aircraft and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105069042A (en) * 2015-07-23 2015-11-18 北京航空航天大学 Content-based data retrieval methods for unmanned aerial vehicle spying images
CN106126585A (en) * 2016-06-20 2016-11-16 北京航空航天大学 Unmanned plane image search method based on quality grading with the combination of perception Hash feature
CN107040790A (en) * 2017-04-01 2017-08-11 华南理工大学 A kind of video content certification and tampering location method based on many granularity Hash
CN109918537A (en) * 2019-01-18 2019-06-21 杭州电子科技大学 A kind of method for quickly retrieving of the ship monitor video content based on HBase
CN110427895A (en) * 2019-08-06 2019-11-08 李震 A kind of video content similarity method of discrimination based on computer vision and system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105069042A (en) * 2015-07-23 2015-11-18 北京航空航天大学 Content-based data retrieval methods for unmanned aerial vehicle spying images
CN106126585A (en) * 2016-06-20 2016-11-16 北京航空航天大学 Unmanned plane image search method based on quality grading with the combination of perception Hash feature
CN107040790A (en) * 2017-04-01 2017-08-11 华南理工大学 A kind of video content certification and tampering location method based on many granularity Hash
CN109918537A (en) * 2019-01-18 2019-06-21 杭州电子科技大学 A kind of method for quickly retrieving of the ship monitor video content based on HBase
CN110427895A (en) * 2019-08-06 2019-11-08 李震 A kind of video content similarity method of discrimination based on computer vision and system

Also Published As

Publication number Publication date
CN110942002A (en) 2020-03-31

Similar Documents

Publication Publication Date Title
CN110942002B (en) Unmanned aerial vehicle aerial video frame positioning method based on rotation invariant perception hash
Cao et al. Coverless information hiding based on the molecular structure images of material.
Lee et al. Robust video fingerprinting for content-based video identification
JP5175854B2 (en) Image descriptor for image recognition
Lei et al. Video sequence matching based on the invariance of color correlation
Qu et al. A convolutive mixing model for shifted double JPEG compression with application to passive image authentication
Kim et al. Adaptive weighted fusion with new spatial and temporal fingerprints for improved video copy detection
EP2776981A2 (en) Methods and apparatuses for mobile visual search
Sarkar et al. Video fingerprinting: features for duplicate and similar video detection and query-based video retrieval
CN108038488B (en) Robustness image hashing method based on SIFT and LBP mixing
Brasnett et al. Recent developments on standardisation of MPEG-7 Visual Signature Tools
CN105224619B (en) A kind of spatial relationship matching process and system suitable for video/image local feature
Wang et al. Spatial descriptor embedding for near-duplicate image retrieval
CN106952211B (en) Compact image hashing method based on feature point projection
Min et al. Video copy detection using inclined video tomography and bag-of-visual-words
Himeur et al. Joint color and texture descriptor using ring decomposition for robust video copy detection in large databases
Nie et al. Key-frame based robust video hashing using isometric feature mapping
Sujin et al. Copy-Move Geometric Tampering Estimation Through Enhanced SIFT Detector Method.
KR101367821B1 (en) video identification method and apparatus using symmetric information of hierachical image blocks
Hebbar et al. A Deep Learning Framework with Transfer Learning Approach for Image Forgery Localization
JP2018194956A (en) Image recognition dive, method and program
Fan et al. Image tampering detection using noise histogram features
Mengyang et al. Content-based video copy detection using binary object fingerprints
Chao Introduction to video fingerprinting
Venugopalan et al. Copy-Move Forgery Detection-A Study and the Survey

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant