CN110942002A - Unmanned aerial vehicle aerial photography video frame positioning method based on rotation invariant perceptual hashing

Unmanned aerial vehicle aerial photography video frame positioning method based on rotation invariant perceptual hashing

Info

Publication number
CN110942002A
Authority
CN
China
Prior art keywords
unmanned aerial
aerial vehicle
rotation
sequence
video frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911129923.6A
Other languages
Chinese (zh)
Other versions
CN110942002B (en)
Inventor
印鉴
陈智聪
陈殷齐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Sun Yat Sen University
Original Assignee
National Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Sun Yat Sen University filed Critical National Sun Yat Sen University
Priority to CN201911129923.6A priority Critical patent/CN110942002B/en
Publication of CN110942002A publication Critical patent/CN110942002A/en
Application granted granted Critical
Publication of CN110942002B publication Critical patent/CN110942002B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V20/13 Satellite images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence

Abstract

The invention provides an unmanned aerial vehicle aerial video frame positioning method based on rotation-invariant perceptual hashing. By transforming each image into a thumbnail, the hash gains a certain degree of translation and deformation invariance (scaling principle); differential encoding is used during encoding, making the hash insensitive to illumination and weather conditions; a start-point-independent circular coding sequence provides rotation invariance; and the corner regions are not encoded in the circular coding sequence, because the corners differ greatly under different viewing angles, which eliminates the influence of extraneous corner scenery during rotation. Experiments on real unmanned aerial vehicle videos show that, compared with existing perceptual hash methods, the method successfully solves the problem that traditional perceptual hashing lacks rotation invariance.

Description

Unmanned aerial vehicle aerial photography video frame positioning method based on rotation invariant perceptual hashing
Technical Field
The invention relates to the field of video image processing algorithms, in particular to an unmanned aerial vehicle aerial photography video frame positioning method based on rotation invariant perceptual hashing.
Background
In recent years, with the popularization of unmanned aerial vehicles, using unmanned aerial vehicle aerial videos to complete various tasks has received more and more attention from industry. In the task of unmanned aerial vehicle river-channel anomaly patrol, however, videos of the same river channel taken at different times need to be positioned frame by frame, that is, for a frame of one video, the frame at the same position in another video must be found. Little prior work has studied this task.
The most closely related technology is deep-learning-based picture retrieval, but it is not suitable for unmanned aerial vehicle video frame positioning: deep-learning retrieval aims at finding semantic matches between pictures, whereas the semantic content of frames within the same video is essentially identical; moreover, deep learning is too time-consuming, and even a short video contains a huge number of frames. We therefore adopt Perceptual Hashing as a simple and fast technique for finding similar pictures. Perceptual hashing is a class of one-way mappings from a multimedia data set to a perceptual digest set, i.e., multimedia digital representations with the same perceptual content are uniquely mapped to a single digital digest, while satisfying perceptual robustness and security. Perceptual hashing provides safe and reliable technical support for information services such as multimedia content identification, retrieval, and authentication.
A hash function extracts an irreversible digital digest of the original data; it is one-way and fragile, which guarantees the uniqueness and tamper-evidence of the original data. Various hash functions have been successfully applied in fields such as information retrieval and management and data authentication. However, conventional hash functions cannot support frame positioning between two unmanned aerial vehicle aerial videos taken at different times: even videos shot at the same place at different times contain some lens-rotation differences, and these slight differences break the hash mapping (rotating the same picture produces a different hash value).
Disclosure of Invention
The invention provides an unmanned aerial vehicle aerial video frame positioning method based on rotation-invariant perceptual hashing, which eliminates the influence of extraneous corner scenery during rotation.
In order to achieve the technical effects, the technical scheme of the invention is as follows:
an unmanned aerial vehicle aerial photography video frame positioning method based on rotation invariant perceptual hashing comprises the following steps:
S1: calculate rotation-invariant hash values for the video frame to be positioned (the reference frame) and for the video frames to be retrieved (the target frames);
S2: compare the hash value of the reference frame with those of all frames of the unmanned aerial vehicle video taken at another time, and find the frame whose hash value differs the least.
Further, the specific process of step S1 is:
S11: perform picture scaling on the reference frame and the target frame;
S12: convert the scaled picture into a gray-scale image;
S13: obtain the circular coding sequences of the hash from the gray-scale image;
S14: obtain the binary form of the hash from the sequences produced in step S13, and from it derive the rotation-invariant hash value;
S15: derive the information fingerprint from the result of S14.
Further, the process of step S11 is:
the pictures are uniformly scaled to 8 × 8 pictures with 64 pixels, so that the operation has certain translation invariance, and when the translation amount is in a scaling range, the thumbnails are basically consistent.
Further, the process of step S12 is: the scaled picture is converted into a 256-level gray-scale image.
Further, the process of step S13 is:
The hash is encoded in a circular manner so that it becomes rotation invariant. Given the side length H of the picture, each length r (r = 1 to H/2) is taken as the radius of a circle, and the coordinate sequence of each circle is computed in polar form as x = r·cosθ + x₀, y = r·sinθ + y₀, where θ takes each value from 0° to 360° and (x₀, y₀) is the center of the scaled picture. For r = 1 this yields the circular sequence (x_1^1, y_1^1)(x_1^2, y_1^2)(x_1^3, y_1^3)…, where (x_r^i, y_r^i) denotes the i-th non-repeating coordinate index obtained at the angles 0°, 1°, 2°, …, 360° on the circle of radius r. Then S_1 = I(x_1^1, y_1^1), I(x_1^2, y_1^2), I(x_1^3, y_1^3), … is the sequence of pixel values indexed by the circular coordinates; one such sequence is obtained for every radius, and the collection is denoted S (the r-th sequence being S_r). Such a coding sequence also automatically removes the corner effect, because two frames that differ by a rotation in the actually captured videos are completely different in the corner regions.
Further, the process of obtaining the binary form of the hash value in step S14 is:
Each sequence S_r is encoded in a differential manner: the pixel values along the sequence are traversed in order, and a 1 is recorded whenever the previous value is greater than the current value, otherwise a 0 is recorded; the last pixel value is compared with the first, so that the sequence is joined end to end into a circle. One binary sequence is obtained for every radius; the collection is denoted B, the r-th sequence being B_r.
Further, the process of acquiring the rotation-invariant hash value in step S14 is:
B_r is cyclically shifted, and among all cyclic shifts the one with the smallest binary value is taken as the final sequence of that circle. This yields, for every radius, a binary sequence that is independent of the starting point; the collection is denoted Z, the r-th sequence being Z_r.
Further, the process of step S15 is:
In the order of r from 1 to H/2, the Z_r are concatenated to form the final information fingerprint X = Z_1 Z_2 Z_3 … Z_{H/2}.
Further, the process of step S2 is:
The Hamming distance between the fingerprints of the reference frame and the target frames to be compared is calculated: the larger the Hamming distance, the more the pictures differ; the smaller it is, the more similar they are; a distance of 0 means the pictures are identical. The reference frame is compared once against every frame of the target video, the 20 frames with the smallest Hamming distances are retained, and a further SSIM comparison among them returns the frame with the highest similarity as the required frame.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
the rotation invariant perceptual hash of the invention has certain translation and no deformation by transforming the image into the thumbnail (posing principle); differential encoding is used during encoding, so that illumination weather conditions are irrelevant; and the starting point-independent circular coding sequence is added to make the rotation invariance; the corner parts are not coded in the circular coding sequence because the corner parts have large difference due to different visual angles, thereby eliminating the different influences of other scenes of the corners in the rotation process. Experiments on real unmanned aerial vehicle videos show that compared with the prior perceptual hash method, the method successfully solves the problem that the traditional perceptual hash has no rotation invariant characteristic.
Drawings
FIG. 1 is a schematic diagram of the rotation-invariant coding scheme of the algorithm of the present invention;
FIG. 2 is a schematic diagram illustrating the computation of the rotation-invariant perceptual hash of the present invention;
FIG. 3 is a schematic view of the video frame positioning application of the present invention;
FIG. 4 is the classic Lena image;
FIG. 5 is a comparison of frame-positioning results on real video.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
for the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product;
it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
As shown in fig. 1-2, an unmanned aerial vehicle aerial video frame positioning method based on rotation invariant perceptual hashing includes the following steps:
S1: calculate rotation-invariant hash values for the video frame to be positioned (the reference frame) and for the video frames to be retrieved (the target frames);
S2: compare the hash value of the reference frame with those of all frames of the unmanned aerial vehicle video taken at another time, and find the frame whose hash value differs the least.
The specific process of step S1 is:
S11: perform picture scaling on the reference frame and the target frame;
S12: convert the scaled picture into a gray-scale image;
S13: obtain the circular coding sequences of the hash from the gray-scale image;
S14: obtain the binary form of the hash from the sequences produced in step S13, and from it derive the rotation-invariant hash value;
S15: derive the information fingerprint from the result of S14.
The process of step S11 is:
the pictures are uniformly scaled to 8 × 8 pictures with 64 pixels, so that the operation has certain translation invariance, and when the translation amount is in a scaling range, the thumbnails are basically consistent.
The process of step S12 is: the scaled picture is converted into a 256-level gray-scale image.
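The following is a minimal sketch of steps S11-S12 in Python, using Pillow and NumPy; the function name preprocess and the library choice are illustrative assumptions, not part of the patent.

    import numpy as np
    from PIL import Image

    def preprocess(frame_path, size=8):
        # Steps S11-S12: scale the frame to a size x size thumbnail (64 pixels for
        # size = 8), then convert it to an 8-bit, 256-level gray-scale image.
        img = Image.open(frame_path).resize((size, size), Image.LANCZOS)
        return np.asarray(img.convert("L"), dtype=np.uint8)   # "L" = 256-level grayscale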
As shown in fig. 3, the process of step S13 is:
The hash is encoded in a circular manner so that it becomes rotation invariant. Given the side length H of the picture, each length r (r = 1 to H/2) is taken as the radius of a circle, and the coordinate sequence of each circle is computed in polar form as x = r·cosθ + x₀, y = r·sinθ + y₀, where θ takes each value from 0° to 360° and (x₀, y₀) is the center of the scaled picture. For r = 1 this yields the circular sequence (x_1^1, y_1^1)(x_1^2, y_1^2)(x_1^3, y_1^3)…, where (x_r^i, y_r^i) denotes the i-th non-repeating coordinate index obtained at the angles 0°, 1°, 2°, …, 360° on the circle of radius r. Then S_1 = I(x_1^1, y_1^1), I(x_1^2, y_1^2), I(x_1^3, y_1^3), … is the sequence of pixel values indexed by the circular coordinates; one such sequence is obtained for every radius, and the collection is denoted S (the r-th sequence being S_r). Such a coding sequence also automatically removes the corner effect, because two frames that differ by a rotation in the actually captured videos are completely different in the corner regions.
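A minimal sketch of the circular coding of step S13, continuing the Python example above; the rounding and border clipping used to turn polar coordinates into pixel indices are assumptions, since the patent does not fix these details.

    def circular_sequences(gray):
        # For each radius r = 1..H/2, walk the circle x = r*cos(theta) + x0,
        # y = r*sin(theta) + y0 in one-degree steps and collect the pixel values
        # at the non-repeating coordinate indices.
        h = gray.shape[0]
        x0 = y0 = (h - 1) / 2.0                        # centre of the scaled picture
        S = []
        for r in range(1, h // 2 + 1):
            seen, seq = set(), []
            for deg in range(361):                     # theta = 0, 1, 2, ..., 360 degrees
                theta = np.deg2rad(deg)
                x = int(np.clip(round(r * np.cos(theta) + x0), 0, h - 1))
                y = int(np.clip(round(r * np.sin(theta) + y0), 0, h - 1))
                if (x, y) not in seen:                 # keep only non-repeating indices
                    seen.add((x, y))
                    seq.append(int(gray[y, x]))
            S.append(seq)
        return S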
The process of obtaining the binary form of the hash value in step S14 is:
Each sequence S_r is encoded in a differential manner: the pixel values along the sequence are traversed in order, and a 1 is recorded whenever the previous value is greater than the current value, otherwise a 0 is recorded; the last pixel value is compared with the first, so that the sequence is joined end to end into a circle. One binary sequence is obtained for every radius; the collection is denoted B, the r-th sequence being B_r.
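A minimal sketch of this differential coding, which produces the binary form B (function name illustrative):

    def differential_bits(S):
        # Record 1 when the previous pixel value is greater than the current one,
        # otherwise 0; the last pixel wraps around to the first, closing the circle.
        B = []
        for seq in S:
            n = len(seq)
            B.append([1 if seq[i] > seq[(i + 1) % n] else 0 for i in range(n)])
        return B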
The process of acquiring the rotation-invariant hash value in step S14 is:
B_r is cyclically shifted, and among all cyclic shifts the one with the smallest binary value is taken as the final sequence of that circle. This yields, for every radius, a binary sequence that is independent of the starting point; the collection is denoted Z, the r-th sequence being Z_r.
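A minimal sketch of this start-point normalisation: for equal-length 0/1 sequences, the lexicographically smallest cyclic shift is exactly the shift with the smallest binary value (names illustrative).

    def canonical_rotation(bits):
        # Among all cyclic shifts of B_r, keep the one with the smallest binary value.
        return min(bits[i:] + bits[:i] for i in range(len(bits)))

    def start_point_free(B):
        return [canonical_rotation(b) for b in B]      # the sequences Z_1 .. Z_{H/2}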
The process of step S15 is:
In the order of r from 1 to H/2, the Z_r are concatenated to form the final information fingerprint X = Z_1 Z_2 Z_3 … Z_{H/2}.
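A sketch of step S15 together with the whole pipeline applied to one frame; the call chain assumes the illustrative helpers defined above.

    def fingerprint(Z):
        # Step S15: concatenate Z_1 .. Z_{H/2} into the information fingerprint X.
        return [bit for z in Z for bit in z]

    # X = fingerprint(start_point_free(differential_bits(circular_sequences(preprocess("frame.png")))))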
The process of step S2 is:
The Hamming distance between the fingerprints of the reference frame and the target frames to be compared is calculated: the larger the Hamming distance, the more the pictures differ; the smaller it is, the more similar they are; a distance of 0 means the pictures are identical. The reference frame is compared once against every frame of the target video, the 20 frames with the smallest Hamming distances are retained, and a further SSIM comparison among them returns the frame with the highest similarity as the required frame.
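A minimal sketch of the frame positioning of step S2; it assumes fingerprints are equal-length bit lists, that each candidate carries a gray-scale frame for the SSIM step, and that scikit-image is available for SSIM (all names illustrative).

    from skimage.metrics import structural_similarity as ssim

    def hamming(a, b):
        # Number of differing bits between two equal-length fingerprints.
        return sum(x != y for x, y in zip(a, b))

    def locate_frame(ref_fp, ref_gray, candidates):
        # candidates: list of (fingerprint, gray_frame) pairs, one per frame of the
        # video taken at another time.  Keep the 20 frames with the smallest Hamming
        # distance, then return the candidate with the highest SSIM to the reference.
        top20 = sorted(candidates, key=lambda c: hamming(ref_fp, c[0]))[:20]
        return max(top20, key=lambda c: ssim(ref_gray, c[1]))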
In the experiments, the classical Lena image and a rotated version of it (as shown in FIG. 4) are compared against the traditional hash method. The comparative results are shown in Table 1.
TABLE 1: Comparison of the rotated Lena image with the traditional hash method
The experiments show that, on the Lena image, our rotation-invariant hash is more robust to rotation than the hash of the conventional method.
Real unmanned aerial vehicle aerial videos are then used. A video frame is randomly selected, and it is located in a video from another time period using the traditional perceptual hash method and our method; the overall process is the same as in FIG. 3. Because no ground truth is available, the finally retrieved frames are compared subjectively. FIG. 5 shows the comparison of frame-positioning results on the real videos.
The results show that our invention is a great improvement over the previous hash method and can accurately position frames that differ by a rotation. In FIG. 5, because no frame with exactly the same viewing angle exists in the other video, the traditional hash search performs very poorly, whereas the method of the present application finds the correct frame.
The same or similar reference numerals correspond to the same or similar parts;
the positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the present patent;
it should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention, and are not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. And are neither required nor exhaustive of all embodiments. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.

Claims (9)

1. An unmanned aerial vehicle aerial photography video frame positioning method based on rotation invariant perceptual hashing is characterized by comprising the following steps:
S1: calculate rotation-invariant hash values for the video frame to be positioned (the reference frame) and for the video frames to be retrieved (the target frames);
S2: compare the hash value of the reference frame with those of all frames of the unmanned aerial vehicle video taken at another time, and find the frame whose hash value differs the least.
2. The unmanned aerial vehicle aerial photography video frame positioning method based on rotation invariant perceptual hashing as claimed in claim 1, wherein the specific process of step S1 is:
S11: perform picture scaling on the reference frame and the target frame;
S12: convert the scaled picture into a gray-scale image;
S13: obtain the circular coding sequences of the hash from the gray-scale image;
S14: obtain the binary form of the hash from the sequences produced in step S13, and from it derive the rotation-invariant hash value;
S15: derive the information fingerprint from the result of S14.
3. The unmanned aerial vehicle aerial video frame positioning method based on rotation invariant perceptual hashing as claimed in claim 2, wherein the process of the step S11 is:
the pictures are uniformly scaled to 8 × 8 pictures with 64 pixels, so that the operation has certain translation invariance, and when the translation amount is in a scaling range, the thumbnails are basically consistent.
4. The unmanned aerial vehicle aerial video frame positioning method based on rotation invariant perceptual hashing as claimed in claim 3, wherein the process of step S12 is: the scaled picture is converted into a 256-level gray-scale image.
5. The unmanned aerial vehicle aerial video frame positioning method based on rotation invariant perceptual hashing as claimed in claim 4, wherein the process of the step S13 is:
The hash is encoded in a circular manner so that it becomes rotation invariant. Given the side length H of the picture, each length r (r = 1 to H/2) is taken as the radius of a circle, and the coordinate sequence of each circle is computed in polar form as x = r·cosθ + x₀, y = r·sinθ + y₀, where θ takes each value from 0° to 360° and (x₀, y₀) is the center of the scaled picture. For r = 1 this yields the circular sequence (x_1^1, y_1^1)(x_1^2, y_1^2)(x_1^3, y_1^3)…, where (x_r^i, y_r^i) denotes the i-th non-repeating coordinate index obtained at the angles 0°, 1°, 2°, …, 360° on the circle of radius r. Then S_1 = I(x_1^1, y_1^1), I(x_1^2, y_1^2), I(x_1^3, y_1^3), … is the sequence of pixel values indexed by the circular coordinates; one such sequence is obtained for every radius, and the collection is denoted S (the r-th sequence being S_r). Such a coding sequence also automatically removes the corner effect, because two frames that differ by a rotation in the actually captured videos are completely different in the corner regions.
6. The unmanned aerial vehicle aerial video frame positioning method based on rotation invariant perceptual hashing as claimed in claim 5, wherein the process of obtaining the binary form of the hash value in the step S14 is:
Each sequence S_r is encoded in a differential manner: the pixel values along the sequence are traversed in order, and a 1 is recorded whenever the previous value is greater than the current value, otherwise a 0 is recorded; the last pixel value is compared with the first, so that the sequence is joined end to end into a circle. One binary sequence is obtained for every radius; the collection is denoted B, the r-th sequence being B_r.
7. The unmanned aerial vehicle aerial photography video frame positioning method based on rotation invariant perceptual hashing as claimed in claim 6, wherein the process of obtaining the rotation invariant hash value in step S14 is:
B_r is cyclically shifted, and among all cyclic shifts the one with the smallest binary value is taken as the final sequence of that circle. This yields, for every radius, a binary sequence that is independent of the starting point; the collection is denoted Z, the r-th sequence being Z_r.
8. The unmanned aerial vehicle aerial video frame positioning method based on rotation invariant perceptual hashing as claimed in claim 7, wherein the process of the step S15 is:
In the order of r from 1 to H/2, the Z_r are concatenated to form the final information fingerprint X = Z_1 Z_2 Z_3 … Z_{H/2}.
9. The unmanned aerial vehicle aerial video frame positioning method based on rotation invariant perceptual hashing as claimed in claim 8, wherein the process of the step S2 is:
The Hamming distance between the fingerprints of the reference frame and the target frames to be compared is calculated: the larger the Hamming distance, the more the pictures differ; the smaller it is, the more similar they are; a distance of 0 means the pictures are identical. The reference frame is compared once against every frame of the target video, the 20 frames with the smallest Hamming distances are retained, and a further SSIM comparison among them returns the frame with the highest similarity as the required frame.
CN201911129923.6A 2019-11-18 2019-11-18 Unmanned aerial vehicle aerial video frame positioning method based on rotation invariant perception hash Active CN110942002B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911129923.6A CN110942002B (en) 2019-11-18 2019-11-18 Unmanned aerial vehicle aerial video frame positioning method based on rotation invariant perception hash

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911129923.6A CN110942002B (en) 2019-11-18 2019-11-18 Unmanned aerial vehicle aerial video frame positioning method based on rotation invariant perception hash

Publications (2)

Publication Number Publication Date
CN110942002A true CN110942002A (en) 2020-03-31
CN110942002B CN110942002B (en) 2023-11-07

Family

ID=69907730

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911129923.6A Active CN110942002B (en) 2019-11-18 2019-11-18 Unmanned aerial vehicle aerial video frame positioning method based on rotation invariant perception hash

Country Status (1)

Country Link
CN (1) CN110942002B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105069042A (en) * 2015-07-23 2015-11-18 北京航空航天大学 Content-based data retrieval methods for unmanned aerial vehicle spying images
CN106126585A (en) * 2016-06-20 2016-11-16 北京航空航天大学 Unmanned plane image search method based on quality grading with the combination of perception Hash feature
CN107040790A (en) * 2017-04-01 2017-08-11 华南理工大学 A kind of video content certification and tampering location method based on many granularity Hash
CN109918537A (en) * 2019-01-18 2019-06-21 杭州电子科技大学 A kind of method for quickly retrieving of the ship monitor video content based on HBase
CN110427895A (en) * 2019-08-06 2019-11-08 李震 A kind of video content similarity method of discrimination based on computer vision and system

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112434185A (en) * 2020-10-26 2021-03-02 国家广播电视总局广播电视规划院 Method, system, server and storage medium for searching similar video clips
CN112434185B (en) * 2020-10-26 2023-07-14 国家广播电视总局广播电视规划院 Method, system, server and storage medium for searching similar video clips
CN113704532A (en) * 2020-11-25 2021-11-26 天翼智慧家庭科技有限公司 Method and system for improving recall rate of picture retrieval
CN113704532B (en) * 2020-11-25 2024-04-26 天翼数字生活科技有限公司 Method and system for improving picture retrieval recall rate
CN112698661A (en) * 2021-03-22 2021-04-23 成都睿铂科技有限责任公司 Aerial survey data acquisition method, device and system for aircraft and storage medium

Also Published As

Publication number Publication date
CN110942002B (en) 2023-11-07

Similar Documents

Publication Publication Date Title
CN110942002A (en) Unmanned aerial vehicle aerial photography video frame positioning method based on rotation invariant perceptual hashing
Nakai et al. Use of affine invariants in locally likely arrangement hashing for camera-based document image retrieval
US8370338B2 (en) Large-scale asymmetric comparison computation for binary embeddings
JP5911578B2 (en) Method for encoding feature point position information of image, computer program, and mobile device
US8538164B2 (en) Image patch descriptors
US20110299770A1 (en) Performance of image recognition algorithms by pruning features, image scaling, and spatially constrained feature matching
CN108241645B (en) Image processing method and device
CN105335469A (en) Method and device for image matching and retrieving
EP3172700A1 (en) Improved method for fingerprint matching and camera identification, device and system
CN110599478A (en) Image area copying and pasting tampering detection method
CN105224619B (en) A kind of spatial relationship matching process and system suitable for video/image local feature
KR20140112635A (en) Feature Based Image Processing Apparatus and Method
Tsai et al. Mobile visual search using image and text features
KR101831587B1 (en) Camera-Angle Invariant Feature Extraction Method for Low Computation Number and Character Recognition
Tsai et al. Extent: Inferring image metadata from context and content
Min et al. Video copy detection using inclined video tomography and bag-of-visual-words
CN112131971A (en) Method for carrying out 256-dimensional binary quantization on 128-dimensional floating point type feature descriptor of HardNet
Ye et al. Multipurification of matching pairs based on ORB feature and PCB alignment case study
Gopakumar A survey on image splice forgery detection and localization techniques
KR101367821B1 (en) video identification method and apparatus using symmetric information of hierachical image blocks
Iida et al. Robust image identification for double-compressed jpeg images
CN115497099B (en) Single character image matching and identifying method based on circular scanning
CN115578734B (en) Single character image matching recognition method based on pyramid features
CN113642573B (en) Picture separation method based on grids
Pan et al. Recognizing characters with severe perspective distortion using hash tables and perspective invariants

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant