CN114429435A - Wide-field-of-view range target searching device, system and method in degraded visual environment - Google Patents

Wide-field-of-view range target searching device, system and method in degraded visual environment

Info

Publication number
CN114429435A
CN114429435A
Authority
CN
China
Prior art keywords
image
feature
target
fusion
camera
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210081227.8A
Other languages
Chinese (zh)
Inventor
刘延芳
周芮
齐骥
齐乃明
魏皓暄
霍明英
丁元川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology filed Critical Harbin Institute of Technology
Priority to CN202210081227.8A priority Critical patent/CN114429435A/en
Publication of CN114429435A publication Critical patent/CN114429435A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10032 Satellite or aerial image; Remote sensing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10048 Infrared image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20212 Image combination
    • G06T 2207/20221 Image fusion; Image merging

Abstract

A wide-field-of-view range target searching device, system and method in a degraded visual environment, belonging to the technical field of machine vision. To solve the problem that the prior art cannot realize wide-range, small-target search and identification in a degraded visual environment, the device comprises a multi-view vision system, a connecting structure and a two-axis pan-tilt head, wherein the multi-view vision system comprises an infrared camera, a long-focus camera and a wide-angle camera. The multi-view vision system is mounted on the two-axis pan-tilt head via the connecting structure, and the pan-tilt head provides two-degree-of-freedom rotation. The central camera of the multi-view vision system is the long-focus camera, with the infrared camera and the wide-angle camera arranged on either side; the three cameras are distributed on the same pan-tilt head. The system comprises a controller, an image processor and a set of the devices. Through image fusion, fusion evaluation, feature detection, attention switching and target identification, the method realizes small-target search and position-attitude estimation over a wide field of view in a degraded visual environment.

Description

Wide-field-of-view range target searching device, system and method in degraded visual environment
Technical Field
The invention relates to a wide-field-of-view range target search system and method, belonging to the technical field of machine vision.
Background
With the development of unmanned aerial vehicle (UAV) technology, UAVs have become increasingly capable and play important roles in both military and civilian fields. However, UAV autonomy is currently limited: combat or task execution in complex environments still requires human operators, and UAVs lack autonomous perception of their operational or task environment. How to improve operators' situational awareness, use sensors to perceive the situation autonomously, and reduce operator workload is therefore a key research problem.
The visual sensor has the characteristics of high precision, high resolution, high sensitivity, small volume and light weight, and is a typical sensor for integrated reconnaissance-strike and fire-and-forget equipment. However, vision sensors are sensitive to illumination changes, and a single one can hardly meet the requirements of wide-field detection and high-precision identification at the same time. To meet all-weather, wide-area combat or task requirements, higher demands are placed on the sensors of weapon and maneuvering systems, and new challenges arise for visual imaging technology in continuous target tracking and accurate target identification.
Currently, visible light sensors and infrared sensors are the commonly used visual sensors. Visible light images generally have higher spatial resolution, rich detail, and obvious light-dark contrast. However, visible light images are susceptible to adverse conditions such as insufficient lighting, fog, or other bad weather, collectively known as a Degraded Visual Environment (DVE). Infrared images resist such interference well, but their resolution is relatively low and their texture information is not rich enough. It is therefore difficult for a single sensor to provide highly robust image information in a degraded visual environment, and multi-view sensor fusion is imperative.
Disclosure of Invention
The invention aims to solve the problem that the prior art cannot realize wide-range, small-target search and identification in a degraded visual environment.
The wide-field-of-view range target searching device in a degraded visual environment comprises a multi-view vision system, a connecting structure and a two-axis pan-tilt head; the multi-view vision system comprises an infrared camera, a long-focus camera and a wide-angle camera;
the multi-view vision system is mounted on the two-axis pan-tilt head via the connecting structure, and the two-axis pan-tilt head realizes two-degree-of-freedom rotation;
the central camera of the multi-view vision system is the long-focus camera, with the infrared camera and the wide-angle camera arranged on its two sides; the three cameras are distributed on the same pan-tilt head. Preferably, the three cameras are distributed in a line on the same pan-tilt head.
The wide-field-of-view range target searching system in a degraded visual environment comprises a controller, an image processor and a set of the wide-field-of-view range target searching devices described above;
the controller controls the wide-angle camera and the infrared camera to acquire images and transmits the acquired images to the image processor;
the image processor performs image fusion and feature detection processing, and then transmits target position information to the controller; the image processor is also used for processing the image shot by the long-focus camera to obtain the target type and the attitude information;
the controller controls the motion motors to rotate, and the motors drive the cameras and position sensors; the controller is also used to control the long-focus camera to shoot.
The method for searching a target over a wide field of view in a degraded visual environment comprises the following steps:
s1, decomposing the infrared image and the visible light image, and then fusing by a fusion strategy;
the infrared image and the visible light image are respectively obtained by an infrared camera and a wide-angle camera;
s2, performing feature detection on the fused image to obtain feature description;
s3, evaluating the fused images based on a feature-oriented image evaluation mode and screening them; the screened fused image, obtained from the infrared image and the visible light image, is recorded as the wide-field fusion image; the feature-rich region in the wide-field fusion image is recorded as the region of interest;
s4, after the region of interest is obtained, recording its position in the wide-field fusion image by an attention switching strategy, and controlling the wide-field-of-view range target searching system in the degraded visual environment so that the region of interest is located at the center of the field of view; acquiring an image of the region of interest with the long-focus camera, the image acquired by the long-focus camera being a high-resolution image;
s5, performing target identification on the image from the telephoto camera by deep learning to obtain information such as target type and position; and repeating the above steps until all tasks are completed.
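Viewed as a whole, S1-S5 form one perception loop. The sketch below is a minimal, hypothetical Python rendering of that loop; every argument is a caller-supplied function standing in for the corresponding subsystem (capture, fusion, evaluation, detection, gimbal control, recognition), none of which is fixed by this disclosure.

```python
def search_wide_field(capture_ir, capture_vis, capture_tele,
                      fuse, evaluate, detect_roi, point_gimbal, identify):
    """One pass of the S1-S5 loop; every argument is a caller-supplied
    callable standing in for the corresponding subsystem."""
    ir_img, vis_img = capture_ir(), capture_vis()   # infrared + wide-angle images
    candidates = fuse(ir_img, vis_img)              # S1: fusion under several strategies
    fused = max(candidates, key=evaluate)           # S3: feature-oriented screening
    roi_center = detect_roi(fused)                  # S2/S3: feature-rich region of interest
    point_gimbal(roi_center)                        # S4: attention switching centers the region
    return identify(capture_tele())                 # S5: recognition on the high-resolution image
```

The loop is repeated until all tasks are completed.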
Further, in the process of decomposing and fusing the infrared image and the visible light image in S1, the image is divided into low-rank blocks of different sizes by using a multi-scale low-rank fusion method, and the low-rank blocks of different levels are fused by using different fusion strategies.
Further, the image is divided into low-rank blocks with different sizes by adopting a multi-scale low-rank fusion method, and the process of fusing the low-rank maps with different levels by adopting different fusion strategies comprises the following steps:
(1) taking the infrared image and the visible light image as images to be decomposed, and dividing the images to be decomposed into image matrixes formed by decomposing low-rank blocks with different sizes;
(2) fusing feature maps of different sizes with different fusion strategies: a summation fusion strategy is adopted for feature maps whose length/width is less than 50% of the original image scale, and a maximum-value fusion strategy for those whose length/width is greater than or equal to 50% of the original image scale, finally obtaining a plurality of fusion results under different fusion strategies; the generated large and small feature maps are superposed and fused to obtain a new large map as the fusion result, thereby obtaining a series of different feature maps for the same input;
wherein the partial images are fused based on a summation fusion strategy, represented as:
F_i(x, y) = T_ri(x, y) + T_vi(x, y),  i ≤ 3
where F_i(x, y) represents the pixel value at the corresponding position of the fused image, and T_ri(x, y), T_vi(x, y) represent the pixel values at the corresponding positions of the infrared salient image (decomposed low-rank map) and the visible salient image (decomposed low-rank map), respectively;
the maximum-value fusion rule fuses the remaining decomposed images, expressed as:
F_i(x, y) = W(x, y) T_vi(x, y) + [1 - W(x, y)] T_ri(x, y)
in the formula, W (x, y) is an image fusion weight and is selected under guidance of a fusion image quality evaluation method.
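Written as array operations the two rules are direct; the following NumPy sketch is illustrative only, assuming the decomposed infrared and visible maps arrive as equally sized float arrays and that the per-pixel weight map W has already been selected under the quality evaluation.

```python
import numpy as np

def fuse_sum(t_ri: np.ndarray, t_vi: np.ndarray) -> np.ndarray:
    """Summation rule F_i = T_ri + T_vi, used for small-scale feature maps."""
    return t_ri + t_vi

def fuse_weighted(t_ri: np.ndarray, t_vi: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Weighted rule F_i = W*T_vi + (1 - W)*T_ri, used for large-scale maps;
    w is the per-pixel fusion weight in [0, 1]."""
    return w * t_vi + (1.0 - w) * t_ri

def fuse_level(t_ri, t_vi, w, orig_shape):
    """Select the rule by scale: summation when the map's height and width
    are both below 50% of the original image scale, weighted otherwise."""
    small = (t_ri.shape[0] < 0.5 * orig_shape[0]
             and t_ri.shape[1] < 0.5 * orig_shape[1])
    return fuse_sum(t_ri, t_vi) if small else fuse_weighted(t_ri, t_vi, w)
```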
Further, the step of S2, performing feature detection on the fused image to obtain a feature description, includes the following steps:
firstly, the image to be fused is segmented, and each segmented region is marked as a small region; the geometric center of each small region is O, and the intensity centroid C of the small region is described as:
C = ( Σ x·I(x, y) / Σ I(x, y), Σ y·I(x, y) / Σ I(x, y) )
with the sums taken over all pixels of the small region, where I(x, y) is the pixel value at image location (x, y);
OC vectors are then established; within each small region, taking the OC vector as the starting boundary, the region is divided into N equal parts, the geometric center of each part being defined as O_subi; the intensity centroid C_subi is selected within each part by repeating the above calculation, giving the O_subi C_subi vectors; and the angle θ_i (i = 1, 2, …, N) between the OC vector and each O_subi C_subi vector is calculated:
θ_i = arccos( (OC · O_subi C_subi) / (|OC| |O_subi C_subi|) )
θ_i is then binary-coded: the 2π angle is divided into N' equal parts, each represented by a binary code; the included angle θ_i falls into one of the N' parts, and the corresponding binary code is taken as the binary encoding of θ_i;
for one small region there are N equal parts in total, so each small region yields a vector of size 1xN, namely one feature description;
the 1xN vector obtained for a small region serves as the descriptive feature of that region, and each image yields feature descriptions consisting of a number of 1xN vectors. For each image to be feature-extracted, image feature matching is performed by brute-force search, and the matching results between the fused image and the infrared and visible images are weighted using the brute-force search, thereby determining the feature robustness and matching accuracy of the fused image.
Further, in the S3, the fused image is evaluated based on the feature-oriented image evaluation method, and in the process of screening the fused image, the evaluation index includes feature abundance; the feature abundance comprises feature robustness and matching accuracy; the feature robustness and the matching accuracy in the feature abundance are the feature robustness and the matching accuracy of the fusion image corresponding to the feature description determined in step S2.
Further, in S3, the fused image is evaluated based on the feature-oriented image evaluation method, and in the process of screening the fused image the evaluation index further includes Q^AB/F, structural similarity, information entropy, and spatial frequency.
Further, the specific process of S4 includes the following steps:
the two-axis pan-tilt head is driven to rotate according to the position (x, y) of the center of the region of interest in the fused image, so that the center of the region of interest always lies at the center of the image, wherein the pitching motion relies on a pitch-freedom motor and rotating shaft, and the yawing motion on a yaw-freedom motor and rotating shaft.
Further, the process of performing target recognition on the high-resolution image by using deep learning to obtain information such as a target type and a target position in S5 includes the following steps:
adopting a twin network for target identification, wherein the twin network structure consists of a convolution layer, a pooling layer and a full-connection layer;
an image captured by the long-focus camera is used as the input of the neural network; features are extracted through the convolution and pooling layers and integrated through the fully connected layer to obtain the estimated value p of the target pose, i.e., estimates of the six degrees of freedom x, y, z, α, β and γ, where x, y and z are the three-dimensional position coordinates, α is the yaw angle, β the pitch angle, and γ the roll angle.
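One branch of such a network can be sketched in PyTorch as below; the layer counts, channel widths and the 224x224 single-channel input are assumptions made for the sketch, since the disclosure does not fix the architecture's dimensions.

```python
import torch
import torch.nn as nn

class PoseBranch(nn.Module):
    """One branch of the twin network: convolution + pooling feature
    extraction, then fully connected layers integrating the features
    into a 6-DOF pose estimate p = (x, y, z, alpha, beta, gamma).
    All sizes here are illustrative assumptions."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 56 * 56, 128), nn.ReLU(),  # assumes a 224x224 input
            nn.Linear(128, 6),                        # x, y, z, alpha, beta, gamma
        )

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(img))
```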
Beneficial effects:
the wide-field-of-view range target searching device and system in the degraded visual environment have the following beneficial effects:
(1) In order to reduce or eliminate accidents of the unmanned aerial vehicle in a degraded visual environment and improve situational awareness, an enhanced synthetic vision system integrating an infrared camera and a visible light camera is provided, exploiting the strong interference immunity of the infrared image and the rich information of the visible light image.
(2) Aiming at the problem that a single sensor can hardly meet the requirements of image resolution and detection field of view, a method in which a wide-angle camera and a telephoto camera work cooperatively is provided; by adopting an attention switching mechanism and an intelligent image processing algorithm, the dual requirements of image resolution and detection field of view are both met.
The method for searching the target with the wide view field range in the degraded visual environment has the following beneficial effects:
(1) A multi-scale low-rank decomposition method is adopted; the image characteristics of each decomposition layer are fully considered, fusion rules are specifically designed for the saliency maps and global low-rank maps of different layers, and a feature abundance evaluation criterion is introduced, finally obtaining a fused image with high contrast and clear texture.
(2) An attention switching mechanism is applied: feature detection is first performed to obtain the region of interest, a high-resolution image of the region of interest is then acquired, and the small-target identification task over a wide field of view is finally completed.
The invention provides a wide-field-of-view range target searching method in a degraded visual environment, which fuses the infrared image and the visible light image by a multi-scale low-rank image fusion method, and screens out a fused image with high contrast and clear texture by a fusion image evaluation method that introduces feature abundance; an attention switching mechanism is applied, feature detection is performed on the wide-range low-resolution image to obtain the region of interest, the region of interest is gazed at to obtain a local high-resolution image, and target identification is performed on the high-resolution image. The wide-field-of-view range target searching method in a degraded visual environment can solve the practical problem of wide-range, small-target search and identification tasks in a degraded visual environment.
Drawings
FIG. 1 is a block diagram of a wide field of view range object search method in a degraded visual environment according to the present invention;
FIG. 2 is a schematic diagram of a multi-scale low-rank decomposition model according to the present invention;
FIG. 3 is a flow chart of the attention switching method according to the present invention;
FIG. 4 is a schematic diagram of the multi-view fusion target recognition system; wherein: multi-view vision system 1, connecting structure 2, two-axis pan-tilt head 3;
FIG. 5 is a schematic diagram of a multi-vision system; wherein, the system comprises an infrared camera 1-1, a telephoto camera 1-2 and a wide-angle camera 1-3;
FIG. 6 is a topological diagram of the multi-view fusion target recognition system of the present invention;
FIG. 7 is a flow chart of the multi-view fusion target recognition system of the present invention;
FIG. 8 is a schematic diagram illustrating the division of an input image when N is taken as 4;
FIG. 9 is a schematic diagram of the composition of the neural network loss function.
Detailed Description
The first embodiment is as follows: the present embodiment is described with reference to FIGS. 4 and 5.
The embodiment is a wide-field-of-view range target searching device in a degraded visual environment; the device comprises a multi-view vision system 1, a connecting structure 2 and a two-axis pan-tilt head 3; the multi-view vision system 1 comprises an infrared camera 1-1, a telephoto camera 1-2 and a wide-angle camera 1-3.
The multi-view vision system 1 is mounted on the two-axis pan-tilt head 3 via the connecting structure 2, and the two-axis pan-tilt head 3 realizes two-degree-of-freedom rotation.
The central camera of the multi-view vision system 1 is the telephoto camera 1-2, with the infrared camera 1-1 and the wide-angle camera 1-3 on the two sides; the three cameras are distributed on the same pan-tilt head, preferably in a line, which facilitates image fusion, ensures determination of the region of interest, and eases subsequent telephoto processing. The arrangement may include, but is not limited to, the three cameras distributed in a line on the same pan-tilt head. The wide-angle camera has a large field of view but low relative resolution; the telephoto camera has a small field of view and high relative resolution; the infrared camera is sensitive to infrared features.
Although the device appears structurally simple, the arrangement was in fact settled only after bionics research. The wide-angle camera and the infrared camera mimic the peripheral vision of an animal visual system, realizing rapid search for dynamic suspicious targets over a wide area; the infrared camera provides imaging in the infrared band, enhancing the target image when illumination is poor or the target's infrared signature is prominent; the telephoto gazing camera mimics an animal visual system's gaze at and fine recognition of a target, realizing discrimination of the target type before lock-on and fine recognition of target attitude, morphology and other characteristics after lock-on. The wide-field-of-view range target searching device in a degraded visual environment is thus essentially a multi-view fusion target recognition system based on bionic vision. So although the device appears structurally simple, this structural arrangement is not found in the prior art, which does not consider bionics in providing a multi-view vision system.
In addition, it should be noted that existing target-search methods are implemented with a monocular camera or a binocular camera whose two cameras are of the same type (basically visible-light cameras); their camera devices, and more precisely their vision systems, are not designed with bionics in mind. Because the prior art does not consider bionics at all, it does not consider the combined use of a telephoto camera, an infrared camera and a wide-angle camera, and still less a special arrangement of their positions to achieve the functional simulation described above.
When the system starts to work, the infrared camera 1-1 and the wide-angle camera 1-3 are first controlled to collect images; the image processor produces the image fusion and feature detection results and finds the regions with abundant features (regions of interest), which are transmitted to the controller. The controller then computes and drives the pitch and yaw motion motors according to the processing result, and the position information fed back in real time by the position sensors ensures closed-loop motion of the motors to the specified position, so that the region of interest is located at the center of the field of view of the wide-field fused image.
The telephoto camera is then controlled to acquire an image, obtaining a high-resolution image of the region of interest containing the suspected target. Target identification based on the high-resolution image yields the target's category, pose and other information.
The wide-field, small-target search and identification task in a degraded visual environment is thus completed.
Compared with the prior art, the wide-field-of-view range target searching device in the degraded visual environment has the following beneficial effects:
(1) In order to reduce or eliminate aircraft accidents in a degraded visual environment and improve situational awareness, an enhanced synthetic vision system integrating an infrared camera and a visible light camera is provided, exploiting the strong interference immunity of the infrared image and the rich information of the visible light image.
(2) Aiming at the problem that a single sensor can hardly meet the requirements of image resolution and detection field of view, a method in which a wide-angle camera and a telephoto camera work cooperatively is provided; by adopting an attention switching mechanism and an intelligent image processing algorithm, the dual requirements of image resolution and detection field of view are both met.
The second embodiment is as follows: the present embodiment is described with reference to FIGS. 6 and 7.
The embodiment is a wide-field-of-view range target searching system in a degraded visual environment; the system comprises a controller, an image processor and a set of the wide-field-of-view range target searching devices in a degraded visual environment.
As shown in FIG. 7, the controller controls motor motion, triggers image acquisition by the infrared, wide-angle and telephoto cameras, and collects position sensor data. The camera images are transmitted to the image processor, the image processor transmits the processing results to the controller, and the controller generates the corresponding instructions.
More specifically, the present invention is to provide a novel,
First, the controller controls the wide-angle camera and the infrared camera to acquire images and transmits them to the image processor; the target position information obtained after image fusion and feature detection is transmitted back to the controller. The controller drives the motion motors, whose rotation moves the cameras and position sensors, and collects the position sensor data for closed-loop position control, so that the feature-rich region-of-interest image is located at the center of the sensor's field of view. The telephoto camera is then controlled to capture a high-resolution image of the region of interest, from which the image processor obtains the target type and attitude information.
The infrared camera 1-1 and the wide-angle camera 1-3 each capture images; infrared imaging and visible-light imaging are fused, and target search is performed on the fused image. A feature-rich region indicates a suspicious target; the image miss distance of the suspicious target region is calculated, and the servo pan-tilt head 3 rotates so that the suspicious target region sits at the image center, completing search and localization of the suspicious target. The system then switches to the telephoto camera 1-2 to image the target area, captures the suspected target at the center of the field of view, and further identifies it by deep learning using the richer features. Once the task target is confirmed, the wide-angle camera locks on and tracks it; while tracking is normal, the target stays at the center of the wide-angle field of view, i.e., within the telephoto field of view, and the telephoto camera outputs pose estimation results in real time. If the suspected target is not the task target, if locking fails, or if tracking fails during the tracking process, target search is performed anew.
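This search-gaze-track behavior amounts to a small state machine. The sketch below is a minimal, hypothetical Python rendering; every callable passed in stands for a real subsystem that this description does not specify.

```python
from enum import Enum, auto

class Mode(Enum):
    SEARCH = auto()  # wide-field fused-image search for suspicious regions
    GAZE = auto()    # telephoto imaging and deep-learning identification
    TRACK = auto()   # wide-angle lock and track; telephoto outputs pose

def step(mode, find_suspect, center_gimbal, identify, tracking_ok):
    """Advance the state machine by one step; each argument after `mode`
    is a caller-supplied callable standing in for the real subsystem."""
    if mode is Mode.SEARCH:
        region = find_suspect()        # feature-rich area in the fused image
        if region is None:
            return Mode.SEARCH
        center_gimbal(region)          # rotate so the region sits at the image center
        return Mode.GAZE
    if mode is Mode.GAZE:
        return Mode.TRACK if identify() else Mode.SEARCH
    # Mode.TRACK: fall back to search when lock-on or tracking fails
    return Mode.TRACK if tracking_ok() else Mode.SEARCH
```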
In conclusion, the wide-field-of-view range target searching system in a degraded visual environment can obtain stable images and complete wide-range search and small-target identification tasks.
Compared with the prior art, the wide-field-of-view range target searching system in the degraded visual environment has the following beneficial effects:
(1) In order to reduce or eliminate aircraft accidents in the DVE and improve situational awareness, the enhanced synthetic vision system integrating an infrared camera and a visible light camera is provided, exploiting the strong interference immunity of the infrared image and the rich information of the visible light image.
(2) Aiming at the problem that a single sensor can hardly meet the requirements of image resolution and detection field of view, a method in which a wide-angle camera and a telephoto camera work cooperatively is provided; by adopting an attention switching mechanism and an intelligent image processing algorithm, the dual requirements of image resolution and detection field of view are both met.
The wide-field-of-view range target searching device in a degraded visual environment comprises an infrared camera, a telephoto camera and a wide-angle camera; the infrared and wide-angle cameras perform image fusion to obtain an interference-resistant image for wide-range target search, and the telephoto camera performs further identification and tracking of the suspected target area. Based on bionic vision, the multi-view fusion system can solve the practical problems of wide-range, small-target search and identification in ground reconnaissance and navigation scenarios.
The third embodiment is as follows:
The embodiment is a method for searching a target over a wide field of view in a degraded visual environment. As shown in FIG. 1, the method comprises five steps: image fusion, fusion evaluation, feature detection, attention switching and target identification; the steps are executed in sequence, finally realizing wide-field target search in a degraded visual environment and completing the tasks of wide-range search and small-target identification. Image fusion fuses the infrared image and the visible light image by a multi-scale low-rank fusion method to obtain fused images; fusion evaluation introduces a feature abundance criterion into image evaluation to obtain a fused image with high resolution and clear texture. Feature detection is performed on the fused image to obtain a feature-rich region of interest, attention switching is carried out, a high-resolution image of the region of interest is acquired, and target identification is then performed.
Specifically, the method for searching for a target with a wide field range in a degraded visual environment according to the embodiment includes the following steps:
s1, for the infrared image and the visible light image, dividing the images into low-rank blocks of different sizes by a multi-scale low-rank fusion method, the low-rank blocks comprising local low-rank maps and a global low-rank map; fusing low-rank maps of different levels with different fusion strategies;
the infrared image and the visible light image are respectively obtained by an infrared camera 1-1 and a wide-angle camera 1-3;
and S2, performing feature detection on the fused image: features are obtained by a two-stage centroid (geometric center-intensity centroid) feature detection scheme, feature point directions are calculated, and feature description is performed.
S3, evaluating the fused image based on the characteristic-oriented image evaluation mode, and screening the fused image; and recording the feature rich region in the fused image as a region of interest.
S4, recording the position of the region after the region of interest is obtained in the wide-field fusion image by adopting an attention switching strategy, and controlling the wide-field range target searching system under the degraded visual environment to enable the region of interest to be located in the center of the field of view; acquiring a high-resolution image in a region of interest;
As shown in FIG. 3, in practice the invention first performs feature detection on the wide-field image and judges the feature distribution to obtain a feature-rich region of interest; attention is then switched to image the region of interest further and obtain a high-resolution image.
S5, performing target identification on the image corresponding to the telephoto camera by adopting deep learning to obtain information such as target type, position and the like; and repeating the steps until all tasks are completed.
The image obtained by the telephoto camera is generally a single image of more than one million pixels, which is a high-resolution image relative to the wide-range images obtained by the wide-angle and infrared cameras.
In some embodiments, the process of performing image fusion of the infrared image and the visible light image by using the multi-scale low-rank decomposition method in S1 includes the following steps:
As shown in FIG. 2: (1) the infrared image and the visible light image are taken as the images to be decomposed, and the image to be decomposed Y is divided into an image matrix formed by decomposing it into low-rank blocks of different sizes;
(2) feature maps of different sizes are fused with different fusion strategies: a summation fusion strategy is adopted for images whose feature-map length/width is less than 50% of the original image scale (small-scale feature maps), and a maximum-value fusion strategy for those whose length/width is greater than or equal to 50% of the original image scale (large-scale feature maps), finally obtaining a plurality of fusion results under different fusion strategies.
Fusing the partial images based on a summation fusion strategy, represented as:
F_i(x, y) = T_ri(x, y) + T_vi(x, y),  i ≤ 3
in the formula: fi(x, y) represents the pixel value of the corresponding position of the fused image, Tri(x,y)、TviThe (x, y) values of pixels at positions corresponding to the infrared significant image (decomposed low rank map) and the visible significant image (decomposed low rank map) are represented, respectively.
The maximum-value fusion rule fuses the remaining decomposed images, expressed as:
F_i(x, y) = W(x, y) T_vi(x, y) + [1 - W(x, y)] T_ri(x, y)
in the formula, W (x, y) is an image fusion weight and is selected under guidance of a fusion image quality evaluation method.
The maximum-absolute-value weighted fusion better preserves the saliency information of the large-scale salient parts and their texture information, and better fuses the complementary characteristics of the infrared and visible images.
In fact, the above fusion process can be summarized as follows:
1) decompose the input into feature maps of different sizes;
2) fuse large feature maps with large feature maps and small feature maps with small feature maps; by adopting different strategies, fused large and small feature maps are obtained for the same image input under a series of unified scenes;
The above fusion strategies are the ones selected in this embodiment, but the invention includes, but is not limited to, these strategies, as long as a plurality of fused images under different decomposition and fusion strategies are finally obtained.
3) superpose and fuse the large and small feature maps generated in step 2) to obtain a new large map as the fusion result;
the fusion strategy employed for 2) and 3) can be employed in a variety of ways, resulting in a series of different feature maps for the same input.
S2, the process of performing feature detection on the fused image, obtaining features by a two-stage centroid (geometric center-intensity centroid) detection scheme, calculating feature point directions and describing the features comprises the following steps:
The features of the fused images are calculated with a feature descriptor; for convenience of expression, all subsequent fused images are simply referred to as images.
The corner feature detector adopts a low-dimensional feature description method. First, the image to be fused is segmented, each segmented region is marked as a small region, the geometric center of each small region is O, and the intensity centroid C of the small region is described as:
C = ( Σ x·I(x, y) / Σ I(x, y), Σ y·I(x, y) / Σ I(x, y) )
with the sums taken over all pixels of the small region, where I(x, y) is the pixel value at image location (x, y).
OC vectors are then established; within each small region, taking the OC vector as the starting boundary, the region is divided into N equal parts, the geometric center of each part being defined as O_subi; the intensity centroid C_subi is selected within each part by repeating the calculation, giving the O_subi C_subi vectors; and the angle θ_i (i = 1, 2, …, N) between the OC vector and each O_subi C_subi vector is calculated:
θ_i = arccos( (OC · O_subi C_subi) / (|OC| |O_subi C_subi|) )
θ_i is then binary-coded: the 2π angle is divided into N' equal parts, each represented by a binary code; preferably the 2π angle is divided into eight equal parts represented by the eight binary codes 000, 001, 010, 011, 100, 101, 110 and 111; the included angle θ_i falls into one of the N' parts, and the corresponding binary code is taken as the binary encoding of θ_i;
For one small region there are N equal parts in total, and each small region yields a vector of size 1xN, which is one feature description.
The 1xN vector obtained for a small region serves as a descriptive feature; for each fused image, the fused image is matched with the infrared image and the visible image respectively by brute-force search, completing the feature extraction step; meanwhile, the matching results between the fused image and the infrared and visible images are weighted using the brute-force search, thereby determining the feature robustness and matching accuracy of the fused image.
As shown in FIG. 8, taking N = 4 as an example: the input image is first segmented and m×m small regions are selected; the geometric center of each region is the point O and the region intensity centroid is the point C, giving the OC vector; 4 sub-regions are divided counterclockwise around O with the OC vector as the starting edge, and in each sub-region the geometric center O_subi and the intensity centroid C_subi are obtained, i = 1, 2, 3, 4. The angles between the OC vector and the O_subi C_subi vectors are calculated to obtain θ_1, θ_2, θ_3 and θ_4; in binary coding, the angles are represented as a 1x4 vector serving as the description of the feature region.
Therefore an image can be described by a number of 1xN vectors, and matching is searched in block form.
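A minimal NumPy sketch of this two-stage centroid descriptor follows; it assumes grayscale float patches with non-empty, non-zero sectors, and returns the quantization-bin indices (for n_bins = 8 the indices 0-7 correspond to the binary codes 000-111).

```python
import numpy as np

def intensity_centroid(patch):
    """Intensity centroid C = (sum(x*I)/sum(I), sum(y*I)/sum(I)) of a patch."""
    ys, xs = np.mgrid[0:patch.shape[0], 0:patch.shape[1]]
    m = patch.sum()
    return np.array([(xs * patch).sum() / m, (ys * patch).sum() / m])

def describe_region(patch, n=4, n_bins=8):
    """Two-stage centroid descriptor of one small region (sketch).
    Sectors are swept counterclockwise from the OC direction, and each
    angle theta_i is quantized to one of n_bins codes."""
    o = np.array([(patch.shape[1] - 1) / 2.0, (patch.shape[0] - 1) / 2.0])  # geometric center O
    c = intensity_centroid(patch)
    oc = c - o                                                  # OC vector
    base = np.arctan2(oc[1], oc[0])                             # OC direction = start boundary
    ys, xs = np.mgrid[0:patch.shape[0], 0:patch.shape[1]]
    ang = (np.arctan2(ys - o[1], xs - o[0]) - base) % (2 * np.pi)
    codes = []
    for i in range(n):                                          # sector i of the N equal parts
        mask = (ang >= 2 * np.pi * i / n) & (ang < 2 * np.pi * (i + 1) / n)
        o_sub = np.array([xs[mask].mean(), ys[mask].mean()])    # geometric center O_subi
        c_sub = intensity_centroid(np.where(mask, patch, 0.0))  # intensity centroid C_subi
        v = c_sub - o_sub
        cosang = np.dot(oc, v) / (np.linalg.norm(oc) * np.linalg.norm(v) + 1e-12)
        theta = np.arccos(np.clip(cosang, -1.0, 1.0))           # angle(OC, O_subi C_subi)
        codes.append(int(theta / (2 * np.pi) * n_bins) % n_bins)
    return np.array(codes)                                      # the 1xN descriptor
```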
The process of evaluating the fused image by using the feature-oriented image fusion quality evaluation method described in S3 includes the steps of:
to achieve robustness and stability of feature extraction, at QAB/FAnd on the basis of objective evaluation methods such as structural similarity, information entropy and spatial frequency, indexes of image feature abundance are introduced and used for correlating image fusion results and the extraction quality of image feature information. Wherein QAB/FIs an objective image fusion evaluation method, which is composed of C.S. XYdeas, V.
Figure BDA0003485989080000102
A pixel level image fusion quality assessment indicator is presented that reflects the quality of visual information obtained from input image fusion.
Factors considered for feature abundance include, but are not limited to:
1) Feature robustness: feature robustness considers the stability of the extracted features and their insensitivity to small deformations, including rotation, scaling, blurring, illumination, etc.; that is, the features extracted from two images of the same target under different viewing angles are unique and stable.
2) Matching accuracy: the different fused images are matched with the infrared image and the visible image of the same frame respectively, and the ratio of the number of correct matches to the corresponding number of features is considered; that is, the fused image can fully reflect the features of the infrared and visible light images before fusion.
The features in the feature abundance adopt the descriptive features determined by the two-stage centroid scheme of step S2. For feature robustness, global interference such as rotation, scale scaling, blurring and illumination change is added to the fused image, and it is considered whether the number of descriptive features changes greatly; for matching accuracy, distance calculations are performed between the fused image and the infrared and visible light images respectively for feature matching, and the more successful matches there are, the higher the matching accuracy.
To speed up evaluation, in some embodiments the feature abundance may be used directly for scoring: for the feature extraction result of step S2, the feature abundance factors are counted and the score of each factor is weighted into a comprehensive score, and the fused image with the highest comprehensive score is selected as the final fused image.
In some embodiments, the fused images may be scored both by objective methods such as Q^AB/F, structural similarity, information entropy and spatial frequency, and by feature abundance; the scores are weighted and summed into a comprehensive score, and the fused image with the highest score is selected as the final fused image. Various factors in the evaluation system are thus considered comprehensively; the method suits low-texture images, performs feature extraction effectively, further evaluates the image fusion quality, and yields the region-of-interest result.
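The weighted comprehensive score reduces to a few lines; in the sketch below the individual metric functions and their weights are assumptions supplied by the caller, not fixed by this description.

```python
def comprehensive_score(fused, sources, metrics, weights):
    """Weighted sum of evaluation metrics (Q^AB/F, structural similarity,
    information entropy, spatial frequency, feature abundance, ...).
    `metrics` maps a name to a callable metric(fused, sources) -> float;
    `weights` maps the same names to their weights."""
    return sum(weights[name] * fn(fused, sources) for name, fn in metrics.items())

def pick_best(candidates, sources, metrics, weights):
    """Keep the fused candidate with the highest comprehensive score."""
    return max(candidates, key=lambda f: comprehensive_score(f, sources, metrics, weights))
```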
The specific process of the step S4 includes the following steps:
The two-axis pan-tilt head is driven to rotate according to the position (x, y) of the center of the region of interest in the fused image, so that the center of the region of interest always lies at the center of the image, wherein the pitching motion relies on the pitch-freedom motor and rotating shaft, and the yawing motion on the yaw-freedom motor and rotating shaft.
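A minimal sketch of this centering step follows; the small-angle pixel-to-angle conversion and the command_gimbal function are assumptions for illustration (the real system closes the loop on the position sensors).

```python
def center_roi(roi_xy, image_size, fov_deg, command_gimbal):
    """Drive the two-axis pan-tilt head so the region of interest moves
    to the image center. roi_xy = (x, y) of the ROI center in the fused
    image; image_size = (width, height); fov_deg = (horizontal, vertical)
    field of view of the wide-angle camera. command_gimbal is a
    caller-supplied function applying the yaw/pitch increments."""
    (x, y), (w, h) = roi_xy, image_size
    yaw_step = (x - w / 2.0) * fov_deg[0] / w    # horizontal miss distance -> yaw
    pitch_step = (y - h / 2.0) * fov_deg[1] / h  # vertical miss distance -> pitch
    command_gimbal(yaw=yaw_step, pitch=pitch_step)
```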
S5, the process of identifying the target of the high-resolution image by adopting deep learning to obtain the information such as the type, the position and the like of the target comprises the following steps:
and a twin network is adopted for target identification, and the twin network structure consists of a convolutional layer, a pooling layer and a full-connection layer. The method comprises the steps of utilizing an image shot by a long-focus camera as input of a neural network, carrying out feature extraction on the image through a convolution layer and a pooling layer, and integrating features through a full connection layer to obtain an estimated value p of a target pose, namely x, y, z, alpha, beta and gamma are estimated values of six degrees of freedom, wherein x, y and z are three-dimensional position coordinates, alpha is a yaw angle, beta is a pitch angle, and gamma is a roll angle.
For the twin network, several photos taken at different moments are input into the same network to obtain the loss of a single frame and the loss of the difference between frames, and a new loss is defined for training;
As shown in FIG. 9, the loss function consists of two parts: a spatial pose constraint loss function and a single-frame pose loss function, namely:
L = λ_1 L_cons(p_t, p_{t-1}, p_{t-2}) + λ_2 L_p(p, p′)
where λ_1, λ_2 are weight values, L_cons is the spatial pose constraint loss function, and L_p is the single-frame pose loss function.
The single-frame pose loss function is:
L_p(p, p′) = (x - x′)² + (y - y′)² + (z - z′)² + (α - α′)² + (β - β′)² + (γ - γ′)²
where x, y, z, α, β, γ are the actual values of the six-degree-of-freedom pose at the current moment, and x′, y′, z′, α′, β′, γ′ are the estimated values of the six-degree-of-freedom pose at the current moment.
The spatial pose constraint loss function is:
L_cons(p_t, p_{t-1}, p_{t-2}) = ||(p_t - p_{t-1}) - (p_{t-1} - p_{t-2})||²
where p_t is the single-frame six-degree-of-freedom pose estimate at time t, p_{t-1} is the single-frame six-degree-of-freedom pose estimate at time t-1, and p_{t-2} is the single-frame six-degree-of-freedom pose estimate at time t-2.
In the training stage, a virtual simulation system is used to collect continuous-frame data images and the corresponding pose labels, and the twin convolutional neural network structure is trained with the defined loss function. In the use stage, the input is a sequence of continuous frames and the output is the six-degree-of-freedom pose result.
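A PyTorch sketch of this combined loss follows; the squared-error form of L_p and the constant-velocity form of L_cons mirror the expressions above (themselves reconstructions), while the default weights are assumptions.

```python
import torch

def pose_loss(p, p_true):
    """Single-frame pose loss L_p: squared error over the six DOF
    (x, y, z, alpha, beta, gamma)."""
    return torch.sum((p - p_true) ** 2, dim=-1).mean()

def consistency_loss(p_t, p_t1, p_t2):
    """Spatial pose constraint L_cons: penalizes deviation from smooth
    inter-frame motion, (p_t - p_{t-1}) vs (p_{t-1} - p_{t-2})."""
    return torch.sum((p_t - 2 * p_t1 + p_t2) ** 2, dim=-1).mean()

def total_loss(p_t, p_t1, p_t2, p_true, lam1=1.0, lam2=1.0):
    """L = lambda_1 * L_cons(p_t, p_{t-1}, p_{t-2}) + lambda_2 * L_p(p, p')."""
    return lam1 * consistency_loss(p_t, p_t1, p_t2) + lam2 * pose_loss(p_t, p_true)
```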
This completes the full pipeline of image fusion, fusion evaluation, target search, attention switching and target identification, realizing the wide-field, small-target search and identification function in a degraded visual environment.
Compared with the prior art, the wide-field-of-view range target searching method in the degraded visual environment has the following beneficial effects:
(1) A multi-scale low-rank decomposition method is adopted; the image characteristics of each decomposition layer are fully considered, fusion rules are specifically designed for the saliency maps and global low-rank maps of different layers, and a feature abundance evaluation criterion is introduced, finally obtaining a fused image with high contrast and clear texture.
(2) An attention switching mechanism is applied: feature detection is first performed to obtain the region of interest, a high-resolution image of the region of interest is then acquired, and the small-target identification task over a wide field of view is finally completed.
The invention provides a wide-field-of-view range target searching method in a degraded visual environment, which fuses the infrared image and the visible light image by a multi-scale low-rank image fusion method, and screens out a fused image with high contrast and clear texture by a fusion image evaluation method that introduces feature abundance; an attention switching mechanism is applied, feature detection is performed on the wide-range low-resolution image to obtain the region of interest, the region of interest is gazed at to obtain a local high-resolution image, and target identification is performed on the high-resolution image. The wide-field-of-view range target searching method in a degraded visual environment can solve the practical problem of wide-range, small-target search and identification tasks in a degraded visual environment.
The present invention is capable of other embodiments and its several details are capable of modifications in various obvious respects, all without departing from the spirit and scope of the present invention.

Claims (10)

1. The wide-field-of-view range target searching device in a degraded visual environment is characterized by comprising a multi-view vision system (1), a connecting structure (2) and a two-axis pan-tilt head (3); wherein the multi-view vision system (1) comprises an infrared camera (1-1), a telephoto camera (1-2) and a wide-angle camera (1-3);
the multi-view vision system (1) is mounted on the two-axis pan-tilt head (3) via the connecting structure (2), and the two-axis pan-tilt head (3) realizes two-degree-of-freedom rotation;
the central camera of the multi-view vision system (1) is the telephoto camera (1-2), with the infrared camera (1-1) and the wide-angle camera (1-3) on its two sides; the three cameras are distributed on the same pan-tilt head.
2. The system for searching the target with the wide field of view range in the degraded visual environment is characterized by comprising a controller, an image processor and a set of target searching devices with the wide field of view range in the degraded visual environment according to claim 1;
the controller controls the wide-angle camera and the infrared camera to acquire images, and the acquired images are transmitted to the image processor;
the image processor performs image fusion and feature detection processing, and then transmits target position information to the controller; the image processor is also used for processing the image shot by the long-focus camera to obtain the target type and the attitude information;
the controller controls the motion motor to rotate, and the motor rotates to drive the camera and the position sensor to act; the controller is also used for controlling the long-focus camera to shoot.
3. The method for searching the target with the wide field range in the degraded visual environment is characterized by comprising the following steps of:
s1, decomposing the infrared image and the visible light image, and then fusing by a fusion strategy;
the infrared image and the visible light image are respectively obtained through an infrared camera (1-1) and a wide-angle camera (1-3);
s2, performing feature detection on the fused image to obtain feature description;
s3, evaluating the fused images based on a feature-oriented image evaluation mode and screening them, the screened fused image, obtained from the infrared image and the visible light image, being recorded as the wide-field fusion image; recording the feature-rich region in the wide-field fusion image as the region of interest;
s4, after the region of interest is obtained, recording its position in the wide-field fusion image by an attention switching strategy, and controlling the wide-field-of-view range target searching system in the degraded visual environment so that the region of interest is located at the center of the field of view; acquiring an image of the region of interest with the long-focus camera, the image acquired by the long-focus camera being a high-resolution image;
s5, performing target identification on the image from the telephoto camera by deep learning to obtain information such as target type and position; and repeating the above steps until all tasks are completed.
4. The method for searching for the target with the wide field of view in the degraded visual environment of claim 3, wherein in the process of decomposing and fusing the infrared image and the visible image, the image is divided into low-rank blocks with different sizes by using a multi-scale low-rank fusion method, and the low-rank blocks with different levels are fused by using different fusion strategies in step S1.
5. The method for searching the target with the wide field range in the degraded visual environment according to claim 4, wherein the image is divided into low-rank blocks with different sizes by adopting a multi-scale low-rank fusion method, and the process of fusing the low-rank maps with different levels by adopting different fusion strategies comprises the following steps:
(1) taking the infrared image and the visible light image as images to be decomposed, and dividing the images to be decomposed into image matrixes formed by decomposing low-rank blocks with different sizes;
(2) fusing feature maps of different sizes with different fusion strategies: a summation fusion strategy is adopted for feature maps whose length/width is less than 50% of the original image scale, and a maximum-value fusion strategy for those whose length/width is greater than or equal to 50% of the original image scale, finally obtaining a plurality of fusion results under different fusion strategies; superposing and fusing the generated large and small feature maps to obtain a new large map as the fusion result, thereby obtaining a series of different feature maps for the same input;
wherein the partial images are fused based on a summation fusion strategy, represented as:
F_i(x, y) = T_ri(x, y) + T_vi(x, y),  i ≤ 3
where F_i(x, y) represents the pixel value at the corresponding position of the fused image, and T_ri(x, y), T_vi(x, y) represent the pixel values at the corresponding positions of the infrared salient image and the visible light salient image, respectively;
the maximum value fusion rule fuses the partial decomposition images, and the expression is as follows:
F_i(x, y) = W(x, y) T_vi(x, y) + [1 - W(x, y)] T_ri(x, y)
in the formula, W (x, y) is an image fusion weight and is selected under guidance of a fusion image quality evaluation method.
6. The method for searching for the target with the wide field of view in the degraded visual environment of claim 5, wherein the step of performing feature detection on the fused image to obtain the feature description at S2 comprises the following steps:
firstly, segmenting an image to be fused, marking each segmented region as a small region, wherein the geometric center of each small region is O, and the point C of the intensity centroid in the small region is described as follows:
C = ( Σ x·I(x, y) / Σ I(x, y), Σ y·I(x, y) / Σ I(x, y) ), with the sums taken over all pixels of the small region,
where I (x, y) is the pixel value at image (x, y);
establishing OC vectors; within each small region, taking the OC vector as the starting boundary, dividing the region into N equal parts, the geometric center of each part being defined as O_subi; selecting the intensity centroid C_subi within each part by repeating the calculation to obtain the O_subi C_subi vectors; and calculating the angle θ_i between the OC vector and each O_subi C_subi vector:
θ_i = arccos( (OC · O_subi C_subi) / (|OC| |O_subi C_subi|) )
carrying out binary coding on θ_i: dividing the 2π angle into N' equal parts, each represented by a binary code; the included angle θ_i falls into one of the N' parts, and the corresponding binary code is taken as the binary encoding of θ_i;
for one small region there are N equal parts in total, so each small region yields a vector of size 1xN, namely one feature description;
the 1xN vector obtained for a small region serves as the descriptive feature of that region, and each image yields feature descriptions consisting of a number of 1xN vectors; for each image to be feature-extracted, image feature matching is performed by brute-force search, and the matching results between the fused image and the infrared and visible light images are weighted using the brute-force search, thereby determining the feature robustness and matching accuracy of the fused image.
7. The method according to claim 6, wherein the fused image is evaluated based on the feature-oriented image evaluation method in S3, and the evaluation index includes feature abundance in the process of screening the fused image; the feature abundance comprises feature robustness and matching accuracy; the feature robustness and the matching accuracy in the feature abundance are the feature robustness and the matching accuracy of the fusion image corresponding to the feature description determined in step S2.
8. The method for searching for the target with the wide field of view in the degraded visual environment of claim 7, wherein in S3 the fused image is evaluated based on the feature-oriented image evaluation method, and in the process of screening the fused image the evaluation index further includes Q^AB/F, structural similarity, information entropy, and spatial frequency.
9. The method for searching the target with the wide field of view range in the degraded visual environment according to claim 7 or 8, wherein the specific process of S4 comprises the following steps:
driving the two-axis pan-tilt head to rotate according to the position (x, y) of the center of the region of interest in the fused image, so that the center of the region of interest always lies at the center of the image, wherein the pitching motion relies on a pitch-freedom motor and rotating shaft, and the yawing motion on a yaw-freedom motor and rotating shaft.
10. The method for searching the target with the wide field of view in the degraded visual environment of claim 9, wherein the step S5 of performing target recognition on the high-resolution image by using deep learning to obtain the information about the type and the position of the target includes the following steps:
adopting a twin network for target identification, wherein the twin network structure consists of a convolution layer, a pooling layer and a full-connection layer;
an image captured by the long-focus camera is used as the input of the neural network; features are extracted through the convolution and pooling layers and integrated through the fully connected layer to obtain the estimated value p of the target pose, i.e., estimates of the six degrees of freedom x, y, z, α, β and γ, where x, y and z are the three-dimensional position coordinates, α is the yaw angle, β the pitch angle, and γ the roll angle.
CN202210081227.8A 2022-01-24 2022-01-24 Wide-field-of-view range target searching device, system and method in degraded visual environment Pending CN114429435A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210081227.8A CN114429435A (en) 2022-01-24 2022-01-24 Wide-field-of-view range target searching device, system and method in degraded visual environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210081227.8A CN114429435A (en) 2022-01-24 2022-01-24 Wide-field-of-view range target searching device, system and method in degraded visual environment

Publications (1)

Publication Number Publication Date
CN114429435A true CN114429435A (en) 2022-05-03

Family

ID=81312398

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210081227.8A Pending CN114429435A (en) 2022-01-24 2022-01-24 Wide-field-of-view range target searching device, system and method in degraded visual environment

Country Status (1)

Country Link
CN (1) CN114429435A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115359048A (en) * 2022-10-19 2022-11-18 中国工程物理研究院应用电子学研究所 Real-time dynamic alignment measurement method based on closed-loop tracking and aiming and tracking and aiming device
CN115359048B (en) * 2022-10-19 2023-01-31 中国工程物理研究院应用电子学研究所 Real-time dynamic alignment measurement method based on closed-loop tracking and aiming and tracking and aiming device
CN116030449A (en) * 2023-02-17 2023-04-28 济南邦德激光股份有限公司 Automatic sorting method and automatic sorting system for laser cutting pieces
CN116030449B (en) * 2023-02-17 2023-09-01 济南邦德激光股份有限公司 Automatic sorting method and automatic sorting system for laser cutting pieces

Similar Documents

Publication Publication Date Title
CN110956651B (en) Terrain semantic perception method based on fusion of vision and vibrotactile sense
CN113269098B (en) Multi-target tracking positioning and motion state estimation method based on unmanned aerial vehicle
Zhao et al. Detection, tracking, and geolocation of moving vehicle from uav using monocular camera
CN111583136B (en) Method for simultaneously positioning and mapping autonomous mobile platform in rescue scene
CN111325797A (en) Pose estimation method based on self-supervision learning
CN106529538A (en) Method and device for positioning aircraft
US20090304231A1 (en) Method of automatically detecting and tracking successive frames in a region of interesting by an electronic imaging device
CN109341689A (en) Vision navigation method of mobile robot based on deep learning
CN112734765B (en) Mobile robot positioning method, system and medium based on fusion of instance segmentation and multiple sensors
CN106878687A (en) A kind of vehicle environment identifying system and omni-directional visual module based on multisensor
CN111679695B (en) Unmanned aerial vehicle cruising and tracking system and method based on deep learning technology
CN114429435A (en) Wide-field-of-view range target searching device, system and method in degraded visual environment
CN111862673B (en) Parking lot vehicle self-positioning and map construction method based on top view
CN205426175U (en) Fuse on -vehicle multisensor's SLAM device
CN109900274B (en) Image matching method and system
CN110009675A (en) Generate method, apparatus, medium and the equipment of disparity map
CN113313659B (en) High-precision image stitching method under multi-machine cooperative constraint
Zhang et al. Deep learning based object distance measurement method for binocular stereo vision blind area
CN113256731A (en) Target detection method and device based on monocular vision
Souza et al. Template-based autonomous navigation in urban environments
Wang et al. Monocular visual SLAM algorithm for autonomous vessel sailing in harbor area
Sambolek et al. Person detection in drone imagery
Zhao et al. Environmental perception and sensor data fusion for unmanned ground vehicle
Astrid et al. For safer navigation: Pedestrian-view intersection classification
Mitsudome et al. Autonomous mobile robot searching for persons with specific clothing on urban walkway

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination