CN116128919A

CN116128919A - Multi-temporal image abnormal target detection method and system based on epipolar constraint

Info

Publication number: CN116128919A
Application number: CN202211392966.5A
Authority: CN
Inventors: 尹增山; 晋诗瑶; 张如意
Original assignee: Shanghai Engineering Center for Microsatellites; Innovation Academy for Microsatellites of CAS
Current assignee: Shanghai Engineering Center for Microsatellites; Innovation Academy for Microsatellites of CAS
Priority date: 2022-11-08
Filing date: 2022-11-08
Publication date: 2023-05-16

Abstract

The invention provides a multi-temporal image abnormal target detection method and system based on the epipolar constraint. First, the ASIFT algorithm is used to detect image feature points, the RANSAC algorithm is used to remove mismatched points, and the epipolar constraint is introduced to calculate the fundamental matrix. Second, points are marked in a region of interest in the original image, and the fundamental matrix is used to obtain the epipolar lines in the target image that correspond to these marked points. Finally, a template image region to be matched is marked in the original image, template matching is performed in the target image with a robust and efficient template matching method based on deep convolutional features, and whether the target is abnormal is judged by comparing the positional relationship between the epipolar lines and the rectangular box of the template matching result in the target image. The method can effectively detect targets that have moved abnormally; it offers high accuracy, a simple procedure, clear results and ease of engineering implementation, and is of practical significance in production and daily life.

Description

Multi-temporal image abnormal target detection method and system based on epipolar constraint

Technical Field

The invention relates to the technical field of abnormal target detection, in particular to a multi-temporal image abnormal target detection method and system based on the epipolar constraint.

Background

Computer vision technology is widely applied to target detection, target recognition and tracking, SLAM (simultaneous localization and mapping), three-dimensional environment reconstruction and other fields. Moving object detection is a core technology of computer vision with broad military and civilian applications. In military use, it can be combined with radar to analyze enemy targets to be detected; in civilian use, it serves automatic driving, vehicle monitoring, human motion analysis and the like. Current moving object detection techniques typically process video, extracting moving objects from a video or image sequence and obtaining their characteristic information such as color, shape and contour. In essence, a video-based moving object detection algorithm searches a continuous image sequence for differences, detects and extracts them, and analyzes them according to whether the camera is moving or static. With a static camera, common methods include the consecutive inter-frame difference method, the background difference method, the optical flow method and the motion energy method, which are now mature; with a moving camera, methods are based on motion compensation, grid voxels, geometric constraints, deep learning and the like. All of these algorithms, however, process video frames and require uninterrupted detection.
In some practical situations, such as disaster monitoring, continuous monitoring of a video sequence is not required. Instead, a specific area is inspected and imaged at intervals, and targets of interest that have moved abnormally are found by comparing two images. Conventional video-based moving object detection algorithms therefore do not meet the requirements of such tasks.

Disclosure of Invention

The invention provides a multi-temporal image abnormal target detection method and system based on the epipolar constraint to address the above shortcomings of the prior art.

According to one aspect of the present invention, there is provided a multi-temporal image abnormal target detection method based on the epipolar constraint, comprising:

performing ASIFT feature point detection and matching on the acquired first image and second image to obtain matched feature point pairs;

performing mismatch elimination on the matched feature point pairs to obtain optimized feature point pairs;

calculating a fundamental matrix using the optimized feature point pairs;

selecting random points in a region of interest in the first image, outputting their pixel coordinates, and calculating, from the fundamental matrix and the pixel coordinates, the epipolar lines in the second image that correspond to these random points;

cropping the region of interest from the first image as a template image representing the target to be detected; using the template image to search the second image for the most similar image area, and marking that area in the second image with a bounding box;

and comparing the positional relationship between the epipolar lines and the bounding box in the second image: if the proportion of epipolar lines that intersect the bounding box exceeds a set threshold, determining that the target to be detected is not abnormal; otherwise, determining that the target to be detected is abnormal.

Optionally, the ASIFT feature point detection and matching on the first image and the second image uses a feature-point-based bidirectional matching method, comprising:

extracting scale-invariant feature points from the first image and the second image, respectively, using the ASIFT algorithm;

taking the first image as the reference image and the second image as the image to be matched; for each feature point in the first image, searching the second image for the most similar feature point to form a feature point pair; repeating this step for the remaining feature points to obtain the matched feature point pair set Ω1;

taking the second image as the reference image and the first image as the image to be matched; for each feature point in the second image, searching the first image for the most similar feature point to form a feature point pair; repeating this step for the remaining feature points to obtain the matched feature point pair set Ω2;

and collecting the matched point pairs common to Ω1 and Ω2 as the final matched feature point pair set Ω.
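The bidirectional (cross-check) matching above can be sketched as follows. This is a minimal illustration that assumes the ASIFT descriptors are already available as NumPy arrays (one row per feature point) and takes "most similar" to mean the smallest Euclidean descriptor distance. The function names are hypothetical, and no ratio test is applied.

```python
import numpy as np

def nearest_neighbors(desc_a, desc_b):
    """For each row of desc_a, the index of the closest row of desc_b (Euclidean distance)."""
    # Pairwise squared distances by broadcasting: shape (len(desc_a), len(desc_b)).
    d2 = ((desc_a[:, None, :] - desc_b[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def bidirectional_match(desc1, desc2):
    """Return the sorted pairs (i, j) matched in both directions, i.e. Omega1 & Omega2."""
    fwd = nearest_neighbors(desc1, desc2)  # Omega1: image 1 -> image 2
    bwd = nearest_neighbors(desc2, desc1)  # Omega2: image 2 -> image 1
    omega1 = {(i, int(j)) for i, j in enumerate(fwd)}
    omega2 = {(int(i), j) for j, i in enumerate(bwd)}
    return sorted(omega1 & omega2)

# Usage: feature 2 of image 1 picks feature 1 of image 2, but that feature prefers
# feature 0 of image 1, so the pair (2, 1) is dropped by the cross-check.
desc1 = np.array([[0.0, 0.0], [10.0, 10.0], [5.0, 5.0]])
desc2 = np.array([[10.5, 10.0], [0.2, 0.0], [20.0, 20.0]])
print(bidirectional_match(desc1, desc2))  # [(0, 1), (1, 0)]
```

Brute-force distances are quadratic in the number of features. A k-d tree or approximate nearest-neighbor index would be the usual choice for thousands of ASIFT points.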

Optionally, the mismatch elimination on the matched feature point pairs uses the RANSAC algorithm, comprising:

randomly selecting several matched feature point pairs from the first image and the second image to form a data set of matched feature point pairs, and solving a homography matrix H from these pairs, as follows:

letting the coordinates of a feature point in the first image be (x, y), the coordinates of the corresponding feature point in the second image be (x1, y1), and the homography matrix H be

$$H = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & 1 \end{bmatrix}$$

The corresponding solution formula is:

$$x_1 = \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + 1}, \qquad y_1 = \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + 1}$$

writing the corresponding system of equations from this formula using the matched feature point pairs, and solving it for the unknown parameters h11, h12, h13, h21, h22, h23, h31 and h32 of the homography matrix H;
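With the convention h33 = 1 implied by the eight unknowns h11 to h32, each matched pair gives two equations that are linear in the unknowns: h11·x + h12·y + h13 − h31·x·x1 − h32·y·x1 = x1, and the analogous equation for y1. Four pairs in general position therefore determine H. The sketch below uses NumPy with hypothetical function names. It uses least squares, so more than four pairs are also accepted.

```python
import numpy as np

def homography_from_pairs(src, dst):
    """Solve h11..h32 (h33 = 1) from >= 4 pairs src[i] = (x, y) -> dst[i] = (x1, y1)."""
    rows, rhs = [], []
    for (x, y), (x1, y1) in zip(src, dst):
        # x1 * (h31 x + h32 y + 1) = h11 x + h12 y + h13, rearranged to be linear in h
        rows.append([x, y, 1, 0, 0, 0, -x * x1, -y * x1])
        rhs.append(x1)
        # y1 * (h31 x + h32 y + 1) = h21 x + h22 y + h23
        rows.append([0, 0, 0, x, y, 1, -x * y1, -y * y1])
        rhs.append(y1)
    # Exactly determined for 4 pairs; least-squares fit for more.
    h, *_ = np.linalg.lstsq(np.asarray(rows, float), np.asarray(rhs, float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def project(H, pts):
    """Map (N, 2) points through H and divide by the homogeneous coordinate."""
    q = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return q[:, :2] / q[:, 2:3]

# Usage: recover a known homography from the four corners of a square.
H_true = np.array([[1.2, 0.1, 5.0], [-0.05, 0.9, -3.0], [0.001, 0.002, 1.0]])
src = np.array([[0.0, 0.0], [100.0, 0.0], [100.0, 100.0], [0.0, 100.0]])
H = homography_from_pairs(src, project(H_true, src))
print(np.allclose(H, H_true, atol=1e-6))  # True
```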

calculating the projection error of every matched feature point pair in the data set under the homography matrix H, and adding the pair to the inlier set I if the error is smaller than a threshold;

if the number of elements in the current inlier set I is greater than that of the optimal inlier set I_best, setting I_best = I and updating the iteration count k; the optimal inlier set I_best is initially empty (its size is 0); the iteration count k is calculated as:

$$k = \frac{\log(1 - p)}{\log(1 - w^{m})}$$

where p is the confidence; w is the proportion of inliers; and m is the minimum number of matched feature point pairs required to calculate the homography matrix H;

if the actual number of iterations exceeds k, stopping the projection error calculation; otherwise, incrementing the iteration count by 1 and repeating the projection error calculation;

and, based on the final inlier set I, removing the remaining matched feature point pairs that do not belong to it, thereby eliminating mismatches.
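The RANSAC loop above, including the adaptive iteration count k = log(1 − p) / log(1 − w^m), might look like the sketch below. It assumes the matched pairs are given as two (N, 2) NumPy arrays and uses p = 0.995 and m = 4 as in the preferred values. The 3-pixel error threshold and the 2000-iteration cap are illustrative assumptions, and all function names are hypothetical. With p = 0.995, m = 4 and an inlier ratio w = 0.5, the formula gives k = 83 after rounding up.

```python
import math
import numpy as np

def fit_homography(src, dst):
    """Linear solve for h11..h32 with h33 = 1 (two equations per matched pair)."""
    A, b = [], []
    for (x, y), (x1, y1) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -x * x1, -y * x1]); b.append(x1)
        A.append([0, 0, 0, x, y, 1, -x * y1, -y * y1]); b.append(y1)
    h, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def project(H, pts):
    """Apply H to (N, 2) points and dehomogenize."""
    q = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return q[:, :2] / q[:, 2:3]

def ransac_iterations(p, w, m, max_iter=2000):
    """Adaptive iteration count k = log(1 - p) / log(1 - w**m), rounded up and capped."""
    frac = w ** m
    if frac >= 1.0:
        return 1
    if frac <= 0.0:
        return max_iter
    k = math.log(1 - p) / math.log1p(-frac)  # log1p keeps precision for tiny w**m
    return max_iter if k >= max_iter else math.ceil(k)

def ransac_homography(src, dst, thresh=3.0, p=0.995, m=4, max_iter=2000, seed=0):
    """Return a boolean mask marking the final inlier set I over the matched pairs."""
    rng = np.random.default_rng(seed)
    n = len(src)
    best = np.zeros(n, dtype=bool)  # I_best, initially empty
    k, it = max_iter, 0
    while it < k:  # stop once the iteration count reaches k
        sample = rng.choice(n, size=m, replace=False)
        H = fit_homography(src[sample], dst[sample])
        with np.errstate(divide="ignore", invalid="ignore"):
            err = np.linalg.norm(project(H, src) - dst, axis=1)  # projection error
            inliers = err < thresh  # NaN errors from degenerate samples compare False
        if inliers.sum() > best.sum():
            best = inliers
            k = ransac_iterations(p, best.mean(), m, max_iter)
        it += 1
    return best
```

Calling `src[mask]`, `dst[mask]` with the returned mask keeps only the inlier pairs, which corresponds to the final removal step.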

Optionally, p is 0.995.

Optionally, m is 4.

Optionally, the method further comprises: performing the subsequent step of calculating the fundamental matrix only when the number of optimized feature point pairs is greater than or equal to a set number threshold.

Optionally, calculating the fundamental matrix using the optimized feature point pairs comprises:

determining the spatial geometric constraint that holds between the first image and the second image and relates the target point to be detected to its epipolar line:

$$p_r^{T} F p_l = 0$$

where p_l and p_r are the projections (in homogeneous pixel coordinates) of the target point p to be detected in the first camera and the second camera, respectively; F is the fundamental matrix that relates epipolar lines to points in this constraint and is expressed as:

$$F = M_r^{-T} E M_l^{-1}$$

where M_l and M_r are the intrinsic parameter matrices of the first camera and the second camera, respectively, and E is the essential matrix;

randomly selecting four feature point pairs from the optimized feature point pairs to calculate an initial value of the fundamental matrix F;

and iteratively refining F with the remaining optimized feature point pairs, finally obtaining the value of F that best reflects the spatial geometric constraint between the first image and the second image.
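The text above seeds F with four pairs and refines it with the rest. Because F has seven degrees of freedom, a linear estimate needs at least eight pairs. The sketch below therefore substitutes the standard normalized eight-point algorithm (Hartley normalization, SVD null vector, rank-2 enforcement) as one way to obtain F from the optimized pairs. This is an illustration under that assumption, not the procedure claimed here, and the function names are hypothetical.

```python
import numpy as np

def _normalize(pts):
    """Hartley normalization: move the centroid to the origin, scale mean distance to sqrt(2)."""
    c = pts.mean(axis=0)
    s = np.sqrt(2) / np.linalg.norm(pts - c, axis=1).mean()
    T = np.array([[s, 0.0, -s * c[0]], [0.0, s, -s * c[1]], [0.0, 0.0, 1.0]])
    ph = np.column_stack([pts, np.ones(len(pts))]) @ T.T
    return ph, T

def fundamental_eight_point(pl, pr):
    """Estimate F with p_r^T F p_l = 0 from N >= 8 pixel pairs, arrays of shape (N, 2)."""
    xl, Tl = _normalize(pl)
    xr, Tr = _normalize(pr)
    # One row per pair: coefficients of F flattened row-wise, i.e. kron(x_r, x_l).
    A = np.einsum("ni,nj->nij", xr, xl).reshape(len(xl), 9)
    _, _, Vt = np.linalg.svd(A)
    Fn = Vt[-1].reshape(3, 3)  # right singular vector of the smallest singular value
    U, S, Vt2 = np.linalg.svd(Fn)
    Fn = U @ np.diag([S[0], S[1], 0.0]) @ Vt2  # enforce rank 2
    F = Tr.T @ Fn @ Tl  # undo the normalization: x = T p
    return F / np.linalg.norm(F)
```

Wrapping this estimator in the same RANSAC scheme used for H (samples of eight pairs, residual |p_r^T F p_l| or a point-to-line distance) would be one way to realize the "initial value, then refine with the remaining pairs" idea.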

Optionally, selecting random points in the region of interest in the first image, outputting their pixel coordinates, and calculating, from the fundamental matrix and the pixel coordinates, the epipolar lines in the second image that correspond to these random points comprises:

selecting a random subset of the pixels in the region of interest in the first image and outputting their pixel coordinates;

storing the pixel coordinates of all selected pixels as a matrix to obtain the coordinate matrix of the pixels;

and reading the coordinate matrix and solving for the epipolar lines in the second image that correspond to these pixels, using the spatial geometric constraint between the first image and the second image.
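For a point p_l in the first image, the constraint p_r^T F p_l = 0 means that its match in the second image lies on the line l_r = F p_l. The sketch below draws random ROI pixels into a coordinate matrix and maps all of them to lines at once. The ROI bounds, the seed and the function names are hypothetical. Each line is scaled so that (a, b) has unit length, which makes a·u + b·v + c the signed pixel distance to the line.

```python
import numpy as np

def sample_roi_points(x0, y0, x1, y1, n, seed=0):
    """Randomly pick n pixel coordinates inside the ROI [x0, x1) x [y0, y1)."""
    rng = np.random.default_rng(seed)
    xs = rng.integers(x0, x1, size=n)
    ys = rng.integers(y0, y1, size=n)
    return np.column_stack([xs, ys]).astype(float)  # coordinate matrix, shape (n, 2)

def epipolar_lines(F, pts_l):
    """Rows (a, b, c) of l_r = F p_l, meaning a*u + b*v + c = 0 in the second image."""
    ph = np.column_stack([pts_l, np.ones(len(pts_l))])
    lines = ph @ F.T  # row i equals (F @ p_i)^T
    return lines / np.linalg.norm(lines[:, :2], axis=1, keepdims=True)

# Usage: for a purely horizontal camera translation, F maps each point to the
# horizontal line v = y through the same image row.
F = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]])
pts = sample_roi_points(10, 20, 50, 60, n=5)
print(epipolar_lines(F, pts))
```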

Optionally, searching the second image for the image area most similar to the template image uses a robust template matching method based on scale-adaptive deep convolutional features, comprising:

cropping the region of interest in the first image as the target to be detected and saving it as the template image img;

extracting scale-adaptive deep convolutional feature vectors from the template image img and the second image with a pretrained VGG-Net model, measuring the similarity between the features of img and the corresponding features of the second image with the NCC (normalized cross-correlation) algorithm, and, according to a set similarity threshold, detecting the image area of the second image whose similarity to img satisfies the threshold, i.e. the image area in the second image most similar to the template image;

and outputting the pixel coordinates of the upper-left and lower-right corners of this image area and marking it in the second image.
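The matching step can be illustrated with zero-mean NCC over a sliding window. This sketch does not reproduce the VGG-Net feature extraction or the scale adaptation. Instead, the function accepts any 2-D array or (H, W, C) feature map, so raw intensities stand in for deep features. The 0.8 similarity threshold and the function names are assumptions. A practical implementation would use an FFT-based or library routine rather than the brute-force loops shown here.

```python
import numpy as np

def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equally shaped arrays."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def match_template(search, template, sim_thresh=0.8):
    """Slide template over search; return ((x0, y0, x1, y1), score) or (None, score)."""
    th, tw = template.shape[:2]
    best_score, best_xy = -1.0, None
    for y in range(search.shape[0] - th + 1):
        for x in range(search.shape[1] - tw + 1):
            s = ncc(search[y:y + th, x:x + tw], template)
            if s > best_score:
                best_score, best_xy = s, (x, y)
    if best_xy is None or best_score < sim_thresh:
        return None, best_score  # nothing similar enough
    x, y = best_xy
    # Upper-left and lower-right pixel corners of the bounding box.
    return (x, y, x + tw - 1, y + th - 1), best_score

# Usage: a patch cut from a random image is found again at its original position.
rng = np.random.default_rng(0)
img2 = rng.random((40, 50))
box, score = match_template(img2, img2[10:20, 15:30])
print(box)  # (15, 10, 29, 19)
```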

Optionally, the epipolar line threshold is 0.95: if more than 95% of the epipolar lines intersect the bounding box, the target to be detected is determined not to have moved abnormally; otherwise, it is determined to have moved abnormally.
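The decision rule can be implemented with a simple convexity argument. A line a·u + b·v + c = 0 meets an axis-aligned box exactly when the box corners do not all lie strictly on the same side of the line. The minimal sketch below assumes lines in (a, b, c) form and the box as (x0, y0, x1, y1) corner coordinates; the function names are hypothetical.

```python
import numpy as np

def line_hits_box(line, box):
    """True if the line a*u + b*v + c = 0 meets the closed box (x0, y0, x1, y1)."""
    a, b, c = line
    x0, y0, x1, y1 = box
    corners = np.array([[x0, y0], [x1, y0], [x0, y1], [x1, y1]], float)
    vals = a * corners[:, 0] + b * corners[:, 1] + c
    # The line crosses (or touches) the box iff the corner values change sign.
    return vals.min() <= 0 <= vals.max()

def is_abnormal(lines, box, ratio_thresh=0.95):
    """Abnormal unless more than ratio_thresh of the epipolar lines cross the matched box."""
    hit_ratio = np.mean([line_hits_box(l, box) for l in lines])
    return not hit_ratio > ratio_thresh

# Usage: horizontal lines v = 15, 20, 25 cross the box; v = 50 misses it.
box = (10, 10, 30, 30)
lines_ok = [(0.0, -1.0, 15.0), (0.0, -1.0, 20.0), (0.0, -1.0, 25.0)]
print(is_abnormal(lines_ok, box))                          # False
print(is_abnormal(lines_ok + [(0.0, -1.0, 50.0)], box))    # True (only 75% hit)
```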

According to another aspect of the present invention, there is provided a multi-temporal image abnormal target detection system based on the epipolar constraint, comprising:

an ASIFT module, configured to perform ASIFT feature point detection and matching on the acquired first image and second image to obtain matched feature point pairs;

a feature point optimization module, configured to perform mismatch elimination on the matched feature point pairs to obtain optimized feature point pairs;

a fundamental matrix calculation module, configured to calculate the fundamental matrix using the optimized feature point pairs;

an epipolar line calculation module, configured to select random points in the region of interest in the first image, output their pixel coordinates, and calculate, from the fundamental matrix and the pixel coordinates, the epipolar lines in the second image that correspond to these random points;

a target region labeling module, configured to crop the region of interest from the first image as a template image representing the target to be detected, use the template image to search the second image for the most similar image area, and mark that area in the second image with a bounding box;

and an abnormal movement detection module, configured to compare the positional relationship between the epipolar lines and the bounding box in the second image: if the proportion of epipolar lines intersecting the bounding box exceeds the set threshold, the target to be detected is determined not to be abnormal; otherwise, it is determined to be abnormal.

According to a third aspect of the present invention there is provided a terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor when executing the program being operable to perform the method of any one of the preceding claims or to run the system of the preceding claims.

According to a fourth aspect of the present invention there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor is operable to perform a method of any of the above, or to run a system as described above.

By adopting the above technical solution, the invention has at least one of the following beneficial effects compared with the prior art:

The multi-temporal image abnormal target detection method and system based on the epipolar constraint provided by the invention meet the need to detect abnormal targets in image pairs captured at long time intervals and from arbitrary viewing angles, and remove the need, found in traditional video-based moving target detection, to compare consecutive frames. They are flexible, require no training data set, avoid the heavy workload of deep learning, and have wide application value in practical production and daily life.

By introducing the epipolar constraint and combining it with image features, the method and system detect abnormal targets in two images captured at arbitrary different times and angles, with the target object to be detected selected manually.

The method and system can effectively detect targets with abnormal motion in video with high accuracy, remove the need for real-time monitoring of video or image sequences in traditional video-based moving target detection, and are of practical significance in production and daily life.

Drawings

Other features, objects and advantages of the present invention will become more apparent upon reading of the detailed description of non-limiting embodiments, given with reference to the accompanying drawings in which:

FIG. 1 is a flowchart of a multi-temporal image abnormal target detection method based on the epipolar constraint according to one embodiment of the present invention.

FIG. 2 is a schematic diagram of a multi-temporal image abnormal target detection system based on the epipolar constraint according to one embodiment of the present invention.

FIG. 3 is a schematic view of the longitude and latitude angles in the ASIFT algorithm according to a preferred embodiment of the present invention.

FIG. 4 is a schematic diagram of the simulated sampling of latitude and longitude in the ASIFT algorithm according to a preferred embodiment of the present invention, where (a) is a perspective view along the y-axis and (b) is a perspective view along the z-axis.

FIG. 5 is a schematic illustration of the epipolar constraint according to a preferred embodiment of the present invention.

FIGS. 6 to 10 are schematic diagrams of image pairs taken from different viewing angles in an embodiment of the present invention, where (a) is the first image img1 and (b) is the second image img2.

FIG. 11 shows the success rate curves and AUC values of the target detection method of the embodiment of the present invention and of four other methods.

FIGS. 12 to 16 show, for image pairs taken from different angles, the epipolar lines drawn on the second image together with the matching result, and the target abnormality determination result, where (a) is the second image with the epipolar lines and matching result and (b) is the target abnormality determination result.

Detailed Description

The following describes embodiments of the present invention in detail: the embodiment is implemented on the premise of the technical scheme of the invention, and detailed implementation modes and specific operation processes are given. It should be noted that variations and modifications can be made by those skilled in the art without departing from the spirit of the invention, which falls within the scope of the invention.

To meet the need to detect abnormal targets in image pairs captured at long time intervals and from arbitrary viewing angles, and to remove the need to compare consecutive frames in traditional video-based moving target detection, one embodiment of the invention provides a multi-temporal image abnormal target detection method based on the epipolar constraint. The method is flexible, requires no training data set, avoids the heavy workload of deep learning, and has wide application value in practical production and daily life.

As shown in FIG. 1, the multi-temporal image abnormal target detection method based on the epipolar constraint provided in this embodiment may include:

S1, performing ASIFT feature point detection and matching on the acquired first image and second image to obtain matched feature point pairs. In a preferred embodiment, the ASIFT feature point detection and matching uses a feature-point-based bidirectional matching method, which may include:

S11, extracting scale-invariant feature points from the first image and the second image, respectively, using the ASIFT algorithm;

S12, taking the first image as the reference image and the second image as the image to be matched; for each feature point in the first image, searching the second image for the most similar feature point to form a feature point pair; repeating this step for the remaining feature points to obtain the matched feature point pair set Ω1;

S13, taking the second image as the reference image and the first image as the image to be matched; for each feature point in the second image, searching the first image for the most similar feature point to form a feature point pair; repeating this step for the remaining feature points to obtain the matched feature point pair set Ω2;

S14, collecting the matched point pairs common to Ω1 and Ω2 as the final matched feature point pair set Ω.

S2, performing mismatch elimination on the matched feature point pairs to obtain optimized feature point pairs. In a preferred embodiment, this step may further comprise performing the subsequent step of calculating the fundamental matrix F only when the number of optimized feature point pairs is greater than or equal to a set number threshold. In a preferred embodiment, the mismatch elimination uses the RANSAC algorithm and may include:

S21, randomly selecting several matched feature point pairs from the first image and the second image to form a data set of matched feature point pairs, and solving the homography matrix H from these pairs, specifically:

letting the coordinates of a feature point in the first image be (x, y), the coordinates of the corresponding feature point in the second image be (x1, y1), and the homography matrix H be

$$H = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & 1 \end{bmatrix}$$

The corresponding solution formula is:

$$x_1 = \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + 1}, \qquad y_1 = \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + 1}$$

using the matched feature point pairs, the corresponding system of equations is written from this formula, and the unknown parameters h11, h12, h13, h21, h22, h23, h31 and h32 of the homography matrix H are solved from it;

S22, calculating the projection error of every matched feature point pair in the data set under the homography matrix H, and adding the pair to the inlier set I if the error is smaller than a threshold;

S23, if the number of elements in the current inlier set I is greater than that of the optimal inlier set I_best, setting I_best = I and updating the iteration count k; the optimal inlier set I_best is initially empty (its size is 0); the iteration count k is calculated as:

$$k = \frac{\log(1 - p)}{\log(1 - w^{m})}$$

where p is the confidence, preferably 0.995; w is the proportion of inliers; and m is the minimum number of matched feature point pairs required to calculate the homography matrix H, preferably 4;

S24, if the actual number of iterations exceeds k, exiting; otherwise, incrementing the iteration count by 1 and repeating S22 and S23;

S25, based on the final inlier set I, removing the remaining matched feature point pairs that do not belong to it, thereby eliminating mismatches.

S3, calculating the fundamental matrix F using the optimized feature point pairs. In a preferred embodiment, this may include:

S31, determining that the spatial geometric constraint that holds between the first image and the second image and relates the target point to be detected to its epipolar line is:

$$p_r^{T} F p_l = 0$$

where p_l and p_r are the projections of the target point p to be detected in the first camera and the second camera, respectively; F is the fundamental matrix that relates epipolar lines to points in this constraint and is expressed as:

$$F = M_r^{-T} E M_l^{-1}$$

where M_l and M_r are the intrinsic parameter matrices of the first camera and the second camera, respectively, and E is the essential matrix;

S32, randomly selecting four feature point pairs from the optimized feature point pairs to calculate an initial value of the fundamental matrix F;

S33, iteratively refining F with the remaining optimized feature point pairs, finally obtaining the value of F that best reflects the spatial geometric constraint between the first image and the second image.

S4, selecting random points in the region of interest in the first image, outputting their pixel coordinates, and calculating, from the value of the fundamental matrix F and the pixel coordinates, the epipolar lines in the second image that correspond to these random points. In a preferred embodiment, this may include:

S41, selecting a random subset of the pixels in the region of interest in the first image and outputting their pixel coordinates;

S42, storing the pixel coordinates of all selected pixels as a matrix to obtain the coordinate matrix of the pixels;

S43, reading the coordinate matrix and solving for the epipolar lines in the second image that correspond to these pixels, using the spatial geometric constraint between the first image and the second image.

S5, cropping the region of interest in the first image as a template image representing the target to be detected; using the template image to search the second image for the most similar image area, and marking it in the second image with a bounding box. In a preferred embodiment, a robust template matching method based on scale-adaptive deep convolutional features is used, which may include:

S51, cropping the region of interest in the first image as the target to be detected and saving it as the template image img;

S52, extracting scale-adaptive deep convolutional feature vectors from the template image img and the second image with a pretrained VGG-Net model, measuring the similarity between the features of img and the corresponding features of the second image with the NCC algorithm, and, according to a set similarity threshold, detecting the image area of the second image whose similarity to img satisfies the threshold, i.e. the image area in the second image most similar to the template image;

S53, outputting the pixel coordinates of the upper-left and lower-right corners of this image area and marking it in the second image.

S6, comparing the positional relationship between the epipolar lines and the bounding box in the second image: if the proportion of epipolar lines that intersect the bounding box exceeds the set threshold, determining that the target to be detected is not abnormal; otherwise, determining that it is abnormal. In a preferred embodiment, the threshold may be set to 0.95, i.e. if more than 95% of the epipolar lines intersect the bounding box, the target to be detected is determined not to be abnormal; otherwise, it is determined to be abnormal.

In a preferred embodiment, the target detection method provided in this embodiment may further include:

S0, acquiring multi-temporal images comprising the first image and the second image.

The technical scheme provided by the embodiment of the invention is further described below with reference to the attached drawings and the preferred embodiment.

A preferred embodiment of the invention provides a multi-temporal image abnormal target detection method based on the epipolar constraint, comprising the following steps:

Step one: detecting and matching feature points of the first image and the second image with a feature-point-based bidirectional matching method to obtain matched feature point pairs, comprising:

Step (1): extracting the ASIFT feature points of the first image and the second image.

Step (2): taking the first image as the reference image and the second image as the image to be matched. For each feature point extracted from the first image, the most similar feature point in the second image is searched to form a feature point pair; this step is repeated for the remaining feature points, the number of successfully matched feature points is output, and the matched feature point pair set Ω1 is recorded;

Step (3): taking the second image as the reference image and the first image as the image to be matched, completing feature point matching with the method of step (2), and recording the set Ω2 of successfully matched feature point pairs;

Step (4): collecting the matched point pairs common to Ω1 and Ω2 as the final matched feature point pair set Ω.

The implementation steps of the ASIFT algorithm are as follows:

(1) Construction of affine transformation model

The ASIFT algorithm first transforms the image by simulating, through horizontal and vertical angles, all possible affine distortions. The affine transformation model has six unknown parameters. The two translation parameters can be handled mathematically, and the remaining four parameters a1, a2, a3 and a4 can be expressed as:

$$A = \begin{bmatrix} a_1 & a_2 \\ a_3 & a_4 \end{bmatrix} = k\,R(\psi)\,T_t\,R(\phi) = k \begin{bmatrix} \cos\psi & -\sin\psi \\ \sin\psi & \cos\psi \end{bmatrix} \begin{bmatrix} t & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix}$$

where k is the zoom factor associated with the camera focal length; R(ψ) represents the rotation of the image to be matched relative to the reference image; φ ∈ [0, π) is the observation angle (longitude); and t and 1 are the two eigenvalues of the diagonal matrix T_t. FIG. 3 shows the affine camera model of an aerial image, where k is the scaling factor and t = 1/cos θ, θ being the latitude.
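The affine decomposition can be checked numerically. Because rotations preserve lengths, the singular values of A are k·t and k, so the tilt t = 1/cos θ directly controls the anisotropic stretch. A minimal sketch (hypothetical function names, angles in radians):

```python
import numpy as np

def rotation(angle):
    """2-D rotation matrix for an angle in radians."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s], [s, c]])

def asift_affine(k, psi, theta, phi):
    """A = k * R(psi) @ diag(t, 1) @ R(phi), with tilt t = 1 / cos(theta)."""
    t = 1.0 / np.cos(theta)
    return k * rotation(psi) @ np.diag([t, 1.0]) @ rotation(phi)

# Usage: latitude 60 degrees gives t = 2, so with k = 2 the singular values are 4 and 2.
A = asift_affine(2.0, 0.3, np.pi / 3, 0.7)
print(np.linalg.svd(A, compute_uv=False))  # approximately [4. 2.]
```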

(2) Sampling the longitude and latitude. The longitude φ and the latitude θ are the two key parameters determining the deformation caused by tilting the camera's optical axis, and the distortion grows as the axis becomes more inclined. The latitude is therefore sampled so that the tilt forms a geometric sequence $t = 1, a, a^2, \dots, a^n$, and the longitude φ is sampled at an interval chosen to balance sampling accuracy against sparsity. FIG. 4 shows the sampling of the latitude and of the observation angle φ at these intervals: the points represent sampling points, FIG. 4(a) is a perspective view along the y-axis, and FIG. 4(b) is a perspective view along the z-axis.
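The sampling scheme can be illustrated as follows. The constants a = √2, n = 5 and a longitude step of b/t with b = 72° are assumptions taken from the original ASIFT publication; the values used by this method are not given in the text above. Each tilt t contributes longitudes 0, b/t, 2b/t, ... below 180°, so stronger tilts receive denser longitude sampling. `asift_sampling` is a hypothetical name.

```python
import math

def asift_sampling(a: float = math.sqrt(2), n: int = 5, b_deg: float = 72.0):
    """List of (tilt t, longitude phi in degrees) pairs to simulate."""
    samples = [(1.0, 0.0)]                 # t = 1: the original, untilted image
    for i in range(1, n + 1):
        t = a ** i                         # latitude: geometric series of tilts
        step = b_deg / t                   # longitude step shrinks as tilt grows
        j = 0
        while j * step < 180.0 - 1e-9:     # phi in [0, 180); tolerance for rounding
            samples.append((t, j * step))
            j += 1
    return samples
```

With these defaults the function yields 43 (t, φ) pairs, including the untilted original image; the corresponding latitude is θ = arccos(1/t).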

(3) Sampling an image

The ASIFT algorithm processes images in a two-resolution mode. This improves the efficiency, stability and robustness of the algorithm and better handles matching between images of different resolutions: the affine simulation and matching are first carried out on low-resolution versions of the images.

The resolution is reduced by subsampling the query image u and the searched image v by a factor of K × K:

$$u' = S_K G_K u,\qquad v' = S_K G_K v$$

where S_K is the K × K subsampling operator and G_K is an anti-aliasing discrete Gaussian filter that smooths the image so that distortion during downsampling is kept as small as possible.
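The low-resolution step can be sketched in NumPy as a separable Gaussian blur (G_K) followed by keeping every K-th row and column (S_K). The blur width σ = 0.8·√(K² − 1) is an assumption that mirrors the anti-aliasing rule ASIFT uses when simulating tilts; it is not stated in this document. `subsample` is a hypothetical name.

```python
import numpy as np

def gaussian_kernel_1d(sigma: float) -> np.ndarray:
    """Normalized 1-D Gaussian kernel truncated at 3 sigma."""
    radius = max(1, int(np.ceil(3.0 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def subsample(img: np.ndarray, K: int) -> np.ndarray:
    """u' = S_K G_K u: anti-aliasing Gaussian blur, then keep every K-th pixel."""
    if K <= 1:
        return img.astype(float)
    sigma = 0.8 * np.sqrt(K ** 2 - 1)      # assumed anti-aliasing width
    k = gaussian_kernel_1d(sigma)
    pad = len(k) // 2
    padded = np.pad(img.astype(float), pad, mode="reflect")
    # separable convolution: along rows first, then along columns
    rows = np.apply_along_axis(np.convolve, 1, padded, k, mode="valid")
    blurred = np.apply_along_axis(np.convolve, 0, rows, k, mode="valid")
    return blurred[::K, ::K]
```

The "valid" convolutions exactly consume the reflected padding, so the blurred image keeps the input size before subsampling.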

(4) SIFT feature point detection and matching are performed on the images to obtain matched feature point pairs.

Step two, use the RANSAC algorithm to reject mismatches among the matched feature point pairs obtained in step one, and display the number of matched feature point pairs remaining after rejection. If this number is below the set number threshold, subsequent processing is not carried out; otherwise there are enough feature points and processing continues.

In stereo vision, if an object's position has not changed, the pixels of that object in images taken from different viewpoints must satisfy a positional constraint. The RANSAC algorithm removes the feature matches lying on abnormally moving objects so that they are excluded from subsequent calculations; this prevents errors when the fundamental matrix F is solved later and improves the accuracy of the subsequently computed epipolar lines. The constraint used by RANSAC here is the homography matrix H between the two images, and mismatched points are rejected according to H.

The RANSAC algorithm carries out the specific steps of rejecting mismatching points in two images as follows:

(1) Randomly select four matched feature point pairs from the first image and the second image as the sample set, and solve the homography transformation matrix (homography matrix for short) H from these four pairs. Let a pixel in the first image have coordinates (x, y) and its match in the second image have coordinates (x1, y1). The homography matrix H is

$$H=\begin{bmatrix}h_{11} & h_{12} & h_{13}\\ h_{21} & h_{22} & h_{23}\\ h_{31} & h_{32} & 1\end{bmatrix}$$

and the corresponding solution formula is

$$x_1=\frac{h_{11}x+h_{12}y+h_{13}}{h_{31}x+h_{32}y+1},\qquad y_1=\frac{h_{21}x+h_{22}y+h_{23}}{h_{31}x+h_{32}y+1}$$

Each of the four matched pairs yields one such pair of equations, and the resulting four equation sets are solved for the eight unknown parameters h₁₁, h₁₂, h₁₃, h₂₁, h₂₂, h₂₃, h₃₁ and h₃₂;

(2) Calculate the projection error of every matched feature point pair in the data set with respect to H, and add the pair to the inlier set I if its error is below a threshold;

(3) If the current inlier set I has more elements than the best inlier set I_best, set I_best = I and update the number of iterations k at the same time; I_best is initially empty and is updated throughout the process. The iteration number k is calculated as

$$k=\frac{\log(1-p)}{\log\left(1-w^{m}\right)}$$

where p is the confidence, generally 0.995; w is the proportion of inliers; and m = 4 is the minimum number of matched feature point pairs needed to compute the homography matrix H;

(4) If the actual number of iterations exceeds k, exit; otherwise, increase the iteration count by 1 and repeat steps (1) to (3);

(5) Based on the final inlier set I obtained in the above steps, remove the remaining matched feature point pairs that do not belong to the inlier set, completing the rejection of mismatched points.
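The five RANSAC steps map onto a short loop. This sketch assumes the matched points are N × 2 NumPy arrays, uses the Euclidean reprojection error ‖H p − p′‖ as the projection error with a hypothetical 3-pixel threshold, and re-estimates k = log(1 − p)/log(1 − w^m) each time I_best grows. `fit_homography`, `project` and `ransac_homography` are illustrative names, not part of the described system.

```python
import numpy as np

def fit_homography(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """Solve h11..h32 (h33 = 1) from four pairs: two linear equations per pair."""
    A, b = [], []
    for (x, y), (x1, y1) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -x * x1, -y * x1])
        A.append([0, 0, 0, x, y, 1, -x * y1, -y * y1])
        b.extend([x1, y1])
    h = np.linalg.solve(np.asarray(A, float), np.asarray(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def project(H: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Apply H to N x 2 points via homogeneous coordinates."""
    ph = np.c_[pts, np.ones(len(pts))] @ H.T
    return ph[:, :2] / ph[:, 2:3]

def ransac_homography(src, dst, thresh=3.0, p=0.995, max_iter=2000, seed=0):
    """Return a boolean inlier mask for the matched pairs (src[i], dst[i])."""
    rng = np.random.default_rng(seed)
    m = 4                                    # minimal sample size for H
    best = np.zeros(len(src), dtype=bool)    # I_best starts empty
    k, it = max_iter, 0
    while it < k:
        it += 1
        idx = rng.choice(len(src), m, replace=False)   # step (1)
        try:
            H = fit_homography(src[idx], dst[idx])
        except np.linalg.LinAlgError:
            continue                         # degenerate (e.g. collinear) sample
        err = np.linalg.norm(project(H, src) - dst, axis=1)
        inliers = err < thresh               # step (2): inlier set I
        if inliers.sum() > best.sum():       # step (3): update I_best and k
            best = inliers
            w = best.mean()
            if w >= 1.0:
                k = 0
            else:
                denom = np.log(1.0 - w ** m)
                k = max_iter if denom == 0 else min(max_iter, int(np.ceil(np.log(1 - p) / denom)))
    return best                              # step (5): pairs outside I are rejected
```

In practice, OpenCV's `cv2.findHomography(src, dst, cv2.RANSAC, ransacReprojThreshold)` performs the same job; the hand-written loop only makes each step explicit.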

Step three, calculate the fundamental matrix F using the feature point pairs remaining after mismatch rejection.

It is known that in binocular vision, epipolar geometry is the spatial geometric constraint that must hold between two images of the same object taken from different positions. This constraint is a point-to-line correspondence: the line corresponding to a point is called its epipolar line, and the relationship between the point and the epipolar line is represented by the fundamental matrix F. FIG. 5 shows the same point observed from O_l and O_r: O_l and O_r are the positions of the left and right cameras; I_l and I_r are the imaging planes of the left and right cameras; e_l is the projection of O_r on the left imaging plane and e_r is the projection of O_l on the right imaging plane, e_l and e_r being called the epipoles; L_l and L_r are the epipolar lines passing through the epipoles; and p_l and p_r are the projections of the observed point p in camera O_l and camera O_r.

(1) The spatial geometric constraint (epipolar constraint) is expressed as

$$p_r^{\mathsf T} F\, p_l = 0$$

where p_l and p_r are homogeneous pixel coordinates and F is the fundamental matrix, calculated as

$$F = M_r^{-\mathsf T} E\, M_l^{-1}$$

where M_l and M_r are the intrinsic parameter matrices of camera O_l and camera O_r, respectively, and E is the essential matrix.
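As a numerical check of these two formulas, the sketch below builds F for a hypothetical calibrated camera pair and verifies that a projected 3-D point satisfies the epipolar constraint. It assumes right-camera coordinates X_r = R X_l + t, so that E = [t]× R; the intrinsics, R, t and the point are invented for illustration, and the function names are hypothetical.

```python
import numpy as np

def skew(t: np.ndarray) -> np.ndarray:
    """Cross-product matrix [t]x, so that skew(t) @ v equals np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]], [t[2], 0.0, -t[0]], [-t[1], t[0], 0.0]])

def fundamental_from_calibration(M_l, M_r, R, t):
    """F = M_r^{-T} E M_l^{-1} with essential matrix E = [t]x R."""
    E = skew(t) @ R
    return np.linalg.inv(M_r).T @ E @ np.linalg.inv(M_l)

# Hypothetical setup: identical intrinsics, right camera rotated 10 deg about y
M = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
th = np.deg2rad(10.0)
R = np.array([[np.cos(th), 0.0, np.sin(th)], [0.0, 1.0, 0.0], [-np.sin(th), 0.0, np.cos(th)]])
t = np.array([-1.0, 0.0, 0.1])
F = fundamental_from_calibration(M, M, R, t)

X = np.array([0.3, -0.2, 5.0])     # point p in left-camera coordinates
p_l = M @ X
p_l /= p_l[2]                       # homogeneous pixel coordinates, left image
p_r = M @ (R @ X + t)
p_r /= p_r[2]                       # the same point seen by the right camera
residual = float(p_r @ F @ p_l)     # p_r^T F p_l, zero up to rounding error
print(residual)
```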

(2) Calculating the fundamental matrix F

F is a 3 × 3 matrix defined only up to scale, so it has eight unknown parameters, which together encode the intrinsic and extrinsic parameters of the cameras. Solving for them requires at least eight linear equations built from point positions; since each point pair contributes one equation p_r^T F p_l = 0, at least eight point pairs are needed. To improve the accuracy of F, the method starts from the optimized feature point pairs obtained after RANSAC has removed the mismatched pairs and iterates, repeatedly removing the points that lie far from their epipolar lines so that as many inliers as possible satisfy the fundamental-matrix constraint. Once the sum of the distances of all remaining inliers from their epipolar lines falls within a set threshold, the final fundamental matrix F is calculated from the full inlier set.

Concretely, eight optimized feature point pairs are randomly selected from the first and second images to calculate an initial value of F; the remaining optimized matching point pairs in the two images are then used in further iterations to refine F progressively, finally yielding the matrix that most accurately reflects the pixel-position constraint between the image pair.
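The linear solver for F is not named above. A common choice, assumed here, is the normalized eight-point algorithm: shift and scale each image's points (Hartley normalization), stack one equation p_r^T F p_l = 0 per pair, take the SVD null vector, and enforce rank 2. The iterative outlier-removal loop is omitted, and `eight_point` is a hypothetical helper; OpenCV's `cv2.findFundamentalMat` with `cv2.FM_8POINT` or `cv2.FM_RANSAC` is a library alternative.

```python
import numpy as np

def hartley_normalize(pts: np.ndarray):
    """Translate to the centroid and scale so the mean distance is sqrt(2)."""
    c = pts.mean(axis=0)
    s = np.sqrt(2.0) / np.linalg.norm(pts - c, axis=1).mean()
    T = np.array([[s, 0.0, -s * c[0]], [0.0, s, -s * c[1]], [0.0, 0.0, 1.0]])
    return np.c_[pts, np.ones(len(pts))] @ T.T, T

def eight_point(p_l: np.ndarray, p_r: np.ndarray) -> np.ndarray:
    """Estimate F (p_r^T F p_l = 0) from N >= 8 pixel correspondences (N x 2 each)."""
    xl, Tl = hartley_normalize(p_l)
    xr, Tr = hartley_normalize(p_r)
    # one linear equation per pair: kron(x_r, x_l) . vec(F) = 0
    A = np.stack([np.kron(r, l) for l, r in zip(xl, xr)])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)                    # least-squares null vector
    U, S, Vt = np.linalg.svd(F)
    F = U @ np.diag([S[0], S[1], 0.0]) @ Vt     # enforce rank 2
    F = Tr.T @ F @ Tl                           # undo the normalization
    return F / np.linalg.norm(F)
```

Normalization keeps the entries of A on comparable scales; without it, pixel coordinates in the hundreds make the SVD solution noticeably less stable.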

Step four, use the mouse (human-computer interaction) to select random points in the region of interest in the first image and output their pixel coordinates; from the value of the fundamental matrix F obtained in step three and these pixel coordinates, calculate the corresponding epipolar lines in the second image and draw them on it.

The value of the fundamental matrix F is obtained in step three. When the pixel coordinates of a specified point in the first image are known, F maps that point to a straight line in the second image; this point-to-line correspondence is the positional relationship between a pixel and its epipolar line. The epipolar lines of the specified points are drawn as follows:

(1) Use the mouse to select several random pixel points in the region to be detected in the first image and output the pixel coordinates [x_i, y_i] of each point;

(2) Storing the coordinate values of all selected pixel points in a matrix form;

(3) Load the pixel coordinate matrix from step (2) and use the formula

$$l_i = F\begin{bmatrix}x_i\\ y_i\\ 1\end{bmatrix}$$

to solve for the epipolar line l_i = (a_i, b_i, c_i)^T, i.e. the line a_i x + b_i y + c_i = 0, corresponding to each pixel point in the second image, and mark it in the second image.
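Steps (1) to (3) can be sketched as below, assuming the selected points arrive as an N × 2 NumPy array and F follows the convention p_r^T F p_l = 0. The helper names are hypothetical; `line_endpoints` intersects a line with the left and right image borders so that it can be drawn, and it assumes the line is not vertical (b ≠ 0). OpenCV's `cv2.computeCorrespondEpilines` returns normalized lines for the same purpose.

```python
import numpy as np

def epipolar_lines(F: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """l_i = F [x_i, y_i, 1]^T for each selected point, scaled so a^2 + b^2 = 1."""
    ph = np.c_[pts, np.ones(len(pts))]     # step (2): points stored as a matrix
    lines = ph @ F.T                        # row i equals F @ ph[i]
    return lines / np.hypot(lines[:, 0], lines[:, 1])[:, None]

def line_endpoints(line: np.ndarray, width: int) -> np.ndarray:
    """Points of a*x + b*y + c = 0 at x = 0 and x = width - 1 (assumes b != 0)."""
    a, b, c = line
    xs = np.array([0.0, width - 1.0])
    ys = -(a * xs + c) / b
    return np.stack([xs, ys], axis=1)
```

For a rectified pair, where F only relates equal rows, the epipolar line of a point is the horizontal line through the same row of the second image.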

Step five, crop a region of interest in the first image with a rectangular box and save it as the template image representing the target to be detected; using the template image, search the second image for the most similar image region with a robust template matching method based on scale-adaptive deep convolutional features, and mark that region in the second image with a rectangular box.

The template matching method based on scale-adaptive deep convolutional feature extraction comprises the following steps:

(1) Scale-adaptive deep convolutional feature extraction

In the scale-adaptive feature extraction method, VGG-Net is used to extract feature vectors from the template and the input image. Unlike common CNN-based methods, the template or image is not resized to a fixed size such as 224 × 224; instead, the method adaptively identifies a target layer of VGG-Net and extracts feature vectors from that layer. Each layer l of the CNN has a receptive field of width Rf_l, defined recursively (with Rf_0 = 1) as

$$Rf_l = Rf_{l-1} + (f_l - 1)\prod_{i=1}^{l-1} s_i$$

For simplicity, the layers are numbered l in order: conv1-1 is layer 1, conv1-2 is layer 2, pool-1 is layer 3, and so on. f_l denotes the filter size of layer l and s_i the stride of layer i. If the template is smaller than the receptive field of the target layer, that layer processes the meaningless zero-padded region outside the template. The receptive field of the target layer is therefore limited to be no larger than the template, and the target layer l* is given by:

$$l^{*} = \max(l - k,\ 1)\quad \text{s.t.}\quad Rf_l \le \min(w, h)$$

where l is the deepest layer satisfying the constraint and k ≥ 0 is a constant coefficient, set uniformly to 3 in this algorithm to keep subsequent operations consistent; w and h are the pixel width and height of the template image. The receptive field of layer l - 3 is then about half of min(w, h), because a pooling layer lies between layer l - 3 and layer l. The template image T and the target image I are then input into the CNN, and the template features N and the image feature map M are extracted from the target layer. The convolutional features of each template and image need to be computed only once, which is far more efficient than searching the image with a naive sliding window.
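The receptive-field recursion and the target-layer rule can be checked with a short script. The sketch assumes the standard VGG-16 layer list (3 × 3 convolutions with stride 1, 2 × 2 max pooling with stride 2), numbered in order as in the text, and k = 3; the list and function names are illustrative, not taken from the method description.

```python
# (name, filter size f_l, stride s_l) for the VGG-16 feature layers, in order
VGG16_LAYERS = [
    ("conv1_1", 3, 1), ("conv1_2", 3, 1), ("pool1", 2, 2),
    ("conv2_1", 3, 1), ("conv2_2", 3, 1), ("pool2", 2, 2),
    ("conv3_1", 3, 1), ("conv3_2", 3, 1), ("conv3_3", 3, 1), ("pool3", 2, 2),
    ("conv4_1", 3, 1), ("conv4_2", 3, 1), ("conv4_3", 3, 1), ("pool4", 2, 2),
    ("conv5_1", 3, 1), ("conv5_2", 3, 1), ("conv5_3", 3, 1), ("pool5", 2, 2),
]

def receptive_fields(layers):
    """Rf_l = Rf_{l-1} + (f_l - 1) * prod_{i<l} s_i, starting from Rf_0 = 1."""
    rf, jump, out = 1, 1, []
    for _, f, s in layers:
        rf += (f - 1) * jump    # jump holds the product of all earlier strides
        jump *= s
        out.append(rf)
    return out

def target_layer(w, h, layers=VGG16_LAYERS, k=3):
    """l* = max(l - k, 1), where l is the deepest layer with Rf_l <= min(w, h)."""
    rfs = receptive_fields(layers)
    l = max((i + 1 for i, r in enumerate(rfs) if r <= min(w, h)), default=0)
    l_star = max(l - k, 1)
    return layers[l_star - 1][0], rfs[l_star - 1]
```

For a 120 × 100 template, the deepest layer whose receptive field fits is pool4 (Rf = 100), so the target layer is conv4_1 with Rf = 60, a little over half of min(w, h).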

(2) NCC-based similarity measure

The most similar patch is found using the normalized cross-correlation (NCC) between M and N. First, the NCC between N and every candidate window M_ij of M is calculated, where M_ij denotes the feature map of a candidate region searched in the image; its width w_f and height h_f are equal to those of N. The position (i*, j*) with the maximum NCC value is then obtained as

$$(i^{*}, j^{*}) = \arg\max_{i,j}\ \mathrm{NCC}(i, j)$$
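The NCC search over the feature maps can be sketched with NumPy's `sliding_window_view` (available since NumPy 1.20). The NCC formula is not spelled out above, so this sketch assumes the common zero-mean form computed over all channels of each window; M and N are laid out as (channels, height, width) arrays, and the function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def ncc_map(M: np.ndarray, N: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Zero-mean NCC between template features N (C, h_f, w_f) and every
    same-sized window M_ij of the image feature map M (C, H, W)."""
    _, h_f, w_f = N.shape
    win = sliding_window_view(M, (h_f, w_f), axis=(1, 2))  # (C, H', W', h_f, w_f)
    win = np.moveaxis(win, 0, 2)                           # (H', W', C, h_f, w_f)
    wz = win - win.mean(axis=(2, 3, 4), keepdims=True)
    nz = N - N.mean()
    num = (wz * nz).sum(axis=(2, 3, 4))
    den = np.sqrt((wz ** 2).sum(axis=(2, 3, 4)) * (nz ** 2).sum()) + eps
    return num / den

def best_position(ncc: np.ndarray) -> tuple:
    """(i*, j*) = argmax over (i, j) of NCC(i, j)."""
    i, j = np.unravel_index(np.argmax(ncc), ncc.shape)
    return int(i), int(j)
```

Because the features are computed once per image, the search is a single vectorized pass over the feature map rather than repeated CNN evaluations on pixel windows.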

(3) Position refinement

Because matching is performed on the convolutional feature layer, the effective stride of the sliding window equals the stride of the target layer, which limits localization accuracy. To obtain higher accuracy, the method refines the location using the neighborhood of the best match. First, the initial box is set from the position (i*, j*) with the maximum NCC value, where (x₁, y₁) is the upper-left corner of the initial box and (x₂, y₂) is its lower-right corner. A patch around the maximum position (i*, j*) of the NCC map is then used for refinement: for example, to obtain a refined x₁, a 3 × 4 patch is used, with weights expressed as:

where u and v take values in the ranges [-1, 1] and [-2, 1], respectively, corresponding to the size of the 3 × 4 patch.
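A minimal sketch of the refinement, under explicit assumptions: the initial box is taken as the feature cell (i*, j*) scaled by the target layer's total stride, and the refined x₁ is an NCC-weighted average of the candidate x positions inside the 3 × 4 patch (u in [-1, 1] over rows, v in [-2, 1] over columns), with negative correlations clipped to zero. The weight formula of the method itself is not reproduced here, so treat `initial_box` and `refine_x1` as hypothetical helpers.

```python
import numpy as np

def initial_box(i_star: int, j_star: int, stride: int, w: int, h: int) -> tuple:
    """Map the best feature cell to pixel corners (x1, y1, x2, y2) (assumed mapping)."""
    x1, y1 = j_star * stride, i_star * stride
    return x1, y1, x1 + w, y1 + h

def refine_x1(ncc: np.ndarray, i_star: int, j_star: int, stride: int,
              us=(-1, 0, 1), vs=(-2, -1, 0, 1)) -> float:
    """NCC-weighted average of candidate x1 values over a 3 x 4 neighbourhood."""
    num = den = 0.0
    for u in us:
        for v in vs:
            i, j = i_star + u, j_star + v
            if 0 <= i < ncc.shape[0] and 0 <= j < ncc.shape[1]:
                weight = max(float(ncc[i, j]), 0.0)   # ignore negative correlation
                num += weight * (j * stride)
                den += weight
    return num / den if den > 0 else float(j_star * stride)
```

The weighted average moves x₁ toward neighbouring cells that also correlate strongly, giving sub-stride precision that the feature-map grid alone cannot provide.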

In summary, a region of interest is first cropped from the first image as the target to be detected and saved as the template image img. The template image img and the second image are then passed through a pretrained VGG-Net to extract scale-adaptive deep convolutional feature vectors, and NCC is used to measure the similarity between the features of img and the corresponding features of the second image. A threshold is specified, and the region of the second image whose similarity to img lies within the given threshold range is detected. This region is designated as the successfully matched region, and the pixel coordinates of its upper-left and lower-right corners are output.

Step six, compare the positional relationship between the epipolar lines and the rectangular box in the second image. If more than 95% of the epipolar lines intersect the rectangular box, the target to be detected is judged not to have moved abnormally; otherwise, it is judged to have moved abnormally.

Experiments show that the template matching algorithm of step five achieves high matching accuracy and outputs the actual position of the template image in the second image accurately. The successfully matched template region in the second image reflects the actual position of the target in the second image, whereas the epipolar lines drawn in step four indicate where the object would appear in the second image if it had not moved. The positional relationship between the epipolar lines and the rectangular box therefore reflects any change in the object's position, and judging this relationship reveals whether the object has moved abnormally. The threshold is set to 0.95: if more than 95% of the epipolar lines intersect the rectangular box, the object is judged to remain in place with no abnormal movement; otherwise, when the epipolar lines largely fall outside the rectangular box, the object is judged to have moved abnormally.
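The decision rule of step six reduces to a line/rectangle intersection test. The sketch assumes epipolar lines are given as (a, b, c) with a x + b y + c = 0 and the matched box as (x1, y1, x2, y2) in pixel coordinates; a line meets an axis-aligned rectangle exactly when the four corners do not all lie strictly on the same side of it. The function names are hypothetical.

```python
import numpy as np

def line_hits_box(line, box) -> bool:
    """True if the line a*x + b*y + c = 0 meets the rectangle (x1, y1, x2, y2)."""
    a, b, c = line
    x1, y1, x2, y2 = box
    corners = np.array([[x1, y1], [x2, y1], [x1, y2], [x2, y2]], float)
    vals = a * corners[:, 0] + b * corners[:, 1] + c
    # the line misses the box only if all four corners lie strictly on one side
    return not (np.all(vals > 0) or np.all(vals < 0))

def is_abnormal(lines, box, ratio: float = 0.95) -> bool:
    """Abnormal unless MORE than `ratio` of the epipolar lines intersect the box."""
    hits = sum(line_hits_box(line, box) for line in lines)
    return hits / len(lines) <= ratio
```

With 20 lines, all 20 must intersect the box for the target to be judged static, since 19 of 20 is exactly 95% and is not "more than" the threshold.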

The technical scheme provided by the embodiment of the invention is further described below with reference to a specific application example.

The following application example uses five image pairs of the same scenes taken with a mobile phone from different viewing angles, shown in (a) and (b) of FIGS. 6 to 10.

In this specific application example, the multi-temporal image abnormal target detection method based on epipolar constraint includes the following steps:

Step one, extract and match the image feature points using the ASIFT algorithm; the numbers of extracted and matched feature points and the time spent in the matching stage are shown in Table 1.

Table 1 Feature points extracted and matched by the ASIFT algorithm, and matching time

Step two, reject mismatched feature points using the RANSAC algorithm; the numbers of inliers remaining after rejection are shown in Table 2.

Table 2 Number of inliers after RANSAC mismatch rejection

Image pair	img1-img2 matched inliers
(a)	4131
(b)	40
(c)	172
(d)	7807
(e)	10180

Step four, use the mouse to select random points in the region of interest in img1 and output their pixel coordinates. From the value of the fundamental matrix F obtained in step three and the pixel coordinates of these points in img1, calculate the corresponding epipolar lines in img2 and draw them in the figure in randomly chosen colors.

Step five, crop the region of interest in img1 with a rectangular box and save it as the template image; use the template image to search img2 for the most similar image region with the robust template matching method based on scale-adaptive deep convolutional features, and mark that region in img2 with a rectangular box. In this example, 105 template-image pairs are first sampled from 35 videos of a tracking dataset (3 pairs per video). The templates are selected randomly, and each image is sampled 20 frames after the frame from which its template was taken. For each pair, the intersection over union (IoU) between the ground-truth box and the predicted box is measured; quantitative comparisons are then made using different IoU thresholds and the area under the curve (AUC), comparing the template matching method based on scale-adaptive deep convolutional feature extraction used herein with four methods, SSD, NCC, HM and BBS, as shown in FIG. 11.
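IoU and the success-rate/AUC summary used for this comparison can be sketched as follows. The exact AUC procedure is not given, so the sketch assumes the common tracking-benchmark convention: the success rate is the fraction of pairs whose IoU exceeds a threshold, evaluated at 101 evenly spaced thresholds in [0, 1], and the AUC is approximated by the mean success rate. Boxes are (x1, y1, x2, y2); the function names are hypothetical.

```python
import numpy as np

def iou(box_a, box_b) -> float:
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    def area(b):
        return (b[2] - b[0]) * (b[3] - b[1])

    union = area(box_a) + area(box_b) - inter
    return inter / union if union > 0 else 0.0

def success_auc(ious, thresholds=np.linspace(0.0, 1.0, 101)):
    """Success rate per IoU threshold and its mean, used here as the AUC."""
    ious = np.asarray(ious, float)
    success = np.array([(ious > t).mean() for t in thresholds])
    return success, float(success.mean())
```

Averaging over thresholds rewards a matcher that is consistently close to the ground truth, rather than one that is judged only at a single IoU cut-off.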

In addition, IoU is used to quantitatively analyze the performance of the template matching algorithm based on scale-adaptive deep convolutional feature extraction on the five image pairs selected in this example and to compare it with the other four template matching methods; the comparison results are shown in Table 3.

Table 3 IoU comparison of the algorithms on the five image pairs

Step six, compare the positional relationship between the epipolar lines and the rectangular box in img2; if more than 95% of the epipolar lines intersect the rectangular box, the object is judged not to have moved abnormally, otherwise it is judged to have moved abnormally. Because the ASIFT algorithm and the template matching method based on scale-adaptive deep convolutional feature extraction both perform well, the target detection method of this embodiment achieves high accuracy. In this example, the final img2 outputs and the program's judgments for the five image pairs are shown in (a) and (b) of FIGS. 12 to 16, respectively; in each case the abnormality of the object's position is output correctly, showing that the invention has good application value and can be applied in production and daily life.

The embodiment of the invention provides a multi-temporal image abnormal target detection system based on epipolar constraint.

As shown in FIG. 2, the epipolar constraint-based multi-temporal image abnormal target detection system provided in this embodiment may include:

the ASIFT algorithm module is used for carrying out ASIFT feature point detection and matching on the first image and the second image to obtain matched feature point pairs;

a fundamental matrix calculation module for calculating the fundamental matrix F using the optimized feature point pairs;

the epipolar line calculation module is used for selecting random points in the region of interest in the first image, outputting their pixel coordinates, and calculating, from the value of the fundamental matrix F and these pixel coordinates, the epipolar lines corresponding to these points in the second image;

the target region labeling module is used for cropping a region of interest in the first image as the template image representing the target to be detected, searching the second image for the image region most similar to the template image, and marking that region in the second image with a labeling box;

In a preferred embodiment, the object detection system provided in this embodiment may further include:

an image acquisition module for acquiring multi-temporal images, comprising a first image and a second image.

It should be noted that the steps of the method provided by the present invention may be implemented by the corresponding modules, devices, units, etc. of the system, and those skilled in the art may refer to the technical solution of the method to implement the composition of the system; that is, the method embodiments may be understood as preferred examples for constructing the system, and are not repeated here.

An embodiment of the present invention provides a terminal including a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the program, the processor performs the method of any of the foregoing embodiments of the present invention or runs the system of any of the foregoing embodiments of the present invention.

Optionally, a memory is used for storing a program. The memory may include volatile memory, such as random-access memory (RAM), for example static random-access memory (SRAM) or double data rate synchronous dynamic random-access memory (DDR SDRAM); the memory may also include non-volatile memory, such as flash memory. The memory is used to store computer programs (e.g., application programs and functional modules that implement the methods described above), computer instructions, and the like, which may be stored in partitions across one or more memories. The computer programs, computer instructions, data, and the like may be invoked by a processor.


And a processor for executing the computer program stored in the memory to implement the steps in the method or the modules of the system according to the above embodiments. Reference may be made in particular to the description of the previous method and system embodiments.

The processor and the memory may be separate structures or may be integrated structures that are integrated together. When the processor and the memory are separate structures, the memory and the processor may be connected by a bus coupling.

An embodiment of the present invention also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, is operative to perform the method of any of the above embodiments of the present invention or to run the system of any of the above embodiments of the present invention.

The epipolar constraint-based multi-temporal image abnormal target detection method and system address the problem that most current moving-target detection and tracking algorithms rely on videos or image sequences, requiring continuous monitoring and the extraction of multiple video frames or sequence images for comparison. By introducing the epipolar constraint, combining it with image features, and having the user manually select the target to be detected, the method and system detect abnormal targets in two images taken from different angles at arbitrary different times. First, the ASIFT algorithm detects image feature points and the RANSAC algorithm removes mismatched points; drawing on stereo vision, the epipolar constraint is introduced to obtain the fundamental matrix F between the image pair. Next, points are marked in the region of interest of the original image, and the fundamental matrix F is used to obtain the corresponding epipolar lines in the target image. Finally, the template region to be matched is marked in the original image, template matching is performed in the target image with a robust and efficient template matching method based on deep convolutional features, and whether the target is abnormal is judged from the positional relationship between the epipolar lines and the rectangular box of the template matching result in the target image. The results show that the method and system effectively detect abnormally moving targets with high accuracy, remove the need of video-based moving-target detection algorithms to monitor videos or image sequences in real time, are simple, effective and easy to implement in engineering practice, and are of practical significance in production and daily life.

Those skilled in the art will appreciate that, in addition to implementing the system and its devices provided by the invention purely as computer-readable program code, the method steps can be logically programmed so that the system and its devices achieve the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Therefore, the system and its devices provided by the present invention may be regarded as a hardware component, and the devices included therein for implementing various functions may be regarded as structures within that hardware component; the means for implementing the various functions may be regarded either as software modules implementing the method or as structures within the hardware component.

Matters not described in detail in the foregoing embodiments of the present invention are well known in the art.

The foregoing describes specific embodiments of the present invention. It is to be understood that the invention is not limited to the particular embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the claims without affecting the spirit of the invention.

Claims

1. An epipolar constraint-based multi-temporal image abnormal target detection method, characterized by comprising the following steps:

calculating a fundamental matrix using the optimized feature point pairs;

2. The epipolar constraint-based multi-temporal image anomaly target detection method of claim 1, wherein the performing of ASIFT feature point detection and matching on the first image and the second image employs a feature point-based bi-directional matching method comprising:

3. The epipolar constraint-based multi-temporal image anomaly target detection method of claim 1, wherein the performing mismatching rejection on the matching feature point pairs, using a RANSAC algorithm, comprises:

The corresponding solution formula is:

4. The epipolar constrained multi-temporal image anomaly target detection method of claim 3, further comprising:

and when the number of obtained optimized feature point pairs is greater than or equal to a set number threshold, performing the subsequent step of calculating the fundamental matrix.

5. The epipolar constraint-based multi-temporal image abnormal target detection method of claim 1, wherein calculating a fundamental matrix using the optimized feature point pairs comprises:

wherein p_l and p_r are the projections of the target point p to be detected under the first camera and the second camera, respectively; F is the fundamental matrix representing the relationship between epipolar lines and points in the constraint, and the fundamental matrix F is expressed as:

6. The epipolar constraint-based multi-temporal image abnormal target detection method of claim 1, wherein selecting random points in a region of interest in the first image, outputting the pixel coordinates of the random points, and calculating, from the values of the fundamental matrix and the pixel coordinates, the epipolar lines corresponding to these points in the second image comprises:

7. The epipolar constraint-based multi-temporal image abnormal target detection method according to claim 1, wherein using the template image to find the image region most similar to the template image in the second image comprises the following steps:

8. The epipolar constraint-based multi-temporal image abnormal target detection method according to claim 1, wherein the set epipolar threshold is 0.95, that is, when more than 95% of the epipolar lines intersect the labeling box, the target to be detected is determined not to have moved abnormally; otherwise, the target to be detected is determined to be abnormal.

9. An epipolar constraint-based multi-temporal image abnormal target detection system, comprising:

10. A terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor is operable to perform the method of any one of claims 1-8 or to run the system of claim 9 when the program is executed by the processor.

11. A computer readable storage medium having stored thereon a computer program, which when executed by a processor is operative to perform the method of any one of claims 1-8 or to run the system of claim 9.