WO2022000838A1

WO2022000838A1 - Markov random field-based method for labeling remote control tower video target

Info

Publication number: WO2022000838A1
Application number: PCT/CN2020/118643
Authority: WO
Inventors: 何亮; 程先峰; 杨恺; 叶鑫鑫; 刘胜新
Original assignee: 南京莱斯信息技术股份有限公司
Priority date: 2020-07-03
Filing date: 2020-09-29
Publication date: 2022-01-06
Also published as: CN111814654A; CN111814654B

Abstract

A Markov random field-based method for labeling a remote control tower video target, comprising the steps of: 1) establishing a model; 2) using a greedy algorithm to solve a sparse representation of a sequence of consecutive video frames, and obtaining a preliminary estimation of the background; 3) using a recurrent neural network to solve an image segmentation problem, and obtaining a foreground target tracking result and a background estimation; 4) using the nearest neighbor method to establish a correspondence between the positions of target coordinate points in the world coordinate system and automatic dependent surveillance broadcast data, so as to associate label information in the automatic dependent surveillance broadcast data with a video, thus achieving automatic labeling. The described method utilizes a sparse sampling means to reduce a data set of a calculation operation and reduce the complexity of solving a background. By using the background as an input and using a Hopfield network self-optimizing feature, an optimized estimation of a foreground target is automatically formed.

Description

Method of hanging signage on remote tower video target based on Markov random field

technical field

The invention belongs to the technical field of remote towers, and specifically refers to a method for hanging signs on video objects of remote towers based on a Markov random field.

Background technique

At present, with the accelerated pace of people's life, air travel has become an important way of travel, and the construction of general airports is also accelerating step by step. It is expected that the total number of domestic general airports will exceed 2,000 in 2030; The number of airport flights is small, and the daily income is limited. According to the traditional airport construction and control construction planning tower, its construction cost and operating cost cannot be offset and benefited in the regular operation cycle. Moreover, the explosive growth of regional airports and general aviation airports will inevitably drive the demand for controller talents, and the training of controller talents cannot fully keep up with the needs of airport construction. In addition, the need for apron control handover and runway expansion is further driving the development of remote tower technology.

Remote tower video surveillance can effectively help controllers to manage surface traffic, but video surveillance can only provide image information. The video target automatic signage function can intuitively and accurately display the flight number, speed, model and other signage information in the video, effectively reduce the controller's control load, improve the control efficiency, and ensure the control safety.

The existing video and surveillance data fusion automatic labeling methods mainly use the background difference method and KLT algorithm to realize the detection and tracking of the aircraft, select the target center point in a single frame image as the video position coordinates, and perform coordinate transformation on the aircraft latitude and longitude in the surveillance data. It is mapped with the video position coordinates, but the single-frame image coordinate mapping method has the problem of signage delay and loss.

The mixed Gaussian model is used to establish the background model, the coordinates of the aircraft image are obtained by the background difference method, and then the feature points are selected on the airport map and the video image to establish the mapping relationship, so as to realize the image tracking data and broadcast automatic dependent surveillance (ADS-B, Automatic Dependent Surveillance). Surveillance Broadcast) data fusion, this method uses covariance matrix and homography mapping to correct measurement errors, focusing on reducing the correlation error between image detection results and radar tracking results, ignoring video tracking results errors. At the same time, the impact of hardware cost is ignored. For example, the method of single-frame matching and association is used. For each video frame, it is necessary to process the workflow of image target detection, coordinate mapping, error correction, database search and correlation monitoring data, which is affected by system performance. When dealing with consecutive multi-frame targets, there may be a delay or target loss.

In the motion detection model method, the motion segmentation method classifies the pixels according to the motion mode. The common one is the KLT method, which decomposes the image into different motion levels according to the vector velocity field of the moving target on the pixel surface and different motion parameters. The method does not require prior information, but the calculation is complex and the hardware cost is high.

SUMMARY OF THE INVENTION

In view of the above-mentioned deficiencies of the prior art, the purpose of the present invention is to provide a method for hanging a signage on a remote tower video target based on a Markov random field. It takes the background as an input, and uses the Hopfield network's autonomous optimization characteristics to automatically form an optimal estimate of the foreground target.

For achieving the above object, the technical scheme adopted in the present invention is as follows:

A method for hanging a signage of a remote tower video target based on a Markov random field of the present invention, the steps are as follows:

1) Build the model: Assuming that the background images in consecutive video frames are linearly correlated, the moving objects are regarded as pixels that cannot be included in the background matrix during the linear decomposition of the video sequence. Pixels are classified into background and foreground;

2) Use a greedy algorithm to solve the sparse representation of the continuous video frame sequence to obtain a preliminary estimate of the background;

3) Use the recursive (Hopfield) neural network to solve the image segmentation problem, and obtain an estimate of the foreground label set; use the foreground label set to revise the preliminary estimate of the background obtained in step 2), and obtain the foreground target tracking result and background estimation ;

4) Use the pinhole perspective model to establish the transformation matrix from the video image coordinate system to the world coordinate system, and solve the coordinate position of the foreground target tracking result in the video frame under the world coordinate system; use the nearest neighbor method to establish the above-mentioned world coordinate system. The corresponding relationship between the target coordinate point position and the ADS-B broadcast data, so as to associate the sign information in the ADS-B broadcast with the video, and realize the automatic hanging of the sign.

Further, the step 1) comprises: I _t ∈R ^m using a vector represented by the video sequence in the t-th frame is stacked to form columns, the frame includes m _{pixels; D = [I 1, ...} , I _^n] ∈R ^{m ×} _n matrix by a vector showing a frame composed of I, representing the entire video sequence comprising n frames; B∈R ^{m × n} matrix with the same dimension D, representing the background video frame, the It consists of n frame vectors, each frame has m pixels; the kth pixel of the tth frame is denoted as kt; the intensity of the background is measured by the image grayscale, and the continuous video frame is considered to be a continuous video frame when the lighting conditions are basically unchanged during the inspection period. The background intensity in the sequence is basically unchanged, so for a continuous video sequence D, the background images in each frame are considered to be linearly related, and the moving target is regarded as a pixel that cannot be included in the background matrix B during the linear decomposition of the video sequence. Denoted as foreground E, the target in the current frame t is regarded as a linear representation in the subspace stretched by the vector of the previous t-1 frame, and the matrix composed of the previous t-1 frame is D _t-1 =[I ₁ ,...,I _t-1 ], then the image of the t-th frame is recorded as:

y _t =B+E=D _t-1 x+E (1)

_{The matrix B=D t-1} x composed of the background in each frame is a low-rank matrix, that is, the background matrix B satisfies rank(B)≤K, K is a predefined constant, and the coefficient x is a sparse vector; considering the noise in the scene , and assuming that the noise obeys ^{the Gaussian white noise with mean 0 and variance σ 2} , the video frame signal of formula (1) is expressed as:

Wherein, I is a unit matrix, x _e I represents white Gaussian noise, and under the influence of noise, the size of the pixel gray value of the video image of the t-th frame is denoted as y _kt =B _kt +e _kt =ψ _kt x+e _kt ; Define the binary label support set S∈{0,1} ^m×n as the image pixel label, and its elements are specified as:

Then the background modeling problem boils down to solving the optimization problem shown in the following equation (4):

When _Skt = 1, that is, the pixel kt belongs to the foreground, the background is covered by the foreground, and the grayscale of the video frame signal is the same as the foreground, so the detection of the target is actually the estimation of the foreground label set; The interaction causes the image label field to be non-segmentally smooth. Define E _{smooth to} record the degree of non-segmental smoothness of the label field, and E _{data to} record the error between the label and the measured data. The problem of estimating the foreground label set is transformed into the energy optimization of the label field. The problem, i.e.:

E(S)=E _smooth (S)+E _data (S) (5)

get the minimum value;

Define an orthogonal projection of a matrix X in the linear matrix space of the support set S:

is the complement of Γ _S (X), we have

The detection of the dynamic aircraft target y in the video frame is the minimization of the following energy function;

where the parameter α>0 is a constant related to the sparsity of the coefficient vector x, which controls the complexity of the background.

Further, the step 2) specifically includes: assuming that the optimal support set estimate S has been obtained, the formula (7) is simplified to the following optimization problem:

Use a Gaussian random matrix Φ as the RIP matrix to compress and sample the observation y:

z=Φy=ΦΨx=Θx (9)

Then the problem shown in Equation (8) is transformed into the L1 normal form minimization problem shown in Equation (10):

min||x|| ₁ st||Φy-Θx|| ₂ ≤ε (10)

During initialization, a short video at the beginning of the video is used as a training frame, the background complexity is known, the influence of parameter α is ignored, and α=1, and the greedy algorithm is used to solve (10) to obtain the initial background estimate. The optimization solution of the foreground label set further optimizes the background estimation, and in the subsequent iterations, the current frame y is used to replace the template with the smallest sparse representation coefficient x in _{D t-1.}

Further, the step 3) specifically includes: when the sparse coefficient x is given, the energy function shown in formula (7) is converted into:

in,

Given x, the constant C is also determined; in order to obtain the estimation of the support S in Eq. (11), so as to obtain the foreground image in each frame, the image based on Markov random fields (MRFs) is used. segmentation method;

Use G={(i,j)|0≤i≤h,0≤j≤w} to represent the set of all pixels in the h×w image of the current frame, and g=(i,j)∈G to represent the two-dimensional image The pixel in the i-th row and the j-th column defines the neighborhood of the pixel as N _g ={f∈G|[dist(f,g)] ² ≤r,f≠g}, where dist(f,g ) represents the Euclidean distance between pixel positions; for the subset c in the image G, when each pair of different elements is always adjacent, a cluster is formed, and C is the set of all clusters c;

For each pixel position g on the image, it corresponds to a ^{random value in a label support set S∈{0,1} m×n} . It is assumed that the local conditional probability of the label value of the foreground pixel only varies with the state of its neighborhood. The value set S of the pixel label including the positional relationship is a Markov random field about the neighborhood system N, according to the observed image data Y, the value of the pixel label is selected according to the observed image data Y. The value can be derived from the Bayesian criterion:

Among them, P(Y) is the prior distribution of the observed data, and a given video frame image can be regarded as a constant; P(S) is the prior distribution of the label field. According to the Hammersley-Cliford theorem, the potential of a given cluster is function V _c (l _c ), using

Fitting the label field prior distribution, l _c represents a reference point on the cluster c,

is the sum of the energy of the potential functions on each cluster; the definition of the potential function in the Ising model is:

in,

is the label at point g of the image pixel in the t-th frame, q is the point on the neighborhood of g,

k is the Boltzmann constant, and when the temperature T is constant, β is a constant; at this time, the prior distribution of the label field is:

P(Y|S) is the likelihood probability. Usually, it is assumed that each pixel is independent and has the same Gaussian distribution. The likelihood probability is regarded as the product of the likelihood probability at each pixel: P(Y|S)=∏ _{g∈ G} P(y _g |s _g ), take the logarithm of it to get:

in,

and

are the mean and variance of the Gaussian distribution obeyed by each label, respectively; the maximum a posteriori probability (Maximum A Posteriori, MAP) criterion is selected as the optimal judgment criterion for image segmentation, then the optimal solution of the objective function is to maximize the equation (12). For the solution of the posterior probability, take the logarithm of both sides to obtain the following objective function:

The optimal solution of the objective function shown in Eq. (16) is solved by using the self-optimization characteristic of the recurrent neural network.

Further, the step 3) specifically also includes: let u _k , v _k be the input and output voltages of the kth neuron in the recurrent neural network, respectively, R _k , C _k are their input resistance and input capacitance, respectively, I _k is the bias current, g _k (u _k ) is the transfer function _{of the neuron, ω jk} is the connection resistance between the neuron j and the neuron k, that is, the connection weight, the overall energy function of the network usually has the following form:

Taking the derivative of the above energy function with respect to time, we have:

Since C _k > 0, the sigmoid function is selected

As a transfer function, g ^-1 is a monotone non-decreasing function, and

At this time, the energy function shown in Equation (17) shows a downward decay trend as a whole over time, and is simplified as:

When the network is stable, the energy function converges to the minimum value, so the recurrent neural network realizes the autonomous iterative optimization of the input signal;

According to the autonomous optimization characteristics of the recurrent neural network, the images are labeled

As an input to a recurrent neural network, while setting the network's bias current

According to equation (19), the energy function of the network is:

Binarize the image, at this time the pixel value on the image

Equivalent to label

The 8-neighborhood second-order system model is used to model the image label field, and the Ising function shown in equation (13) is selected as the potential function to obtain the estimation of the foreground label:

in,

is a constant term. Comparing equations (20) and (21), it is found that the estimation of the foreground label is regarded as an autonomous optimization solution to the minimum value of the energy function of the recurrent neural network shown in (20).

Further, the step 4) specifically includes: obtaining the tracking and monitoring of the aircraft target in the video image coordinate system by estimating the above-mentioned background and foreground, establishing the mapping relationship between the image pixel coordinates and the world coordinates, and finding relevant information in the radar tracking results. aircraft placard information;

Assuming that the coordinates of the target point in the pixel plane coordinate system are (u, v) ^T , and the coordinates in the world coordinate system are (x, y, z) ^T , using the pinhole perspective model, the pixel plane coordinates of the target point are obtained in the world Coordinate conversion relationship:

Among them, f _x , f _y are parameters representing the focal length, (u ₀ , v ₀ ) ^{T is the position of the} main point relative to the image plane (projection plane), that is, the intersection of the main optical axis and the image plane; z _c is the pixel The offset of the plane origin relative to the origin of the camera coordinate system is a constant; R is the rotation matrix of the camera, and T is the translation matrix, denoted:

Then (22) simplifies to:

p _i = KCp _w (23)

Using Markov random field and sparse background to solve, obtain the foreground target in the continuous video frame, adopt the batch method, note P _i =[p _i1 ,p _i2 ,...,p _it ] The target pixel coordinate vector in consecutive t frames is composed of , the corresponding matrix in the world coordinate system is P _w =[p _w1 ,p _w2 ,...p _wt ], then (23) becomes:

P _i = KCP _w (24)

According to the formula (24), the coordinates of the tracking result of the foreground target in the world coordinate system are obtained, and the nearest neighbor method is used to establish the corresponding relationship between the video tracking coordinates and the ADS-B data, so as to realize the data association, so as to integrate the ADS-B in the data. The signage information of the flight number is associated with the video to realize automatic signage.

Beneficial effects of the present invention:

1. The background is modeled as a sparse representation of a continuous video frame sequence, and the complexity of the background solution can be reduced by using a greedy algorithm to solve the sparse signal recovery problem.

2. Define the foreground solution as an image segmentation problem based on the Markov random field. On the basis of obtaining the background layer, use the independent optimization characteristics of the Hopfield network to establish the corresponding relationship between the network input and the Markov random field energy function of the foreground modeling, Automatically optimize the image label set to obtain a smooth foreground target. The foreground target can be fed back to the background solution process, and the number of iterations controls the computational complexity of the overall foreground and background estimation.

3. After the moving target is automatically captured in the continuous frame video image, the corresponding relationship between the image coordinates and the broadcast automatic correlation monitoring data is established through coordinate conversion, and the single-frame look-up table mapping method is transformed into the batch processing method through the transformation matrix. The target image coordinates in consecutive frames are converted into world coordinates, and then the database is searched and correlated with the broadcast ADS-C data according to the nearest neighbor principle, which reduces the problem of signage delay and target loss caused by processing performance limitations to a certain extent.

Description of drawings

FIG. 1 is a schematic diagram of the method of the present invention.

Figure 2 is a diagram of a recurrent neural network neuron model.

detailed description

Terminology Description:

Sparse: If the linear representation of a real-valued, finite-length one-dimensional discrete signal y∈R ^N contains only K bases, the signal y is said to be K-sparse, and K is called the sparsity of the signal y.

Compressed sampling: Also known as Compressive sensing or Sparse sampling, it exploits the sparse characteristics of the signal and uses random sampling to obtain discrete samples of the signal under the condition that the sampling rate is much smaller than the Nyquist sampling rate. The signal is then perfectly reconstructed by a nonlinear reconstruction algorithm.

Image segmentation: The technology and process of dividing an image into several specific regions with unique properties and proposing objects of interest. It is a computer vision task of marking designated regions according to the content of the image.

Markov random field: A random field with Markov properties. When a value in the phase space is randomly assigned to each position according to a certain distribution, the whole is called a random field; and the Markov property refers to that when a sequence of random variables is arranged in time sequence, the N+1th moment The distribution characteristics of , have nothing to do with the value of the random variable before N time.

In order to facilitate the understanding of those skilled in the art, the present invention will be further described below with reference to the embodiments and the accompanying drawings, and the contents mentioned in the embodiments are not intended to limit the present invention.

Referring to shown in Figure 1, a kind of remote tower video target hanging sign method based on Markov random field of the present invention, the steps are as follows:

Said step 1) comprises: I _t ∈R ^m using a vector represented by the video sequence in the t-th frame is stacked to form columns, the frame includes m _{pixels; D = [I 1, ...} , I n] ∈ R ^{m×n is} a matrix composed of a vector I representing a frame, representing the entire video sequence including n frames; B ∈ R ^m×n is a matrix of the same dimension as D, representing the background in the video frame, which is represented by the n frame vector It consists of m pixels per frame; the k-th pixel of the t-th frame is denoted as kt; the intensity of the background is measured by the image grayscale, and the lighting conditions are basically unchanged during the inspection period, it is considered that the background in the continuous video frame sequence The intensity is basically unchanged, so for a continuous video sequence D, the background images in each frame are considered to be linearly related, and the moving objects are regarded as pixels that cannot be included in the background matrix B during the linear decomposition of the video sequence, and are recorded as foreground. E, regard the target in the current frame t as a linear representation in the subspace stretched by the vector of the previous t-1 frame, and denote the matrix composed of the previous t-1 frame as D _t-1 =[I ₁ ,..., I _t-1 ], then the image of the t-th frame is recorded as:

y _t =B+E=D _t-1 x+E (1)

E(S)=E _smooth (S)+E _data (S) (5)

get the minimum value;

is the complement of Γ _S (X), we have

Assuming that the optimal support set estimate S has been obtained, equation (7) reduces to the following optimization problem:

z=Φy=ΦΨx=Θx (9)

min||x|| ₁ st||Φy-Θx|| ₂ ≤ε (10)

When the sparse coefficient x is given, the energy function shown in equation (7) is transformed into:

in,

Among them, P(Y) is the prior distribution of the observed data, and a given video frame image can be regarded as a constant; P(S) is the prior distribution of the label field. According to the Hammersley-Cliford theorem, the potential of a given cluster is function V _c (l _c ), the prior distribution of the label field is approximately

l _c represents the label of the point on the cluster c,

in,

and

The step 3) specifically further includes: referring to FIG. 2 , let u _k , v _k be the input and output voltages of the kth neuron in the recurrent neural network, respectively, and R _k , C _k are their input resistance and input voltage respectively. Capacitance, I _k is the bias current, g _k (u _k ) is the transfer function _{of the neuron, ω jk} is the connection resistance between the neuron j and the neuron k, that is, the connection weight, the overall energy function of the network usually has in the form of:

Since C _k > 0, the sigmoid function is selected

As a transfer function, g ^-1 is a monotone non-decreasing function, and

According to equation (19), the energy function of the network is:

Binarize the image, at this time the pixel value on the image

Equivalent to label

in,

4) Use the pinhole perspective model to establish the transformation matrix from the video image coordinate system to the world coordinate system, and solve the coordinate position of the foreground target tracking result under the world coordinate system; use the nearest neighbor method to establish the target coordinate point position under the above-mentioned world coordinate system Corresponding relationship with ADS-B broadcast data, so as to associate the sign information in ADS-B with the video to realize automatic sign hanging;

From the estimation of the above background and foreground, the tracking and monitoring of the aircraft target in the video image coordinate system is obtained, the mapping relationship between the image pixel coordinates and the world coordinates is established, and the relevant aircraft sign information is found in the radar tracking result;

Then (22) simplifies to:

p _i = KCp _w (23)

Using Markov random field and sparse background to solve, obtain the foreground target in the continuous video frame, adopt the batch method, note P _i =[pi _i1 , _pi2 ,...,p _it ] The target pixel coordinate vector in consecutive t frames is composed of , the corresponding matrix in the world coordinate system is P _w =[p _w1 ,p _w2 ,...p _wt ], then (23) becomes:

P _i = KCP _w (24)

There are many specific application ways of the present invention, and the above are only the preferred embodiments of the present invention. It should be pointed out that for those skilled in the art, without departing from the principle of the present invention, several improvements can be made. These Improvements should also be considered as the protection scope of the present invention.

Claims

A method for hanging a signage for a remote tower video target based on a Markov random field, characterized in that the steps are as follows:

1) Build the model: Assuming that the background images in consecutive video frames are linearly correlated, the moving objects are regarded as pixels that cannot be included in the background matrix during the linear decomposition of the video sequence. Pixels are classified into background and foreground;

2) Use a greedy algorithm to solve the sparse representation of the continuous video frame sequence to obtain a preliminary estimate of the background;

3) solve the image segmentation problem by using the recurrent neural network, and obtain the estimation to the foreground label set; utilize the foreground label set, revise the preliminary estimation of the background obtained in step 2), and obtain the foreground target tracking result and the background estimation;

4) Use the pinhole perspective model to establish the transformation matrix from the video image coordinate system to the world coordinate system, and solve the coordinate position of the foreground target tracking result in the video frame under the world coordinate system; use the nearest neighbor method to establish the above-mentioned world coordinate system. The corresponding relationship between the target coordinate point position and the ADS-B broadcast data, so as to associate the sign information in the ADS-B broadcast with the video, and realize the automatic hanging of the sign.
The Markov random field-based method for hanging a signage on a remote tower video target according to claim 1, wherein the step 1) specifically comprises: using I t ∈ R m to represent the image of the t-th frame in the video sequence column vector formed by stacking, the frame includes m pixels; D = [I 1, ... , I n] ∈R m × n vector by a matrix representing the I frame composition, represents the entire video sequence comprising n frames ; B∈R m×n is a matrix of the same dimension as D, representing the background in the video frame, consisting of n frame vectors, each frame has m pixels; the kth pixel of the tth frame is denoted as kt; To measure the intensity of the background, when the lighting conditions are basically unchanged during the investigation period, it is considered that the background intensity in the continuous video frame sequence is basically unchanged. Therefore, for a continuous video sequence D, the background images in each frame are considered to be linear correlation. , the moving target is regarded as a pixel that cannot be included in the background matrix B during the linear decomposition of the video sequence, and is denoted as the foreground E, and the target in the current frame t is regarded as the subspace stretched by the previous t-1 frame vector. A linear representation of , denoting the matrix composed of the first t-1 frames as D t-1 =[I 1 ,...,I t-1 ], then the image of the t-th frame is denoted as:

y t =B+E=D t-1 x+E (1)

The matrix B=D t-1 x composed of the background in each frame is a low-rank matrix, that is, the background matrix B satisfies rank(B)≤K, K is a predefined constant, and the coefficient x is a sparse vector; considering the noise in the scene , and assuming that the noise obeys the Gaussian white noise with mean 0 and variance σ 2 , the video frame signal of formula (1) is expressed as:

Wherein, I is a unit matrix, x e I represents white Gaussian noise, and under the influence of noise, the size of the pixel gray value of the video image of the t-th frame is denoted as y kt =B kt +e kt =ψ kt x+e kt ; Define the binary label support set S∈{0,1} m×n as the image pixel label, and its elements are specified as:

Then the background modeling problem boils down to solving the optimization problem shown in the following equation (4):

When Skt = 1, that is, the pixel kt belongs to the foreground, the background is covered by the foreground, and the grayscale of the video frame signal is the same as the foreground, so the detection of the target is actually the estimation of the foreground label set; The interaction causes the image label field to be non-segmentally smooth. Define E smooth to record the degree of non-segmental smoothness of the label field, and E data to record the error between the label and the measured data. The problem of estimating the foreground label set is transformed into the energy optimization of the label field. The problem, i.e.:

E(S)=E smooth (S)+E data (S) (5)

get the minimum value;

Define an orthogonal projection of a matrix X in the linear matrix space of the support set S:

is the complement of Γ S (X), we have
The detection of the dynamic aircraft target y in the video frame is the minimization of the following energy function;

where the parameter α>0 is a constant related to the sparsity of the coefficient vector x, which controls the complexity of the background.
The Markov random field-based method for hanging a signage on a remote tower video target according to claim 2, wherein the step 2) specifically includes: assuming that the optimal support set estimate S has been obtained, the formula (7) is simplified as The following optimization problem:

Using a Gaussian random matrix Φ as the RIP matrix, compressively sample the observations y:

z=Φy=ΦΨx=Θx (9)

Then the problem shown in Equation (8) is transformed into the L1 normal form minimization problem shown in Equation (10):

min||x|| 1 st ||Φy-Θx|| 2 ≤ε (10)

During initialization, a short video at the beginning of the video is used as a training frame, the background complexity is known, the influence of parameter α is ignored, and α=1, and the greedy algorithm is used to solve Equation (10) to obtain the initial background estimation. On this basis, Through the optimization solution of the foreground label set, the background estimation is further optimized, and in the subsequent iterations, the current frame y is used to replace the template corresponding to the smallest sparse representation coefficient x in D t-1.
The method for hanging a signage on a remote tower video target based on a Markov random field according to claim 3, wherein the step 3) specifically comprises: when the sparse coefficient x is given, the energy shown in the formula (7) The function converts to:

in,
Given x, the constant C is also determined accordingly; in order to obtain the estimation of the support S in the formula (11), so as to obtain the foreground image in each frame, the image segmentation method based on the Markov random field is adopted;

Use G={(i,j)|0≤i≤h,0≤j≤w} to represent the set of all pixels in the h×w image of the current frame, and g=(i,j)∈G to represent the two-dimensional image The pixel in the i-th row and the j-th column defines the neighborhood of the pixel as N g ={f∈G|[dist(f,g)] 2 ≤r,f≠g}, where dist(f,g ) represents the Euclidean distance between pixel positions; for the subset c in the image G, when each pair of different elements is always adjacent, a cluster is formed, and C is the set of all clusters c;

For each pixel position g on the image, it corresponds to a random value in a label support set S∈{0,1} m×n . It is assumed that the local conditional probability of the label value of the foreground pixel only varies with the state of its neighborhood. The value set S of the pixel label including the positional relationship is a Markov random field about the neighborhood system N, according to the observed image data Y, the value of the pixel label is selected according to the observed image data Y. The value can be derived from the Bayesian criterion:

Among them, P(Y) is the prior distribution of the observed data, and a given video frame image can be regarded as a constant; P(S) is the prior distribution of the label field. According to the Hammersley-Cliford theorem, the potential of a given cluster is function V c (l c ), using
Fitting the label field prior distribution, l c represents a reference point on the cluster c,
is the sum of the energy of the potential functions on each cluster; the definition of the potential function in the Ising model is:

in,
is the label at point g of the image pixel in the t-th frame, q is the point on the neighborhood of g,
k is the Boltzmann constant, and when the temperature T is constant, β is a constant; at this time, the prior distribution of the label field is:

P(Y|S) is the likelihood probability. It is usually assumed that each pixel is independent and has the same Gaussian distribution. The likelihood probability is regarded as the product of the likelihood probability at each pixel: P(Y|S)=Π g∈ G P(y g |s g ), take the logarithm of it to get:

in,
and
are the mean and variance of the Gaussian distribution obeyed by each label, respectively; the maximum posterior probability criterion is selected as the optimal judgment criterion for image segmentation, then the optimal solution of the objective function is the solution that maximizes the posterior probability of Equation (12). Take the logarithm to get the following objective function:

The optimal solution of the objective function shown in Eq. (16) is solved by using the self-optimization characteristic of the recurrent neural network.
The Markov random field-based remote tower video target hanging signage method according to claim 4, wherein the step 3) specifically further comprises: let u k and v k be respectively the kth in the recurrent neural network The input and output voltages of the neuron, R k , C k are their input resistance and input capacitance, respectively, I k is the bias current, g k (u k ) is the transfer function of the neuron, ω jk is the neuron j and the neuron The connection resistance between elements k, that is, the connection weight, the overall energy function of the network usually has the following form:

Taking the derivative of the above energy function with respect to time, we have:

Since C k > 0, the sigmoid function is selected
As a transfer function, g -1 is a monotone non-decreasing function, and
At this time, the energy function shown in Equation (17) shows a downward decay trend as a whole over time, and is simplified as:

When the network is stable, the energy function converges to the minimum value, so the recurrent neural network realizes the autonomous iterative optimization of the input signal;

According to the autonomous optimization characteristics of the recurrent neural network, the images are labeled
As an input to a recurrent neural network, while setting the network's bias current
According to equation (19), the energy function of the network is:

Binarize the image, at this time the pixel value on the image
Equivalent to label
The 8-neighborhood second-order system model is used to model the image label field, and the Ising function shown in equation (13) is selected as the potential function to obtain the estimation of the foreground label:

in,
is a constant term. Comparing equations (20) and (21), it is found that the estimation of the foreground label is regarded as an autonomous optimization solution to the minimum value of the energy function of the recurrent neural network shown in equation (20).
The Markov random field-based remote tower video target hanging signage method according to claim 5, wherein the step 4) specifically comprises: from the estimation of the background and the foreground, obtain the video image coordinate system of the aircraft in the coordinate system. Tracking and monitoring of targets, establishing the mapping relationship between image pixel coordinates and world coordinates, and finding relevant aircraft signage information in the radar tracking results;

Assuming that the coordinates of the target point in the pixel plane coordinate system are (u, v) T , and the coordinates in the world coordinate system are (x, y, z) T , using the pinhole perspective model, the pixel plane coordinates of the target point are obtained in the world Coordinate conversion relationship:

Among them, f x , f y are parameters representing the focal length, (u 0 , v 0 ) T is the position of the main point relative to the image plane, that is, the intersection of the main optical axis and the image plane; z c is the pixel plane origin relative to the image plane The offset of the origin of the camera coordinate system is a constant; R is the rotation matrix of the camera, and T is the translation matrix, note:

The formula (22) simplifies to:

p i = KCp w (23)

Using Markov random field and sparse background to solve, obtain the foreground target in the continuous video frame, adopt the batch method, note P i =[p i1 ,p i2 ,...,p it ] The target pixel coordinate vector in consecutive t frames is composed of , the corresponding matrix in the world coordinate system is P w =[p w1 ,p w2 ,...p wt ], then formula (23) becomes:

P i = KCP w (24)

According to the formula (24), the coordinates of the tracking result of the foreground target in the world coordinate system are obtained, and the nearest neighbor method is used to establish the corresponding relationship between the video tracking coordinates and the ADS-B data, so as to realize the data association, so as to integrate the ADS-B in the data. The signage information of the flight number is associated with the video to realize automatic signage.