CN108734109A - Visual target tracking method and system for image sequences - Google Patents
Visual target tracking method and system for image sequences
- Publication number
- CN108734109A CN108734109A CN201810373435.9A CN201810373435A CN108734109A CN 108734109 A CN108734109 A CN 108734109A CN 201810373435 A CN201810373435 A CN 201810373435A CN 108734109 A CN108734109 A CN 108734109A
- Authority
- CN
- China
- Prior art keywords
- target
- convolution
- regression model
- training
- size
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G06V20/42—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention discloses a visual target tracking method and system for image sequences. The tracking method comprises the following steps: training a convolution regression model for target tracking, using a given initialization image and the rectangular box of the target to be tracked; predicting the position of the target with the trained convolution regression model; further predicting the size of the target on the basis of the position prediction result; and updating the convolution regression model according to the tracked position and size of the target. The invention involves training a holistic target regression model, training a target texture regression model, target position prediction, target size prediction, tracking model updating and related techniques. It can largely overcome the interference of the various environmental factors present in a tracking scene and accurately predict both the position and the size of the target, and therefore has considerable commercial value and research significance.
Description
Technical field
The present invention relates to the technical field of computer vision, and in particular to a visual target tracking method and system for image sequences.
Background technology
In the field of computer vision, intelligent algorithms are often needed to automatically identify and analyze video information, so as to realize intelligent control of equipment. A target tracking algorithm based on visual image sequences can make full use of existing single-image object detection algorithms to track the motion trajectory of a target in a video quickly and reliably, providing technical support for video understanding and analysis.
With the rapid expansion of industrial production, the degree of automation and intelligence in manufacturing processes must be continuously improved. In a video surveillance system, for example, intelligent algorithms are needed to automatically identify and detect abnormal events occurring in the video. A visual target tracking algorithm can automatically track every target in the video and obtain each target's motion trajectory, providing a key technical means for analyzing and understanding anomalous events in the video. Traditional visual target tracking algorithms, however, still suffer from the following defects:
(1) The size of the target cannot be predicted well. Especially when the target undergoes apparent deformation, traditional tracking algorithms cannot predict the target size accurately, so the target is lost in subsequent tracking and no reliable low-level information can be supplied for video analysis and understanding.
(2) The target cannot be tracked accurately and reliably under the interference of multiple environmental factors.
In view of this, existing visual target tracking algorithms urgently need to be improved, and a visual target tracking algorithm is needed that can overcome the interference of multiple environmental factors and accurately predict the position and size of the target.
Summary of the invention
The technical problem to be solved by the present invention is that existing visual target tracking algorithms cannot predict the position and size of the target, and that the tracking process is easily disturbed by the environment, so that target tracking cannot be carried out accurately and reliably.
To solve this technical problem, the present invention provides a visual target tracking method for image sequences, comprising the following steps:
training a convolution regression model for target tracking, using a given initialization image and the rectangular box of the target to be tracked;
predicting the position of the target with the trained convolution regression model;
further predicting the size of the target on the basis of the position prediction result;
updating the convolution regression model according to the tracked position and size of the target.
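The four steps above can be sketched as a skeleton loop: train once on the first frame, then, for every later frame, predict the position, refine the size, and update the model online. This is a structural sketch only; `train`, `predict_position`, `predict_size` and `update` are hypothetical placeholders for the convolution-regression components the following paragraphs describe, not functions defined by the patent.

```python
import numpy as np

def track_sequence(frames, init_box, train, predict_position, predict_size, update):
    """Skeleton of the four-step tracking loop. `init_box` is (x, y, w, h);
    the four callables stand in for the convolution-regression components."""
    model = train(frames[0], init_box)          # initial training on frame 0
    x, y, w, h = init_box
    boxes = [init_box]
    for frame in frames[1:]:
        x, y = predict_position(model, frame)               # position first
        w, h = predict_size(model, frame, (x, y), (w, h))   # then size
        model = update(model, frame, (x, y, w, h))          # online model update
        boxes.append((x, y, w, h))
    return boxes
```

With trivial stand-in callables the loop simply propagates one box per frame, which is the contract the real components must satisfy.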
In the above scheme, training the convolution regression model comprises the following steps:
Step 10: build a feature extraction network for expressing the characteristics of the target; the network can be realized with any feature extraction method capable of expressing target information.
Step 11: extract the features of the current input image with the feature extraction network of step 10.
Step 12: build a holistic convolution regression model oriented to the whole target, realized as a single convolutional layer; the convolution kernel of this layer has the same size as the target in feature space, and the layer has a single output channel. The output of this layer is used to predict the position of the target.
Step 13: based on the features extracted in step 11, generate the corresponding training label map. The label map is generated from a two-dimensional Gaussian function whose peak corresponds to the true position of the target. Optimize the single convolutional layer of step 12 iteratively with a gradient descent algorithm.
Step 14: build a convolution regression model oriented to the target texture, likewise realized as a single convolutional layer; the convolution kernel of this layer has the same size as the target in feature space, and the layer has a single output channel. The output of this layer is used to predict the foreground of the target.
Step 15: based on the features extracted in step 11, generate the corresponding training label map, in which the foreground of the target is marked with a rectangular box, and optimize the single convolutional layer of step 14 iteratively with the gradient descent algorithm.
Step 16: the initial training of the convolution regression model is complete.
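Steps 12 and 13 can be illustrated with a minimal NumPy sketch: a single-channel cross-correlation kernel the size of the target is fitted by gradient descent to a two-dimensional Gaussian label map peaked on the true target position. A small array stands in for the output of the feature extraction network; `gaussian_label`, `xcorr_same`, `train_regression_kernel`, the learning rate and the step count are all illustrative assumptions, not the patent's actual implementation.

```python
import numpy as np

def gaussian_label(height, width, cy, cx, sigma=2.0):
    """Step 13's training label: a 2-D Gaussian whose peak sits on the true position."""
    ys, xs = np.mgrid[0:height, 0:width]
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))

def xcorr_same(feat, kernel):
    """'Same'-size single-channel cross-correlation: the single convolutional layer."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(feat, ((ph, ph), (pw, pw)))
    out = np.empty_like(feat)
    for y in range(feat.shape[0]):
        for x in range(feat.shape[1]):
            out[y, x] = np.sum(padded[y:y + kh, x:x + kw] * kernel)
    return out

def train_regression_kernel(feat, label, ksize, lr=1e-4, steps=200):
    """Fit the kernel to the label map by gradient descent on an L2 regression loss."""
    kh, kw = ksize
    kernel = np.zeros(ksize)
    ph, pw = kh // 2, kw // 2
    padded = np.pad(feat, ((ph, ph), (pw, pw)))
    H, W = feat.shape
    for _ in range(steps):
        err = xcorr_same(feat, kernel) - label      # dLoss/dPrediction
        grad = np.empty(ksize)
        for ky in range(kh):
            for kx in range(kw):
                # dLoss/dKernel[ky, kx] = sum of err weighted by the shifted feature
                grad[ky, kx] = np.sum(err * padded[ky:ky + H, kx:kx + W])
        kernel -= lr * grad
    return kernel
```

Running it on a synthetic feature map with a bright block at the target location drives the response map toward the Gaussian label; the texture model of steps 14 and 15 trains the same way, only with a rectangular foreground label instead of the Gaussian.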
In the above scheme, the target position prediction method comprises the following steps:
Step 20: extract the features of the current input image with the feature extraction network built in step 10, in preparation for subsequent target tracking.
Step 21: feed the image features obtained in step 20 into the holistic convolution regression network obtained in step 12, and compute the target position prediction map H(x_t, y_t) of the holistic target regression model.
Step 22: feed the image features obtained in step 20 into the texture convolution regression network obtained in step 14, and compute the target foreground prediction map T(x_t, y_t) of the target texture regression model.
Step 23: apply a mean filtering operation to the target foreground prediction map obtained in step 22, with a filter template of the same size as the target, and compute the foreground-based target position prediction map F(x_t, y_t).
Step 24: superimpose the two target position prediction maps obtained in steps 21 and 23 to obtain the final target position prediction map, and predict the target position as the index corresponding to the maximum value of that map:
(x_t, y_t) = argmax_{(x, y)} [H(x, y) + F(x, y)].
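Steps 23 and 24 amount to a box mean filter over the foreground map followed by an argmax over the superimposed maps. A minimal NumPy sketch, where `T` stands for the texture model's foreground prediction map, `H_map` for the holistic model's position map, and all function names are illustrative:

```python
import numpy as np

def box_mean(T, h, w):
    """Mean filter with an h-by-w template (zero padding): the value at (y, x) is the
    average foreground probability inside a target-sized rectangle centred there."""
    ph, pw = h // 2, w // 2
    padded = np.pad(T, ((ph, ph), (pw, pw)))
    out = np.empty_like(T, dtype=float)
    for y in range(T.shape[0]):
        for x in range(T.shape[1]):
            out[y, x] = padded[y:y + h, x:x + w].mean()
    return out

def predict_position(H_map, F_map):
    """Step 24: superimpose the two maps and take the argmax as the target position."""
    fused = H_map + F_map
    return np.unravel_index(np.argmax(fused), fused.shape)
```

For a foreground map containing a single target-sized rectangle of high probability, the filtered map peaks exactly at the rectangle's centre, which is why the mean filter converts a foreground map into a position map.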
In the above scheme, the calculation formula of the foreground-based target position prediction map F(x_t, y_t) in step 23 is:
F(x_t, y_t) = (1 / (w_{t-1} · h_{t-1})) · Σ_{(i, j) ∈ R(x_t, y_t, w_{t-1}, h_{t-1})} T(i, j),
where w_{t-1} and h_{t-1} denote the target width and height obtained when tracking the previous frame, R(x_t, y_t, w_{t-1}, h_{t-1}) denotes the rectangular box with centre coordinates (x_t, y_t) and size w_{t-1} × h_{t-1}, and T(i, j) is the value of the target foreground prediction map of the target texture regression model at each pixel (i, j) within that rectangular box.
In the above scheme, the target size prediction method comprises the following steps:
Step 30: extract the features of the current input image with the feature extraction network built in step 10, in preparation for subsequent target tracking.
Step 31: feed the image features obtained in step 30 into the texture convolution regression network obtained in step 14, and compute the target foreground prediction map T(x_t, y_t) of the target texture regression model.
Step 32: given the position (x_t, y_t) of the current target and the known size (w_{t-1}, h_{t-1}) of the target in the previous frame, calculate the posterior probability that the target size is (w_t, h_t).
Step 33: repeat the calculation of step 32 for multiple candidate target sizes, and select the target size with the maximum posterior probability as the final target size prediction.
Step 34: target size prediction is complete.
In the above scheme, the calculation formula of the posterior probability that the target size is (w_t, h_t) in step 32 is:
P(w_t, h_t | O, x_t, y_t, w_{t-1}, h_{t-1}) = P(O | x_t, y_t, w_t, h_t) · P(w_t, h_t | w_{t-1}, h_{t-1}),
where P(O | x_t, y_t, w_t, h_t) denotes the probability of observing the target when its position and size state is (x_t, y_t, w_t, h_t), and P(w_t, h_t | w_{t-1}, h_{t-1}) denotes the target size state transition probability between two adjacent frames. The observation term is calculated as
P(O | x_t, y_t, w_t, h_t) = A(w_t, h_t) − B(w_t, h_t),
where A(w_t, h_t) denotes the average predicted foreground probability inside the candidate target rectangle (x_t, y_t, w_t, h_t), and B(w_t, h_t) denotes the average predicted foreground probability of the background region surrounding that rectangle.
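The observation term A(w_t, h_t) − B(w_t, h_t) and the candidate-size search of step 33 can be sketched as follows. The width of the surrounding background ring (`margin`), the choice of a uniform transition prior over the candidates (i.e. P(w_t, h_t | w_{t-1}, h_{t-1}) dropped from the comparison), and all function names are illustrative assumptions, not the patent's stated implementation.

```python
import numpy as np

def size_likelihood(T, cy, cx, h, w, margin=2):
    """A(w, h) - B(w, h): average foreground probability inside the candidate
    rectangle minus the average in the surrounding background ring."""
    H, W = T.shape
    y0, y1 = max(cy - h // 2, 0), min(cy + (h + 1) // 2, H)
    x0, x1 = max(cx - w // 2, 0), min(cx + (w + 1) // 2, W)
    A = T[y0:y1, x0:x1].mean()
    # Expand by `margin` pixels on every side to form the background ring.
    Y0, Y1 = max(y0 - margin, 0), min(y1 + margin, H)
    X0, X1 = max(x0 - margin, 0), min(x1 + margin, W)
    ring_sum = T[Y0:Y1, X0:X1].sum() - T[y0:y1, x0:x1].sum()
    ring_area = (Y1 - Y0) * (X1 - X0) - (y1 - y0) * (x1 - x0)
    B = ring_sum / ring_area if ring_area > 0 else 0.0
    return A - B

def predict_size(T, cy, cx, candidates):
    """Step 33: score each candidate (w, h) and keep the best one, assuming a
    uniform transition prior over the candidate list."""
    return max(candidates, key=lambda wh: size_likelihood(T, cy, cx, wh[1], wh[0]))
```

A too-small candidate scores badly because its ring still contains foreground, and a too-large one because its interior average drops, so the score is maximized when the rectangle matches the foreground blob.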
In the above scheme, updating the convolution regression model comprises the following steps:
Step 40: according to the predicted target position, generate the label map for training the holistic target convolution regression model, and update the parameters of the single convolutional layer of step 12 with gradient descent.
Step 41: according to the predicted target size, generate the label map for training the texture convolution regression model, and update the parameters of the single convolutional layer of step 14 with gradient descent.
Step 42: the update of the convolution regression model is complete.
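The two label maps regenerated at update time can be sketched directly: a two-dimensional Gaussian peaked at the newly tracked position for the holistic model (step 40) and a rectangle of ones marking the newly tracked foreground for the texture model (step 41). The function names and the σ value are illustrative assumptions; the gradient-descent refresh itself is the same procedure as the initial training.

```python
import numpy as np

def position_label(H, W, cy, cx, sigma=2.0):
    """Step 40's label: a 2-D Gaussian peaked at the newly tracked position."""
    ys, xs = np.mgrid[0:H, 0:W]
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))

def foreground_label(H, W, cy, cx, h, w):
    """Step 41's label: a rectangle of ones marking the newly tracked foreground."""
    Y = np.zeros((H, W))
    y0, y1 = max(cy - h // 2, 0), min(cy + (h + 1) // 2, H)
    x0, x1 = max(cx - w // 2, 0), min(cx + (w + 1) // 2, W)
    Y[y0:y1, x0:x1] = 1.0
    return Y
```

Each new frame's prediction thus becomes the supervision signal for the next gradient-descent refresh, which is what keeps the two single-layer models adapted to appearance changes.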
The present invention also provides a visual target tracking system for image sequences, comprising:
a training module, which trains the convolution regression model for target tracking;
a target position prediction module, which predicts the position of the target with the trained convolution regression model;
a target size prediction module, which further predicts the size of the target on the basis of the target position prediction result;
an update module, which updates the convolution regression model according to the tracked position and size of the target.
Compared with the prior art, the present invention involves training a holistic target regression model, training a target texture regression model, target position prediction, target size prediction, tracking model updating and related techniques. It can largely overcome the interference of the various environmental factors present in a tracking scene and accurately predict the position and size of the target, and therefore has considerable commercial value and research significance.
Description of the drawings
Fig. 1 is a flow diagram of the method of the present invention;
Fig. 2 is the input initial frame image of the present invention;
Fig. 3 is that the convolution regression model of the present invention trains flow diagram;
Fig. 4 is the whole regression model schematic diagram of the present invention;
Fig. 5 is the texture regression model schematic diagram of the present invention;
Fig. 6 is the target prodiction flow diagram of the present invention;
Fig. 7 is the target prodiction figure based on whole regression model of the present invention;
Fig. 8 is the target prospect prognostic chart based on texture regression model of the present invention;
Fig. 9 is the target prodiction figure based on texture regression model of the present invention;
Figure 10 is that the target sizes of the present invention predict flow diagram.
Detailed description of the embodiments
The present invention provides a visual target tracking method for image sequences that can largely overcome the interference of the various environmental factors present in a tracking scene and accurately predict the position and size of the target, and therefore has considerable commercial value and research significance. The present invention is described in detail below with reference to the accompanying drawings and a specific embodiment.
As shown in Fig. 1 and Fig. 2, the visual target tracking method for image sequences provided by the invention may comprise the following steps:
training a convolution regression model for target tracking, using a given initialization image and the rectangular box of the target to be tracked;
predicting the position of the target with the trained convolution regression model;
further predicting the size of the target on the basis of the position prediction result;
finally, updating the convolution regression model according to the tracked position and size of the target.
Correspondingly, the present invention also provides a visual target tracking system for image sequences, comprising a training module, a target position prediction module, a target size prediction module and an update module.
The training module trains the convolution regression model for target tracking;
the target position prediction module predicts the position of the target with the trained convolution regression model;
the target size prediction module further predicts the size of the target on the basis of the target position prediction result;
the update module updates the convolution regression model according to the tracked position and size of the target.
The present invention involves training a holistic target regression model, training a target texture regression model, target position prediction, target size prediction, tracking model updating and related techniques. It can largely overcome the interference of the various environmental factors in the tracking scene and accurately predict the position and size of the target, and therefore has considerable commercial value and research significance.
As shown in Fig. 3, training the convolution regression model specifically comprises the following steps.
First, build a feature extraction network for expressing the characteristics of the target; the network can be realized with any feature extraction method capable of expressing target information. Then extract the features of the current input image with this network.
Next, build a holistic convolution regression model oriented to the whole target, realized as a single convolutional layer. The convolution kernel of this layer has the same size as the target in feature space, and the layer has a single output channel; its output is used to predict the position of the target.
Based on the extracted features expressing the target information, generate the corresponding training label map. The label map is generated from a two-dimensional Gaussian function whose peak corresponds to the true position of the target, and the single convolutional layer is optimized iteratively with a gradient descent algorithm.
Then build a convolution regression model oriented to the target texture, likewise realized as a single convolutional layer. The convolution kernel of this layer has the same size as the target in feature space, and the layer has a single output channel; its output is used to predict the foreground of the target.
Based on the extracted features, generate the corresponding training label map, in which the foreground of the target is marked with a rectangular box, and optimize this single convolutional layer iteratively with the gradient descent algorithm.
The initial training of the convolution regression model is then complete.
As shown in Fig. 4 to Fig. 9, the target position prediction method specifically comprises the following steps.
Using the feature extraction network built above, extract the features of the current input image in preparation for subsequent target tracking.
Feed the obtained image features into the holistic target convolution regression network described above, and compute the target position prediction map H(x_t, y_t) of the holistic target regression model.
Feed the same image features into the convolution regression network oriented to the target texture, and compute the target foreground prediction map T(x_t, y_t) of the target texture regression model.
Apply a mean filtering operation to the target foreground prediction map, with a filter template of the same size as the target, and compute the foreground-based target position prediction map F(x_t, y_t), whose calculation formula is:
F(x_t, y_t) = (1 / (w_{t-1} · h_{t-1})) · Σ_{(i, j) ∈ R(x_t, y_t, w_{t-1}, h_{t-1})} T(i, j),
where w_{t-1} and h_{t-1} denote the target width and height obtained when tracking the previous frame, R(x_t, y_t, w_{t-1}, h_{t-1}) denotes the rectangular box with centre coordinates (x_t, y_t) and size w_{t-1} × h_{t-1}, and T(i, j) is the value of the target foreground prediction map of the texture regression model at each pixel (i, j) within that rectangular box.
Finally, superimpose the two target position prediction maps H(x_t, y_t) and F(x_t, y_t) to obtain the final target position prediction map, and predict the target position as the index corresponding to its maximum value:
(x_t, y_t) = argmax_{(x, y)} [H(x, y) + F(x, y)].
As shown in Fig. 10, the target size prediction method comprises the following steps.
Using the feature extraction network built above, extract the features of the current input image in preparation for subsequent target tracking.
Feed the obtained image features into the convolution regression network oriented to the target texture, and compute the target foreground prediction map T(x_t, y_t) of the target texture regression model.
Given the position (x_t, y_t) of the current target and the known size (w_{t-1}, h_{t-1}) of the target in the previous frame, calculate the posterior probability that the target size is (w_t, h_t) as:
P(w_t, h_t | O, x_t, y_t, w_{t-1}, h_{t-1}) = P(O | x_t, y_t, w_t, h_t) · P(w_t, h_t | w_{t-1}, h_{t-1}),
where P(O | x_t, y_t, w_t, h_t) denotes the probability of observing the target when its position and size state is (x_t, y_t, w_t, h_t), and P(w_t, h_t | w_{t-1}, h_{t-1}) denotes the target size state transition probability between two adjacent frames. The observation term is calculated as
P(O | x_t, y_t, w_t, h_t) = A(w_t, h_t) − B(w_t, h_t),
where A(w_t, h_t) denotes the average predicted foreground probability inside the candidate target rectangle (x_t, y_t, w_t, h_t), and B(w_t, h_t) denotes the average predicted foreground probability of the background region surrounding that rectangle.
Repeat the above calculation for multiple candidate target sizes and select the target size with the maximum posterior probability as the final target size prediction.
Target size prediction is then complete.
Updating the convolution regression model mainly comprises the following steps.
According to the predicted target position, generate the label map for training the holistic target convolution regression model, and update the parameters of the corresponding single convolutional layer with gradient descent.
According to the predicted target size, generate the label map for training the texture convolution regression model, and update the parameters of the corresponding single convolutional layer with gradient descent.
The update of the convolution regression model is then complete.
The method of the present invention takes a continuous sequence of video images as input data. After the rectangular box of the target to be tracked is given, the target in the image sequence is tracked continuously through the steps of holistic target regression model training, target texture regression model training, target position prediction, target size prediction and tracking model updating. The method can track the target accurately when the target rotates or is occluded, and it solves the problem that traditional visual target tracking algorithms find it difficult to predict the target size accurately: the size of the target can be predicted accurately even when the target deforms. The proposed method offers high tracking accuracy, a fast running speed and insensitivity to background interference, and has broad application prospects in industrial control, automated production and similar settings.
The invention is not limited to the above preferred embodiment. Any structural change made under the inspiration of the present invention, and any technical scheme that is the same as or similar to the present invention, falls within the protection scope of the present invention.
Claims (8)
1. A visual target tracking method for image sequences, characterized in that it comprises the following steps:
training a convolution regression model for target tracking, using a given initialization image and the rectangular box of the target to be tracked;
predicting the position of the target with the trained convolution regression model;
further predicting the size of the target on the basis of the position prediction result;
updating the convolution regression model according to the tracked position and size of the target.
2. The visual target tracking method for image sequences according to claim 1, characterized in that training the convolution regression model comprises the following steps:
step 10: building a feature extraction network for expressing the characteristics of the target, the network being realizable with any feature extraction method capable of expressing target information;
step 11: extracting the features of the current input image with the feature extraction network of step 10;
step 12: building a holistic convolution regression model oriented to the whole target, realized as a single convolutional layer, the convolution kernel of which has the same size as the target in feature space and which has a single output channel, the output of this layer being used to predict the position of the target;
step 13: based on the features extracted in step 11, generating the corresponding training label map, the label map being generated from a two-dimensional Gaussian function whose peak corresponds to the true position of the target, and optimizing the single convolutional layer of step 12 iteratively with a gradient descent algorithm;
step 14: building a convolution regression model oriented to the target texture, realized as a single convolutional layer, the convolution kernel of which has the same size as the target in feature space and which has a single output channel, the output of this layer being used to predict the foreground of the target;
step 15: based on the features extracted in step 11, generating the corresponding training label map, in which the foreground of the target is marked with a rectangular box, and optimizing the single convolutional layer of step 14 iteratively with the gradient descent algorithm;
step 16: ending the initial training of the convolution regression model.
3. The visual target tracking method for image sequences according to claim 2, characterized in that the target position prediction method comprises the following steps:
step 20: extracting the features of the current input image with the feature extraction network built in step 10, in preparation for subsequent target tracking;
step 21: feeding the image features obtained in step 20 into the holistic convolution regression network obtained in step 12, and computing the target position prediction map H(x_t, y_t) of the holistic target regression model;
step 22: feeding the image features obtained in step 20 into the texture convolution regression network obtained in step 14, and computing the target foreground prediction map T(x_t, y_t) of the target texture regression model;
step 23: applying a mean filtering operation to the target foreground prediction map obtained in step 22, with a filter template of the same size as the target, and computing the foreground-based target position prediction map F(x_t, y_t);
step 24: superimposing the two target position prediction maps obtained in steps 21 and 23 to obtain the final target position prediction map, and predicting the target position as the index corresponding to the maximum value of that map:
(x_t, y_t) = argmax_{(x, y)} [H(x, y) + F(x, y)].
4. The visual target tracking method for image sequences according to claim 3, characterized in that the calculation formula of the foreground-based target position prediction map F(x_t, y_t) in step 23 is:
F(x_t, y_t) = (1 / (w_{t-1} · h_{t-1})) · Σ_{(i, j) ∈ R(x_t, y_t, w_{t-1}, h_{t-1})} T(i, j),
where w_{t-1} and h_{t-1} denote the target width and height obtained when tracking the previous frame, R(x_t, y_t, w_{t-1}, h_{t-1}) denotes the rectangular box with centre coordinates (x_t, y_t) and size w_{t-1} × h_{t-1}, and T(i, j) is the value of the target foreground prediction map of the target texture regression model at each pixel (i, j) within that rectangular box.
5. The visual target tracking method for image sequences according to claim 2, characterized in that the target size prediction method comprises the following steps:
step 30: extracting the features of the current input image with the feature extraction network built in step 10, in preparation for subsequent target tracking;
step 31: feeding the image features obtained in step 30 into the texture convolution regression network obtained in step 14, and computing the target foreground prediction map T(x_t, y_t) of the target texture regression model;
step 32: given the position (x_t, y_t) of the current target and the known size (w_{t-1}, h_{t-1}) of the target in the previous frame, calculating the posterior probability that the target size is (w_t, h_t);
step 33: repeating the calculation of step 32 for multiple candidate target sizes, and selecting the target size with the maximum posterior probability as the final target size prediction;
step 34: ending target size prediction.
6. The visual target tracking method for image sequences according to claim 5, characterized in that in the step 32 the posterior probability that the target size is w_t, h_t is computed as: P(w_t, h_t | O, x_t, y_t, w_{t-1}, h_{t-1}) = P(O | x_t, y_t, w_t, h_t) P(w_t, h_t | w_{t-1}, h_{t-1}), where P(O | x_t, y_t, w_t, h_t) denotes the probability that the position and size state of the target is (x_t, y_t, w_t, h_t), P(w_t, h_t | w_{t-1}, h_{t-1}) denotes the target size state transition probability between two adjacent frames, and P(O | x_t, y_t, w_t, h_t) = A(w_t, h_t) - B(w_t, h_t), where A(w_t, h_t) denotes the average target foreground probability over the candidate target rectangle (x_t, y_t, w_t, h_t), and B(w_t, h_t) denotes the average target foreground probability over the background region surrounding the target rectangle (x_t, y_t, w_t, h_t).
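The likelihood term A(w, h) - B(w, h) can be sketched directly from a foreground probability map. The width of the surrounding background ring is an illustrative parameter, and the size state transition prior P(w_t, h_t | w_{t-1}, h_{t-1}), whose exact form is not reproduced in the text, is left to the caller:

```python
import numpy as np

def foreground_likelihood(T, cx, cy, w, h, margin=4):
    """P(O|x_t,y_t,w_t,h_t) = A(w,h) - B(w,h): mean foreground probability
    inside the candidate rectangle minus the mean over a surrounding
    background ring (ring width `margin` is an illustrative choice)."""
    H, W = T.shape
    x0, y0 = max(cx - w // 2, 0), max(cy - h // 2, 0)
    x1, y1 = min(x0 + w, W), min(y0 + h, H)
    inner = T[y0:y1, x0:x1]
    A = inner.mean()
    outer = T[max(y0 - margin, 0):min(y1 + margin, H),
              max(x0 - margin, 0):min(x1 + margin, W)]
    ring_area = outer.size - inner.size
    B = (outer.sum() - inner.sum()) / max(ring_area, 1)
    return float(A - B)
```

A candidate box that fits the foreground tightly scores higher than a loose one, since a loose box both lowers A and pulls foreground pixels into neither region.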
7. The visual target tracking method for image sequences according to claim 2, characterized in that updating the convolutional regression models comprises the following steps:
Step 40, according to the predicted target position, generate the label map for training the whole-target-oriented convolutional regression model, and update the parameters of the single-convolution-layer network in step 12 using the gradient descent method;
Step 41, according to the predicted target size, generate the label map for training the target-texture-oriented convolutional regression model, and update the parameters of the single-convolution-layer network in step 14 using the gradient descent method;
Step 42, the convolutional regression model update ends.
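The claims do not give the form of the label map; a Gaussian peaked at the predicted position is the standard choice in convolutional regression tracking (e.g. CREST, listed among the non-patent references). A minimal sketch of one gradient-descent update, using a 1x1 convolution (a per-pixel linear map over feature channels) so the gradient is explicit; all shapes and the learning rate are illustrative:

```python
import numpy as np

def gaussian_label_map(H, W, cx, cy, sigma=2.0):
    """Soft regression target peaked at the tracked position (steps 40-41).
    The Gaussian form is an assumption; the claims only say 'label map'."""
    ys, xs = np.mgrid[0:H, 0:W]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))

def sgd_step(weights, feats, label, lr=1e-3):
    """One gradient-descent step on an L2 regression loss for a 1x1
    convolution: response(y, x) = sum_c weights[c] * feats[c, y, x]."""
    response = np.tensordot(weights, feats, axes=1)  # (H, W)
    err = response - label
    grad = (feats * err).sum(axis=(1, 2))            # dLoss/dweights
    return weights - lr * grad
```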
8. A visual target tracking system for image sequences, characterized in that it comprises:
a training module, which trains the convolutional regression models used for target tracking;
a target position prediction module, which predicts the position of the target using the trained convolutional regression models;
a target size prediction module, which further predicts the size of the target on the basis of the target position prediction result;
an update module, which updates the convolutional regression models according to the target position and size obtained from tracking.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810373435.9A CN108734109B (en) | 2018-04-24 | 2018-04-24 | Visual target tracking method and system for image sequence |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108734109A true CN108734109A (en) | 2018-11-02 |
CN108734109B CN108734109B (en) | 2020-11-17 |
Family
ID=63939209
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810373435.9A Active CN108734109B (en) | 2018-04-24 | 2018-04-24 | Visual target tracking method and system for image sequence |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108734109B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109829936A (en) * | 2019-01-29 | 2019-05-31 | 青岛海信网络科技股份有限公司 | A kind of method and apparatus of target tracking |
CN111027586A (en) * | 2019-11-04 | 2020-04-17 | 天津大学 | Target tracking method based on novel response map fusion |
WO2020224479A1 (en) * | 2019-05-06 | 2020-11-12 | 腾讯科技(深圳)有限公司 | Method and apparatus for acquiring positions of target, and computer device and storage medium |
CN112378397A (en) * | 2020-11-02 | 2021-02-19 | 中国兵器工业计算机应用技术研究所 | Unmanned aerial vehicle target tracking method and device and unmanned aerial vehicle |
CN112465859A (en) * | 2019-09-06 | 2021-03-09 | 顺丰科技有限公司 | Method, device, equipment and storage medium for detecting fast moving object |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102881012A (en) * | 2012-09-04 | 2013-01-16 | 上海交通大学 | Vision target tracking method aiming at target scale change |
CN103106667A (en) * | 2013-02-01 | 2013-05-15 | 山东科技大学 | Motion target tracing method towards shielding and scene change |
CN103632382A (en) * | 2013-12-19 | 2014-03-12 | 中国矿业大学(北京) | Compressive sensing-based real-time multi-scale target tracking method |
CN105321188A (en) * | 2014-08-04 | 2016-02-10 | 江南大学 | Foreground probability based target tracking method |
EP3229206A1 (en) * | 2016-04-04 | 2017-10-11 | Xerox Corporation | Deep data association for online multi-class multi-object tracking |
CN107403175A (en) * | 2017-09-21 | 2017-11-28 | 昆明理工大学 | Visual tracking method and Visual Tracking System under a kind of movement background |
US20180032817A1 (en) * | 2016-07-27 | 2018-02-01 | Conduent Business Services, Llc | System and method for detecting potential mugging event via trajectory-based analysis |
Non-Patent Citations (5)
Title |
---|
CHAO MA et al.: "Adaptive Correlation Filters with Long-Term and Short-Term Memory for Object Tracking", International Journal of Computer Vision (2018) |
KAI CHEN et al.: "Convolutional Regression for Visual Tracking", arXiv |
MARTIN DANELLJAN et al.: "Accurate Scale Estimation for Robust Visual Tracking", BMVC 2014 |
YIBING SONG et al.: "CREST: Convolutional Residual Learning for Visual Tracking", 2017 IEEE International Conference on Computer Vision |
QI Fei et al.: "Survey of Mean-Shift-Based Visual Target Tracking Methods", Computer Engineering |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |
2023-10-31 | TR01 | Transfer of patent right | Patentee after: Hangzhou Tuke Intelligent Information Technology Co.,Ltd. (No. 548, 5th Floor, Building 10, No. 28 Linping Avenue, Donghu Street, Linping District, Hangzhou City, Zhejiang Province); Patentee before: SOUTH CENTRAL University FOR NATIONALITIES (No. 182, Minzu Avenue, Hongshan District, Wuhan City, Hubei Province) |