CN106530324A

CN106530324A - Visual cortex mechanism simulated video object tracking method

Info

Publication number: CN106530324A
Application number: CN201610921845.3A
Authority: CN
Inventors: 陈靓影; 徐如意; 张坤; 刘乐元
Original assignee: Huazhong Normal University
Current assignee: Huazhong Normal University
Priority date: 2016-10-21
Filing date: 2016-10-21
Publication date: 2017-03-22

Abstract

The invention discloses a visual cortex mechanism simulated video object tracking method. The method includes offline training and online tracking. In offline training, image sample sets of a tracking object are constructed; an image pyramid is established for samples; a BIM feature is extracted; and a classifier is trained. In online tracking, a detection area of an image to be detected is initialized, an image pyramid of the detection area is established; the BIM feature is extracted from the image pyramid, matching subblocks are obtained in the BIM feature extracting process, and the matching subblocks are used to predict a candidate object area; potential object areas are selected from the candidate area by iteration according to a determination result of the classifier; and an object area of the highest probability is selected from the potential object areas and serves as a final object area. According to the method, BIM is used to established an appearance model of an object, a tracking framework is introduced, and the object can be tracked stably by simulating the visual cortex mechanism.

Description

A kind of video target tracking method of analog vision cortex mechanism

Technical field

The invention belongs to computer vision and neurobiology field, and in particular to a kind of analog vision cortex mechanism is regarded Frequency method for tracking target.

Background technology

Biological excitation model (Biologically inspired model, BIM) is raw according to nerve by Serre et al. Neo-confucian Hubel and Wiesel is to setting up on the basis of visual cortex achievement in research.The feature extraction of BIM is one 4 layering knot Structure model, 4 Rotating fields are respectively designated as S1, C1, S2 and C2.The V1 layers of the corresponding visual cortex of S1 with C1 layers, extraction can be tolerated and put down Move the target texture feature of conversion；The category feature of target can be distinguished by the MST areas of the corresponding visual cortex of S2 with C2 floor, extraction.By In BIM based on neuro physiology achievement in research, compare other artificial features extracted and there is incomparable superiority, This model is widely used in Computer Vision Task, such as identification of Activity recognition, age, scene Recognition etc..With high-performance The exploitation of GPU and distributed system, BIM is possibly realized for video frequency object tracking task.

The research of video frequency object tracking is mainly comprised the following steps：Target detection, the foundation of target appearance model and to target Fast search.The first step needs from background image to be accurately positioned target, and this link is now as an independence Research direction；Second step is to extract to be capable of the feature of adaptive targets cosmetic variation to set up the display model of target, selected Feature can be comprising information such as the color of target, shape, textures；3rd step designs efficient search strategy, quickly positions mesh Mark.Mainly there is following searching algorithm at present：Particle filter, optical flow method, mean shift algorithm etc..

Video frequency object tracking algorithm research makes great progress in recent years, obtains good tracking effect.But simulate BIM introducing target tracking algorisms can be such that computer realizes more by the rare research of the target tracking algorism of visual cortex mechanism Plus meet the working mechanism of human vision cortex, reach that degree of accuracy is high, the tracking effect that robustness is good.

The content of the invention

The problem existed for prior art and urgent needss, the invention provides a kind of video target tracking method and be System, it is intended that setting up the display model of target using BIM and introducing tracking framework so that computer being capable of analog vision Cortex working mechanism steadily tracks target.

A kind of video target tracking method of analog vision cortex mechanism, including two ranks of off-line training and online tracking Section；

The off-line training step is comprised the following steps：

(11) build the positive and negative sample set of tracking target image；

(12) align negative sample and set up image pyramid, extract the BIM features of image pyramid, carry out according to BIM features The classifier training of the first round, the weight size obtained according to classifier training screen feature to reduce intrinsic dimensionality；

(13) classifier training of the second wheel is carried out using the BIM features reduced after intrinsic dimensionality；

The online tracking phase is comprised the following steps：

(21) detection image initialization detection zone is treated, the image pyramid of detection zone is set up；

(22) BIM features are extracted to the image pyramid of detection zone, are sent to the grader that step (13) is trained, Grader exports the result of determination of detection zone；Multiple matched sub-blocks are obtained during BIM features are extracted, using matching Block predicts object candidate area；

(23) potential target region is selected according to grader result of determination iteration from candidate region；

(24) target area maximum probability person is chosen from potential target region be final target area.

Further, the specific implementation of step (12) is：

(121) to each Sample Establishing image pyramid and BIM features are extracted, including S1 layer features, S2 layer features, S3 Layer feature and S4 layer features；

The calculation of the S1 layers feature is：Image pyramid is defined as by the result that S1 layers computing unit is calculated S1 layer features；S1 layer computing units are the wave filter groups that 4 Gabor filters are constituted, and are expressed as：

x₀=xcos θ+ysin θ,

y₀=-xsin θ+ycos θ

In formula, F (x, y, θ) represents wave filter in the corresponding response value in coordinate (x, y) place, and θ controls the direction of wave filter, Value is respectively4 direction 4 Gabor filters of correspondence；γ represents the length-width ratio of wave filter, and σ represents filter The bandwidth of ripple device, λ represent the wavelength of wave filter；

The calculation of the C1 layers feature is：S1 layers feature is defined as into C1 layers by the result of calculation of C1 computing units Feature；C1 layers computing unit extracts maximum, the office of computing unit setting in the corresponding local neighborhood of S1 layer features to input Portion's neighborhood is the local pyramid extracted from the corresponding image pyramid of S1 layer features；

The calculation of the S2 layers feature is：C1 layers feature is defined as into S2 by the result of calculation of S2 layer computing units Layer feature, S2 layers computing unit calculate characteristic sub-block X and each sub-block template in C1 layer pyramidsBetween it is similar Property degree, it is describedIt is the sub-block randomly selected from the C1 layer features of multiple samples, d is the quantity for extracting sub-block；It is described Similarity function is defined as：In formula,

The calculation of the C2 layers feature is：S2 layers feature is defined as into C2 by the result of calculation of C2 layer computing units Layer feature, C2 layers computing unit calculate the similarity response global maximum of each characteristic sub-block in C1 layer pyramids；

(122) screen characteristic sub-block

C2 layers feature is sent into into classifier training, using the feature selection ability of grader, to d sub-block template according to dividing The respective weights size that the training of class device is obtained is ranked up, and selects k maximum sub-block template to be used for the grader instruction of the second wheel Practice, k<d.

Further, the specific implementation of step (22) is：

(221) BIM features are extracted to the image pyramid of detection zone, is sent to the classification that step (13) is trained Device, grader export the result of determination of detection zone；

(222) BIM features are extracted to the image pyramid of detection zone, obtains the corresponding best match of k sub-block template Sub-block, predicts the candidate region of target using matched sub-block, and concrete implementation mode is：

Make detection zone image pyramid bottom geometric center be (x, y) zero, the center phase of matched sub-block Coordinate position (dx, dy) of the matched sub-block relative to zero is defined as to the vector of zero, make two sub-block templates it Between distance be D_ij, i, j=1 ..., k, the distance between its Corresponding matching sub-block are D '_ij；Calculate the target scale of present frame The factorMedian () takes median to all of ratio, improves the robustness of dimension calculation；According to Each matched sub-block predicts the center of targetBuild with the centre bit of target The object candidate area at center is set to, the collection being made up of these object candidate areas is collectively referred to as candidate collection.

Further, the specific implementation of step (23) is：

(231) one object candidate area of random selection from candidate collection.

(232) if grader output is judged to target, choose from all candidate regions from current goal candidate region Nearest candidate region is used as potential target region, and rejects potential target region from candidate collection and away from detection zone Candidate samples；Otherwise, the center average of the candidate region away from current detection region is calculated, chosen distance this average is most Near candidate region is used as potential target region, and rejects potential target region from candidate collection and near current detection area The candidate region in domain；

(233) the potential target extracted region BIM feature selected to step (232), is sent to step (13) and trains Grader, the result of determination in grader output potential target region；During BIM features are extracted, S2 layers and C2 layers Calculating may find new matched sub-block, perform the operation similar with step (222) to new matched sub-block, obtain new mesh New object candidate area is added to candidate collection by mark candidate region；

(234) judge that candidate collection then enters step (24), otherwise return to step (231) whether as empty, if so,.

Further, step (24) choose that correspondence grader result of determination shows most to have from potential target region can Final target area can be for target area person.

BIM is used for video frequency object tracking by above-mentioned steps by the present invention, compared to existing technology, with following excellent Point：

Compared to other target's feature-extraction methods, the complete analog vision cortex work of the feature that BIM is extracted more meets life Thing capture target general characteristic, with robustness it is good the characteristics of；Sub-block in C1 layers is a kind of local feature, with positioning office The function of portion's shelter target, and C2 layers are characterized in that a kind of global characteristics, although no positioning function but through machine learning but There is the ability of very strong differentiation target and background.Both features are combined by the present invention, can play respective advantage, both The speed of tracking can be improved, the precision of tracking is also ensured that.

Description of the drawings

Fig. 1 is video target tracking method flowchart of the present invention；

Fig. 2 is BIM feature extractions flow chart of the present invention.

Specific embodiment

In order that the objects, technical solutions and advantages of the present invention become more apparent, it is below in conjunction with drawings and Examples, right The present invention is further elaborated.It should be appreciated that specific embodiment described herein is only to explain the present invention, and It is not used in the restriction present invention.As long as additionally, technical characteristic involved in invention described below each embodiment Do not constitute conflict each other can just be mutually combined.

In order to help the understanding to technical solution of the present invention, the first technical term to being related to explain：

BIM models：Biological excitation model, including 4 Rotating fields, respectively S1 layers, C1 layers, S2 layers and C2 layers.Wherein, S generations Simple cell in the table visual cortex, C represent the complex cell in the visual cortex.Sample obtains being vision through S1 layers and C1 layers Primary features, be then passed through that S2 and C2 layers obtain is vision advanced features.

Image pyramid：Image pyramid be one with pyramid structure arrange image collection, all of figure in set As all originating from same original image, and it is by the continuous down-sampled acquisition of original image, until reaching specify minimum Resolution just stops down-sampled.

Fig. 1 is video target tracking method flowchart of the present invention, including off-line training and two stages of online tracking, Implement step as follows：

(1) off-line training step：

(11) select positive negative sample.

Positive sample represents target, takes target area manually in video initial frame, in order to be able to allow grader identification local to hide The target of gear, is blocked to target area at random with the square of white, the size value of white blocks for [8,0.5*min (w, H)], wherein w, h are respectively the width and height of target.In order to improve the identification ability to positive sample, from 101 figures of CALTECH As selecting the sample similar with target is tracked in storehouse, the multiformity of positive sample is improved.

Negative sample represents non-targeted, extracts from background area in initial video frame, and the area size of negative sample can be big In the size of target, the size of target is finally normalized to.In order to increase the representativeness of negative sample, it is also possible to from CALTECH Select and other close class images of target appearance in 101 image libraries.

(12) first round off-line learning.

(121) to Sample Establishing image pyramid, and BIM features are extracted.

Fig. 2 is BIM feature extractions flow chart of the present invention.

S1 layers are specially：

Computing unit in S1 layers is that 4 Gabor filters constitute wave filter group：

x₀=xcos θ+ysin θ,

y₀=-xsin θ+ycos θ

In formula, F (x, y, θ) represents wave filter in the corresponding response value in coordinate (x, y) place, and θ controls the direction of wave filter, Value is respectively4 direction 4 Gabor filters of correspondence；γ represents the length-width ratio of wave filter, and σ represents filter The bandwidth of ripple device, λ represent the wavelength of wave filter, and these three parameters are determined according to the physical characteristics of human eye receptive field, are constant.

S1 layer computing units are arranged in the pyramid structure similar to input picture so that computing unit can be in input Calculated on all yardsticks of image pyramid and position.The result definition that image pyramid is calculated by S1 layers computing unit For S1 layer features.

C1 layers are specially：

C1 layers are convergence-levels, and the computing unit in C1 layers is equally arranged as pyramid structure, each computing unit pair Maximum is extracted in the corresponding local neighborhood of S1 layer features of input.The local neighborhood that computing unit sets is as from S1 layer features pair The local pyramid scope of the double-layer structure extracted in the image pyramid answered, the pyramidal bottom size in local are 2 Δ s ×2Δs.The region calculated by C1 layer computing units is overlapped, and overlapping widths are Δ s.S1 layers feature is calculated into single by C1 The result of calculation of unit is defined as C1 layer features.

In order to extract ensuing S2 layers feature, needs are randomly selected various sizes of from the C1 layer features of great amount of samples Square sub-block is used as sub-block templateSize n of sub-block template be empirical value, for example for (4,8,12,16)；Extract Sub-block may come from C1 layers and export pyramidal different scale layer, and d is the quantity for extracting sub-block, and generally higher than 1000.

S2 layers are specially：

S2 layers calculate sub-block X and each sub-block template P in the C1 layer pyramids of input_iBetween similarity measurement.Phase It is defined as like degree function：In formula,Value is only It is only relevant with size n of sub-block.

C2 layers are specially：

C2 layers are also a convergence-level, respond global maximum by the similarity for calculating each characteristic sub-block, obtain C2 layer features, the result of calculation of C2 layers are the vectors of a d dimension.

(122) screen characteristic sub-block

C2 layers feature sends into classifier training, and grader can select SVM or Adaboost.Using the feature of grader Selective power, is ranked up according to the respective weights size that classifier training is obtained to d sub-block template, selects maximum k Display model of the sub-block template as target following.

The present invention is so that SVM classifier is realized as an example.The C2 features extracted are sent into into SVM classifier training, grader is obtained Discriminant function be：In formula, weights omega_i, correspondence support vector weights omega_iAnd Biasing b is obtained by training.Select k maximum weights omega_iCorresponding sub-block template x_i, k takes 100.This 100 sub-blocks will also For the second wheel training.

(13) second wheel off-line trainings.

The flow process of the second wheel off-line training is basically identical with first round training flow process, after all Sample Establishing image pyramids S1 is sequentially passed through, the calculating of C1, S2 and C2 layer computing unit, the C2 layer features of output pass to grader and is trained.It is different It is the sub-block template after step (122) screening that part is the sub-block template used in S2 layers so that the C2 layer feature dimensions of output Number is greatly reduced.And the grader that the second training in rotation gets will be used for online tracking phase for recognizing target.

(2) online tracking step：

In order to improve the search efficiency of tracker, the present invention adopts an iterative procedures search target.The process of search is such as Under：Initialized target detection zone.Calculated according to the S2 layers and C2 layers of BIM, find the best match of all sub-block templates, while Calculate the grader fraction of detection zone.The position of target is predicted using matched sub-block, target candidate position regional ensemble is obtained；Repeatedly A candidate region is only detected in generation detection candidate region, each iteration, extracts the BIM features of detection zone and calculates grader Fraction is detected.In iteration, the selection strategy of candidate samples is determined by the grader fraction of last iteration.It is in detection process, right The sub-block matching that S2 and C2 layers are newly produced in calculating carries out sub-block prediction, and new predicting the outcome is added to candidate samples set, Simultaneously in Rejection of samples set some insecure samples reducing the number of times of iteration.When in candidate samples set, no sample is needed When to be detected, iteration ends.The grader fraction of the candidate samples of all detections is ranked up, and determines final tracking knot Really.

(21) initialize detection zone.

Detection image initialization detection zone is treated, the image pyramid of detection zone is set up.If being currently the first frame Or previous frame is when cannot position target, detection zone is initialized using target detection.If during continuous tracking, then Region determined by previous frame tracking result can be used as detection zone.

(22) BIM features are extracted to the image pyramid of detection zone, are sent to the grader that step (13) is trained, Grader exports the result of determination of detection zone；Multiple matched sub-blocks are obtained during BIM features are extracted, using matching Block predicts object candidate area.

(221) BIM features are extracted to the image pyramid of detection zone, is sent to the classification that step (13) is trained Device, grader export the result of determination of detection zone, i.e. grader fraction.

(222) initialize candidate collection

BIM features are extracted to the image pyramid of detection zone, the corresponding best match sub-block of k sub-block template is obtained. The candidate region of target is predicted using matched sub-block：The geometric center of the image pyramid bottom of detection zone is made to sit for (x, y) Mark origin, the vector of the center relative coordinates origin of matched sub-block are defined as coordinate position of the matched sub-block relative to zero (dx, dy), makes the distance between sub-block template be D_ij, i, j=1 ..., k, the distance between its Corresponding matching sub-block are D '_ijMeter Calculate the target scale factor of present frameMedian () takes median to all of ratio, improves yardstick The robustness of calculating；The center of target is predicted according to each matched sub-blockStructure The object candidate area built centered on the center of target.The collection being made up of these object candidate areas is collectively referred to as Candidate Set Close.

(23) potential target region is selected according to grader result of determination iteration from candidate region.

(231) one object candidate area of selection from candidate collection.

(232) judge whether the object candidate area contains target according to grader fraction S (x), if it is determined that result is yes, The candidate region is then thought near target, otherwise, the candidate region is away from target.For example, assert grader fraction S (x) non-negative Target is corresponded to then, conversely, being non-targeted.So, in this step, if S (x) >=0, assertive goal closely detection zone Domain, then choose from the nearest candidate region in current goal candidate region as potential target region from all candidate regions, and Potential target region and the candidate samples away from detection zone are rejected from candidate collection.If S (x) ＜ 0, assertive goal does not exist Near current goal candidate region, then calculate the center average of the candidate region away from current detection region, chosen distance The nearest candidate region of this average as potential target region, and from candidate collection reject potential target region and near work as The candidate region of front detection zone.

(233) the potential target extracted region BIM feature selected to step (232), is sent to step (13) and trains Grader, the result of determination in grader output potential target region.During BIM features are extracted, S2 layers and C2 layers Calculating may find new matched sub-block.The operation similar with step (222) is performed to new matched sub-block, new mesh is obtained New object candidate area is added to candidate collection by mark candidate region.

(24) it is chosen for target area maximum probability person and is final target area.For example, assert grader fraction S X () non-negative then corresponds to target, conversely, being non-targeted.So, in this step, if max (S (x)) >=0, select maximum point The result of class device fraction is used as final tracking result；If max (S (x)) ＜ 0, present frame cannot find target.

As it will be easily appreciated by one skilled in the art that the foregoing is only presently preferred embodiments of the present invention, not to The present invention, all any modification, equivalent and improvement made within the spirit and principles in the present invention etc. is limited, all should be included Within protection scope of the present invention.

Claims

1. a kind of video target tracking method of analog vision cortex mechanism, it is characterised in that including off-line training and it is online with Two stages of track；

The off-line training step is comprised the following steps：

(11) build the positive and negative sample set of tracking target image；

(12) align negative sample and set up image pyramid, extract the BIM features of image pyramid, first is carried out according to BIM features The classifier training of wheel, the weight size obtained according to classifier training screen feature to reduce intrinsic dimensionality；

The online tracking phase is comprised the following steps：

(22) BIM features are extracted to the image pyramid of detection zone, is sent to the grader that step (13) is trained, classify Device exports the result of determination of detection zone；Multiple matched sub-blocks are obtained during BIM features are extracted, it is pre- using matched sub-block Survey object candidate area；

2. the video target tracking method of analog vision cortex mechanism according to claim 1, it is characterised in that the step Suddenly the specific implementation of (12) is：

(121) BIM features are extracted to each Sample Establishing image pyramid and, it is special including S1 layer features, S2 layer features, S3 layers Seek peace S4 layer features；

The calculation of the S1 layers feature is：Image pyramid is defined as into S1 layers by the result that S1 layers computing unit is calculated Feature；S1 layer computing units are the wave filter groups that 4 Gabor filters are constituted, and are expressed as：

F (x, y, θ) = \exp (- \frac{x_{0}^{2} + γ^{2} y_{0}^{2}}{2 σ^{2}}) s i n (\frac{2 π}{λ} x_{0}); x_{0} = x c o s θ + y s i n θ,

y₀=-xsin θ+ycos θ

In formula, F (x, y, θ) represents wave filter in the corresponding response value in coordinate (x, y) place, and θ controls the direction of wave filter, value Respectively4 direction 4 Gabor filters of correspondence；γ represents the length-width ratio of wave filter, and σ represents wave filter Bandwidth, λ represents the wavelength of wave filter；

The calculation of the C1 layers feature is：S1 layers feature is defined as into C1 layers by the result of calculation of C1 computing units special Levy；C1 layers computing unit extracts maximum, the local of computing unit setting in the corresponding local neighborhood of S1 layer features to input Neighborhood is the local pyramid extracted from the corresponding image pyramid of S1 layer features；

The calculation of the S2 layers feature is：C1 layers feature is defined as into S2 layers by the result of calculation of S2 layer computing units special Levy, S2 layers computing unit calculates characteristic sub-block X and each sub-block template in C1 layer pyramidsBetween similarity measurements, It is describedIt is the sub-block randomly selected from the C1 layer features of multiple samples, d is the quantity for extracting sub-block；It is described similar Degree function is defined as：In formula,

The calculation of the C2 layers feature is：S2 layers feature is defined as into C2 layers by the result of calculation of C2 layer computing units special Levy, C2 layers computing unit calculates the similarity response global maximum of each characteristic sub-block in C1 layer pyramids；

(122) screen characteristic sub-block

C2 layers feature is sent into into classifier training, using the feature selection ability of grader, to d sub-block template according to grader The respective weights size that training is obtained is ranked up, and selects k maximum sub-block template to be used for the classifier training of the second wheel, k< d。

3. the video target tracking method of analog vision cortex mechanism according to claim 2, it is characterised in that the step Suddenly the specific implementation of (22) is：

(221) BIM features are extracted to the image pyramid of detection zone, is sent to the grader that step (13) is trained, point Class device exports the result of determination of detection zone；

(222) BIM features are extracted to the image pyramid of detection zone, obtain the corresponding best match sub-block of k sub-block template, The candidate region of target is predicted using matched sub-block, concrete implementation mode is：

The geometric center for making the image pyramid bottom of detection zone is (x, y) zero, and the center of matched sub-block is relative to be sat The vector of mark origin is defined as coordinate position (dx, dy) of the matched sub-block relative to zero, makes between two sub-block templates Distance is D_ij, i, j=1 ..., k, the distance between its Corresponding matching sub-block are D '_ij；Calculate the target scale factor of present frameMedian () represents that reduced value takes median；The centre bit of target is predicted according to each matched sub-block PutThe object candidate area centered on the center of target is built, by these The collection that object candidate area is constituted is collectively referred to as candidate collection.

4. the video target tracking method of analog vision cortex mechanism according to claim 3, it is characterised in that the step Suddenly the specific implementation of (23) is：

(231) one object candidate area of random selection from candidate collection.

(232) if grader output is judged to target, choose nearest from current goal candidate region from all candidate regions Candidate region as potential target region, and reject potential target region and the time away from detection zone from candidate collection Sampling sheet；Otherwise, the center average of the candidate region away from current detection region is calculated, this average of chosen distance is nearest Candidate region is used as potential target region, and rejects potential target region from candidate collection and near current detection region Candidate region；

(233) the potential target extracted region BIM feature selected by step (232), be sent to that step (13) trains point Class device, the result of determination in grader output potential target region；During BIM features are extracted, the calculating of S2 layers and C2 layers New matched sub-block may be found, the operation similar with step (222) is performed to new matched sub-block, be obtained new target and wait New object candidate area is added to candidate collection by favored area；

5. the video target tracking method of analog vision cortex mechanism according to claim 3, it is characterised in that the step Suddenly (24) are chosen correspondence grader result of determination from potential target region and show most possibly to be finally for target area person Target area.