CN102222231B - Visual attention information computing device based on guidance of dorsal pathway and processing method thereof - Google Patents

Visual attention information computing device based on guidance of dorsal pathway and processing method thereof

Info

Publication number
CN102222231B
CN102222231B CN201110139467.0A CN201110139467A CN102222231B CN 102222231 B CN102222231 B CN 102222231B CN 201110139467 A CN201110139467 A CN 201110139467A CN 102222231 B CN102222231 B CN 102222231B
Authority
CN
China
Prior art keywords
saliency
spatial
component
spatial feature
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201110139467.0A
Other languages
Chinese (zh)
Other versions
CN102222231A (en)
Inventor
郑灵翔
周昌乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Original Assignee
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University filed Critical Xiamen University
Priority to CN201110139467.0A priority Critical patent/CN102222231B/en
Publication of CN102222231A publication Critical patent/CN102222231A/en
Application granted granted Critical
Publication of CN102222231B publication Critical patent/CN102222231B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The invention relates to a visual attention computational model based on guidance of a dorsal pathway and a processing method thereof. The processing method comprises the steps of: a dorsal pathway spatial feature component saliency map generating unit computes modulation signals from the spatial feature component saliency map; a ventral pathway non-spatial feature component saliency map generating unit extracts non-spatial features and, under the action of the modulation signals, generates a non-spatial feature component saliency map; and the spatial feature component saliency map and the non-spatial feature component saliency map are combined to generate an attention saliency map, from which attention information is further generated. Because the invention first models the visual signal processing of the dorsal pathway of biological vision and uses the result of that model to guide the whole visual attention computation, its computational results and image processing effects are closer to human subjective visual perception.

Description

Visual attention information computing device based on dorsal pathway guidance and processing method thereof
Technical field
The invention belongs to the field of visual information processing and relates to a visual attention information computing device based on dorsal pathway guidance and a processing method thereof.
Background art
Visual selective attention is the mechanism by which the brain consciously selects certain visual information of interest for processing while preventing other, irrelevant information from entering conscious processing. It is an important regulatory mechanism in human information processing, allowing limited visual processing resources to be concentrated on the scene information a person cares about. Because of this vital role, selective attention has not only attracted growing interest in cognitive science but is also receiving increasing attention in computer vision applications.
Computational models of visual selective attention apply the mechanisms of human visual attention to computation. By introducing an attention mechanism, it becomes possible, to some extent, to detect and select regions of interest in an image and to process the targets most likely to attract attention. This greatly reduces the amount of information that vision computation must handle and improves target detection, so such models have high practical value.
Existing computational models of visual selective attention fall into two classes: bottom-up and top-down. Bottom-up attention models are data-driven models built on the analysis of low-level visual features of the image; top-down attention models are guided by high-level knowledge and visual tasks. Bottom-up attention computation has produced more results and is relatively mature, whereas top-down attention computation is more difficult to study, and its main methods still use simplified high-level information or cues to guide the attention computation.
The psychological basis of bottom-up visual selective attention models is Treisman's feature integration theory of attention. Their strategy is mainly to simulate the human visual system's parallel processing of visual features in the pre-attentive stage and, in a stimulus-driven manner, to obtain salient targets. Because pop-out stimuli are selected through massively parallel processing, the computation is fast. Most bottom-up models share the framework shown in Fig. 1: the model first extracts various visual features from the input image at multiple scales, fuses the extracted features with certain weights into a saliency map, and finally uses some mechanism to select the focus of attention from the saliency map.
The earliest visual selective attention model was proposed by Koch et al. in 1985. It only proposed a computational framework and was never actually implemented, yet it strongly influenced later models; many of its computational methods, in particular the use of a WTA (Winner Take All) neural network to produce the focus of attention, were borrowed by subsequent models. Clark et al. were the first to actually implement a visual selective attention model, in 1988, and used it in a binocular robot system. In the bottom-up selective attention model he proposed in 1993, Milanese first introduced the concept of conspicuity maps and used a center-surround difference operator to extract features. In 1998 Itti proposed a visual selective attention model based on Treisman's feature integration theory and Koch's attention framework, and implemented it as the toolkit iNVT (iLab Neuromorphic Vision C++ Toolkit). The model adopts a Gaussian pyramid structure and obtains feature saliency by applying a center-surround difference operator on a non-uniformly sampled basis; its framework is shown in Fig. 2. Itti's model remains the best-known and most influential visual selective attention model, and much later research builds on it. It simulates the biological visual system's parallel processing of low-level visual features such as the color, intensity and orientation of each observed object, and obtains attention through competition among visual targets. Following Treisman's feature integration theory, it divides visual attention into a feature registration stage and a feature integration stage. The feature registration stage is simulated by rapidly extracting, in parallel, the physical features of each object in the visual scene from the image data, forming an independent representation of each visual feature: linear filters and a center-surround difference operator extract visual feature data at multiple spatial scales from the input image, and a lateral inhibition mechanism makes features of the same kind compete at each spatial location, producing a feature map that encodes each visual feature independently. The feature integration stage is then simulated by competitive selection: the attention window is concentrated on a single object while the features of all other objects are suppressed below the level of awareness; all feature maps are fused into a saliency map, a WTA (Winner Take All) neural network selects the most salient location as the target of attention, and an inhibition-of-return mechanism realizes the selection and switching of attended targets.
On the basis of the Itti model, Draper et al. proposed a visual selective attention model in 2005 that does not merge the features of the multiple scales but retains them, forming a multi-scale saliency map pyramid so that the model adapts better to two-dimensional transformations. Frintrop proposed another visual selective attention model in 2005 that splits intensity into on-off and off-on channels, making attention selection more reasonable in some cases. Walther et al. introduced image segmentation into the model: when the WTA neural network selects the winning region from the saliency map, an image segmentation method is used to obtain the attended target, so that the region selected by attention computation agrees better with the real target region. The selective tuning (ST) visual attention model proposed by Tsotsos et al. is implemented with neural networks and a pyramid structure, and achieves attention selection based on motion and stereoscopic depth. Although bottom-up visual attention computation has achieved some success, it analyzes only the bottom-up information of the image and has difficulty reaching satisfactory results on complex scenes; moreover, it is inconsistent with the active, purposeful selective attention of humans and cannot carry out attentional search under a specified task.
In addition, most current models consider only the intensity, color and local orientation information in the visual input, whereas real visual scenes contain far richer information; intensity, color and orientation alone cannot fully express how the biological visual system attends to an actual scene. As Wolfe points out, the visual features that influence attention include color, orientation, curvature, size, motion, spatial frequency, depth cues, gloss and shape, and attention is particularly sensitive to motion. In recent years many researchers have noticed the role of motion information in visual attention and proposed visual attention models that include motion features: Ma et al. used motion information to compute attention in dynamic scenes; Lopez et al. computed attention in moving scenes by combining intensity, motion and shape features; Hang et al. proposed a dynamic-scene attention model that integrates motion with color, intensity and orientation; and Jeong et al. added motion information to an existing attention model by computing the difference between the saliency information of two consecutive frames. Although these models all use motion information to improve the Itti model's analysis of the focus of attention in moving scenes, they are still imperfect, fail on some complex motion, and do not fully consider how the biological visual system processes motion features when forming attention.
Neurobiological research shows that the dorsal pathway (the "where" pathway) of visual perception is responsible for processing spatial information such as motion features. Its processing is fast, and the spatial information of the visual scene, after being processed in the prefrontal lobe, may form an early feedback signal that is passed to the ventral pathway. This means that the dorsal pathway can use spatial information such as motion to perform a preliminary localization of targets before attention, so that the dorsal pathway guides the ventral pathway. Researchers have also found that, on the way from the retina to the visual cortex, part of the nerve fibers leave the optic tract and project directly to the superior colliculus (optic tectum), forming the retino-tectal projection. This projection forms a retinotopic map and participates directly in pre-attentive modulation. Thus, in the human visual system, spatial information not only takes part, like non-spatial features such as color and intensity, in the feature integration of attention, but spatial features also participate in pre-attentive modulation. From the above analysis, the essential differences among bottom-up computation strategies lie in which features the models extract and how they process them. Among the extracted features, the most common are intensity, color and orientation or texture. In computing bottom-up attention, however, these models mostly simply superimpose the different features, which does not agree well with the neural mechanisms of the visual system: the relations among features in the biological visual system are more complex, and such simple strategies ignore, for example, the interaction between the dorsal and ventral pathways of the visual center. Existing visual attention models therefore deviate considerably from the human mechanism of visual attention.
Summary of the invention
The object of the present invention is to provide a visual attention information computing device based on dorsal pathway guidance, and a processing method thereof, which can simulate the interaction between the dorsal and ventral pathways of the biological visual system, agree better with the human mechanism of visual attention, and whose computational results and image processing effects are closer to human subjective visual perception.
To achieve the above object, a visual attention information computing device based on dorsal pathway guidance according to the present invention mainly comprises an image input unit, a dorsal pathway spatial feature component saliency map generating unit, a ventral pathway non-spatial feature component saliency map generating unit and an attention information extraction unit;
the image input unit outputs a video image to the dorsal pathway spatial feature component saliency map generating unit and to the ventral pathway non-spatial feature component saliency map generating unit;
the dorsal pathway spatial feature component saliency map generating unit extracts the spatial features in the video image, generates a spatial feature component saliency map and outputs it to the attention information extraction unit; at the same time it computes a modulation signal from the spatial feature component saliency map it has generated, and this signal acts on the process in which the ventral pathway non-spatial feature component saliency map generating unit extracts non-spatial features and generates the non-spatial feature component saliency map;
the ventral pathway non-spatial feature component saliency map generating unit extracts the non-spatial features of the video image, including color, intensity and orientation, and then uses these features, under the action of the modulation signal generated by the dorsal pathway spatial feature component saliency map generating unit, to generate the non-spatial feature component saliency map and output it to the attention information extraction unit;
the attention information extraction unit fuses the non-spatial feature component saliency map with the spatial feature component saliency map to generate an attention saliency map, and generates attention information from it.
The processing method of the visual attention computational model based on dorsal pathway guidance according to the present invention specifically comprises the following steps:
Step 1: the image input unit outputs a video image to the dorsal pathway spatial feature component saliency map generating unit and to the ventral pathway non-spatial feature component saliency map generating unit;
Step 2: the dorsal pathway spatial feature component saliency map generating unit extracts the spatial features in the video image, generates a spatial feature component saliency map and outputs it to the attention information extraction unit; at the same time it computes a modulation signal from this spatial feature component saliency map, and the signal acts on the process in which the ventral pathway non-spatial feature component saliency map generating unit extracts non-spatial features and generates the non-spatial feature component saliency map;
Step 3: the ventral pathway non-spatial feature component saliency map generating unit extracts the non-spatial features of the video image, including color, intensity and orientation, and then uses these features, under the action of the modulation signal generated by the dorsal pathway spatial feature component saliency map generating unit, to generate the non-spatial feature component saliency map and output it to the attention information extraction unit;
Step 4: the attention information extraction unit fuses the non-spatial feature component saliency map with the spatial feature component saliency map to generate an attention saliency map, and generates attention information from it.
Generating the spatial feature component saliency map comprises the following steps:
(1) If a spatial feature is composed of a group of sub-features, a Gaussian pyramid is used to perform multi-scale decomposition on the gray-scale image of each sub-feature of the group, yielding a multi-layer image with nine scales, layers 0 to 8, where layer 0 is the original image and layers with larger numbers have smaller image scales. The Gaussian pyramid decomposition comprises two operations, smoothing and down-sampling. For a gray-scale image I(x, y) = I_0(x, y) describing a feature, the relation between the i-th layer image I_i(x, y) and the (i-1)-th layer image I_{i-1}(x, y) is

I_i(x, y) = \sum_{m=-M}^{M} \sum_{n=-N}^{N} C(m, n) \, I_{i-1}(2x + m, 2y + n)

where C(m, n) is the Gaussian kernel used for smoothing the image;
(2) A center-surround difference operator is applied to the multi-layer image to compute the feature maps of each spatial feature. The center-surround difference is realized by subtracting a surround layer from a central layer: the central layers are layers 2, 3 and 4 of the multi-layer image, and the surround layer corresponding to each central layer is obtained by adding 3 or 4 to the central layer's index, giving six inter-layer differences in total, 2-5, 2-6, 3-6, 3-7, 4-7 and 4-8, so that each spatial feature yields feature maps at six different scales.
For each difference computation, the central and surround layer images are first resized to the size of layer 5, and then the corresponding pixels of the two layers are subtracted to obtain the feature map, computed as

F(c, s) = N(|M(c) \ominus M(s)|)
s = c + \delta, \quad c = 2, 3, 4, \quad \delta = 3, 4
where c is the index of the central layer, s is the index of the surround layer, and F(c, s) is the feature map obtained from the center-surround difference between layers c and s; M(c) is the central layer image, M(s) the corresponding surround layer image, and \ominus denotes the difference operation. The function N is an iterative non-linear normalization operator that simulates the lateral inhibition widely present among cortical neurons: a lateral-inhibition neural network makes features of the same kind at neighboring positions compete with and suppress one another, so that the features that win the competition are expressed, and the result is normalized to the interval [0, 1]. Its iteration is

F_1(c, s) = R(F_0(c, s))
F_{i+1}(c, s) = R(F_i(c, s) + F_i(c, s) * \mathrm{DOG} - C), \quad i \ge 1

where the function R sets feature amplitudes below 0 to 0 and linearly projects amplitudes above 0 to the interval [0, 1], C is an offset constant, and DOG is a difference of Gaussians defined as

\mathrm{DOG}(x, y) = \frac{c_{ex}^2}{2\pi\sigma_{ex}^2} e^{-(x^2 + y^2)/2\sigma_{ex}^2} - \frac{c_{in}^2}{2\pi\sigma_{in}^2} e^{-(x^2 + y^2)/2\sigma_{in}^2}

where the parameter c_{ex} determines the strength of the excitatory signal in the iteration, \sigma_{ex} its reach, c_{in} the strength of the inhibitory signal, and \sigma_{in} its reach;
(3) If a spatial feature is composed of a group of sub-features, a merging operation is performed on each feature in the group: the feature maps of the different layers are transformed to the same scale, the cross-layer features are accumulated pixel by pixel, and the non-linear normalization operator N is applied, forming the conspicuity map of each spatial feature; the conspicuity maps of all spatial features are then merged in equal proportion and normalized with the operator N to obtain the spatial feature component saliency map \Gamma_M.
The spatial feature component saliency map \Gamma_M is computed as

\Gamma_M = N\left( \sum_{D} \bigoplus_{c=2}^{4} \bigoplus_{s=c+3}^{c+4} F_D(c, s) \right)

where N is the non-linear normalization operator, F_D(c, s) is the feature map obtained from the center-surround difference between layers c and s for sub-feature D, the operation \oplus is the merging operation, D indexes the sub-features, and \Gamma_M is the resulting spatial feature component saliency map.
In step 2, the modulation signal G(x, y) from the dorsal pathway to the ventral pathway is generated according to the strength of the spatial features in the spatial feature component saliency map as G(x, y) = 1 + W(\Gamma_N(x, y)),
where \Gamma_N(x, y) is the spatial feature component saliency map and W(\cdot) is a weighting function, a monotonic function with range [0, 1], used to adjust the strength of the modulation signal.
In step 3, under the action of the modulation signal G(x, y) generated by the dorsal pathway spatial feature component saliency map generating unit, the non-spatial feature component saliency map F(x, y) is generated as F(x, y) = H(G(x, y), F'(x, y)),
where H(\cdot) is the response function of a nerve cell under the influence of the modulation signal and F'(x, y) is the original feature map extracted without modulation; if, during processing, the image size of the modulation signal G(x, y) differs from that of the original feature map F'(x, y), G(x, y) is first resized to the size of F'(x, y).
Because the present invention first models the fast processing of visual signals in the dorsal pathway of biological vision and uses the result of that model to guide the whole visual attention computation, the attention computation agrees better with biological visual processing; experiments show that the analysis results obtained by the invention are better than those of existing methods.
Brief description of the drawings
Fig. 1 is a schematic diagram of the architecture of a conventional bottom-up visual selective attention computational model;
Fig. 2 is a schematic diagram of the framework of the conventional Itti visual selective attention model;
Fig. 3 is a schematic diagram of the processing flow of the visual attention computational model of the present invention;
Fig. 4 is a schematic diagram of the principle of the block matching algorithm of the present invention;
Fig. 5 is a schematic diagram of the motion vector projection directions of the present invention.
The invention is further described below in conjunction with specific embodiments.
Embodiment
As shown in Fig. 3, a visual attention information computing device based on dorsal pathway guidance according to the present invention comprises an image input unit 1, a dorsal pathway spatial feature component saliency map generating unit 2, a ventral pathway non-spatial feature component saliency map generating unit 3 and an attention information extraction unit 4;
the image input unit 1 outputs a video image to the dorsal pathway spatial feature component saliency map generating unit 2 and to the ventral pathway non-spatial feature component saliency map generating unit 3;
the dorsal pathway spatial feature component saliency map generating unit 2 extracts the spatial features in the video image, generates a spatial feature component saliency map and outputs it to the attention information extraction unit 4; at the same time it computes a modulation signal from this spatial feature component saliency map, which acts on the process in which the ventral pathway non-spatial feature component saliency map generating unit 3 extracts non-spatial features and generates the non-spatial feature component saliency map;
the ventral pathway non-spatial feature component saliency map generating unit 3 extracts the non-spatial features of the video image, such as color, intensity and orientation, and the modulation signal generated by the dorsal pathway spatial feature component saliency map generating unit 2 acts on its process of generating the non-spatial feature component saliency map;
the attention information extraction unit 4 fuses the non-spatial feature component saliency map with the spatial feature component saliency map to generate an attention saliency map, and generates attention information from it.
As shown in Fig. 3, the processing method of the visual attention computational model based on dorsal pathway guidance according to the present invention specifically comprises the following steps.
Step 1: the image input unit 1 outputs a video image to the dorsal pathway spatial feature component saliency map generating unit 2 and to the ventral pathway non-spatial feature component saliency map generating unit 3.
Step 2: the dorsal pathway spatial feature component saliency map generating unit extracts the spatial features in the input video image, generates a spatial feature component saliency map and outputs it to the attention information extraction unit 4; at the same time it computes a modulation signal from this spatial feature component saliency map, which acts on the process in which the ventral pathway non-spatial feature component saliency map generating unit 3 extracts non-spatial features and generates the non-spatial feature component saliency map.
The present invention does not restrict the method used to extract spatial features, as long as the extracted features can be described in the form of gray-scale maps.
For example, the motion feature of a scene can be described by motion vectors that carry two kinds of information, motion speed and motion direction. After the motion vectors are obtained, they are decomposed into an absolute motion speed feature and motion direction features expressed by projecting the speed onto the eight directions 0, π/4, π/2, 3π/4, π, 5π/4, 3π/2 and 7π/4.
A block matching algorithm is used to compute the motion vectors. As shown in Fig. 4, the image matrix of the current frame is divided into a matrix of blocks; for each block of the current frame, a search is carried out in the previous frame within a range of maximum offset P around the position corresponding to that block, the matching block in the previous frame is found, and the motion vector (dx, dy) is computed from the position of the matching block.
Whether the blocks of the two frames match is measured by a cost function; when the cost function reaches its minimum, the two blocks are considered the best match. The present invention uses the mean squared error (MSE) as the cost function, defined as

\mathrm{MSE}(x, y) = \frac{1}{N^2} \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \left( B_k(i + x, j + y) - B_{k-1}(i + x + dx, j + y + dy) \right)^2

where N is the size of the block, i.e. the number of pixels per row (or column), and B_k is the block of the k-th frame whose coordinates are (x, y).
From the result of block matching, the motion vectors that represent how strongly objects move between the two frames can be computed; from the computed motion vector (dx, dy), the absolute motion speed of each pixel is obtained as

V(x, y) = \sqrt{dx^2 + dy^2}

where V(x, y) is the absolute motion speed of all pixels in the block at coordinates (x, y).
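By way of illustration only, the following Python sketch shows one way the block-matching search and the absolute-speed computation described above could be implemented; the use of numpy arrays, the block size of 8 and the search range of 7 are assumptions, not values taken from the patent.

```python
import numpy as np

def block_match_motion(prev, curr, block=8, max_offset=7):
    """Exhaustive block matching between two grayscale frames.

    Returns per-block motion vectors (dx, dy) and the absolute speed
    V = sqrt(dx^2 + dy^2), using the MSE cost defined above.
    """
    H, W = curr.shape
    rows, cols = H // block, W // block
    dx_map = np.zeros((rows, cols))
    dy_map = np.zeros((rows, cols))
    for by in range(rows):
        for bx in range(cols):
            y0, x0 = by * block, bx * block
            cur_blk = curr[y0:y0 + block, x0:x0 + block].astype(np.float64)
            best, best_dx, best_dy = np.inf, 0, 0
            for dy in range(-max_offset, max_offset + 1):
                for dx in range(-max_offset, max_offset + 1):
                    yy, xx = y0 + dy, x0 + dx
                    if yy < 0 or xx < 0 or yy + block > H or xx + block > W:
                        continue
                    ref_blk = prev[yy:yy + block, xx:xx + block].astype(np.float64)
                    mse = np.mean((cur_blk - ref_blk) ** 2)  # MSE cost function
                    if mse < best:
                        best, best_dx, best_dy = mse, dx, dy
            dx_map[by, bx], dy_map[by, bx] = best_dx, best_dy
    speed = np.sqrt(dx_map ** 2 + dy_map ** 2)  # absolute motion speed V(x, y)
    return dx_map, dy_map, speed
```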
The motion vector (dx, dy) is projected onto the eight directions 0, π/4, π/2, 3π/4, π, 5π/4, 3π/2 and 7π/4, as shown in Fig. 5. The projections onto the directions are computed as

D_0(x, y) = |dx|_+
D_{\pi}(x, y) = |-dx|_+
D_{\frac{1}{2}\pi}(x, y) = |dy|_+
D_{\frac{3}{2}\pi}(x, y) = |-dy|_+
D_{\frac{1}{4}\pi}(x, y) = \frac{\sqrt{2}}{2} |dx + dy|_+
D_{\frac{5}{4}\pi}(x, y) = \frac{\sqrt{2}}{2} |-dx - dy|_+
D_{\frac{3}{4}\pi}(x, y) = \frac{\sqrt{2}}{2} |-dx + dy|_+
D_{\frac{7}{4}\pi}(x, y) = \frac{\sqrt{2}}{2} |dx - dy|_+

where the operation |\cdot|_+ sets negative values to zero and leaves values greater than zero unchanged, so that the projection of the motion vector in any direction is non-negative; if the projected speed in a direction is negative, it is projected onto the opposite direction instead.
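A minimal sketch of the eight-direction projection with the rectification |·|_+, again assuming numpy arrays of dx and dy such as those returned by the block matcher above:

```python
import numpy as np

def project_directions(dx, dy):
    """Project motion vectors onto eight directions; |.|_+ clips negatives to zero."""
    rect = lambda a: np.maximum(a, 0.0)      # the |.|_+ operation
    k = np.sqrt(2.0) / 2.0
    return {
        0.00 * np.pi: rect(dx),
        0.25 * np.pi: k * rect(dx + dy),
        0.50 * np.pi: rect(dy),
        0.75 * np.pi: k * rect(-dx + dy),
        1.00 * np.pi: rect(-dx),
        1.25 * np.pi: k * rect(-dx - dy),
        1.50 * np.pi: rect(-dy),
        1.75 * np.pi: k * rect(dx - dy),
    }
```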
Median filtering is applied to the extracted motion speed and to each motion direction feature, and a frame-difference method is used to obtain the effective range of motion; motion feature values outside this range are set to zero.
The motion features extracted in this way define one feature expressing the absolute motion speed and eight features expressing the eight motion directions. On the gray-scale map describing these motion features, the brightness of each point represents the speed in that motion direction: the higher the brightness, the faster the motion, and the lower the brightness, the slower the motion.
The spatial feature saliency of each target in the video image is described by the spatial feature component saliency map; the brighter a target appears on this map, the higher its saliency.
The concrete steps of generating the spatial feature component saliency map are as follows:
(1) If a spatial feature is composed of a group of sub-features, a Gaussian pyramid is used to perform multi-scale decomposition on the gray-scale image of each sub-feature of the group, yielding a multi-layer image with nine scales, layers 0 to 8, where layer 0 is the original image and layers with larger numbers have smaller image scales, forming a pyramid-like structure.
The Gaussian pyramid decomposition comprises two operations, smoothing and down-sampling. For a gray-scale image I(x, y) = I_0(x, y) describing a feature, the relation between the i-th layer image I_i(x, y) and the (i-1)-th layer image I_{i-1}(x, y) is

I_i(x, y) = \sum_{m=-M}^{M} \sum_{n=-N}^{N} C(m, n) \, I_{i-1}(2x + m, 2y + n)

where C(m, n) is the Gaussian kernel used for smoothing the image.
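The Gaussian pyramid decomposition could be sketched as follows; scipy's gaussian_filter stands in for the kernel C(m, n), and the value of sigma is an assumption:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_pyramid(image, levels=9, sigma=1.0):
    """Build a 9-level Gaussian pyramid (level 0 is the original feature map).

    Each level smooths the previous one with a Gaussian kernel and
    down-samples by a factor of two, as in the recurrence above.
    """
    pyramid = [image.astype(np.float64)]
    for _ in range(1, levels):
        smoothed = gaussian_filter(pyramid[-1], sigma)
        pyramid.append(smoothed[::2, ::2])   # down-sample by 2 in each dimension
    return pyramid
```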
(2) A center-surround difference operator is applied to the multi-layer image to compute the feature maps of each spatial feature. The center-surround difference is realized by subtracting a surround layer from a central layer: the central layers are layers 2, 3 and 4 of the multi-layer image, and the surround layer corresponding to each central layer is obtained by adding 3 or 4 to the central layer's index, giving six inter-layer differences in total, 2-5, 2-6, 3-6, 3-7, 4-7 and 4-8, so that each spatial feature yields feature maps at six different scales.
For each difference computation, the central and surround layer images are first resized to the size of layer 5, and then the corresponding pixels of the two layers are subtracted to obtain the feature map, computed as

F(c, s) = N(|M(c) \ominus M(s)|)
s = c + \delta, \quad c = 2, 3, 4, \quad \delta = 3, 4

where c is the index of the central layer, s is the index of the surround layer, and F(c, s) is the feature map obtained from the center-surround difference between layers c and s; M(c) is the central layer image, M(s) the corresponding surround layer image, and \ominus denotes the difference operation. The function N is an iterative non-linear normalization operator that simulates the lateral inhibition widely present among cortical neurons: a lateral-inhibition neural network makes features of the same kind at neighboring positions compete with and suppress one another, so that the features that win the competition are expressed, and the result is normalized to the interval [0, 1]. Its iteration is

F_1(c, s) = R(F_0(c, s))
F_{i+1}(c, s) = R(F_i(c, s) + F_i(c, s) * \mathrm{DOG} - C), \quad i \ge 1

where the function R sets feature amplitudes below 0 to 0 and linearly projects amplitudes above 0 to the interval [0, 1], C is an offset constant, and DOG is a difference of Gaussians defined as

\mathrm{DOG}(x, y) = \frac{c_{ex}^2}{2\pi\sigma_{ex}^2} e^{-(x^2 + y^2)/2\sigma_{ex}^2} - \frac{c_{in}^2}{2\pi\sigma_{in}^2} e^{-(x^2 + y^2)/2\sigma_{in}^2}

where the parameter c_{ex} determines the strength of the excitatory signal in the iteration, \sigma_{ex} its reach, c_{in} the strength of the inhibitory signal, and \sigma_{in} its reach.
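The center-surround differences and the iterative normalization operator N might look as sketched below; the DOG kernel size, the constants c_ex, sigma_ex, c_in, sigma_in, the offset and the number of iterations are illustrative choices, since the text gives no numeric values:

```python
import numpy as np
from scipy.ndimage import zoom
from scipy.signal import fftconvolve

def resize_to(img, shape):
    """Rescale an image to a target shape (used to bring layers to level 5's size)."""
    return zoom(img, (shape[0] / img.shape[0], shape[1] / img.shape[1]), order=1)

def dog_kernel(size=15, c_ex=0.5, sig_ex=2.0, c_in=1.5, sig_in=25.0):
    """Difference-of-Gaussians kernel used inside the iterative operator N."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx ** 2 + yy ** 2
    exc = c_ex ** 2 / (2 * np.pi * sig_ex ** 2) * np.exp(-r2 / (2 * sig_ex ** 2))
    inh = c_in ** 2 / (2 * np.pi * sig_in ** 2) * np.exp(-r2 / (2 * sig_in ** 2))
    return exc - inh

def normalize_N(fmap, iterations=3, bias=0.02):
    """Iterative non-linear normalization N(.) simulating lateral inhibition."""
    def rectify(m):                      # R(.): clip below zero, rescale to [0, 1]
        m = np.maximum(m, 0.0)
        return m / m.max() if m.max() > 0 else m
    f = rectify(fmap)
    dog = dog_kernel()
    for _ in range(iterations):
        f = rectify(f + fftconvolve(f, dog, mode="same") - bias)
    return f

def center_surround_maps(pyramid):
    """F(c, s) = N(|M(c) - M(s)|) for c in {2, 3, 4}, s = c + 3, c + 4."""
    target = pyramid[5].shape            # all maps are brought to level 5's size
    maps = []
    for c in (2, 3, 4):
        for s in (c + 3, c + 4):
            center = resize_to(pyramid[c], target)
            surround = resize_to(pyramid[s], target)
            maps.append(normalize_N(np.abs(center - surround)))
    return maps
```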
(3) If a spatial feature is composed of a group of sub-features, a merging operation is performed on each feature in the group: the feature maps of the different layers are transformed to the same scale, the cross-layer features are accumulated pixel by pixel, and the non-linear normalization operator N is applied, forming the conspicuity map of each spatial feature. For the motion feature, for example, the conspicuity maps of the motion features in the different directions and of the absolute motion speed are directly superimposed and merged to obtain the motion feature saliency map.
The spatial feature component saliency map is computed as

\Gamma_M = N\left( \sum_{D} \bigoplus_{c=2}^{4} \bigoplus_{s=c+3}^{c+4} F_D(c, s) \right)

where N is the non-linear normalization operator, F_D(c, s) is the feature map obtained from the center-surround difference between layers c and s for sub-feature D, the operation \oplus is the merging operation, and D indexes the sub-features. For the motion feature, for example, D takes the nine values -1, 0, π/4, π/2, 3π/4, π, 5π/4, 3π/2 and 7π/4, where -1 denotes the absolute motion speed feature and the remaining values denote the eight motion direction features. \Gamma_M is the resulting saliency map of that spatial feature; for the motion feature it is the feature component saliency map of motion.
(4) The conspicuity maps of all the spatial features obtained above are merged in equal proportion and normalized with the non-linear normalization operator N to obtain the spatial feature component saliency map \Gamma_N.
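Building on the helpers above (normalize_N and center_surround_maps), the combination into Gamma_M and Gamma_N could be sketched as follows; the dictionary layout of the inputs is an assumption:

```python
import numpy as np

def feature_component_saliency(subfeature_cs_maps):
    """Gamma_M for one spatial feature (e.g. motion): sum the six
    center-surround maps of every sub-feature D, then normalize with N."""
    total = None
    for cs_maps in subfeature_cs_maps.values():   # D -> [F_D(c, s), ...], same scale
        merged = np.sum(cs_maps, axis=0)          # cross-scale accumulation
        total = merged if total is None else total + merged
    return normalize_N(total)

def spatial_component_saliency(feature_maps):
    """Gamma_N: equal-proportion merge of the saliency maps of all spatial
    features, normalized again with the operator N."""
    return normalize_N(np.sum(feature_maps, axis=0))
```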
The stronger the spatial feature saliency of a visual object processed by the dorsal pathway, the more obvious its enhancement of the ventral pathway's extraction of visual information. The modulation signal from the dorsal pathway to the ventral pathway is generated by analyzing the spatial feature component saliency map \Gamma_N according to the strength of the spatial features in it; the modulation signal G(x, y) is computed as

G(x, y) = 1 + W(\Gamma_N(x, y))

where W(\cdot) is a weighting function, a monotonic function with range [0, 1] that may be linear or non-linear and that adjusts the strength of the modulation signal. For simplicity, in the present invention this weighting function is defined as

W(\Gamma) = \lambda \left| \frac{\Gamma - \min(\Gamma)}{\max(\Gamma) - \min(\Gamma)} - \delta \right|_{> \epsilon}

Substituting this into the expression for the modulation signal gives

G(x, y) = 1 + \lambda \left| \frac{\Gamma_N(x, y) - \min(\Gamma_N(x, y))}{\max(\Gamma_N(x, y)) - \min(\Gamma_N(x, y))} - \delta \right|_{> \epsilon}

where \lambda is a modulation-strength scale factor related to subjective awareness: when subjective awareness pays more attention to motion information \lambda increases, and otherwise it decreases; because subjective awareness is difficult to describe, this parameter is reduced to a preset constant in the present invention. \delta is a spatial feature sensitivity factor, also related to subjective awareness, and likewise reduced to a constant here. |\cdot|_{> \epsilon} is a thresholding operation: when the value of the expression is less than the threshold \epsilon, the result is zero.
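A small sketch of the modulation signal with the thresholded weighting W; the defaults chosen for lambda, delta and epsilon are placeholders for the constants the patent leaves unspecified:

```python
import numpy as np

def modulation_signal(gamma_n, lam=1.0, delta=0.0, eps=0.1):
    """G(x, y) = 1 + W(Gamma_N(x, y)) with the thresholded weighting W above."""
    g = gamma_n.astype(np.float64)
    rng = g.max() - g.min()
    w = lam * np.abs((g - g.min()) / rng - delta) if rng > 0 else np.zeros_like(g)
    w[w < eps] = 0.0                   # |.|_{>eps}: values below the threshold become zero
    return 1.0 + w
```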
Step 3: the ventral pathway non-spatial feature component saliency map generating unit 3 extracts the non-spatial features of the video image, such as color, intensity and orientation, and the modulation signal generated by the dorsal pathway spatial feature component saliency map generating unit 2 acts on the process of generating the non-spatial feature component saliency map, as follows.
The intensity feature is computed from the intensity values r, g, b of the red, green and blue components of the input color video image as

I = \frac{r + g + b}{3}

The color feature extraction of the model simulates the function of the retinal P-type and non-M-non-P-type ganglion cells and extracts the color information of the red-green (RG) and blue-yellow (BY) channels:

RG = \left[ \frac{r - g}{\max(r, g, b)} \right]_+
BY = \left[ \frac{b - \min(r, g)}{\max(r, g, b)} \right]_+

Here [\cdot]_+ denotes rectification; for pixels whose \max(r, g, b) is below a threshold, the values of RG and BY are set to zero.
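The intensity and opponent-color channels could be computed as below; the threshold value is an assumed constant:

```python
import numpy as np

def intensity_and_color(rgb, threshold=0.1):
    """Intensity I = (r + g + b) / 3 and the RG / BY opponent channels.

    Pixels whose max(r, g, b) falls below `threshold` (an assumed value)
    have RG and BY set to zero, as described above.
    """
    r = rgb[..., 0].astype(np.float64)
    g = rgb[..., 1].astype(np.float64)
    b = rgb[..., 2].astype(np.float64)
    intensity = (r + g + b) / 3.0
    m = np.maximum(np.maximum(r, g), b)
    safe_m = np.where(m > 0, m, 1.0)
    rg = np.maximum((r - g) / safe_m, 0.0)                  # red-green channel, rectified
    by = np.maximum((b - np.minimum(r, g)) / safe_m, 0.0)   # blue-yellow channel, rectified
    rg[m < threshold] = 0.0
    by[m < threshold] = 0.0
    return intensity, rg, by
```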
The orientation feature of the image is not computed directly; it can be computed from the intensity feature, or from the results of the intensity pyramid.
The intensity and color features are first decomposed with a Gaussian pyramid into multiple resolutions (layers 0 to 8, nine scales in total, where layer 0 is the original image).
The Gaussian pyramid decomposition comprises two operations, smoothing and down-sampling. For an image I(x, y), the relation between the i-th layer image I_i(x, y) and the (i-1)-th layer image I_{i-1}(x, y) is

I_i(x, y) = \sum_{m=-M}^{M} \sum_{n=-N}^{N} C(m, n) \, I_{i-1}(2x + m, 2y + n)

where C(m, n) is the Gaussian kernel used for smoothing.
The multi-layer image of the Gaussian pyramid decomposition of the orientation feature is produced by convolving each scale of the intensity pyramid with Gabor filters in the four orientations 0, π/4, π/2 and 3π/4:

O_\theta(\sigma) = I(\sigma) * g_\theta, \quad \theta = 0, \tfrac{1}{4}\pi, \tfrac{1}{2}\pi, \tfrac{3}{4}\pi

where g is the Gabor filter, defined as

g(x, y) = \exp\!\left( -\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2} \right) \cos\!\left( 2\pi \frac{x'}{\lambda} + \phi \right)
x' = x\cos\theta + y\sin\theta
y' = -x\sin\theta + y\cos\theta

where the parameter \lambda determines the wavelength of the Gabor filter; \theta determines the orientation of the filter; \phi is the phase angle, which determines the type of the filter; \gamma is the aspect ratio of the filter; and \sigma is not set directly but is determined from the bandwidth parameter b by

b = \log_2 \frac{\frac{\sigma}{\lambda}\pi + \sqrt{\frac{\ln 2}{2}}}{\frac{\sigma}{\lambda}\pi - \sqrt{\frac{\ln 2}{2}}}, \qquad \frac{\sigma}{\lambda} = \frac{1}{\pi} \sqrt{\frac{\ln 2}{2}} \cdot \frac{2^b + 1}{2^b - 1}
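For illustration, a Gabor kernel matching the definition above and its application to the intensity pyramid might be sketched as follows; the kernel size, wavelength, phase, aspect ratio and bandwidth defaults are assumptions:

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(theta, lam=8.0, psi=0.0, gamma=0.5, bandwidth=1.0, size=19):
    """Gabor filter g(x, y) for orientation theta, following the definition above.

    sigma is derived from the bandwidth b via
    sigma/lambda = (1/pi) * sqrt(ln2/2) * (2^b + 1) / (2^b - 1).
    """
    sigma = lam / np.pi * np.sqrt(np.log(2) / 2) * (2 ** bandwidth + 1) / (2 ** bandwidth - 1)
    ax = np.arange(size) - size // 2
    x, y = np.meshgrid(ax, ax)
    xp = x * np.cos(theta) + y * np.sin(theta)
    yp = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xp ** 2 + gamma ** 2 * yp ** 2) / (2 * sigma ** 2)) * np.cos(2 * np.pi * xp / lam + psi)

def orientation_pyramid(intensity_pyramid, theta):
    """Convolve each level of the intensity pyramid with the Gabor filter for theta."""
    kern = gabor_kernel(theta)
    return [fftconvolve(level, kern, mode="same") for level in intensity_pyramid]
```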
The multi-scale features of the pyramid structures of the non-spatial features above (color, intensity and orientation) are processed with the center-surround difference operator to obtain feature maps at multiple scales. The center-surround difference is realized by subtracting a surround layer from a central layer: the central layers are layers 2, 3 and 4 of the pyramid, and the surround layer corresponding to each central layer is obtained by adding 3 or 4 to the central layer's index, giving six inter-layer differences, 2-5, 2-6, 3-6, 3-7, 4-7 and 4-8, so that each feature yields feature maps at six different scales. For each difference computation, the central and surround layer images are first resized to the size of a designated layer, and then the corresponding pixels of the two layers are subtracted to obtain the feature map:

F(c, s) = N(|M(c) \ominus M(s)|)
s = c + \delta, \quad c = 2, 3, 4, \quad \delta = 3, 4

where F is the computed feature map, M(c) is the central layer image of a given feature (e.g. I, RG, BY or L_o), M(s) is the corresponding surround layer image, and \ominus denotes the difference operation. The function N is an iterative non-linear normalization operation that simulates the lateral inhibition widely present among cortical neurons: a lateral-inhibition neural network makes features of the same kind at neighboring positions compete with and suppress one another, so that the features that win the competition are expressed, and the result is normalized to the interval [0, 1]. Its iteration can be expressed as

F_1(c, s) = R(F_0(c, s))
F_{i+1}(c, s) = R(F_i(c, s) + F_i(c, s) * \mathrm{DOG} - C), \quad i \ge 1

where the function R sets feature amplitudes below 0 to 0 and linearly projects amplitudes above 0 to the interval [0, 1]; C is an offset constant; and DOG is a difference of Gaussians defined as

\mathrm{DOG}(x, y) = \frac{c_{ex}^2}{2\pi\sigma_{ex}^2} e^{-(x^2 + y^2)/2\sigma_{ex}^2} - \frac{c_{in}^2}{2\pi\sigma_{in}^2} e^{-(x^2 + y^2)/2\sigma_{in}^2}

where the parameter c_{ex} determines the strength of the excitatory signal in the iteration, \sigma_{ex} its reach, c_{in} the strength of the inhibitory signal, and \sigma_{in} its reach.
Finally, the modulation signal G obtained in step 2 is applied to the processing of the non-spatial features by the following formula to obtain the final non-spatial feature map F:

F(x, y) = H(G(x, y), F'(x, y))

where H(\cdot) is the response function of a nerve cell under the influence of the modulation signal. To simplify the problem and ease computation, this modulation can simply be treated as the modulation signal acting on the original feature map F'(x, y) extracted without modulation. In the present embodiment, for simplicity, H(\cdot) is taken as the product of the modulation signal G(x, y) and the original feature map F'(x, y); during processing, if the image size of G(x, y) differs from that of F'(x, y), G(x, y) is first resized to the size of the feature map F'(x, y).
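Treating H(·) as a simple product, as the embodiment does, the modulation step could be sketched as:

```python
import numpy as np
from scipy.ndimage import zoom

def modulate_feature(feature_map, gain):
    """Apply the dorsal modulation: F(x, y) = H(G(x, y), F'(x, y)), with H as a product.

    The gain map G is resized to the feature map's size when they differ,
    as required above.
    """
    if gain.shape != feature_map.shape:
        gain = zoom(gain, (feature_map.shape[0] / gain.shape[0],
                           feature_map.shape[1] / gain.shape[1]), order=1)
    return gain * feature_map
```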
All the feature maps obtained for the same non-spatial feature must be merged to form a conspicuity map. The feature maps of the different layers are first transformed to the same scale, the cross-layer features are then accumulated pixel by pixel, and the non-linear normalization operator N is applied, finally forming the conspicuity maps of intensity, color and orientation. The merging operation can be expressed as

\Gamma_I = N\left( \bigoplus_{c=2}^{4} \bigoplus_{s=c+3}^{c+4} F_I(c, s) \right)
\Gamma_C = N\left( \sum_{j} \bigoplus_{c=2}^{4} \bigoplus_{s=c+3}^{c+4} F_j(c, s) \right), \quad j = RG, BY
\Gamma_O = N\left( \sum_{o} \bigoplus_{c=2}^{4} \bigoplus_{s=c+3}^{c+4} F_o(c, s) \right), \quad o = 0, \tfrac{1}{4}\pi, \tfrac{1}{2}\pi, \tfrac{3}{4}\pi

Here the operation \oplus is the cross-layer feature merging operation; \Gamma_I, \Gamma_C and \Gamma_O are, respectively, the intensity conspicuity map, the color conspicuity map and the orientation conspicuity map extracted under the action of the modulation signal. The color conspicuity map is obtained from the feature maps of the RG and BY color channels extracted under the modulation signal, and the orientation conspicuity map from the feature maps of the four orientation features 0, π/4, π/2 and 3π/4 extracted under the modulation signal.
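Reusing normalize_N from the earlier sketch, the merging into the intensity, color and orientation conspicuity maps might look like this; the container types of the inputs are assumptions:

```python
import numpy as np

def conspicuity_maps(int_maps, rg_maps, by_maps, orient_maps):
    """Build the intensity, color and orientation conspicuity maps.

    int_maps, rg_maps and by_maps are lists of the six modulated
    center-surround maps (already at a common scale); orient_maps maps each
    orientation to its list of six maps.
    """
    gamma_i = normalize_N(np.sum(int_maps, axis=0))
    gamma_c = normalize_N(np.sum(rg_maps, axis=0) + np.sum(by_maps, axis=0))
    gamma_o = normalize_N(sum(np.sum(v, axis=0) for v in orient_maps.values()))
    return gamma_i, gamma_c, gamma_o
```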
The reason for using the iterative non-linear normalization operator when generating conspicuity maps is the same as for the feature maps: it accounts for the competition and suppression among features of the same kind. For features of different kinds, neurobiological research shows that they largely cooperate, so a strategy of direct superposition is adopted when generating the non-spatial feature component saliency map from the conspicuity maps: the intensity conspicuity map \Gamma_I, the color conspicuity map \Gamma_C and the orientation conspicuity map \Gamma_O are merged in equal proportion and normalized with the operator N to obtain the non-spatial feature component saliency map \Gamma_S.
Since only objects with high spatial feature saliency appear in the spatial feature component saliency map, while the rest of the image is zero there, the modulation signal facilitates the activity of the ventral pathway neurons corresponding to the features of objects whose spatial features are more salient; the non-spatial features of such objects are enhanced in the feature maps, which raises their saliency in the non-spatial feature component saliency map.
Step 4: the attention information extraction unit 4 fuses the non-spatial feature component saliency map with the spatial feature component saliency map to generate an attention saliency map, and generates attention information from it.
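A sketch of the final fusion, again reusing normalize_N; the equal-weight combination of Gamma_S and Gamma_N is an assumption, since the text does not fix the fusion weights:

```python
import numpy as np

def attention_saliency(gamma_i, gamma_c, gamma_o, gamma_n):
    """Fuse the non-spatial component map with the spatial component map.

    Gamma_S is the normalized equal-proportion sum of the intensity, color and
    orientation conspicuity maps; the attention saliency map combines Gamma_S
    with the spatial component map Gamma_N.
    """
    gamma_s = normalize_N(gamma_i + gamma_c + gamma_o)
    return normalize_N(gamma_s + gamma_n)
```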
There are several methods for extracting attention information from the attention saliency map. A common one is the WTA neural network used in the Itti model: the WTA network makes the locations compete with one another and selects the most salient object location as the target of attention selection (that is, the attention information), while an inhibition-of-return mechanism realizes the selection and switching of attended targets.
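The WTA-style selection with inhibition of return could be approximated greedily as below; this is a stand-in for the neural-network formulation of the Itti model, with an assumed suppression radius and number of fixations:

```python
import numpy as np

def select_fixations(saliency, num_fixations=3, radius=20):
    """Greedy winner-take-all selection with inhibition of return.

    Repeatedly pick the most salient location, then suppress a disc around it
    so that attention can shift to the next target.
    """
    s = saliency.astype(np.float64).copy()
    yy, xx = np.mgrid[0:s.shape[0], 0:s.shape[1]]
    fixations = []
    for _ in range(num_fixations):
        y, x = np.unravel_index(np.argmax(s), s.shape)
        fixations.append((y, x))
        s[(yy - y) ** 2 + (xx - x) ** 2 <= radius ** 2] = 0.0   # inhibition of return
    return fixations
```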
The key point of the present invention is that a modulation signal is computed from the spatial feature component saliency map and applied to the process in which the ventral pathway non-spatial feature component saliency map generating unit extracts non-spatial features and generates the non-spatial feature component saliency map. This facilitates the activity of the ventral pathway neurons corresponding to the features of objects whose spatial features are more salient; the facilitation enhances the non-spatial features of such objects in the feature maps, makes them more likely to win the competition and stronger in the non-spatial feature component saliency map, i.e., it raises their saliency in that map.
The above is only a preferred embodiment of the present invention and does not limit the technical scope of the invention in any way; any minor modifications, equivalent variations and alterations made to the above embodiment according to the technical spirit of the present invention still fall within the scope of the technical solution of the present invention.

Claims (5)

1. A visual attention information computing device based on dorsal pathway guidance, characterized in that it mainly comprises an image input unit, a dorsal pathway spatial feature component saliency map generating unit, a ventral pathway non-spatial feature component saliency map generating unit and an attention information extraction unit;
the image input unit is configured to output a video image to the dorsal pathway spatial feature component saliency map generating unit and to the ventral pathway non-spatial feature component saliency map generating unit;
the dorsal pathway spatial feature component saliency map generating unit is configured to extract the spatial features in the video image to generate spatial feature maps in gray-scale form, generate the spatial feature component saliency map from these spatial feature maps and output it to the attention information extraction unit; at the same time, according to the strength of the spatial features in this spatial feature component saliency map, it generates the modulation signal from the dorsal pathway to the ventral pathway, which acts on the process in which the ventral pathway non-spatial feature component saliency map generating unit extracts non-spatial features and generates the non-spatial feature component saliency map;
the ventral pathway non-spatial feature component saliency map generating unit is configured to extract the non-spatial features of the video image, including color, intensity and orientation, and then, using these features and under the action of the modulation signal generated by the dorsal pathway spatial feature component saliency map generating unit, to generate the non-spatial feature component saliency map and output it to the attention information extraction unit; that is, the modulation signal generated by the dorsal pathway spatial feature component saliency map generating unit is applied to the processing of the non-spatial features by the following formula to obtain the final non-spatial feature map F(x, y): F(x, y) = H(G(x, y), F'(x, y)), where H(\cdot) is the response function of a nerve cell under the influence of the modulation signal G(x, y), G(x, y) is the modulation signal and F'(x, y) is the original feature map; finally, all the feature maps obtained for the same non-spatial feature are merged to generate the conspicuity maps of intensity, color and orientation, which are output to the attention information extraction unit;
the attention information extraction unit is configured to fuse the non-spatial feature component saliency map with the spatial feature component saliency map to generate an attention saliency map, and to generate attention information from it.
2. A processing method of a visual attention computational model based on dorsal pathway guidance, characterized by specifically comprising the following steps:
step 1: an image input unit outputs a video image to a dorsal pathway spatial feature component saliency map generating unit and to a ventral pathway non-spatial feature component saliency map generating unit;
step 2: the dorsal pathway spatial feature component saliency map generating unit extracts the spatial features in the video image, generates a spatial feature component saliency map and outputs it to an attention information extraction unit; at the same time it computes a modulation signal from this spatial feature component saliency map, which acts on the process in which the ventral pathway non-spatial feature component saliency map generating unit extracts non-spatial features and generates the non-spatial feature component saliency map;
step 3: the ventral pathway non-spatial feature component saliency map generating unit extracts the non-spatial features of the video image, including color, intensity and orientation, and then uses these features, under the action of the modulation signal generated by the dorsal pathway spatial feature component saliency map generating unit, to generate the non-spatial feature component saliency map and output it to the attention information extraction unit;
step 4: the attention information extraction unit fuses the non-spatial feature component saliency map with the spatial feature component saliency map to generate an attention saliency map, and generates attention information from it.
3. the disposal route of a kind of visual attention computation model based on Dorsal stream guiding according to claim 2, is characterized in that: the remarkable figure of described span characteristic component comprises the steps:
(1) if a certain space characteristics is made up of one group of subcharacter, then gaussian pyramid is adopted to carry out multi-scale image decomposition respectively to the gray level image of each subcharacter of this group, obtain the multi-layer image that comprises 0-8 layer 9 yardstick altogether, wherein the 0th layer is original image, the layer that numeral is larger, its graphical rule is less, and the picture breakdown of described gaussian pyramid comprises level and smooth and down-sampled two computings, for gray level image I (x, the y)=I of a width Expressive Features 0(x, y), its i-th tomographic image I i(x, y) and the i-th-1 tomographic image I i-1pass between (x, y) is
I i ( x , y ) = Σ m = - M M Σ n = - N N C ( m , n ) I i - 1 ( 2 x + m , 2 y + n )
Wherein C (m, n) is smoothed image gaussian kernel function used;
(2) A center-surround difference operator is applied to the multi-layer image to compute the feature maps of each spatial feature over the layers. The center-surround difference is realized by subtraction between a central layer and a surround layer: the central layers are layers 2, 3 and 4 of the multi-layer image, and the surround layer paired with each central layer is obtained by adding 3 or 4 to the central layer number, giving 6 inter-layer differences in total (2-5, 2-6, 3-6, 3-7, 4-7 and 4-8), so that each spatial feature yields feature maps at 6 different scales;
In the difference computation, the central layer image and the surround layer image are first resized to the common size of the layer-5 image, and the corresponding pixels of the two layers are then subtracted to obtain the feature map, according to the following formulas:
$$F(c,s)=N\big(|M(c)\,\Theta\,M(s)|\big)$$
$$s=c+\delta,\quad c=2,3,4,\quad \delta=3,4$$
where c is the number of the central layer, s is the number of the surround layer, and F(c, s) denotes the feature map obtained from the center-surround difference between layers c and s; M(c) denotes the central layer image, M(s) the corresponding surround layer image, and Θ the difference operation. The function N is an iterative nonlinear normalization operator that simulates the lateral inhibition widely present among cortical neurons: through a lateral-inhibition network, identical features at neighboring positions compete with and suppress one another, the features that win the competition are expressed, and the result is normalized to the interval [0, 1]. Its iteration is given by:
$$F_1(c,s)=R\big(F_0(c,s)\big)$$
$$F_{i+1}(c,s)=R\big(F_i(c,s)+F_i(c,s)*DOG-C\big),\quad i\ge 1$$
where the function R sets feature amplitudes below 0 to 0 and linearly maps those above 0 to the interval [0, 1], C is an offset constant, and DOG is a difference-of-Gaussians kernel defined as follows:
$$DOG(x,y)=\frac{c_{ex}^2}{2\pi\sigma_{ex}^2}\,e^{-(x^2+y^2)/2\sigma_{ex}^2}-\frac{c_{in}^2}{2\pi\sigma_{in}^2}\,e^{-(x^2+y^2)/2\sigma_{in}^2}$$
where the parameter c_ex determines the strength of the excitatory signal in the iteration, σ_ex determines the reach of the excitatory signal, c_in determines the strength of the inhibitory signal, and σ_in determines the reach of the inhibitory signal;
(3) If a spatial feature is composed of a group of sub-features, a merging operation is performed over the sub-features in the group: the feature maps of the different layers are transformed to a common scale and accumulated pixel by pixel across layers, and the nonlinear normalization operator N is applied to normalize the result, yielding the saliency map of each spatial feature; the saliency maps of all spatial features are then merged with equal weights and normalized with the operator N to obtain the spatial feature component saliency map Γ_M;
The spatial feature component saliency map Γ_M is computed as follows:
$$\Gamma_M=N\left(\sum_{D}\ \bigoplus_{c=2}^{4}\ \bigoplus_{s=c+3}^{c+4} F_D(c,s)\right)$$
where N is the nonlinear normalization operator, F_D(c, s) denotes the feature map obtained from the center-surround difference between layers c and s for sub-feature D, the operation ⊕ is the merging operation, and Γ_M is the resulting spatial feature component saliency map.
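To make the computations of claim 3 easier to follow, here is a compact Python sketch (NumPy and SciPy assumed available). It is an illustration under stated assumptions rather than the patented implementation: the 5×5 binomial smoothing kernel, the fixed number of normalization iterations, the approximation of R by clipping and max-scaling, and the DOG parameters and offset value are all chosen for the example, and function names such as gaussian_pyramid and normalize_N are introduced here, not taken from the patent.

```python
import numpy as np
from scipy.ndimage import convolve, zoom

def gaussian_pyramid(img, levels=9):
    # Step (1): smooth with a Gaussian-like kernel, then down-sample by 2, for layers 0-8.
    k = np.array([1.0, 4.0, 6.0, 4.0, 1.0])
    kernel = np.outer(k, k) / 256.0                     # assumed 5x5 stand-in for C(m, n)
    pyr = [img.astype(float)]
    for _ in range(1, levels):
        smoothed = convolve(pyr[-1], kernel, mode="nearest")
        pyr.append(smoothed[::2, ::2])
    return pyr

def resize_to(img, shape):
    # Bilinear resizing used to bring layers to a common size.
    return zoom(img, (shape[0] / img.shape[0], shape[1] / img.shape[1]), order=1)

def dog_kernel(size=9, c_ex=0.5, sig_ex=2.0, c_in=1.5, sig_in=25.0):
    # Difference-of-Gaussians used inside the normalization operator N (parameters assumed).
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx ** 2 + yy ** 2
    ex = c_ex ** 2 / (2 * np.pi * sig_ex ** 2) * np.exp(-r2 / (2 * sig_ex ** 2))
    inh = c_in ** 2 / (2 * np.pi * sig_in ** 2) * np.exp(-r2 / (2 * sig_in ** 2))
    return ex - inh

def normalize_N(fmap, iterations=3, offset=0.02):
    # Iterative nonlinear normalization simulating lateral inhibition; output lies in [0, 1].
    def R(x):
        x = np.clip(x, 0.0, None)
        return x / (x.max() + 1e-8)
    f = R(fmap)
    dog = dog_kernel()
    for _ in range(iterations):
        f = R(f + convolve(f, dog, mode="nearest") - offset)
    return f

def spatial_component_saliency(sub_features):
    # Steps (2)-(3): center-surround maps F_D(c, s) for each sub-feature D,
    # accumulated at the layer-5 scale and normalized into a map playing the role of Gamma_M.
    total = None
    for img in sub_features:
        pyr = gaussian_pyramid(img)
        ref_shape = pyr[5].shape
        for c in (2, 3, 4):
            for s in (c + 3, c + 4):                    # pairs 2-5, 2-6, 3-6, 3-7, 4-7, 4-8
                center = resize_to(pyr[c], ref_shape)
                surround = resize_to(pyr[s], ref_shape)
                fmap = normalize_N(np.abs(center - surround))
                total = fmap if total is None else total + fmap
    return normalize_N(total)

if __name__ == "__main__":
    # Two arbitrary sub-feature images standing in for one spatial feature group.
    subs = [np.random.rand(256, 256), np.random.rand(256, 256)]
    gamma_m = spatial_component_saliency(subs)
    print("Gamma_M shape:", gamma_m.shape, "range:", gamma_m.min(), gamma_m.max())
```

The sub-feature images above are random stand-ins; which sub-features the dorsal pathway actually supplies to this computation is determined elsewhere in the model, not by this sketch.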
4. The processing method of a visual attention computation model based on guidance of the dorsal pathway according to claim 2, characterized in that: in step 2, the modulating signal G(x, y) sent from the dorsal pathway to the ventral pathway is generated from the strength of the spatial features in the spatial feature component saliency map according to G(x, y) = 1 + W(Γ_N(x, y)),
where Γ_N(x, y) is the spatial feature component saliency map and W(·) is a weighting function, a monotonic function whose range is the interval [0, 1], used to adjust the strength of the modulating signal.
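Claim 4 only requires W(·) to be monotonic with range [0, 1] and does not fix a particular form. A minimal sketch with one assumed choice (a logistic squashing of the saliency value, with an assumed gain parameter) is:

```python
import numpy as np

def modulation_signal(gamma_n: np.ndarray, gain: float = 4.0) -> np.ndarray:
    # G(x, y) = 1 + W(Gamma_N(x, y)); W here is an assumed logistic, monotonic with range (0, 1).
    w = 1.0 / (1.0 + np.exp(-gain * (gamma_n - 0.5)))
    return 1.0 + w
```

Raising or lowering the assumed gain changes how strongly salient spatial locations boost the ventral-pathway features, which is the adjusting role the claim assigns to W.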
5. The processing method of a visual attention computation model based on guidance of the dorsal pathway according to claim 2, characterized in that: in step 3, under the action of the modulating signal G(x, y) generated by the dorsal-pathway spatial feature component saliency map generating unit, the non-spatial feature component saliency map F(x, y) is generated according to F(x, y) = H(G(x, y), F'(x, y)),
where H(·) is the response function of a neuron under the influence of the modulating signal and F'(x, y) is the raw feature map extracted without modulation; during processing, if the image size corresponding to the modulating signal G(x, y) is inconsistent with that of the raw feature map F'(x, y), G(x, y) must first be resized to the size of F'(x, y).
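Likewise, claim 5 leaves the response function H unspecified. The sketch below assumes a simple multiplicative gain for H and uses bilinear resizing as a stand-in for the size adjustment mentioned at the end of the claim.

```python
import numpy as np
from scipy.ndimage import zoom

def modulated_feature_map(G: np.ndarray, F_raw: np.ndarray) -> np.ndarray:
    # If the modulating signal and the raw feature map F' differ in size,
    # first resize G to the size of F'.
    if G.shape != F_raw.shape:
        G = zoom(G, (F_raw.shape[0] / G.shape[0], F_raw.shape[1] / G.shape[1]), order=1)
    # Assumed response function H: multiplicative modulation of the raw feature map.
    return G * F_raw
```

With H chosen this way, locations where G exceeds 1 amplify the corresponding non-spatial features, which is the guiding effect the modulating signal is meant to have; other monotone response functions would serve the same purpose.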
CN201110139467.0A 2011-05-26 2011-05-26 Visual attention information computing device based on guidance of dorsal pathway and processing method thereof Active CN102222231B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110139467.0A CN102222231B (en) 2011-05-26 2011-05-26 Visual attention information computing device based on guidance of dorsal pathway and processing method thereof

Publications (2)

Publication Number Publication Date
CN102222231A CN102222231A (en) 2011-10-19
CN102222231B true CN102222231B (en) 2015-04-08

Family

ID=44778779

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110139467.0A Active CN102222231B (en) 2011-05-26 2011-05-26 Visual attention information computing device based on guidance of dorsal pathway and processing method thereof

Country Status (1)

Country Link
CN (1) CN102222231B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102750731B * 2012-07-05 2016-03-23 Stereoscopic visual saliency computation method based on left and right monocular receptive fields and binocular fusion
CN106157319B * 2016-07-28 2018-11-02 Saliency detection method based on convolutional neural networks with region-level and pixel-level fusion
CN107247952B * 2016-07-28 2020-11-10 Deeply supervised visual saliency detection method based on a recurrent convolutional neural network
CN106780468B * 2016-12-22 2019-09-03 Saliency detection method based on visual-perception positive feedback
WO2019195191A1 (en) * 2018-04-02 2019-10-10 Phantom Al, Inc. Dynamic image region selection for visual inference
CN112102317B (en) * 2020-11-13 2021-03-02 之江实验室 Multi-phase liver lesion detection method and system based on anchor-frame-free

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101334834A * 2007-06-29 2008-12-31 Bottom-up attention information extraction method
CN101587590A (en) * 2009-06-17 2009-11-25 复旦大学 Selective visual attention computation model based on pulse cosine transform
CN101976439A * 2010-11-02 2011-02-16 Visual attention model combining motion information in the vision system of a maritime search and rescue machine

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060182339A1 (en) * 2005-02-17 2006-08-17 Connell Jonathan H Combining multiple cues in a visual object detection system

Also Published As

Publication number Publication date
CN102222231A (en) 2011-10-19

Similar Documents

Publication Publication Date Title
CN112150478B (en) Method and system for constructing semi-supervised image segmentation framework
CN102222231B (en) Visual attention information computing device based on guidance of dorsal pathway and processing method thereof
CN104978580B Insulator recognition method for UAV-based inspection of power transmission lines
Zhang et al. Agil: Learning attention from human for visuomotor tasks
CN103996056B (en) Tattoo image classification method based on deep learning
CN106778785B Method for constructing an image feature selection model, and image recognition method and apparatus
CN103761536B Face beautification method based on unsupervised optimal beauty features and a deep evaluation model
CN107610123A Image aesthetic quality evaluation method based on deep convolutional neural networks
CN107679491A Sign language recognition method based on 3D convolutional neural networks fusing multi-modal data
Qiao et al. Deep spatial-temporal neural network for classification of EEG-based motor imagery
CN107609638A Method for optimizing convolutional neural networks based on linear decoders and interpolation sampling
Chen et al. CreativeBioMan: a brain-and body-wearable, computing-based, creative gaming system
CN109299701A Face age estimation method based on GAN-expanded multi-ethnic feature collaborative selection
CN108509920A Face recognition method based on CNN multi-patch and multi-channel joint feature selection learning
CN106909938A View-independent activity recognition method based on a deep learning network
CN107967474A Sea-surface target saliency detection method based on convolutional neural networks
Wei et al. Real-time facial expression recognition for affective computing based on Kinect
Li et al. Multimodal information fusion for automatic aesthetics evaluation of robotic dance poses
Lee et al. Action-perception cycle learning for incremental emotion recognition in a movie clip using 3D fuzzy GIST based on visual and EEG signals
CN110223275A Task-fMRI-guided deep clustering method for cerebral white matter fibers
CN106127740A Contour detection method based on multi-orientation receptive field association along the visual pathway
CN107066979A Human motion recognition method based on depth information and multi-dimensional convolutional neural networks
Wei et al. Image feature extraction and object recognition based on vision neural mechanism
CN117095128A (en) Priori-free multi-view human body clothes editing method
CN115034959A High-definition image translation method based on a cross-channel fusion spatial attention mechanism

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant