CN109241834A - Group behavior recognition method based on hidden-variable embedding - Google Patents

Group behavior recognition method based on hidden-variable embedding (Download PDF)

Info

Publication number
CN109241834A
CN109241834A (application CN201810840520.1A)
Authority
CN
China
Prior art keywords
scene
feature
group
hidden variable
group behavior
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810840520.1A
Other languages
Chinese (zh)
Inventor
郑伟诗
李本超
唐永毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat-sen University
Original Assignee
Sun Yat-sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat-sen University
Priority: CN201810840520.1A
Publication: CN109241834A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Abstract

The invention discloses a group behavior recognition method based on hidden-variable embedding, comprising the following steps: S1, for group behavior recognition, segment individual images using the per-person bounding boxes provided in the data set; S2, build a hidden-variable embedding model, characterizing a graph model of nodes and edges with latent embedded variables; S3, perform group behavior recognition with the hidden-variable embedding model by constructing a set of hidden variables that express the interaction between each person and the group; S4, introduce an attention mechanism that embeds the features of each individual relevant to the current group behavior together with the scene information, applying a weighted transformation to the source data on the encoder side, or to the target data when attention is added on the decoder side. The invention captures a more global group behavior feature, obtaining a more holistic description of the group behavior and thereby completing the recognition task.

Description

Group behavior recognition method based on hidden-variable embedding
Technical field
The invention belongs to the technical field of computer vision, and in particular relates to a group behavior recognition method based on hidden-variable embedding.
Background technique
Current mainstream group behavior recognition methods mainly characterize the relationships among the people in a video through a graph model, in which each person is treated as a node so that inference can be carried out on the graph. However, a typical graph model describes the graph through the connections between pairs of nodes, so in group behavior recognition this amounts to recognizing group behavior from person-to-person relationships. Group behavior recognition models built on such person-to-person relationships therefore mainly capture local relations.
Summary of the invention
The primary object of the present invention is to overcome the shortcomings and deficiencies of the prior art by providing a group behavior recognition method based on hidden-variable embedding. The proposed model can describe and recognize group behavior well, and achieves the currently best recognition results on common group behavior recognition data sets.
In order to achieve the above object, the invention adopts the following technical scheme:
The group behavior recognition method based on hidden-variable embedding of the present invention comprises the following steps:
S1: For group behavior recognition, segment individual images using the per-person bounding boxes provided in the data set, and extract features from each person's image with a two-stream convolutional neural network to obtain a feature representation of every individual in the group behavior scene; likewise, feed the whole video frame through the two-stream convolutional neural network to obtain a feature representation of the current group behavior scene;
S2: Build a hidden-variable embedding model, characterizing a graph model of nodes and edges with latent embedded variables so that each node's embedded feature carries information about the nodes related to it; embed hidden variables characterizing the relationship between each person and the group, and unroll the iterative parameter update into a recurrent neural network, thereby modeling the relationships among the individuals in the video and recognizing the group behavior from the learned hidden variables;
S3: Perform group behavior recognition with the hidden-variable embedding model. By constructing a set of hidden variables that express the interaction between each person and the group, and by encoding information such as individual appearance and motion, a mid-level hidden-variable representation with group behavior semantics is obtained. For every person in the current scene, the proposed hidden-variable embedding model extracts the person-group interaction, and an overall group behavior hidden variable is expressed for the scene. The hidden variables are then embedded into a semantic feature space by feature embedding, and supervisory signals make hidden variables with similar group behavior semantics lie closer together in that feature space, which facilitates the subsequent classification and recognition of group behavior from the hidden variables;
S4: Introduce an attention mechanism to embed the features of each individual relevant to the current group behavior together with the scene information. Attention is a well-validated method for effectively improving the performance of sequence learning tasks: in an encoder-decoder framework, adding attention on the encoder side applies a weighted transformation to the source data, while introducing it on the decoder side applies a weighted transformation to the target data, effectively improving the model's ability to acquire and filter information.
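As a rough illustration of encoder-side attention, the sketch below (plain NumPy, with an assumed dot-product-style score function; the patent does not fix a particular scoring form) weights the source states by normalized relevance scores and returns their weighted sum:

```python
import numpy as np

rng = np.random.default_rng(5)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attend(enc_states, query, W):
    """Weighted transformation of the source (encoder) data: score each source
    state against the query, normalize the scores, and sum the states by weight.
    The bilinear scoring form is an assumption for illustration only."""
    scores = enc_states @ (W @ query)   # one relevance score per source position
    weights = softmax(scores)           # normalized attention weights
    context = weights @ enc_states      # weighted transformation of the source
    return context, weights

enc_states = rng.normal(size=(7, 16))   # 7 source positions, 16-d states each
query = rng.normal(size=16)             # a decoder-side state
W = rng.normal(size=(16, 16))
context, weights = attend(enc_states, query, W)
```

The weights sum to one, so `context` is a convex combination of the source states, which is what "weighted transformation of the source data" amounts to here.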
As a preferred technical solution, step S1 specifically comprises:
S1.1: Determine the type of the picture;
S1.1.1: For an RGB picture, locate each person with the per-person bounding boxes given in the data set, crop each frame accordingly, and resize each crop to 224 × 224 × 3, where 3 is the number of RGB channels; the transformed picture is fed into the RGB stream of the two-stream convolutional neural network for feature extraction;
S1.1.2: For an optical flow picture, first resize each flow map to 224 × 224 × 1, then splice the horizontal and vertical flow maps along the channel dimension into 224 × 224 × 2, and finally splice the flow maps of the ten frames around the current frame along the channel dimension, obtaining a stacked optical flow representation of 224 × 224 × 20;
S1.2: The two-stream network uses the parameters of a 50-layer residual convolutional neural network pre-trained on the UCF101 data set; the feature is taken from the output of the last pooling layer of the residual network and has 2048 dimensions;
S1.3: Finally, the features output by the RGB channel and the optical flow channel are concatenated, yielding a 4096-dimensional appearance-and-motion feature representation for each individual.
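The shape bookkeeping of step S1 can be sketched as follows. The two stream functions are hypothetical stand-ins for the ResNet-50 streams (they emit random vectors of the documented 2048-dimensional size), since only the input and output dimensions are specified here:

```python
import numpy as np

np.random.seed(0)

def rgb_stream(rgb):          # stand-in for the RGB ResNet-50 stream
    assert rgb.shape == (224, 224, 3)
    return np.random.randn(2048)          # last-pooling-layer feature size

def flow_stream(flow_stack):  # stand-in for the optical-flow ResNet-50 stream
    assert flow_stack.shape == (224, 224, 20)   # 10 frames x (dx, dy)
    return np.random.randn(2048)

def person_feature(rgb_crop, flow_crops):
    """Concatenate appearance and motion features into one 4096-d descriptor."""
    # Stack the (dx, dy) flow of 10 consecutive frames along the channel axis.
    flow_stack = np.concatenate(flow_crops, axis=2)   # (224, 224, 20)
    appearance = rgb_stream(rgb_crop)                 # (2048,)
    motion = flow_stream(flow_stack)                  # (2048,)
    return np.concatenate([appearance, motion])       # (4096,)

rgb_crop = np.zeros((224, 224, 3))
flow_crops = [np.zeros((224, 224, 2)) for _ in range(10)]  # per-frame (dx, dy)
x_i = person_feature(rgb_crop, flow_crops)
```

The same pipeline applied to the whole frame (instead of a person crop) would give the scene representation of the same dimensionality.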
As a preferred technical solution, step S2 specifically comprises:
S2.1: Embed the posterior probability of the hidden variable into a feature space: the posterior p(H_i | {X_i}) of hidden variable H_i is embedded with the feature map φ(H_i) as

μ_i = ∫ φ(h_i) p(h_i | {x_i}) dh_i.

For now the feature space is assumed to be of possibly infinite dimension; the value of the embedding dimension d used later can be determined by cross-validation;
S2.2: This requires integrating over all the variables in {H_i} other than H_i, i.e.

p(h_i | {x_i}) = ∫ p({h_j} | {x_i}) ∏_{j≠i} dh_j;
Only when the graph is a tree can the above formula be computed exactly by the message passing algorithm;
S2.3: From the viewpoint of the embedded point, the fixed-point equation is expressed equivalently as μ̃_i = f̃(x_i, {μ̃_j}_{j∈N(i)}); deriving the corresponding operator on the embedding μ_X then gives μ̃_i = T̃(x_i, {μ̃_j}_{j∈N(i)});
For the mean-field embedding method, the function f̃ and the operator T̃ have a complex nonlinear relationship with the potential functions Φ and Ψ, and the feature map φ must be learned from data;
S2.4: Parameterize T̃ with a neural network and learn it from supervisory signals to capture the nonlinear relationship. Assume μ̃_i ∈ R^d, where d is a hyperparameter; the operator T̃ is parameterized by a network of the form

μ̃_i = σ(W1 x_i + W2 Σ_{j∈N(i)} μ̃_j),

where σ(·) := max{0, ·} is the rectified linear function and the parameters to be learned are W = {W1, W2, W3} (the third matrix W3 acting on an analogous context term of the full update). The network parameters are then estimated with mean-field-style iterative updates of this expression, so that the relationships in the graph are expressed through embedded features.
As a preferred technical solution, step S3 specifically comprises:
Denote the observable individual variables and the group behavior scene variable by x_i, i ∈ v_p, and x_scene respectively, and use the corresponding hidden variables h_i, i ∈ v_p, and h_scene to represent the mid-level semantics of each observed variable, which can be understood as each person's motion state under the current group behavior scene, where v_p is the set of all people in the current scene. Starting from the independent hidden-variable representations h_i, the information of the individual variables is integrated through each person's relation to the group and to the context, so that the synthesized hidden variables express both the person-group interaction and the group behavior scene information. A fully connected undirected graph is then built over all individuals in the current scene, and a conditional posterior probability is expressed for each node in the graph.
As a preferred technical solution, in the group behavior recognition scene the fully connected undirected graph carries two kinds of semantic relations: 1) the relation between an individual and the group, and 2) the relation between an individual and the scene. Accordingly, the posterior of each individual's hidden variable can be written as p(h_i | x_i, {x_j}_{j∈v_p\i}, x_scene), where v_p\i denotes all individuals in the current scene other than the i-th; the posterior of the scene hidden variable can be written as p(h_scene | x_scene, {x_i}_{i∈v_p}). The group behavior occurring in the current scene is recognized from the scene hidden variable h_scene, which contains global group behavior information, together with each individual's hidden variable h_i, which contains local group behavior information; the posterior of the group behavior is therefore expressible as p(y | h_scene, {h_i}_{i∈v_p}).
As a preferred technical solution, the multi-round approximate mean-field embedding process is refined with a recurrent neural network. Hidden-variable inference and feature embedding are carried out from the individual feature representations x_i to obtain the embedded hidden features h_i; after several rounds of the approximate mean-field process, the embedded features are aggregated and classified, and the group behavior in the scene is finally recognized. Specifically,
First, let h_i denote the embedding of the hidden variable h_i into the feature space. It is modeled from the individual's appearance and motion feature x_i, the average appearance and motion feature of everyone except individual i, and the scene hidden-variable embedding obtained in the previous iteration. The update equation of the individual hidden-variable embedding is therefore

h_i^{(t)} = (1 − λ) h_i^{(t−1)} + λ σ(W_p [x_i ; (1/(|v_p|−1)) Σ_{j≠i} x_j ; h_scene^{(t−1)}]),
where [ ; ] denotes the vertical concatenation of feature vectors, |v_p| is the number of individuals in the group behavior scene, σ(·) is the rectified linear unit (Rectified Linear Unit, ReLU), and λ is the step size of the individual hidden-variable embedding update;
Similarly, h_scene is the embedded representation of the scene hidden variable h_scene, expressed from the global picture feature x_scene, the average appearance feature of the people in the current scene, and the aggregated embedding of the individuals:

h_scene^{(t)} = (1 − λ) h_scene^{(t−1)} + λ σ(W_s [x_scene ; (1/|v_p|) Σ_i x_i ; (1/|v_p|) Σ_i h_i^{(t)}]);
Since the above formula models the nonlinear relation between individuals and their local interaction behavior, in every scene-embedding iteration step the update mixes the scene embedding of the previous iteration step with the quantities of the current step (the scene feature, the average individual appearance feature, and the average individual embedding), so the scene embedding is only partially updated and the iterative updates converge smoothly;
Next, a nonlinear transformation and normalization are applied to the two embedded representations above, giving the posterior probability of the predicted group behavior category y:

p(y | h_scene, {h_i}) = φ(W_y [h_scene ; (1/|v_p|) Σ_i h_i]),
where φ is the softmax activation function;
Finally, given the predicted group behavior posterior, the error between the predicted and the actual group behavior category is computed with the cross-entropy loss function:

L(θ) = − Σ_{k=1}^{K} y_k log p(y_k | h_scene, {h_i}),
where θ denotes the parameters the model needs to learn and K is the number of group behavior categories.
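The whole iterative scheme of this section, individual update with step size λ, partially updated scene embedding, softmax posterior, and cross-entropy loss, can be sketched in NumPy. All weight shapes and the exact concatenation order are illustrative assumptions, not the patent's fixed architecture:

```python
import numpy as np

rng = np.random.default_rng(1)
relu = lambda z: np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def recognize(X, x_scene, d=16, T=3, lam=0.5, K=5, rng=rng):
    """Each h_i is refined from (own feature, mean feature of the others,
    previous scene embedding); the scene embedding is smoothed across
    iterations; the class posterior is a softmax over the final embeddings."""
    n, p = X.shape
    Wp = rng.normal(0, 0.1, (d, 2 * p + d))   # individual update weights
    Ws = rng.normal(0, 0.1, (d, 2 * p + d))   # scene update weights
    Wy = rng.normal(0, 0.1, (K, 2 * d))       # classification weights
    H = np.zeros((n, d))
    h_scene = np.zeros(d)
    x_mean = X.mean(axis=0)
    for _ in range(T):
        for i in range(n):
            others = (x_mean * n - X[i]) / (n - 1)   # mean feature excluding i
            z = np.concatenate([X[i], others, h_scene])
            H[i] = (1 - lam) * H[i] + lam * relu(Wp @ z)
        z_s = np.concatenate([x_scene, x_mean, H.mean(axis=0)])
        h_scene = (1 - lam) * h_scene + lam * relu(Ws @ z_s)  # partial update
    return softmax(Wy @ np.concatenate([h_scene, H.mean(axis=0)]))

def cross_entropy(p_y, y_onehot):
    return -float(np.sum(y_onehot * np.log(p_y + 1e-12)))

X = rng.normal(size=(6, 8))            # 6 people, 8-d features
p_y = recognize(X, rng.normal(size=8)) # scene feature assumed same size here
loss = cross_entropy(p_y, np.eye(5)[2])
```

With random weights the prediction is of course meaningless; the point is only the data flow of the T-round mean-field-style refinement.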
As a preferred technical solution, step S4 specifically comprises:
S4.1: The relevance of each individual to the group behavior in the current scene is computed by a nonlinear function of the form

e_i = w_g^T tanh(W_g h_i),

where w_g and W_g are learned parameters giving each individual's group behavior relevance; the weight of each person's interaction with the group under the current behavior is then obtained by

g_i = exp(e_i / τ) / Σ_{j∈v_p} exp(e_j / τ),
where τ is the temperature parameter of the softmax activation function;
S4.2: By performing the attention computation over all persons and their group interaction information in the current scene, the new scene embedding is calculated by

h_scene = Σ_{i∈v_p} g_i h_i;
Therefore, with the attention mechanism, when the scene hidden-variable embedding is computed as the global group behavior representation, a different group behavior weight g_i can be computed for each individual according to the current scene, indicating each individual's contribution to the current group behavior, so that individuals contribute differently to the iterative update of the scene hidden variable;
S4.3: Substituting the scene embedding computed by formula (21) into formulas (17) and (18) finally yields the hidden-variable-embedding group behavior recognition result with the attention mechanism added.
As a preferred technical solution, in step S4.1, in formula (20) the group behavior relevance is first mapped through an exponential function and then normalized, so that the codomain of each person's weight for the current group behavior is [0, 1].
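A small sketch of the temperature softmax of step S4.1. The tanh scoring form is an assumption, but the exponential-then-normalize step and the [0, 1] codomain follow directly from the text:

```python
import numpy as np

rng = np.random.default_rng(2)

def attention_weights(H, w_g, tau=1.0):
    """Per-person relevance scores mapped through a temperature softmax, so the
    weights are in [0, 1] and sum to 1; smaller tau sharpens the distribution."""
    e = np.tanh(H @ w_g)          # scalar relevance per person (assumed form)
    z = np.exp(e / tau)           # exponential mapping
    return z / z.sum()            # normalization

H = rng.normal(size=(6, 16))      # per-person hidden-variable embeddings
w_g = rng.normal(size=16)
g = attention_weights(H, w_g, tau=0.5)
h_scene_att = g @ H               # attention-weighted scene embedding
```

`h_scene_att` corresponds to the weighted scene embedding of step S4.2, where each person's contribution is scaled by g_i.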
Compared with the prior art, the present invention has the following advantages and beneficial effects:
(1) The present invention describes group behavior through the relationship between each person and the group, considering every person's relation to the people around them; the proposed model can therefore capture a more global group behavior feature and obtain a more holistic description of the group behavior to complete the recognition task. To capture the relationships between people and the group in video content, relationships are expressed with hidden variables; built on deep neural networks, the proposed model embeds the hidden variables into a semantic space so that they acquire a degree of semantic information, which in turn aids group behavior recognition. By expressing the person-group interaction from a global viewpoint, the proposed model describes and recognizes group behavior well and achieves the currently best recognition results on common group behavior recognition data sets.
(2) The invention proposes a group behavior recognition model based on hidden-variable embedding. The model embeds the person-group interaction information of the group behavior scene into a feature space and weights each individual's group behavior information with an attention mechanism, thereby modeling and describing the group behavior in the scene.
(3) The proposed method effectively recognizes and classifies a variety of group behaviors in different scenes and achieves the best performance on each group behavior database. The proposed hidden-variable-embedding group behavior recognition model is thus an effective model that can be deployed in intelligent video surveillance systems, endowing them with group behavior recognition capability so as to respond to sudden group behaviors and emergencies.
Detailed description of the invention
Fig. 1 is a schematic diagram of the person-group interaction hidden-variable embedding of the present invention.
Fig. 2 is a schematic diagram of the framework of the hidden-variable-embedding group behavior recognition model of the present invention.
Specific embodiment
The present invention will now be described in further detail with reference to the embodiments and the accompanying drawings, but embodiments of the present invention are not limited thereto.
Embodiment
The group behavior recognition method based on hidden-variable embedding of the present invention describes group behavior through the relationship between each person and the group, considering every person's relation to the people around them; the proposed model can therefore capture a more global group behavior feature and obtain a more holistic description of the group behavior to complete the recognition task. To capture the relationships between people and the group in video content, relationships are expressed with hidden variables; built on deep neural networks, the proposed model embeds the hidden variables into a semantic space so that they acquire a degree of semantic information, aiding group behavior recognition. By expressing the person-group interaction from a global viewpoint, the model describes and recognizes group behavior well and achieves the currently best recognition results on common group behavior recognition data sets.
The group behavior recognition method based on hidden-variable embedding of the present invention comprises the following steps:
(1) Individual visual feature representation;
For group behavior recognition, the present invention segments individual images using the per-person bounding boxes provided in the data set and then extracts features with a two-stream convolutional neural network. Specifically, for RGB pictures, each person is located with the bounding boxes given in the data set, each frame is cropped accordingly, and each crop is resized to 224 × 224 × 3, where 3 is the number of RGB channels. The transformed picture is fed into the RGB stream of the two-stream convolutional neural network for feature extraction. For optical flow pictures, each flow map is first resized to 224 × 224 × 1, the horizontal and vertical flow maps are spliced along the channel dimension into 224 × 224 × 2, and finally the flow maps of the ten frames around the current frame are spliced along the channel dimension, giving a stacked optical flow representation of 224 × 224 × 20. The two-stream network uses the parameters of a 50-layer residual convolutional neural network (ResNet-50) pre-trained on the UCF101 data set; the feature is the output of the network's last pooling layer, with 2048 dimensions. Finally, concatenating the outputs of the RGB and optical flow channels gives each individual a 4096-dimensional appearance-and-motion feature representation. Thus, by two-stream feature extraction on the per-person images, we obtain the feature representation x_i, i ∈ v_p, of each individual in the group behavior scene, where v_p denotes the set of individuals in the video scene. In addition, feeding the whole video frame through the same two-stream convolutional neural network yields the feature representation x_scene of the current group behavior scene.
The two-stream convolutional neural network is described in detail as follows:
A video can usually be decomposed into two components: space and time. The spatial part is, in general, the appearance of each frame, mainly carrying the appearance characteristics of the scene and the target objects; the temporal part is mainly the motion between video frames, expressing the movement of the observed objects. A two-stream convolutional neural network applies a separate convolutional neural network to each of the two components for information and feature extraction, classifies them independently, and fuses the two class scores afterwards (late fusion) to obtain the final two-stream classification result. The spatial-channel convolutional neural network performs convolution and classification on each single frame. Since some actions are strongly correlated with particular objects, static appearance information can serve as one cue for action classification, and previous action recognition research has shown that recognition from appearance information alone remains competitive. Furthermore, during the parameter initialization of the spatial-channel convolutional neural network, the data of a current large-scale image recognition task such as ImageNet can be used to pre-train the model, and the pre-trained network parameters then initialize the spatial-channel convolutional neural network.
The temporal-channel convolutional neural network differs from the spatial-channel one described above: it generally takes as input an optical flow stack built from consecutive frames. Because it contains richer temporal and motion information, the optical flow stack expresses the inter-frame motion better, and the network does not need to estimate the motion content itself, which benefits action recognition classification more. For the stacking, dense optical flow is generally regarded as a set of displacement vector fields between consecutive frames t and t + 1, denoted d_t, with d_t(u, v) the displacement of point (u, v) from frame t to frame t + 1. The horizontal and vertical components of the vector field are denoted d_t^x and d_t^y respectively and characterized as two channels of the flow picture. To further characterize the motion between consecutive frames, the flow channels d_t^{x,y} of L consecutive frames are stacked, forming an optical flow stack with 2L channels. Formally, letting w and h denote the width and height of the video pictures, the flow stack I_τ of any frame τ is expressed as:

I_τ(u, v, 2k−1) = d_{τ+k−1}^x(u, v),
I_τ(u, v, 2k) = d_{τ+k−1}^y(u, v), u ∈ [1; w], v ∈ [1; h], k ∈ [1; L].

Therefore, for an arbitrary point (u, v), the flow channels I_τ(u, v, c), c = [1 : 2L], encode the motion sequence of that point over L consecutive frames.
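The 2L-channel stacking can be written directly from the indexing scheme above; `stack_flow` is a hypothetical helper name, and zero-based channel indices replace the one-based ones of the formula:

```python
import numpy as np

def stack_flow(dx_frames, dy_frames):
    """Build the 2L-channel input I_tau: channel 2k holds d_x of frame k and
    channel 2k+1 holds d_y of frame k (zero-based k = 0..L-1)."""
    L = len(dx_frames)
    h, w = dx_frames[0].shape
    I = np.empty((h, w, 2 * L))
    for k in range(L):
        I[:, :, 2 * k] = dx_frames[k]        # horizontal displacement field
        I[:, :, 2 * k + 1] = dy_frames[k]    # vertical displacement field
    return I

# Toy flow fields for L = 10 frames: frame k moves by (+k, -k) everywhere.
L = 10
dx = [np.full((224, 224), k, dtype=float) for k in range(L)]
dy = [np.full((224, 224), -k, dtype=float) for k in range(L)]
I = stack_flow(dx, dy)
```

For L = 10 this yields exactly the 224 × 224 × 20 input described in step S1.1.2.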
(2) Building the hidden-variable embedding model;
The hidden-variable embedding model characterizes a graph model of nodes and edges with latent embedded variables, so that each node's embedded feature carries information about the nodes related to it; the embedded feature thus characterizes not only the current node but also the relational information of its associated nodes. This part mainly introduces the derivation and basic concepts of the hidden-variable embedding model. Let X be a random variable on domain χ, and let x denote an instance of X. Let p(X) be a probability density function on χ, let P denote the space of probability densities, and let p(x_1, x_2, …, x_l) be the joint density of multiple random variables x_1, x_2, …, x_l. In addition, define H as a hidden variable on its domain, with corresponding probability density p(H).
The Hilbert-space embedding of a probability distribution maps the distribution into a (possibly infinite-dimensional) feature space:

μ_X := E_X[φ(X)] = ∫ φ(x) p(x) dx,

i.e. the distribution is mapped to its expected feature, a single point in the feature space. This kernel embedding of distributions has strong expressive power. Some feature maps are injective: if two distributions p(X) and q(X) differ, they are mapped to two distinct points in the feature space. For example, the feature spaces of many commonly used kernels, such as the Gaussian radial basis function kernel (Gaussian RBF kernel), have this property, so the embedding process is injective. Correspondingly, for an injective embedding μ_X of the probability density p(X), μ_X is a sufficient statistic of the density: all information about the probability density function is retained in μ_X, p(X) can be uniquely recovered from μ_X, and any operation on p(X) can be mapped to an operation on μ_X that produces the same result. This property lets us compute a functional f of the probability density using the embedded feature:

f(p(X)) = f̃(μ_X),

where f̃ is the corresponding function applied to μ_X. Similarly, this property generalizes to operators on the probability density:

o(p(X)) = õ(μ_X),

where õ is the corresponding operator acting on the embedded feature.
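The embedding and its injectivity can be illustrated numerically: with a Gaussian RBF kernel, the squared distance ||μ_X − μ_Y||² between two empirical mean embeddings (the maximum mean discrepancy) is computable with the kernel trick, and it separates samples from different distributions while staying near zero for two samples of the same distribution. A sketch under these standard definitions:

```python
import numpy as np

rng = np.random.default_rng(3)

def rbf(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def mmd2(X, Y, sigma=1.0):
    """Squared distance between the empirical mean embeddings of two samples,
    ||mu_X - mu_Y||^2, computed with the kernel trick (no explicit phi)."""
    kxx = np.mean([rbf(a, b, sigma) for a in X for b in X])
    kyy = np.mean([rbf(a, b, sigma) for a in Y for b in Y])
    kxy = np.mean([rbf(a, b, sigma) for a in X for b in Y])
    return kxx + kyy - 2 * kxy

# Same distribution vs. clearly different distributions (2-d Gaussians).
same = mmd2(rng.normal(0, 1, (30, 2)), rng.normal(0, 1, (30, 2)))
diff = mmd2(rng.normal(0, 1, (30, 2)), rng.normal(4, 1, (30, 2)))
```

The biased estimator used here is a genuine squared norm in the feature space, so it is always non-negative and grows as the two distributions separate.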
For data points with structural properties, it can be assumed without loss of generality that each structured data point x is a graph containing a node set v = {1, …, V} and an edge set ε, where the value of node i is denoted x_i. For each node variable X_i in the graph, we use an additional hidden variable H_i to model the label of that node. Then a pairwise Markov random field relationship can be defined on these random variables:

p({H_i}, {X_i}) ∝ ∏_{i∈v} Φ(H_i, X_i) ∏_{(i,j)∈ε} Ψ(H_i, H_j),
where Φ and Ψ are non-negative node and edge potential functions, respectively. In this model, the variables are all interconnected according to the structure of the graph; representing the input data by its graph structure is therefore equivalent to association through the conditional independence structure of the undirected graph model. Note that the above graph model is built for each individual data point, and for two data points χ and χ' with different conditional independence structures, the quantities {H_i} cannot be obtained by direct observation, so the potential functions Φ and Ψ in the graph model are hard to learn. The present invention therefore embeds the posterior probability of the hidden variables into a feature space, and uses the embedding properties mentioned above to estimate the two functions equivalently. The posterior p(H_i | {X_i}) of hidden variable H_i is embedded with the feature map φ(H_i) as

μ_i = ∫ φ(h_i) p(h_i | {x_i}) dh_i.
The specific form of φ(H_i) and the parameters of p(H_i | {X_i}) in the Markov random field are not determined at this point; they will be learned later from supervisory signals. For now the feature space is assumed to be of possibly infinite dimension, and the value of d can be determined by cross-validation. For a general graph, however, computing the embedding is extremely complex, since it contains graph-model inference, which requires integrating over all the variables in {H_i}:

p(h_i | {x_i}) = ∫ p({h_j} | {x_i}) ∏_{j≠i} dh_j.
Only when the graph is a tree can the above formula be computed exactly by the message passing algorithm. In the general case, approximate inference methods can be used, such as mean field inference and loopy belief propagation. The group behavior recognition model of the present invention based on the hidden-variable embedding model is mainly a further refinement of the mean-field method, so the embedding estimation based on mean field is introduced in detail below. The general mean-field estimation method approximates the conditional probability density p({H_i} | {x_j}) by a product of independent probability density functions: p({H_i} | {x_j}) ≈ ∏_{i∈v} q_i(h_i), where q_i(h_i) ≥ 0 are valid probability density functions with ∫ q_i(h_i) dh_i = 1. Further, these probability density functions are obtained by minimizing the following variational free energy:

min_{q_1,…,q_V} ∫ ∏_i q_i(h_i) log ( ∏_i q_i(h_i) / p({h_i} | {x_j}) ) ∏_i dh_i.
It can be seen that the above optimization problem satisfies the following fixed point equation, for all i ∈ V:
where N(i) is the set of nodes adjacent to variable Hi in the graph and ci is a constant. The fixed point equation shows that qi(hi) is a functional of the set of its neighboring marginals {qj}j∈N(i), that is:
If for each marginal probability density qi there exists an injective embedding:
then, according to formula (4), the above formula can be expressed equivalently from the embedding point of view using the fixed point equation; deriving with an operator according to formula (5) gives:
For the mean field embedding method, the function and operator above have a complex nonlinear relationship with the potential functions Φ and Ψ, and the feature map φ needs to be learned from data. They can therefore be parameterized with a neural network, and the nonlinear relationship can be learned from supervisory signals. Assume the embedding dimension d is a hyperparameter. The operator can then be parameterized by the neural network of the following formula:
where σ(·) := max{0, ·} is the rectified linear function and the parameters to be learned are W = {W1, W2, W3}. The parameters of the network can thus be estimated with mean field iterative updates according to the above formula, so that the relations in the graph are expressed with embedding features. Formally, the model of the present invention embeds the hidden variables characterizing the person-group relations in this way, and unrolls the iterative parameter updates into a recurrent neural network, thereby modeling the relations among the individuals in a video and recognizing the group activity from the learned hidden variables.
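The mean-field-style embedding update above can be sketched as follows. The two weight matrices, the aggregation by summing neighbor embeddings, and the number of rounds are illustrative assumptions (the third learned matrix W3 of the patent, e.g. for edge features, is omitted here for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d = 8, 16                      # input and embedding dimensions (illustrative)
W1 = rng.normal(0, 0.1, (d, d_in))
W2 = rng.normal(0, 0.1, (d, d))

def relu(z):
    return np.maximum(0.0, z)

def embed(x, nbrs, n_rounds=3):
    """Embedding update: mu_i <- sigma(W1 x_i + W2 * sum_{j in N(i)} mu_j).

    Each round mimics one mean field sweep over all nodes.
    """
    n = x.shape[0]
    mu = np.zeros((n, d))
    for _ in range(n_rounds):
        agg = np.stack([mu[js].sum(axis=0) if js else np.zeros(d) for js in nbrs])
        mu = relu(x @ W1.T + agg @ W2.T)
    return mu

x = rng.normal(size=(4, d_in))                        # 4 nodes (e.g. persons)
nbrs = [[1, 2, 3], [0, 2, 3], [0, 1, 3], [0, 1, 2]]   # fully connected graph
mu = embed(x, nbrs)
```

Unrolling the `n_rounds` loop with shared weights is exactly the recurrent-network view the text describes.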
(3) Group activity recognition based on the hidden-variable embedding model;
By constructing a set of hidden variables, features expressing the person-group interactions can be obtained: encoding information such as individual appearance and motion yields a mid-level hidden-variable representation with group activity semantics, as shown in Fig. 1-2. For every person in the current scene, the hidden-variable embedding model proposed by the present invention extracts the person-group interaction relations, and produces an overall group-activity hidden-variable representation of the scene. The hidden variables are then embedded into a semantic feature space by feature embedding, and supervisory signals are used so that hidden variables with similar group activity semantics lie closer together in the feature space, facilitating the subsequent classification and recognition of group activities from the hidden variables. Specifically, denote the observable individual variables and the group activity scene by xi, i ∈ Vp, and xscene respectively, and represent the mid-level semantics of each observed variable with the corresponding hidden variables hi, i ∈ Vp, and hscene; this mid-level semantics can be understood as each person's motion state in the current group activity scene, where Vp is the set of all persons in the current scene. For each individual hidden-variable representation hi, the aim is to integrate the information of the individual observed variables into the hidden variables through the person-group relations and the context, so that the integrated hidden variable represents both the person-group interaction information and the group activity scene information. A relatively straightforward way to do this is to build a fully connected undirected graph over all individuals in the current scene, and then express the conditional posterior probability of each node in the graph.
In a group activity recognition scene, this fully connected undirected graph contains two kinds of semantic relations: 1) the relation between each individual and the group, and 2) the relation between each individual and the scene. According to the relations between each individual and the group and between each individual and the scene, the posterior probability of the hidden variable of each individual can be expressed conditioned on xi, on the observations of all individuals in Vp\i, and on xscene, where Vp\i denotes all individuals in the current scene other than the i-th. The posterior probability of the hidden variable of the group scene can likewise be expressed conditioned on xscene and on the observations of all individuals. The specific meaning of these two probability density functions is that each hidden variable hi needs the information of the i-th person, the information of all other persons, and the context of the group scene in order to obtain the interaction information between that person and the group; it can be regarded as one local part of the group activity information. The scene hidden variable hscene integrates the appearance and motion information of all individuals and of the group scene, together with the interaction information between every person and the group; it describes the interaction of all persons with the group in the scene from a holistic point of view, and can therefore be regarded as global group activity information. From the scene hidden variable hscene containing the global group activity information and the individual hidden variables hi each containing local group activity information, the group activity occurring in the current scene can be recognized, so the posterior probability of the group activity can be expressed conditioned on hscene and {hi}. However, although the posterior probabilities of the hidden variables can be defined in this way, inferring them within the framework of traditional graphical models and hand-crafted features remains very difficult. The present invention therefore uses a deep neural network to model the nonlinear relations between persons and the group, embedding the hidden variables into a feature space to model the group activity structure. The feature representation obtained by hidden-variable embedding can be regarded as a representation of the corresponding posterior probability. To this end, the present invention proposes an approximate mean field method to perform approximate inference on the hidden variables expressing the person-group interactions, and captures the long-term relations of the person-group interaction process during feature embedding. As shown in Fig. 1-2, the present invention improves the multi-round approximate mean field embedding process with a recurrent neural network: hidden-variable inference and feature embedding are performed from the individual feature representation xi to obtain the embedding feature hi of the hidden variable; after several rounds of the approximate mean field process, the embedding features are aggregated and classified, and finally the group activity in the scene is recognized. Specifically, first let hi be the representation of the hidden variable hi embedded into the feature space; it is modeled from the individual appearance and motion feature xi, the average appearance and motion feature of all persons other than individual i, and the scene hidden-variable embedding feature obtained in the previous iteration. The update formula of the individual hidden-variable embedding feature is therefore:
where [;] denotes vertical concatenation of feature vectors, |Vp| denotes the number of individuals in the group activity scene, σ(·) is the rectified linear unit (Rectified Linear Unit, ReLU), and λ is the step size of the individual hidden-variable embedding update. To keep the formulas concise, the present invention omits the bias terms throughout. Intuitively, in formula (15) the neighborhood features are represented by the mean of the neighbors' appearance features; this aggregated neighborhood feature expresses the appearance information of the group, including the posture and orientation of the persons. The embedding feature of the scene hidden variable represents the global context of the scene, such as the crowd distribution and the scene background. At each update step, the individual embedding feature is partially updated from the individual embedding feature of the previous iteration step together with the aggregated neighborhood feature and the scene hidden-variable embedding of the current iteration step, with the update step size controlled by the hyperparameter λ. It can be seen that each individual hidden variable simultaneously considers the individual appearance feature, the group appearance features related to the individual, and the scene context information, and thus performs reasoning about the person-group relations. The interaction between a person and the group can therefore be expressed by the embedding feature.
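A minimal sketch of this individual update rule, under the assumption that the update has the form h_i ← (1−λ)·h_i + λ·ReLU(W·[x_i; mean of the other persons' features; h_scene]); the weight matrix, feature sizes, and random inputs are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
d_x, d_h = 4096, 256                       # appearance feature / embedding sizes (as in the text)
lam = 0.3                                  # update step size lambda
W = rng.normal(0, 0.01, (d_h, 2 * d_x + d_h))

def relu(z):
    return np.maximum(0.0, z)

def update_individuals(h, x, h_scene):
    """h_i <- (1 - lam) * h_i + lam * ReLU(W [x_i; mean_{j != i} x_j; h_scene])."""
    n = x.shape[0]
    total = x.sum(axis=0)
    h_new = np.empty_like(h)
    for i in range(n):
        x_others = (total - x[i]) / (n - 1)   # average appearance of everyone except i
        z = np.concatenate([x[i], x_others, h_scene])
        h_new[i] = (1 - lam) * h[i] + lam * relu(W @ z)
    return h_new

x = rng.normal(size=(5, d_x))              # 5 persons in the scene
h = np.zeros((5, d_h))                     # initial individual embeddings
h_scene = np.zeros(d_h)                    # initial scene embedding
h = update_individuals(h, x, h_scene)
```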
Similarly, let the scene embedding be the embedding representation of the scene hidden variable hscene; this representation is expected to capture the group interaction information at the global level, so as to further help the update of the individual hidden-variable embedding representations and the final group activity classification. Accordingly, the scene embedding feature is expressed as follows from the global picture feature xscene, the average appearance feature of the persons in the current scene, and the aggregated embedding features of the persons:
The above formula models the nonlinear relation between the individuals and their local interactions. At each iteration step of the scene embedding feature, the scene embedding is partially updated from the scene embedding of the previous iteration step together with the scene feature, the average individual appearance feature, and the average individual embedding feature of the current iteration step, so that the embedding features converge smoothly during the iterative updates. The scene embedding can therefore be regarded as an overall representation of the global interaction relations. Given the embedding representations of the individual hidden variables and of the scene hidden variable, the aim is to recognize the current group activity from the contribution of each individual to the group activity in the current scene, i.e. the local group activity information, together with the global group activity information. By applying a nonlinear transformation and normalization to these two embedding representations, the posterior probability of the predicted group activity class y is:
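The scene update described above can be sketched in the same style as the individual update, under the assumption that it has the form h_scene ← (1−λ)·h_scene + λ·ReLU(Ws·[x_scene; mean of individual features; mean of individual embeddings]); the weight matrix and inputs are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
d_x, d_h = 4096, 256
lam = 0.3                                  # update step size lambda
Ws = rng.normal(0, 0.01, (d_h, 2 * d_x + d_h))

def relu(z):
    return np.maximum(0.0, z)

def update_scene(h_scene, x_scene, x, h):
    """h_scene <- (1 - lam) h_scene + lam * ReLU(Ws [x_scene; mean_i x_i; mean_i h_i])."""
    z = np.concatenate([x_scene, x.mean(axis=0), h.mean(axis=0)])
    return (1 - lam) * h_scene + lam * relu(Ws @ z)

x_scene = rng.normal(size=d_x)             # global scene feature
x = rng.normal(size=(5, d_x))              # individual features
h = np.abs(rng.normal(size=(5, d_h)))      # individual embeddings from the current step
h_scene = update_scene(np.zeros(d_h), x_scene, x, h)
```

The (1−λ)/λ mixing is what makes each iteration a partial update, which is why the alternating individual/scene updates converge smoothly.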
where φ is the softmax activation function. Finally, given the predicted group activity posterior probability, the error between the predicted group activity class and the actual group activity class is computed with the cross-entropy loss function:
where θ denotes the parameters the model needs to learn and K is the number of group activity classes. yk is 1 if the frame belongs to the k-th group activity class and 0 otherwise. When the predicted group activity is inconsistent with the actual group activity class, L(θ) is therefore large, which penalizes the network parameters during training so that the predicted group activity class becomes closer to the actual one. Since the method of the present invention parameterizes the embedding model based on approximate mean field with a recurrent neural network, the model parameters are updated and optimized with the Back Propagation Through Time (BPTT) algorithm.
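The classification head and loss can be sketched as follows; the choice of concatenating the scene embedding with the mean individual embedding, and the classifier weights, are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    z = z - z.max()                        # numerical stability
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(p, y):
    """L = -sum_k y_k log p_k for a one-hot label y."""
    return -float(np.sum(y * np.log(p + 1e-12)))

rng = np.random.default_rng(3)
K, d_h = 5, 256                            # 5 classes as in the CAD dataset
Wc = rng.normal(0, 0.01, (K, 2 * d_h))
h_scene = np.abs(rng.normal(size=d_h))     # scene embedding (simulated)
h_mean = np.abs(rng.normal(size=d_h))      # mean of individual embeddings (simulated)

p = softmax(Wc @ np.concatenate([h_scene, h_mean]))   # predicted class posterior
y = np.eye(K)[2]                           # one-hot ground-truth class (illustrative)
loss = cross_entropy(p, y)
```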
(4) Improving the group activity recognition model with an attention mechanism;
Note that in formula (16) the update of the scene embedding is modeled from the average embedding feature of all persons in the current scene, so within a scene every person-group interaction contributes equally to the group activity. However, in order to mine the group structure correctly, more attention should be paid to the obvious or strong person-group interactions, because these strong interactions contribute more to the understanding of the group activity in the current scene, while the influence of weak interactions on group activity recognition should be ignored. For example, as shown in Fig. 2, in that scene more attention should be paid to the interactions among the people waiting to cross the road, because they are more helpful for classifying and recognizing the current group activity; paying too much attention to the interactions between the passers-by in the background and the group would reduce the accuracy of group activity recognition to some extent and introduce ambiguity into the group activity. Therefore, an attention mechanism is introduced to perform feature embedding on the individuals and scene information relevant to the current group activity. The attention mechanism is a method that has been verified to effectively improve the performance of sequence learning tasks: in the encoder-decoder framework, adding attention in the encoding stage applies a weighted transformation to the source data, or introducing attention in the decoding stage applies a weighted transformation to the target data, effectively improving the model's ability to acquire and filter information. The present invention performs the iterative updates of the hidden-variable embeddings with a recurrent neural network, and introduces the attention mechanism into this process to filter and aggregate the current group activity information, so as to improve the network's ability to recognize group activities. The relevance of each individual to the group activity in the current scene is computed by the following nonlinear function:
where the output gives the group activity relevance of each individual. The weight of each person-group interaction on the current group activity is obtained by the following formula:
where τ is the temperature parameter of the softmax activation function. The formula first maps the group activity relevance with an exponential function and then normalizes it, so that the weight of each person on the current group activity lies in [0, 1]. By performing this attention-based computation on the interaction information of all persons in the current scene, the new scene embedding feature is computed by the following formula:
With the attention mechanism, when the scene hidden-variable embedding is computed as the global group activity information, a different group activity weight gi can be computed for each individual according to the current scene, indicating each individual's contribution to the current group activity, so that individuals contribute differently to the iterative update of the scene hidden variable. Finally, substituting the scene embedding feature computed by formula (21) into formula (17) and formula (18) yields the hidden-variable-embedding-based group activity recognition result with the attention mechanism added.
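The attention weighting can be sketched as follows; the tanh scoring function and the scoring vector are assumed forms chosen for illustration (the patent only specifies a nonlinear relevance function followed by a temperature softmax):

```python
import numpy as np

def softmax_tau(z, tau=0.25):
    """Softmax with temperature tau, as used in the experiments (tau = 0.25)."""
    z = z / tau
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(4)
d_h, n = 256, 5
Wa = rng.normal(0, 0.1, d_h)               # relevance scoring vector (illustrative)
h = np.abs(rng.normal(size=(n, d_h)))      # individual embedding features

e = np.tanh(h @ Wa)                        # relevance e_i of each individual (assumed form)
g = softmax_tau(e)                         # attention weights g_i in [0, 1], summing to 1
h_weighted = g @ h                         # weighted aggregate replacing the plain mean
```

The weighted aggregate `h_weighted` then takes the place of the uniform mean of individual embeddings in the scene update, so that strong interactions dominate the global representation.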
(5) Experimental results and discussion
To verify the effectiveness of the proposed group activity recognition model based on hidden-variable embedding, the model is evaluated on two general group activity recognition databases: 1) the CAD group activity database and 2) the CADE extended group activity database. On both general group activity recognition databases, the group activity recognition model proposed by the present invention outperforms the published group activity recognition methods. In addition, to eliminate the influence of the features and further verify the effectiveness of the model, the group activity classification performance based on the scene picture feature and on the average individual feature is compared in the experiments. The implementation details of the proposed method are as follows. For picture feature extraction, a ResNet-50 two-stream convolutional neural network pre-trained on the UCF101 action database extracts features from the scene and from the individual pictures respectively, serving as the appearance features used in this scheme. In all experiments, unless otherwise specified, the softmax temperature parameter is set to 0.25 and the feature update coefficient to 0.3. The random dropout parameter (Dropout Weight) in formula (18) is set to 0.5. All parameters are randomly initialized with the Xavier method, and parameter optimization uses the Adam method. The method of the present invention is implemented on the TensorFlow framework.
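The stated hyperparameters and the Xavier initialization can be summarized in a short sketch; the 4096-dimensional fan-in below matches the two-stream feature size, and the initializer follows the standard Glorot uniform rule (an assumption — the patent only names the method):

```python
import numpy as np

# Hyperparameters as stated in the experiments section.
SOFTMAX_TAU = 0.25      # softmax temperature parameter
LAMBDA = 0.3            # feature update coefficient
DROPOUT = 0.5           # random dropout parameter in formula (18)
T_ITER = 3              # embedding iterations (CAD / CADE setting)
D_HIDDEN = 256          # hidden-variable feature length

def xavier_uniform(fan_in, fan_out, rng):
    """Xavier/Glorot uniform init: U(-a, a) with a = sqrt(6 / (fan_in + fan_out))."""
    a = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-a, a, size=(fan_out, fan_in))

rng = np.random.default_rng(0)
W = xavier_uniform(4096, D_HIDDEN, rng)   # e.g. first projection of an individual feature
bound = np.sqrt(6.0 / (4096 + D_HIDDEN))
```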
(5.1) CAD group activity database;
The CAD group activity database (Collective Activity Dataset) is one of the most commonly used databases in the field of group activity recognition. It contains 44 video clips and 5 different group activities: crossing, waiting, queueing, walking and talking. The videos are annotated once every 10 frames, including the bounding boxes of the persons appearing in the picture and the group activity class. The present invention follows the data evaluation protocol of Deng et al. for model verification and compares the group activity recognition performance with existing methods; the specific figures are shown in Table 1. On this dataset the number of embedding iterations T and the hidden-variable feature length are set to 3 and 256 respectively. As shown in Table 1, the performance of the proposed method surpasses both the existing methods not based on deep learning and the deep-learning methods, and obtains the best group activity recognition result on this dataset. Specifically, the proposed method achieves an 85.4% group activity recognition result on this dataset, which is 2% higher in accuracy than the currently best Cardinality Kernel method, and 4% higher than the currently best deep-learning method, the Hierarchical Deep Temporal Model. The experimental results show that the proposed modeling of person-group interactions can effectively classify and recognize group activities, and is more effective than current methods based on person-to-person interactions.
Table 1. Comparison of experimental results on the CAD group activity recognition database
(5.2) CADE group activity database;
As mentioned in (5.1), the annotation of the two group activities "walking" and "crossing" suffers from class bias. Choi et al. therefore proposed the CADE group activity database on the basis of the CAD group activity dataset: they removed the "walking" group activity and added the two group activities "dancing" and "jogging", so that CADE has 6 different group activity classes. As before, the number of embedding iterations T and the hidden-variable feature length are set to 3 and 256 respectively, and the test set split proposed by Deng et al. is followed. Table 2 lists the recognition results of the proposed group activity recognition model based on hidden-variable embedding and of existing models on the CADE group activity database. The proposed group activity model achieves a 97.94% recognition rate on the CADE database, 7% higher than the currently best Structure Inference Machines method. This experimental result further demonstrates the effectiveness and superiority of the proposed person-group interaction modeling for group activity recognition scenes.
Table 2. Comparison of experimental results on the CADE group activity recognition database
(5.3) Further analysis of the experimental results
(5.3.1) Influence of the attention mechanism on model performance. We further investigate group activity recognition with the proposed attention mechanism; the results are shown in Table 3. In the experiments, the number of embedding iterations T and the hidden-variable feature length are set to 3 and 256 respectively. As can be seen from Table 3, adding the attention mechanism to the proposed method effectively improves the group activity recognition accuracy. In particular, on the CAD group activity recognition database, a 2% improvement in recognition accuracy is obtained after the attention mechanism is added, because the attention mechanism helps the model focus precisely on the interactions relevant to the corresponding group activity, giving the model a stronger group activity recognition capability.
Table 3. Influence of the attention mechanism on model performance
(5.3.2) Influence of the hidden-variable feature length on model performance. We further investigate the influence of the hidden-variable feature length on model performance; the specific results are shown in Table 4. The experimental results show that when the hidden-variable feature length is 256, the model performs best at classifying group activities on all datasets, while increasing or decreasing the hidden-variable feature length affects the classification performance of the model to some extent.
Table 4. Influence of the embedding feature dimension on model performance
In conclusion the invention proposes a kind of group behavior identification model based on hidden variable insertion, which passes through Feature space insertion is carried out with group interaction information with the people in group behavior scene, and using attention mechanism to each The group behavior information of individual is weighted, so that the group behavior in scene is modeled and be described.Therefore, of the invention The method of proposition as shown in experimental result, can well to group behaviors a variety of under different scenes carry out effective identification with Classification, and obtain the optimal performance under each group behavior database.Therefore, proposed by the present invention embedding based on hidden variable The group behavior identification model entered is a kind of effective group behavior identification model, can under intelligent video monitoring system into Row deployment, so that intelligent monitor system has group behavior recognition capability, so as to cope with various burst group rows For and emergency case.However, the method for the present invention also has certain shortcoming, one of them is then speed issue.The present invention mentions Method out needs a large amount of time in the extraction process of video features.Therefore, in following research, it is desirable to be able to logical Cross the target detection technique in conjunction with current quick picture and video, identified in conjunction with group behavior, thus obtain quickly and Effective group behavior method.
The above embodiment is a preferred embodiment of the present invention, but the embodiments of the present invention are not limited by it. Any other change, modification, substitution, combination or simplification made without departing from the spirit and principles of the present invention shall be an equivalent replacement and is included within the protection scope of the present invention.

Claims (8)

1. A group activity recognition method based on hidden-variable embedding, characterized by comprising the following steps:
S1. For group activity recognition, individual pictures are cropped using the individual bounding boxes provided in the dataset, and the features of each person in the picture are extracted with a two-stream convolutional neural network, obtaining a feature representation for each individual in the group activity scene; likewise, the video frame picture is input into the two-stream convolutional neural network to obtain a feature representation of the current group activity scene;
S2. A hidden-variable embedding model is built: a graphical model with nodes and edges is characterized by the embedded hidden variables, so that the embedding feature of each node carries the information of the nodes related to it; the hidden variables characterizing the person-group relations are embedded, and the iterative parameter updates are unrolled into a recurrent neural network, thereby modeling the relations among the individuals in the video and recognizing the group activity from the learned hidden variables;
S3. Group activity recognition based on the hidden-variable embedding model: by constructing a set of hidden variables, features expressing the person-group interactions can be obtained; encoding information such as individual appearance and motion yields a mid-level hidden-variable representation with group activity semantics; for every person in the current scene, the proposed hidden-variable embedding model extracts the person-group interaction relations and produces an overall group-activity hidden-variable representation of the scene; the hidden variables are then embedded into a semantic feature space by feature embedding, and supervisory signals are used so that hidden variables with similar group activity semantic information lie closer together in the feature space, facilitating the subsequent classification and recognition of group activities from the hidden variables;
S4. An attention mechanism is introduced to perform feature embedding on the individuals and scene information relevant to the current group activity; the attention mechanism is a method that has been verified to effectively improve the performance of sequence learning tasks: in the encoder-decoder framework, adding attention in the encoding stage applies a weighted transformation to the source data, or introducing attention in the decoding stage applies a weighted transformation to the target data, effectively improving the model's ability to acquire and filter information.
2. The group activity recognition method based on hidden-variable embedding according to claim 1, characterized in that step S1 specifically comprises:
S1.1. Determining the type of the picture;
S1.1.1. When the picture is an RGB picture, each person is located using the individual bounding boxes given in the dataset, and each frame is cropped according to the bounding boxes; the cropped picture is resized to 224 × 224 × 3, where 3 is the number of RGB channels, and the transformed picture is input into the RGB stream of the two-stream convolutional neural network for feature extraction;
S1.1.2. When the picture is an optical flow picture, the optical flow picture is first resized to 224 × 224 × 1; the horizontal and vertical optical flow maps are then concatenated along the channel dimension into 224 × 224 × 2; finally, the optical flow maps of the ten moments around the frame are concatenated along the channel dimension, obtaining a stacked optical flow representation of 224 × 224 × 20;
S1.2. The two-stream network uses the parameters of a 50-layer residual convolutional neural network pre-trained on the UCF101 dataset; the feature is taken from the output of the last pooling layer of the residual network, with a feature dimension of 2048;
S1.3. Finally, the features output by the RGB channel and the optical flow channel are concatenated, obtaining a 4096-dimensional appearance and motion feature representation for each individual.
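The input and feature shapes of steps S1.1-S1.3 can be sketched with placeholder arrays; the cropping and the ResNet-50 weights are omitted, and the pooled features are simulated, so this only demonstrates the shape bookkeeping:

```python
import numpy as np

rng = np.random.default_rng(0)

# S1.1.1: a cropped RGB person patch, resized to 224 x 224 x 3.
rgb = rng.random((224, 224, 3))

# S1.1.2: horizontal + vertical flow maps for 10 moments -> 224 x 224 x 20.
flow_frames = [rng.random((224, 224, 2)) for _ in range(10)]
flow = np.concatenate(flow_frames, axis=-1)

# S1.2: each stream's last pooling layer outputs a 2048-d vector (simulated).
feat_rgb = rng.random(2048)
feat_flow = rng.random(2048)

# S1.3: concatenating both streams gives the 4096-d appearance/motion feature.
feat = np.concatenate([feat_rgb, feat_flow])
```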
3. The group activity recognition method based on hidden-variable embedding according to claim 1, characterized in that step S2 specifically comprises:
S3.1. The posterior probability of the hidden variables is embedded into a feature space: the posterior probability p(Hi|{Xi}) of the hidden variable Hi is embedded using the feature map Φ(Hi), giving:
First assume Φ(Hi) ∈ R^d, where R^d may be an infinite-dimensional feature space and the value of d can be determined by cross-validation;
S3.2. An integral over all variables in Hi needs to be computed, i.e.
Only when the graph is a tree can the above formula be computed by a message passing algorithm;
S3.3. The above formula is expressed equivalently from the embedding point of view using the fixed point equation, i.e. the embedding is derived with an operator, giving:
For the mean field embedding method, the function and operator have a complex nonlinear relationship with the potential functions Φ and Ψ, and the feature map φ needs to be learned from data;
S3.4. The operator is parameterized with a neural network, and the nonlinear relationship is learned from supervisory signals; assuming the embedding dimension d is a hyperparameter, the operator is parameterized by the neural network of the following formula:
where σ(·) := max{0, ·} is the rectified linear function and the parameters to be learned are W = {W1, W2, W3}; the parameters of the network are therefore estimated with mean field iterative updates according to this expression, so that the relations in the graph are expressed with embedding features.
4. The group activity recognition method based on hidden-variable embedding according to claim 1, characterized in that step S3 specifically comprises:
Denote the observable individual variables and the group activity scene by xi and xscene respectively, and use their corresponding hidden variables hi and hscene to represent the mid-level semantics of each observed variable; this mid-level semantics can be understood as each person's motion state in the current group activity scene, where Vp is the set of all persons in the current scene. Accordingly, for each individual hidden-variable representation hi, the information of each observed variable is integrated into the hidden variable through the person-group relations and the context, so that the integrated hidden variable represents both the person-group interaction information and the group activity scene information; a fully connected undirected graph is then built over all individuals in the current scene, and the conditional posterior probability of each node in the graph is expressed.
5. The group activity recognition method based on hidden-variable embedding according to claim 4, characterized in that:
Group behavior identification scene in, non-directed graph interconnected contains two kinds of semantic relations: 1) individual with group it Between relationship and 2) individual and scene between relationship;According to it is each individual and group between it is each individual and scene between The posterior probability of relationship, the hidden variable of each individual can indicateWhereinRefer under current scene, all individuals other than i-th of individual;The posterior probability of the hidden variable of group's scene can be with table ShowAccording to the scene hidden variable h comprising global group behavior informationsceneWith it is each Individual includes the individual hidden variable h of local population behavioural informationiTherefrom identify the group behavior occurred under current scene, therefore group The posterior probability of body behavior is represented by
6. The group behavior recognition method based on latent variable embedding according to claim 4, wherein the multi-round approximate mean-field embedding process is improved by using a recurrent neural network: latent-variable inference and feature embedding are performed from the individual feature representation x_i to obtain the latent-variable embedding feature h_i; after multiple rounds of the approximate mean-field process, the embedding features are aggregated and the group behavior is classified, and finally the group behavior in the scene is recognized; specifically,
First, denote the individual embedding feature as the representation of the latent variable h_i in the feature space; it is modeled using the individual appearance and motion feature x_i, the average appearance and motion feature of all persons other than individual i, and the scene latent-variable embedding feature obtained in the previous iteration round; the update formula of the individual latent-variable embedding feature is therefore:
where [;] denotes the vertical concatenation of feature vectors, the cardinality term denotes the number of individuals in the group behavior scene, σ(·) is the rectified linear unit (ReLU) function, and λ is the step size of the individual latent-variable embedding update;
Similarly, the scene embedding feature is the embedding representation of the scene latent variable h_scene; it is expressed as follows, using the global image feature x_scene, the average appearance feature of the persons in the current scene, and the aggregated individual embedding features:
Since the above formula models the nonlinear relation between individuals and their local interaction behavior, in each iterative update step of the scene embedding feature, the scene embedding is partially updated from the scene embedding feature of the previous iteration step, the scene feature of the current iteration step, the average individual appearance feature, and the average individual embedding feature, so that the embedding features converge smoothly during the iterative updates;
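The multi-round update loop above can be sketched as follows. This is a hedged illustration, not the patented implementation: the weight shapes, random initialization, and the exact concatenation order are assumptions, since the update formulas themselves are not reproduced in the text.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def mean_field_rounds(X, x_scene, d, rounds=3, lam=0.5, seed=0):
    """Sketch of the multi-round approximate mean-field embedding.

    Each individual embedding is refreshed from [x_i; mean feature of
    the other individuals; previous scene embedding]; the scene embedding
    is then partially updated with step size lam, which keeps the
    iteration smooth, as the claim describes.
    """
    n, f = X.shape
    rng = np.random.default_rng(seed)
    Wi = 0.1 * rng.standard_normal((d, 2 * f + d))  # individual-update weights
    Ws = 0.1 * rng.standard_normal((d, 2 * f + d))  # scene-update weights
    H = np.zeros((n, d))   # individual embedding features
    h_scene = np.zeros(d)  # scene embedding feature
    for _ in range(rounds):
        H_new = np.empty_like(H)
        for i in range(n):
            others = np.delete(X, i, axis=0).mean(axis=0)  # everyone but i
            H_new[i] = relu(Wi @ np.concatenate([X[i], others, h_scene]))
        H = (1 - lam) * H + lam * H_new                    # partial update
        z = np.concatenate([x_scene, X.mean(axis=0), H.mean(axis=0)])
        h_scene = (1 - lam) * h_scene + lam * relu(Ws @ z)
    return H, h_scene

rng = np.random.default_rng(1)
H, h_scene = mean_field_rounds(rng.standard_normal((5, 8)),
                               rng.standard_normal(8), d=4)
```

The partial update with step size λ mirrors the claim's point that the embedding features should converge smoothly rather than being overwritten wholesale at each round.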
Next, by applying a nonlinear transformation and normalization to the above two embedding representations, the posterior probability of the predicted group behavior class y is:
where φ is the softmax activation function;
Finally, after the predicted group behavior posterior probability is obtained, the error between the predicted group behavior class and the ground-truth group behavior class is computed using the cross-entropy loss function:
where θ denotes the parameters the model needs to learn and K is the number of group behavior classes.
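A minimal sketch of the softmax posterior and cross-entropy error described in the claim, assuming the classifier output is a vector of K unnormalized scores (the logit values here are made up for illustration):

```python
import numpy as np

def softmax(z):
    # phi in the claim: nonlinear transformation plus normalization
    e = np.exp(z - np.max(z))  # subtract max for numerical stability
    return e / e.sum()

def cross_entropy(logits, y_true):
    """Cross-entropy between the predicted posterior and the true class
    index; K group behavior classes correspond to len(logits)."""
    p = softmax(logits)
    return -np.log(p[y_true])

logits = np.array([2.0, 0.5, -1.0])      # K = 3 group behavior classes
loss_if_class0 = cross_entropy(logits, 0)  # confident, correct prediction
loss_if_class2 = cross_entropy(logits, 2)  # confident, wrong prediction
```

The loss is small when the predicted posterior concentrates on the ground-truth class and large otherwise, which is exactly the training signal used to estimate θ.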
7. The group behavior recognition method based on latent variable embedding according to claim 1, wherein step S4 specifically comprises:
S4.1: the relevance of each individual to the group behavior in the current scene is computed by the following nonlinear function:
where the result gives the group behavior relevance of each individual; the weight of the person-group interaction behavior with respect to the current group behavior is obtained by the following formula:
where τ is the temperature parameter of the softmax activation function;
S4.2: by performing the attention-based computation over all persons and the group interaction information in the current scene, the new scene embedding feature is computed by the following formula:
Therefore, when the attention mechanism is used to compute the scene latent-variable embedding representation as the global group behavior information, a different group behavior weight g_i can be computed for each individual according to the current scene, indicating the degree of that individual's contribution to the current group behavior, so that the individuals make different contributions to the iterative update of the scene latent variable;
S4.3: the scene embedding feature computed by formula (21) is substituted into formula (17) and formula (18), finally yielding the group behavior recognition result based on latent variable embedding after the attention mechanism is added.
8. The group behavior recognition method based on latent variable embedding according to claim 7, wherein in step S4.1, in formula (20), the group behavior relevance is first mapped with an exponential function and then normalized, so that the value range of each person's weight for the current group behavior is [0, 1].
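The exponential-then-normalize weighting of claim 8, i.e. a temperature softmax, can be sketched as below; the function names, the example relevance scores, and the use of an identity matrix as stand-in embeddings are assumptions for illustration.

```python
import numpy as np

def attention_weights(relevance, tau=1.0):
    """Temperature softmax over per-person relevance scores: exponentiate,
    then normalize, so every weight g_i lies in [0, 1] and they sum to 1."""
    z = np.asarray(relevance, dtype=float) / tau
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def attended_scene_embedding(H, relevance, tau=1.0):
    """Combine individual embeddings into a scene embedding, weighting
    each person by its attention weight (its contribution to the group)."""
    g = attention_weights(relevance, tau)
    return g @ H  # (n,) @ (n, d) -> (d,)

rel = [2.0, 0.5, 0.5]                      # made-up relevance scores
g = attention_weights(rel, tau=1.0)
s = attended_scene_embedding(np.eye(3), rel)  # here s equals g itself
```

Lowering the temperature τ sharpens the distribution, concentrating the scene update on the most behavior-relevant individuals.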
CN201810840520.1A 2018-07-27 2018-07-27 Group behavior recognition method based on latent variable embedding Pending CN109241834A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810840520.1A CN109241834A (en) 2018-07-27 2018-07-27 Group behavior recognition method based on latent variable embedding

Publications (1)

Publication Number Publication Date
CN109241834A true CN109241834A (en) 2019-01-18

Family

ID=65073112

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810840520.1A Pending CN109241834A (en) 2018-07-27 2018-07-27 A kind of group behavior recognition methods of the insertion based on hidden variable

Country Status (1)

Country Link
CN (1) CN109241834A (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110232343A (en) * 2019-06-04 2019-09-13 重庆第二师范学院 Children personalized behavioral statistics analysis system and method based on latent variable model
CN110569773A (en) * 2019-08-30 2019-12-13 江南大学 Double-flow network behavior identification method based on space-time significance behavior attention
CN110807333A (en) * 2019-10-30 2020-02-18 腾讯科技(深圳)有限公司 Semantic processing method and device of semantic understanding model and storage medium
CN110991532A (en) * 2019-12-03 2020-04-10 西安电子科技大学 Scene graph generation method based on relational visual attention mechanism
CN111046955A (en) * 2019-12-12 2020-04-21 中国人民解放军军事科学院国防科技创新研究院 Multi-agent confrontation strategy intelligent prediction method and device based on graph network
CN111160115A (en) * 2019-12-10 2020-05-15 上海工程技术大学 Video pedestrian re-identification method based on twin double-flow 3D convolutional neural network
CN111241963A (en) * 2020-01-06 2020-06-05 中山大学 First-person visual angle video interactive behavior identification method based on interactive modeling
CN111401174A (en) * 2020-03-07 2020-07-10 北京工业大学 Volleyball group behavior identification method based on multi-mode information fusion
CN111598032A (en) * 2020-05-21 2020-08-28 中山大学 Group behavior identification method based on graph neural network
CN111626171A (en) * 2020-05-21 2020-09-04 青岛科技大学 Group behavior identification method based on video segment attention mechanism and interactive relation activity diagram modeling
CN112148879A (en) * 2019-04-16 2020-12-29 中森云链(成都)科技有限责任公司 Computer readable storage medium for automatically labeling code with data structure
CN112580615A (en) * 2021-02-26 2021-03-30 北京远鉴信息技术有限公司 Living body authentication method and device and electronic equipment
CN113255597A (en) * 2021-06-29 2021-08-13 南京视察者智能科技有限公司 Transformer-based behavior analysis method and device and terminal equipment thereof
CN113673433A (en) * 2021-08-23 2021-11-19 北京市商汤科技开发有限公司 Behavior recognition method and device, electronic equipment and storage medium
CN113688729A (en) * 2021-08-24 2021-11-23 上海商汤科技开发有限公司 Behavior recognition method and device, electronic equipment and storage medium
CN113837272A (en) * 2021-09-23 2021-12-24 中汽创智科技有限公司 Automatic driving long tail data enhancement method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104217214A (en) * 2014-08-21 2014-12-17 广东顺德中山大学卡内基梅隆大学国际联合研究院 Configurable convolutional neural network based red green blue-distance (RGB-D) figure behavior identification method
CN108268834A (en) * 2017-12-25 2018-07-10 西安电子科技大学 The Activity recognition method of Behavior-based control component time-space relationship

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HANJUN DAI ET AL: "Discriminative Embeddings of Latent Variable Models for Structured Data", Proceedings of the 33rd International Conference on Machine Learning *
KAREN SIMONYAN ET AL: "Two-Stream Convolutional Networks for Action Recognition in Videos", arXiv:1406.2199 *
YONGYI TANG ET AL: "Latent Embeddings for Collective Activity Recognition", 14th IEEE International Conference on Advanced Video and Signal Based Surveillance *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190118