CN106529426A - Visual human behavior recognition method based on Bayesian model - Google Patents

Visual human behavior recognition method based on Bayesian model

Info

Publication number
CN106529426A
CN106529426A (application number CN201610921854.2A)
Authority
CN
China
Prior art keywords
parameter
video
pattern
training video
behavior
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610921854.2A
Other languages
Chinese (zh)
Inventor
胡卫明
杨双
原春锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN201610921854.2A
Publication of CN106529426A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/40 Scenes; Scene-specific elements in video content
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F 18/24155 Bayesian classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a visual human behavior recognition method based on a Bayesian model, the method comprising the following steps: extracting features from a training video to form a low-level representation of the human behavior in the training video; building a hierarchical Bayesian model from those features to extract human behavior patterns at different scales from the training video and obtain a representation of human behavior based on high-level semantic information; embedding a max-margin mechanism to enable learning of a discriminative hierarchical Bayesian model; and learning the parameters of the discriminative hierarchical Bayesian model to determine those parameters. The invention also relates to a visual human behavior recognizer trained by this method. By introducing the max-margin mechanism into the recognition model and uniting it with the preceding representation model into a single discriminative hierarchical Bayesian model, the method can cope effectively with complex behavior backgrounds and achieve robust behavior recognition.

Description

A visual human behavior recognition method based on a Bayesian model
Technical field
The present invention relates to the field of computer vision, and more particularly to a visual human behavior recognition method based on a Bayesian model.
Background art
Visual human behavior recognition is an important research problem in computer vision, with great practical value in settings such as intelligent surveillance, advanced human-computer interaction, and film and animation production. A typical visual human behavior recognition method comprises two steps: (1) expressing the human behavior information in a video in a form such as a vector or a graph, yielding a representation of the visual human behavior; (2) feeding the resulting representation into a suitable classification method, such as a support vector machine, to complete classification and recognition.
At present, in much of the research on visual human behavior analysis, most methods carry out these two steps independently, i.e., in sequence and in separate stages. Because representation and recognition are decoupled, such methods can guarantee neither that the learned representation is optimal for the recognition method designed in the later step, nor that the chosen recognition method makes optimal use of the representation obtained in the earlier step.
On the other hand, Bayesian models directly model the relations among the data and describe the data distribution from a statistical point of view; they can overcome the inability of the traditional bag-of-words model to express the latent structure of features, and can often learn the essential characteristics of the data, so they too are widely applied in visual human behavior analysis. However, most current Bayesian methods are purely generative and ignore discriminative information.
Meanwhile, in current human behavior analysis tasks, discriminative methods based on the max-margin criterion, such as support vector machine methods, are mostly adopted to perform classification and recognition. Because these methods directly take as their optimization objective a classification loss that measures classification quality, they achieve good recognition results on many recognition tasks, including behavior analysis. In addition, such discriminative methods attain the final goal of optimal classification performance directly by solving an optimization problem, and their implementations are mature, so they are widely applied.
However, these existing behavior recognition methods generally focus only on the representation stage or on the recognition stage alone and cannot form a unified learning framework, so the representation result and the recognition result cannot mutually reinforce and regulate each other; their range of application is therefore considerably restricted.
Summary of the invention
In view of the above problems in the prior art, the present invention proposes a visual human behavior recognition method based on a Bayesian model which can cope effectively with complex behavior backgrounds and thereby achieve robust behavior recognition.
The visual human behavior recognition method based on a Bayesian model according to the present invention comprises the following steps:
Step 1: extract features from the training video to form a low-level representation of the human behavior in the training video;
Step 2: build a hierarchical Bayesian model from the features to extract human behavior patterns at different scales from the training video, obtaining a representation of human behavior based on high-level semantic information;
Step 3: embed a max-margin mechanism to enable learning of a discriminative hierarchical Bayesian model;
Step 4: learn the parameters of the discriminative hierarchical Bayesian model to determine those parameters.
Further, step 1 specifically includes the following:
Step 1a: based on the pixel value changes of the pixels in the training video, detect the salient points of human behavior in the training video;
Step 1b: build a descriptor centered on each salient point, forming a description of the local region centered on that point;
Step 1c: cluster all descriptors to form the corresponding visual words and visual dictionary, then build histogram vectors under the bag-of-words model, forming the low-level representation of human behavior in the training video.
Preferably, the descriptor is a 3D-SIFT descriptor.
Further, step 2 specifically includes the following:
Step 2a: draw a training video d ∈ {1, ..., M} according to the prior distribution Uniform(M) with parameter M, where M is the number of all training videos;
Step 2b: according to the global behavior pattern distribution with parameter θ_d, draw a global behavior pattern z_{d,n} = k, k = 1, ..., K, from the drawn training video d, where K is the number of distinct global behavior patterns;
Step 2c: according to the local behavior pattern distribution with parameter τ_k, conditioned on the drawn global behavior pattern z_{d,n} = k, draw a local behavior pattern h_{d,n} = r, r = 1, ..., R, where R is the number of distinct local behavior patterns;
Step 2d: according to the visual word distribution with parameter φ_r, conditioned on the drawn local behavior pattern h_{d,n} = r, draw a visual word w_{d,n} ∈ {1, ..., V}.
Preferably, the parameters θ_d, τ_k, and φ_r are given, respectively, a K-dimensional Dirichlet prior with parameter α, an R-dimensional Dirichlet prior with parameter γ, and a V-dimensional Dirichlet prior with parameter β.
Preferably, the global behavior pattern distribution and/or the local behavior pattern distribution and/or the visual word distribution are multinomial distributions.
Further, step 3 specifically includes the following:
Step 3a: take the mean frequency of occurrence of the global behavior patterns in each training video as the representation of that training video;
Step 3b: feed the representation into a linear classifier with parameter η_c to obtain the value of the discriminant function, where c = 1, ..., C indexes the classes and C is the number of classes;
Step 3c: compute the loss ζ_{d,c} based on the max-margin criterion, where the class indicator is positive when the true class of the video is c and negative otherwise;
Step 3d: introduce a latent variable λ_{d,c} corresponding to the loss ζ_{d,c}, and express the loss ζ_{d,c} in the form of a mixture distribution.
Further, step 4 specifically includes the following:
Step 4a: assign to the global behavior pattern and the local behavior pattern of each visual word in the training video a random integer value in the interval [1, K] and [1, R], respectively;
Step 4b: compute the posterior distributions of h_{d,n} = r, z_{d,n} = k, λ_{d,c}, and η_{d,c} (where η_{d,c} denotes the d-th element of the variable η_c), and sample each of them repeatedly in turn until convergence or until a predetermined number of sampling iterations is reached;
Step 4c: estimate the parameters θ_d, τ_k, and φ_r from the statistics obtained after averaging samples from their posterior distributions;
Step 4d: record the associated statistics for the inference process on test videos.
Further, the method also includes a step 5 of recognizing a test video, the step 5 specifically including the following:
Step 5a: assign to the global behavior pattern and the local behavior pattern of each visual word in the test video a random integer value in the interval [1, K] and [1, R], respectively;
Step 5b: combining the parameter values and the training-video statistics obtained in step 4 above, sample the global behavior pattern z_{d,n} and the local behavior pattern h_{d,n} of each visual word in the test video until the convergence condition is met or a predetermined number of sampling iterations is reached;
Step 5c: compute the mean frequency of occurrence of all global behavior patterns in the test video as the representation of the test video;
Step 5d: using the learned discriminant function parameters η_c, compute the score of the test video for each class, and assign the test video to the class with the maximum score, completing the recognition.
Preferably, the features extracted in step 1 are local features.
In the visual human behavior recognition method based on a Bayesian model of the present invention, a max-margin mechanism is introduced into the recognition model and united with the preceding representation model into a single discriminative hierarchical Bayesian model, thereby achieving joint learning of the training video representation and the recognition model parameters; the method can cope effectively with complex behavior backgrounds and thus achieve robust behavior recognition.
Description of the drawings
Fig. 1 is a flowchart of the visual human behavior recognition method based on a Bayesian model of the present invention;
Fig. 2 is a schematic diagram of the hierarchical Bayesian model of the present invention.
Detailed description of the embodiments
The preferred embodiments of the present invention are described below with reference to the accompanying drawings. Those skilled in the art will understand that these embodiments serve only to explain the technical principles of the present invention and are not intended to limit its scope.
The implementation of the visual human behavior recognition method based on a Bayesian model of the present invention is not restricted to any particular hardware or programming language; it can be realized in any language. For example, the method of the present invention can be carried out on a computer with a 2.83 GHz central processor and 4 GB of memory, using a working program written in Matlab in combination with VC++.
Fig. 1 is a flowchart of the visual human behavior recognition method based on a Bayesian model of the present invention, and Fig. 2 is a schematic diagram of the hierarchical Bayesian model of the present invention. The method comprises the following steps:
Step 1: extract features from the training video to form a low-level representation of the human behavior in the training video;
Step 2: build a hierarchical Bayesian model from the features to extract human behavior patterns at different scales from the training video, obtaining a representation of human behavior based on high-level semantic information;
Step 3: embed a max-margin mechanism to enable learning of a discriminative hierarchical Bayesian model;
Step 4: learn the parameters of the discriminative hierarchical Bayesian model to determine those parameters.
In step 1, the extracted features are preferably local features of the human behavior in the training video. Global features may also be used; however, compared with global features, local features are generally less sensitive to noise and more robust, so local features are preferred here.
Specifically, step 1 comprises the following steps:
Step 1a: based on the pixel value changes of the pixels in the training video, detect the salient points of human behavior in the training video;
Step 1b: build a descriptor centered on each salient point, forming a description of the local region centered on that point;
Step 1c: cluster all descriptors to form the corresponding visual words and visual dictionary, then build histogram vectors under the bag-of-words model, forming the low-level representation of human behavior in the training video.
In the human behavior representation by bag-of-words histogram vectors, each training video, because it contains multiple visual words, is regarded as a visual document d, where d ∈ {1, ..., M} and M is the total number of visual documents, i.e., the total number of videos; a visual word is denoted w_{d,n}, n ∈ {1, ..., N_d}, where N_d is the total number of visual words in the whole training video.
In the present invention, the 3D-SIFT descriptor is preferably used as the descriptor.
An ordinary SIFT (scale-invariant feature transform) descriptor computes its feature values from gradients in the spatial dimensions of an image. The 3D-SIFT used here extends the usual 2D SIFT descriptor from images to video, covering the two spatial dimensions and the temporal dimension (three dimensions in total, XYT), so it can better capture appearance characteristics over time. In the present invention the 3D-SIFT descriptor is therefore preferred over 2D SIFT and other descriptors.
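As a minimal illustration of the step-1 pipeline (local descriptors such as 3D-SIFT clustered into a visual dictionary, then bag-of-words histogram vectors), consider the following Python sketch. The descriptor extraction itself is abstracted away, and all function and parameter names here are hypothetical illustrations, not part of the patent:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_bow_representation(descriptor_sets, dictionary_size=500):
    """Cluster local descriptors (e.g., 3D-SIFT) from all training videos into
    a visual dictionary, then express each video as a visual-word histogram.

    descriptor_sets: list of (N_i x D) arrays, one array per training video.
    Returns (histograms, kmeans), where histograms has shape (M x V).
    """
    all_descriptors = np.vstack(descriptor_sets)
    # Step 1c: cluster all descriptors; each cluster center is a visual word.
    kmeans = KMeans(n_clusters=dictionary_size, n_init=5).fit(all_descriptors)
    histograms = []
    for desc in descriptor_sets:
        words = kmeans.predict(desc)                    # w_{d,n} for video d
        hist = np.bincount(words, minlength=dictionary_size).astype(float)
        histograms.append(hist / max(hist.sum(), 1.0))  # normalized histogram
    return np.array(histograms), kmeans
```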
Specifically, step 2 comprises the following steps:
Step 2a: draw a training video d ∈ {1, ..., M} according to the prior distribution Uniform(M) with parameter M, where M is the number of all training videos;
Step 2b: according to the global behavior pattern distribution with parameter θ_d, draw a global behavior pattern z_{d,n} = k, k = 1, ..., K, from the drawn training video d, where K is the number of distinct global behavior patterns;
Step 2c: according to the local behavior pattern distribution with parameter τ_k, conditioned on the drawn global behavior pattern z_{d,n} = k, draw a local behavior pattern h_{d,n} = r, r = 1, ..., R, where R is the number of distinct local behavior patterns;
Step 2d: according to the visual word distribution with parameter φ_r, conditioned on the drawn local behavior pattern h_{d,n} = r, draw a visual word w_{d,n} ∈ {1, ..., V}.
Steps 2b to 2d are repeated N_d times, until every visual word in training video d has been generated, where N_d is the number of visual words in training video d.
Preferably, a uniform distribution is adopted as the prior distribution Uniform(M), so that each training video initially has an equal chance of being drawn. Other distributions could also be used, but the uniform distribution expresses an unbiased, equal treatment of all training videos and is generally more reasonable.
Preferably, the parameters θ_d, τ_k, and φ_r are given, respectively, a K-dimensional Dirichlet prior with parameter α, an R-dimensional Dirichlet prior with parameter γ, and a V-dimensional Dirichlet prior with parameter β.
Preferably, in step 2b, given the current training video, the global behavior pattern distribution is the multinomial distribution Mult(z_{d,n} | θ) with parameter θ.
In step 2c, the local behavior pattern distribution under each global behavior pattern is the multinomial distribution Mult(h_{d,n} | τ, z_{d,n}).
Preferably, in step 2d, the conditional distribution of the visual word is a multinomial distribution, abbreviated Mult(w_{d,n} | h_{d,n}, φ).
In the present invention, the above parameters θ_d, τ_k, and φ_r can be obtained as follows.
According to the prior distribution p(θ | α, D) of the parameter θ of the global behavior pattern distribution in the currently drawn training video d, draw the global behavior pattern distribution variable θ_d of the current training video d, where θ is an M × K matrix whose rows give the distribution of global behavior patterns in each training video, α is a K-dimensional vector giving the parameter of the Dirichlet prior obeyed by θ, and K is the number of all global behavior patterns.
According to the prior distribution p(τ | γ, z = k) of the local behavior pattern distribution parameter τ given the global behavior pattern distribution, draw the local behavior pattern distribution variable τ_k of the current training video d, where τ is a K × R matrix whose rows give the distribution of local behavior patterns under each global behavior pattern, γ is an R-dimensional vector giving the parameter of the Dirichlet prior obeyed by τ, and R is the number of all local behavior patterns.
According to the prior distribution p(φ | β, h = r) of the visual word distribution parameter φ given the local behavior pattern, draw the visual word distribution variable φ_r under the current local behavior pattern, where h = r indicates that the current local behavior pattern takes the value r, φ is an R × V matrix whose rows give the distribution of visual words under each local behavior pattern, β is a V-dimensional vector giving the parameter of the Dirichlet prior obeyed by φ_r, and V is the size of the dictionary formed by the visual words.
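To make the generative process of steps 2a to 2d concrete, the following Python sketch draws one synthetic corpus from the hierarchical model under the stated Dirichlet priors. The dimensions and variable names mirror the notation above (θ_d, τ_k, φ_r, z_{d,n}, h_{d,n}, w_{d,n}), but the sketch is an illustrative assumption, not the patent's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
M, K, R, V, N_d = 10, 5, 8, 100, 50        # videos, global/local patterns, dictionary size, words per video
alpha, gamma, beta = 0.5, 0.5, 0.1         # symmetric Dirichlet hyperparameters

theta = rng.dirichlet(alpha * np.ones(K), size=M)  # rows theta_d: global-pattern mixtures (M x K)
tau   = rng.dirichlet(gamma * np.ones(R), size=K)  # rows tau_k: local-pattern mixtures (K x R)
phi   = rng.dirichlet(beta  * np.ones(V), size=R)  # rows phi_r: visual-word distributions (R x V)

corpus = []
for d in range(M):                         # step 2a: each training video d in turn
    words = []
    for n in range(N_d):                   # repeat steps 2b-2d N_d times
        z = rng.choice(K, p=theta[d])      # step 2b: global behavior pattern z_{d,n}
        h = rng.choice(R, p=tau[z])        # step 2c: local behavior pattern h_{d,n}
        w = rng.choice(V, p=phi[h])        # step 2d: visual word w_{d,n}
        words.append((z, h, w))
    corpus.append(words)
```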
Specifically, step 3 comprises the following steps:
Step 3a: take the mean frequency of occurrence of the global behavior patterns in each training video as the representation of that training video;
Step 3b: feed the representation into a linear classifier with parameter η_c to obtain the value of the discriminant function, where c = 1, ..., C indexes the classes and C is the number of classes;
Step 3c: compute the loss ζ_{d,c} based on the max-margin criterion, where the class indicator is positive when the true class of the video is c and negative otherwise;
Step 3d: introduce a latent variable λ_{d,c} corresponding to the loss ζ_{d,c}, and express the loss ζ_{d,c} in the form of a mixture distribution.
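As a hedged sketch of the step-3 quantities, the Python below computes the linear discriminant scores from a video's mean global-pattern frequencies and a per-class hinge-style max-margin loss. The exact loss form (one-vs-rest with labels +1/-1) and the name zbar_d are assumptions consistent with the text, since the patent's own formulas survive only as figures:

```python
import numpy as np

def discriminant_scores(zbar_d, eta):
    """Step 3b: score of video d for each class c, eta_c^T zbar_d.
    zbar_d: (K,) mean frequency of each global behavior pattern in video d.
    eta:    (C, K) linear classifier parameters, one row eta_c per class."""
    return eta @ zbar_d

def max_margin_loss(zbar_d, eta, true_class, margin=1.0):
    """Step 3c: hinge loss zeta_{d,c} for each class, with label +1 for the
    true class and -1 otherwise (a one-vs-rest form assumed here)."""
    scores = discriminant_scores(zbar_d, eta)
    y = -np.ones(len(scores))
    y[true_class] = 1.0
    return np.maximum(0.0, margin - y * scores)  # zeta_{d,c}, one entry per class
```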
Specifically, step 4 comprises the following steps:
Step 4a: assign to the global behavior pattern and the local behavior pattern of each visual word in the training video a random integer value in the interval [1, K] and [1, R], respectively;
Step 4b: compute the posterior distributions of h_{d,n} = r, z_{d,n} = k, λ_{d,c}, and η_{d,c}, and sample each of them repeatedly in turn, until convergence or until a predetermined number of sampling iterations is reached.
Here, the posterior probability distribution of h_{d,n} = r is computed, and h_{d,n} is sampled from it, as follows:
In the formula, the superscript "-" indicates that the current (n-th) visual word is excluded from the statistics, and D denotes the whole training set. N_k^- denotes the number of visual words, other than the current one, whose global behavior pattern value is k; N_r^- denotes the number whose local behavior pattern value is r; N_{kr}^- denotes the number whose global behavior pattern is k and whose local behavior pattern is simultaneously r; and N_{rw}^- denotes the number whose local behavior pattern is r and whose own visual word value is simultaneously w.
The posterior probability distribution of z_{d,n} = k is computed in a similar way, and z_{d,n} is sampled from it:
Here, N_{dk}^- denotes the number of words in document d, excluding the current word, whose global behavior pattern is k; N_d^- denotes the number of all visual words in training video d except the current one; and η_{c,k} denotes the k-th element of the vector η_c.
The posterior probability of λ_{d,c} is computed as follows, and λ_{d,c} is sampled from it:
Here, GIG(x; q, b, g) indicates that the variable x obeys a generalized inverse Gaussian distribution with parameters q, b, and g.
Finally, the posterior distribution of the variable η is computed, and η is sampled from it.
The above steps are repeated, sampling the variables h_{d,n}, z_{d,n}, λ_{d,c}, and η_{d,c} in turn, until convergence or until a predetermined number of sampling iterations is reached. For example, sampling may be stopped when the relative change of the sampled variables falls below 1e-7, or the limit may simply be set at around 100 iterations.
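A skeletal Python rendering of this step-4 sampling loop may help. It maintains the count tables named above and spells out only the h_{d,n} update, using the collapsed-Gibbs form reconstructed earlier with symmetric hyperparameters; the z_{d,n}, λ_{d,c}, and η updates are left as stubs, and every name is a hypothetical stand-in rather than the patent's code:

```python
import numpy as np

def gibbs_train(corpus_words, z, h, K, R, V, gamma, beta, max_iters=100, rng=None):
    """corpus_words[d][n] = visual word id; z[d][n], h[d][n] = current global
    and local pattern assignments (step 4a: random integers in [0,K) / [0,R))."""
    if rng is None:
        rng = np.random.default_rng(0)
    # Count tables N_kr, N_rw, N_k, N_r used throughout step 4.
    N_kr = np.zeros((K, R)); N_rw = np.zeros((R, V))
    N_k = np.zeros(K); N_r = np.zeros(R)
    for d, words in enumerate(corpus_words):
        for n, w in enumerate(words):
            N_kr[z[d][n], h[d][n]] += 1; N_rw[h[d][n], w] += 1
            N_k[z[d][n]] += 1; N_r[h[d][n]] += 1
    for _ in range(max_iters):
        for d, words in enumerate(corpus_words):
            for n, w in enumerate(words):
                k, r = z[d][n], h[d][n]
                # Exclude the current word from the "-" statistics.
                N_kr[k, r] -= 1; N_rw[r, w] -= 1; N_k[k] -= 1; N_r[r] -= 1
                # Collapsed-Gibbs conditional for h_{d,n} (reconstructed form).
                p = ((N_kr[k, :] + gamma) / (N_k[k] + R * gamma)
                     * (N_rw[:, w] + beta) / (N_r + V * beta))
                r = rng.choice(R, p=p / p.sum())
                h[d][n] = r
                N_kr[k, r] += 1; N_rw[r, w] += 1; N_k[k] += 1; N_r[r] += 1
                # The z_{d,n}, lambda_{d,c}, and eta updates would follow here.
    return N_kr, N_rw, N_k, N_r
```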
Step 4c: estimate the parameters θ_d, τ_k, and φ_r from the statistics obtained after averaging samples from their posterior distributions;
Step 4d: record the associated statistics for the inference process on test videos. Specifically, the parameters θ_d, τ_k, and φ_r are estimated from the statistics obtained after averaging samples from their posterior distributions, and the statistics at that point are recorded, including N_{kr}, N_{rw}, N_k, and N_r, which denote, respectively, the number of visual words whose local behavior pattern is r under global behavior pattern k, the number of words whose visual word value is w under local behavior pattern r, the number of visual words whose global behavior pattern value is k, and the number of words whose local behavior pattern value is r.
The method of the present invention also includes a step of recognizing a test video, which specifically includes the following steps:
Step 5a: assign to the global behavior pattern and the local behavior pattern of each visual word in the test video a random integer value in the interval [1, K] and [1, R], respectively;
Step 5b: combining the parameter values and the training-video statistics obtained in step 4 above, sample the global behavior pattern z_{d,n} and the local behavior pattern h_{d,n} of each visual word in the test video until the convergence condition is met or a predetermined number of sampling iterations is reached; here,
(a) the global behavior patterns in the test video are sampled from their conditional distribution, in which the quantities involved are: the global behavior pattern corresponding to the n-th visual word in test video d; the data possessed by the current test video d; the local behavior pattern corresponding to the n-th visual word in test video d; the n-th visual word in test video d itself; and, within test video d, the total number of visual words, the number of visual words whose global behavior pattern value is k, the number whose global behavior pattern is k and whose local behavior pattern is simultaneously r, and the corresponding totals over all global behavior patterns in the test video. N_{k,r} and N_k are the associated training-set statistics recorded in step 4.
(b) the local behavior patterns in the test video are sampled from their conditional distribution, in which the remaining quantities denote, respectively, the number of visual words in the test video whose local behavior pattern value is r, and the number of words whose own value is w when the local behavior pattern value is r. N_r and N_{r,w} are the associated training-set statistics recorded in step 4.
Steps (a) and (b) are repeated, sampling in turn the global behavior patterns and local behavior patterns corresponding to the visual words in the test video, until the convergence condition is met or a predetermined number of sampling iterations is reached. For example, sampling may be stopped when the relative change of the sampled variables falls below 1e-7, or the limit may simply be set at around 100 iterations.
Step 5c: compute the mean frequency of occurrence of all global behavior patterns in the test video as the representation of the test video;
Step 5d: using the learned discriminant function parameters η_c, compute the score of the test video for each class, and assign the test video to the class with the maximum score, completing the recognition.
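The following Python sketch, under the same assumed notation as the earlier sketches, illustrates steps 5c and 5d: the test video's representation is the mean frequency of each global behavior pattern over its words after the step-5b sampling has converged, and the predicted class is the argmax of the learned linear scores:

```python
import numpy as np

def classify_test_video(z_test, eta, K):
    """z_test: sampled global-pattern ids for the test video's visual words.
    eta: (C, K) discriminant function parameters learned in step 4."""
    counts = np.bincount(np.asarray(z_test), minlength=K).astype(float)
    zbar = counts / max(counts.sum(), 1.0)  # step 5c: mean pattern frequencies
    scores = eta @ zbar                     # step 5d: score for each class
    return int(np.argmax(scores)), scores
```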
Through this step of recognizing a test video, the discriminative performance of the model established by the above steps can be assessed, and the model can then be improved accordingly.
In the visual human behavior recognition method based on a Bayesian model of the present invention, a max-margin mechanism is introduced into the recognition model and united with the preceding representation model into a single discriminative hierarchical Bayesian model, thereby achieving joint learning of the training video representation and the recognition model parameters; the method can cope effectively with complex behavior backgrounds and thus achieve robust behavior recognition.
The technical solution of the present invention has thus been described with reference to the preferred embodiments shown in the accompanying drawings; however, those skilled in the art will readily understand that the scope of protection of the present invention is clearly not limited to these specific embodiments. Without departing from the principles of the present invention, those skilled in the art may make equivalent changes or substitutions to the relevant technical features, and the technical solutions after such changes or substitutions all fall within the scope of protection of the present invention.

Claims (10)

1. A visual human behavior recognition method based on a Bayesian model, the method comprising the following steps:
Step 1: extracting features from the training video to form a low-level representation of the human behavior in the training video;
Step 2: building a hierarchical Bayesian model from the features to extract human behavior patterns at different scales from the training video, obtaining a representation of human behavior based on high-level semantic information;
Step 3: embedding a max-margin mechanism to enable learning of a discriminative hierarchical Bayesian model;
Step 4: learning the parameters of the discriminative hierarchical Bayesian model to determine those parameters.
2. The method according to claim 1, characterized in that step 1 includes:
Step 1a: based on the pixel value changes of the pixels in the training video, detecting the salient points of human behavior in the training video;
Step 1b: building a descriptor centered on each salient point, forming a description of the local region centered on that point;
Step 1c: clustering all descriptors to form the corresponding visual words and visual dictionary, then building histogram vectors under the bag-of-words model, forming the low-level representation of human behavior in the training video.
3. The method according to claim 2, characterized in that the descriptor is a 3D-SIFT descriptor.
4. The method according to claim 2, characterized in that step 2 includes:
Step 2a: drawing a training video d ∈ {1, ..., M} according to the prior distribution Uniform(M) with parameter M, where M is the number of all training videos;
Step 2b: according to the global behavior pattern distribution with parameter θ_d, drawing a global behavior pattern z_{d,n} = k, k = 1, ..., K, from the drawn training video d, where K is the number of distinct global behavior patterns;
Step 2c: according to the local behavior pattern distribution with parameter τ_k, conditioned on the drawn global behavior pattern z_{d,n} = k, drawing a local behavior pattern h_{d,n} = r, r = 1, ..., R, where R is the number of distinct local behavior patterns;
Step 2d: according to the visual word distribution with parameter φ_r, conditioned on the drawn local behavior pattern h_{d,n} = r, drawing a visual word w_{d,n} ∈ {1, ..., V}.
5. The method according to claim 4, characterized in that the parameters θ_d, τ_k, and φ_r are given, respectively, a K-dimensional Dirichlet prior with parameter α, an R-dimensional Dirichlet prior with parameter γ, and a V-dimensional Dirichlet prior with parameter β.
6. The method according to claim 4, characterized in that the global behavior pattern distribution and/or the local behavior pattern distribution and/or the visual word distribution are multinomial distributions.
7. The method according to claim 4, characterized in that step 3 includes:
Step 3a: taking the mean frequency of occurrence of the global behavior patterns in each training video as the representation of that training video;
Step 3b: feeding the representation into a linear classifier with parameter η_c to obtain the value of the discriminant function, where c = 1, ..., C indexes the classes and C is the number of classes;
Step 3c: computing the loss ζ_{d,c} based on the max-margin criterion, where the class indicator is positive when the true class of the video is c and negative otherwise;
Step 3d: introducing a latent variable λ_{d,c} corresponding to the loss ζ_{d,c}, and expressing the loss ζ_{d,c} in the form of a mixture distribution.
8. The method according to claim 7, characterized in that step 4 includes:
Step 4a: assigning to the global behavior pattern and the local behavior pattern of each visual word in the training video a random integer value in the interval [1, K] and [1, R], respectively;
Step 4b: computing the posterior distributions of h_{d,n} = r, z_{d,n} = k, λ_{d,c}, and η_{d,c}, and sampling each of them repeatedly in turn until convergence or until a predetermined number of sampling iterations is reached;
Step 4c: estimating the parameters θ_d, τ_k, and φ_r from the statistics obtained after averaging samples from their posterior distributions;
Step 4d: recording the associated statistics for the inference process on test videos.
9. The method according to claim 8, characterized in that the method includes a step 5 of recognizing a test video, the step 5 including:
Step 5a: assigning to the global behavior pattern and the local behavior pattern of each visual word in the test video a random integer value in the interval [1, K] and [1, R], respectively;
Step 5b: combining the parameter values and the training-video statistics obtained in step 4 above, sampling the global behavior pattern z_{d,n} and the local behavior pattern h_{d,n} of each visual word in the test video until the convergence condition is met or a predetermined number of sampling iterations is reached;
Step 5c: computing the mean frequency of occurrence of all global behavior patterns in the test video as the representation of the test video;
Step 5d: using the learned discriminant function parameters η_c, computing the score of the test video for each class, and assigning the test video to the class with the maximum score, completing the recognition.
10. The method according to any one of claims 1 to 9, characterized in that the features extracted in step 1 are local features.
CN201610921854.2A 2016-10-21 2016-10-21 Visual human behavior recognition method based on Bayesian model Pending CN106529426A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610921854.2A CN106529426A (en) 2016-10-21 2016-10-21 Visual human behavior recognition method based on Bayesian model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610921854.2A CN106529426A (en) 2016-10-21 2016-10-21 Visual human behavior recognition method based on Bayesian model

Publications (1)

Publication Number Publication Date
CN106529426A true CN106529426A (en) 2017-03-22

Family

ID=58291503

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610921854.2A Pending CN106529426A (en) 2016-10-21 2016-10-21 Visual human behavior recognition method based on Bayesian model

Country Status (1)

Country Link
CN (1) CN106529426A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103854027A (en) * 2013-10-23 2014-06-11 北京邮电大学 Crowd behavior identification method
CN103942533A (en) * 2014-03-24 2014-07-23 河海大学常州校区 Urban traffic illegal behavior detection method based on video monitoring system
CN104598889A (en) * 2015-01-30 2015-05-06 北京信息科技大学 Human action recognition method and device
CN104881651A (en) * 2015-05-29 2015-09-02 南京信息工程大学 Figure behavior identification method based on random projection and Fisher vectors
CN104966052A (en) * 2015-06-09 2015-10-07 南京邮电大学 Attributive characteristic representation-based group behavior identification method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SHUANG YANG et al.: "Multi-Feature Max-Margin Hierarchical Bayesian Model for Action Recognition", The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015 *

Similar Documents

Publication Publication Date Title
CN108717568B (en) A kind of image characteristics extraction and training method based on Three dimensional convolution neural network
CN110414377B (en) Remote sensing image scene classification method based on scale attention network
CN110348399B (en) Hyperspectral intelligent classification method based on prototype learning mechanism and multidimensional residual error network
CN109086773A (en) Fault plane recognition methods based on full convolutional neural networks
CN109034210A (en) Object detection method based on super Fusion Features Yu multi-Scale Pyramid network
CN109255340A (en) It is a kind of to merge a variety of face identification methods for improving VGG network
CN109815826A (en) The generation method and device of face character model
CN109711426A (en) A kind of pathological picture sorter and method based on GAN and transfer learning
CN104268593A (en) Multiple-sparse-representation face recognition method for solving small sample size problem
CN114841257B (en) Small sample target detection method based on self-supervision comparison constraint
CN105160400A (en) L21 norm based method for improving convolutional neural network generalization capability
CN113762138B (en) Identification method, device, computer equipment and storage medium for fake face pictures
CN105243139A (en) Deep learning based three-dimensional model retrieval method and retrieval device thereof
CN108985929A (en) Training method, business datum classification processing method and device, electronic equipment
CN104809469A (en) Indoor scene image classification method facing service robot
CN110443286A (en) Training method, image-recognizing method and the device of neural network model
CN104298974A (en) Human body behavior recognition method based on depth video sequence
CN109063719A (en) A kind of image classification method of co-ordinative construction similitude and category information
CN113158861B (en) Motion analysis method based on prototype comparison learning
CN112784929A (en) Small sample image classification method and device based on double-element group expansion
CN103942571A (en) Graphic image sorting method based on genetic programming algorithm
CN113569895A (en) Image processing model training method, processing method, device, equipment and medium
CN109598671A (en) Image generating method, device, equipment and medium
CN108416397A (en) A kind of Image emotional semantic classification method based on ResNet-GCN networks
CN110443232A (en) Method for processing video frequency and relevant apparatus, image processing method and relevant apparatus

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 2017-03-22)