CN106529467B - Group behavior recognition method based on multi-feature fusion - Google Patents
- Publication number
- CN106529467B CN201610976817.1A
- Authority
- CN
- China
- Prior art keywords
- feature
- people
- behavior
- information
- scene
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
- G06V20/53—Recognition of crowd images, e.g. recognition of crowd congestion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/35—Categorising the entire scene, e.g. birthday party or wedding scene
- G06V20/36—Indoor scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/35—Categorising the entire scene, e.g. birthday party or wedding scene
- G06V20/38—Outdoor scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/26—Techniques for post-processing, e.g. correcting the recognition result
- G06V30/262—Techniques for post-processing, e.g. correcting the recognition result using context analysis, e.g. lexical, syntactic or semantic context
- G06V30/274—Syntactic or semantic context, e.g. balancing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a group behavior recognition method based on multi-feature fusion. Characteristic information is extracted at three different levels: features are extracted for each individual person, chiefly everyone's position, size, and motion information in every frame, together with per-person features extracted by a convolutional neural network; semantic features are extracted for the interactions between people, mainly considering interpersonal action relationships and relative orientation relationships; and scene information is extracted from the environment in which the people in the group behavior are located. A fully connected conditional random field model fuses this characteristic information to realize recognition of the group behavior. Because the method considers multiple kinds of feature information simultaneously, it can describe group behavior more comprehensively and effectively, improves the group behavior recognition rate, and has important application value in video surveillance.
Description
Technical field
The invention belongs to the fields of image processing and pattern recognition, and in particular relates to a group behavior recognition method based on multi-feature fusion.
Background technique
Behavior recognition is a prominent frontier of computer vision. Research on recognizing the behavior of a single person, or of the interaction between two people, has already achieved significant results. In recent years, as demand from video surveillance, human-computer interaction, and content-based video retrieval keeps growing, group behavior recognition has gradually become a research hotspot of computer vision and pattern recognition. However, the number of people involved in a group behavior is large and not fixed, and the variability of interpersonal interactions and the complexity of scenes pose great challenges to group behavior research.
In recent years, much work on group behavior recognition has been devoted to studying the influence of semantic information on recognition, with some success. For analyzing group behavior under video surveillance, what needs most consideration is semantic information, i.e., the interactions between people and the role each person plays in the particular group. Choi proposed a semantic descriptor, Spatio-Temporal Local (STL), which mainly uses the relative relationships between people's postures to describe their interactions. This descriptor for capturing semantic relations builds on the Shape Context algorithm from the pattern classification field: centered on one person in the scene, it captures the spatial positions and facing directions of the surrounding people relative to that person, finally representing them as a histogram. The STL feature effectively captures interpersonal spatial relations and some of the interaction, but its drawback is that it does not describe interpersonal action relationships, so the recognition effect is not ideal. Lan proposed an appearance-based action semantic descriptor, Action Context, which uses the action relationships between each person and the nearby surrounding people to better describe the current person's behavior. This descriptor works relatively well for group behaviors with large action diversity, but it is sensitive to viewpoint changes, which keeps the recognition rate low. Takuhiro combined the advantages of the methods of Lan and Choi: on the basis of interpersonal action relationships, relative orientation relationships between people are also considered, making the feature insensitive to viewpoint change and improving the recognition results to some extent, though they are still not ideal. From the above methods it can be seen that the characteristic information they consider is relatively simple. For group behaviors with variable numbers of people and complex interactions, characteristic information should be extracted from many aspects and integrated; only in this way can group behavior be described more comprehensively and effectively.
Summary of the invention
In a group behavior there are relatively many people, and each person's behavior differs; if they are all treated as one group behavior and only interaction features are extracted and analyzed, the result is clearly insufficient. Only by considering more feature information that can effectively describe the group behavior, and by considering these features comprehensively, can group behavior recognition be carried out more significantly. The object of the invention is to propose a group behavior recognition method based on multi-feature fusion, characterized in that the method comprises the following steps:
Step 1: feature extraction at three different levels. Individual characteristic information is extracted for each single person; interaction features are extracted for the interactions between people, where the interactions of interest are mainly interpersonal action relationships and relative orientation relationships; and scene information is extracted from the environment in which the people in the group behavior are located.
Step 2: feature fusion. The interpersonal interaction features are fused with the scene information, and a support vector machine (SVM) classifier with a radial basis function kernel produces behavior scores, which serve as the unary potential of a fully connected conditional random field model; the characteristic information extracted for each single person serves as the binary (pairwise) potential of the model. All extracted features are thereby fused in one model to carry out group behavior recognition.
As a further improvement of the present invention, step 1 specifically includes:
Step 1-1: extract individual characteristic information, mainly each person's position information, size information (height), and motion information (position and size are provided in the database); these three features mainly reflect each person's characteristic appearance. In addition, a convolutional neural network (CNN) extracts features for each single person, supplementing the three preceding kinds of characteristic information.
Step 1-2: extract features for the interactions between people. Taking each person in turn as the center, the nearby people around him are regarded as his context; from his own behavior and the behavior exhibited by the nearby people, a behavior context feature is extracted, denoted the AC descriptor. This descriptor captures only interpersonal action relationships, so on this basis the relative orientation relationship between each person and the nearby people around him is also considered, and a relative context feature is extracted, denoted the RAC descriptor.
Step 1-3: the scene in which the people in a group behavior are located also provides necessary clues for behavior recognition, so scene information is extracted from the environment, mainly of three kinds: outdoor, indoor, and automobile. The extraction proceeds in two steps: first, the scene is classified as outdoor or indoor using the spatial pyramid matching method; second, the scene picture is observed with an eye tracker to obtain regions of interest, which are analyzed to see whether automobiles are present in the scene.
As a further improvement of the present invention, step 2 specifically includes:
Step 2-1: compute the unary potential of the fully connected conditional random field model. The interpersonal interaction features, i.e., the AC descriptor and the RAC descriptor, are each fused with the scene information to obtain new feature vectors; an SVM classification model yields behavior scores, which are converted to probabilities by softmax; the maximum over the two probabilities yields a new probability vector, and this result serves as the unary potential of the model.
Step 2-2: feature fusion. All characteristic information extracted for each single person serves as the binary potential of the fully connected conditional random field model; the model learns automatically from the unary and binary potentials and carries out group behavior recognition.
Beneficial effects
Existing group behavior recognition methods extract features primarily for the interactions between people, and treat everyone in the scene as one group for analysis. In real video surveillance scenes, however, there are often multiple groups, each engaged in a different activity. For example: there are 5 people in total in a scene, of whom 4 stand in a circle talking while one person just walks past to the side; this person and the other 4 are not one group, because the behaviors they exhibit differ. Treating everyone as one group for analysis is clearly unreasonable. Moreover, current group behavior research methods do not take the scene information of people's surroundings into account, yet scene information can provide clues for behavior recognition. For example: if we know that a behavior occurs outdoors, with automobiles, zebra crossings, or traffic lights present, we can judge that it is unlikely to be talking or queuing, and more likely to be crossing the road; if it occurs indoors, it cannot be crossing the road or waiting to cross. Introducing scene information therefore has definite significance for group behavior analysis, and our experimental results confirm that considering scene information is effective. Our approach is thus: consider the features of each single person, the interpersonal interaction features, and the scene information; fuse this characteristic information with a fully connected conditional random field model; and realize automatic group division (the basis for dividing groups being that everyone belonging to the same group has similar position, size, and motion information), so as to achieve a better group behavior recognition effect.
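The group-division criterion just described (members of one group share similar position, size, and motion) can be sketched roughly as follows; the thresholds, the similarity test, and all names are illustrative assumptions for this sketch, not values from the patent:

```python
import math

def same_group(p, q, pos_th=2.0, size_th=0.5, motion_th=0.5):
    """Return True if two person records look like members of one group."""
    d_pos = math.dist(p["pos"], q["pos"])
    d_size = abs(p["size"] - q["size"])
    d_mot = math.dist(p["motion"], q["motion"])
    return d_pos < pos_th and d_size < size_th and d_mot < motion_th

def split_groups(people):
    """Union pairwise-similar people into groups (connected components)."""
    parent = list(range(len(people)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i
    for i in range(len(people)):
        for j in range(i + 1, len(people)):
            if same_group(people[i], people[j]):
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(people)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```

On the 5-person example above (4 talkers standing close together, 1 walker passing by), this split yields two groups.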
Description of the drawings
Fig. 1: Main flow chart of the invention.
Fig. 2: Extracting features for a single person with a convolutional neural network (CNN).
Fig. 3: Gaze trajectory map and gaze heat map obtained from the eye-tracking experiment.
Fig. 4: Recognition results of the proposed method on the Collective Activity Dataset.
Specific embodiments
The invention will be further described with examples with reference to the accompanying drawings. It should be noted that the described examples are only intended to aid understanding of the invention and do not limit it in any way.
The group behavior recognition method based on multi-feature fusion includes the following steps:
Step 1: feature extraction in three parts. Individual characteristic information is extracted for each single person, interaction features are extracted for the interactions between people, and scene information is extracted from the environment in which the people in the group behavior are located.
Step 2: the interpersonal interaction features are fused with the scene information; an SVM classifier with a radial basis function kernel produces behavior scores that serve as the unary potential of a fully connected conditional random field model, while the characteristic information extracted for each single person serves as the binary potential; all extracted features are fused in one model to carry out group behavior recognition.
The feature extraction process includes:
Step 1-1: for each single person, mainly consider position information, size information (height), and motion information (position and size are provided in the database), and extract features for each single person with a convolutional neural network (CNN).
Step 1-2: extract features for the interactions between people. Taking each person in turn as the center, regard the nearby people around him as his context; from his own behavior and the behavior of the nearby people, extract the behavior context feature, denoted the AC descriptor. On this basis, also consider the relative orientation relationship between each person and the nearby people around him, and extract the relative context feature, denoted the RAC descriptor.
Step 1-3: extract scene information from the environment, mainly of three kinds: outdoor, indoor, and automobile. The extraction proceeds in two steps: first, classify the scene as outdoor or indoor using spatial pyramid matching; second, observe the scene picture with an eye tracker to obtain regions of interest, and analyze them to see whether automobiles are present in the scene.
Step 1-1: carry out feature extraction for each person in each frame. The specific operations are:
(1) The group behavior database we use, the Collective Activity Dataset, provides each person's three-dimensional position information, so each person's position and size (height) information can be obtained. An optical flow method (HOF) extracts each person's motion information, indicating whether the person is moving or static.
(2) A convolutional neural network (CNN) extracts features for each person. A CNN performs semantic integration at a high level through multiple convolution and down-sampling operations, so features extracted by a CNN can contain very rich information about the whole image patch; compared with ordinary appearance features, they describe each person's global information more effectively. The convolution operation convolves a neighborhood of the image to obtain neighborhood features; it can enhance the original signal features and reduce noise. The subsequent down-sampling operation integrates the feature points in a neighborhood into a new feature; its purpose is dimensionality reduction, reducing the feature dimension while keeping a certain invariance (to rotation, translation, scaling, etc.). Fig. 2 shows the structure of the CNN we adopt, where Cx denotes a convolutional layer and Sx a down-sampling layer. The database gives each person's detection box; since CNN feature extraction requires pictures of a consistent size, each person's detection box is normalized to the same size (60 × 60 in our experiments). As can be seen from the figure, we use three convolutional layers and two down-sampling layers, and the feature extracted for each person is finally 160-dimensional.
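As a rough illustration of the Cx/Sx pipeline just described, the sketch below runs a tiny three-convolution, two-pooling forward pass in plain numpy on a 60 × 60 crop. The filter sizes, the random weights, and the resulting dimensionality (64 here, not the 160 of the patent's network) are illustrative assumptions:

```python
import numpy as np

def conv_valid(img, kernel):
    """2-D valid convolution of a single-channel image with one kernel."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return np.maximum(out, 0.0)          # ReLU-style nonlinearity

def pool2(img):
    """2x2 mean pooling (a 'down-sampling layer' Sx in Fig. 2)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    x = img[:h, :w]
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def cnn_feature(crop, rng):
    """C1 -> S2 -> C3 -> S4 -> C5 on a 60x60 crop, then flatten."""
    x = conv_valid(crop, rng.standard_normal((5, 5)))   # C1: 60 -> 56
    x = pool2(x)                                        # S2: 56 -> 28
    x = conv_valid(x, rng.standard_normal((5, 5)))      # C3: 28 -> 24
    x = pool2(x)                                        # S4: 24 -> 12
    x = conv_valid(x, rng.standard_normal((5, 5)))      # C5: 12 -> 8
    return x.ravel()                                    # 64-dim here
```

A trained network would of course use learned, multi-channel filters; the sketch only shows how the successive convolutions and poolings turn a normalized detection box into a fixed-length feature vector.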
Step 1-2: extract features for the interactions between people. The specific operations are:
(1) Extraction of the behavior context feature (AC). This feature considers the behavior of each person and of the nearby people around him. A HOG feature is extracted for each person and classified by SVM to obtain a score for each behavior class: Ai = [S1i, S2i, …, SKi], where Sni denotes the score for behavior label n obtained by the SVM classifier for the i-th person. With the i-th person as center, the region of the nearby people around him (dis ∈ (0.5 × h, 2 × h), where h corresponds to each person's height) is regarded as the context region, and the context feature is extracted from this region. The context region is divided into M subregions, with Nm(i) denoting the people around the i-th person that fall in the m-th subregion; the m-th sub-context feature is the elementwise maximum of their behavior score vectors:
Cim = [max over j ∈ Nm(i) of S1j, …, max over j ∈ Nm(i) of SKj]
For example, if there are 2 people close to the center person in the first sub-context region, their behavior scores are taken out and the maximum of each respective behavior score yields the first sub-context feature. Concatenating the sub-context features gives Ci = [Ci1, …, CiM], and the behavior context feature is ACi = [Ai, Ci].
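The AC construction above can be sketched as follows, with the context collapsed to a single subregion (M = 1) for brevity; the function name and the neighbor test are illustrative:

```python
import numpy as np

def ac_descriptor(scores, i, heights, positions):
    """AC_i = [A_i, C_i]: person i's own SVM behavior scores concatenated
    with the elementwise max of the score vectors of nearby people
    (distance in (0.5h, 2h)), here with a single context subregion."""
    h = heights[i]
    d = np.linalg.norm(positions - positions[i], axis=1)
    neighbor = (d > 0.5 * h) & (d < 2.0 * h)   # excludes person i (d = 0)
    K = scores.shape[1]
    context = scores[neighbor].max(axis=0) if neighbor.any() else np.zeros(K)
    return np.concatenate([scores[i], context])
```

With M > 1, the same max-pooling would simply be repeated per subregion and the results concatenated.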
(2) Extraction of the relative behavior context feature (RAC). RAC not only considers the behavior features but also captures the relative relationship between the center person and the surrounding people; for example, if the center person faces right and another person around him faces left, their relative relationship is defined as facing opposite directions. Because the AC descriptor does not account for relative orientation, it is sensitive to viewpoint changes; the RAC descriptor overcomes this defect and is an improvement over AC. The extraction of the RAC descriptor is similar to that of AC. Because behavior and orientation are considered simultaneously, the behavior feature is K-dimensional with K = U × V, where U is the number of behavior classes and V the number of orientation classes. From the HOG feature and the orientation of the i-th person obtained by SVM classification, each person's relative behavior score vector Âi is obtained. From Âi, the relative context descriptor of the m-th sub-context region is the elementwise maximum Ĉim = max over j ∈ Nm(i) of Âj; the relative descriptor of the entire context region is Ĉi = [Ĉi1, …, ĈiM], so the relative behavior descriptor of the i-th person is RACi = [Âi, Ĉi].
Step 1-3: extract scene information from the environment of the people in the group behavior. The specific operations are:
(1) Classifying the scene as outdoor or indoor with the spatial pyramid matching algorithm.
The spatial pyramid method counts the distribution of image feature points at different resolutions (corresponding to different levels of the pyramid), thereby obtaining the spatial information of the image. First, scale-invariant feature transform (SIFT) descriptors are extracted from every frame of all video sequences; the descriptors of all pictures are clustered with K-means to generate a visual dictionary whose size is set to M = 200. The frequency with which all visual words of each frame occur in each cell of each level is then counted; with the pyramid level set to L = 2, the representation has dimension M × (1 + 4 + 16), so each frame can finally be represented by a 4200-dimensional feature vector. Classification of the scene is finally realized according to spatial pyramid matching.
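The dimensions above (M = 200 words, L = 2, giving a 4200-dimensional vector) can be checked with a small sketch. The inputs here are assumed to be already-quantized visual-word ids with positions normalized to [0, 1)², an assumption made for illustration:

```python
import numpy as np

def spatial_pyramid(words, xy, m=200, levels=2):
    """Histogram visual words over 1x1, 2x2 and 4x4 grids (levels 0..L).

    words: (n,) visual-word ids in [0, m); xy: (n, 2) positions in [0, 1)^2.
    Returns the concatenated histograms: m * (1 + 4 + 16) dims for L = 2.
    """
    feats = []
    for l in range(levels + 1):
        g = 2 ** l                                   # g x g grid at this level
        cell = np.minimum((xy * g).astype(int), g - 1)
        idx = cell[:, 0] * g + cell[:, 1]            # flat cell index
        for c in range(g * g):
            feats.append(np.bincount(words[idx == c], minlength=m))
    return np.concatenate(feats)
```

The matching kernel would additionally weight levels before comparison; only the representation is sketched here.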
(2) Detecting whether there are automobiles in the scene.
When the human visual system processes a complex scene, it concentrates visual attention on a few objects in the scene, striving to obtain the main information in the shortest time; the regions these objects occupy in the scene are called regions of interest (ROI). Extracting the regions of interest of an image and analyzing them can improve the efficiency of information processing. An eye tracker records the user's eye movement data and can also draw gaze trajectory maps, gaze heat maps, etc., intuitively reflecting the regions or objects in the picture that genuinely interest the user. We observe the pictures with an eye tracker (Tobii Studio 3.3.1 software), obtain the regions of interest, and analyze these regions to see whether automobiles are present. The specific procedure: the observer sits about 65 cm from the screen; each picture is presented for 8 s, followed by a gray picture for 2 s, whose purpose is to relieve the observer's visual fatigue; eye movement calibration is carried out before the eye-tracking test begins.
After the test is completed, gaze trajectory maps and gaze heat maps are available, as shown in Fig. 3. Gaze trajectory map: records the observer's gaze track through the whole experiment; blue circles indicate fixation points, the size of a circle indicates the fixation duration (the larger the circle, the longer the fixation), the numbers in the circles give the fixation order, and blue lines indicate saccades. Gaze heat map: indicates the observer's degree of attention to each part of the picture with different colors, so that the regions the subject attends to most, and those ignored, can be seen intuitively; the deeper the color, the longer the fixation time, with red indicating the most attended regions, yellow and green indicating relatively low gaze levels, and uncolored regions not fixated at all.
The eye movement data are exported as an Excel table recording the number of regions of interest, the horizontal and vertical coordinates of each region's center, and the fixation time of each region of interest. The data are preprocessed by discarding fixations shorter than 100 ms; that is, if the observer fixates a region too briefly, he is probably not interested in it. From the eye movement data the center of each ROI is obtained; combined with the gaze heat map, a 180 × 120 rectangular area is extracted around each ROI center, and these rectangular areas are then analyzed to see whether automobiles are present. The specific algorithm steps:
1) Extract SIFT features from each extracted 180 × 120 rectangular area of every frame;
2) Choose several target object (automobile) pictures, extract their SIFT features, compute the Euclidean distances between the features, and average them to obtain a threshold;
3) Compute the Euclidean distance between the region features and the target object features and compare it with the threshold; if the computed Euclidean distance is less than the threshold, the rectangular area is considered similar to an automobile, and it can then be judged that such a target object (automobile) exists in the scene.
The required scene information is obtained and expressed with a 3-dimensional binary feature vector S = [outdoor indoor automobile]; for example, S = [1 0 1] corresponds to the scene information: outdoors, with automobiles present.
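Steps 1)–3) and the scene vector S can be sketched together as follows. Real SIFT descriptors are replaced here by generic feature vectors, so everything in this sketch is an illustrative assumption rather than the exact pipeline:

```python
import numpy as np

def car_threshold(templates):
    """Mean pairwise Euclidean distance among the car template features."""
    d = [np.linalg.norm(a - b)
         for k, a in enumerate(templates) for b in templates[k + 1:]]
    return float(np.mean(d))

def scene_vector(outdoor, roi_feats, templates):
    """S = [outdoor, indoor, automobile]: an ROI counts as a car if its
    feature is closer to any car template than the averaged threshold."""
    thr = car_threshold(templates)
    has_car = any(np.linalg.norm(f - t) < thr
                  for f in roi_feats for t in templates)
    return [int(outdoor), int(not outdoor), int(has_car)]
```

With an outdoor scene and one ROI feature close to the templates, this yields [1, 0, 1], the example given in the text.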
Step 2: carry out feature fusion with the fully connected conditional random field model and complete group behavior recognition.
Step 2-1: compute the unary potential. The extracted AC descriptor and RAC descriptor are each fused with the scene information feature vector S to obtain the new feature vectors Scene_AC and Scene_RAC. An SVM classifier trained on these two feature vectors yields behavior scores, which are converted into probabilities by softmax; the maximum over the two is then taken:
Pi(yi) = max(Pi(yi|d1), Pi(yi|d2))
where Pi(yi) is the probability that the behavior label of the i-th person is yi, Pi(yi|d1) is the probability computed from the feature vector Scene_AC, and Pi(yi|d2) the probability computed from Scene_RAC. Each person's unary potential is then expressed as:
ψu(yi) = -log(Pi(yi))    (4)
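The max-of-softmax step and Eq. (4) can be sketched directly; the two score vectors below stand in for real SVM outputs:

```python
import numpy as np

def softmax(s):
    e = np.exp(s - s.max())          # shift for numerical stability
    return e / e.sum()

def unary_potential(scores_ac, scores_rac):
    """psi_u(y_i) = -log P_i(y_i), with P_i the elementwise max of the
    softmax probabilities from the Scene_AC and Scene_RAC classifiers."""
    p = np.maximum(softmax(scores_ac), softmax(scores_rac))
    return -np.log(p)
```

Note that, as in the text, the elementwise max is used without renormalizing; a more confident label under either classifier therefore gets a lower unary potential.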
Step 2-2: compute the binary potential of the model and carry out behavior recognition with the fully connected conditional random field model. The binary potential expresses the long-range relationships between people in a group, because everyone in the same group has similar position information, size information, motion information (all static or all moving), and certain higher-level information (expressed with the CNN feature). The binary potential over all people is expressed as:
ψp(yi, yj) = u(yi, yj) k(fi, fj)    (5)
where u(yi, yj) is a label compatibility term and k(fi, fj) is a Gaussian kernel sum over the features extracted for each person: the CNN feature cnni, the position information pi, the size information si, and the motion information mi, with weights w. The Gaussian kernel sum can be computed as a weighted sum of Gaussian kernels, one per feature channel:
k(fi, fj) = Σc w(c) exp(-||fi(c) - fj(c)||² / (2θc²))
Inference and learning for the model can use maximum a posteriori estimation.
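One plausible reading of Eq. (5) and the kernel sum can be sketched as follows, using a Potts-style label compatibility for u(yi, yj); the compatibility choice, weights, and bandwidths are assumptions for illustration, not values from the patent:

```python
import numpy as np

def gaussian_kernel(fi, fj, weights, sigmas):
    """k(f_i, f_j) = sum_c w_c * exp(-||f_i^c - f_j^c||^2 / (2 sigma_c^2)),
    summed over the feature channels cnn, pos, size and motion."""
    total = 0.0
    for c in fi:                      # c in {"cnn", "pos", "size", "motion"}
        d2 = np.sum((np.asarray(fi[c]) - np.asarray(fj[c])) ** 2)
        total += weights[c] * np.exp(-d2 / (2.0 * sigmas[c] ** 2))
    return total

def pairwise_potential(yi, yj, fi, fj, weights, sigmas):
    """psi_p = u(y_i, y_j) * k(f_i, f_j), with a Potts compatibility that
    penalizes differing labels on people with similar features."""
    u = 0.0 if yi == yj else 1.0
    return u * gaussian_kernel(fi, fj, weights, sigmas)
```

Under this reading, two people with near-identical position, size, motion, and CNN features pay a high cost for carrying different behavior labels, which is exactly the grouping pressure the text describes.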
The effectiveness of the invention can be further illustrated by the following experiment:
The database most used in current group behavior recognition research is the Collective Activity Dataset, because it was shot in genuinely different scenes, the people in each group activity differ, and the videos are essentially low-resolution surveillance-style sequences shot by a hand-held camera in daily life, largely presenting a realistic video surveillance scene; this database is therefore adopted here for the experiments. This group behavior database contains 44 video sequences covering 5 common group behaviors: queuing, talking, walking, crossing the road, and waiting, and 8 postures: forward, backward, left, right, front-left, back-left, front-right, and back-right. The database additionally provides each person's position in the scene and height, which is convenient for our research. Leave-one-out testing is adopted here: since the database contains 44 video sequences, we use one of the 44 sequences in turn as the test sample and the remaining 43 as training samples, so that every sequence is tested once, and finally the average is taken as our recognition result.
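The leave-one-out protocol just described can be sketched in plain Python: each of the 44 sequences serves once as the test sample while the remaining 43 train.

```python
def leave_one_out(n_sequences=44):
    """Yield (train, test) index splits: one held-out sequence per split."""
    for i in range(n_sequences):
        test = [i]
        train = [j for j in range(n_sequences) if j != i]
        yield train, test
```

The per-split recognition results would then be averaged to give the figures reported in Table 1.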
Table 1

| Method | Mean (%) | Crossing (%) | Waiting (%) | Queuing (%) | Walking (%) | Talking (%) |
| --- | --- | --- | --- | --- | --- | --- |
| Choi et al. [4] | 65.9 | 55.4 | 64.6 | 63.3 | 57.9 | 83.6 |
| Choi et al. [5] | 70.9 | 76.4 | 76.4 | 78.7 | 36.8 | 85.7 |
| Lan et al. [7] | 79.7 | 68 | 69 | 76 | 80 | 99 |
| Takuhiro et al. [8] | 73.2 | 63 | 87 | 89 | 49 | 78 |
| Our method | 79.9 | 67.5 | 85.2 | 99.5 | 74.8 | 71.2 |
The experimental results on the group behavior database can be observed in Table 1 and Fig. 4. In Table 1 we give the recognition rate of every behavior class and the total average recognition rate. It can be seen that our method is effective relative to most existing research methods, although it is only 0.2% higher than the method proposed by Lan. None of these methods takes scene information into account, yet prior knowledge accumulated from daily life tells us that crossing-the-road behavior, or waiting at the roadside to cross, cannot occur indoors; likewise, if we know that a behavior occurs outdoors with automobiles present, it is more likely to be crossing the road. It can be seen that scene information provides certain important clues for group behavior recognition, and our experimental results show this to be feasible and effective.
The above describes only specific embodiments of the invention; clearly, any modification or partial replacement made by persons in this field under the guidance of the technical solution of the invention belongs to the scope defined by the claims of the invention.
Claims (2)
1. A group behavior recognition method based on multi-feature fusion, characterized in that the method comprises the following steps:
Step 1, feature extraction: feature extraction is carried out in three parts: single-person feature information is extracted for each individual, interaction features are extracted from the interactions between people, and scene information is extracted from the environment in which the people involved in the group behavior are located;
Step 2, feature fusion: the interpersonal interaction features are fused with the scene information, and a support vector machine classification algorithm with a radial basis function kernel is used to obtain behavior scores, which serve as the unary potential of a fully connected conditional random field model; the feature information extracted for each single person serves as the binary (pairwise) potential of the fully connected conditional random field model; all extracted features are thus fused in one model to carry out group behavior recognition;
The step 1 specifically includes:
Step 1-1, for the feature information extracted for a single person, each person's location information, size information, and motion information are considered; these three kinds of information belong to the most basic appearance representation; in addition, a convolutional neural network is used to extract features for each single person: this extraction method takes features from the whole picture and, through multiple convolutional layers and down-sampling layers, the finally obtained features are combinations of high-level semantics, so this method describes the behavior and posture information of a single person better than simple appearance features;
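As an illustration only (not part of the claims), the basic per-person cues of Step 1-1 could be assembled into one feature vector roughly as follows; the function name, the box format, and the placeholder CNN appearance vector are all assumptions, not the patent's own implementation:

```python
import numpy as np

def single_person_feature(box, prev_box, cnn_feat):
    """Concatenate location, size and motion cues with a CNN appearance feature.

    box, prev_box: (x, y, w, h) bounding boxes in the current/previous frame.
    cnn_feat: appearance vector from a convolutional network (placeholder here).
    """
    x, y, w, h = box
    px, py, pw, ph = prev_box
    location = [x + w / 2.0, y + h / 2.0]      # person centre
    size = [w, h]                               # person extent
    motion = [location[0] - (px + pw / 2.0),    # centre displacement
              location[1] - (py + ph / 2.0)]
    return np.concatenate([location, size, motion, cnn_feat])

feat = single_person_feature((10, 20, 4, 8), (8, 20, 4, 8), np.zeros(16))
# feat has 2 + 2 + 2 + 16 = 22 dimensions
```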
Step 1-2, feature extraction is carried out for the interaction between people: taking each person in turn as the center, the people near him are regarded as his context, and a behavior context feature, denoted the AC descriptor, is extracted from his own behavior and the behaviors exhibited by the nearby people; this descriptor captures the action relationships between people; to improve the robustness of the feature, the relative orientation relationship between each person and the nearby people around him is further considered on this basis, and a relative context feature, denoted the RAC descriptor, is extracted; the concrete operations are as follows:
(1) Extraction of the action context (AC) feature. This feature considers the behavior of each person and of the people near him. HOG features are extracted for every person and classified by an SVM to obtain a score for each behavior class: Ai=[S1i, S2i, ..., SKi], where Sni denotes the score of behavior label n for the i-th person given by the SVM classifier. Taking the i-th person as the center, the region of nearby people around him (dis ∈ (0.5 × h, 2 × h), where h is the person's height) is regarded as the context area, and context features are extracted from this region.
The context area is divided into M sub-regions, and Nm(i) denotes the people in the m-th sub-region around the i-th person. For example, if there are 2 people close to him in the first context sub-region, their behavior score vectors are taken out and the maximum of each corresponding behavior score is kept, giving the first sub-context feature C1(i); in general Cm(i) is the element-wise maximum of Aj over the people j in Nm(i), and Ci=[C1(i), ..., CM(i)]. The action context feature is then ACi=[Ai, Ci];
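A minimal sketch of the AC descriptor computation described above, assuming the per-person behavior scores are already given as a matrix; the function and variable names are hypothetical:

```python
import numpy as np

def ac_descriptor(scores, subregions):
    """Action-context (AC) descriptor for one centre person.

    scores: (P, K) SVM behaviour scores, row 0 = the centre person (A_i).
    subregions: list of M lists of row indices of nearby people per sub-region.
    """
    A_i = scores[0]
    K = scores.shape[1]
    C = []
    for members in subregions:
        if members:                       # element-wise max over neighbours
            C.append(scores[members].max(axis=0))
        else:                             # empty sub-region -> zero scores
            C.append(np.zeros(K))
    return np.concatenate([A_i] + C)      # AC_i = [A_i, C_1(i), ..., C_M(i)]

scores = np.array([[0.9, 0.1],
                   [0.2, 0.7],
                   [0.6, 0.3]])
ac = ac_descriptor(scores, [[1, 2], []])   # M = 2 sub-regions
# ac = [0.9, 0.1, 0.6, 0.7, 0.0, 0.0]
```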
(2) Extraction of the relative action context (RAC) feature. RAC considers not only the behavior features but also captures the relative relationship between the center person and the people around him: when the center person faces right and another person around him faces left, their relative relationship is defined as facing opposite directions. The AC descriptor does not account for this relative orientation relationship. The RAC descriptor is extracted in a way similar to AC, but since behavior and orientation are considered jointly, its behavior feature has dimension K = U × V, where U is the number of behavior classes and V is the number of orientation classes. From the HOG features and the SVM-classified orientation of the i-th person, his relative behavior score RAi is obtained; from RAi, the relative context descriptor of the m-th sub context area, RCm(i), is the element-wise maximum of RAj over the people j in Nm(i), and the relative descriptor of the entire context area is RCi=[RC1(i), ..., RCM(i)]. The relative action context descriptor of the i-th person is therefore RACi=[RAi, RCi];
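The RAC computation can be sketched the same way; note the patent does not specify how the joint behavior-orientation score of dimension K = U × V is formed, so the outer product used here is purely an assumption for illustration:

```python
import numpy as np

def rac_descriptor(beh_scores, ori_scores, subregions):
    """Relative action-context (RAC) descriptor (sketch).

    beh_scores: (P, U) behaviour scores; ori_scores: (P, V) orientation scores;
    the joint K = U * V score is taken here as their outer product, flattened.
    """
    joint = np.einsum('pu,pv->puv', beh_scores, ori_scores)
    joint = joint.reshape(len(beh_scores), -1)   # (P, U*V)
    RA_i = joint[0]                              # centre person's joint score
    RC = [joint[m].max(axis=0) if m else np.zeros(joint.shape[1])
          for m in subregions]                   # element-wise max per region
    return np.concatenate([RA_i] + RC)           # RAC_i = [RA_i, RC_i]

beh = np.array([[1.0, 0.0], [0.0, 1.0]])   # U = 2 behaviour classes
ori = np.array([[0.0, 1.0], [1.0, 0.0]])   # V = 2 orientation classes
rac = rac_descriptor(beh, ori, [[1]])       # M = 1 sub-region
```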
Step 1-3, scene information is extracted from the environment in which the people are located, considering three kinds of scene information: outdoor, indoor, and car; the extraction of scene information proceeds in two steps: first, the scene is classified as outdoor or indoor using a spatial pyramid method; second, an eye tracker is used to observe the scene picture and obtain regions of interest, which are analyzed to detect whether a car is present in the scene; the three kinds of extracted scene information are fused and represented by a 3-dimensional binary vector: if the scene contains the information corresponding to a position, that position is set to 1, otherwise to 0.
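The 3-dimensional binary scene vector of Step 1-3 is straightforward to sketch; the function and argument names are hypothetical:

```python
def scene_vector(is_outdoor, is_indoor, has_car):
    """3-dimensional binary scene vector [outdooor? no: outdoor, indoor, car]
    (Step 1-3): 1 if the corresponding information is present, else 0."""
    return [int(is_outdoor), int(is_indoor), int(has_car)]

# an outdoor street scene with a car present:
v = scene_vector(True, False, True)   # -> [1, 0, 1]
```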
2. The group behavior recognition method according to claim 1, characterized in that the step 2 specifically comprises:
Step 2-1, the interpersonal interaction features, i.e. the AC descriptor and the RAC descriptor, are each fused with the scene information to obtain new feature vectors; these are classified using the support vector machine classification algorithm with a radial basis function kernel to obtain behavior scores, which are each converted into probabilities by softmax; the two probability vectors are combined by taking the element-wise maximum to obtain a new probability vector, and the result is used as the unary potential of the fully connected conditional random field model;
Calculating the unary potential: the extracted AC descriptor and RAC descriptor are each fused with the scene information feature vector S, giving new feature vectors Scene_AC and Scene_RAC; an SVM classifier is trained on these two feature vectors to obtain behavior scores; the score vectors are converted by softmax and arranged into matrices, and the element-wise maximum of the two is taken;
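A sketch of the unary-potential fusion of Step 2-1, assuming the two SVM score vectors for Scene_AC and Scene_RAC are already computed; names are hypothetical:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-d score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def unary_potential(svm_scores_ac, svm_scores_rac):
    """Fuse the Scene_AC and Scene_RAC score vectors into a unary potential:
    softmax each score vector, then take the element-wise max (Step 2-1)."""
    p_ac = softmax(np.asarray(svm_scores_ac, dtype=float))
    p_rac = softmax(np.asarray(svm_scores_rac, dtype=float))
    return np.maximum(p_ac, p_rac)

u = unary_potential([2.0, 0.0, 0.0], [0.0, 0.0, 2.0])
# the fused vector keeps the strongest evidence from either descriptor
```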
Step 2-2, the feature information extracted for each single person is used as the binary (pairwise) potential of the fully connected conditional random field model, whereby all extracted features are fused in one model and group behavior recognition is carried out.
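For illustration, the energy of a fully connected CRF combining the unary potentials of Step 2-1 with pairwise potentials from Step 2-2 could look like the following sketch; the pairwise tensor layout is an assumption, and inference over this energy is not shown:

```python
import numpy as np

def crf_energy(labels, unary, pairwise):
    """Energy of a fully connected CRF over the people in the scene (sketch).

    labels: (P,) behaviour label per person.
    unary: (P, K) unary potentials from the fused scores (Step 2-1).
    pairwise: (P, P, K, K) binary potentials derived from the single-person
              features (Step 2-2); this tensor structure is an assumption.
    """
    P = len(labels)
    e = sum(unary[i, labels[i]] for i in range(P))
    e += sum(pairwise[i, j, labels[i], labels[j]]
             for i in range(P) for j in range(P) if i < j)
    return e

unary = np.array([[1.0, 0.0], [0.0, 1.0]])
pairwise = np.zeros((2, 2, 2, 2))
e = crf_energy([0, 1], unary, pairwise)   # -> 2.0
```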
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610976817.1A CN106529467B (en) | 2016-11-07 | 2016-11-07 | Group behavior recognition methods based on multi-feature fusion |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610976817.1A CN106529467B (en) | 2016-11-07 | 2016-11-07 | Group behavior recognition methods based on multi-feature fusion |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106529467A CN106529467A (en) | 2017-03-22 |
CN106529467B true CN106529467B (en) | 2019-08-23 |
Family
ID=58349991
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610976817.1A Active CN106529467B (en) | 2016-11-07 | 2016-11-07 | Group behavior recognition methods based on multi-feature fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106529467B (en) |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109214403B (en) * | 2017-07-06 | 2023-02-28 | 斑马智行网络(香港)有限公司 | Image recognition method, device and equipment and readable medium |
CN107679462B (en) * | 2017-09-13 | 2021-10-19 | 哈尔滨工业大学深圳研究生院 | Depth multi-feature fusion classification method based on wavelets |
US20190272466A1 (en) * | 2018-03-02 | 2019-09-05 | University Of Southern California | Expert-driven, technology-facilitated intervention system for improving interpersonal relationships |
CN108647641B (en) * | 2018-05-10 | 2021-04-27 | 北京影谱科技股份有限公司 | Video behavior segmentation method and device based on two-way model fusion |
CN110659397B (en) * | 2018-06-28 | 2022-10-04 | 杭州海康威视数字技术股份有限公司 | Behavior detection method and device, electronic equipment and storage medium |
CN109299657B (en) * | 2018-08-14 | 2020-07-03 | 清华大学 | Group behavior identification method and device based on semantic attention retention mechanism |
CN111326253A (en) * | 2018-12-14 | 2020-06-23 | 深圳先进技术研究院 | Method for evaluating multi-modal emotional cognitive ability of patients with autism spectrum disorder |
CN109620266B (en) * | 2018-12-29 | 2021-12-21 | 中国科学院深圳先进技术研究院 | Method and system for detecting anxiety level of individual |
CN109977856B (en) * | 2019-03-25 | 2023-04-07 | 中国科学技术大学 | Method for identifying complex behaviors in multi-source video |
CN109977872B (en) * | 2019-03-27 | 2021-09-17 | 北京迈格威科技有限公司 | Motion detection method and device, electronic equipment and computer readable storage medium |
CN110348296B (en) * | 2019-05-30 | 2022-04-12 | 北京市遥感信息研究所 | Target identification method based on man-machine fusion |
CN110263723A (en) * | 2019-06-21 | 2019-09-20 | 王森 | The gesture recognition method of the interior space, system, medium, equipment |
CN110309790B (en) * | 2019-07-04 | 2021-09-03 | 闽江学院 | Scene modeling method and device for road target detection |
CN110796081B (en) * | 2019-10-29 | 2023-07-21 | 深圳龙岗智能视听研究院 | Group behavior recognition method based on relational graph analysis |
CN112131944B (en) * | 2020-08-20 | 2023-10-17 | 深圳大学 | Video behavior recognition method and system |
CN112188171A (en) * | 2020-09-30 | 2021-01-05 | 重庆天智慧启科技有限公司 | System and method for judging visiting relationship of client |
CN113569645B (en) * | 2021-06-28 | 2024-03-22 | 广东技术师范大学 | Track generation method, device and system based on image detection |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103500340B (en) * | 2013-09-13 | 2017-02-08 | 南京邮电大学 | Human body behavior identification method based on thematic knowledge transfer |
JP6194777B2 (en) * | 2013-11-29 | 2017-09-13 | 富士通株式会社 | Operation determination method, operation determination apparatus, and operation determination program |
CN104063721B (en) * | 2014-07-04 | 2017-06-16 | 中国科学院自动化研究所 | A kind of human behavior recognition methods learnt automatically based on semantic feature with screening |
CN105631462A (en) * | 2014-10-28 | 2016-06-01 | 北京交通大学 | Behavior identification method through combination of confidence and contribution degree on the basis of space-time context |
CN104966052A (en) * | 2015-06-09 | 2015-10-07 | 南京邮电大学 | Attributive characteristic representation-based group behavior identification method |
CN105426820B (en) * | 2015-11-03 | 2018-09-21 | 中原智慧城市设计研究院有限公司 | More people's anomaly detection methods based on safety monitoring video data |
CN105574489B (en) * | 2015-12-07 | 2019-01-11 | 上海交通大学 | Based on the cascade violence group behavior detection method of level |
- 2016
- 2016-11-07: application CN201610976817.1A filed in China; granted as patent CN106529467B (status: Active)
Also Published As
Publication number | Publication date |
---|---|
CN106529467A (en) | 2017-03-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106529467B (en) | Group behavior recognition methods based on multi-feature fusion | |
CN109389055B (en) | Video classification method based on mixed convolution and attention mechanism | |
He et al. | Multi-scale FCN with cascaded instance aware segmentation for arbitrary oriented word spotting in the wild | |
Wang et al. | Dense trajectories and motion boundary descriptors for action recognition | |
Zhang et al. | Fast and robust occluded face detection in ATM surveillance | |
JP7386545B2 (en) | Method for identifying objects in images and mobile device for implementing the method | |
Betancourt et al. | A sequential classifier for hand detection in the framework of egocentric vision | |
CN111523462A (en) | Video sequence list situation recognition system and method based on self-attention enhanced CNN | |
Aich et al. | Improving object counting with heatmap regulation | |
Arivazhagan et al. | Human action recognition from RGB-D data using complete local binary pattern | |
CN108416780B (en) | Object detection and matching method based on twin-region-of-interest pooling model | |
Vezzani et al. | HMM based action recognition with projection histogram features | |
Ye et al. | Jersey number detection in sports video for athlete identification | |
García-Martín et al. | Robust real time moving people detection in surveillance scenarios | |
Bak et al. | Two-stream convolutional networks for dynamic saliency prediction | |
CN111723773A (en) | Remnant detection method, device, electronic equipment and readable storage medium | |
Russo et al. | Sports classification in sequential frames using CNN and RNN | |
Zhu et al. | A two-stage detector for hand detection in ego-centric videos | |
Afsar et al. | Automatic human action recognition from video using hidden markov model | |
Urabe et al. | Cooking activities recognition in egocentric videos using combining 2DCNN and 3DCNN | |
Zhang et al. | Realgait: Gait recognition for person re-identification | |
Kumar et al. | On-the-fly hand detection training with application in egocentric action recognition | |
CN109299702B (en) | Human behavior recognition method and system based on depth space-time diagram | |
Xu et al. | Semantic Part RCNN for Real-World Pedestrian Detection. | |
Yu et al. | Gender classification of full body images based on the convolutional neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||