CN111414846B - Group behavior identification method based on key space-time information driving and group co-occurrence structural analysis - Google Patents

Group behavior identification method based on key space-time information driving and group co-occurrence structural analysis

Info

Publication number
CN111414846B
CN111414846B
Authority
CN
China
Prior art keywords
group
network
key
lstm
occurrence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010192335.3A
Other languages
Chinese (zh)
Other versions
CN111414846A (en)
Inventor
王传旭
薛豪
邓海刚
闫春娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Litong Information Technology Co ltd
Original Assignee
Qingdao University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao University of Science and Technology filed Critical Qingdao University of Science and Technology
Priority to CN202010192335.3A
Publication of CN111414846A
Application granted
Publication of CN111414846B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G06V40/23 - Recognition of whole body movements, e.g. for sport training
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/044 - Recurrent networks, e.g. Hopfield networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/061 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using biological neurons, e.g. biological neurons connected to an integrated circuit
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A - TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 - Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 - Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention discloses a group behavior identification method based on key spatio-temporal information driving and group co-occurrence structural analysis: 1) obtaining the importance weight of each member in the group from a key person candidate sub-network; 2) inputting the personal importance weights and the bounding box features into the main network CNN to obtain the spatial features fed to the stacked LSTM network; 3) taking the output of 2) as input for co-occurrence feature modeling, grouping the internal neurons of the stacked LSTM so that different groups learn different co-occurrence features, thereby obtaining the group feature; 4) inputting the bounding box features into a key time segment candidate sub-network for feature extraction to obtain the importance weight of the current frame; 5) combining the group feature obtained in step 3) with the importance weight of the current frame obtained in step 4) to obtain the group feature of the current frame, which is input to softmax for group behavior recognition, completing the classification task. The scheme extracts the important member features and key scene frames of the group on the basis of key spatio-temporal information, and combines co-occurrence modeling of the interaction information inside the group behavior, thereby improving the recognition accuracy of group behaviors.

Description

Group behavior identification method based on key space-time information driving and group co-occurrence structural analysis
Technical Field
The invention relates to the field of group behavior identification, in particular to a group behavior identification method based on key space-time information driving and group co-occurrence structural analysis.
Background
In recent years, human behavior recognition in video has attracted growing attention in the field of computer vision. Human behavior recognition is also widely applied in real life, for example in intelligent video surveillance, abnormal event detection, sports analysis and social behavior understanding, which gives group behavior recognition important scientific practicality and great economic value. A group behavior is a complex activity performed jointly by multiple people, and the central questions in group behavior recognition methods are how to model personal features and how to infer the group behavior from the individuals.
"A Hierarchical Deep Temporal Model for Group Activity Recognition", published in CVPR in 2016, built a deep model to capture LSTM model-based dynamics, and proposed a novel deep architecture that models group activities in LSTM networks, models personal activities in a first stage, and then combines personnel level information with representative community activities. The time characterization value of the model is based on a Long Short Term Memory (LSTM) network, with the goal of utilizing discriminative information in the hierarchy between individual behavior and community activities. However, although the method uses a two-layer LSTM network, the group behaviors are simply represented by combining the personal characteristics, the interaction relationship of the individuals cannot be utilized, and key people in the group cannot be identified, so that the accuracy of the group behavior identification is lower; in addition, given that the importance of each person to group behavior recognition in a group activity is different, this approach simply models each person, while also reducing the accuracy of group behavior recognition.
In addition, "Region based multi-stream convolutional neural networks for collective activity recognition", published in JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION in 2019, proposes a new person-region based multi-stream architecture for group activity recognition which analyzes a number of local regions in addition to the whole image and takes person-person and group-person interaction information into account. However, because the LSTM network is not well utilized, it neither captures the temporal information of the video well nor fully considers the optical-flow motion information of individuals; moreover, the proposed fusion strategies such as Sum Fusion, Max Fusion and Concatenation Fusion are hand-crafted and cannot represent the features well.
Disclosure of Invention
The invention provides a group behavior identification method based on key spatiotemporal information driving and group co-occurrence structural analysis, which aims to accurately identify the behavior of each individual in a group and infer the group behavior by utilizing the individual and the interaction characteristics among the individuals.
The invention is realized by adopting the following technical scheme: a group behavior identification method based on key space-time information driving and group co-occurrence structural analysis, comprising the following steps:
Step A, for the video to be identified, tracking each member in the video to obtain the bounding box images x_t of the video, inputting them in time order into a key person candidate sub-network to extract static and dynamic features, and identifying the personal behavior attributes to obtain the personal importance weight α_t;
Step B, inputting the personal importance weight α_t obtained in step A and the personal bounding box image x_t into the main network CNN for analysis and processing, to obtain the spatial feature X'_t = x_t · α_t that is input to the stacked LSTM network;
Step C, taking the output of step B as input for co-occurrence feature modeling, grouping the neurons of the stacked LSTM so that different groups learn different co-occurrence features, to obtain the group feature Z_t;
Step D, inputting the bounding box image x_t of step A into a key time segment candidate sub-network for feature extraction, to obtain the importance weight of the current frame, i.e. the frame importance β_t;
Step E, combining the group feature Z_t obtained in step C with the current-frame importance weight β_t obtained in step D to obtain the group feature Z'_t of the current frame, and inputting Z'_t to softmax to complete the group behavior recognition. A high-level sketch of this pipeline is given below.
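The following minimal sketch summarises how steps A to E fit together for one video; all function names (key_person_subnet, main_cnn, stacked_lstm, key_time_subnet, softmax) are hypothetical placeholders for the networks described above, not identifiers taken from the patent.

```python
# Minimal sketch of the step A-E pipeline; every helper below is a
# hypothetical placeholder for the corresponding network in the patent.
def recognize_group_behavior(bbox_images):              # bbox_images: [T][M] member crops per frame
    alphas = [key_person_subnet(x_t) for x_t in bbox_images]           # step A: member importance
    feats = [[a * f for a, f in zip(alpha_t, main_cnn(x_t))]           # step B: modulate CNN features
             for alpha_t, x_t in zip(alphas, bbox_images)]
    z = stacked_lstm(feats)                                             # step C: group feature Z_t per frame
    betas = [key_time_subnet(x_t) for x_t in bbox_images]               # step D: frame importance
    return [softmax(b * z_t) for b, z_t in zip(betas, z)]               # step E: classify weighted features
```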
Further, in step A, the personal importance weight α_t is obtained through the following steps:
(1) First, a key person candidate sub-network is established, comprising, connected in series, a CNN layer, an LSTM layer, a fully connected layer, a tanh activation function and a further fully connected layer;
(2) Secondly, the behavior attribute scores of the group members are obtained, specifically:
at time t, assume there are M members in the scene; the set of bounding box features extracted by the CNN network is x_t = (x_{t,1}, ..., x_{t,M})^T, and the behavior attribute score s_t = (s_{t,1}, ..., s_{t,M})^T represents the behavior class judgement of the M members; it is expressed by an equation (shown in the original as an image and not reproduced here),
where T is the length of the video sequence, U_s, W_xs, W_hs are learnable parameter matrices, b_s, b_us are bias vectors, and the hidden variable from the LSTM layer also enters the equation.
(3) Finally, the importance weight of each member is obtained and the key persons are determined, as follows:
given the group behavior class set G_action = (A_1, A_2, ..., A_q)^T, for the k-th person the similarity of his behavior attribute score s_{t,k} to G_action is calculated, measured by the cosine similarity of the two multidimensional vectors, cos θ_{t,k} = (s_{t,k} · G_action) / (‖s_{t,k}‖ ‖G_action‖), where ‖·‖ denotes the 2-norm;
the cosine is converted into a normalized angle coefficient (equation shown in the original as an image);
the importance weight of each person in the spatial scene is then calculated from this coefficient (equation shown in the original as an image);
α_{t,k} quantifies how much each member contributes to the group behavior recognition task.
Further, step B is specifically implemented in the following manner:
(1) First, the spatial feature of the k-th person at time t is obtained through importance weight modulation:
x'_{t,k} = α_{t,k} · x_{t,k}
(2) Then all the importance-weight-modulated individual spatial features are aggregated as the input of the stacked LSTM in the main network:
X'_t = (x'_{t,1}, ..., x'_{t,K}).
further, in the step C, when the co-occurrence feature learning is performed, the following method is specifically adopted:
step C1, firstly, establishing an end-to-end full-connection depth LSTM network model to realize automatic learning of time sequence characteristics and motion modeling;
based on the LSTM layer and the feedforward layer, a deep network is formed by alternative arrangement, so that the motion information is captured, and the feedforward layer is positioned between the two LSTM networks, so that each layer of neurons are completely connected with the neurons of the next layer;
and C2, grouping neurons of the laminated LSTM, introducing a constraint on weights of member individuals and the neurons connected in an objective function, so that the neurons in the same group have larger weight connections to subsets formed by certain member individuals and smaller weight connections to other nodes, and mining the co-occurrence of the member individuals.
Further, in the step C1, in order to ensure that the full connection depth LSTM network model learns effective features, different types of regularization are implemented in different parts of the model, which specifically includes two types of regularization:
1) For fully connected layers, regularization is introduced to drive the model to learn co-occurrence features of individuals of different layers, and co-occurrence feature learning of nodes between LSTM layers;
2) For LSTM neurons, a new Dropout layer is derived and applied to LSTM neurons in the last LSTM layer.
Further, in step C2, the mining and exploitation of co-occurrence is achieved by adding a group-sparse constraint on the connections between each group of neurons and the member individuals:
(1) The neurons of each LSTM layer are grouped according to the number of group behavior categories; for the k-th group, the neurons of that group are trained to automatically distinguish different individual behaviors, and a co-occurrence regularization term is added to the loss function (equation (1), shown in the original as an image and not reproduced here),
where L is the maximum likelihood loss function of the deep LSTM network and W_x = [W_{x,1}; ...; W_{x,K}] is the weight matrix connecting the input units to the neurons; letting N denote the number of neurons, the N neurons are divided into θ groups with ε = [N/θ] neurons per group; for the LSTM layers, s = {i, f, o, c} denotes the input gate, forget gate, output gate and cell of the LSTM neurons, while for the feedforward layers, s = {h} denotes the neurons themselves;
the second term in equation (1) is an L1 regularization, used during training to determine the relatively important subset of key persons; the third term applies the L2 norm to each group of units (definition shown in the original as an image), which encourages the matrix W_{xβ,k} to become sparse across groups.
Driven by this, features with different descriptions are selected as input, different neuron groups explore different co-occurrence features, and the problem is then solved by gradient descent.
Further, step D is specifically implemented in the following manner:
(1) First, a key time segment candidate sub-network is established, comprising, connected in series, a CNN layer, an LSTM layer, a fully connected layer and a ReLU nonlinear unit;
(2) For the video sequence input to this sub-network, the behavior attribute score of the current frame is obtained with the ReLU unit, namely o_t = ReLU(w_x' x_t + w_h' h'_{t-1} + b') = (o_1, o_2, ..., o_C)_t, where C is the total number of behavior categories and t denotes the current frame; its value depends on the current input x_t and the hidden state h'_{t-1} of the LSTM layer at time t-1;
(3) Finally, according to the degree of association between the current-frame behavior attribute score and the group behavior attribute, the importance weight β_t of the current frame within the input sequence T is obtained, specifically:
given the group behavior class set G_action = (A_1, A_2, ..., A_q)^T, the temporal importance weight β_t is obtained by calculating the joint similarity coefficient between this set and the current-frame behavior attribute score o_t = (o_1, o_2, ..., o_C)_t (equation shown in the original as an image),
where ∩ denotes the intersection computation and n(·) denotes the number of elements in a set.
Further, in step E, group behavior recognition based on the group feature Z_t and the current-frame importance weight β_t is specifically carried out as follows:
(1) First, the frame group feature at time t is calculated as
Z'_t = Z_t · β_t;
(2) It is then input into the softmax layer for the final group behavior recognition:
y = softmax(Z'_t),
where y is the group behavior category.
Further, in view of the complexity of the model, the main network, the key person candidate sub-network and the key time segment candidate sub-network are trained jointly; the joint training process of the network is specifically as follows:
Input: training iteration numbers N1 and N2 of the model;
(1) Initializing the network parameters using a Gaussian function;
(2) Fixing the weights of the key person candidate sub-network, and jointly training the key time segment candidate sub-network with a main network containing only one LSTM layer, to obtain the key time segment candidate model;
(3) Repeating the iteration, increasing the LSTM layers of the main network to three and training it for N1 iterations;
(4) Fine-tuning the main network and the key time segment candidate sub-network for N2 iterations;
(5) Fixing the key time segment candidate sub-network, and jointly training the key person candidate sub-network with a main network containing only one LSTM layer, to obtain the key person candidate sub-network;
(6) Repeating the iteration, increasing the LSTM layers of the main network to three and training it for N1 iterations;
(7) Fine-tuning the main network and the key time segment candidate sub-network for N2 iterations;
(8) Jointly training the sub-networks obtained in (4) and (7) for N1 iterations;
(9) Fine-tuning the whole network model jointly for N2 iterations;
Output: the finally converged group behavior recognition model.
Compared with the prior art, the invention has the advantages and positive effects that:
according to the group behavior recognition method based on the space-time importance and the co-occurrence, an importance mechanism is used for focusing on important individual behaviors in group behaviors, more important personal characteristics and important scene frames are extracted, interaction information in the group behaviors is processed in a combined mode, the personal characteristics can be better utilized through the combination of the space-time importance and the co-occurrence, key information in a plurality of information is effectively utilized, and therefore accuracy and efficiency are improved, and the method has important scientific practicality and huge economic value.
Drawings
FIG. 1 is a diagram of an overall network architecture according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the internal structure of a primary network layer LSTM according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a grouping of neurons within each layer of LSTM in accordance with an embodiment of the present invention;
FIG. 4 is a schematic diagram of the internal structure of an LSTM neuron according to an embodiment of the invention.
Detailed Description
In order that the above objects, features and advantages of the invention will be more readily understood, a further description of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced otherwise than as described herein, and therefore the present invention is not limited to the specific embodiments disclosed below.
In order to accurately identify the behavior of each individual in the group and to infer the group behavior from the individuals and the interaction features among them, this embodiment provides a group behavior identification method based on key spatio-temporal information driving and group co-occurrence structural analysis, comprising the following steps:
Step A, for the video to be identified, tracking each member in the video to obtain the bounding box images x_t of the video, inputting them in time order into a key person candidate sub-network to extract static and dynamic features, and identifying the personal behavior attributes to obtain the personal importance weight α_t;
Step B, inputting the personal importance weight α_t obtained in step A and the personal bounding box image x_t into the main network for multiplication, to obtain the spatial feature X'_t = x_t · α_t that is input to the main-network stacked LSTM;
Step C, taking the output of step B as input for co-occurrence feature modeling, grouping the neurons of the stacked LSTM so that different groups learn different co-occurrence features, to obtain the group feature Z_t;
Step D, inputting the bounding box image x_t of step A into a key time segment candidate sub-network for feature extraction, to obtain the importance weight of the current frame, i.e. the frame importance β_t;
Step E, combining the group feature Z_t obtained in step C with the current-frame importance weight β_t obtained in step D to obtain the group feature Z'_t of the current frame, which is input to softmax for group behavior recognition, completing the group behavior recognition.
In order to realize the accuracy and efficiency of group behavior identification, the scheme designs a main network and two sub-networks (a key character candidate sub-network and a key time period candidate sub-network), wherein the main network is used for realizing feature extraction, space-time correlation utilization and final classification; the key character candidate sub-network is used for distributing proper importance to different individuals; the critical period candidate sub-network is used to assign appropriate importance to the different frames. The identification of the individual behavior is not performed within the main network, but rather the individual behavior is inferred directly within the sub-network to obtain the key persona ordering and then to control the input of useful information to the main network.
1. In a specific group behavior recognition process, although a group activity is performed jointly by many people, the members that determine the group behavior within a given time period are usually only a few (the key persons). Therefore, two sub-networks are designed in this scheme to attend to the key person information, shielding the interference of useless information from other members and optimizing the recognition accuracy of the model. The two sub-networks are the key person candidate sub-network and the key time segment candidate sub-network: the key person candidate sub-network quantitatively ranks the importance of the spatial positions of the group members according to the relevance of the personal behaviors to the group behavior, and controls the CNN spatial information fed into the main network; the key time segment candidate sub-network quantitatively selects the important temporal information output by the stacked LSTM in the main network according to the relevance between the member behavior categories and the group behavior, attending to the useful time segments and optimizing the information fed to the classification layer softmax. Through these two sub-networks the key information of the group behavior is purified, with the purpose of shielding the influence of interference noise from irrelevant persons in the group.
2. In addition, as a complex group activity, a group's certain behavior is typically coordinated by several specific individuals within the group, and group members are present in structured groupings, with close interactions of the members within the small set. In the scheme, a core group formed by key characters is focused, influence of irrelevant members on group behavior identification is reduced, and characteristics of the core group members in cooperation interaction and cooperation to determine group behavior discrimination are called co-occurrence. There are a variety of group behaviors, so there will be a corresponding plurality of such relatively "stable co-occurrence subgroups". It is emphasized that the occurrence of these co-occurrence groups is time-varying and that only one "co-occurrence group" dominates for a certain time segment.
To characterize such "co-occurrence subgroups" within a group, three stacked bidirectional LSTM layers are designed in the main network and the LSTM neurons of each layer are grouped, each group attending to only one class of group behavior, and every neuron in a group being connected to every member of the group (for example, if the neurons in one LSTM layer are divided into 6 groups, 6 different group behaviors can be attended to, and the behavior attribute attended to by each of the six groups is fixed). The neurons in the same group are therefore trained to have large weight connections to the subset of individuals exhibiting a particular behavior and small weight connections to the other, less relevant individuals. The co-occurrence features are learned in this way, highlighting the co-occurring temporal information of the key core subgroup; combined with the features learned by the key time segment candidate sub-network, the redundant useless information in the LSTM can be further suppressed, improving the signal-to-noise ratio of the co-occurring temporal information, which serves as input to the classification layer softmax and thereby improves the recognition accuracy of group behaviors.
The following describes the group behavior recognition method in detail in this embodiment, specifically:
In step A, the personal importance weight α_t is obtained through the following steps:
(1) Firstly, establishing a key character candidate sub-network, wherein the key character candidate sub-network comprises CNN and LSTM layers, a full connection layer, a tanh activation function and a full connection layer which are connected in series as shown in figure 1;
(2) Secondly, the behavior attribute scores of the group members are obtained, specifically:
at time t, assume there are M members in the scene; the set of bounding box features extracted through the CNN network is x_t = (x_{t,1}, ..., x_{t,M})^T, and the behavior attribute score s_t = (s_{t,1}, ..., s_{t,M})^T, representing the behavior class judgement of the M members, is obtained by an equation (shown in the original as an image and not reproduced here),
where T is the length of the video sequence, U_s, W_xs, W_hs are learnable parameter matrices, b_s, b_us are bias vectors, and the hidden variable from the LSTM layer also enters the equation;
(3) Finally, the importance weight of each member is obtained and the key persons are determined, as follows:
given the group behavior class set G_action = (A_1, A_2, ..., A_q)^T, for the k-th person the similarity of his behavior attribute score s_{t,k} to G_action can be measured by the cosine similarity of the two multidimensional vectors, cos θ_{t,k} = (s_{t,k} · G_action) / (‖s_{t,k}‖ ‖G_action‖), where ‖·‖ denotes the 2-norm;
the cosine is converted into a normalized angle coefficient (equation shown in the original as an image);
the importance weight of each person in the spatial scene is then calculated from this coefficient (equation shown in the original as an image);
α_{t,k} quantifies how much each member contributes to the group behavior recognition task, and thus controls the amount of information that this member passes on to the main network.
In step B, the personal importance weight α_t obtained in step A and the personal bounding box feature x_t input to the main-network CNN are multiplied to obtain the feature X'_t = x_t · α_t that is input to the stacked LSTM network, specifically:
(1) First, the spatial feature of the k-th person at time t is obtained through importance weight modulation:
x'_{t,k} = α_{t,k} · x_{t,k}
(2) All importance-weight-modulated individual spatial features are then aggregated as the input of the stacked LSTM:
X'_t = (x'_{t,1}, ..., x'_{t,K})
The key person candidate sub-network proposed in this embodiment determines the importance of the individual behaviors from all individuals at the current time together with the hidden variables in the LSTM layer; its purpose is to assign importance weights to the individuals in the group activity, and since the LSTM hidden variable h_{t-1} contains information of past frames, it can explore long-term dynamics.
In step C, the output of step B is taken as input for co-occurrence feature modeling; the neurons of the stacked LSTM are grouped and different groups learn different co-occurrence features to obtain the group feature Z_t, specifically:
(1) First, an end-to-end fully connected deep stacked LSTM network model is established to realize temporal feature learning and motion modeling. To reliably model the complex relationships among different individuals, LSTM layers and feedforward layers are deployed alternately to form a deep network that captures the motion information, with a feedforward layer placed between two LSTM layers, as shown in Fig. 2; the effect is that every neuron of one layer is fully connected to the neurons of the next layer, with no same-layer connections among neurons and no cross-layer connections.
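A minimal PyTorch-style sketch of such an alternating stacked architecture is given below; the layer widths, the use of ReLU in the feedforward layers and the class name are assumptions, not values taken from the patent.

```python
import torch
import torch.nn as nn

class StackedCoOccurrenceLSTM(nn.Module):
    """Three bidirectional LSTM layers with fully connected feedforward layers in between."""
    def __init__(self, in_dim=1024, hidden=256):
        super().__init__()
        self.lstm1 = nn.LSTM(in_dim, hidden, bidirectional=True, batch_first=True)
        self.ff1 = nn.Linear(2 * hidden, 2 * hidden)   # feedforward layer between LSTM layers
        self.lstm2 = nn.LSTM(2 * hidden, hidden, bidirectional=True, batch_first=True)
        self.ff2 = nn.Linear(2 * hidden, 2 * hidden)
        self.lstm3 = nn.LSTM(2 * hidden, hidden, bidirectional=True, batch_first=True)

    def forward(self, x):                              # x: [batch, T, in_dim] aggregated member features
        h, _ = self.lstm1(x)
        h = torch.relu(self.ff1(h))
        h, _ = self.lstm2(h)
        h = torch.relu(self.ff2(h))
        h, _ = self.lstm3(h)
        return h                                       # group feature sequence Z_t for every frame
```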
To ensure that the model learns effective features, different types of regularization are applied in different parts of the model to alleviate over-fitting (the model is a deep network formed by fully connected LSTM layers and feedforward layers; its structure is relatively complex and easily over-fits, so the regularization is designed to reduce the model complexity). Specifically, two types of regularization are used in this embodiment:
1) for the fully connected layers, this embodiment introduces regularization to drive the model to learn the co-occurrence features of individuals at different layers, as well as co-occurrence feature learning of the nodes between LSTM layers;
2) for the LSTM neurons, a new Dropout layer is derived and applied to the LSTM neurons in the last LSTM layer, which helps the network learn complex dynamics.
(2) The neurons in the stacked LSTM are grouped, and a constraint on the weights connecting member individuals to the neurons is introduced into the objective function, so that neurons in the same group have large weight connections to a subset of member individuals and small weight connections to the other nodes, thereby mining the co-occurrence of member individuals.
As shown in Fig. 3, an LSTM layer of the main network is composed of LSTM neurons divided into K groups; every neuron in the same group has large connection weights to some individuals (i.e. the subset of members closely related to a certain type of group behavior) and small connection weights to the other individuals. Different groups of neurons are sensitive to different group behaviors, in that the individual subsets to which their large connection weights correspond are also different. In practice, this mining and exploitation of co-occurrence can be achieved by adding a group-sparse constraint to the connections between each group of neurons and the member individuals.
1) The neurons of each LSTM layer are grouped according to the number of group behavior categories; for example, with 10 behavior categories the neurons are divided into 10 groups. Each neuron is fully connected to every individual, and each individual has a different importance; the trained neurons can learn which individuals are important, thereby highlighting the important subgroup.
Therefore, this embodiment designs a fully connected main network that allows every neuron to connect to any individual so as to automatically discover the co-occurrence features inside the group, and the neurons of the same layer are divided into θ groups so that different groups focus on discriminating different behavior classes. Taking the k-th group of neurons as an example, the neurons of that group are trained to automatically distinguish different individual behaviors, and the co-occurrence regularization is added to the loss function (the regularization serves two purposes: one is to prevent over-fitting, and the other is to integrate prior information so that the model learns the desired structure):
The co-occurrence regularized loss, equation (1), is shown in the original as an image and is not reproduced here.
In equation (1), L is the maximum likelihood loss function of the deep LSTM network and W_x = [W_{x,1}; ...; W_{x,K}] is the weight matrix connecting the input units to the neurons; letting N denote the number of neurons, the N neurons are divided into θ groups with ε = [N/θ] neurons per group; for the LSTM layers, s = {i, f, o, c} denotes the input gate, forget gate, output gate and cell of the LSTM neurons, while for the feedforward layers, s = {h} denotes the neurons themselves.
The second term in equation (1) is an L1 regularization: during training it shrinks small weights quickly and large weights slowly, so the weights of the final model concentrate on the highly important features while the weights of less important features quickly approach 0, from which the relatively important subset of key persons can be determined. The third term applies the L2 norm to each group of units (definition shown in the original as an image), which encourages the matrix W_{xβ,k} to become sparse across groups.
Driven by this, features with different descriptions can be selected as input and different neuron groups explore different co-occurrence features, so that the ability to recognize various action categories is obtained; the problem is then solved by gradient descent. (The optimization of the matrix W_{xβ,k}, whose values become sparse, in effect realizes the selection of feature importance for the θ co-occurring subgroups.)
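The group-sparse constraint described above can be sketched as follows in PyTorch; the split of the weight matrix into θ neuron groups, the per-group L2,1-style penalty and the regularization weights lambda1/lambda2 are assumptions, and the term would simply be added to the network's ordinary loss L before back-propagation.

```python
import torch

def co_occurrence_regularizer(W_x, theta, lambda1=1e-4, lambda2=1e-4):
    """W_x: [num_neurons, num_individuals] weights connecting member inputs to LSTM neurons."""
    l1 = W_x.abs().sum()                                 # L1 term: keeps only the important key persons
    groups = W_x.chunk(theta, dim=0)                     # split the neurons into theta groups
    group_21 = sum(g.norm(p=2, dim=0).sum()              # per-individual L2 within each group, summed:
                   for g in groups)                      # favors a sparse member subset per group (assumed form)
    return lambda1 * l1 + lambda2 * group_21
```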
During training, the network randomly drops some neurons to force the remaining units to compensate, and during testing all neurons are used together to make predictions. Extending this idea to LSTM networks, this scheme uses a new gradient descent algorithm that lets the internal gates, cell and output response of the LSTM neurons be dropped selectively, encouraging each unit to learn better parameters. As shown in Fig. 4, which draws the LSTM neuron in expanded form, it is undesirable to erase all information in the cell, because the cell memorizes events that occurred over time; the dropout effect is therefore allowed to flow along the layers (the dashed lines in Fig. 4) but is prohibited from flowing along the time axis.
The responses of the non-dropped unit transmitted along the time direction, and the responses of the dropped unit, are given by equations shown in the original as images and are not reproduced here.
In these equations m_i, m_f, m_c, m_o and m_h are the binary mask vectors of the input gate, forget gate, cell memory, output gate and output response respectively; an element value of 0 indicates that the corresponding unit is dropped. For the first LSTM layer the input x_t is the extracted single-person behavior feature; for the higher LSTM layers the input x_t is the response output of the previous layer.
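Because the gate-dropout equations appear only as images, the sketch below is an interpretation of the description: the un-dropped cell and hidden states are carried along the time axis, while a masked copy of the response is what flows upward to the next layer. The weight layout and helper names are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step_gate_dropout(x_t, h_prev, c_prev, W, U, b, masks, training=True):
    """One LSTM step; masks = (m_i, m_f, m_c, m_o, m_h), binary vectors with 0 = dropped."""
    pre = W @ x_t + U @ h_prev + b
    i, f, g, o = np.split(pre, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c_t = f * c_prev + i * g                     # time-direction cell state: never erased by dropout
    h_t = o * np.tanh(c_t)                       # hidden state passed to the next time step
    if not training:
        return h_t, c_t, h_t
    m_i, m_f, m_c, m_o, m_h = masks
    c_drop = (f * m_f) * c_prev + (i * m_i) * (g * m_c)
    h_up = (o * m_o) * np.tanh(c_drop) * m_h     # masked response sent up to the next layer only
    return h_t, c_t, h_up
```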
In step D, the bounding box information of step A is input into the key time segment candidate sub-network for feature extraction to obtain the importance weight of the current frame, i.e. the frame importance β_t, specifically through the following steps:
(1) First, a key time segment candidate sub-network is established, comprising, connected in series, a CNN layer and an LSTM layer, a fully connected layer and a ReLU nonlinear unit, as shown in Fig. 1;
(2) Then, for the video sequence input to this sub-network, the behavior attribute score of the current frame is obtained with the ReLU unit, namely o_t = ReLU(w_x' x_t + w_h' h'_{t-1} + b') = (o_1, o_2, ..., o_C)_t, where C is the total number of behavior categories and t denotes the current frame; its value depends on the current input x_t and the hidden state h'_{t-1} of the LSTM layer at time t-1.
For a video frame sequence, the amount of valuable information provided by different frames is usually unequal: only some frames contain the most distinctive information, while the other frames provide supplementary context. For example, for the group activity of a spike in a volleyball match, frames of preparatory actions such as the run-up and jump are less important than the spike frame itself.
(3) Finally, according to the degree of association between the current-frame behavior attribute score and the group behavior attribute, the importance weight β_t of the current frame within the input sequence T is obtained, specifically:
given the group behavior class set G_action = (A_1, A_2, ..., A_q)^T, the temporal importance weight β_t is obtained by calculating the joint similarity coefficient between this set and the current-frame behavior attribute score o_t = (o_1, o_2, ..., o_C)_t (equation shown in the original as an image),
where ∩ denotes calculating the intersection of two sets and n(·) denotes the number of elements in a set.
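The sketch below illustrates step D; since the joint similarity coefficient is given only as an equation image, the Jaccard-style coefficient used here, and the score threshold used to turn o_t into a set of behavior classes, are assumptions.

```python
def frame_importance(o_t, group_classes, score_threshold=0.5):
    """o_t: per-class behavior scores of the current frame; group_classes: indices in G_action."""
    frame_classes = {c for c, score in enumerate(o_t) if score > score_threshold}
    inter = frame_classes & set(group_classes)        # classes shared with the group behavior set
    union = frame_classes | set(group_classes)
    return len(inter) / len(union) if union else 0.0  # beta_t in [0, 1]
```

The resulting β_t then weights the group feature of the frame, Z'_t = Z_t · β_t, before classification.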
In step E, the group feature Z_t obtained in step C and the current-frame importance weight β_t obtained in step D are combined to obtain the group feature Z'_t of the current frame, which is input into the softmax layer for the final group behavior recognition, specifically:
(1) First, the frame group feature at time t is calculated as
Z'_t = Z_t · β_t;
(2) It is then input into the softmax layer for the final group behavior recognition:
y = softmax(Z'_t),
where y is the group behavior category.
The invention designs sub-networks so that the network pays different levels of attention to different individuals and assigns different importance to different frames. As shown in the network model of Fig. 1, the key person candidate sub-network mainly acts on the input of the main LSTM network, while the key time segment candidate sub-network mainly acts on its output. The scheme integrates the two candidate sub-networks in the same network, and their final objective function is formulated as a regularized cross-entropy loss over the sequence, as follows:
The objective function is shown in the original as an equation image and is not reproduced here. In it, y = (y_1, ..., y_C) is the ground-truth label: if the sequence belongs to class i, then y_i = 1 and y_j = 0 for all j ≠ i; the predicted quantity (also shown as an image) is the probability scalar of predicting the sequence as class i; and λ1, λ2 and λ3 balance the contributions of the three regularization terms.
The first regularization term encourages the key person candidate sub-network to dynamically attend to more spatial nodes in the sequence. This embodiment finds that the network model otherwise tends to continually ignore many individuals over time, even when these individuals are valuable for determining the action type, i.e. it gets trapped in a local optimum, so this regularization term is introduced to avoid such an undesirable solution. The second regularization term regularizes the learned key time segment candidate sub-network with an L2-norm control instead of letting the weights grow without limit; this mitigates the vanishing of the gradient in back-propagation, which is proportional to 1/β_t. The third, L1-norm regularization term reduces over-fitting of the network, where W_uv denotes a connection matrix in the network.
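Since the exact objective is given only as an equation image, the sketch below only illustrates the overall shape described above: a cross-entropy term plus an attention-coverage term on the spatial weights α, an L2 term on the temporal weights β and an L1 term on a connection matrix; all functional forms and names are assumptions.

```python
import torch

def total_loss(log_probs, target, alphas, betas, W_uv, lam1, lam2, lam3):
    """log_probs: [C] log class probabilities; alphas: [T, M]; betas: [T]; W_uv: connection matrix."""
    ce = -log_probs[target]                               # cross-entropy for the true class
    attn_cover = ((1.0 - alphas.sum(dim=0)) ** 2).sum()   # assumed form: push attention to cover every member over time
    beta_l2 = (betas ** 2).sum()                          # keep the frame weights bounded
    w_l1 = W_uv.abs().sum()                               # L1 term against over-fitting
    return ce + lam1 * attn_cover + lam2 * beta_l2 + lam3 * w_l1
```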
Finally, considering the complexity of the model, a joint training strategy for the main network and the sub-networks is provided so that the model achieves a better result. Because the three networks interact and the model contains two loss functions, the optimization is quite difficult; the proposed joint training strategy trains the whole model effectively, and the training process is as follows:
Input: training iteration numbers N1 and N2 of the model (e.g., N1 = 1000, N2 = 500);
1. Initializing the network parameters using a Gaussian function;
2. Fixing the weights of the key person candidate sub-network (at their initialized values), and jointly training the key time segment candidate sub-network with a main network containing only one LSTM layer, to obtain the key time segment candidate model;
3. Repeating the iteration, increasing the LSTM layers of the main network to three and training it for N1 iterations;
4. Fine-tuning the main network and the key time segment candidate sub-network for N2 iterations;
5. Fixing the key time segment candidate sub-network, and jointly training the key person candidate sub-network with a main network containing only one LSTM layer, to obtain the key person candidate sub-network;
6. Repeating the iteration, increasing the LSTM layers of the main network to three and training it for N1 iterations;
7. Fine-tuning the main network and the key time segment candidate sub-network for N2 iterations;
8. Jointly training the sub-networks obtained in steps 4 and 7 for N1 iterations;
9. Fine-tuning the whole network model jointly for N2 iterations;
Output: the finally converged group behavior recognition model.
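A sketch of this schedule in Python-style pseudocode is given below; the network objects and the train/finetune/freeze helpers are hypothetical placeholders, and only the ordering of the phases follows the description above.

```python
# Hypothetical helpers: initialize_gaussian, freeze, unfreeze, train, finetune.
def joint_training(main_net, person_subnet, time_subnet, data, N1, N2):
    initialize_gaussian(main_net, person_subnet, time_subnet)        # step 1
    freeze(person_subnet)
    train([main_net.with_lstm_layers(1), time_subnet], data, N1)     # step 2: one-LSTM-layer main net + time subnet
    train([main_net.with_lstm_layers(3), time_subnet], data, N1)     # step 3: grow the main net to three LSTM layers
    finetune([main_net, time_subnet], data, N2)                      # step 4
    freeze(time_subnet); unfreeze(person_subnet)
    train([main_net.with_lstm_layers(1), person_subnet], data, N1)   # step 5
    train([main_net.with_lstm_layers(3), person_subnet], data, N1)   # step 6
    finetune([main_net, person_subnet], data, N2)                    # step 7
    train([person_subnet, time_subnet], data, N1)                    # step 8: joint training of the two sub-networks
    finetune([main_net, person_subnet, time_subnet], data, N2)       # step 9: fine-tune the whole model
    return main_net, person_subnet, time_subnet
```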
In the group behavior recognition method provided by this scheme, the key person candidate sub-network assigns a different importance weight to every person in the group, so no information is lost while the individuals that contribute most to group behavior recognition receive more attention; likewise the key time segment candidate sub-network assigns a different importance weight to every frame, no frame is discarded and no data are lost, and through continued training iterations of the model the efficiency and accuracy of group behavior recognition can be greatly improved.
In addition, current approaches to expressing interaction relationships model the interactions of the persons in the group with a graph model, which involves huge amounts of data and makes model training difficult. In this scheme the interaction relationships inside the group behavior are instead handled through co-occurrence: a fully connected stacked bidirectional LSTM is adopted, its neurons are grouped, and different groups recognize different behaviors, which further effectively improves the accuracy of group behavior recognition.
The present invention is not limited to the above embodiments. Any equivalent embodiment obtained by changing or modifying the technical content disclosed above may be applied to other fields; likewise, any simple modification or equivalent change made to the above embodiments according to the technical substance of the present invention, without departing from its technical content, still falls within the protection scope of the technical solution of the present invention.

Claims (9)

1. A group behavior identification method based on key space-time information driving and group co-occurrence structural analysis, characterized by comprising the following steps:
step A, for the video to be identified, tracking each member in the video to obtain the bounding box images x_t of the video, inputting them in time order into a key person candidate sub-network to extract static and dynamic features, and identifying the personal behavior attributes to obtain the personal importance weight α_t;
step B, inputting the personal importance weight α_t obtained in step A and the personal bounding box image x_t into the main network for analysis and processing, to obtain the spatial feature X'_t = x_t · α_t that is input to the stacked LSTM network;
step C, taking the output of step B as input for co-occurrence feature modeling, grouping the neurons of the stacked LSTM so that different groups learn different co-occurrence features, to obtain the group feature Z_t;
step D, inputting the bounding box image x_t of step A into a key time segment candidate sub-network for feature extraction, to obtain the importance weight β_t of the current frame;
step E, combining the group feature Z_t obtained in step C with the current-frame importance weight β_t obtained in step D to obtain the group feature Z'_t of the current frame, and inputting Z'_t to softmax to complete the group behavior recognition.
2. The method for group behavior identification based on key spatiotemporal information driven and group co-occurrence structured analysis of claim 1, wherein in step A the personal importance weight α_t is obtained through the following steps:
(1) Firstly, establishing a key character candidate sub-network, wherein the key character candidate sub-network comprises a CNN layer, an LSTM layer, a full connection layer, a tanh activation function and a full connection layer which are connected in series;
(2) secondly, the behavior attribute scores of the group members are obtained, specifically:
at time t, assume there are M members in the scene; the set of bounding box features extracted by the CNN network is x_t = (x_{t,1}, ..., x_{t,M})^T, and the behavior attribute score s_t = (s_{t,1}, ..., s_{t,M})^T represents the behavior class judgement of the M members, expressed by an equation (shown in the original as an image and not reproduced here),
where T is the length of the video sequence, U_s, W_xs, W_hs are learnable parameter matrices, b_s, b_us are bias vectors, and the hidden variable from the LSTM layer also enters the equation;
(3) finally, the importance weight of each member is obtained and the key persons are determined, as follows:
given the group behavior class set G_action = (A_1, A_2, ..., A_q)^T, for the k-th person the similarity of his behavior attribute score s_{t,k} to G_action is measured by the cosine similarity of the two multidimensional vectors, cos θ_{t,k} = (s_{t,k} · G_action) / (‖s_{t,k}‖ ‖G_action‖), where ‖·‖ denotes the 2-norm;
the cosine is converted into a normalized angle coefficient (equation shown in the original as an image);
the importance weight of each person in the spatial scene is calculated from this coefficient (equation shown in the original as an image);
α_{t,k} quantifies how much each member contributes to the group behavior recognition task.
3. The method for group behavior identification based on key spatiotemporal information driven and group co-occurrence structured analysis of claim 2, wherein step B is specifically realized through the following steps:
(1) first, the spatial feature of the k-th person at time t is obtained through importance weight modulation:
x'_{t,k} = α_{t,k} · x_{t,k};
(2) then all the importance-weight-modulated individual spatial features are aggregated as the input of the stacked LSTM in the main network:
X'_t = (x'_{t,1}, ..., x'_{t,K}).
4. the method for group behavior identification based on key spatiotemporal information driven and group co-occurrence structured analysis of claim 1, wherein: in the step C, when co-occurrence feature learning is performed, the following specific method is adopted:
step C1, firstly, establishing an end-to-end full-connection depth LSTM network model to realize time sequence feature learning and motion modeling;
based on the LSTM layer and the feedforward layer, a deep network is formed by alternative arrangement, so that the motion information is captured, and the feedforward layer is positioned between the two LSTM networks, so that each layer of neurons are completely connected with the neurons of the next layer;
and C2, grouping neurons of the laminated LSTM, introducing the constraint on weights of member individuals and the neurons connected in an objective function, so that the neurons in the same group have large weight connections to subsets formed by certain member individuals and small weight connections to other nodes, and thus the co-occurrence of the member individuals is mined.
5. The method for group behavior identification based on key spatiotemporal information driven and group co-occurrence structured analysis of claim 4, wherein: in step C1, in order to ensure that the full connection depth LSTM network model learns effective features, different types of regularization are implemented in different parts of the model, specifically including two types of regularization:
1) For fully connected layers, regularization is introduced to drive the model to learn co-occurrence features of individuals of different layers, and of nodes between LSTM layers;
2) For LSTM neurons, a new Dropout layer is derived and applied to LSTM neurons in the last LSTM layer.
6. The method for group behavior identification based on key spatiotemporal information driven and group co-occurrence structured analysis of claim 5, wherein in step C2 the mining and exploitation of co-occurrence is achieved by adding a group-sparse constraint on the connections between each group of neurons and the member individuals:
(1) the neurons of each LSTM layer are grouped according to the number of group behavior categories; for the k-th group, the neurons of that group are trained to automatically distinguish different individual behaviors, and a co-occurrence regularization term is added to the loss function (equation (1), shown in the original as an image and not reproduced here),
where L is the maximum likelihood loss function of the deep LSTM network and W_x = [W_{x,1}, ..., W_{x,K}] is the weight matrix connecting the input units to the neurons; letting N denote the number of neurons, the N neurons are divided into θ groups with ε = [N/θ] neurons per group; for the LSTM layers, s = {i, f, o, c} denotes the input gate, forget gate, output gate and cell of the LSTM neurons, while for the feedforward layers, s = {h} denotes the neurons themselves;
the second term in equation (1) is an L1 regularization, used during training to determine the relatively important subset of key persons; the third term applies the L2 norm to each group of units (definition shown in the original as an image), which encourages the matrix W_{xβ,k} to become sparse across groups;
driven by this, features with different descriptions are selected as input, different neuron groups explore different co-occurrence features, and the problem is then solved by gradient descent.
7. The method for group behavior identification based on key spatiotemporal information driven and group co-occurrence structured analysis of claim 2, wherein: the step D is specifically realized by the following steps:
(1) Firstly, establishing a key time segment candidate sub-network, wherein the key time segment candidate sub-network comprises a CNN layer and an LSTM layer which are connected in series, a full connection layer and a Relu nonlinear unit;
(2) Then, for the video sequence input to the sub-network, the behavior attribute scores of the current frame are obtained with the ReLU unit, namely: o_t = ReLU(w_{x'}·x_t + w_{h'}·h'_{t-1} + b') = (o_1, o_2, ..., o_C)_t, where C represents the total number of behavior categories and t denotes the current frame; o_t depends on the current input x_t and on the hidden state h'_{t-1} of the LSTM layer at time t-1;
(3) Finally, according to the degree of association between the current-frame behavior attribute scores and the group behavior attributes, the importance weight β_t of the current frame in the input sequence T is obtained, specifically:
given the group behavior class set G_action = (A_1, A_2, ..., A_q), the temporal importance weight β_t is obtained by computing the joint similarity coefficient between this set and the current-frame behavior attribute scores o_t = (o_1, o_2, ..., o_C)_t, expressed as:
β_t = n(G_action ∩ o_t) / n(G_action ∪ o_t)
where ∩ and ∪ denote set intersection and union, and n(·) denotes the number of elements in the resulting set (see the sketch below).
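A minimal sketch of steps (2)-(3), assuming the Jaccard-style reading of the joint similarity reconstructed above; the threshold used to turn the scores o_t into a set of detected behavior attributes is an illustrative assumption, as are all names below.

```python
import numpy as np

def frame_attribute_scores(x_t, h_prev, w_x, w_h, b):
    """o_t = ReLU(w_x' x_t + w_h' h'_{t-1} + b'): behavior attribute scores of frame t."""
    return np.maximum(0.0, w_x @ x_t + w_h @ h_prev + b)                # shape (C,)

def temporal_importance(o_t, group_action_set, threshold=0.5):
    """beta_t: joint (Jaccard-style) similarity between the group behavior class set
    and the behavior attributes detected in the current frame."""
    detected = {c for c, score in enumerate(o_t) if score > threshold}  # assumed thresholding
    union = group_action_set | detected
    return len(group_action_set & detected) / len(union) if union else 0.0
```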
8. The group behavior identification method based on key spatiotemporal information driving and group co-occurrence structured analysis according to claim 1, wherein: in step E, based on the group feature Z_t and the importance weight β_t of the current frame, group behavior recognition is performed as follows:
(1) First, the frame-level group feature at time t is calculated as:
Z'_t = Z_t · β_t
(2) Then Z'_t is input into the softmax layer for the final group behavior recognition:
y = softmax(Z'_t)
wherein y is the group behavior category (see the sketch below).
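A minimal sketch of step E, under the assumption that Z_t already carries one entry per group-behavior class so that softmax can be applied to the weighted feature directly; names are illustrative.

```python
import numpy as np

def recognize_group_behavior(Z_t, beta_t):
    """Scale the frame group feature by its temporal importance, then softmax."""
    Z_prime = beta_t * Z_t                    # Z'_t = Z_t * beta_t
    e = np.exp(Z_prime - Z_prime.max())       # numerically stable softmax
    probs = e / e.sum()
    return int(np.argmax(probs)), probs       # predicted class y and class distribution
```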
9. The group behavior identification method based on key spatiotemporal information driving and group co-occurrence structured analysis according to claim 1, wherein: owing to the complexity of the model, the main network, the key character candidate sub-network and the key time segment candidate sub-network are jointly trained, and the joint training process is specifically as follows (a code sketch follows this procedure):
Input: training iteration numbers N1 and N2 of the model;
(1) Initializing the network parameters with a Gaussian function;
(2) Fixing the weights of the key character candidate sub-network, and jointly training the key time segment candidate sub-network with a main network having only one LSTM layer, to obtain the key time segment candidate model;
(3) Repeating the iteration, increasing the LSTM layers of the main network to three, and training the main network for N1 iterations;
(4) Fine-tuning the main network and the key time segment candidate sub-network for N2 iterations;
(5) Fixing the key time segment candidate sub-network, and jointly training the key character candidate sub-network with a main network having only one LSTM layer, to obtain the key character candidate sub-network;
(6) Repeating the iteration, increasing the LSTM layers of the main network to three, and training the main network for N1 iterations;
(7) Fine-tuning the main network and the key time segment candidate sub-network for N2 iterations;
(8) Jointly training the networks obtained in (4) and (7) for N1 iterations;
(9) Jointly fine-tuning the whole network model for N2 iterations;
Output: the finally converged overall group behavior recognition model.
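A minimal sketch of the staged joint-training schedule, assuming PyTorch; main_net, time_subnet, person_subnet, loader and loss_fn are placeholders (the loss is assumed to run the forward pass through whichever sub-networks are currently trainable), and details such as growing the main network from one to three LSTM layers are only indicated in comments.

```python
import itertools
import torch

def set_trainable(module, flag):
    """Freeze or unfreeze every parameter of a sub-network."""
    for p in module.parameters():
        p.requires_grad = flag

def train_for(modules, loader, n_iters, loss_fn, lr=1e-4):
    """Run n_iters optimisation steps over the currently trainable parameters."""
    params = [p for m in modules for p in m.parameters() if p.requires_grad]
    opt = torch.optim.Adam(params, lr=lr)
    for _, (x, y) in zip(range(n_iters), itertools.cycle(loader)):
        opt.zero_grad()
        loss_fn(x, y).backward()
        opt.step()

def staged_joint_training(main_net, time_subnet, person_subnet, loader, loss_fn, N1, N2):
    # steps (2)-(4): key time segment sub-network with the main network
    set_trainable(person_subnet, False)
    train_for([main_net, time_subnet], loader, N1, loss_fn)   # grow main_net to 3 LSTM layers here
    train_for([main_net, time_subnet], loader, N2, loss_fn)   # fine-tune
    # steps (5)-(7): key character sub-network with the main network
    set_trainable(person_subnet, True)
    set_trainable(time_subnet, False)
    train_for([main_net, person_subnet], loader, N1, loss_fn)
    train_for([main_net, person_subnet], loader, N2, loss_fn)
    # steps (8)-(9): joint training and final fine-tuning of the whole model
    set_trainable(time_subnet, True)
    train_for([main_net, time_subnet, person_subnet], loader, N1, loss_fn)
    train_for([main_net, time_subnet, person_subnet], loader, N2, loss_fn)
```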
CN202010192335.3A 2020-03-18 2020-03-18 Group behavior identification method based on key space-time information driving and group co-occurrence structural analysis Active CN111414846B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010192335.3A CN111414846B (en) 2020-03-18 2020-03-18 Group behavior identification method based on key space-time information driving and group co-occurrence structural analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010192335.3A CN111414846B (en) 2020-03-18 2020-03-18 Group behavior identification method based on key space-time information driving and group co-occurrence structural analysis

Publications (2)

Publication Number Publication Date
CN111414846A CN111414846A (en) 2020-07-14
CN111414846B true CN111414846B (en) 2023-06-02

Family

ID=71494765

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010192335.3A Active CN111414846B (en) 2020-03-18 2020-03-18 Group behavior identification method based on key space-time information driving and group co-occurrence structural analysis

Country Status (1)

Country Link
CN (1) CN111414846B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114495151B (en) * 2021-12-07 2024-09-27 海纳云物联科技有限公司 Group behavior recognition method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8009863B1 (en) * 2008-06-30 2011-08-30 Videomining Corporation Method and system for analyzing shopping behavior using multiple sensor tracking
CN108764011A (en) * 2018-03-26 2018-11-06 青岛科技大学 Group recognition methods based on the modeling of graphical interactive relation
CN109101896A (en) * 2018-07-19 2018-12-28 电子科技大学 A kind of video behavior recognition methods based on temporal-spatial fusion feature and attention mechanism

Also Published As

Publication number Publication date
CN111414846A (en) 2020-07-14

Similar Documents

Publication Publication Date Title
Karim et al. Insights into LSTM fully convolutional networks for time series classification
CN106570477A (en) Vehicle model recognition model construction method based on depth learning and vehicle model recognition method based on depth learning
CN101447020B (en) Pornographic image recognizing method based on intuitionistic fuzzy
CN113297936B (en) Volleyball group behavior identification method based on local graph convolution network
CN109101876A Human behavior recognition method based on long short-term memory network
CN110222634A (en) A kind of human posture recognition method based on convolutional neural networks
CN109034034A Vein identification method based on reinforcement-learning-optimized convolutional neural networks
CN106909938A (en) Viewing angle independence Activity recognition method based on deep learning network
CN110532862B (en) Feature fusion group identification method based on gating fusion unit
CN111178319A (en) Video behavior identification method based on compression reward and punishment mechanism
CN111582397A (en) CNN-RNN image emotion analysis method based on attention mechanism
Benkaddour CNN based features extraction for age estimation and gender classification
CN108596243A Eye-movement fixation map prediction method based on classification fixation maps and conditional random fields
CN111988329A (en) Network intrusion detection method based on deep learning
CN111597929A (en) Group behavior identification method based on channel information fusion and group relation space structured modeling
Wang et al. High-Fidelity Simulated Players for Interactive Narrative Planning.
CN111104975A (en) Credit assessment model based on breadth learning
Hassan Deep learning architecture using rough sets and rough neural networks
CN111414846B (en) Group behavior identification method based on key space-time information driving and group co-occurrence structural analysis
CN113221683A (en) Expression recognition method based on CNN model in teaching scene
Zheng et al. Fruit tree disease recognition based on convolutional neural networks
Hu et al. Learning salient features for flower classification using convolutional neural network
CN116433800A (en) Image generation method based on social scene user preference and text joint guidance
CN115456173A (en) Generalized artificial neural network unsupervised local learning method, system and application
Zhang et al. Surveillance videos classification based on multilayer long short-term memory networks

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20240412

Address after: 509 Kangrui Times Square, Keyuan Business Building, 39 Huarong Road, Gaofeng Community, Dalang Street, Longhua District, Shenzhen, Guangdong Province, 518000

Patentee after: Shenzhen Litong Information Technology Co.,Ltd.

Country or region after: China

Address before: No. 99 Songling Road, Laoshan District, Qingdao, Shandong Province, 266000

Patentee before: QINGDAO University OF SCIENCE AND TECHNOLOGY

Country or region before: China