CN108830157A - Human behavior recognition method based on attention mechanism and 3D convolutional neural network - Google Patents

Human behavior recognition method based on attention mechanism and 3D convolutional neural network

Info

Publication number
CN108830157A
Authority
CN
China
Prior art keywords
frame
convolutional neural
layer
neural networks
layers
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810463529.5A
Other languages
Chinese (zh)
Other versions
CN108830157B (en)
Inventor
袁和金
牛为华
张颖
崔克彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Power Investment Northeast Energy Technology Co ltd
North China Electric Power University
Original Assignee
North China Electric Power University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North China Electric Power University filed Critical North China Electric Power University
Priority to CN201810463529.5A priority Critical patent/CN108830157B/en
Publication of CN108830157A publication Critical patent/CN108830157A/en
Application granted granted Critical
Publication of CN108830157B publication Critical patent/CN108830157B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/23Recognition of whole body movements, e.g. for sport training
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items

Abstract

The invention discloses a human behavior recognition method based on an attention mechanism and a 3D convolutional neural network. The method constructs a 3D convolutional neural network whose input layer comprises two channels: the original grayscale frames and an attention matrix. A 3D CNN model for recognizing human behavior in video is built and an attention mechanism is introduced: the distance between two frames is calculated as an attention matrix, which together with the original human behavior video sequence forms a dual-channel input to the constructed 3D CNN, so that the convolution operation performs feature extraction with emphasis on visually salient regions. At the same time, the 3D CNN structure is optimized: Dropout layers are added to the network to randomly freeze part of the connection weights, and the ReLU activation function is used. These measures improve network sparsity, address the surge in computation and the vanishing of gradients that come with increased dimensionality, and prevent the overfitting on small datasets caused by deeper layers, reducing the time cost while improving the network's recognition accuracy.

Description

Human behavior recognition method based on attention mechanism and 3D convolutional neural network
Technical field
The present invention relates to human behavior recognition methods, and in particular to a human behavior recognition method based on an attention mechanism and a 3D convolutional neural network.
Background technique
Intelligent video analysis has long been a research field of important academic value, and human behavior recognition, as an essential part of this field, has become a new research hotspot with broad application prospects in intelligent video surveillance, advanced human-computer interaction, sports analysis, content-based video retrieval, and other areas. Most mainstream human behavior recognition methods currently use hand-crafted features to characterize human motion in video, such as silhouettes, contours, HOG, Harris, SIFT, and the extensions of these features to three dimensions. Hand-crafted features are a way of applying human wisdom and prior knowledge to target and behavior recognition technology. However, this approach requires manually mining features that can express the motion, and the manually selected features often fail to capture the essential characteristics of the motion, which strongly affects recognition results.
Therefore, how to improve the accuracy of human behavior recognition in video by making better use of the raw information in the video is a direction that those skilled in the art strive to research.
Summary of the invention
In view of this, a primary objective of the present invention is to improve the accuracy of human behavior recognition in video. Considering that a video is a sequence of inter-related images along the time dimension, it can be processed by a convolutional neural network: the original video can be fed directly into the constructed neural network for the training and recognition of human behavior. One objective of the present invention is therefore to propose a 3D convolutional neural network model based on an attention mechanism that makes better use of the raw information in video.
To achieve the above objective, the present invention provides a human behavior recognition method based on an attention mechanism and a 3D convolutional neural network, characterized in that the method constructs a 3D convolutional neural network whose input layer comprises two channels: the original grayscale frames and an attention matrix.
Preferably, the attention matrix is obtained by calculating the difference between two consecutive frames and then normalizing it.
Preferably, the attention matrix is calculated either with a two-frame difference method that computes the difference between two consecutive frames, or with a three-frame difference method that takes three adjacent frame images as a group and differences them again. The traditional three-frame difference method differences the current frame with the preceding and following frames and then differences the two results again; the present invention further improves this by taking the "union" of the two difference results, obtained as the per-pixel maximum of the current frame's differences with the preceding and following frames. This union indicates the regions of greatest change before and after the current frame.
The three-frame difference method computes the difference images of the current frame with the previous frame and of the current frame with the next frame respectively, and then continues by differencing the two resulting frame differences again.
Preferably, in the two-frame difference method, the attention matrix A is calculated by the following formulas:

A(x, y) = (I_D(x, y) − min) / (max − min) (1)

I_D(x, y) = D(x, y), if D(x, y) > T; otherwise 0 (2)

D(x, y) = |I_t(x, y) − I_{t−1}(x, y)| (3)

Wherein, x, y are the coordinates of the target pixel, t is the current frame number, t−1 denotes the previous frame, and I_t is the gray value of the current frame at location (x, y). Formula (3) calculates the distance between two adjacent frames; the threshold T in formula (2) rejects the non-salient change regions, yielding the salient change region I_D; and formula (1) normalizes the distances, finally giving the attention matrix A, where min and max are the minimum and maximum gray values over all pixels of the salient change region I_D. This three-dimensional matrix indicates the salient motion-change regions in the input human behavior video.
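As a concrete illustration, the two-frame difference construction described above can be sketched in NumPy; the threshold value T=15 and the toy frame contents are illustrative assumptions, not values from the patent:

```python
import numpy as np

def attention_matrix(prev_frame, cur_frame, T=15):
    """Two-frame difference attention matrix (sketch).

    Follows the description in the text: the absolute inter-frame
    distance is computed (formula (3)), pixels below a threshold T are
    zeroed out as non-salient regions (formula (2)), and the result is
    min-max normalised (formula (1)). T=15 is an assumed value.
    """
    # Distance between adjacent frames
    D = np.abs(cur_frame.astype(np.float64) - prev_frame.astype(np.float64))
    # Reject non-salient change regions with threshold T
    ID = np.where(D > T, D, 0.0)
    # Min-max normalisation over the salient change region
    lo, hi = ID.min(), ID.max()
    return (ID - lo) / (hi - lo) if hi > lo else np.zeros_like(ID)

# Toy example: a bright patch appears between two otherwise identical frames
prev = np.zeros((120, 160), dtype=np.uint8)
cur = prev.copy()
cur[40:60, 50:70] = 200          # the region that changed between frames
A = attention_matrix(prev, cur)
print(A[50, 60], A[0, 0])        # changed region gets weight 1.0, rest 0.0
```

The resulting matrix is zero everywhere except the changed patch, so convolution over this channel concentrates on the motion region.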
Preferably, the steps of the three-frame difference method are:
1) Select three consecutive frames I_{t−1}(x, y), I_t(x, y), I_{t+1}(x, y) from the video frame sequence and calculate the differences of the adjacent frame pairs, D_{t−1,t}(x, y) and D_{t,t+1}(x, y):

D_{t−1,t}(x, y) = |I_t(x, y) − I_{t−1}(x, y)|
D_{t,t+1}(x, y) = |I_{t+1}(x, y) − I_t(x, y)|

2) Apply a suitable threshold T to the resulting difference images to exclude noise interference and extract the salient change regions B_1(x, y) and B_2(x, y);
3) Take the per-pixel "union" (logical OR) of the two difference images in each group to obtain the union of the change regions between consecutive frames, i.e., the salient change regions before and after the intermediate frame of the three, B(x, y):

B(x, y) = max(B_1(x, y), B_2(x, y))

4) Normalize the finally obtained difference image to obtain the frame-difference channel A(x, y), which indicates the salient motion-change regions in the input human behavior video.
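The four steps above can be sketched as follows; the per-pixel maximum implements the "union" of the two difference images, and the threshold T=15 is an illustrative assumption:

```python
import numpy as np

def three_frame_attention(f_prev, f_cur, f_next, T=15):
    """Three-frame difference channel (sketch of steps 1-4 above).

    Instead of differencing the two difference images again (the
    traditional method), the per-pixel maximum ("union") of the two
    difference images is taken, so the channel captures changes both
    before and after the current frame. T=15 is an assumed threshold.
    """
    f_prev, f_cur, f_next = (f.astype(np.float64) for f in (f_prev, f_cur, f_next))
    # Step 1: differences of adjacent frame pairs
    D1 = np.abs(f_cur - f_prev)
    D2 = np.abs(f_next - f_cur)
    # Step 2: threshold to suppress noise, keeping salient change regions
    B1 = np.where(D1 > T, D1, 0.0)
    B2 = np.where(D2 > T, D2, 0.0)
    # Step 3: per-pixel union (maximum) of the two difference images
    B = np.maximum(B1, B2)
    # Step 4: min-max normalisation -> frame-difference channel A
    lo, hi = B.min(), B.max()
    return (B - lo) / (hi - lo) if hi > lo else np.zeros_like(B)

# A patch appears in the middle frame and vanishes in the next
f0 = np.zeros((120, 160))
f1 = f0.copy(); f1[10:20, 10:20] = 180
f2 = f0.copy()
A = three_frame_attention(f0, f1, f2)
print(A[15, 15])   # region that changed both before and after the middle frame
```

Because the union is taken rather than a second difference, a change that appears on only one side of the middle frame is still retained in the channel.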
Preferably, the 3D convolutional neural network model comprises:
one dual-channel input layer, and multiple 3D convolutional layers and multiple 3D pooling layers arranged alternately, finally followed by fully connected layers that yield the classification result; through the dual-channel input layer, the attention matrix is fed into the neural network model together with the original grayscale video-frame cube.
Preferably, there are two fully connected layers, each preceded by a Dropout layer, with the Dropout probability set to a value between 0.25 and 0.5.
Preferably, the Dropout probabilities are 0.5 and 0.25 respectively.
Preferably, the numbers of 3D convolutional layers and 3D pooling layers are each between 3 and 7, and preferably 5 each.
Preferably, the 3D convolutional neural network model comprises: one dual-channel input layer, and five 3D convolutional layers and five 3D pooling layers arranged alternately, finally followed by two fully connected layers that yield the classification result, with a Dropout operation applied before each of the two fully connected layers,
Wherein:
C1 to C5 are convolutional layers, each with 3 × 3 × 3 convolution kernels; the number of kernels increases from 16 to 256 so as to generate more varied high-level features from combinations of low-level features. At layer C1, the kernels perform dual-channel convolution over the attention matrix and the original video frames.
Layers S1 to S5 are down-sampling layers using max pooling; they reduce the resolution of the feature maps, shrink the feature-map scale, reduce computation, and improve tolerance to distortions of the input image. Layers S2 and S4 use 2 × 2 × 2 windows to down-sample the temporal and spatial dimensions simultaneously, while the other layers use 1 × 2 × 2 windows and down-sample only the spatial dimensions;
Layer D1 is a fully connected layer containing 256 neurons; the feature cube output by S5 is connected to the 256 neurons of D1, and in this layer the 15-frame input video is converted into a 256-dimensional feature vector. A Dropout layer is used between S5 and D1, freezing part of the connections between S5 and D1 with probability 0.25;
Layer D2 is the second fully connected layer and also the output layer; its neuron count of 6 equals the number of target categories. Each D2 neuron is fully connected to the 256 neurons of D1, and finally classification is performed by softmax regression, giving an output that can label the behavior category.
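Under the assumption that the 3 × 3 × 3 convolutions use "same" padding (so only the pooling layers change the feature-map size) and that pooling uses floor division, the feature-map sizes through the S1 to S5 stack can be traced as follows:

```python
# Feature-map shape trace through the C1-S5 stack described above.
# Assumptions: "same" convolution padding and floor division in pooling;
# the 15 x 120 x 160 input size and the pooling windows follow the text.
def pooled(shape, window):
    """Apply one max-pooling window (t, h, w) with floor division."""
    return tuple(s // w for s, w in zip(shape, window))

shape = (15, 120, 160)                 # 15 gray frames of 120 x 160
pools = {"S1": (1, 2, 2), "S2": (2, 2, 2), "S3": (1, 2, 2),
         "S4": (2, 2, 2), "S5": (1, 2, 2)}
for name, window in pools.items():
    shape = pooled(shape, window)      # 3x3x3 "same" conv leaves size unchanged
    print(name, shape)
# The S5 output cube per channel then feeds, across the 256 channels,
# into the 256-neuron fully connected layer D1.
```

Only S2 and S4 shrink the temporal dimension, consistent with the 2 × 2 × 2 windows described for those layers.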
Preferably, the 3D convolutional neural network uses ReLU as the activation function to improve the performance of the deep network. Preferably, the 3D convolutional layers and the fully connected layer D1 therein use ReLU as the activation function, the output layer uses Softmax as the activation function, the optimization function is SGD, and the loss function is the multi-class cross entropy.
Wherein, the log-likelihood cost function is:
C = −∑_k y_k log a_k (5)
Wherein, a_k denotes the output value of the k-th neuron and y_k the corresponding true value for the k-th neuron, taking the value 0 or 1. The gradients of the cost with respect to a neural network weight w and bias b are:

∂C/∂w_{jk}^L = a_k^{L−1} (a_j^L − y_j)
∂C/∂b_j^L = a_j^L − y_j

Wherein j is the index of a neuron in the current layer, k is the index of the connected neuron in the previous layer, and L denotes the layer number of the current neuron. Like the cross-entropy cost function, the log-likelihood function is non-negative, so the goal is to minimize the cost function; when the actual output a is close to the desired output y, the cost function approaches 0. Using the cross-entropy function overcomes the problem of the quadratic cost function updating weights too slowly. The Softmax function paired with the log-likelihood cost function trains neural networks well under multi-class tasks.
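A small numerical sketch of the softmax/log-likelihood pairing described above; the logit values are arbitrary, and the gradient with respect to the pre-activations reduces to a − y, which is the property that avoids slow weight updates:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # shift for numerical stability
    return e / e.sum()

# Log-likelihood cost C = -sum_k y_k log a_k for a one-hot target y.
# With a softmax output layer, dC/dz simplifies to (a - y), so the
# gradient stays large even when the output is badly wrong.
z = np.array([2.0, 1.0, 0.1, -1.0, 0.5, 0.0])   # 6 outputs, one per class
y = np.zeros(6); y[0] = 1.0                     # true class: index 0
a = softmax(z)
C = -np.sum(y * np.log(a))
grad_z = a - y                                  # dC/dz for softmax + log-likelihood
print(C > 0, grad_z[0] < 0)                     # gradient pushes a_0 toward 1
```

The test below checks the analytic gradient against a central finite difference.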
Through the above method, the present invention mainly has the following advantages:
The present invention uses an attention channel based on visual perception to assist the neural network in extracting features from the original video frames; the network performs the convolution operation on both channels simultaneously, and the features of the two channels interact. This improves the accuracy of human behavior recognition in video, makes better use of the raw information in the video, and yields the proposed attention-based 3D convolutional neural network model.
The model constructs a deep three-dimensional convolutional neural network and introduces an attention mechanism: an attention matrix describing the regions of human motion is obtained by calculating inter-frame distances, the attention matrix and the original video are combined into a dual-channel input to the convolutional neural network, and convolution operations with 3D kernels extract the temporal and spatial features of human behavior in the video.
To overcome overfitting during network training, Dropout layers are added to the network structure, randomly "freezing" a certain proportion of neurons during training; this improves network sparsity and alleviates network overfitting to some extent.
The ReLU activation function is used, which improves network sparsity and addresses the surge in computation and the vanishing of gradients caused by increased dimensionality and deeper layers; it prevents overfitting on small datasets and reduces the time cost while improving the network's recognition accuracy.
Experimental results on the KTH dataset show that the model achieves good recognition performance.
Brief description of the drawings
Fig. 1: framework of the 3D convolutional neural network based on the attention mechanism, in a specific embodiment of the present invention;
Fig. 2: plots of commonly used activation functions (Sigmoid, tanh, and ReLU);
Fig. 3: visual representation of Dropout;
Fig. 4 (a)(b): examples of visual attention;
Fig. 5: schematic of the attention mechanism in the convolutional neural network;
Fig. 6 (a)(b): comparative experiment curves for the attention mechanism: (a) recognition accuracy curve, (b) error curve;
Fig. 7: flow diagram of the three-frame difference in specific embodiment 2 of the present invention;
Fig. 8: structure of the dual-channel 3D CNN based on the attention mechanism of the present invention;
Fig. 9: flow chart of the experiments in a specific embodiment of the present invention.
Specific embodiment
The technical solution of the present invention is described in further detail below with reference to the embodiments and the accompanying drawings.
The present invention provides a human behavior recognition method based on an attention mechanism and a 3D convolutional neural network; its core is to apply a visual attention mechanism to a 3D convolutional neural network model for human action recognition. The method constructs a 3D CNN model for recognizing human behavior in video and introduces an attention mechanism: the distance between two frames is calculated as an attention matrix, which together with the original human behavior video sequence forms a dual-channel input to the constructed 3D CNN, so that the convolution operation performs feature extraction with emphasis on visually salient regions. At the same time, the 3D CNN structure is optimized: Dropout layers are added to the network to randomly freeze part of the connection weights, and the ReLU activation function is used, improving network sparsity and addressing the surge in computation and the vanishing of gradients caused by deeper layers; this prevents overfitting on small datasets and reduces the time cost while improving the network's recognition accuracy.
The present invention introduces the attention mechanism because a key property of human visual perception is that the whole scene is not processed at once; attention is instead focused on a certain part of the visual space, scanning across the image in a certain order and transferring from one region to another. Information about a local region is obtained at a given moment, and the information of these regions is then combined into an overall judgment and impression.
Attention mechanisms are also widely used in RNNs to describe the correlation between elements within or between sequences; different correlations assign different weights in the attention matrix, letting the network pay more attention to heavily weighted elements.
Referring to Fig. 4: Fig. 4(a) shows that in a static state, human vision always focuses on the parts that differ most from the surrounding environment, a principle by which the salient regions of a still image can be extracted. Fig. 4(b) shows that in a moving state, human vision focuses on the parts of the visual field that change and ignores the static, unchanging parts; the changing parts play the more important role in judging the current motion category.
That is, in a video containing human behavior, the human target, as the main part distinct from the background, is the salient region. Unlike image-based target recognition, however, during continuous motion we are more concerned with the parts of the motion that change, as shown in Fig. 4(b). Human behavior in video is a dynamic process; because of the continuity and variability of human actions, the differing parts of human behavior between consecutive video frames serve as salient regions and better guide behavior recognition. For human behavior recognition, we care about the human regions that change during motion, and we want the trained network to pay more attention to these significantly changing parts.
The present invention therefore introduces an attention matrix in the input layer of the constructed 3D convolutional neural network; by calculating the distance between consecutive frames (constructing an inter-frame difference channel), the visual attention mechanism is applied to the 3D convolutional neural network model for human action recognition.
As shown in Fig. 1, in specific embodiment 1 of the present invention, the 3D convolutional neural network framework used is as follows:
1) network structure
The 3D convolutional neural network model of the present invention is shown in Fig. 1, a schematic diagram of the attention-based 3D convolutional neural network framework of the present invention. It comprises one dual-channel input layer, and five 3D convolutional layers and five 3D pooling layers arranged alternately, finally followed by two fully connected layers that yield the classification result, with a Dropout layer before each of the two fully connected layers.
The first layer is the input layer, which comprises two channels: the original grayscale image and the attention matrix. The grayscale data consist of the gray images of 15 consecutive adjacent video frames, and in this embodiment the attention matrix is the normalized inter-frame distance matrix calculated by formula (1).
As shown in Fig. 5, the present invention introduces an attention matrix in the input layer of the constructed 3D convolutional neural network; by calculating the distance between two consecutive frames, or constructing an inter-frame difference channel, the visual attention mechanism is applied to the 3D convolutional neural network model for human action recognition. In this embodiment, the input of the constructed neural network model is expanded to two channels, and the attention matrix is input into the neural network model as a second channel together with the original grayscale video-frame cube.
The attention matrix of this embodiment is obtained by calculating the distance between two consecutive frames and normalizing it; the distance between consecutive frames describes the variation of the human action during motion. The attention matrix describes the regions of the entire frame cube that deserve attention.
Wherein, the attention matrix A can be calculated by formulas (1) to (3):
Formula (3) calculates the distance between two adjacent frames, the threshold T in formula (2) rejects the non-salient change regions, and formula (1) normalizes the distances, finally yielding the attention matrix, which indicates the salient motion-change regions in the input human behavior video.
In this embodiment, C1 to C5 are convolutional layers, each with 3 × 3 × 3 convolution kernels; the number of kernels increases from 16 to 256 so as to generate more varied high-level features from combinations of low-level features. At layer C1, the kernels perform dual-channel convolution over the attention matrix and the original video frames.
Layers S1 to S5 are down-sampling layers; this embodiment uses max pooling to reduce the resolution of the feature maps, shrink the feature-map scale, reduce computation, and improve tolerance to distortions of the input image. Preferably, layers S2 and S4 use 2 × 2 × 2 windows to down-sample the temporal and spatial dimensions simultaneously, while the other layers use 1 × 2 × 2 windows and down-sample only the spatial dimensions.
Layer D1 is a fully connected layer (FC) containing 256 neurons; the feature cube output by S5 is connected to the 256 neurons of D1, and in this layer the 15-frame input video is converted into a 256-dimensional feature vector. A Dropout layer is used between S5 and D1; in this embodiment, part of the connections between S5 and D1 are frozen with probability 0.25.
Layer D2 is the second fully connected layer and also the output layer (Output); its neuron count of 6 equals the number of target categories. Each D2 neuron is fully connected to the 256 neurons of D1, and finally classification is performed by softmax regression, giving an output that can label the behavior category.
2) Activation function
The 3D convolutional neural network of the present invention uses ReLU (Rectified Linear Units) as the activation function. Fig. 2 shows the plots of the Sigmoid, tanh, and ReLU functions. Although the Sigmoid-family functions common in traditional neural networks have large signal gain in the central region and small gain toward the sides, which works well for mapping in the signal's feature space, as the network deepens these activation functions become expensive to compute, and near the saturation regions they change slowly and their derivatives tend to 0, so gradients easily vanish when error gradients are computed in backpropagation, making it impossible to complete the training of deep networks. Therefore, as the network gradually deepens, the simple and fast linear ReLU activation better improves the performance of the deep network.
The ReLU formula is ReLU(x) = max(0, x). It is essentially a piecewise-linear model, and both the forward computation and the backward gradient propagation are simple. Because ReLU has no saturation region, the vanishing-gradient problem is unlikely to occur. Since ReLU is closed to the left of the y-axis, the outputs of some hidden neurons are 0, i.e. the network becomes sparse; the activation paths generated by different structures can learn relatively sparse features, learning better from the training data and alleviating overfitting to some extent. Since the 3D convolutional neural network constructed by the present invention is deep, the training samples are large in scale, and training takes a long time, accelerating training and suppressing gradient vanishing are very important for the practicality of convolutional neural networks.
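The contrast between ReLU and a saturating activation can be illustrated numerically; the sample points are arbitrary:

```python
import numpy as np

# ReLU(x) = max(0, x): piecewise linear, gradient 1 for x > 0, so it has
# no saturation region; the sigmoid derivative decays toward 0 on both
# sides, which is the source of the vanishing-gradient problem above.
def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-10.0, -1.0, 0.5, 10.0])
relu_grad = (x > 0).astype(float)            # derivative of ReLU
sig_grad = sigmoid(x) * (1 - sigmoid(x))     # derivative of sigmoid
print(relu_grad)          # zeros for x <= 0 -> sparse activations
print(sig_grad.round(5))  # tiny at |x| = 10 -> gradient vanishes
```

The zeros in the ReLU gradient for negative inputs are exactly the sparsity the text describes: those neurons output 0 and drop out of the forward path.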
3) Dropout
Referring to the visual representation of Dropout in Fig. 3: the 3D convolutional neural network of the present invention is a deep neural network, and deep neural networks have complex structures; as the layers deepen and the iteration count grows, the network begins to fit the noise contained in the training samples as if it were features. Dropout can alleviate this problem to some extent. As shown in Fig. 3, Dropout temporarily "freezes" neuron connections in the network with a certain probability, so that they do not participate in that round of forward propagation and backward error calculation.
The Dropout process is equivalent to extracting a structurally simpler network from the original network. The neuron connections that are "frozen" differ randomly in each training round; this forces a single neuron to work together with other randomly selected neurons instead of, after many iterations of training, over-relying on the effect of certain specific neurons. This solves the overfitting caused by the output over-relying on certain intermediate-layer neurons.
Therefore, in a specific embodiment of the present invention, two Dropout layers are added to the 3D convolutional neural network model, between pooling layer S5 and fully connected layer D1 and between D1 and D2, with Dropout probabilities of 0.5 and 0.25 respectively.
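A minimal sketch of the Dropout operation described above, using the inverted-dropout convention (rescaling survivors by 1/(1−p), an implementation assumption, so the expected activation is unchanged) and the 0.25 probability used between S5 and D1:

```python
import numpy as np

def dropout(activations, p, rng):
    """Inverted dropout (sketch): "freeze" each neuron with probability p
    during training and rescale the survivors by 1/(1-p), so no rescaling
    is needed at test time."""
    mask = (rng.random(activations.shape) >= p).astype(activations.dtype)
    return activations * mask / (1.0 - p)

rng = np.random.default_rng(0)
h = np.ones(10000)                      # stand-in for the S5 -> D1 activations
dropped = dropout(h, p=0.25, rng=rng)   # the 0.25 rate used between S5 and D1
print((dropped == 0).mean())            # roughly 25% of connections frozen
print(dropped.mean())                   # expected activation preserved (~1.0)
```

Each call draws a fresh random mask, matching the text: the set of frozen neurons changes every training round.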
The inventors tested the algorithm of the invention on the KTH human behavior dataset. The KTH database contains 6 classes of actions (walking, jogging, running, boxing, hand waving and hand clapping) performed by 25 people in 4 different scenes, 600 videos in total; the same behavior is repeated 3 to 4 times in each video, so 2391 video samples can be extracted in total, containing scale variations, clothing variations, and illumination variations. In this embodiment, 16 of the 25 people in the dataset were chosen as training samples and 9 as test samples.
The experimental procedure comprises:
1) The human behavior videos in the dataset are first converted to grayscale;
2) 15-frame original video clips, of size 15 × 120 × 160, are extracted as input to the 3D convolutional network constructed by the present invention;
3) The attention matrix is calculated from each extracted 15-frame human behavior video sample according to formula (1);
4) The attention matrix, as the second channel of the 3D convolutional network, is used together with the grayscale video data as input for training.
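The experimental input pipeline above can be sketched as follows; the random frames stand in for real KTH video, and padding the difference sequence back to 15 frames is an assumption made only so the two channels align:

```python
import numpy as np

# Sketch of steps 1-4 above: stack the 15-frame gray cube with its
# attention matrix as a second channel. The attention channel here is a
# simple normalised two-frame difference; shapes follow the text
# (15 frames of 120 x 160).
rng = np.random.default_rng(1)
frames = rng.integers(0, 256, size=(15, 120, 160)).astype(np.float64)

diff = np.abs(np.diff(frames, axis=0))            # 14 inter-frame distances
diff = np.concatenate([diff[:1], diff], axis=0)   # pad to 15 frames (assumption)
attention = (diff - diff.min()) / (diff.max() - diff.min())

gray = frames / 255.0
sample = np.stack([gray, attention], axis=0)      # (channels, frames, H, W)
print(sample.shape)                                # one dual-channel training sample
```

The resulting array is exactly the dual-channel cube that the C1 kernels convolve over both channels at once.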
In the 3D CNN structure constructed in embodiment 1 of the present invention, five 3D convolutional layers and down-sampling layers are interleaved, followed by two fully connected layers that produce the output. The 3D convolutional layers C1 to C5 and fully connected layer D1 use ReLU as the activation function, output layer D2 uses Softmax as the activation function, the optimization function is SGD (stochastic gradient descent), and the loss function is the multi-class cross entropy. During training, one gradient step is computed every 10 samples, for 50 epochs in total.
Table 1 gives the recognition accuracy of some common human behavior recognition methods on the KTH dataset. The recognition accuracy of the 3D convolutional neural network model constructed by the present invention is 91.67%, and with the attention mechanism added the network's accuracy reaches 92.59%, higher than the 3D CNN model built by Ji [3]. It can also be seen that models using hand-crafted features such as HOG, optical flow, and SIFT achieve relatively high accuracy; the reason is that such methods usually require sufficient pre-processing of the video before feature extraction, yet it is difficult to extract features accurate enough to describe complex behaviors in videos under complex environments. The method of the present invention, by contrast, does not rely on various hand-crafted features; using the powerful self-learning capability of deep neural networks, it learns human behavior features by itself from a large number of training samples. As the layers deepen, the learned features become more abstract and better describe different human behaviors in essence. With the attention mechanism added, the network, under the action of the attention matrix, focuses on the changing parts of human behavior, ignores irrelevant background, and attains superior recognition capability.
Table 1. Accuracy of various human behavior recognition algorithms on the KTH dataset
As shown in Fig. 6, a comparative experiment was carried out on the effect of the attention mechanism on the network's recognition ability: the solid line represents the 3D convolutional neural network model constructed by the present invention without the attention mechanism, and the dotted line the model with the attention mechanism added. Fig. 6(a) is the recognition accuracy curve on the test set, and Fig. 6(b) the error curve during training. With the attention mechanism added, the network reaches high accuracy within the first few epochs of training and the error declines quickly, converging early. Under the action of the attention matrix, the network quickly captures human behavior features, whereas the network without the attention mechanism only gradually learns them after dozens of epochs of training. Introducing the attention mechanism into the 3D convolutional network thus improves the accuracy of human behavior recognition.
Another specific embodiment (embodiment 2) of the invention further improves the attention matrix calculation described above. The main difference from the previous embodiment is that the frame-difference channel of this embodiment is computed with a three-frame difference method. The three-frame difference describes the variation between the current frame and both the preceding and the following frame, whereas the two-frame difference used in the previous embodiment only describes the variation between the current frame and a single neighboring frame. Moreover, the three-frame difference method of this embodiment does not proceed as the traditional three-frame difference method does: after computing the difference of the current frame with the previous frame and with the next frame, it does not take a further difference of the two frame differences, but instead takes their union, so that the region changed by a motion within a short time interval can be described completely.
The concrete scheme of embodiment 2 is as follows:
The frame-difference channel of this embodiment is computed with the three-frame difference method; the calculation flow is shown in Fig. 7. By taking three adjacent frame images as one group and differencing them pairwise, the regions that change before and after the intermediate frame can be detected better. The frame difference describes the change of the human action during motion, and the frame-difference matrix indicates the regions of the entire frame cube that deserve attention.
1 Attention matrix of this embodiment (three-frame difference)
1) Select three consecutive frames I_{t-1}(x, y), I_t(x, y), I_{t+1}(x, y) from the video frame sequence and compute the difference of each pair of adjacent frames, D_{t-1,t}(x, y) and D_{t,t+1}(x, y):
D_{t-1,t}(x, y) = |I_t(x, y) − I_{t-1}(x, y)|,  D_{t,t+1}(x, y) = |I_{t+1}(x, y) − I_t(x, y)|  (21)
2) Threshold the resulting difference images with a suitable threshold T to exclude noise interference and extract the salient change regions:
B_1(x, y) = D_{t-1,t}(x, y) if D_{t-1,t}(x, y) > T, otherwise 0; B_2(x, y) is defined likewise from D_{t,t+1}(x, y)  (22)
3) Combine the two difference images of each group with a logical OR to obtain the union of the change regions between the pairs of consecutive frames, which gives the salient change region before and after the intermediate frame of the three images, B(x, y):
B(x, y) = max(B_1(x, y), B_2(x, y))  (23)
4) Normalize the resulting difference image to obtain the frame-difference channel A(x, y):
A(x, y) = (B(x, y) − min) / (max − min)  (24)
where min and max are the minimum and maximum values of B over the salient change region. The channel A can represent the salient motion change regions in the input human behavior video.
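The four steps above can be sketched in NumPy as follows. The function name and the default threshold T = 25 are illustrative assumptions, not values fixed by the patent:

```python
import numpy as np

def three_frame_attention(prev_f, cur_f, next_f, T=25.0):
    """Three-frame difference attention matrix per formulas (21)-(24).

    prev_f, cur_f, next_f: grayscale frames as 2-D float arrays.
    T: gray-level threshold rejecting non-salient changes (assumed value).
    """
    # (21) absolute differences of the two adjacent frame pairs
    d1 = np.abs(cur_f - prev_f)
    d2 = np.abs(next_f - cur_f)
    # (22) threshold each difference image, keeping only salient changes
    b1 = np.where(d1 > T, d1, 0.0)
    b2 = np.where(d2 > T, d2, 0.0)
    # (23) union of the two change regions: per-pixel maximum
    b = np.maximum(b1, b2)
    # (24) min-max normalization gives the frame-difference channel A
    lo, hi = b.min(), b.max()
    if hi > lo:
        return (b - lo) / (hi - lo)
    return np.zeros_like(b)  # no salient change anywhere
```

In practice the three grayscale frames would come from the OpenCV reading step of the algorithm flow, e.g. via `cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)`.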
2 Network structure
The two-channel 3D convolutional neural network model built in this embodiment is shown in Fig. 8. It likewise comprises 1 two-channel input layer and 5 3D convolutional layers interleaved with 5 3D pooling layers, followed by 2 fully connected layers which yield the classification result; a Dropout operation is applied at each of the 2 fully connected layers.
The first layer is the input layer. The input layer contains two channels: the original grayscale video frames and the inter-frame difference (the attention matrix). The grayscale channel consists of the gray images of 15 consecutive adjacent video frames, and the frame-difference channel is the normalized inter-frame distance matrix computed by formulas (21)-(24). Both the input video images and the frame-difference matrices are resized to 120 × 160 pixels.
C1 to C5 are convolutional layers. Each layer uses 3 × 3 × 3 convolution kernels, and the number of kernels increases from 16 to 256 so that more varied high-level features can be generated from combinations of low-level features. At layer C1, the kernels perform two-channel convolution over the attention matrix and the original video frames.
Layers S1 to S5 are down-sampling layers using max pooling; they reduce the resolution and size of the feature maps, reduce the amount of computation, and improve tolerance to distortions of the input images. Layers S2 and S4 use a 2 × 2 × 2 window and down-sample the temporal and spatial dimensions simultaneously; the other layers use a 1 × 2 × 2 window and down-sample only the spatial dimensions.
Layer D1 is a fully connected layer containing 256 neurons. The feature cubes output by S5 are connected to the 256 neurons of D1, so that the 15-frame input video is converted into a 256-dimensional feature vector at this layer.
Layer D2 is the second fully connected layer and also the output layer; its number of neurons is 6, equal to the number of target categories. Each neuron of D2 is fully connected to the 256 neurons of D1, and classification is finally performed by softmax regression, yielding an output that indicates the behavior category.
Dropout operations with a probability of 0.25 are added between pooling layer S5 and fully connected layer D1, and between D1 and D2, in the 3D convolutional neural network model.
3 Human behavior recognition algorithm flow (experimental analysis)
1) The human behavior videos of the six types in the KTH data set (walking, jogging, running, boxing, hand waving and hand clapping) are input into the computer, read frame by frame with OpenCV, and converted to grayscale;
2) Blank background frames containing no human region are rejected; 15 key frames are extracted from the remaining video containing human behavior by equally spaced sampling, and the extracted images are saved as the original video input of the 3D convolutional network constructed herein, with size 15 × 120 × 160;
3) The inter-frame difference channel is computed from the extracted key-frame samples according to the inter-frame difference channel calculation method;
4) The extracted inter-frame difference images and the original grayscale key frames are combined into the two input channels;
5) The two-channel 3D convolutional neural network model is built as described above;
6) The input is fed into the two-channel 3D convolutional neural network model for training. A single round of computation proceeds as follows:
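The equally spaced sampling of step 2) can be sketched with a small helper (a hypothetical function name; it assumes the remaining video has at least 15 frames):

```python
def keyframe_indices(n_frames, n_keys=15):
    """Indices of n_keys equally spaced frames out of n_frames frames."""
    step = n_frames / n_keys          # fractional stride between key frames
    return [int(i * step) for i in range(n_keys)]
```

For a 150-frame clip this selects every 10th frame (indices 0, 10, ..., 140), giving the 15-frame cube used as network input.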
In the first convolution and down-sampling module, the video frame sequence of size 15 × 120 × 160 passes through the convolution of the 16 3 × 3 × 3 kernels of layer C1, yielding 16 feature cubes of size 15 × 120 × 160; the 1 × 2 × 2 down-sampling of S1 then gives 16 feature cubes of size 15 × 60 × 80.
In the second convolution and down-sampling module, the convolution of the 32 3 × 3 × 3 kernels of layer C2 yields 32 feature cubes of size 15 × 60 × 80; the 2 × 2 × 2 down-sampling of S2 gives 32 feature cubes of size 7 × 30 × 40.
In the third convolution and down-sampling module, the convolution of the 64 3 × 3 × 3 kernels of layer C3 yields 64 feature cubes of size 7 × 30 × 40; the 1 × 2 × 2 down-sampling of S3 gives 64 feature cubes of size 7 × 15 × 20.
In the fourth convolution and down-sampling module, the convolution of the 128 3 × 3 × 3 kernels of layer C4 yields 128 feature cubes of size 7 × 15 × 20; the 2 × 2 × 2 down-sampling of S4 gives 128 feature cubes of size 3 × 7 × 10.
In the fifth convolution and down-sampling module, the convolution of the 256 3 × 3 × 3 kernels of layer C5 yields 256 feature cubes of size 3 × 7 × 10; the 1 × 2 × 2 down-sampling of S5 gives 256 feature cubes of size 3 × 3 × 5.
After the five groups of convolution and down-sampling operations, the resulting feature cubes are flattened into an 11520-dimensional feature vector, which is connected to the fully connected layer with 256 neurons, and finally to the output layer with 6 neurons, producing a 6-dimensional output.
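The feature-map sizes listed above can be checked with a short shape walk-through, using the kernel counts and pooling windows of C1-S5. It assumes, as the sizes imply, that the 3 × 3 × 3 convolutions preserve the cube dimensions ("same" padding) and that max pooling divides each dimension by the window size with truncation:

```python
def forward_shapes(t=15, h=120, w=160):
    """Trace (channels, T, H, W) through modules C1/S1 .. C5/S5."""
    kernels = [16, 32, 64, 128, 256]                        # C1..C5 kernel counts
    pools = [(1, 2, 2), (2, 2, 2), (1, 2, 2), (2, 2, 2), (1, 2, 2)]  # S1..S5 windows
    shapes = []
    for k, (pt, ph, pw) in zip(kernels, pools):
        # 'same'-padded 3x3x3 convolution keeps T,H,W; channel count becomes k
        t, h, w = t // pt, h // ph, w // pw                 # pooling shrinks dims
        shapes.append((k, t, h, w))
    return shapes
```

Running it reproduces the sizes of the five modules, ending at 256 cubes of 3 × 3 × 5, i.e. the 256 × 3 × 3 × 5 = 11520-dimensional flattened vector.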
After one round of forward propagation, the error between the actual output of the network and the output corresponding to the true behavior category is computed and back-propagated.
The network model uses the log-likelihood cost function as its loss function. The variance (squared-error) loss function can make training slow when training a neural network: the farther the initial output value is from the true value, the slower the training. The cross-entropy cost function solves this problem. When the output layer uses the softmax activation function, the log-likelihood cost function is used, whose formula is:
C = −∑_k y_k log a_k  (5)
where a_k is the output value of the k-th neuron and y_k is the corresponding true value, taking the value 0 or 1. The gradient formulas for the network weights w and biases b are:
∂C/∂w_{jk}^L = a_k^{L−1}(a_j^L − y_j),  ∂C/∂b_j^L = a_j^L − y_j  (6)
where j is the index of the neuron in the current layer, k is the index of the connected neuron in the previous layer, and L denotes the layer number of the current neuron. The log-likelihood function, like the cross-entropy cost function, is non-negative, so the objective is to minimize the cost function; when the actual output a is close to the desired output y, the cost function approaches 0. Using the cross-entropy function overcomes the problem of the variance cost function updating the weights too slowly. The softmax function combined with the log-likelihood cost function is well suited to training neural networks for multi-class tasks.
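Formula (5) together with the softmax output can be sketched in NumPy (illustrative helper names):

```python
import numpy as np

def softmax(z):
    """Softmax over a 1-D vector of logits, shifted for numerical stability."""
    e = np.exp(z - z.max())
    return e / e.sum()

def log_likelihood_cost(a, y):
    """C = -sum_k y_k log a_k (formula (5)) for one-hot label y."""
    return -np.sum(y * np.log(a))
```

For this pairing the gradient of C with respect to the logits is simply a − y, which is why the update does not slow down when the output is far from the target, unlike the variance loss.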
The network model uses the SGD (stochastic gradient descent) optimizer, which is the batch version of gradient descent. The training data set is divided into n batches, each containing m samples. Each update uses the data of one batch rather than the entire training set, i.e.:
x_{t+1} = x_t + Δx_t  (7)
Δx_t = −η g_t  (8)
where η is the learning rate (η = 0.01 in this experiment) and g_t is the gradient of x at time t. The advantage of the stochastic gradient descent optimization method is that when the amount of training data is large, updating with the entire data set is often impractical in time; the mini-batch method reduces the load on the machine and converges faster.
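The update rule of formulas (7)-(8) over mini-batches can be sketched as follows. The function and the gradient callback are hypothetical; η = 0.01 matches the experiment:

```python
import numpy as np

def sgd_epoch(x, data, grad_fn, lr=0.01, batch_size=4):
    """One epoch of mini-batch SGD: x <- x - lr * g_t per batch
    (formulas (7)-(8)), where g_t is estimated on one batch only."""
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        g = grad_fn(x, batch)      # gradient from this batch, not the full set
        x = x - lr * g
    return x
```

Each pass over the data performs n updates (one per batch) instead of a single full-batch update, which is what makes convergence faster in practice.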
After 50 training iterations on the KTH data set, a classification model with an accuracy above 90% is obtained;
The human behavior video to be classified is processed according to steps 1)-4) to extract its grayscale and inter-frame difference channels, which are input into the trained two-channel 3D convolutional neural network to obtain the corresponding category of the behavior video, achieving the goal of recognizing the human behavior in the video.
Those skilled in the art will appreciate that in the embodiments above the basic content is common; the main difference is that the attention matrix calculation in embodiment 2 is an improvement of the calculation provided in embodiment 1. The three-frame difference it uses describes the variation between the current frame and both neighboring frames, whereas the two-frame difference in embodiment 1 only describes the variation between the current frame and a single neighboring frame. Moreover, the three-frame difference method used here does not, as the traditional three-frame difference method does, take a further difference of the two frame differences after computing the differences of the current frame with the previous and next frames; instead it takes the union of the two frame differences, so that the region changed by a motion within a short time interval is described completely.
The convolution kernel size of 3 × 3 × 3 was verified experimentally: it converges faster, attains high recognition accuracy, and gives the best overall performance; a 5 × 5 × 5 kernel can reach the highest recognition accuracy but has many parameters and converges more slowly. The number of network layers can be increased or decreased according to the complexity of the data set to be recognized, but should not exceed 7 feature extraction layers, otherwise the number of parameters becomes too large for real-time operation. The size of the input video can be adjusted for different applications but should be neither too large nor too small: if too large, the video contains too much redundant detail, training is time-consuming, and over-fitting occurs easily; if too small, too much information is lost and sufficient features cannot be extracted for recognition. It is generally recommended to down-sample high-resolution videos to within 500 × 500; videos with complex content may retain a larger resolution, and videos with simple content may be down-sampled to a smaller size, but no smaller than 100 × 100. The Dropout probability is recommended to be a decimal between 0.25 and 0.5; setting it to 0.5 produces the largest number of distinct sub-networks.
In conclusion, the method of this patent needs no complicated preprocessing and does not depend on features obtained from traditional manual experience; it uses deep convolutional neural networks to mine the deeper, more abstract motion features contained in the original video, makes full use of the raw information in the video, and can better adapt to human behavior recognition in complex scenes. It also needs no descriptor construction: spatio-temporal features are extracted directly with the 3D convolutional neural network, eliminating the complicated video preprocessing and manual feature extraction steps and improving efficiency.
The above embodiments are merely illustrative of the technical scheme of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to the above embodiments, a person of ordinary skill in the art should understand that the specific embodiments of the invention can still be modified or equivalently replaced, and any modification or equivalent replacement that does not depart from the spirit and scope of the invention is intended to fall within the scope of the claims of the invention.

Claims (10)

1. A human behavior recognition method based on an attention mechanism and a 3D convolutional neural network, characterized in that the human behavior recognition method constructs a 3D convolutional neural network whose input layer comprises two channels: an original grayscale image channel and an attention matrix channel.
2. The human behavior recognition method based on an attention mechanism and a 3D convolutional neural network according to claim 1, characterized in that the attention matrix is obtained by calculating the difference between two consecutive frames or among three frame images and then normalizing the result.
3. The human behavior recognition method based on an attention mechanism and a 3D convolutional neural network according to claim 2, characterized in that the three-frame difference method separately computes the difference image of the current frame with the previous frame and with the next frame, then takes the "union" of the two difference results; the union is obtained by taking, at each pixel, the larger value of the differences of the current frame with the preceding and following frames, so that the result indicates the region of maximal change before and after the current frame.
4. The human behavior recognition method based on an attention mechanism and a 3D convolutional neural network according to claim 2, characterized in that in the two-frame difference method the attention matrix A is calculated by the following formulas:
D(x, y) = |I_t(x, y) − I_{t-1}(x, y)|  (3)
I_D(x, y) = D(x, y) if D(x, y) > T, otherwise 0  (2)
A(x, y) = (I_D(x, y) − min) / (max − min)  (1)
wherein x, y are the coordinates of the target pixel, t is the current frame number, t−1 denotes the frame preceding the current frame, and I_t is the gray value at position x, y of the current frame; formula (3) computes the distance D between two adjacent frames, the threshold T in formula (2) rejects non-salient change regions and yields the salient change region I_D, and formula (1) normalizes the distance to finally obtain the attention matrix A, where min and max are the minimum and maximum gray values among all pixels of the salient change region I_D; this three-dimensional matrix can represent the salient motion change regions in the input human behavior video.
5. The human behavior recognition method based on an attention mechanism and a 3D convolutional neural network according to claim 3, characterized in that the steps of the three-frame difference method are:
1) selecting three consecutive frames I_{t-1}(x, y), I_t(x, y), I_{t+1}(x, y) from the video frame sequence and computing the difference of each pair of adjacent frames, D_{t-1,t}(x, y) and D_{t,t+1}(x, y):
D_{t-1,t}(x, y) = |I_t(x, y) − I_{t-1}(x, y)|,  D_{t,t+1}(x, y) = |I_{t+1}(x, y) − I_t(x, y)|  (21)
2) thresholding the resulting difference images with a suitable threshold T to exclude noise interference and extract the salient change regions:
B_1(x, y) = D_{t-1,t}(x, y) if D_{t-1,t}(x, y) > T, otherwise 0; B_2(x, y) defined likewise from D_{t,t+1}(x, y)  (22)
3) combining the two difference images of each group with a logical OR to obtain the union of the change regions between the pairs of consecutive frames, which gives the salient change region before and after the intermediate frame of the three images, B(x, y):
B(x, y) = max(B_1(x, y), B_2(x, y))  (23)
4) normalizing the resulting difference image to obtain the frame-difference channel A(x, y), which can represent the salient motion change regions in the input human behavior video.
6. The human behavior recognition method based on an attention mechanism and a 3D convolutional neural network according to any one of claims 1 to 5, characterized in that the 3D convolutional neural network model of the 3D convolutional neural network comprises:
one two-channel input layer, and a plurality of 3D convolutional layers interleaved with a plurality of 3D pooling layers, followed by fully connected layers which yield the classification result; the two-channel input layer inputs the attention matrix together with the original grayscale video frame cube into the neural network model.
7. The human behavior recognition method based on an attention mechanism and a 3D convolutional neural network according to claim 6, characterized in that there are two fully connected layers, with one Dropout layer before each of the two fully connected layers.
8. The human behavior recognition method based on an attention mechanism and a 3D convolutional neural network according to claim 7, characterized in that the Dropout probability is set to a decimal between 0.25 and 0.5.
9. The human behavior recognition method based on an attention mechanism and a 3D convolutional neural network according to claim 6, characterized in that the numbers of the 3D convolutional layers and of the 3D pooling layers are each 3 to 7; in particular, the numbers of the 3D convolutional layers and of the 3D pooling layers are each 5.
10. The human behavior recognition method based on an attention mechanism and a 3D convolutional neural network according to any one of claims 1 to 6, wherein the 3D convolutional neural network model of the 3D convolutional neural network comprises: 1 two-channel input layer, and 5 3D convolutional layers interleaved with 5 3D pooling layers, followed by 2 fully connected layers which yield the classification result, with two Dropout operations applied at the 2 fully connected layers respectively,
wherein:
C1 to C5 are convolutional layers; each layer's kernels are 3 × 3 × 3, and the number of kernels increases from 16 to 256 so that more varied high-level features are generated from combinations of low-level features; at layer C1, the kernels perform two-channel convolution over the attention matrix and the original video frames,
layers S1 to S5 are down-sampling layers using max pooling, which reduce the resolution and size of the feature maps, reduce the amount of computation, and improve tolerance to distortions of the input images; layers S2 and S4 use a 2 × 2 × 2 window and down-sample the temporal and spatial dimensions simultaneously, while the other layers use a 1 × 2 × 2 window and down-sample only the spatial dimensions;
layer D1 is a fully connected layer containing 256 neurons; the feature cubes output by S5 are connected to the 256 neurons of D1, so that the 15-frame input video is converted into a 256-dimensional feature vector at this layer; a Dropout layer is used between S5 and D1, freezing part of the connections between S5 and D1 with a probability of 0.25;
layer D2 is the second fully connected layer and also the output layer, with 6 neurons, equal to the number of target categories; each neuron of D2 is fully connected to the 256 neurons of D1, and classification is finally performed by softmax regression, yielding an output that can mark the behavior category;
wherein the 3D convolutional layers and the fully connected layer D1 use ReLU as the activation function to improve the performance of the deep network, the output layer uses Softmax as the activation function, the optimizer uses the SGD function, and the loss function uses the multi-class cross-entropy function.
CN201810463529.5A 2018-05-15 2018-05-15 Human behavior identification method based on attention mechanism and 3D convolutional neural network Active CN108830157B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810463529.5A CN108830157B (en) 2018-05-15 2018-05-15 Human behavior identification method based on attention mechanism and 3D convolutional neural network


Publications (2)

Publication Number Publication Date
CN108830157A true CN108830157A (en) 2018-11-16
CN108830157B CN108830157B (en) 2021-01-22

Family

ID=64148794

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810463529.5A Active CN108830157B (en) 2018-05-15 2018-05-15 Human behavior identification method based on attention mechanism and 3D convolutional neural network

Country Status (1)

Country Link
CN (1) CN108830157B (en)

Cited By (51)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109598225A (en) * 2018-11-29 2019-04-09 浙江大学 Sharp attention network, neural network and pedestrian's recognition methods again
CN109635926A (en) * 2018-11-30 2019-04-16 深圳市商汤科技有限公司 Attention characteristic-acquisition method, device and storage medium for neural network
CN109635790A (en) * 2019-01-28 2019-04-16 杭州电子科技大学 A kind of pedestrian's abnormal behaviour recognition methods based on 3D convolution
CN109657571A (en) * 2018-12-04 2019-04-19 北京京东金融科技控股有限公司 A kind of childbirth monitoring method and device
CN109784144A (en) * 2018-11-29 2019-05-21 北京邮电大学 A kind of kinship recognition methods and system
CN109829512A (en) * 2019-03-01 2019-05-31 华东师范大学 A kind of image recognition mould group based on deep neural network
CN109919110A (en) * 2019-03-13 2019-06-21 北京航空航天大学 Video area-of-interest-detection method, device and equipment
CN109961019A (en) * 2019-02-28 2019-07-02 华中科技大学 A kind of time-space behavior detection method
CN110008993A (en) * 2019-03-01 2019-07-12 华东师范大学 A kind of end-to-end image-recognizing method based on deep neural network
CN110110642A (en) * 2019-04-29 2019-08-09 华南理工大学 A kind of pedestrian's recognition methods again based on multichannel attention feature
CN110163302A (en) * 2019-06-02 2019-08-23 东北石油大学 Indicator card recognition methods based on regularization attention convolutional neural networks
CN110166388A (en) * 2019-05-25 2019-08-23 西南电子技术研究所(中国电子科技集团公司第十研究所) The intelligence communication signal modulation mode identification method of CNN joint L1 regularization
CN110175580A (en) * 2019-05-29 2019-08-27 复旦大学 A kind of video behavior recognition methods based on timing cause and effect convolutional network
CN110197235A (en) * 2019-06-28 2019-09-03 浙江大学城市学院 A kind of physical activity recognition methods based on unique attention mechanism
CN110222653A (en) * 2019-06-11 2019-09-10 中国矿业大学(北京) A kind of skeleton data Activity recognition method based on figure convolutional neural networks
CN110223706A (en) * 2019-03-06 2019-09-10 天津大学 Based on the environment self-adaption voice enhancement algorithm for paying attention to power drive cyclic convolution network
CN110334749A (en) * 2019-06-20 2019-10-15 浙江工业大学 Confrontation attack defending model, construction method and application based on attention mechanism
CN110399847A (en) * 2019-07-30 2019-11-01 北京字节跳动网络技术有限公司 Extraction method of key frame, device and electronic equipment
CN110458085A (en) * 2019-08-06 2019-11-15 中国海洋大学 Video behavior recognition methods based on attention enhancing three-dimensional space-time representative learning
CN110503053A (en) * 2019-08-27 2019-11-26 电子科技大学 Human motion recognition method based on cyclic convolution neural network
CN110502995A (en) * 2019-07-19 2019-11-26 南昌大学 Driver based on subtle facial action recognition yawns detection method
CN110555368A (en) * 2019-06-28 2019-12-10 西安理工大学 Fall-down behavior identification method based on three-dimensional convolutional neural network
CN110555523A (en) * 2019-07-23 2019-12-10 中建三局智能技术有限公司 short-range tracking method and system based on impulse neural network
CN110638455A (en) * 2019-09-26 2020-01-03 京东方科技集团股份有限公司 Server, system, device and medium for evaluating user rehabilitation status
CN110728183A (en) * 2019-09-09 2020-01-24 天津大学 Human body action recognition method based on attention mechanism neural network
CN110826389A (en) * 2019-09-02 2020-02-21 东华大学 Gait recognition method based on attention 3D frequency convolution neural network
CN110852273A (en) * 2019-11-12 2020-02-28 重庆大学 Behavior identification method based on reinforcement learning attention mechanism
CN111222466A (en) * 2020-01-08 2020-06-02 武汉大学 Remote sensing image landslide automatic detection method based on three-dimensional space-channel attention mechanism
CN111275694A (en) * 2020-02-06 2020-06-12 电子科技大学 Attention mechanism guided progressive division human body analytic model and method
CN111325161A (en) * 2020-02-25 2020-06-23 四川翼飞视科技有限公司 Method for constructing human face detection neural network based on attention mechanism
CN111325736A (en) * 2020-02-27 2020-06-23 成都航空职业技术学院 Sight angle estimation method based on human eye difference image
CN111353539A (en) * 2020-02-29 2020-06-30 武汉大学 Cervical OCT image classification method and system based on double-path attention convolutional neural network
CN111710008A (en) * 2020-05-29 2020-09-25 北京百度网讯科技有限公司 People stream density generation method and device, electronic device and storage medium
CN111783699A (en) * 2020-07-06 2020-10-16 周书田 Video face recognition method based on efficient decomposition convolution and time pyramid network
CN111932538A (en) * 2020-10-10 2020-11-13 平安科技(深圳)有限公司 Method, device, computer equipment and storage medium for analyzing thyroid gland atlas
CN112016400A (en) * 2020-08-04 2020-12-01 香港理工大学深圳研究院 Single-class target detection method and device based on deep learning and storage medium
CN112257572A (en) * 2020-10-20 2021-01-22 神思电子技术股份有限公司 Behavior identification method based on self-attention mechanism
CN112417932A (en) * 2019-08-23 2021-02-26 中移雄安信息通信科技有限公司 Method, device and equipment for identifying target object in video
CN112561918A (en) * 2020-12-31 2021-03-26 中移(杭州)信息技术有限公司 Convolutional neural network training method and focus segmentation method
CN112731309A (en) * 2021-01-06 2021-04-30 哈尔滨工程大学 Active interference identification method based on bilinear efficient neural network
CN112767539A (en) * 2021-01-12 2021-05-07 杭州师范大学 Image three-dimensional reconstruction method and system based on deep learning
CN112818948A (en) * 2021-03-09 2021-05-18 东南大学 Behavior identification method based on visual attention under embedded system
CN112818828A (en) * 2021-01-27 2021-05-18 中国科学技术大学 Weak supervision time domain action positioning method and system based on memory network
CN112862746A (en) * 2019-11-28 2021-05-28 深圳硅基智控科技有限公司 Tissue lesion identification method and system based on artificial neural network
CN112907607A (en) * 2021-03-15 2021-06-04 德鲁动力科技(成都)有限公司 Deep learning, target detection and semantic segmentation method based on differential attention
CN112990144A (en) * 2021-04-30 2021-06-18 德鲁动力科技(成都)有限公司 Data enhancement method and system for pedestrian re-identification
CN113033276A (en) * 2020-12-01 2021-06-25 神思电子技术股份有限公司 Behavior recognition method based on conversion module
CN113111760A (en) * 2021-04-07 2021-07-13 同济大学 Lightweight graph convolution human skeleton action identification method based on channel attention
CN113516028A (en) * 2021-04-28 2021-10-19 南通大学 Human body abnormal behavior identification method and system based on mixed attention mechanism
CN113657534A (en) * 2021-08-24 2021-11-16 北京经纬恒润科技股份有限公司 Classification method and device based on attention mechanism
JP2022521130A (en) * 2020-01-20 2022-04-06 上▲海▼商▲湯▼智能科技有限公司 Network training, image processing methods and electronics, storage media and computer programs

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106682616A (en) * 2016-12-28 2017-05-17 南京邮电大学 Newborn-painful-expression recognition method based on dual-channel-characteristic deep learning
CN107103277A (en) * 2017-02-28 2017-08-29 中科唯实科技(北京)有限公司 A kind of gait recognition method based on depth camera and 3D convolutional neural networks
CN107749067A (en) * 2017-09-13 2018-03-02 华侨大学 Fire hazard smoke detecting method based on kinetic characteristic and convolutional neural networks
CN107862275A (en) * 2017-11-01 2018-03-30 电子科技大学 Human bodys' response model and its construction method and Human bodys' response method


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIANLONG FU 等: "Look Closer to See Better: Recurrent Attention Convolutional Neural Network", 《2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》 *
张颖 等: "基于3D卷积神经网络的人体行为识别方法", 《软件导刊》 *

Cited By (83)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109784144A (en) * 2018-11-29 2019-05-21 北京邮电大学 A kind of kinship recognition methods and system
CN109598225A (en) * 2018-11-29 2019-04-09 浙江大学 Sharp attention network, neural network and pedestrian's recognition methods again
CN109635926B (en) * 2018-11-30 2021-11-05 深圳市商汤科技有限公司 Attention feature acquisition method and device for neural network and storage medium
CN109635926A (en) * 2018-11-30 2019-04-16 深圳市商汤科技有限公司 Attention characteristic-acquisition method, device and storage medium for neural network
CN109657571A (en) * 2018-12-04 2019-04-19 北京京东金融科技控股有限公司 A kind of childbirth monitoring method and device
CN109635790A (en) * 2019-01-28 2019-04-16 杭州电子科技大学 A kind of pedestrian's abnormal behaviour recognition methods based on 3D convolution
CN109961019A (en) * 2019-02-28 2019-07-02 华中科技大学 A kind of time-space behavior detection method
CN109961019B (en) * 2019-02-28 2021-03-26 华中科技大学 Space-time behavior detection method
CN109829512A (en) * 2019-03-01 2019-05-31 华东师范大学 Image recognition module based on deep neural network
CN110008993A (en) * 2019-03-01 2019-07-12 华东师范大学 End-to-end image recognition method based on deep neural network
CN110223706A (en) * 2019-03-06 2019-09-10 天津大学 Environment self-adaptive speech enhancement algorithm based on attention-driven cyclic convolution network
CN110223706B (en) * 2019-03-06 2021-05-07 天津大学 Environment self-adaptive speech enhancement algorithm based on attention-driven cyclic convolution network
CN109919110A (en) * 2019-03-13 2019-06-21 北京航空航天大学 Video region-of-interest detection method, device and equipment
CN109919110B (en) * 2019-03-13 2021-06-04 北京航空航天大学 Video attention area detection method, device and equipment
CN110110642A (en) * 2019-04-29 2019-08-09 华南理工大学 Pedestrian re-identification method based on multi-channel attention features
CN110166388A (en) * 2019-05-25 2019-08-23 西南电子技术研究所(中国电子科技集团公司第十研究所) Intelligent communication signal modulation mode identification method combining CNN and L1 regularization
CN110175580A (en) * 2019-05-29 2019-08-27 复旦大学 Video behavior recognition method based on temporal causal convolutional network
CN110175580B (en) * 2019-05-29 2020-10-30 复旦大学 Video behavior identification method based on time sequence causal convolutional network
CN110163302B (en) * 2019-06-02 2022-03-22 东北石油大学 Indicator diagram identification method based on regularization attention convolution neural network
CN110163302A (en) * 2019-06-02 2019-08-23 东北石油大学 Indicator diagram recognition method based on regularized attention convolutional neural network
CN110222653A (en) * 2019-06-11 2019-09-10 中国矿业大学(北京) Skeleton data behavior recognition method based on graph convolutional neural networks
CN110222653B (en) * 2019-06-11 2020-06-16 中国矿业大学(北京) Skeleton data behavior identification method based on graph convolution neural network
CN110334749A (en) * 2019-06-20 2019-10-15 浙江工业大学 Adversarial attack defense model based on attention mechanism, construction method and application thereof
CN110197235B (en) * 2019-06-28 2021-03-30 浙江大学城市学院 Human body activity recognition method based on unique attention mechanism
CN110555368A (en) * 2019-06-28 2019-12-10 西安理工大学 Fall-down behavior identification method based on three-dimensional convolutional neural network
CN110197235A (en) * 2019-06-28 2019-09-03 浙江大学城市学院 Human body activity recognition method based on unique attention mechanism
CN110555368B (en) * 2019-06-28 2022-05-03 西安理工大学 Fall-down behavior identification method based on three-dimensional convolutional neural network
CN110502995A (en) * 2019-07-19 2019-11-26 南昌大学 Driver yawning detection method based on fine facial action recognition
CN110502995B (en) * 2019-07-19 2023-03-14 南昌大学 Driver yawning detection method based on fine facial action recognition
CN110555523B (en) * 2019-07-23 2022-03-29 中建三局智能技术有限公司 Short-range tracking method and system based on impulse neural network
CN110555523A (en) * 2019-07-23 2019-12-10 中建三局智能技术有限公司 Short-range tracking method and system based on impulse neural network
CN110399847A (en) * 2019-07-30 2019-11-01 北京字节跳动网络技术有限公司 Key frame extraction method and device, and electronic equipment
CN110399847B (en) * 2019-07-30 2021-11-09 北京字节跳动网络技术有限公司 Key frame extraction method and device and electronic equipment
CN110458085A (en) * 2019-08-06 2019-11-15 中国海洋大学 Video behavior recognition method based on attention-enhanced three-dimensional spatiotemporal representation learning
CN110458085B (en) * 2019-08-06 2022-02-08 中国海洋大学 Video behavior identification method based on attention-enhanced three-dimensional space-time representation learning
CN112417932A (en) * 2019-08-23 2021-02-26 中移雄安信息通信科技有限公司 Method, device and equipment for identifying target object in video
CN112417932B (en) * 2019-08-23 2023-04-07 中移雄安信息通信科技有限公司 Method, device and equipment for identifying target object in video
CN110503053B (en) * 2019-08-27 2022-07-08 电子科技大学 Human body action recognition method based on cyclic convolution neural network
CN110503053A (en) * 2019-08-27 2019-11-26 电子科技大学 Human motion recognition method based on cyclic convolution neural network
CN110826389B (en) * 2019-09-02 2022-05-27 东华大学 Gait recognition method based on attention 3D frequency convolution neural network
CN110826389A (en) * 2019-09-02 2020-02-21 东华大学 Gait recognition method based on attention 3D frequency convolution neural network
CN110728183B (en) * 2019-09-09 2023-09-22 天津大学 Human body action recognition method based on attention mechanism neural network
CN110728183A (en) * 2019-09-09 2020-01-24 天津大学 Human body action recognition method based on attention mechanism neural network
CN110638455A (en) * 2019-09-26 2020-01-03 京东方科技集团股份有限公司 Server, system, device and medium for evaluating user rehabilitation status
CN110638455B (en) * 2019-09-26 2022-06-14 京东方科技集团股份有限公司 Server, system, device and medium for evaluating user rehabilitation status
CN110852273A (en) * 2019-11-12 2020-02-28 重庆大学 Behavior identification method based on reinforcement learning attention mechanism
CN112862746A (en) * 2019-11-28 2021-05-28 深圳硅基智控科技有限公司 Tissue lesion identification method and system based on artificial neural network
CN112862746B (en) * 2019-11-28 2022-09-02 深圳硅基智控科技有限公司 Tissue lesion identification method and system based on artificial neural network
CN111222466B (en) * 2020-01-08 2022-04-01 武汉大学 Remote sensing image landslide automatic detection method based on three-dimensional space-channel attention mechanism
CN111222466A (en) * 2020-01-08 2020-06-02 武汉大学 Remote sensing image landslide automatic detection method based on three-dimensional space-channel attention mechanism
JP2022521130A (en) * 2020-01-20 2022-04-06 上海商汤智能科技有限公司 Network training and image processing methods, electronic devices, storage media and computer programs
CN111275694A (en) * 2020-02-06 2020-06-12 电子科技大学 Attention-mechanism-guided progressive-partition human parsing model and method
CN111325161A (en) * 2020-02-25 2020-06-23 四川翼飞视科技有限公司 Method for constructing human face detection neural network based on attention mechanism
CN111325161B (en) * 2020-02-25 2023-04-18 四川翼飞视科技有限公司 Method for constructing human face detection neural network based on attention mechanism
CN111325736A (en) * 2020-02-27 2020-06-23 成都航空职业技术学院 Gaze angle estimation method based on eye difference images
CN111325736B (en) * 2020-02-27 2024-02-27 成都航空职业技术学院 Gaze angle estimation method based on eye difference images
CN111353539A (en) * 2020-02-29 2020-06-30 武汉大学 Cervical OCT image classification method and system based on double-path attention convolutional neural network
CN111710008A (en) * 2020-05-29 2020-09-25 北京百度网讯科技有限公司 Crowd density generation method and device, electronic device and storage medium
CN111783699A (en) * 2020-07-06 2020-10-16 周书田 Video face recognition method based on efficient decomposed convolutions and temporal pyramid network
CN112016400B (en) * 2020-08-04 2021-06-29 香港理工大学深圳研究院 Single-class target detection method and device based on deep learning and storage medium
CN112016400A (en) * 2020-08-04 2020-12-01 香港理工大学深圳研究院 Single-class target detection method and device based on deep learning and storage medium
CN111932538A (en) * 2020-10-10 2020-11-13 平安科技(深圳)有限公司 Method, device, computer equipment and storage medium for analyzing thyroid gland atlas
CN111932538B (en) * 2020-10-10 2021-01-15 平安科技(深圳)有限公司 Method, device, computer equipment and storage medium for analyzing thyroid gland atlas
CN112257572A (en) * 2020-10-20 2021-01-22 神思电子技术股份有限公司 Behavior identification method based on self-attention mechanism
CN112257572B (en) * 2020-10-20 2022-02-01 神思电子技术股份有限公司 Behavior identification method based on self-attention mechanism
CN113033276B (en) * 2020-12-01 2022-05-17 神思电子技术股份有限公司 Behavior recognition method based on conversion module
CN113033276A (en) * 2020-12-01 2021-06-25 神思电子技术股份有限公司 Behavior recognition method based on conversion module
CN112561918A (en) * 2020-12-31 2021-03-26 中移(杭州)信息技术有限公司 Convolutional neural network training method and lesion segmentation method
CN112731309A (en) * 2021-01-06 2021-04-30 哈尔滨工程大学 Active interference identification method based on bilinear efficient neural network
CN112767539B (en) * 2021-01-12 2023-08-08 杭州师范大学 Image three-dimensional reconstruction method and system based on deep learning
CN112767539A (en) * 2021-01-12 2021-05-07 杭州师范大学 Image three-dimensional reconstruction method and system based on deep learning
CN112818828B (en) * 2021-01-27 2022-09-09 中国科学技术大学 Weak supervision time domain action positioning method and system based on memory network
CN112818828A (en) * 2021-01-27 2021-05-18 中国科学技术大学 Weak supervision time domain action positioning method and system based on memory network
CN112818948A (en) * 2021-03-09 2021-05-18 东南大学 Behavior identification method based on visual attention under embedded system
CN112818948B (en) * 2021-03-09 2022-03-29 东南大学 Behavior identification method based on visual attention under embedded system
CN112907607A (en) * 2021-03-15 2021-06-04 德鲁动力科技(成都)有限公司 Deep learning, target detection and semantic segmentation method based on differential attention
CN113111760A (en) * 2021-04-07 2021-07-13 同济大学 Lightweight graph convolution human skeleton action identification method based on channel attention
CN113111760B (en) * 2021-04-07 2023-05-02 同济大学 Light-weight graph convolution human skeleton action recognition method based on channel attention
CN113516028B (en) * 2021-04-28 2024-01-19 南通大学 Human body abnormal behavior identification method and system based on mixed attention mechanism
CN113516028A (en) * 2021-04-28 2021-10-19 南通大学 Human body abnormal behavior identification method and system based on mixed attention mechanism
CN112990144B (en) * 2021-04-30 2021-08-17 德鲁动力科技(成都)有限公司 Data enhancement method and system for pedestrian re-identification
CN112990144A (en) * 2021-04-30 2021-06-18 德鲁动力科技(成都)有限公司 Data enhancement method and system for pedestrian re-identification
CN113657534A (en) * 2021-08-24 2021-11-16 北京经纬恒润科技股份有限公司 Classification method and device based on attention mechanism

Also Published As

Publication number Publication date
CN108830157B (en) 2021-01-22

Similar Documents

Publication Publication Date Title
CN108830157A (en) Human behavior recognition method based on attention mechanism and 3D convolutional neural networks
CN110458844B (en) Semantic segmentation method for low-illumination scene
CN109389055B (en) Video classification method based on mixed convolution and attention mechanism
CN109740419A (en) Video behavior recognition method based on Attention-LSTM network
CN105095833B (en) Network construction method for face recognition, recognition method and system
WO2023185243A1 (en) Expression recognition method based on attention-modulated contextual spatial information
CN113496217B (en) Method for identifying human face micro expression in video image sequence
Cheng et al. Facial expression recognition method based on improved VGG convolutional neural network
CN108133188A (en) Behavior recognition method based on motion history images and convolutional neural networks
CN104217214B (en) RGB-D human behavior recognition method based on configurable convolutional neural networks
CN105678284B (en) Fixed-position human behavior analysis method
CN107341452A (en) Human behavior recognition method based on quaternion spatiotemporal convolutional neural networks
CN109961034A (en) Video object detection method based on convolutional gated recurrent neural unit
CN106485214A (en) Eye and mouth state recognition method based on convolutional neural networks
CN107609460A (en) Human behavior recognition method fusing spatiotemporal dual-stream network and attention mechanism
CN106909938B (en) View-independent behavior recognition method based on deep learning network
CN111652903B (en) Pedestrian target tracking method based on convolutional correlation network in autonomous driving scenes
CN110378208B (en) Behavior identification method based on deep residual error network
CN104933722A (en) Image edge detection method based on Spiking-convolution network model
CN107657204A (en) Construction method of deep network model, and facial expression recognition method and system
CN109817276A (en) Protein secondary structure prediction method based on deep neural network
CN109934158A (en) Video emotion recognition method based on locally enhanced motion history images and recurrent convolutional neural network
Xu et al. Recurrent convolutional neural network for video classification
CN110232361B (en) Human behavior intention identification method and system based on three-dimensional residual dense network
CN108921047A (en) Multi-model vote-averaging action recognition method based on cross-layer fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220810

Address after: No. 619 Yonghua North Street, Baoding, Hebei Province, 071003

Patentee after: NORTH CHINA ELECTRIC POWER University (BAODING)

Patentee after: China Power Investment Northeast Energy Technology Co.,Ltd.

Address before: No. 619 Yonghua North Street, Baoding, Hebei Province, 071003

Patentee before: NORTH CHINA ELECTRIC POWER University (BAODING)
