CN110321761A - Activity recognition method, terminal device and computer-readable storage medium - Google Patents
Activity recognition method, terminal device and computer-readable storage medium
- Publication number
- Publication number: CN110321761A
- Application number: CN201810272399.7A
- Authority
- CN
- China
- Prior art keywords
- network
- sub
- layer
- base net
- indicate
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/29—Graphical models, e.g. Bayesian networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
Abstract
This application relates to the field of neural network technology and provides an activity recognition method, a terminal device, and a computer-readable storage medium. The method includes: constructing a recognition model comprising at least two sub-networks and training each sub-network in the recognition model separately; after training, identifying a video sequence to be recognized with each sub-network to obtain an initial recognition result corresponding to each sub-network; and fusing the initial recognition results of the sub-networks to obtain the activity recognition result. This application can improve the robustness of activity recognition methods.
Description
Technical field
This application belongs to the field of neural network technology, and in particular relates to an activity recognition method, a terminal device, and a computer-readable storage medium.
Background
As an important research field, activity recognition has been widely applied to video surveillance, human-computer interaction, robot learning, and so on. Moreover, with the development of low-cost depth sensors, the three-dimensional coordinates of skeleton joints can now be recorded accurately, which greatly benefits the development of activity recognition.
At present, activity recognition based on 3D video sequences mainly relies on algorithms based on recurrent neural networks and algorithms based on two-dimensional convolutional neural networks. However, neither approach can accurately extract features from the time dimension and the spatial dimension simultaneously, so current activity recognition methods suffer from poor robustness.
Summary of the invention
In view of this, embodiments of the present application provide an activity recognition method, a terminal device, and a computer-readable storage medium, to address the poor robustness of current activity recognition methods.
A first aspect of the embodiments of the present application provides an activity recognition method, comprising:
constructing a recognition model that includes at least two sub-networks, and training each sub-network in the recognition model separately;
after training, identifying a video sequence to be recognized with each sub-network, to obtain an initial recognition result corresponding to each sub-network;
fusing the initial recognition results corresponding to the sub-networks to obtain an activity recognition result.
A second aspect of the embodiments of the present application provides a terminal device, comprising:
a construction and training module, configured to construct a recognition model that includes at least two sub-networks and to train each sub-network in the recognition model separately;
an initial-recognition-result module, configured to identify, after training, a video sequence to be recognized with each sub-network, to obtain an initial recognition result corresponding to each sub-network;
an activity-recognition-result module, configured to fuse the initial recognition results corresponding to the sub-networks to obtain an activity recognition result.
A third aspect of the embodiments of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method provided by the first aspect.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program which, when executed by one or more processors, implements the steps of the method provided by the first aspect.
A fifth aspect of the embodiments of the present application provides a computer program product comprising a computer program which, when executed by one or more processors, implements the steps of the method provided by the first aspect.
In the embodiments of the present application, a recognition model including at least two sub-networks is constructed and each sub-network in the recognition model is trained separately; after training, a video sequence to be recognized is identified by each sub-network to obtain an initial recognition result corresponding to each sub-network, and the initial recognition results corresponding to the sub-networks are fused to obtain the activity recognition result. Since different sub-networks extract different behavioral features, and the test results of all sub-networks are finally fused, the robustness of the activity recognition method can be improved.
Description of the drawings
To explain the technical solutions in the embodiments of the present application more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of an activity recognition method provided by an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a base network provided by an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a two-stream network provided by an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a limb-separated network provided by an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an attention network provided by an embodiment of the present application;
Fig. 6 is a schematic block diagram of a terminal device provided by an embodiment of the present application;
Fig. 7 is a schematic block diagram of a terminal device provided by another embodiment of the present application.
Detailed description of the embodiments
In the following description, specific details such as particular system structures and techniques are set forth for the purpose of illustration rather than limitation, so as to provide a thorough understanding of the embodiments of the present application. However, it will be clear to those skilled in the art that the present application may also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that when used in this specification and the appended claims, the term "comprising" indicates the presence of the described features, wholes, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, wholes, steps, operations, elements, components, and/or sets thereof.
It should also be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should further be understood that the term "and/or" used in this specification and the appended claims refers to, and includes, any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be construed, depending on the context, as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" may be construed, depending on the context, as "once it is determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".
To illustrate the technical solutions described herein, specific embodiments are described below.
Fig. 1 is a schematic flowchart of the activity recognition method provided by an embodiment of the present application. As shown in the figure, the method may include the following steps.
Step S101: construct a recognition model that includes at least two sub-networks, and train each sub-network in the recognition model separately.
In the embodiments of the present application, the model for activity recognition may include four sub-networks, may include more than four sub-networks, and may of course include only one or more of these sub-networks. The sub-networks include: a two-stream network, a limb-separated network, an attention network, and a frame-difference network. The two-stream network, limb-separated network, attention network, and frame-difference network are all built from a base network.
As shown in Fig. 2, the base network includes:
a one-dimensional convolutional layer, at least two basic blocks, an average pooling layer, and a fully connected layer, connected in sequence;
two adjacent basic blocks are connected by a residual connection, which is expressed by the following formula:
x_(i+1) = F_layer(x_i) + x_i
where x_i is the input of the i-th basic block and x_(i+1) is the output of the i-th basic block, which is also the input of the (i+1)-th basic block.
Fig. 2 shows three basic blocks, Block1, Block2, and Block3: the one-dimensional convolutional layer precedes Block1, the average pooling layer (Avg pool) follows Block3, and the fully connected layer (Fc) follows the average pooling layer.
The basic block includes:
at least two convolutional layers; between the convolutional layers are a batch normalization layer, a nonlinear activation function, and a dropout layer, which are expressed by the following formula:
F_layer(x) = Dropout(ReLU(BN(f(x * w))))
where w denotes the weight of the convolution kernel, x denotes the input of the convolutional layer, f(x * w) denotes the output of the previous convolutional layer and also the input of the batch normalization layer, BN(f(x * w)) denotes the output of the batch normalization layer and also the input of the nonlinear activation function, ReLU(BN(f(x * w))) denotes the output of the nonlinear activation function and also the input of the dropout layer, F_layer(x) denotes the output of the dropout layer and also the input of the next convolutional layer, and * denotes the convolution operation.
The right part of Fig. 2 shows the structure of a basic block provided by an embodiment of the present application: Conv1D denotes the convolutional layer, batch normalization (also written BN) denotes the batch normalization layer, ReLU denotes the nonlinear activation function, and the last layer is the dropout layer; the convolutional layer, batch normalization layer, nonlinear activation function, and dropout layer are connected in sequence.
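The F_layer composition and the residual shortcut above can be sketched as follows. This is a minimal numpy illustration, not the patent's implementation: it uses an inference-mode batch normalization without learned scale/shift, disables dropout, and picks arbitrary channel and sequence sizes.

```python
import numpy as np

def conv1d(x, w):
    """'Same'-padded 1-D convolution. x: (C_in, T), w: (C_out, C_in, K)."""
    c_out, c_in, k = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    t = x.shape[1]
    y = np.zeros((c_out, t))
    for o in range(c_out):
        for i in range(c_in):
            for j in range(k):
                y[o] += w[o, i, j] * xp[i, j:j + t]
    return y

def batch_norm(x, eps=1e-5):
    # Normalize each channel over the time axis (inference-style, no scale/shift).
    mu = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def f_layer(x, w, drop_mask=None):
    """F_layer(x) = Dropout(ReLU(BN(f(x*w)))); dropout is the identity at test time."""
    y = np.maximum(batch_norm(conv1d(x, w)), 0.0)
    return y if drop_mask is None else y * drop_mask

def basic_block(x, w1, w2):
    """Two convolutional layers plus the residual shortcut x_(i+1) = F_layer(x_i) + x_i."""
    h = f_layer(x, w1)
    h = f_layer(h, w2)
    return h + x  # residual connection

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16))           # 8 channels, 16 time steps (illustrative)
w1 = rng.standard_normal((8, 8, 3)) * 0.1
w2 = rng.standard_normal((8, 8, 3)) * 0.1
out = basic_block(x, w1, w2)
print(out.shape)  # (8, 16) — same shape as the input, so the shortcut is valid
```

Because the output shape equals the input shape, stacking such blocks with residual connections is well defined; with all-zero kernels the block degenerates to the identity, which is exactly what the shortcut guarantees.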
The two-stream network includes:
a base network on the spatial stream with its corresponding softmax layer, and a base network on the temporal stream with its corresponding softmax layer.
In the embodiments of the present application, the object to be recognized is a video sequence in which every image contains the three-dimensional coordinates of the human skeleton, so two dimensions arise: the time dimension and the spatial dimension. The time dimension records the motion information of the human body, while the spatial dimension records the interaction information of the important joints of the human body.
Fig. 3 shows the two-stream network provided by an embodiment of the present application: joints denotes the joint points, time denotes the time series, spatial stream denotes the spatial stream, temporal stream denotes the temporal stream, and score fusion denotes the fusion of scores.
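The score fusion of the two streams can be sketched as below. The class logits are made-up numbers standing in for the outputs of the two base networks (in the patent, the spatial stream's kernel slides over joints and the temporal stream's over frames before each softmax):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical per-class logits produced by the two base networks.
spatial_logits = np.array([1.2, 0.3, -0.5])   # stream convolving over joints
temporal_logits = np.array([0.8, 0.1, -0.2])  # stream convolving over frames

spatial_score = softmax(spatial_logits)
temporal_score = softmax(temporal_logits)

# Score fusion: element-wise product of the two streams' class scores.
fused = spatial_score * temporal_score
pred = int(np.argmax(fused))
print(pred)  # 0 — the class favored by both streams
```

Multiplying the two softmax outputs rewards classes on which both streams agree, which is why the patent multiplies the spatial and temporal scores for end-to-end training.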
The limb-separated network includes:
five base networks, each with its corresponding softmax layer. The five base networks correspond to five parts of the human body: the trunk, the left arm, the right arm, the left leg, and the right leg.
Fig. 4 shows the limb-separated network provided by an embodiment of the present application, where j1 to j20 denote the three-dimensional coordinates of the 20 labeled joints of the human body, and T denotes the total number of images in the video sequence.
In practice, some human behaviors involve only part of the limbs: when waving, for example, only the arm participates in the motion while the other parts remain static. The human body can therefore be divided into five parts (other partitions are also possible in practice). The limb-separated network can capture subtle limb motion information while also learning which limbs contribute most to the behavior class; from this angle it can be regarded as a limb-based attention mechanism. In this network, the convolution kernel slides only along the time dimension.
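The limb separation step can be sketched as a pure indexing operation. The grouping of the 20 joints j1–j20 into five parts below is a hypothetical assignment for illustration; the actual indices depend on the skeleton layout of the dataset:

```python
import numpy as np

# Hypothetical grouping of the 20 joints (j1..j20, zero-indexed) into five parts.
PARTS = {
    "trunk":     [0, 1, 2, 3],
    "left_arm":  [4, 5, 6, 7],
    "right_arm": [8, 9, 10, 11],
    "left_leg":  [12, 13, 14, 15],
    "right_leg": [16, 17, 18, 19],
}

def split_limbs(seq):
    """seq: (T, 20, 3) joint coordinates -> one (T, n_joints, 3) array per body part."""
    return {name: seq[:, idx, :] for name, idx in PARTS.items()}

rng = np.random.default_rng(0)
seq = rng.standard_normal((30, 20, 3))   # T = 30 frames, 20 joints, (x, y, z)
parts = split_limbs(seq)
print([p.shape for p in parts.values()])  # five (30, 4, 3) arrays
```

Each of the five arrays would then be fed to its own base network, and the five softmax scores multiplied, as described above.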
The attention network includes:
a base network fused with an attention mechanism, where the attention mechanism includes two fully connected layers and a softmax layer.
Fig. 5 shows the attention network provided by an embodiment of the present application. A video containing a behavior can be regarded as a set of temporally continuous frames, but not all frames are equally important, and some frames may even mislead the classification; likewise, different feature channels in the network contribute differently to behavior classification. An attention mechanism is therefore designed to learn the important frames and feature channels. As can be seen in Fig. 5, the attention mechanism is placed behind a convolutional layer or a basic block of the base network. Taking the basic-block case as an example, the input of the basic block passes through the basic block to obtain the corresponding feature values; at the same time, that input also passes through the first fully connected layer (FC layer1), an activation function, the second fully connected layer (FC layer2), and the softmax layer to obtain normalized weights, which indicate which frames or feature channels are important, similar to weights in a mathematical formula. The two results (the features and the weight corresponding to each feature) are then combined by multiply-and-accumulate (each feature is multiplied by its corresponding weight, and the products are summed) to obtain the input of the next layer.
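The two-FC-plus-softmax weighting just described can be sketched as follows. This is a toy illustration under stated assumptions: the feature dimensionality, the hidden width, and the reduction of each channel to a single scalar score are choices made here for brevity, not taken from the patent:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention(x, W1, b1, W2, b2):
    """x: (C, D) — C frames/feature channels with D-dim features each.
    Two FC layers plus softmax produce one normalized weight per channel;
    the next layer's input is the weighted sum of the channels."""
    scores = np.empty(len(x))
    for c, xc in enumerate(x):
        y1 = np.maximum(W1 @ xc + b1, 0.0)   # first FC layer + activation
        scores[c] = W2 @ y1 + b2             # second FC layer -> scalar score
    alpha = softmax(scores)                  # normalized weights over channels
    out = (alpha[:, None] * x).sum(axis=0)   # multiply-and-accumulate
    return alpha, out

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 4))              # 6 channels, 4-dim features (illustrative)
W1 = rng.standard_normal((8, 4)); b1 = rng.standard_normal(8)
W2 = rng.standard_normal(8);      b2 = 0.0
alpha, out = attention(x, W1, b1, W2, b2)
print(round(alpha.sum(), 6), out.shape)      # 1.0 (4,)
```

The softmax guarantees the weights are non-negative and sum to one, so the weighted sum is a convex combination of the channels, emphasizing the frames or channels the network has learned to consider important.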
The frame-difference network includes:
a base network and its corresponding softmax layer. For behaviors of different classes, the most discriminative feature is the motion information, but the original frame sequence cannot represent motion information directly.
Step S102: after training, identify the video sequence to be recognized with each sub-network, to obtain an initial recognition result corresponding to each sub-network.
In the embodiments of the present application, each sub-network in the recognition model is trained independently. For the two-stream network, the limb-separated network, the attention network, and the frame-difference network, the loss function used during training is the cross-entropy loss:
L = -Σ_(i=1..n) y_i · log(ŷ_i)
where y_i denotes the true class label, ŷ_i denotes the prediction, and n denotes the number of classes.
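A minimal sketch of the cross-entropy loss for a single one-hot sample (the probabilities are made-up numbers; a small epsilon guards the logarithm, which the patent text does not specify):

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """L = -sum_i y_i * log(yhat_i) over the n classes (one-hot y_true)."""
    return -np.sum(y_true * np.log(y_pred + eps))

y_true = np.array([0.0, 1.0, 0.0])   # true class label (one-hot)
y_pred = np.array([0.1, 0.8, 0.1])   # a sub-network's softmax prediction
loss = cross_entropy(y_true, y_pred)
print(round(loss, 4))                # 0.2231, i.e. -log(0.8)
```

For a one-hot label the sum reduces to the negative log-probability the network assigns to the true class, so the loss shrinks toward zero as that probability approaches one.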
The training of each sub-network is explained below.
For the two-stream network, the convolution kernel on the spatial stream slides along the spatial dimension, and the softmax layer corresponding to the base network on the spatial stream yields the score of the spatial stream; the convolution kernel on the temporal stream slides along the time dimension, and the softmax layer corresponding to the base network on the temporal stream yields the score of the temporal stream. The score of the spatial stream is multiplied by the score of the temporal stream for end-to-end training.
For the limb-separated network, the features of the five parts of the human body are fed into the corresponding five base networks to obtain the scores of the five base networks, and these scores are multiplied together for training.
For the attention network, the attention mechanism is placed behind a convolutional layer or a basic block. The attention mechanism is:
y_c1 = Activation(W_1 · x_ic + b_1)
y_c2 = W_2 · y_c1 + b_2
α_c = softmax(W_α · y_c2)
x_oc = F(x_ic)
O = Σ_c α_c · x_oc
where c denotes the c-th channel of the attention mechanism's input, x_ic denotes the output of the previous layer, W_1 denotes the weight of the first fully connected layer, b_1 denotes the bias of the first fully connected layer, y_c1 denotes the output of the first fully connected layer and also the input of the second, W_2 denotes the weight of the second fully connected layer, b_2 denotes the bias of the second fully connected layer, y_c2 denotes the output of the second fully connected layer and the input of the softmax layer, W_α denotes the learned attention weight, α_c denotes the normalized weight obtained by the softmax layer, x_oc denotes the output of the current layer of the attention mechanism, and O denotes the input of the next layer.
For the frame-difference network, the convolution kernel slides along the time dimension. The input of the frame-difference network is:
S_m = {M_2, M_3, …, M_t, …, M_N}
where S_m denotes the input of the frame-difference network, M_t = F_t − F_(t−1), F_t = {J_1, J_2, …, J_i, …, J_K} is the set of joint coordinates at frame t (K = 20 in Fig. 4), J_i = (x_i, y_i, z_i), and N denotes the total number of frames in the video sequence.
In other words, a three-dimensional joint coordinate can be written J = (x, y, z); the skeleton at frame t is F_t = {J_1, J_2, …, J_i, …, J_K}; and a video with N frames can be expressed as S = {F_1, F_2, …, F_t, …, F_N}. The motion information of the skeleton is computed by the formula M_t = F_t − F_(t−1), so the motion information of the whole video can be expressed as S_m = {M_2, M_3, …, M_t, …, M_N}.
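The frame-difference input S_m above can be computed in one vectorized step; the frame and joint counts below are illustrative:

```python
import numpy as np

def frame_difference(seq):
    """seq: (N, K, 3) — N frames of K joint coordinates J = (x, y, z).
    Returns S_m = {M_2, ..., M_N} with M_t = F_t - F_(t-1)."""
    return seq[1:] - seq[:-1]

rng = np.random.default_rng(0)
seq = rng.standard_normal((10, 20, 3))   # N = 10 frames, 20 joints
sm = frame_difference(seq)
print(sm.shape)  # (9, 20, 3): one difference per consecutive frame pair
```

The result has N−1 entries because M_t is only defined from the second frame onward, matching the index range {M_2, …, M_N} above.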
Step S103: fuse the initial recognition results corresponding to the sub-networks to obtain the activity recognition result.
In the embodiments of the present application, the fusion may be performed as follows. The initial recognition results corresponding to the sub-networks are fused through the formula
y_test = ∏_(i=1..n) y_i   or   y_test = Σ_(i=1..n) y_i
to obtain the activity recognition result, where y_test denotes the activity recognition result, y_i denotes the initial recognition result of the i-th sub-network, and n denotes the number of sub-networks included in the recognition model.
The product ∏ denotes multiplicative fusion, in which the initial recognition results obtained by the sub-networks are multiplied together; the sum Σ denotes additive fusion, in which the initial recognition results obtained by the sub-networks are added.
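The two fusion rules can be sketched as below; the four score vectors are made-up stand-ins for the softmax outputs of the four sub-networks:

```python
import numpy as np

def fuse(scores, mode="multiply"):
    """scores: (n, classes) — initial recognition results y_i of n sub-networks.
    Multiplicative fusion: y_test = prod_i y_i; additive fusion: y_test = sum_i y_i."""
    return scores.prod(axis=0) if mode == "multiply" else scores.sum(axis=0)

scores = np.array([
    [0.7, 0.2, 0.1],   # two-stream network (hypothetical output)
    [0.6, 0.3, 0.1],   # limb-separated network
    [0.5, 0.4, 0.1],   # attention network
    [0.8, 0.1, 0.1],   # frame-difference network
])
print(int(np.argmax(fuse(scores))), int(np.argmax(fuse(scores, "add"))))  # 0 0
```

Multiplicative fusion sharply penalizes a class that any single sub-network scores low, while additive fusion averages out individual mistakes; both yield the same prediction here, but they can differ when the sub-networks disagree.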
In the embodiments of the present application, a recognition model including at least two sub-networks is constructed and each sub-network in the recognition model is trained separately; after training, a video sequence to be recognized is identified by each sub-network to obtain an initial recognition result corresponding to each sub-network, and the initial recognition results corresponding to the sub-networks are fused to obtain the activity recognition result. Since different sub-networks extract different behavioral features, and the test results of all sub-networks are finally fused, the robustness of the activity recognition method can be improved.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and does not constitute any limitation on the implementation of the embodiments of the present application.
Fig. 6 is a schematic block diagram of the terminal device provided by an embodiment of the present application. For ease of description, only the parts relevant to this embodiment are shown.
The terminal device 6 may be a software unit, a hardware unit, or a combined software/hardware unit built into an existing terminal device such as a mobile phone, a notebook, or a computer; it may also be integrated into such an existing terminal device as an independent component, or exist as an independent terminal device.
The terminal device 6 includes:
a construction and training module 61, configured to construct a recognition model that includes at least two sub-networks and to train each sub-network in the recognition model separately;
an initial-recognition-result module 62, configured to identify, after training, a video sequence to be recognized with each sub-network, to obtain an initial recognition result corresponding to each sub-network;
an activity-recognition-result module 63, configured to fuse the initial recognition results corresponding to the sub-networks to obtain an activity recognition result.
Optionally, the sub-networks include: a two-stream network, a limb-separated network, an attention network, and a frame-difference network.
Optionally, the two-stream network includes:
a base network on the spatial stream with its corresponding softmax layer, and a base network on the temporal stream with its corresponding softmax layer;
the limb-separated network includes:
five base networks, each with its corresponding softmax layer, the five base networks corresponding to five parts of the human body: the trunk, the left arm, the right arm, the left leg, and the right leg;
the attention network includes:
a base network fused with an attention mechanism, the attention mechanism including two fully connected layers and a softmax layer;
the frame-difference network includes:
a base network and its corresponding softmax layer.
Optionally, the base network includes:
a one-dimensional convolutional layer, at least two basic blocks, an average pooling layer, and a fully connected layer, connected in sequence;
two adjacent basic blocks are connected by a residual connection, which is expressed by the following formula:
x_(i+1) = F_layer(x_i) + x_i
where x_i is the input of the i-th basic block and x_(i+1) is the output of the i-th basic block, which is also the input of the (i+1)-th basic block.
Optionally, the basic block includes:
at least two convolutional layers; between the convolutional layers are a batch normalization layer, a nonlinear activation function, and a dropout layer, which are expressed by the following formula:
F_layer(x) = Dropout(ReLU(BN(f(x * w))))
where w denotes the weight of the convolution kernel, x denotes the input of the convolutional layer, f(x * w) denotes the output of the previous convolutional layer and also the input of the batch normalization layer, BN(f(x * w)) denotes the output of the batch normalization layer and also the input of the nonlinear activation function, ReLU(BN(f(x * w))) denotes the output of the nonlinear activation function and also the input of the dropout layer, and F_layer(x) denotes the output of the dropout layer and also the input of the next convolutional layer.
Optionally, the construction and training module 61 includes:
a two-stream-network training unit, configured so that, based on the two-stream network, the convolution kernel on the spatial stream slides along the spatial dimension and the softmax layer corresponding to the base network on the spatial stream yields the score of the spatial stream, the convolution kernel on the temporal stream slides along the time dimension and the softmax layer corresponding to the base network on the temporal stream yields the score of the temporal stream, and the score of the spatial stream is multiplied by the score of the temporal stream for end-to-end training;
a limb-separated-network training unit, configured to feed, based on the limb-separated network, the features of the five parts of the human body into the corresponding five base networks, obtain the scores of the five base networks, and multiply these scores together for training;
an attention-network training unit, configured so that, based on the attention network, the attention mechanism is placed behind a convolutional layer or a basic block, the attention mechanism being:
y_c1 = Activation(W_1 · x_ic + b_1)
y_c2 = W_2 · y_c1 + b_2
α_c = softmax(W_α · y_c2)
x_oc = F(x_ic)
O = Σ_c α_c · x_oc
where c denotes the c-th channel of the attention mechanism's input, x_ic denotes the output of the previous layer, W_1 denotes the weight of the first fully connected layer, b_1 denotes the bias of the first fully connected layer, y_c1 denotes the output of the first fully connected layer and also the input of the second, W_2 denotes the weight of the second fully connected layer, b_2 denotes the bias of the second fully connected layer, y_c2 denotes the output of the second fully connected layer and the input of the softmax layer, W_α denotes the learned attention weight, α_c denotes the normalized weight obtained by the softmax layer, x_oc denotes the output of the current layer of the attention mechanism, and O denotes the input of the next layer;
a frame-difference-network training unit, configured so that, based on the frame-difference network, the convolution kernel slides along the time dimension and the input of the frame-difference network is:
S_m = {M_2, M_3, …, M_t, …, M_N}
where S_m denotes the input of the frame-difference network, M_t = F_t − F_(t−1), F_t = {J_1, J_2, …, J_i, …, J_K} is the set of joint coordinates at frame t, J_i = (x_i, y_i, z_i), and N denotes the total number of frames in the video sequence.
For the two-stream network, the limb-separated network, the attention network, and the frame-difference network, the loss function used during training is the cross-entropy loss:
L = -Σ_(i=1..n) y_i · log(ŷ_i)
where y_i denotes the true class label, ŷ_i denotes the prediction, and n denotes the number of classes.
Optionally, the activity-recognition-result module 63 is further configured to:
fuse the initial recognition results corresponding to the sub-networks through the formula
y_test = ∏_(i=1..n) y_i   or   y_test = Σ_(i=1..n) y_i
to obtain the activity recognition result, where y_test denotes the activity recognition result, y_i denotes the initial recognition result of the i-th sub-network, and n denotes the number of sub-networks included in the recognition model.
It is clear to those skilled in the art that, for convenience and brevity of description, the division into the functional units and modules above is only illustrative. In practical applications, the above functions may be allocated to different functional units or modules as needed; that is, the internal structure of the terminal device may be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, may exist physically separately, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for the convenience of distinguishing them from each other and are not intended to limit the protection scope of this application. For the specific working processes of the units and modules in the above apparatus, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
Fig. 7 is a schematic block diagram of the terminal device provided by another embodiment of the present application. As shown in Fig. 7, the terminal device 7 of this embodiment includes one or more processors 70, a memory 71, and a computer program 72 stored in the memory 71 and executable on the processors 70. When executing the computer program 72, the processor 70 implements the steps in the above embodiments of the activity recognition method, such as steps S101 to S103 shown in Fig. 1; alternatively, when executing the computer program 72, the processor 70 implements the functions of the modules/units in the above terminal-device embodiments, such as the functions of modules 61 to 63 shown in Fig. 6.
Illustratively, the computer program 72 may be divided into one or more modules/units, which are stored in the memory 71 and executed by the processor 70 to complete the present application. The one or more modules/units may be a series of computer program instruction segments capable of completing specific functions, the instruction segments being used to describe the execution process of the computer program 72 in the terminal device 7. For example, the computer program 72 may be divided into a construction and training module, an initial-recognition-result module, and an activity-recognition-result module.
The building training module, for constructing the identification model including at least two sub-networks, and to the identification mould
Each sub-network in type is trained respectively;
The initial recognition result obtains module, for being identified by each sub-network to be identified after training
Video sequence obtains initial recognition result corresponding with each sub-network;
The Activity recognition result obtains module, will obtain behavior after the corresponding initial recognition result fusion of each sub-network
Recognition result.
Other modules or unit can refer to the description in embodiment shown in fig. 6, and details are not described herein.
The terminal device includes, but is not limited to, the processor 70 and the memory 71. Those skilled in the art will understand that Fig. 7 is only an example of the terminal device 7 and does not constitute a limitation on the terminal device 7; it may include more or fewer components than shown, combine certain components, or include different components. For example, the terminal device may further include an input device, an output device, a network access device, a bus, and the like.
The processor 70 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 71 may be an internal storage unit of the terminal device 7, such as a hard disk or memory of the terminal device 7. The memory 71 may also be an external storage device of the terminal device 7, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the terminal device 7. Further, the memory 71 may include both the internal storage unit and the external storage device of the terminal device 7. The memory 71 is configured to store the computer program and other programs and data required by the terminal device. The memory 71 may also be configured to temporarily store data that has been output or is to be output.
In the above embodiments, the description of each embodiment has its own emphasis. For a part not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the specific application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of this application.
In the embodiments provided in this application, it should be understood that the disclosed terminal device and method may be implemented in other ways. For example, the terminal device embodiments described above are merely illustrative; the division of the modules or units is only a logical function division, and there may be other division manners in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and the parts shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments of this application may also be completed by a computer program instructing relevant hardware. The computer program may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments described above can be implemented. The computer program includes computer program code, which may be in source code form, object code form, an executable file, certain intermediate forms, or the like. The computer-readable medium may include any entity or apparatus capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, a computer-readable medium does not include electrical carrier signals or telecommunication signals.
The above embodiments are only used to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments or make equivalent replacements for some of the technical features therein; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application, and shall all fall within the protection scope of this application.
Claims (10)
1. An activity recognition method, characterized by comprising:
constructing a recognition model including at least two sub-networks, and training each sub-network in the recognition model separately;
after training, recognizing a video sequence to be recognized through each sub-network to obtain an initial recognition result corresponding to each sub-network;
fusing the initial recognition results corresponding to the sub-networks to obtain an activity recognition result.
2. The activity recognition method according to claim 1, characterized in that the sub-networks include: a two-stream network, a limb separation network, an attention network, and a frame-difference network.
3. The activity recognition method according to claim 2, characterized in that:
the two-stream network includes a base network on a spatial stream with a softmax layer corresponding to that base network, and a base network on a temporal stream with a softmax layer corresponding to that base network;
the limb separation network includes five base networks and five softmax layers corresponding to the respective base networks, the five base networks corresponding respectively to five parts of the human body: the trunk, the left arm, the right arm, the left leg, and the right leg;
the attention network includes a base network fused with an attention mechanism, the attention mechanism including two fully connected layers and a softmax layer;
the frame-difference network includes a base network and a softmax layer corresponding to the base network.
4. The activity recognition method according to claim 3, characterized in that the base network includes:
a one-dimensional convolutional layer, at least two basic blocks, an average pooling layer, and a fully connected layer connected in sequence;
two adjacent basic blocks are connected by a residual connection, the residual connection being expressed by the following formula:
x_{i+1} = F_layer(x_i) + x_i
where x_i is the input of the i-th basic block, and x_{i+1} is the output of the i-th basic block as well as the input of the (i+1)-th basic block.
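The residual connection of claim 4 can be sketched as follows. This is a minimal numpy illustration, not the patent's implementation: `basic_block` stands in for the claimed convolution/batch-normalization/ReLU/dropout stack and is reduced here to a toy linear map.

```python
import numpy as np

def basic_block(x):
    # Stand-in for F_layer: in the claim this is a stack of convolution,
    # batch normalization, ReLU, and dropout layers.
    return 0.5 * x

def residual_step(x):
    # x_{i+1} = F_layer(x_i) + x_i  (claim 4's residual connection)
    return basic_block(x) + x

x0 = np.ones(4)
x1 = residual_step(x0)  # each element: 0.5 * 1 + 1 = 1.5
```

The skip path adds the block's input to its output, so the block only has to learn a residual correction rather than the full mapping.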
5. The activity recognition method according to claim 4, characterized in that a basic block includes:
at least two convolutional layers, with a batch normalization layer, a nonlinear activation function, and a dropout layer arranged between convolutional layers; the batch normalization layer, the nonlinear activation function, and the dropout layer are expressed by the following formula:
F_layer(x) = Dropout(ReLU(BN(f(x*w))))
where w denotes the weight of the convolution kernel, x denotes the input of a convolutional layer, f(x*w) denotes the output of the previous convolutional layer and also the input of the batch normalization layer, BN(f(x*w)) denotes the output of the batch normalization layer and also the input of the nonlinear activation function, ReLU(BN(f(x*w))) denotes the output of the nonlinear activation function and also the input of the dropout layer, and F_layer(x) denotes the output of the dropout layer and also the input of the next convolutional layer.
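The layer ordering of claim 5 can be sketched in numpy as follows. This is a simplified sketch, not the patent's layers: the convolution f(x*w) is reduced to a matrix product, BN has no learned scale/shift, and dropout is shown in inference mode (identity).

```python
import numpy as np

def bn(x, eps=1e-5):
    # Batch normalization over the batch axis, without learned scale/shift.
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def relu(x):
    return np.maximum(x, 0.0)

def dropout(x, p=0.5, training=False, rng=None):
    # Identity at inference; randomly zeroes and rescales activations in training.
    if not training:
        return x
    rng = rng or np.random.default_rng(0)
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

def f_layer(x, w):
    # F_layer(x) = Dropout(ReLU(BN(f(x*w)))); f(x*w) simplified to x @ w.
    return dropout(relu(bn(x @ w)))

x = np.random.default_rng(1).normal(size=(8, 3))   # batch of 8, 3 features
w = np.random.default_rng(2).normal(size=(3, 2))   # toy "kernel"
out = f_layer(x, w)                                # shape (8, 2), non-negative
```

ReLU is applied after normalization, so the output of each F_layer is non-negative before being passed to the next convolutional layer.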
6. The activity recognition method according to claim 3, characterized in that training each sub-network in the recognition model separately comprises:
for the two-stream network, sliding the convolution kernel on the spatial stream along the spatial dimension and obtaining a score on the spatial stream through the softmax layer corresponding to the base network on the spatial stream, sliding the convolution kernel on the temporal stream along the temporal dimension and obtaining a score on the temporal stream through the softmax layer corresponding to the base network on the temporal stream, and multiplying the score on the spatial stream by the score on the temporal stream for end-to-end training;
for the limb separation network, feeding the features of the five parts of the human body into the corresponding five base networks respectively, obtaining the scores corresponding to the five base networks, and multiplying those scores for training;
for the attention network, arranging the attention mechanism after a convolutional layer or a basic block, the attention mechanism being:
y_c1 = Activation(W_1 * x_ic + b_1)
y_c2 = W_2 * y_c1 + b_2
x_oc = F(x_ic)
where c denotes the c-th channel of the attention mechanism's input, x_ic denotes the output of the layer preceding the attention mechanism, W_1 denotes the weight of the first fully connected layer, b_1 denotes the bias of the first fully connected layer, y_c1 denotes the output of the first fully connected layer and the input of the second fully connected layer, W_2 denotes the weight of the second fully connected layer, b_2 denotes the bias of the second fully connected layer, y_c2 denotes the output of the second fully connected layer and the input of the softmax layer, W_α denotes the learned attention weight, α_c denotes the normalized weight obtained by the softmax layer, and x_oc denotes the output of the attention mechanism and the input of the layer following the attention mechanism;
for the frame-difference network, sliding the convolution kernel along the temporal dimension, the input of the frame-difference network being:
S_m = {M_2, M_3, …, M_t, …, M_N}
where S_m denotes the input of the frame-difference network, M_t = F_t − F_{t−1}, F_t = {J_1, J_2, …, J_i, …, J_t}, J_i = (x_i, y_i, z_i), and N denotes the total number of frames in the video sequence;
for the two-stream network, the limb separation network, the attention network, and the frame-difference network, the loss function used in training is the cross-entropy loss function:
Loss = −Σ_{i=1}^{n} y_i · log(ŷ_i)
where y_i denotes the true class label, ŷ_i denotes the predicted label, and n denotes the number of classes.
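The frame-difference input of claim 6 (M_t = F_t − F_{t−1} over N frames of 3-D joints) and the cross-entropy loss can be sketched as follows. The joint count, frame count, and score values are arbitrary illustrations, not values from the patent.

```python
import numpy as np

def frame_difference_input(frames):
    # frames: (N, J, 3) array of N skeleton frames, each with J joints
    # J_i = (x_i, y_i, z_i).  Returns S_m = {M_2, ..., M_N}, M_t = F_t - F_{t-1}.
    return frames[1:] - frames[:-1]

def cross_entropy(y_true, y_pred, eps=1e-12):
    # Loss = -sum_i y_i * log(y_hat_i) over n classes (one-hot y_true).
    return -np.sum(y_true * np.log(y_pred + eps))

frames = np.arange(5 * 4 * 3, dtype=float).reshape(5, 4, 3)  # N=5 frames, J=4 joints
s_m = frame_difference_input(frames)                         # shape (4, 4, 3)

y_true = np.array([0.0, 1.0, 0.0])
y_pred = np.array([0.2, 0.7, 0.1])
loss = cross_entropy(y_true, y_pred)  # -log(0.7), about 0.357
```

Differencing consecutive frames removes static pose information and leaves per-joint motion, which is what the frame-difference sub-network's temporal convolution slides over.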
7. The activity recognition method according to any one of claims 1 to 6, characterized in that fusing the initial recognition results corresponding to the sub-networks to obtain the activity recognition result comprises:
fusing the initial recognition results corresponding to the sub-networks according to either of the two formulas to obtain the activity recognition result,
where y_test denotes the activity recognition result, y_i denotes the result of the i-th sub-network, and n denotes that the recognition model includes n sub-networks in total.
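The two fusion formulas referenced in claim 7 did not survive extraction from the source. Averaging and element-wise multiplication of the sub-network score vectors are two common fusion rules consistent with the surrounding definitions (y_test the fused result, y_i the i-th sub-network's scores, n sub-networks), so both are sketched below as assumptions, not as the patent's exact formulas.

```python
import numpy as np

def fuse_average(scores):
    # Assumed rule 1: y_test = (1/n) * sum_i y_i
    return np.mean(scores, axis=0)

def fuse_product(scores):
    # Assumed rule 2: y_test = prod_i y_i (element-wise)
    return np.prod(scores, axis=0)

# Softmax score vectors from n = 2 sub-networks over 3 classes (toy values).
scores = np.array([[0.2, 0.5, 0.3],
                   [0.1, 0.8, 0.1]])
avg = fuse_average(scores)    # [0.15, 0.65, 0.20]
prod = fuse_product(scores)   # [0.02, 0.40, 0.03]
label = int(np.argmax(avg))   # class 1 under either rule here
```

Averaging is more forgiving of one weak sub-network, while the product rule suppresses any class that a single sub-network scores near zero.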
8. A terminal device, characterized by comprising:
a construction-and-training module, configured to construct a recognition model including at least two sub-networks and to train each sub-network in the recognition model separately;
an initial-recognition-result obtaining module, configured to recognize, after training, a video sequence to be recognized through each sub-network to obtain an initial recognition result corresponding to each sub-network;
an activity-recognition-result obtaining module, configured to fuse the initial recognition results corresponding to the sub-networks to obtain an activity recognition result.
9. A terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, and when the computer program is executed by one or more processors, the steps of the method according to any one of claims 1 to 7 are implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810272399.7A CN110321761B (en) | 2018-03-29 | 2018-03-29 | Behavior identification method, terminal equipment and computer readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810272399.7A CN110321761B (en) | 2018-03-29 | 2018-03-29 | Behavior identification method, terminal equipment and computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110321761A true CN110321761A (en) | 2019-10-11 |
CN110321761B CN110321761B (en) | 2022-02-11 |
Family
ID=68110943
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810272399.7A Active CN110321761B (en) | 2018-03-29 | 2018-03-29 | Behavior identification method, terminal equipment and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110321761B (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110222653A (en) * | 2019-06-11 | 2019-09-10 | 中国矿业大学(北京) | A kind of skeleton data Activity recognition method based on figure convolutional neural networks |
CN111161306A (en) * | 2019-12-31 | 2020-05-15 | 北京工业大学 | Video target segmentation method based on motion attention |
CN111310707A (en) * | 2020-02-28 | 2020-06-19 | 山东大学 | Skeleton-based method and system for recognizing attention network actions |
CN111539290A (en) * | 2020-04-16 | 2020-08-14 | 咪咕文化科技有限公司 | Video motion recognition method and device, electronic equipment and storage medium |
CN112597824A (en) * | 2020-12-07 | 2021-04-02 | 深延科技(北京)有限公司 | Behavior recognition method and device, electronic equipment and storage medium |
CN112926453A (en) * | 2021-02-26 | 2021-06-08 | 电子科技大学 | Examination room cheating behavior analysis method based on motion feature enhancement and long-term time sequence modeling |
WO2023147778A1 (en) * | 2022-02-07 | 2023-08-10 | 北京字跳网络技术有限公司 | Action recognition method and apparatus, and electronic device and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8462996B2 (en) * | 2008-05-19 | 2013-06-11 | Videomining Corporation | Method and system for measuring human response to visual stimulus based on changes in facial expression |
CN106599789A (en) * | 2016-07-29 | 2017-04-26 | 北京市商汤科技开发有限公司 | Video class identification method and device, data processing device and electronic device |
US20170220854A1 (en) * | 2016-01-29 | 2017-08-03 | Conduent Business Services, Llc | Temporal fusion of multimodal data from multiple data acquisition systems to automatically recognize and classify an action |
CN107025420A (en) * | 2016-01-29 | 2017-08-08 | 中兴通讯股份有限公司 | The method and apparatus of Human bodys' response in video |
CN107679522A (en) * | 2017-10-31 | 2018-02-09 | 内江师范学院 | Action identification method based on multithread LSTM |
CN109522874A (en) * | 2018-12-11 | 2019-03-26 | 中国科学院深圳先进技术研究院 | Human motion recognition method, device, terminal device and storage medium |
- 2018
  - 2018-03-29 CN CN201810272399.7A patent/CN110321761B/en active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8462996B2 (en) * | 2008-05-19 | 2013-06-11 | Videomining Corporation | Method and system for measuring human response to visual stimulus based on changes in facial expression |
US20170220854A1 (en) * | 2016-01-29 | 2017-08-03 | Conduent Business Services, Llc | Temporal fusion of multimodal data from multiple data acquisition systems to automatically recognize and classify an action |
CN107025420A (en) * | 2016-01-29 | 2017-08-08 | 中兴通讯股份有限公司 | The method and apparatus of Human bodys' response in video |
CN106599789A (en) * | 2016-07-29 | 2017-04-26 | 北京市商汤科技开发有限公司 | Video class identification method and device, data processing device and electronic device |
CN107679522A (en) * | 2017-10-31 | 2018-02-09 | 内江师范学院 | Action identification method based on multithread LSTM |
CN109522874A (en) * | 2018-12-11 | 2019-03-26 | 中国科学院深圳先进技术研究院 | Human motion recognition method, device, terminal device and storage medium |
Non-Patent Citations (3)
Title |
---|
INWOONG LEE ET AL.: "Ensemble Deep Learning for Skeleton-Based Action Recognition Using Temporal Sliding LSTM Networks", 《2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV)》 * |
YANGYANG XU ET AL.: "NTU RGB+D: A large scale dataset for 3-D human activity analysis", 《2017 3RD IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATIONS (ICCC)》 * |
LI YANDI ET AL.: "Human behavior recognition algorithm based on decision-level fusion of spatial-temporal domain features", 《光学学报》 (Acta Optica Sinica) * |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110222653A (en) * | 2019-06-11 | 2019-09-10 | 中国矿业大学(北京) | A kind of skeleton data Activity recognition method based on figure convolutional neural networks |
CN110222653B (en) * | 2019-06-11 | 2020-06-16 | 中国矿业大学(北京) | Skeleton data behavior identification method based on graph convolution neural network |
CN111161306A (en) * | 2019-12-31 | 2020-05-15 | 北京工业大学 | Video target segmentation method based on motion attention |
CN111161306B (en) * | 2019-12-31 | 2023-06-02 | 北京工业大学 | Video target segmentation method based on motion attention |
CN111310707A (en) * | 2020-02-28 | 2020-06-19 | 山东大学 | Skeleton-based method and system for recognizing attention network actions |
CN111310707B (en) * | 2020-02-28 | 2023-06-20 | 山东大学 | Bone-based graph annotation meaning network action recognition method and system |
CN111539290A (en) * | 2020-04-16 | 2020-08-14 | 咪咕文化科技有限公司 | Video motion recognition method and device, electronic equipment and storage medium |
CN111539290B (en) * | 2020-04-16 | 2023-10-20 | 咪咕文化科技有限公司 | Video motion recognition method and device, electronic equipment and storage medium |
CN112597824A (en) * | 2020-12-07 | 2021-04-02 | 深延科技(北京)有限公司 | Behavior recognition method and device, electronic equipment and storage medium |
CN112926453A (en) * | 2021-02-26 | 2021-06-08 | 电子科技大学 | Examination room cheating behavior analysis method based on motion feature enhancement and long-term time sequence modeling |
CN112926453B (en) * | 2021-02-26 | 2022-08-05 | 电子科技大学 | Examination room cheating behavior analysis method based on motion feature enhancement and long-term time sequence modeling |
WO2023147778A1 (en) * | 2022-02-07 | 2023-08-10 | 北京字跳网络技术有限公司 | Action recognition method and apparatus, and electronic device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110321761B (en) | 2022-02-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110321761A (en) | A kind of Activity recognition method, terminal device and computer readable storage medium | |
CN110020620A (en) | Face identification method, device and equipment under a kind of big posture | |
US11710041B2 (en) | Feature map and weight selection method and accelerating device | |
Sun et al. | Lattice long short-term memory for human action recognition | |
CN110321910A (en) | Feature extracting method, device and equipment towards cloud | |
CN108830211A (en) | Face identification method and Related product based on deep learning | |
WO2020253852A1 (en) | Image identification method and device, identification model training method and device, and storage medium | |
WO2021057056A1 (en) | Neural architecture search method, image processing method and device, and storage medium | |
WO2021248859A1 (en) | Video classification method and apparatus, and device, and computer readable storage medium | |
CN109064428A (en) | A kind of image denoising processing method, terminal device and computer readable storage medium | |
CN108765278A (en) | A kind of image processing method, mobile terminal and computer readable storage medium | |
CN105631398A (en) | Method and apparatus for recognizing object, and method and apparatus for training recognizer | |
CN110263909A (en) | Image-recognizing method and device | |
CN110222718B (en) | Image processing method and device | |
CN108510982A (en) | Audio event detection method, device and computer readable storage medium | |
CN109117773A (en) | A kind of characteristics of image point detecting method, terminal device and storage medium | |
CN109584992A (en) | Exchange method, device, server, storage medium and sand play therapy system | |
CN112035671B (en) | State detection method and device, computer equipment and storage medium | |
CN111047022A (en) | Computing device and related product | |
Gao et al. | Natural scene recognition based on convolutional neural networks and deep Boltzmannn machines | |
CN110633624A (en) | Machine vision human body abnormal behavior identification method based on multi-feature fusion | |
CN110046941A (en) | A kind of face identification method, system and electronic equipment and storage medium | |
CN113191479A (en) | Method, system, node and storage medium for joint learning | |
CN109086871A (en) | Training method, device, electronic equipment and the computer-readable medium of neural network | |
WO2022183805A1 (en) | Video classification method, apparatus, and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||