CN107808150A - Human body video action recognition method, device, storage medium and processor - Google Patents
- Publication number: CN107808150A
- Application number: CN201711154691.0A
- Authority: CN (China)
- Prior art keywords: network model, neural network, video, convolution, target
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/23—Recognition of whole body movements, e.g. for sport training
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/49—Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
Abstract
The invention discloses a human body video action recognition method, device, storage medium and processor. The method includes: creating a first convolutional neural network model from preset full-channel three-dimensional convolution kernels; training the first convolutional neural network model on a preset action recognition data set to obtain a second convolutional neural network model; replacing at least part of the full-channel three-dimensional convolution layers in the first convolutional neural network model with single-channel three-dimensional convolution units to obtain a third convolutional neural network model; training the third convolutional neural network model using the preset action recognition data set and the second convolutional neural network model to obtain a target convolutional neural network model; and inputting a video to be recognized into the target convolutional neural network model to obtain a target recognition result. The invention solves the technical problems of low computational accuracy and poor computational efficiency in existing human action recognition methods.
Description
Technical field
The present invention relates to the field of video processing, and in particular to a human body video action recognition method, device, storage medium and processor.
Background
With the informatization of society and the rapid development of the web, videos of all kinds have emerged in large numbers, such as security surveillance videos, self-shot videos and network media videos. Intelligent motion analysis and recognition technology plays an important role in applications such as large-scale video retrieval, human-computer interaction, security monitoring and early warning, and video classification.
Conventional action recognition is carried out with techniques such as optical flow and dense trajectory analysis, using manually designed and selected features; it is computationally complex and has performance bottlenecks. With the breakthrough development of deep learning in the field of image classification, deep learning techniques have gradually penetrated the field of video action recognition. However, current human action recognition methods still suffer from low computational accuracy and poor computational efficiency.
No effective solution to the above problems has yet been proposed.
Summary of the invention
Embodiments of the present invention provide a human body video action recognition method, device, storage medium and processor, so as to at least solve the technical problems of low computational accuracy and poor computational efficiency in existing human action recognition methods.
According to one aspect of the embodiments of the present invention, a human body video action recognition method is provided. The method includes: creating a first convolutional neural network model from preset full-channel three-dimensional convolution kernels; training the first convolutional neural network model on a preset action recognition data set to obtain a second convolutional neural network model, where the second convolutional neural network model is the first convolutional neural network model after reaching a convergence state; replacing at least part of the full-channel three-dimensional convolution layers in the first convolutional neural network model with single-channel three-dimensional convolution units to obtain a third convolutional neural network model; training the third convolutional neural network model using the preset action recognition data set and the second convolutional neural network model to obtain a target convolutional neural network model, where the target convolutional neural network model is the third convolutional neural network model after reaching a convergence state; and inputting a video to be recognized into the target convolutional neural network model to obtain a target recognition result.
Further, before training the first convolutional neural network model on the preset action recognition data set, the method also includes: obtaining video data from a target video; segmenting the video data into multiple video clips, where each video clip contains only a single action category; and adding preset category labels to the video clips to obtain the preset action recognition data set.
Further, replacing at least part of the full-channel three-dimensional convolution layers in the first convolutional neural network model with single-channel three-dimensional convolution units includes: replacing the at least part of the full-channel three-dimensional convolution layers with single-channel three-dimensional convolution layers; and adding, in cascade after each single-channel three-dimensional convolution layer, a batch normalization layer, a nonlinear layer, a residual branch, a superposition unit and a 1x1 grouped convolution layer, to obtain the single-channel three-dimensional convolution unit.
Further, inputting the video to be recognized into the target convolutional neural network model to obtain the target recognition result includes: segmenting the video to be recognized to obtain multiple second video sequences of the same preset length; inputting the multiple second video sequences into the target convolutional neural network to obtain preliminary recognition results corresponding to the multiple second video sequences; and processing the preliminary recognition results according to a preset data processing mode to obtain the target recognition result, where the preset data processing mode includes at least one of: taking the extremum of the preliminary recognition results, taking the average of the preliminary recognition results, and taking a weighted sum of the preliminary recognition results.
According to another aspect of the embodiments of the present invention, a human body video action recognition device is also provided. The device includes: a creating unit, configured to create a first convolutional neural network model from preset full-channel three-dimensional convolution kernels; a first training unit, configured to train the first convolutional neural network model on a preset action recognition data set to obtain a second convolutional neural network model, where the second convolutional neural network model is the first convolutional neural network model after reaching a convergence state; a replacement unit, configured to replace at least part of the full-channel three-dimensional convolution layers in the first convolutional neural network model with single-channel three-dimensional convolution units to obtain a third convolutional neural network model; a second training unit, configured to train the third convolutional neural network model using the preset action recognition data set and the second convolutional neural network model to obtain a target convolutional neural network model, where the target convolutional neural network model is the third convolutional neural network model after reaching a convergence state; and a processing unit, configured to input a video to be recognized into the target convolutional neural network model to obtain a target recognition result.
Further, the device also includes: an acquiring unit, configured to obtain video data from a target video; a segmentation unit, configured to segment the video data into multiple video clips, where each video clip contains only a single action category; and an adding unit, configured to add preset category labels to the video clips to obtain the preset action recognition data set.
Further, the replacement unit includes: a replacing subunit, configured to replace the at least part of the full-channel three-dimensional convolution layers with single-channel three-dimensional convolution layers; and an adding subunit, configured to add, in cascade after each single-channel three-dimensional convolution layer, a batch normalization layer, a nonlinear layer, a residual branch, a superposition unit and a 1x1 grouped convolution layer, to obtain the single-channel three-dimensional convolution unit.
Further, the processing unit includes: a splitting subunit, configured to segment the video to be recognized to obtain multiple second video sequences of the same preset length; an input subunit, configured to input the multiple second video sequences into the target convolutional neural network to obtain preliminary recognition results corresponding to the multiple second video sequences; and a processing subunit, configured to process the preliminary recognition results according to a preset data processing mode to obtain the target recognition result, where the preset data processing mode includes at least one of: taking the extremum of the preliminary recognition results, taking the average of the preliminary recognition results, and taking a weighted sum of the preliminary recognition results.
According to yet another aspect of the embodiments of the present invention, a storage medium is provided. The storage medium includes a stored program, where, when the program runs, the device on which the storage medium is located is controlled to perform the above human body video action recognition method.
According to yet another aspect of the embodiments of the present invention, a processor is provided. The processor is configured to run a program, where the above human body video action recognition method is performed when the program runs.
In the embodiments of the present invention, a first convolutional neural network model is created from preset full-channel three-dimensional convolution kernels; the first convolutional neural network model is trained on a preset action recognition data set to obtain a second convolutional neural network model, where the second convolutional neural network model is the first convolutional neural network model after reaching a convergence state; at least part of the full-channel three-dimensional convolution layers in the first convolutional neural network model are replaced with single-channel three-dimensional convolution units to obtain a third convolutional neural network model; the third convolutional neural network model is trained using the preset action recognition data set and the second convolutional neural network model to obtain a target convolutional neural network model, where the target convolutional neural network model is the third convolutional neural network model after reaching a convergence state; and a video to be recognized is input into the target convolutional neural network model to obtain a target recognition result, thereby achieving the technical effect of improving the accuracy and efficiency of human action recognition.
Brief description of the drawings
The accompanying drawings described herein are used to provide a further understanding of the present invention and form a part of this application. The schematic embodiments of the present invention and their descriptions are used to explain the present invention and do not constitute an improper limitation of the present invention. In the drawings:
Fig. 1 is a schematic flowchart of an optional human body video action recognition method according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of another optional human body video action recognition method according to an embodiment of the present invention;
Fig. 3 is a schematic flowchart of yet another optional human body video action recognition method according to an embodiment of the present invention;
Fig. 4 is a schematic flowchart of still another optional human body video action recognition method according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an optional human body video action recognition device according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of an optional first convolutional neural network model according to an embodiment of the present invention.
Detailed description of the embodiments
In order that those skilled in the art may better understand the present solution, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative work shall fall within the scope of protection of the present invention.
It should be noted that the terms "first", "second" and the like in the description, the claims and the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "comprising" and "having" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device that contains a series of steps or units is not necessarily limited to the steps or units clearly listed, but may include other steps or units that are not clearly listed or that are inherent to such a process, method, product or device.
Embodiment 1
According to the embodiments of the present invention, an embodiment of a human body video action recognition method is provided. It should be noted that the steps illustrated in the flowcharts of the accompanying drawings may be executed in a computer system such as a set of computer-executable instructions, and although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in an order different from the order herein.
Fig. 1 is a schematic flowchart of an optional human body video action recognition method according to an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
Step S102: create a first convolutional neural network model from preset full-channel three-dimensional convolution kernels;
Step S104: train the first convolutional neural network model on a preset action recognition data set to obtain a second convolutional neural network model, where the second convolutional neural network model is the first convolutional neural network model after reaching a convergence state;
Step S106: replace at least part of the full-channel three-dimensional convolution layers in the first convolutional neural network model with single-channel three-dimensional convolution units to obtain a third convolutional neural network model;
Step S108: train the third convolutional neural network model using the preset action recognition data set and the second convolutional neural network model to obtain a target convolutional neural network model, where the target convolutional neural network model is the third convolutional neural network model after reaching a convergence state;
Step S110: input a video to be recognized into the target convolutional neural network model to obtain a target recognition result.
In the embodiments of the present invention, a first convolutional neural network model is created from preset full-channel three-dimensional convolution kernels; the first convolutional neural network model is trained on a preset action recognition data set to obtain a second convolutional neural network model, where the second convolutional neural network model is the first convolutional neural network model after reaching a convergence state; at least part of the full-channel three-dimensional convolution layers in the first convolutional neural network model are replaced with single-channel three-dimensional convolution units to obtain a third convolutional neural network model; the third convolutional neural network model is trained using the preset action recognition data set and the second convolutional neural network model to obtain a target convolutional neural network model, where the target convolutional neural network model is the third convolutional neural network model after reaching a convergence state; and a video to be recognized is input into the target convolutional neural network model to obtain a target recognition result, thereby achieving the technical effect of improving the accuracy and efficiency of human action recognition.
Optionally, the first convolutional neural network in step S102 includes: an input layer, three-dimensional convolution layers, three-dimensional pooling layers, nonlinear layers, a fully connected layer and an output layer. The input layer size is [H, W, 3, F], where H and W are the height and width of the input video respectively, and F is the number of image frames contained in the video. The three-dimensional pooling layers use the max-pooling function.
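The three-dimensional max-pooling operation mentioned above can be sketched as follows. This is a minimal illustrative implementation, not code from the patent; the [H, W, C, F] layout and the function name are assumptions chosen to match the input-layer description.

```python
import numpy as np

def max_pool_3d(x, pool):
    """Naive 3D max pooling over a feature map laid out as [H, W, C, F].

    pool = (ph, pw, pf): window sizes over height, width and frames.
    Channels are pooled independently. Assumes each dimension divides
    evenly by its window size.
    """
    H, W, C, F = x.shape
    ph, pw, pf = pool
    out = np.empty((H // ph, W // pw, C, F // pf))
    for i in range(H // ph):
        for j in range(W // pw):
            for t in range(F // pf):
                # Take the max over the spatial-temporal window, per channel.
                window = x[i*ph:(i+1)*ph, j*pw:(j+1)*pw, :, t*pf:(t+1)*pf]
                out[i, j, :, t] = window.max(axis=(0, 1, 3))
    return out
```

A production model would use a framework's pooling layer; this loop form only makes the window arithmetic explicit.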
Optionally, in step S104, the videos in the preset action recognition data set may be segmented into non-overlapping video sequences of length F and input into the first convolutional neural network model, which is trained by gradient descent with a cross-entropy error objective function.
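The segmentation into non-overlapping length-F sequences described above can be sketched as a short helper. The function name and the choice to discard a trailing remainder shorter than F are assumptions; the patent does not say how incomplete tails are handled.

```python
def split_into_clips(frames, clip_len):
    """Cut a sequence of frames into non-overlapping clips of length
    clip_len, discarding any trailing remainder shorter than clip_len."""
    return [frames[i:i + clip_len]
            for i in range(0, len(frames) - clip_len + 1, clip_len)]
```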
Optionally, in step S106, replacing at least part of the full-channel three-dimensional convolution layers in the first convolutional neural network model with single-channel three-dimensional convolution units includes: replacing the full-channel connection mode of the three-dimensional convolution kernels in the three-dimensional convolution layers with a single-channel connection mode to obtain single-channel three-dimensional convolution layers. Let the input feature map be X[h, w, c, f], the output feature map be Y[h1, w1, c, f1], and the convolution kernels be K[k, k, c, d], with stride 1 and bias vector b. Each output channel of the single-channel three-dimensional convolution is then computed only from the input channel with the same index:

Y[h1, w1, c, f1] = b[c] + Σ_{i=1..k} Σ_{j=1..k} Σ_{τ=1..d} K[i, j, c, τ] · X[h1+i−1, w1+j−1, c, f1+τ−1]

A batch normalization layer, a nonlinear layer, a residual branch, a superposition unit and a 1x1 grouped convolution layer are then added in cascade after the single-channel three-dimensional convolution layer.
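As a sketch of the single-channel connection mode described above — each output channel convolved only with the input channel of the same index, stride 1 — a naive NumPy implementation might look like this. The function name and the valid-convolution (no padding) boundary handling are assumptions.

```python
import numpy as np

def single_channel_conv3d(x, k, b):
    """Single-channel ("depthwise") 3D convolution: output channel c is
    computed only from input channel c, with stride 1 and no padding.

    x: input feature map, shape [H, W, C, F]
    k: per-channel kernels, shape [kh, kw, C, kf]
    b: bias vector, length C
    Returns y with shape [H-kh+1, W-kw+1, C, F-kf+1].
    """
    H, W, C, F = x.shape
    kh, kw, _, kf = k.shape
    y = np.empty((H - kh + 1, W - kw + 1, C, F - kf + 1))
    for c in range(C):
        for i in range(y.shape[0]):
            for j in range(y.shape[1]):
                for t in range(y.shape[3]):
                    # Correlate one channel's patch with that channel's kernel.
                    patch = x[i:i+kh, j:j+kw, c, t:t+kf]
                    y[i, j, c, t] = (patch * k[:, :, c, :]).sum() + b[c]
    return y
```

Because each channel is processed in isolation, the multiply count drops by a factor of C relative to a full-channel kernel of the same size, which is the computation saving the patent relies on.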
Optionally, in step S108, the videos in the preset action recognition data set may be segmented into non-overlapping video sequences of length F and input into both the second convolutional neural network model and the third convolutional neural network model, yielding soft labels and predicted output values respectively. The cross-entropy error between the predicted output values and the soft labels is computed, as is the cross-entropy error between the predicted output values and the true class labels of the videos; the two errors are summed with weights to obtain the overall error, and the model is trained by gradient descent.
Optionally, the human body video action recognition method in the embodiments of the present application constructs an action recognition convolutional neural network based on single-channel three-dimensional convolution units, which can exploit the temporal and spatial information in the input video simultaneously. Compared with traditional two-dimensional convolutional neural networks, it is better suited to processing video data and enhances action recognition accuracy.
Optionally, the single-channel three-dimensional convolution unit in the human body video action recognition method of the embodiments of the present application includes a single-channel three-dimensional convolution layer, a batch normalization layer, a nonlinear layer, a residual branch, a superposition unit and a 1x1 grouped convolution layer. Compared with the original three-dimensional convolution, the single-channel three-dimensional convolution reduces the amount of computation and the number of parameters, while the residual branch and the 1x1 grouped convolution layer effectively compensate for the accuracy loss caused by the parameter reduction, thereby solving the technical problems of low recognition accuracy and poor computational efficiency in existing action recognition technology.
Optionally, Fig. 2 is a schematic flowchart of another optional human body video action recognition method according to an embodiment of the present invention. As shown in Fig. 2, before step S104 is performed, that is, before the first convolutional neural network model is trained on the preset action recognition data set, the method may also include:
Step S202: obtain video data from a target video;
Step S204: segment the video data into multiple video clips, where each video clip contains only a single action category;
Step S206: add preset category labels to the video clips to obtain the preset action recognition data set.
Optionally, Fig. 3 is a schematic flowchart of yet another optional human body video action recognition method according to an embodiment of the present invention. As shown in Fig. 3, performing step S106, that is, replacing at least part of the full-channel three-dimensional convolution layers in the first convolutional neural network model with single-channel three-dimensional convolution units, includes:
Step S302: replace the at least part of the full-channel three-dimensional convolution layers with single-channel three-dimensional convolution layers;
Step S304: add, in cascade after each single-channel three-dimensional convolution layer, a batch normalization layer, a nonlinear layer, a residual branch, a superposition unit and a 1x1 grouped convolution layer, to obtain the single-channel three-dimensional convolution unit.
Optionally, Fig. 4 is a schematic flowchart of still another optional human body video action recognition method according to an embodiment of the present invention. As shown in Fig. 4, performing step S110, that is, inputting the video to be recognized into the target convolutional neural network model to obtain the target recognition result, includes:
Step S402: segment the video to be recognized to obtain multiple second video sequences of the same preset length;
Step S404: input the multiple second video sequences into the target convolutional neural network to obtain preliminary recognition results corresponding to the multiple second video sequences;
Step S406: process the preliminary recognition results according to a preset data processing mode to obtain the target recognition result, where the preset data processing mode includes at least one of: taking the extremum of the preliminary recognition results, taking the average of the preliminary recognition results, and taking a weighted sum of the preliminary recognition results.
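The three preset data processing modes for fusing per-clip results (extremum, average, weighted sum) can be sketched in one small helper; the interface and names are assumptions.

```python
import numpy as np

def fuse_clip_results(clip_scores, mode="mean", weights=None):
    """Fuse per-clip recognition scores, shape [num_clips, num_classes],
    into a video-level prediction using one of the preset modes:
    "max" (extremum), "mean" (average) or "weighted" (weighted sum)."""
    scores = np.asarray(clip_scores, dtype=float)
    if mode == "max":
        fused = scores.max(axis=0)
    elif mode == "mean":
        fused = scores.mean(axis=0)
    elif mode == "weighted":
        fused = (np.asarray(weights, dtype=float)[:, None] * scores).sum(axis=0)
    else:
        raise ValueError("unknown mode: " + mode)
    # Return the winning class index along with the fused score vector.
    return int(np.argmax(fused)), fused
```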
In the embodiments of the present invention, a first convolutional neural network model is created from preset full-channel three-dimensional convolution kernels; the first convolutional neural network model is trained on a preset action recognition data set to obtain a second convolutional neural network model, where the second convolutional neural network model is the first convolutional neural network model after reaching a convergence state; at least part of the full-channel three-dimensional convolution layers in the first convolutional neural network model are replaced with single-channel three-dimensional convolution units to obtain a third convolutional neural network model; the third convolutional neural network model is trained using the preset action recognition data set and the second convolutional neural network model to obtain a target convolutional neural network model, where the target convolutional neural network model is the third convolutional neural network model after reaching a convergence state; and a video to be recognized is input into the target convolutional neural network model to obtain a target recognition result, thereby achieving the technical effect of improving the accuracy and efficiency of human action recognition.
Embodiment 2
According to another aspect of the embodiments of the present invention, a human body video action recognition device is also provided. As shown in Fig. 5, the device includes:
a creating unit 501, configured to create a first convolutional neural network model from preset full-channel three-dimensional convolution kernels; a first training unit 503, configured to train the first convolutional neural network model on a preset action recognition data set to obtain a second convolutional neural network model, where the second convolutional neural network model is the first convolutional neural network model after reaching a convergence state; a replacement unit 505, configured to replace at least part of the full-channel three-dimensional convolution layers in the first convolutional neural network model with single-channel three-dimensional convolution units to obtain a third convolutional neural network model; a second training unit 507, configured to train the third convolutional neural network model using the preset action recognition data set and the second convolutional neural network model to obtain a target convolutional neural network model, where the target convolutional neural network model is the third convolutional neural network model after reaching a convergence state; and a processing unit 509, configured to input a video to be recognized into the target convolutional neural network model to obtain a target recognition result.
Optionally, Fig. 6 is a schematic structural diagram of an optional first convolutional neural network model according to an embodiment of the present invention. As shown in Fig. 6, the first convolutional neural network model includes an input layer, twelve full-channel three-dimensional convolution layers, five three-dimensional pooling layers, a two-dimensional convolution layer, a fully connected layer and an output layer. Specifically, the parameters of each layer in the first convolutional neural network model may be as follows: the input layer size is [H, W, 3, F], where H and W are the height and width of the input video respectively, and F is the number of image frames contained in the video. Optionally, H is set to 128, W is set to 171 and F is set to 16. The kernel size of the twelve full-channel three-dimensional convolution layers is 3x3x3 with stride [1, 1, 1], and their channel numbers are 16, 32, 64, 64, 64, 128, 128, 128, 256, 256, 512 and 512 respectively. The pooling sizes of the five three-dimensional pooling layers are [2, 2, 1], [2, 2, 2], [2, 2, 2], [2, 2, 2] and [2, 2, 3] respectively, using the max-pooling function.
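Under the stated layer parameters, the feature-map sizes can be traced through the five pooling stages. The sketch below assumes size-preserving padding in the 3x3x3 convolutions and ceil-mode pooling at borders — assumptions the patent does not state.

```python
import math

def trace_pool_sizes(h, w, f, pools):
    """Trace the [H, W, F] grid through a list of 3D pooling windows
    (ph, pw, pf), assuming the 3x3x3 convolutions preserve size and
    pooling rounds up at borders (ceil mode)."""
    sizes = [(h, w, f)]
    for ph, pw, pf in pools:
        h, w, f = math.ceil(h / ph), math.ceil(w / pw), math.ceil(f / pf)
        sizes.append((h, w, f))
    return sizes

# The five pooling layers from the text, for a 128 x 171 x 16-frame input.
POOLS = [(2, 2, 1), (2, 2, 2), (2, 2, 2), (2, 2, 2), (2, 2, 3)]
```

Note how the first pool keeps the temporal axis intact ([2, 2, 1]) so early layers retain motion information, while the last pool ([2, 2, 3]) collapses what remains of the frame axis.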
Optionally, the device may also include: an acquiring unit, configured to obtain video data from a target video; a cutting unit, configured to segment the video data into multiple video clips, where each video clip contains only a single action category; and an adding unit, configured to add preset category labels to the video clips to obtain the preset action recognition data set.
Optionally, the replacement unit includes: a replacing subunit, configured to replace the at least part of the full-channel three-dimensional convolution layers with single-channel three-dimensional convolution layers; and an adding subunit, configured to add, in cascade after each single-channel three-dimensional convolution layer, a batch normalization layer, a nonlinear layer, a residual branch, a superposition unit and a 1x1 grouped convolution layer, to obtain the single-channel three-dimensional convolution unit.
Optionally, the processing unit includes: a splitting subunit, configured to segment the video to be recognized to obtain multiple second video sequences of the same preset length; an input subunit, configured to input the multiple second video sequences into the target convolutional neural network to obtain preliminary recognition results corresponding to the multiple second video sequences; and a processing subunit, configured to process the preliminary recognition results according to a preset data processing mode to obtain the target recognition result, where the preset data processing mode includes at least one of: taking the extremum of the preliminary recognition results, taking the average of the preliminary recognition results, and taking a weighted sum of the preliminary recognition results.
According to yet another aspect of the embodiments of the present invention, a storage medium is provided. The storage medium includes a stored program, where, when the program runs, the device on which the storage medium is located is controlled to perform the human body video action recognition method in Embodiment 1 of the present application.
According to yet another aspect of the embodiments of the present invention, a processor is provided. The processor is configured to run a program, where the program, when running, performs the human body video action recognition method of Embodiment 1 of the present application.
In the embodiments of the present invention, a first convolutional neural network model is created according to preset full-channel three-dimensional convolution kernels; the first convolutional neural network model is trained on a preset action-recognition dataset to obtain a second convolutional neural network model, where the second convolutional neural network model is the first convolutional neural network model having reached a convergence state; at least some of the full-channel three-dimensional convolution layers in the first convolutional neural network model are replaced with single-channel three-dimensional convolution units to obtain a third convolutional neural network model; the third convolutional neural network model is trained according to the preset action-recognition dataset and the second convolutional neural network model to obtain a target convolutional neural network model, where the target convolutional neural network model is the third convolutional neural network model having reached a convergence state; and the video to be recognized is input into the target convolutional neural network model to obtain a target recognition result. This improves the accuracy and efficiency of human action recognition, thereby solving the technical problem of low computational accuracy and poor computational efficiency in existing human action recognition methods.
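The disclosure trains the third model "according to the preset action-recognition dataset and the second convolutional neural network model," but does not spell out how the converged second model participates. One plausible reading, sketched below purely as an assumption, is teacher-student training in which the third model's loss mixes the ground-truth label with the second model's output distribution:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def teacher_student_loss(student_logits, teacher_logits, label, alpha=0.5):
    """Hypothetical training objective for the third model: alpha weights the
    hard-label cross-entropy against a term matching the second (teacher)
    model's predictions. This interpretation is an assumption, not stated
    in the patent."""
    p_s = softmax(student_logits)
    p_t = softmax(teacher_logits)
    hard = -math.log(p_s[label])                             # vs. ground truth
    soft = -sum(t * math.log(s) for t, s in zip(p_t, p_s))   # vs. teacher
    return alpha * hard + (1 - alpha) * soft
```

Under this reading, a third model that agrees with both the label and the converged second model incurs a much lower loss than one that contradicts them.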
The serial numbers of the embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis; for parts not described in detail in a given embodiment, reference may be made to the relevant descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed technical content may be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division into units may be a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Furthermore, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, units, or modules, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a removable hard disk, a magnetic disk, or an optical disc.
The above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and modifications without departing from the principles of the present invention, and such improvements and modifications should also be regarded as falling within the protection scope of the present invention.
Claims (10)
- 1. A human body video action recognition method, characterized by comprising:
creating a first convolutional neural network model according to preset full-channel three-dimensional convolution kernels;
training the first convolutional neural network model according to a preset action-recognition dataset to obtain a second convolutional neural network model, wherein the second convolutional neural network model is the first convolutional neural network model having reached a convergence state;
replacing at least some of the full-channel three-dimensional convolution layers in the first convolutional neural network model with single-channel three-dimensional convolution units to obtain a third convolutional neural network model;
training the third convolutional neural network model according to the preset action-recognition dataset and the second convolutional neural network model to obtain a target convolutional neural network model, wherein the target convolutional neural network model is the third convolutional neural network model having reached a convergence state;
inputting a video to be recognized into the target convolutional neural network model to obtain a target recognition result.
- 2. The method according to claim 1, characterized in that, before training the first convolutional neural network model according to the preset action-recognition dataset, the method further comprises:
acquiring the video data of a target video;
splitting the video data into multiple video clips, wherein each video clip contains only a single action category;
adding preset category labels to the video clips to obtain the preset action-recognition dataset.
- 3. The method according to claim 1, characterized in that replacing at least some of the full-channel three-dimensional convolution layers in the first convolutional neural network model with single-channel three-dimensional convolution units comprises:
replacing the at least some full-channel three-dimensional convolution layers with single-channel three-dimensional convolution layers;
adding, after the single-channel three-dimensional convolution layer, a batch normalization layer, a nonlinear layer, a residual branch, a superposition unit, and a 1x1 grouped convolution layer, to obtain the single-channel three-dimensional convolution unit.
- 4. The method according to claim 1, characterized in that inputting the video to be recognized into the target convolutional neural network model to obtain the target recognition result comprises:
splitting the video to be recognized into multiple second video sequences of the same preset length;
inputting the multiple second video sequences into the target convolutional neural network model to obtain preliminary recognition results corresponding to the multiple second video sequences;
processing the preliminary recognition results according to a preset data-processing mode to obtain the target recognition result, wherein the preset data-processing mode comprises at least one of: taking the extreme value of the preliminary recognition results, taking the average of the preliminary recognition results, and computing a weighted sum of the preliminary recognition results.
- 5. A human body video action recognition device, characterized by comprising:
a creating unit, configured to create a first convolutional neural network model according to preset full-channel three-dimensional convolution kernels;
a first training unit, configured to train the first convolutional neural network model according to a preset action-recognition dataset to obtain a second convolutional neural network model, wherein the second convolutional neural network model is the first convolutional neural network model having reached a convergence state;
a replacement unit, configured to replace at least some of the full-channel three-dimensional convolution layers in the first convolutional neural network model with single-channel three-dimensional convolution units to obtain a third convolutional neural network model;
a second training unit, configured to train the third convolutional neural network model according to the preset action-recognition dataset and the second convolutional neural network model to obtain a target convolutional neural network model, wherein the target convolutional neural network model is the third convolutional neural network model having reached a convergence state;
a processing unit, configured to input a video to be recognized into the target convolutional neural network model to obtain a target recognition result.
- 6. The device according to claim 5, characterized in that the device further comprises:
an acquiring unit, configured to acquire the video data of a target video;
a splitting unit, configured to split the video data into multiple video clips, wherein each video clip contains only a single action category;
an adding unit, configured to add preset category labels to the video clips to obtain the preset action-recognition dataset.
- 7. The device according to claim 5, characterized in that the replacement unit comprises:
a replacing subunit, configured to replace the at least some full-channel three-dimensional convolution layers with single-channel three-dimensional convolution layers;
an adding subunit, configured to add, after the single-channel three-dimensional convolution layer, a batch normalization layer, a nonlinear layer, a residual branch, a superposition unit, and a 1x1 grouped convolution layer, to obtain the single-channel three-dimensional convolution unit.
- 8. The device according to claim 5, characterized in that the processing unit comprises:
a splitting subunit, configured to split the video to be recognized into multiple second video sequences of the same preset length;
an input subunit, configured to input the multiple second video sequences into the target convolutional neural network model to obtain preliminary recognition results corresponding to the multiple second video sequences;
a processing subunit, configured to process the preliminary recognition results according to a preset data-processing mode to obtain the target recognition result, wherein the preset data-processing mode comprises at least one of: taking the extreme value of the preliminary recognition results, taking the average of the preliminary recognition results, and computing a weighted sum of the preliminary recognition results.
- 9. A storage medium, characterized in that the storage medium comprises a stored program, wherein, when the program runs, a device on which the storage medium is located is controlled to perform the human body video action recognition method according to any one of claims 1 to 4.
- 10. A processor, characterized in that the processor is configured to run a program, wherein, when the program runs, the human body video action recognition method according to any one of claims 1 to 4 is performed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711154691.0A CN107808150A (en) | 2017-11-20 | 2017-11-20 | The recognition methods of human body video actions, device, storage medium and processor |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711154691.0A CN107808150A (en) | 2017-11-20 | 2017-11-20 | The recognition methods of human body video actions, device, storage medium and processor |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107808150A true CN107808150A (en) | 2018-03-16 |
Family
ID=61580278
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711154691.0A Pending CN107808150A (en) | 2017-11-20 | 2017-11-20 | The recognition methods of human body video actions, device, storage medium and processor |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107808150A (en) |
- 2017-11-20: CN application CN201711154691.0A filed; published as CN107808150A/en; status: Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110182469A1 (en) * | 2010-01-28 | 2011-07-28 | Nec Laboratories America, Inc. | 3d convolutional neural networks for automatic human action recognition |
US20110222724A1 (en) * | 2010-03-15 | 2011-09-15 | Nec Laboratories America, Inc. | Systems and methods for determining personal characteristics |
CN104866810A (en) * | 2015-04-10 | 2015-08-26 | 北京工业大学 | Face recognition method of deep convolutional neural network |
WO2017164478A1 (en) * | 2016-03-25 | 2017-09-28 | 한국과학기술원 | Method and apparatus for recognizing micro-expressions through deep learning analysis of micro-facial dynamics |
CN106845549A (en) * | 2017-01-22 | 2017-06-13 | 珠海习悦信息技术有限公司 | A kind of method and device of the scene based on multi-task learning and target identification |
CN107316079A (en) * | 2017-08-08 | 2017-11-03 | 珠海习悦信息技术有限公司 | Processing method, device, storage medium and the processor of terminal convolutional neural networks |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110705331A (en) * | 2018-07-09 | 2020-01-17 | 中国科学技术大学 | Sign language identification method and device |
CN110705331B (en) * | 2018-07-09 | 2023-03-24 | 中国科学技术大学 | Sign language recognition method and device |
CN109063824A (en) * | 2018-07-25 | 2018-12-21 | 深圳市中悦科技有限公司 | Creation method, device, storage medium and the processor of deep layer Three dimensional convolution neural network |
CN109063824B (en) * | 2018-07-25 | 2023-04-07 | 深圳市中悦科技有限公司 | Deep three-dimensional convolutional neural network creation method and device, storage medium and processor |
CN109214282A (en) * | 2018-08-01 | 2019-01-15 | 中南民族大学 | A kind of three-dimension gesture critical point detection method and system neural network based |
CN111382758B (en) * | 2018-12-28 | 2023-12-26 | 杭州海康威视数字技术股份有限公司 | Training image classification model, image classification method, device, equipment and medium |
CN111382758A (en) * | 2018-12-28 | 2020-07-07 | 杭州海康威视数字技术股份有限公司 | Training image classification model, image classification method, device, equipment and medium |
CN109829398A (en) * | 2019-01-16 | 2019-05-31 | 北京航空航天大学 | A kind of object detection method in video based on Three dimensional convolution network |
CN109829398B (en) * | 2019-01-16 | 2020-03-31 | 北京航空航天大学 | Target detection method in video based on three-dimensional convolution network |
CN110070867A (en) * | 2019-04-26 | 2019-07-30 | 珠海普林芯驰科技有限公司 | Voice instruction recognition method, computer installation and computer readable storage medium |
CN110070867B (en) * | 2019-04-26 | 2022-03-11 | 珠海普林芯驰科技有限公司 | Speech instruction recognition method, computer device and computer-readable storage medium |
CN110287820B (en) * | 2019-06-06 | 2021-07-23 | 北京清微智能科技有限公司 | Behavior recognition method, device, equipment and medium based on LRCN network |
CN110287820A (en) * | 2019-06-06 | 2019-09-27 | 北京清微智能科技有限公司 | Activity recognition method, apparatus, equipment and medium based on LRCN network |
WO2020258498A1 (en) * | 2019-06-26 | 2020-12-30 | 平安科技(深圳)有限公司 | Football match behavior recognition method and apparatus based on deep learning, and terminal device |
CN111598026A (en) * | 2020-05-20 | 2020-08-28 | 广州市百果园信息技术有限公司 | Action recognition method, device, equipment and storage medium |
CN112257526A (en) * | 2020-10-10 | 2021-01-22 | 中国科学院深圳先进技术研究院 | Action identification method based on feature interactive learning and terminal equipment |
CN112997192A (en) * | 2021-02-03 | 2021-06-18 | 深圳市锐明技术股份有限公司 | Gesture recognition method and device, terminal device and readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107808150A (en) | The recognition methods of human body video actions, device, storage medium and processor | |
CN106570477B (en) | Vehicle cab recognition model building method and model recognizing method based on deep learning | |
Zhang et al. | Ga-net: Guided aggregation net for end-to-end stereo matching | |
CN105095862B (en) | A kind of human motion recognition method based on depth convolution condition random field | |
CN106157319B (en) | The conspicuousness detection method in region and Pixel-level fusion based on convolutional neural networks | |
KR102302725B1 (en) | Room Layout Estimation Methods and Techniques | |
CN105657402B (en) | A kind of depth map restoration methods | |
CN106204499B (en) | Removing rain based on single image method based on convolutional neural networks | |
CN107292256A (en) | Depth convolved wavelets neutral net expression recognition method based on secondary task | |
CN106709461A (en) | Video based behavior recognition method and device | |
CN109376681A (en) | A kind of more people's Attitude estimation method and system | |
CN109871781A (en) | Dynamic gesture identification method and system based on multi-modal 3D convolutional neural networks | |
CN110210603A (en) | Counter model construction method, method of counting and the device of crowd | |
CN111626184B (en) | Crowd density estimation method and system | |
CN110781893B (en) | Feature map processing method, image processing method, device and storage medium | |
CN108062562A (en) | A kind of object recognition methods and device again | |
CN109272509A (en) | A kind of object detection method of consecutive image, device, equipment and storage medium | |
CN107292250A (en) | A kind of gait recognition method based on deep neural network | |
CN107784322A (en) | Abnormal deviation data examination method, device, storage medium and program product | |
CN108121931A (en) | two-dimensional code data processing method, device and mobile terminal | |
CN107633226A (en) | A kind of human action Tracking Recognition method and system | |
CN110222760A (en) | A kind of fast image processing method based on winograd algorithm | |
CN115100574A (en) | Action identification method and system based on fusion graph convolution network and Transformer network | |
CN104881640A (en) | Method and device for acquiring vectors | |
CN112651360B (en) | Skeleton action recognition method under small sample |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
AD01 | Patent right deemed abandoned | Effective date of abandoning: 20220415 |