CN107808150A - Human body video action recognition method, apparatus, storage medium and processor - Google Patents

Human body video action recognition method, apparatus, storage medium and processor

Info

Publication number
CN107808150A
CN107808150A CN201711154691.0A
Authority
CN
China
Prior art keywords
network model
neural network
video
convolution
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711154691.0A
Other languages
Chinese (zh)
Inventor
周文明
王志鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhuhai Xi Yue Information Technology Co Ltd
Original Assignee
Zhuhai Xi Yue Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhuhai Xi Yue Information Technology Co Ltd
Priority to CN201711154691.0A
Publication of CN107808150A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/23Recognition of whole body movements, e.g. for sport training
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/49Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes

Abstract

The invention discloses a human body video action recognition method, apparatus, storage medium and processor. The method includes: creating a first convolutional neural network model from preset full-channel three-dimensional convolution kernels; training the first convolutional neural network model on a preset action recognition data set to obtain a second convolutional neural network model; replacing at least some of the full-channel three-dimensional convolution layers in the first convolutional neural network model with single-channel three-dimensional convolution units to obtain a third convolutional neural network model; training the third convolutional neural network model using the preset action recognition data set and the second convolutional neural network model to obtain a target convolutional neural network model; and inputting a video to be recognized into the target convolutional neural network model to obtain a target recognition result. The invention addresses the low accuracy and poor computational efficiency of existing human action recognition methods.

Description

Human body video action recognition method, apparatus, storage medium and processor
Technical field
The present invention relates to the field of video processing, and in particular to a human body video action recognition method, apparatus, storage medium and processor.
Background art
With the informatization of society and the growth of the web, videos of all kinds are proliferating: security-surveillance footage, self-shot videos, online media video, and so on. Intelligent motion analysis and recognition technology plays an important role in applications such as large-scale video retrieval, human-computer interaction, security monitoring and early warning, and video classification.
Conventional action recognition relies on techniques such as optical flow and dense trajectory analysis, with hand-engineered and hand-selected features; it is computationally complex and suffers from performance bottlenecks. With the breakthrough of deep learning in image classification, deep-learning techniques have gradually spread into video action recognition. However, current human action recognition methods still suffer from low accuracy and poor computational efficiency.
No effective solution to the above problems has yet been proposed.
Summary of the invention
Embodiments of the present invention provide a human body video action recognition method, apparatus, storage medium and processor, so as to at least solve the technical problem that existing human action recognition methods have low accuracy and poor computational efficiency.
According to one aspect of the embodiments of the present invention, a human body video action recognition method is provided. The method includes: creating a first convolutional neural network model from preset full-channel three-dimensional convolution kernels; training the first convolutional neural network model on a preset action recognition data set to obtain a second convolutional neural network model, the second convolutional neural network model being the first convolutional neural network model after it has converged; replacing at least some of the full-channel three-dimensional convolution layers in the first convolutional neural network model with single-channel three-dimensional convolution units to obtain a third convolutional neural network model; training the third convolutional neural network model using the preset action recognition data set and the second convolutional neural network model to obtain a target convolutional neural network model, the target convolutional neural network model being the third convolutional neural network model after it has converged; and inputting a video to be recognized into the target convolutional neural network model to obtain a target recognition result.
Further, before the first convolutional neural network model is trained on the preset action recognition data set, the method also includes: obtaining video data from a target video; splitting the video data into multiple video clips, each of which contains only a single action class; and adding preset category labels to the video clips to obtain the preset action recognition data set.
Further, replacing at least some of the full-channel three-dimensional convolution layers of the first convolutional neural network model with single-channel three-dimensional convolution units includes: replacing the at least some full-channel three-dimensional convolution layers with single-channel three-dimensional convolution layers; and appending, after each single-channel three-dimensional convolution layer, a batch normalization layer, a non-linear layer, a residual branch, a summation unit and a 1x1 grouped convolution layer, to obtain the single-channel three-dimensional convolution unit.
Further, inputting the video to be recognized into the target convolutional neural network model and obtaining the target recognition result includes: splitting the video to be recognized into multiple second video sequences of the same preset length; feeding the second video sequences into the target convolutional neural network to obtain a preliminary recognition result for each second video sequence; and processing the preliminary recognition results according to a preset data processing mode to obtain the target recognition result, where the preset data processing mode includes at least one of: taking the extremum of the preliminary recognition results, taking the average of the preliminary recognition results, and computing a weighted sum of the preliminary recognition results.
According to another aspect of the embodiments of the present invention, a human body video action recognition apparatus is also provided. The apparatus includes: a creation unit for creating a first convolutional neural network model from preset full-channel three-dimensional convolution kernels; a first training unit for training the first convolutional neural network model on a preset action recognition data set to obtain a second convolutional neural network model, the second convolutional neural network model being the first convolutional neural network model after it has converged; a replacement unit for replacing at least some of the full-channel three-dimensional convolution layers in the first convolutional neural network model with single-channel three-dimensional convolution units to obtain a third convolutional neural network model; a second training unit for training the third convolutional neural network model using the preset action recognition data set and the second convolutional neural network model to obtain a target convolutional neural network model, the target convolutional neural network model being the third convolutional neural network model after it has converged; and a processing unit for inputting a video to be recognized into the target convolutional neural network model to obtain a target recognition result.
Further, the apparatus also includes: an acquisition unit for obtaining video data from a target video; a segmentation unit for splitting the video data into multiple video clips, each of which contains only a single action class; and an adding unit for adding preset category labels to the video clips to obtain the preset action recognition data set.
Further, the replacement unit includes: a replacement subunit for replacing the at least some full-channel three-dimensional convolution layers with single-channel three-dimensional convolution layers; and an adding subunit for appending, after each single-channel three-dimensional convolution layer, a batch normalization layer, a non-linear layer, a residual branch, a summation unit and a 1x1 grouped convolution layer, to obtain the single-channel three-dimensional convolution unit.
Further, the processing unit includes: a splitting subunit for splitting the video to be recognized into multiple second video sequences of the same preset length; an input subunit for feeding the second video sequences into the target convolutional neural network to obtain a preliminary recognition result for each second video sequence; and a processing subunit for processing the preliminary recognition results according to a preset data processing mode to obtain the target recognition result, where the preset data processing mode includes at least one of: taking the extremum of the preliminary recognition results, taking the average of the preliminary recognition results, and computing a weighted sum of the preliminary recognition results.
According to yet another aspect of the embodiments of the present invention, a storage medium is provided. The storage medium includes a stored program which, when run, controls the device on which the storage medium resides to perform the above human body video action recognition method.
According to yet another aspect of the embodiments of the present invention, a processor is provided. The processor is used to run a program which, when run, performs the above human body video action recognition method.
In the embodiments of the present invention, a first convolutional neural network model is created from preset full-channel three-dimensional convolution kernels; the first convolutional neural network model is trained on a preset action recognition data set to obtain a second convolutional neural network model, which is the first convolutional neural network model after it has converged; at least some of the full-channel three-dimensional convolution layers in the first convolutional neural network model are replaced with single-channel three-dimensional convolution units to obtain a third convolutional neural network model; the third convolutional neural network model is trained using the preset action recognition data set and the second convolutional neural network model to obtain a target convolutional neural network model, which is the third convolutional neural network model after it has converged; and a video to be recognized is input into the target convolutional neural network model to obtain a target recognition result. This improves both the accuracy and the efficiency of human action recognition.
Brief description of the drawings
The accompanying drawings described here are provided for a further understanding of the present invention and form part of this application. The schematic embodiments of the present invention and their descriptions are used to explain the present invention and do not constitute an improper limitation of it. In the drawings:
Fig. 1 is a schematic flowchart of an optional human body video action recognition method according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of another optional human body video action recognition method according to an embodiment of the present invention;
Fig. 3 is a schematic flowchart of yet another optional human body video action recognition method according to an embodiment of the present invention;
Fig. 4 is a schematic flowchart of yet another optional human body video action recognition method according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an optional human body video action recognition apparatus according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of an optional first convolutional neural network model according to an embodiment of the present invention.
Detailed description of the embodiments
To help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
It should be noted that the terms "first", "second", etc. in the description, the claims and the accompanying drawings are used to distinguish similar objects, not to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the invention described here can be implemented in orders other than those illustrated or described. In addition, the terms "comprising" and "having", and any variants of them, are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device comprising a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.
Embodiment 1
According to an embodiment of the present invention, an embodiment of a human body video action recognition method is provided. It should be noted that the steps illustrated in the flowcharts of the drawings may be performed in a computer system such as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps may be performed in an order different from the one shown or described here.
Fig. 1 is a schematic flowchart of an optional human body video action recognition method according to an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
Step S102: create a first convolutional neural network model from preset full-channel three-dimensional convolution kernels;
Step S104: train the first convolutional neural network model on a preset action recognition data set to obtain a second convolutional neural network model, the second convolutional neural network model being the first convolutional neural network model after it has converged;
Step S106: replace at least some of the full-channel three-dimensional convolution layers in the first convolutional neural network model with single-channel three-dimensional convolution units to obtain a third convolutional neural network model;
Step S108: train the third convolutional neural network model using the preset action recognition data set and the second convolutional neural network model to obtain a target convolutional neural network model, the target convolutional neural network model being the third convolutional neural network model after it has converged;
Step S110: input a video to be recognized into the target convolutional neural network model to obtain a target recognition result.
Optionally, the first convolutional neural network in step S102 includes an input layer, three-dimensional convolution layers, three-dimensional pooling layers, non-linear layers, a fully connected layer and an output layer. The input layer has size [H, W, 3, F], where H and W are the height and width of the input video and F is the number of image frames in the video. The three-dimensional pooling layers use the max-pooling function.
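The max-pooling used by the three-dimensional pooling layers can be sketched as follows. This is a minimal pure-Python illustration; the 2x2x2 window and stride of 2 are illustrative assumptions, not the patent's exact pooling parameters:

```python
def max_pool_3d(vol, win=2, stride=2):
    """Max-pool a 3D volume given as nested lists vol[z][y][x]."""
    D, H, W = len(vol), len(vol[0]), len(vol[0][0])
    out = []
    for z in range(0, D - win + 1, stride):
        plane = []
        for y in range(0, H - win + 1, stride):
            row = []
            for x in range(0, W - win + 1, stride):
                # Each output cell is the maximum over one win x win x win window.
                row.append(max(
                    vol[z + dz][y + dy][x + dx]
                    for dz in range(win)
                    for dy in range(win)
                    for dx in range(win)))
            plane.append(row)
        out.append(plane)
    return out

# 4x4x4 volume holding the values 0..63 in raster order.
vol = [[[z * 16 + y * 4 + x for x in range(4)] for y in range(4)] for z in range(4)]
pooled = max_pool_3d(vol)
print(pooled)  # 2x2x2 result; each cell is the max of one 2x2x2 window
```

In the real network the same operation runs per channel over the [H, W, F] feature volume, halving (or otherwise shrinking) each pooled dimension.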
Optionally, in step S104, the videos in the preset action recognition data set can be split into non-overlapping video sequences of length F, which are fed into the first convolutional neural network model. Training uses gradient descent, with cross-entropy error as the objective function.
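One gradient-descent step on the cross-entropy objective can be sketched as follows. A single softmax classifier over three action classes stands in for the full 3D CNN; the logits, target class and learning rate are illustrative assumptions:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(probs, target):
    return -math.log(probs[target])

logits = [1.0, 0.5, -0.5]   # toy network output for one clip
target = 2                  # ground-truth action class
lr = 0.5

probs = softmax(logits)
loss_before = cross_entropy(probs, target)

# Gradient of softmax cross-entropy w.r.t. the logits is probs - one_hot(target).
grad = [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]
logits = [l - lr * g for l, g in zip(logits, grad)]

loss_after = cross_entropy(softmax(logits), target)
print(loss_before > loss_after)  # the step reduces the loss
```

In practice the gradient is backpropagated through all the convolution layers, but the update rule per parameter is the same.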
Optionally, in step S106, replacing at least some of the full-channel three-dimensional convolution layers in the first convolutional neural network model with single-channel three-dimensional convolution units includes: replacing the full-channel connection mode of the three-dimensional convolution kernels in the convolution layers with a single-channel connection mode, obtaining single-channel three-dimensional convolution units. With input feature map X[h, w, c, f], output feature map Y[h1, w1, c, f1], convolution kernel K[k, k, c, d], stride 1 and bias vector b, the single-channel three-dimensional convolution output is Y[x, y, c, t] = b[c] + Σ_{i,j,u} K[i, j, c, u] · X[x+i, y+j, c, t+u], i.e. each channel c is convolved independently with its own kernel slice. After each single-channel three-dimensional convolution unit, a batch normalization layer, a non-linear layer, a residual branch, a summation unit and a 1x1 grouped convolution layer are appended.
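A runnable sketch of this single-channel (per-channel) three-dimensional convolution: each channel c of X is convolved only with slice c of the kernel, so channels never mix. The tiny shapes and values below are illustrative assumptions:

```python
def single_channel_conv3d(X, K, b):
    """X[h][w][c][f], K[k][k][c][d], bias b[c]; stride 1, no padding.
    Returns Y[h1][w1][c][f1] with h1=h-k+1, w1=w-k+1, f1=f-d+1."""
    h, w, c, f = len(X), len(X[0]), len(X[0][0]), len(X[0][0][0])
    k, d = len(K), len(K[0][0][0])
    h1, w1, f1 = h - k + 1, w - k + 1, f - d + 1
    # Initialize every output value with the channel's bias.
    Y = [[[[b[ch] for _ in range(f1)] for ch in range(c)]
          for _ in range(w1)] for _ in range(h1)]
    for y in range(h1):
        for x in range(w1):
            for ch in range(c):
                for t in range(f1):
                    for i in range(k):
                        for j in range(k):
                            for u in range(d):
                                # Only slice ch of the kernel touches channel ch.
                                Y[y][x][ch][t] += (K[i][j][ch][u] *
                                                   X[y + i][x + j][ch][t + u])
    return Y

# 2x2x1x2 input, 2x2x1x2 kernel of ones, zero bias -> one output value: the sum.
X = [[[[1.0, 2.0]], [[3.0, 4.0]]],
     [[[5.0, 6.0]], [[7.0, 8.0]]]]
K = [[[[1.0, 1.0]], [[1.0, 1.0]]],
     [[[1.0, 1.0]], [[1.0, 1.0]]]]
Y = single_channel_conv3d(X, K, [0.0])
print(Y)  # [[[[36.0]]]] : 1+2+...+8
```

A full-channel layer would instead sum over every input channel for each output channel, which is exactly the extra cost the replacement removes.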
Optionally, in step S108, the videos in the preset action recognition data set can be split into non-overlapping video sequences of length F and fed into both the second convolutional neural network model and the third convolutional neural network model, yielding soft labels and predicted output values respectively. The cross-entropy error between the predicted output values and the soft labels is computed, as is the cross-entropy error between the predicted output values and the true class labels of the videos; a weighted sum of the two gives the overall error, and training proceeds by gradient descent.
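The weighted-sum objective above can be sketched as follows: cross-entropy of the third network's prediction against the second network's soft label, plus cross-entropy against the true class label. The 0.5/0.5 weights and the probability values are illustrative assumptions, not values fixed by the patent:

```python
import math

def cross_entropy(pred, target_dist):
    """H(target_dist, pred) for two discrete distributions over the classes."""
    return -sum(t * math.log(p) for t, p in zip(target_dist, pred) if t > 0)

student_pred = [0.2, 0.7, 0.1]   # third network's predicted distribution
teacher_soft = [0.1, 0.8, 0.1]   # soft label from the converged second network
hard_label = [0.0, 1.0, 0.0]     # one-hot true action class

alpha = 0.5                      # assumed weighting between the two terms
total = (alpha * cross_entropy(student_pred, teacher_soft)
         + (1 - alpha) * cross_entropy(student_pred, hard_label))
print(round(total, 4))
```

Minimizing this combined error lets the smaller third network imitate the second network's outputs while still fitting the ground-truth labels.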
Optionally, the human body video action recognition method of this embodiment builds an action recognition convolutional neural network from single-channel three-dimensional convolution units, which can exploit the temporal and spatial information in the input video simultaneously. Compared with a conventional two-dimensional convolutional neural network, it is better suited to video data and yields higher action recognition accuracy.
Optionally, the single-channel three-dimensional convolution unit includes a single-channel three-dimensional convolution layer, a batch normalization layer, a non-linear layer, a residual branch, a summation unit and a 1x1 grouped convolution layer. Compared with the original three-dimensional convolution, the single-channel three-dimensional convolution reduces both computation and parameter count, while the residual branch and the 1x1 grouped convolution layer effectively compensate for the accuracy loss caused by the parameter reduction. This solves the low recognition accuracy and poor computational efficiency of existing action recognition techniques.
Optionally, Fig. 2 is a schematic flowchart of another optional human body video action recognition method according to an embodiment of the present invention. As shown in Fig. 2, before step S104 is performed, i.e. before the first convolutional neural network model is trained on the preset action recognition data set, the method can also include:
Step S202: obtain the video data from the target video;
Step S204: split the video data into multiple video clips, each of which contains only a single action class;
Step S206: add preset category labels to the video clips to obtain the preset action recognition data set.
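Steps S202 to S206 can be sketched as follows: cut a frame sequence at the points where the action changes, so each clip covers a single action class, then pair each clip with its category label. The frame ids and action names are illustrative assumptions:

```python
def split_by_action(frames, actions):
    """frames[i] carries action label actions[i]; cut at label changes."""
    clips, labels = [], []
    start = 0
    for i in range(1, len(frames) + 1):
        # Close the current clip at the end or when the action label changes.
        if i == len(frames) or actions[i] != actions[start]:
            clips.append(frames[start:i])
            labels.append(actions[start])
            start = i
    return list(zip(clips, labels))

frames = list(range(10))                        # stand-in frame ids 0..9
actions = ["walk"] * 4 + ["jump"] * 3 + ["sit"] * 3
dataset = split_by_action(frames, actions)
print([(len(clip), label) for clip, label in dataset])
# -> [(4, 'walk'), (3, 'jump'), (3, 'sit')]
```

In practice the cut points come from annotation rather than per-frame labels, but each resulting (clip, label) pair plays the same role in the training set.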
Optionally, Fig. 3 is a schematic flowchart of yet another optional human body video action recognition method according to an embodiment of the present invention. As shown in Fig. 3, performing step S106, i.e. replacing at least some of the full-channel three-dimensional convolution layers in the first convolutional neural network model with single-channel three-dimensional convolution units, includes:
Step S302: replace the at least some full-channel three-dimensional convolution layers with single-channel three-dimensional convolution layers;
Step S304: append, after each single-channel three-dimensional convolution layer, a batch normalization layer, a non-linear layer, a residual branch, a summation unit and a 1x1 grouped convolution layer, to obtain the single-channel three-dimensional convolution unit.
Optionally, Fig. 4 is a schematic flowchart of yet another optional human body video action recognition method according to an embodiment of the present invention. As shown in Fig. 4, performing step S110, i.e. inputting the video to be recognized into the target convolutional neural network model to obtain the target recognition result, includes:
Step S402: split the video to be recognized into multiple second video sequences of the same preset length;
Step S404: feed the second video sequences into the target convolutional neural network to obtain a preliminary recognition result for each second video sequence;
Step S406: process the preliminary recognition results according to a preset data processing mode to obtain the target recognition result, where the preset data processing mode includes at least one of: taking the extremum of the preliminary recognition results, taking the average of the preliminary recognition results, and computing a weighted sum of the preliminary recognition results.
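The three aggregation modes in step S406 can be sketched as follows: combine the per-clip preliminary scores for one class into a final score by taking the maximum, the mean, or a weighted sum. The clip scores and weights are illustrative assumptions:

```python
def aggregate(scores, mode="mean", weights=None):
    """Fuse per-clip scores for one class into a single video-level score."""
    if mode == "max":
        return max(scores)
    if mode == "mean":
        return sum(scores) / len(scores)
    if mode == "weighted":
        return sum(w * s for w, s in zip(weights, scores))
    raise ValueError("unknown mode: " + mode)

clip_scores = [0.6, 0.9, 0.75]   # one class's score from each second video sequence
print(aggregate(clip_scores, "max"))                        # largest clip score
print(aggregate(clip_scores, "mean"))                       # average clip score
print(aggregate(clip_scores, "weighted", [0.2, 0.5, 0.3]))  # weighted sum
```

Running this per class and taking the arg-max over the aggregated scores gives the target recognition result for the whole video.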
Embodiment 2
According to another aspect of the embodiments of the present invention, a human body video action recognition apparatus is also provided. As shown in Fig. 5, the apparatus includes:
a creation unit 501 for creating a first convolutional neural network model from preset full-channel three-dimensional convolution kernels; a first training unit 503 for training the first convolutional neural network model on a preset action recognition data set to obtain a second convolutional neural network model, the second convolutional neural network model being the first convolutional neural network model after it has converged; a replacement unit 505 for replacing at least some of the full-channel three-dimensional convolution layers in the first convolutional neural network model with single-channel three-dimensional convolution units to obtain a third convolutional neural network model; a second training unit 507 for training the third convolutional neural network model using the preset action recognition data set and the second convolutional neural network model to obtain a target convolutional neural network model, the target convolutional neural network model being the third convolutional neural network model after it has converged; and a processing unit 509 for inputting a video to be recognized into the target convolutional neural network model to obtain a target recognition result.
Optionally, Fig. 6 is a schematic structural diagram of an optional first convolutional neural network model according to an embodiment of the present invention. As shown in Fig. 6, the first convolutional neural network model includes an input layer, twelve full-channel three-dimensional convolution layers, five three-dimensional pooling layers, a two-dimensional convolution layer, a fully connected layer and an output layer. Specifically, the layer parameters of the first convolutional neural network model can be: the input layer has size [H, W, 3, F], where H and W are the height and width of the input video and F is the number of image frames in the video; optionally, H is set to 128, W to 171 and F to 16. The twelve full-channel three-dimensional convolution layers have kernel size 3x3x3, stride [1, 1, 1], and channel counts of 16, 32, 64, 64, 64, 128, 128, 128, 256, 256, 512 and 512 respectively. The five three-dimensional pooling layers have pooling sizes [2, 2, 1], [2, 2, 2], [2, 2, 2], [2, 2, 2] and [2, 2, 3] respectively, and use the max-pooling function.
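A back-of-the-envelope sketch of why the single-channel replacement shrinks this model: compare parameter counts of the twelve full-channel 3x3x3 layers above with the same layers in per-channel form followed by a 1x1 channel-mixing convolution (standing in for the patent's 1x1 grouped convolution, assumed here to use a single group for simplicity). Channel counts follow the layer list above; biases are omitted:

```python
K = 3  # 3x3x3 kernels throughout

def full_channel_params(cin, cout):
    # Every output channel looks at every input channel.
    return K * K * K * cin * cout

def single_channel_params(cin, cout):
    # Per-channel 3x3x3 conv on cin channels, then a 1x1 conv mixing cin -> cout.
    return K * K * K * cin + cin * cout

channels = [3, 16, 32, 64, 64, 64, 128, 128, 128, 256, 256, 512, 512]
full = sum(full_channel_params(a, b) for a, b in zip(channels, channels[1:]))
light = sum(single_channel_params(a, b) for a, b in zip(channels, channels[1:]))
print(full, light)  # the full-channel stack has roughly 25x the parameters
```

This rough count covers only the convolution weights, but it illustrates the order-of-magnitude reduction that motivates steps S106 and S108.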
Optionally, the apparatus can also include: an acquisition unit for obtaining video data from a target video; a segmentation unit for splitting the video data into multiple video clips, each of which contains only a single action class; and an adding unit for adding preset category labels to the video clips to obtain the preset action recognition data set.
Optionally, the replacement unit includes: a replacement subunit for replacing the at least some full-channel three-dimensional convolution layers with single-channel three-dimensional convolution layers; and an adding subunit for appending, after each single-channel three-dimensional convolution layer, a batch normalization layer, a non-linear layer, a residual branch, a summation unit and a 1x1 grouped convolution layer, to obtain the single-channel three-dimensional convolution unit.
Optionally, the processing unit includes: a splitting subunit for splitting the video to be recognized into multiple second video sequences of the same preset length; an input subunit for feeding the second video sequences into the target convolutional neural network to obtain a preliminary recognition result for each second video sequence; and a processing subunit for processing the preliminary recognition results according to a preset data processing mode to obtain the target recognition result, where the preset data processing mode includes at least one of: taking the extremum of the preliminary recognition results, taking the average of the preliminary recognition results, and computing a weighted sum of the preliminary recognition results.
According to another aspect of the embodiments of the present invention, a storage medium is further provided. The storage medium includes a stored program, wherein when the program runs, the device on which the storage medium is located is controlled to perform the human body video action recognition method in Embodiment 1 of the present application.
According to another aspect of the embodiments of the present invention, a processor is further provided. The processor is configured to run a program, wherein when the program runs, the human body video action recognition method in Embodiment 1 of the present application is performed.
In the embodiments of the present invention, a first convolutional neural network model is created according to a preset full-channel three-dimensional convolution kernel; the first convolutional neural network model is trained according to a preset action recognition data set to obtain a second convolutional neural network model, wherein the second convolutional neural network model is the first convolutional neural network model that has reached a convergence state; at least some of the full-channel three-dimensional convolution layers in the first convolutional neural network model are replaced with single-channel three-dimensional convolution units to obtain a third convolutional neural network model; the third convolutional neural network model is trained according to the preset action recognition data set and the second convolutional neural network model to obtain a target convolutional neural network model, wherein the target convolutional neural network model is the third convolutional neural network model that has reached a convergence state; and a video to be identified is input into the target convolutional neural network model to obtain a target recognition result. This achieves the purpose of improving human action recognition accuracy and efficiency, thereby solving the technical problem of low accuracy and low efficiency in existing human action recognition.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis. For a part that is not described in detail in a certain embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technical content may be implemented in other ways. The device embodiments described above are merely schematic. For example, the division of the units may be a division of logical functions, and other division manners are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, units or modules, and may be electrical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes various media capable of storing program code, such as a USB flash disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a removable hard disk, a magnetic disk or an optical disc.
The above are only preferred embodiments of the present invention. It should be noted that for those of ordinary skill in the art, several improvements and modifications may be made without departing from the principles of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (10)

  1. A human body video action recognition method, characterized by comprising:
    creating a first convolutional neural network model according to a preset full-channel three-dimensional convolution kernel;
    training the first convolutional neural network model according to a preset action recognition data set to obtain a second convolutional neural network model, wherein the second convolutional neural network model is the first convolutional neural network model that has reached a convergence state;
    replacing at least some full-channel three-dimensional convolution layers in the first convolutional neural network model with single-channel three-dimensional convolution units to obtain a third convolutional neural network model;
    training the third convolutional neural network model according to the preset action recognition data set and the second convolutional neural network model to obtain a target convolutional neural network model, wherein the target convolutional neural network model is the third convolutional neural network model that has reached a convergence state;
    inputting a video to be identified into the target convolutional neural network model to obtain a target recognition result.
  2. The method according to claim 1, characterized in that before training the first convolutional neural network model according to the preset action recognition data set, the method further comprises:
    obtaining video data in a target video;
    segmenting the video data into a plurality of video clips, wherein each video clip contains only a single action category;
    adding preset category labels to the video clips to obtain the preset action recognition data set.
  3. The method according to claim 1, characterized in that replacing the at least some full-channel three-dimensional convolution layers in the first convolutional neural network model with single-channel three-dimensional convolution units comprises:
    replacing the at least some full-channel three-dimensional convolution layers with single-channel three-dimensional convolution layers;
    cascading, after each single-channel three-dimensional convolution layer, a batch normalization layer, a nonlinear layer, a residual branch, an addition unit and a 1x1 grouped convolution layer to obtain the single-channel three-dimensional convolution unit.
  4. The method according to claim 1, characterized in that inputting the video to be identified into the target convolutional neural network model to obtain the target recognition result comprises:
    segmenting the video to be identified to obtain a plurality of second video sequences with the same preset length;
    inputting the plurality of second video sequences into the target convolutional neural network to obtain preliminary recognition results corresponding to the plurality of second video sequences;
    processing the preliminary recognition results according to a preset data processing mode to obtain the target recognition result, wherein the preset data processing mode includes at least one of the following: obtaining an extreme value of the preliminary recognition results, obtaining an average value of the preliminary recognition results, and performing a weighted summation of the preliminary recognition results.
  5. A human body video action recognition device, characterized by comprising:
    a creating unit, configured to create a first convolutional neural network model according to a preset full-channel three-dimensional convolution kernel;
    a first training unit, configured to train the first convolutional neural network model according to a preset action recognition data set to obtain a second convolutional neural network model, wherein the second convolutional neural network model is the first convolutional neural network model that has reached a convergence state;
    a replacement unit, configured to replace at least some full-channel three-dimensional convolution layers in the first convolutional neural network model with single-channel three-dimensional convolution units to obtain a third convolutional neural network model;
    a second training unit, configured to train the third convolutional neural network model according to the preset action recognition data set and the second convolutional neural network model to obtain a target convolutional neural network model, wherein the target convolutional neural network model is the third convolutional neural network model that has reached a convergence state;
    a processing unit, configured to input a video to be identified into the target convolutional neural network model to obtain a target recognition result.
  6. The device according to claim 5, characterized in that the device further comprises:
    an acquiring unit, configured to obtain video data in a target video;
    a segmentation unit, configured to segment the video data into a plurality of video clips, wherein each video clip contains only a single action category;
    an adding unit, configured to add preset category labels to the video clips to obtain the preset action recognition data set.
  7. The device according to claim 5, characterized in that the replacement unit comprises:
    a replacement subunit, configured to replace the at least some full-channel three-dimensional convolution layers with single-channel three-dimensional convolution layers;
    an adding subunit, configured to cascade, after each single-channel three-dimensional convolution layer, a batch normalization layer, a nonlinear layer, a residual branch, an addition unit and a 1x1 grouped convolution layer to obtain the single-channel three-dimensional convolution unit.
  8. The device according to claim 5, characterized in that the processing unit comprises:
    a segmentation subunit, configured to segment the video to be identified to obtain a plurality of second video sequences with the same preset length;
    an input subunit, configured to input the plurality of second video sequences into the target convolutional neural network to obtain preliminary recognition results corresponding to the plurality of second video sequences;
    a processing subunit, configured to process the preliminary recognition results according to a preset data processing mode to obtain the target recognition result, wherein the preset data processing mode includes at least one of the following: obtaining an extreme value of the preliminary recognition results, obtaining an average value of the preliminary recognition results, and performing a weighted summation of the preliminary recognition results.
  9. A storage medium, characterized in that the storage medium includes a stored program, wherein when the program runs, a device on which the storage medium is located is controlled to perform the human body video action recognition method according to any one of claims 1 to 4.
  10. A processor, characterized in that the processor is configured to run a program, wherein when the program runs, the human body video action recognition method according to any one of claims 1 to 4 is performed.
CN201711154691.0A 2017-11-20 2017-11-20 The recognition methods of human body video actions, device, storage medium and processor Pending CN107808150A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711154691.0A CN107808150A (en) 2017-11-20 2017-11-20 The recognition methods of human body video actions, device, storage medium and processor


Publications (1)

Publication Number Publication Date
CN107808150A true CN107808150A (en) 2018-03-16

Family

ID=61580278

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711154691.0A Pending CN107808150A (en) 2017-11-20 2017-11-20 The recognition methods of human body video actions, device, storage medium and processor

Country Status (1)

Country Link
CN (1) CN107808150A (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110182469A1 (en) * 2010-01-28 2011-07-28 Nec Laboratories America, Inc. 3d convolutional neural networks for automatic human action recognition
US20110222724A1 (en) * 2010-03-15 2011-09-15 Nec Laboratories America, Inc. Systems and methods for determining personal characteristics
CN104866810A (en) * 2015-04-10 2015-08-26 北京工业大学 Face recognition method of deep convolutional neural network
CN106845549A (en) * 2017-01-22 2017-06-13 珠海习悦信息技术有限公司 A kind of method and device of the scene based on multi-task learning and target identification
WO2017164478A1 (en) * 2016-03-25 2017-09-28 한국과학기술원 Method and apparatus for recognizing micro-expressions through deep learning analysis of micro-facial dynamics
CN107316079A (en) * 2017-08-08 2017-11-03 珠海习悦信息技术有限公司 Processing method, device, storage medium and the processor of terminal convolutional neural networks


Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110705331A (en) * 2018-07-09 2020-01-17 中国科学技术大学 Sign language identification method and device
CN110705331B (en) * 2018-07-09 2023-03-24 中国科学技术大学 Sign language recognition method and device
CN109063824A (en) * 2018-07-25 2018-12-21 深圳市中悦科技有限公司 Creation method, device, storage medium and the processor of deep layer Three dimensional convolution neural network
CN109063824B (en) * 2018-07-25 2023-04-07 深圳市中悦科技有限公司 Deep three-dimensional convolutional neural network creation method and device, storage medium and processor
CN109214282A (en) * 2018-08-01 2019-01-15 中南民族大学 A kind of three-dimension gesture critical point detection method and system neural network based
CN111382758B (en) * 2018-12-28 2023-12-26 杭州海康威视数字技术股份有限公司 Training image classification model, image classification method, device, equipment and medium
CN111382758A (en) * 2018-12-28 2020-07-07 杭州海康威视数字技术股份有限公司 Training image classification model, image classification method, device, equipment and medium
CN109829398A (en) * 2019-01-16 2019-05-31 北京航空航天大学 A kind of object detection method in video based on Three dimensional convolution network
CN109829398B (en) * 2019-01-16 2020-03-31 北京航空航天大学 Target detection method in video based on three-dimensional convolution network
CN110070867A (en) * 2019-04-26 2019-07-30 珠海普林芯驰科技有限公司 Voice instruction recognition method, computer installation and computer readable storage medium
CN110070867B (en) * 2019-04-26 2022-03-11 珠海普林芯驰科技有限公司 Speech instruction recognition method, computer device and computer-readable storage medium
CN110287820B (en) * 2019-06-06 2021-07-23 北京清微智能科技有限公司 Behavior recognition method, device, equipment and medium based on LRCN network
CN110287820A (en) * 2019-06-06 2019-09-27 北京清微智能科技有限公司 Activity recognition method, apparatus, equipment and medium based on LRCN network
WO2020258498A1 (en) * 2019-06-26 2020-12-30 平安科技(深圳)有限公司 Football match behavior recognition method and apparatus based on deep learning, and terminal device
CN111598026A (en) * 2020-05-20 2020-08-28 广州市百果园信息技术有限公司 Action recognition method, device, equipment and storage medium
CN112257526A (en) * 2020-10-10 2021-01-22 中国科学院深圳先进技术研究院 Action identification method based on feature interactive learning and terminal equipment
CN112997192A (en) * 2021-02-03 2021-06-18 深圳市锐明技术股份有限公司 Gesture recognition method and device, terminal device and readable storage medium

Similar Documents

Publication Publication Date Title
CN107808150A (en) The recognition methods of human body video actions, device, storage medium and processor
CN106570477B (en) Vehicle cab recognition model building method and model recognizing method based on deep learning
Zhang et al. Ga-net: Guided aggregation net for end-to-end stereo matching
CN105095862B (en) A kind of human motion recognition method based on depth convolution condition random field
CN106157319B (en) The conspicuousness detection method in region and Pixel-level fusion based on convolutional neural networks
KR102302725B1 (en) Room Layout Estimation Methods and Techniques
CN105657402B (en) A kind of depth map restoration methods
CN106204499B (en) Removing rain based on single image method based on convolutional neural networks
CN107292256A (en) Depth convolved wavelets neutral net expression recognition method based on secondary task
CN106709461A (en) Video based behavior recognition method and device
CN109376681A (en) A kind of more people's Attitude estimation method and system
CN109871781A (en) Dynamic gesture identification method and system based on multi-modal 3D convolutional neural networks
CN110210603A (en) Counter model construction method, method of counting and the device of crowd
CN111626184B (en) Crowd density estimation method and system
CN110781893B (en) Feature map processing method, image processing method, device and storage medium
CN108062562A (en) A kind of object recognition methods and device again
CN109272509A (en) A kind of object detection method of consecutive image, device, equipment and storage medium
CN107292250A (en) A kind of gait recognition method based on deep neural network
CN107784322A (en) Abnormal deviation data examination method, device, storage medium and program product
CN108121931A (en) two-dimensional code data processing method, device and mobile terminal
CN107633226A (en) A kind of human action Tracking Recognition method and system
CN110222760A (en) A kind of fast image processing method based on winograd algorithm
CN115100574A (en) Action identification method and system based on fusion graph convolution network and Transformer network
CN104881640A (en) Method and device for acquiring vectors
CN112651360B (en) Skeleton action recognition method under small sample

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
AD01 Patent right deemed abandoned

Effective date of abandoning: 20220415