CN108171134A - Operation action recognition method and device - Google Patents
Operation action recognition method and device
- Publication number
- CN108171134A CN108171134A CN201711387866.2A CN201711387866A CN108171134A CN 108171134 A CN108171134 A CN 108171134A CN 201711387866 A CN201711387866 A CN 201711387866A CN 108171134 A CN108171134 A CN 108171134A
- Authority
- CN
- China
- Prior art keywords
- action
- video
- identified
- operational motion
- deep learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G06V20/42—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
Abstract
The present invention provides an operation action recognition method and device. The method includes: obtaining a video clip to be recognized, where the clip contains one type of action; and recognizing the action type of the clip according to the clip and a pre-established action recognition model. The method and device provided by the invention can extract information layer by layer, from pixel-level raw data up to abstract semantic concepts; the extracted features have stronger expressive power than hand-engineered features, so operation actions can be recognized quickly and accurately.
Description
Technical field
The present invention relates to the field of machine vision and pattern recognition, and in particular to an operation action recognition method and device.
Background technology
Urban rail transit carries large-scale transport tasks within cities and between cities and their suburbs, and is an important component of modern urban public passenger transport systems, so ensuring its operational safety is particularly important. According to accident statistics for China's rail transit systems, human factors such as operating errors by train drivers account for a major portion of the causes of serious driving accidents. Therefore, monitoring train drivers in real time, detecting their operating errors early and issuing warnings and corrections is of great practical significance for reducing safety accidents and casualties.
Existing driver monitoring systems, however, are mostly used to monitor the driver's physical condition. For example, the dead-man system of high-speed trains can only crudely verify that the driver is alive; some wearable devices measure the driver's electrocardiogram and pulse signals to judge the driver's current working state, but such equipment seriously interferes with the driver's operation of the train. Because human motion is complex and uncertain, action recognition is a considerably harder problem, and at present no mature equipment can directly recognize the operation actions of train drivers.
In general action recognition, most methods are devoted to designing effective motion features and then classifying actions with those features. For example, the dense trajectories (DT) algorithm applies dynamic time warping (DTW) to the motion data, then extracts histograms of oriented gradients (HOG), histograms of optical flow (HOF) and motion boundary histograms (MBH), and finally encodes them to obtain a motion description feature for classification. The recognition accuracy of these methods depends on the quality of the motion features, and different scenes require different optimizations, so their generality is poor. In addition, recognition accuracy also depends on the dimensionality of the acquired data: three-dimensional motion data containing depth information, or data based on binocular vision, records more relative position information than ordinary monocular vision and is therefore easier to recognize, but the required sensors are also more complex and hard to install in a subway driver's cab.
Therefore, how to quickly identify the type of an operation action has become an urgent problem to be solved.
Summary of the invention
In view of the defects in the prior art, the present invention provides an operation action recognition method and device.
In a first aspect, the present invention provides an operation action recognition method, including:
obtaining a video clip to be recognized, where the video clip contains one type of action;
recognizing the action type of the video clip according to the clip and a pre-established action recognition model.
In a second aspect, the present invention provides an operation action recognition device, including:
an acquisition module for obtaining a video clip to be recognized, where the video clip contains one type of action;
a recognition module for recognizing the action type of the video clip according to the clip and a pre-established action recognition model.
The operation action recognition method and device provided by the invention are based on a deep learning network that fuses a 3D convolutional neural network with a long short-term memory (LSTM) network. Compared with traditional action recognition algorithms, deep learning extracts information layer by layer, from pixel-level raw data up to abstract semantic concepts, and the extracted features have stronger expressive power than hand-engineered ones, giving it a prominent advantage in image recognition. Moreover, a 3D convolutional neural network receives consecutive image frames and therefore captures more temporal information than a convolutional network that reads a single image. Finally, the LSTM network can cope with motions performed at different rates. On the basis of achieving motion detection, the network provided by the invention therefore has a clear structure, low complexity and end-to-end operation, greatly simplifying the recognition pipeline.
Description of the drawings
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of the operation action recognition method provided in an embodiment of the present invention;
Fig. 2 is a schematic diagram of the relative positions of the camera and the driver during action recognition provided in an embodiment of the present invention;
Fig. 3 is a schematic flowchart of the operation action recognition method provided in another embodiment of the present invention;
Fig. 4 is a schematic structural diagram of the deep learning network provided in an embodiment of the present invention;
Fig. 5 is a schematic diagram of the 3D convolution process of the deep learning network provided in another embodiment of the present invention;
Fig. 6 is a schematic structural diagram of the operation action recognition device provided in an embodiment of the present invention.
Detailed description of the embodiments
To make the purposes, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Fig. 1 is a schematic flowchart of the operation action recognition method provided in an embodiment of the present invention. As shown in Fig. 1, the method includes:
S101: obtaining a video clip to be recognized, where the video clip contains one type of action;
S102: recognizing the action type in the video clip according to the clip and a pre-established action recognition model.
Specifically, Fig. 2 is a schematic diagram of the relative positions of the camera and the driver during action recognition. As shown in Fig. 2, either a colour camera or an infrared vision sensor can be used in the embodiment of the present invention to capture the subway driver's working video; since the subway environment is dark, an infrared vision sensor is preferred in the present invention.
During acquisition, the person 2 is 0.8-1.2 m from the camera 1. To cope with illumination changes in the subway driver's cab, the camera 1 is a single infrared camera with a lens focal length of 55 mm and a shooting angle of 60°-90°; during shooting, the resolution of the video captured by the camera 1 is required to be at least 640*480.
A single infrared camera installed in the subway train shoots the driver's working video, and the captured video is processed to obtain a video clip to be recognized, where the clip contains one type of action.
In practice, to recognize the action type in a video, the clip to be recognized is input into the pre-established action recognition model, and the server computes and outputs the action type in the clip. The embodiment of the present invention builds a model that fuses a 3D convolutional neural network with an LSTM network: the 3D convolutional network receives the video input, while the LSTM extends the model's tolerance to actions performed at different rates. Finally, the action type of the video to be recognized is identified from the clip and the constructed action recognition model.
The operation action recognition method provided by the invention extracts information layer by layer, from pixel-level raw data up to abstract semantic concepts; the extracted features have stronger expressive power than hand-engineered features, so operation actions can be recognized quickly and accurately.
Optionally, the action recognition model is established as follows:
selecting videos of different types of operation actions from the acquired video to build an operation action database;
training the pre-established deep learning network model on the operation action database to determine the action recognition model.
On the basis of the above embodiment, Fig. 3 is a schematic flowchart of the operation action recognition method provided in another embodiment of the present invention. To recognize the action type in a video clip, the action recognition model must be established in advance, specifically as follows:
A single infrared camera continuously shoots the subway driver's working video, acquiring at least one week of footage. According to the train operation rules, correctly performed relevant operation actions are then screened and cut out, classified into N classes, and used to build the driver operation action database. When screening the video to build the database, each sample in an action class should be a video file, or set of video frames, containing only one action.
When training the network model, the model structure requires the input data, i.e. the sample videos, to be formatted.
For example, a sample i of the database belonging to class j should be a video containing one action; suppose it has a frames in total. It is first divided into ⌊a/16⌋ segments (⌊·⌋ denotes rounding down), each containing 16 frames; if the last segment has fewer than 16 frames, it is discarded. The resolution of each frame is adjusted to 128*128 by linear interpolation, building a 128*128*16 frame stream. Meanwhile, the label j of the sample is one-hot encoded (One-Hot Encoding) into an N*1 vector whose j-th element is 1 and whose other elements are all zero. Each frame stream of sample i is then bound to the label j, so one sample yields ⌊a/16⌋ inputs. During training, 80% of the samples serve as the training set, 10% as the validation set, and 10% as the test set.
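The formatting procedure above (16-frame clips with the short remainder discarded, one-hot labels, an 80/10/10 split) can be sketched in Python. The function names and the use of a shuffled split are illustrative assumptions, not details stated in the patent:

```python
import random

def split_into_clips(num_frames, clip_len=16):
    """Indices of complete 16-frame clips; a trailing clip shorter
    than clip_len is discarded, as the description specifies."""
    n_clips = num_frames // clip_len          # floor(a / 16)
    return [list(range(i * clip_len, (i + 1) * clip_len))
            for i in range(n_clips)]

def one_hot(label_j, num_classes):
    """N*1 label vector: element j is 1, all others are 0."""
    vec = [0] * num_classes
    vec[label_j] = 1
    return vec

def train_val_test_split(samples, seed=0):
    """80% training, 10% validation, 10% test (shuffle is an assumption)."""
    rng = random.Random(seed)
    s = samples[:]
    rng.shuffle(s)
    n = len(s)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return s[:n_train], s[n_train:n_train + n_val], s[n_train + n_val:]

# A 100-frame sample yields floor(100/16) = 6 clips; 4 frames are dropped.
clips = split_into_clips(100)
```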
Training on the driver operation action database yields a model usable for action classification, i.e. the action recognition model.
The operation action recognition method provided by the invention is based on a deep learning network that fuses a 3D convolutional neural network with an LSTM network. Compared with traditional action recognition algorithms, deep learning extracts information layer by layer, from pixel-level raw data up to abstract semantic concepts, and the extracted features have stronger expressive power than hand-engineered ones, giving it a prominent advantage in image recognition. Moreover, a 3D convolutional neural network receives consecutive image frames and therefore captures more temporal information than a convolutional network that reads a single image. Finally, the LSTM network can cope with motions performed at different rates. On the basis of achieving motion detection, the network provided by the invention therefore has a clear structure, low complexity and end-to-end operation, greatly simplifying the recognition pipeline.
Optionally, the deep learning network model includes a 3D convolutional neural network and an LSTM network. Optionally, the concrete structure of the deep learning network model includes: multiple convolutional layers, multiple pooling layers, a fully connected layer, a long short-term memory layer and a Softmax output layer.
On the basis of the above embodiment, Fig. 4 is a schematic structural diagram of the deep learning network provided in an embodiment of the present invention, and Fig. 5 is a schematic diagram of the 3D convolution process of the deep learning network provided in another embodiment. With reference to Figs. 4 and 5, a specific example illustrates the training of the deep learning network model. The network model contains 8 convolutional layers (1-8), 5 pooling layers (9-13), 1 fully connected layer (14), 1 long short-term memory layer (15) and 1 Softmax output layer (16).
The specific configuration of each layer is as follows:
Conv1→Pool1→Conv2→Pool2→Conv3a→Conv3b→Pool3→Conv4a→Conv4b→
Pool4→Conv5a→Conv5b→Pool5→fc6→lstm7→Softmax
Convolutional layer 1 receives a 128*128*16*1 input, where 128*128 is the width and height of the input images, 16 is the number of consecutive frames, and 1 indicates a single-channel picture. The convolution kernel size is 3*3*3; the weights are initialized from a normal distribution with mean 0 and variance 1; the stride is 1; the input boundary is zero-padded; and the activation function is the ReLU function, whose formula is as follows:
f(x) = max(0, x)
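A minimal illustration of the ReLU activation defined above, which zeroes negative activations and passes positive ones through unchanged:

```python
def relu(x):
    """f(x) = max(0, x)."""
    return max(0.0, x)

# Applied elementwise to a small activation vector.
activations = [relu(v) for v in (-3.0, 0.0, 2.5)]
```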
For an ordinary convolutional layer the input is a two-dimensional array, so the output of a single convolution kernel is a single feature map, which cannot finely extract features along the time dimension. Unlike an ordinary convolutional neural network, the convolution kernels of this network are three-dimensional; in the convolution process shown in Fig. 5, a kernel receives and processes several consecutive frames at once, capturing both the temporal and the spatial information of the sample, and the output, a set of feature maps, is called a feature volume. Finally, convolutional layer 1 outputs 64 feature volumes of 128*128*16*1.
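The 3D convolution described above can be illustrated with a naive single-channel sketch (pure Python, no padding, stride 1; all names and sizes here are illustrative, not the network's actual configuration):

```python
def conv3d_valid(volume, kernel):
    """Naive single-channel 3D convolution (no padding, stride 1).
    Each output value mixes a spatio-temporal neighbourhood, which is
    what lets a 3D kernel see several frames at once."""
    D, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kd, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for d in range(D - kd + 1):
        plane = []
        for i in range(H - kh + 1):
            row = []
            for j in range(W - kw + 1):
                acc = 0.0
                for a in range(kd):
                    for b in range(kh):
                        for c in range(kw):
                            acc += volume[d + a][i + b][j + c] * kernel[a][b][c]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out

# A 4-frame 5x5 input of ones with a 3*3*3 averaging kernel yields
# a 2-frame 3x3 feature volume of ones.
vol = [[[1.0] * 5 for _ in range(5)] for _ in range(4)]
ker = [[[1.0 / 27] * 3 for _ in range(3)] for _ in range(3)]
feat = conv3d_valid(vol, ker)
```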
Pooling layer 9 receives the 64 feature volumes of 128*128*16*1. Similar to the convolution process, the pooling kernel is three-dimensional, of size 2*2*1; it receives one feature volume at a time and performs max pooling. Pooling layer 9 therefore outputs 64 feature volumes of 64*64*16*1.
Convolutional layer 2 receives the 64 feature volumes of 64*64*16*1. The kernel size is 3*3*3; the weights are initialized from a normal distribution with mean 0 and variance 1; the stride is 1; the input boundary is zero-padded; the activation function is ReLU; and the final output is 128 feature volumes of 64*64*16*1.
Pooling layer 10 receives the 128 feature volumes of 64*64*16*1. The pooling kernel size is 2*2*2 and max pooling is performed, so pooling layer 10 outputs 128 feature volumes of 32*32*8*1.
Convolutional layer 3 receives the 128 feature volumes of 32*32*8*1. With the same configuration as above (3*3*3 kernels, normally initialized weights with mean 0 and variance 1, stride 1, zero-padded boundaries, ReLU activation), it outputs 256 feature volumes of 32*32*8*1.
Convolutional layer 4 receives the 256 feature volumes of 32*32*8*1 and, with the same configuration, outputs 256 feature volumes of 32*32*8*1.
Pooling layer 11 receives the 256 feature volumes of 32*32*8*1; with a 2*2*2 max-pooling kernel it outputs 256 feature volumes of 16*16*4*1.
Convolutional layer 5 receives the 256 feature volumes of 16*16*4*1 and, with the same configuration, outputs 512 feature volumes of 16*16*4*1.
Convolutional layer 6 receives the 512 feature volumes of 16*16*4*1 and, with the same configuration, outputs 512 feature volumes of 16*16*4*1.
Pooling layer 12 receives the 512 feature volumes of 16*16*4*1; with a 2*2*2 max-pooling kernel it outputs 512 feature volumes of 8*8*2*1.
Convolutional layer 7 receives the 512 feature volumes of 8*8*2*1 and, with the same configuration, outputs 512 feature volumes of 8*8*2*1.
Convolutional layer 8 receives the 512 feature volumes of 8*8*2*1 and, with the same configuration, outputs 512 feature volumes of 8*8*2*1.
Pooling layer 13 receives the 512 feature volumes of 8*8*2*1; with a 2*2*2 max-pooling kernel it outputs 512 feature volumes of 4*4*1*1.
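The feature-volume sizes reported for layers 1-13 can be checked with a small shape trace. The stated output sizes imply that each pooling layer uses a stride equal to its kernel size; that stride, and the same-padding behaviour of the 3*3*3 convolutions, are assumptions inferred from the listed dimensions:

```python
def pool(shape, kernel):
    """Max pooling with stride equal to the kernel size divides each
    spatial/temporal dimension by the corresponding kernel dimension."""
    return tuple(s // k for s, k in zip(shape, kernel))

# (width, height, frames); the zero-padded 3*3*3 convolutions preserve
# the shape, so only the pooling layers change it.
shape = (128, 128, 16)            # input frame stream
shape = pool(shape, (2, 2, 1))    # Pool1 -> (64, 64, 16)
shape = pool(shape, (2, 2, 2))    # Pool2 -> (32, 32, 8)
shape = pool(shape, (2, 2, 2))    # Pool3 -> (16, 16, 4)
shape = pool(shape, (2, 2, 2))    # Pool4 -> (8, 8, 2)
shape = pool(shape, (2, 2, 2))    # Pool5 -> (4, 4, 1)
```

The final (4, 4, 1) shape matches the 4*4*1*1 feature volumes that pooling layer 13 feeds to the fully connected layer.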
Fully connected layer 14 receives the 512 feature volumes of 4*4*1*1 as input. It has 4096 nodes; the weights are initialized from a normal distribution with mean 0 and variance 1, and the ReLU activation function is used. Fully connected layer 14 outputs 4096 feature values.
Long short-term memory layer 15 receives the 4096 feature values as input. It contains 4096 units, each with an input gate, a forget gate and an output gate, and outputs 1000 feature values to the Softmax layer 16; the weights are initialized from a normal distribution with mean 0 and variance 1. Although a 3D convolutional layer can receive temporal input, the judgements it can make along the time axis are relatively fixed, so it handles actions with unstable rates poorly. Long short-term memory, by contrast, is a kind of recurrent neural network over time that can process and predict events in a time series with relatively long and variable intervals and delays. The long short-term memory layer therefore outputs 1000 feature values to Softmax for action classification.
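The gating mechanism described above can be sketched for a single scalar LSTM unit; the weight values and their names are illustrative assumptions, not the patent's parameters:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One step of a single-unit LSTM cell: the input gate i, forget
    gate f and output gate o control how much of the new candidate g
    enters the cell state c and how much of c is exposed as output h."""
    i = sigmoid(w['wi'] * x + w['ui'] * h_prev + w['bi'])
    f = sigmoid(w['wf'] * x + w['uf'] * h_prev + w['bf'])
    o = sigmoid(w['wo'] * x + w['uo'] * h_prev + w['bo'])
    g = math.tanh(w['wg'] * x + w['ug'] * h_prev + w['bg'])
    c = f * c_prev + i * g        # cell state keeps long-range context
    h = o * math.tanh(c)          # hidden state is the unit's output
    return h, c

# Run a short feature sequence through the unit (weights are arbitrary).
w = {k: 0.5 for k in ('wi', 'ui', 'bi', 'wf', 'uf', 'bf',
                      'wo', 'uo', 'bo', 'wg', 'ug', 'bg')}
h, c = 0.0, 0.0
for x in (1.0, 0.5, -0.5):
    h, c = lstm_step(x, h, c, w)
```

Because the cell state carries information across steps with a learnable forget gate, the unit can respond to the same action performed quickly or slowly, which is the rate tolerance claimed above.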
The Softmax layer 16 has N nodes, each corresponding to one action type, and outputs the probability that the target belongs to that class. For node n, the Softmax formula is:
Pn = exp(yn) / Σk exp(yk), k = 1, ..., N
where the Softmax output Pn is the probability that the sample belongs to the n-th class, and yn is the value that node n obtains from the previous layer of the network.
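The Softmax computation can be sketched as follows; the max-shift is a standard numerical-stability step, an assumption not spelled out in the text:

```python
import math

def softmax(y):
    """Probability of each class n: exp(y_n) / sum_k exp(y_k),
    shifted by max(y) so the exponentials cannot overflow."""
    m = max(y)
    exps = [math.exp(v - m) for v in y]
    total = sum(exps)
    return [e / total for e in exps]

# The largest input score receives the largest probability.
probs = softmax([2.0, 1.0, 0.1])
```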
During training, a cross-entropy loss function is used. Taking numerical stability into account, the softmax loss for a sample i whose correct class is j is:
Li = -log(pij)
where pij is the model's predicted probability that sample i belongs to class j. If the model outputs pij = 1, the classification is correct and the sample contributes nothing to the loss. But if the classification is wrong, pij is less than 1 and the loss increases; training therefore optimizes the weights so that pij tends to 1, reducing the loss. Before training, the weights are randomly generated, so every class has probability 1/N and, without regularization, the loss approaches log N.
After an L1 regularization penalty over all samples is introduced, the loss function becomes:
L = -(1/B) Σi log(pij) + λ Σ|w|
where the second sum runs over all network weights. Training uses stochastic gradient descent (SGD); B is the batch size, with 30 samples per batch; the learning rate starts at 0.003 and is then halved after every 100,000 iterations, and each iteration back-propagates to update the weights of every layer of the network. The final gradient direction obtained from the loss function, the network output minus the one-hot label, is:
∂L/∂y = PN - Pi,N
where Pi,N is the one-hot label vector of sample i, of dimension N*1, with the j-th element 1 and the other elements 0, and PN is the network model's output probability of sample i over the N classes. Training stops once the variation of the loss stabilizes.
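The training quantities described above (the per-sample cross-entropy loss, the L1 penalty and the halving learning-rate schedule) can be sketched as follows; the regularization strength `lam` is an illustrative value, since the text does not state it:

```python
import math

def cross_entropy(probs, j):
    """-log of the predicted probability of the correct class j:
    zero when that probability is 1, growing as it falls."""
    return -math.log(probs[j])

def l1_penalty(weights, lam=1e-4):
    """L1 regularization adds lam * sum(|w|) to the loss."""
    return lam * sum(abs(w) for w in weights)

def learning_rate(iteration, base=0.003, halve_every=100_000):
    """Initial rate 0.003, halved after every 100,000 iterations."""
    return base * (0.5 ** (iteration // halve_every))

# An untrained N-class model outputs uniform probabilities 1/N,
# so the unregularized loss starts near log(N).
N = 5
loss0 = cross_entropy([1.0 / N] * N, j=2)
```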
After training is complete, the model can be used for action recognition. First, an infrared camera captures a segment of action, which is then input into the action recognition model; the model outputs a judgement, either a particular action class in the action library or "not any action in the library", thereby accomplishing action recognition.
Optionally, the videos of the different types of operation actions are obtained as follows: the original video is divided into a set of segments, each consisting of 16 consecutive frames, which are then input into the deep learning network model in sequence. The video contains both the temporal information of the sequence and the spatial position of the subject performing the action in each picture.
On the basis of the above embodiment, information such as the driver's spatial position in the picture helps the model handle actions performed at different movement rates, yielding an accurate action recognition result.
Optionally, the action types include at least: pointing (point-and-call) operation, pushing operation, pulling operation, safety-check operation and gesture operation.
On the basis of the above embodiment, the action types include at least the driver's pointing operation, pushing operation, pulling operation, safety-check operation and gesture operation, and these action types are stored in the operation action database.
The operation action recognition method provided in the embodiment of the present invention is based on a deep learning network that fuses a 3D convolutional neural network with an LSTM network. Compared with traditional action recognition algorithms, deep learning extracts information layer by layer, from pixel-level raw data up to abstract semantic concepts, and the extracted features have stronger expressive power than hand-engineered ones, giving it a prominent advantage in image recognition. Moreover, a 3D convolutional neural network receives consecutive image frames and therefore captures more temporal information than a convolutional network that reads a single image. Finally, the LSTM network can cope with motions performed at different rates. On the basis of achieving motion detection, the network provided by the invention therefore has a clear structure, low complexity and end-to-end operation, greatly simplifying the recognition pipeline.
Fig. 6 is a schematic structural diagram of the operation action recognition device provided in an embodiment of the present invention. As shown in Fig. 6, the device includes an acquisition module 10 and a recognition module 20, wherein:
the acquisition module 10 is used to obtain a video clip to be recognized, where the video clip contains one type of action;
the recognition module 20 is used to recognize the action type of the video clip according to the clip and the pre-established action recognition model.
The operation action recognition device provided in the embodiment of the present invention includes the acquisition module 10 and the recognition module 20: the acquisition module 10 obtains a video clip to be recognized containing one type of action, and the recognition module 20 recognizes the action type of the clip according to the clip and the pre-established action recognition model.
The operation action recognition device provided by the invention extracts information layer by layer, from pixel-level raw data up to abstract semantic concepts; the extracted features have stronger expressive power than hand-engineered features, so operation actions can be recognized quickly and accurately.
Optionally, the action recognition model is established as follows:
selecting videos of different types of operation actions from the acquired video to build an operation action database;
training the pre-established deep learning network model on the operation action database to determine the action recognition model.
On the basis of above-described embodiment, the flow diagram of operational motion discrimination method shown in Figure 3, to regarding
When type of action in frequency segment is identified, needs to establish action identifying model in advance, it is as follows specifically to establish process:
Using single thermal camera shooting subway driver work video free of discontinuities, the video of at least one week is acquired, then
According to train operation rules, screen and intercept out correct relevant operation action, and be classified as N classes, structure driver operation moves
Make database.Then when screening video structure database, the single sample in each action classification should be only dynamic comprising one
The video file of work or video frame intersection.
In the training network model, due to model structure requirement, it is desirable that by input data, that is, Sample video into row format
Change.
For example, sample i of the sample database belongs to class j and is a video containing one action; assume it has a total of a frames. The video is first divided into ⌊a/16⌋ segments (where ⌊·⌋ denotes rounding down), each containing 16 frames; if the last segment contains fewer than 16 frames, it is discarded. The resolution of every frame is adjusted to 128*128 by linear interpolation, building a 128*128*16 frame stream. Meanwhile, the label j of the sample is one-hot encoded (One-Hot Encoding), generating an N*1 vector whose j-th element is 1 and all other elements are 0. Each frame stream of sample i is then bound to the label j, so one sample becomes ⌊a/16⌋ inputs. During training, 80% of the samples are used as the training set, 10% as the validation set, and 10% as the test set.
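A minimal sketch of this formatting step follows. The function names and the bilinear-resize helper are illustrative assumptions; the patent itself specifies only the ⌊a/16⌋ segmentation, the 128×128 linear-interpolation resize, and the one-hot label:

```python
import numpy as np

def resize_bilinear(img, out_h, out_w):
    """Linear-interpolation resize of a 2-D grayscale frame."""
    in_h, in_w = img.shape
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, in_h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

def format_sample(frames, label_j, num_classes):
    """Format one sample video into (frame stream, one-hot label) inputs.

    frames: array of shape (a, H, W).  Yields floor(a/16) segments of
    16 frames each (a trailing short segment is discarded), each frame
    resized to 128x128, every segment bound to the same one-hot label.
    """
    n_segments = frames.shape[0] // 16   # floor(a/16)
    one_hot = np.zeros(num_classes, dtype=np.float32)
    one_hot[label_j] = 1.0               # N-dim vector, j-th element = 1
    inputs = []
    for s in range(n_segments):
        clip = frames[s * 16:(s + 1) * 16]
        clip = np.stack([resize_bilinear(f, 128, 128) for f in clip])
        inputs.append((clip, one_hot))
    return inputs
```

For a 40-frame sample this produces ⌊40/16⌋ = 2 inputs; the remaining 8 frames are discarded, matching the segmentation rule above.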
Training on the driver operational action database thus yields a model usable for action classification, i.e., the action recognition model.
The deep-learning-based operational action recognition device provided by the invention fuses a 3D convolutional neural network with a long short-term memory (LSTM) network. Compared with traditional action recognition algorithms, deep learning can extract information layer by layer, from pixel-level raw data up to abstract semantic concepts; the extracted features have stronger expressive power than hand-engineered features, which gives deep learning a prominent advantage in image recognition. In addition, a 3D convolutional neural network takes consecutive image frames as input, and therefore captures more temporal information than a convolutional neural network that reads only single images. Furthermore, the LSTM network can cope with motion occurring at different rates. The network provided by the invention therefore achieves action detection with a clear structure, low complexity, and end-to-end operation, greatly simplifying the recognition pipeline.
Optionally, the deep learning network model includes a 3D convolutional neural network and a long short-term memory network.
Optionally, the specific structure of the deep learning network model includes: multiple convolutional layers, multiple pooling layers, a fully connected layer, a long short-term memory layer, and a Softmax output layer.
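The claimed structure (convolutional and pooling layers, a fully connected layer, an LSTM layer, and a Softmax output) can be illustrated by tracing tensor shapes through one plausible configuration. The kernel size, padding, and number of conv+pool blocks below are assumptions — the patent does not specify them:

```python
def conv3d_out(shape, k=3, pad=1, stride=1):
    """Spatial output shape of a 3D convolution ('same' for k=3, pad=1)."""
    return tuple((x + 2 * pad - k) // stride + 1 for x in shape)

def pool3d_out(shape, k=2):
    """Spatial output shape of non-overlapping k*k*k pooling."""
    return tuple(x // k for x in shape)

def trace_clip(shape=(16, 128, 128), blocks=4):
    """Trace one 16-frame 128x128 clip through conv+pool blocks,
    returning the shape after each block."""
    history = [shape]
    for _ in range(blocks):
        shape = pool3d_out(conv3d_out(shape))
        history.append(shape)
    return history
```

Under these assumptions a 16×128×128 clip shrinks to 1×8×8 feature maps after four blocks; those would be flattened into the fully connected layer, the LSTM would then consume the per-segment features of the ⌊a/16⌋ segments in order, and the Softmax layer would output an N-way class distribution.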
The specific training process is introduced in the method embodiment and is not detailed again here.
The method and device provided by the invention work best when recognizing the actions of a person directly in front of the camera, and in-cab monitoring of a subway driver is exactly the kind of scene this algorithm handles well. In addition, the method and device are based on monocular infrared vision, so the equipment architecture is simple and easy to retrofit into a subway driver's cab. Since subway driver operation recognition systems are still immature at this stage, the method and device provided by the invention can supply a solution for subway driver violation recognition systems.
Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and of course also by hardware. Based on this understanding, the above technical solution, or the part of it that contributes over the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform the method described in each embodiment or in certain parts of the embodiments.
The device and system embodiments described above are merely schematic. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of this embodiment. Those of ordinary skill in the art can understand and implement this without creative labor.
Claims (10)
1. An operational action recognition method, characterized by comprising:
obtaining a video clip to be recognized, wherein the video clip to be recognized contains one type of action;
recognizing the type of action in the video clip to be recognized according to the video clip to be recognized and a pre-established action recognition model.
2. The method according to claim 1, characterized in that the action recognition model is established using the following steps:
selecting videos of different types of operational actions from the captured video, and establishing an operational action database;
training a pre-established deep learning network model according to the operational action database, and determining the action recognition model.
3. The method according to claim 2, characterized in that the deep learning network model comprises a 3D convolutional neural network and a long short-term memory network.
4. The method according to claim 3, characterized in that the specific structure of the deep learning network model comprises: multiple convolutional layers, multiple pooling layers, a fully connected layer, a long short-term memory layer, and a Softmax output layer.
5. The method according to claim 2, characterized in that the videos of different types of operational actions are acquired as follows:
the original video is divided into a set of segments of 16 consecutive frames each, which are then sequentially input into the deep learning network model; the video contains temporal information and the spatial position information, within the frames, of the subject performing the action.
6. The method according to claim 2, characterized in that the action types include at least: a pointing operation, a pushing operation, a pulling operation, a safety-check operation, and a gesture operation.
7. An operational action recognition device, characterized by comprising:
an acquisition module, configured to obtain a video clip to be recognized, wherein the video clip to be recognized contains one type of action;
a recognition module, configured to recognize the type of action in the video clip to be recognized according to the video clip to be recognized and a pre-established action recognition model.
8. The device according to claim 7, characterized in that the action recognition model is established using the following steps:
selecting videos of different types of operational actions from the captured video, and establishing an operational action database;
training a pre-established deep learning network model according to the operational action database, and determining the action recognition model.
9. The device according to claim 8, characterized in that the deep learning network model comprises a 3D convolutional neural network and a long short-term memory network.
10. The device according to claim 9, characterized in that the specific structure of the deep learning network model comprises: multiple convolutional layers, multiple pooling layers, a fully connected layer, a long short-term memory layer, and a Softmax output layer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711387866.2A CN108171134A (en) | 2017-12-20 | 2017-12-20 | A kind of operational motion discrimination method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108171134A true CN108171134A (en) | 2018-06-15 |
Family
ID=62523168
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108171134A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109164910A (en) * | 2018-07-05 | 2019-01-08 | 北京航空航天大学合肥创新研究院 | For the multiple signals neural network architecture design method of electroencephalogram |
CN109782906A (en) * | 2018-12-28 | 2019-05-21 | 深圳云天励飞技术有限公司 | A kind of gesture identification method of advertisement machine, exchange method, device and electronic equipment |
CN110009640A (en) * | 2018-11-20 | 2019-07-12 | 腾讯科技(深圳)有限公司 | Handle method, equipment and the readable medium of heart video |
CN110866427A (en) * | 2018-08-28 | 2020-03-06 | 杭州海康威视数字技术股份有限公司 | Vehicle behavior detection method and device |
CN111460971A (en) * | 2020-03-27 | 2020-07-28 | 北京百度网讯科技有限公司 | Video concept detection method and device and electronic equipment |
WO2021008018A1 (en) * | 2019-07-18 | 2021-01-21 | 平安科技(深圳)有限公司 | Vehicle identification method and device employing artificial intelligence, and program and storage medium |
TWI760769B (en) * | 2020-06-12 | 2022-04-11 | 國立中央大學 | Computing device and method for generating a hand gesture recognition model, and hand gesture recognition device |
CN114943324A (en) * | 2022-05-26 | 2022-08-26 | 中国科学院深圳先进技术研究院 | Neural network training method, human motion recognition method and device, and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106203283A (en) * | 2016-06-30 | 2016-12-07 | 重庆理工大学 | Based on Three dimensional convolution deep neural network and the action identification method of deep video |
CN106909887A (en) * | 2017-01-19 | 2017-06-30 | 南京邮电大学盐城大数据研究院有限公司 | A kind of action identification method based on CNN and SVM |
CN107273800A (en) * | 2017-05-17 | 2017-10-20 | 大连理工大学 | A kind of action identification method of the convolution recurrent neural network based on attention mechanism |
US20190251337A1 (en) * | 2017-02-06 | 2019-08-15 | Tencent Technology (Shenzhen) Company Limited | Facial tracking method and apparatus, storage medium, and electronic device |
Non-Patent Citations (1)
Title |
---|
QIN, Yang et al.: "Combination of 3DCNNs and LSTMs in Action Recognition and Its Application" (3DCNNs与LSTMs在行为识别中的组合及其应用), Measurement & Control Technology (《测控技术》) * |
Legal Events
Date | Code | Title | Description
---|---|---|---
2018-06-15 | PB01 | Publication | Application publication date: 20180615
| SE01 | Entry into force of request for substantive examination |
| RJ01 | Rejection of invention patent application after publication |