CN108171134A - Operation action recognition method and device - Google Patents

Operation action recognition method and device

Info

Publication number
CN108171134A
CN108171134A (application CN201711387866.2A / CN201711387866A)
Authority
CN
China
Prior art keywords
action
video
identified
operational motion
deep learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711387866.2A
Other languages
Chinese (zh)
Inventor
唐海川
李欣旭
龚明
孙帮成
田寅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CRRC Industry Institute Co Ltd
Original Assignee
CRRC Industry Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CRRC Industry Institute Co Ltd filed Critical CRRC Industry Institute Co Ltd
Priority to CN201711387866.2A priority Critical patent/CN108171134A/en
Publication of CN108171134A publication Critical patent/CN108171134A/en
Pending legal-status Critical Current

Classifications

    • G06V 40/20: Recognition of human movements or behaviour, e.g. gesture recognition
    • G06N 3/045: Neural networks; combinations of networks
    • G06N 3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/08: Neural network learning methods
    • G06V 20/42: Higher-level semantic classification of video scenes, e.g. of sport video content

Abstract

The present invention provides an operation action recognition method and device. The method includes: obtaining a video clip to be recognized, where the clip contains one type of action; and recognizing the action type of the clip according to the clip and a pre-established action recognition model. The method and device extract information layer by layer, from pixel-level raw data up to abstract semantic concepts; the features extracted in this way have stronger expressive power than hand-engineered features, so operation actions can be recognized quickly and accurately.

Description

Operation action recognition method and device
Technical field
The present invention relates to the field of machine vision and pattern recognition, and in particular to an operation action recognition method and device.
Background technology
Urban rail transit carries large-scale transport tasks within cities and between cities and their suburbs, and is an important component of modern urban public transport alongside railways and highways, so ensuring its operational safety is particularly important. According to accident statistics for China's rail transit systems, human factors such as operation errors by train drivers account for a major share of the causes of serious driving accidents. Monitoring train drivers in real time, detecting their operation errors, and issuing early warnings and corrections therefore has great practical significance for reducing safety accidents and casualties.
Existing driver monitoring systems, however, are mostly used to monitor the driver's physical condition. For example, the dead-man system on high-speed trains can only crudely confirm that the driver is alive; some wearable devices judge the driver's current working state by measuring ECG and pulse signals, but such equipment seriously interferes with the driver's operation of the train. Because human motion is complex and uncertain, action recognition is a considerably harder problem, and at present no mature equipment can directly recognize the operation actions of train drivers.
In general action recognition, most methods focus on designing effective motion features and then classifying motion with those features. The dense trajectories (DT) algorithm, for example, applies dynamic time warping (DTW) to the motion data, extracts histograms of oriented gradients (HOG), histograms of optical flow (HOF), and motion boundary histograms (MBH), and finally encodes them to obtain a motion description feature for classification. The recognition accuracy of these methods depends on the quality of the motion features, and different scenes require different optimizations, so their generality is poor. Recognition accuracy also depends on the dimensionality of the acquired data: motion data with depth information, or three-dimensional data from binocular vision, records more relative-position information than ordinary monocular vision and is easier to recognize, but the required sensors are more complex and hard to install in a metro driver's cab.
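The dense-trajectories pipeline mentioned above relies on dynamic time warping to align motions performed at different speeds. As a hedged illustration only (a minimal scalar-sequence version, not the patent's or the DT algorithm's actual implementation), DTW can be sketched as:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences:
    the cheapest monotone alignment under absolute-difference cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the best of the three predecessor alignments
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# A slowed-down copy of a motion still aligns at zero cost:
d = dtw_distance([0, 1, 2], [0, 1, 1, 2])  # d == 0.0
```

This rate tolerance is exactly what the invention later obtains instead from the long short-term memory network, without hand-designed features.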
How to devise a method that can quickly recognize the type of an operation action has therefore become an urgent problem to be solved.
Summary of the invention
In view of the defects in the prior art, the present invention provides an operation action recognition method and device.
In a first aspect, the present invention provides an operation action recognition method, including:
obtaining a video clip to be recognized, where the clip contains one type of action;
recognizing the action type of the video clip according to the clip and a pre-established action recognition model.
In a second aspect, the present invention provides an operation action recognition device, including:
an acquisition module for obtaining a video clip to be recognized, where the clip contains one type of action;
a recognition module for recognizing the action type of the video clip according to the clip and a pre-established action recognition model.
The operation action recognition method and device provided by the present invention are based on a deep learning network that fuses a 3D convolutional neural network with a long short-term memory (LSTM) network. Compared with traditional action recognition algorithms, deep learning extracts information layer by layer, from pixel-level raw data up to abstract semantic concepts, and the features extracted in this way have stronger expressive power than hand-engineered features, giving it a prominent advantage in image recognition. In addition, a 3D convolutional neural network takes consecutive image frames as input and thus captures more temporal information than a convolutional neural network that reads only single images, while the LSTM network copes with motions performed at different rates. On the basis of realizing motion detection, the network provided by the invention therefore has a clear structure and low complexity and runs end to end, greatly simplifying the recognition pipeline.
Description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the present invention, and those of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flow diagram of the operation action recognition method provided in an embodiment of the present invention;
Fig. 2 is a schematic diagram of the relative positions of the camera and the driver during action recognition according to an embodiment of the present invention;
Fig. 3 is a flow diagram of the operation action recognition method provided in another embodiment of the present invention;
Fig. 4 is a structural diagram of the deep learning network provided in an embodiment of the present invention;
Fig. 5 is a schematic diagram of the 3D convolution process of the deep learning network provided in another embodiment of the present invention;
Fig. 6 is a structural diagram of the operation action recognition device provided in an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Fig. 1 is a flow diagram of the operation action recognition method provided in an embodiment of the present invention. As shown in Fig. 1, the method includes:
S101: obtaining a video clip to be recognized, where the clip contains one type of action;
S102: recognizing the action type in the video clip according to the clip and a pre-established action recognition model.
Specifically, Fig. 2 is a schematic diagram of the relative positions of the camera and the driver during action recognition. In the embodiment of the present invention, a color camera or an infrared vision sensor may be used to capture video of the metro driver at work; since the metro environment is dark, an infrared vision sensor is preferred in the present invention.
During acquisition, the person 2 is 0.8-1.2 meters from the camera 1. To cope with illumination changes in the metro driver's cab, the camera 1 is a single infrared camera with a 55 mm lens focal length and a shooting angle of 60°-90°, and the resolution of the captured video is required to be at least 640*480.
A single infrared camera installed in the metro train shoots video of the driver at work, and the footage is processed to obtain the video clip to be recognized, where the clip contains one type of action.
When the action type in a video is actually recognized, the video clip to be recognized is input into the pre-established action recognition model, and the server, through computation and recognition, outputs the action type in the clip.
The embodiment of the present invention builds a model that fuses a 3D convolutional neural network with an LSTM network: the 3D convolutional neural network receives the video input, while the LSTM network extends the model's tolerance for actions performed at different rates. The action type of the video to be recognized is then identified according to the clip and the constructed action recognition model.
The operation action recognition method provided by the invention extracts information layer by layer, from pixel-level raw data up to abstract semantic concepts; the features extracted in this way have stronger expressive power than hand-engineered features, so operation actions can be recognized quickly and accurately.
Optionally, the action recognition model is established with the following steps:
selecting videos of different types of operation actions from the collected footage and building an operation action database;
training a pre-established deep learning network model on the operation action database to obtain the action recognition model.
Building on the above embodiment, Fig. 3 is a flow diagram of the operation action recognition method provided in another embodiment of the present invention. Before the action type in a video clip can be identified, the action recognition model must be established in advance, as follows:
A single infrared camera continuously shoots the metro driver at work, collecting at least one week of video. According to the train operation rules, correct and relevant operation actions are then screened out, clipped, and sorted into N classes to build the driver operation action database. When screening video to build the database, each single sample within an action class should be a video file, or set of video frames, containing only one action.
When training the network model, the model structure requires the input data, i.e., the sample videos, to be formatted.
For example, a sample i of the database belonging to class j should be a video containing one action; suppose it has a frames in total. It is first divided into ⌊a/16⌋ segments (⌊·⌋ denotes rounding down), each containing 16 frames; if the last segment has fewer than 16 frames, it is discarded. The resolution of each frame is adjusted to 128*128 by linear interpolation, building a 128*128*16 frame stream. Meanwhile, the label j of the sample is one-hot encoded into an N*1 vector whose j-th element is 1 and whose other elements are all zero. Each frame stream of sample i is then bound to the label j, so one sample yields ⌊a/16⌋ inputs. During training, 80% of the samples are used as the training set, 10% as the validation set, and 10% as the test set.
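The formatting step above can be sketched as follows. This is a simplified illustration that assumes the frames have already been resized to 128*128; `format_sample` is a hypothetical helper name, not code from the patent:

```python
import numpy as np

def format_sample(frames, label_j, num_classes):
    """Split one sample video into 16-frame clips and bind each clip
    to the one-hot label, dropping any trailing clip under 16 frames."""
    n_clips = len(frames) // 16                       # floor(a / 16)
    one_hot = np.zeros(num_classes)
    one_hot[label_j] = 1.0
    clips = [np.stack(frames[k * 16:(k + 1) * 16]) for k in range(n_clips)]
    return [(clip, one_hot) for clip in clips]

frames = [np.zeros((128, 128)) for _ in range(37)]    # a = 37 frames
pairs = format_sample(frames, label_j=2, num_classes=5)
# floor(37/16) = 2 inputs; the trailing 5 frames are discarded
```

Each `(clip, one_hot)` pair corresponds to one 128*128*16 frame stream bound to its label vector, as described above.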
Training on the driver operation action database then yields a model usable for motion classification, i.e., the action recognition model.
The operation action recognition method provided by the invention, based on a deep learning network, fuses a 3D convolutional neural network with an LSTM network. Compared with traditional action recognition algorithms, deep learning extracts information layer by layer, from pixel-level raw data up to abstract semantic concepts, and the features extracted in this way have stronger expressive power than hand-engineered features, giving it a prominent advantage in image recognition. In addition, a 3D convolutional neural network takes consecutive image frames as input and thus captures more temporal information than a convolutional neural network that reads only single images, while the LSTM network copes with motions performed at different rates. On the basis of realizing motion detection, the network provided by the invention therefore has a clear structure and low complexity and runs end to end, greatly simplifying the recognition pipeline.
Optionally, the deep learning network model includes a 3D convolutional neural network and an LSTM network. Optionally, the concrete structure of the deep learning network model includes: multiple convolutional layers, multiple pooling layers, a fully connected layer, a long short-term memory layer, and a Softmax output layer.
Building on the above embodiment, Fig. 4 is a structural diagram of the deep learning network provided in an embodiment of the present invention, and Fig. 5 is a schematic diagram of the 3D convolution process of the deep learning network provided in another embodiment.
With reference to Figs. 4 and 5, a specific example illustrates the training process of the deep learning network model. The network model includes 8 convolutional layers (1-8), 5 pooling layers (9-13), 1 fully connected layer (14), 1 long short-term memory layer (15), and 1 Softmax output layer (16).
The specific configuration of the layers is:
Conv1→Pool1→Conv2→Pool2→Conv3a→Conv3b→Pool3→Conv4a→Conv4b→Pool4→Conv5a→Conv5b→Pool5→fc6→lstm7→Softmax
Convolutional layer 1 receives a 128*128*16*1 input, where 128*128 is the width and height of the input images, 16 is the number of consecutive frames, and 1 indicates a single-channel picture. The convolution kernel size is 3*3*3; the weights are initialized from a normal distribution with mean 0 and variance 1; the stride is 1; the input boundary is zero-padded; and the activation function is the ReLU function:
f(x) = max(0, x)
For an ordinary convolutional layer the input is a two-dimensional array, so the output of a single convolution kernel is a single feature map, which cannot finely extract features in the time dimension. Unlike an ordinary convolutional neural network, the kernels of this network are three-dimensional: as shown in Fig. 5, a kernel receives and processes multiple consecutive frames at once, capturing the temporal and spatial information of the sample simultaneously, and the output, a set of feature maps, is called a feature volume. Convolutional layer 1 finally outputs 64 feature volumes of 128*128*16*1.
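A 3D kernel slides over time as well as space, so every output value mixes information from several consecutive frames. A naive sketch of one 3*3*3 kernel followed by ReLU, using 'valid' borders for brevity (the layer described above zero-pads so that the output size matches the input):

```python
import numpy as np

def conv3d_relu(volume, kernel):
    """Convolve one frame stack (T, H, W) with one 3-D kernel and apply
    ReLU; each output value sees a 3x3x3 spatiotemporal window."""
    T, H, W = volume.shape
    t, h, w = kernel.shape
    out = np.zeros((T - t + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(volume[i:i + t, j:j + h, k:k + w] * kernel)
    return np.maximum(out, 0.0)                # ReLU: f(x) = max(0, x)

vol = np.ones((16, 8, 8))                      # 16 consecutive 8x8 frames
kern = np.full((3, 3, 3), 1.0 / 27.0)          # averaging kernel
feat = conv3d_relu(vol, kern)                  # one "feature volume"
```

With 64 such kernels, layer 1 would produce the 64 feature volumes described above; a real implementation would use a vectorized library routine rather than these loops.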
Pooling layer 9 receives the 64 feature volumes of 128*128*16*1. Like the convolution kernels, the pooling kernel is three-dimensional, with size 2*2*1; the weights are initialized from a normal distribution with mean 0 and variance 1; the stride is 1; and each pass takes one feature volume and applies max pooling. Pooling layer 9 therefore outputs 64 feature volumes of 64*64*16*1.
Convolutional layer 2 receives the 64 feature volumes of 64*64*16*1. The kernel size is 3*3*3; the weights are initialized from a normal distribution with mean 0 and variance 1; the stride is 1; the input boundary is zero-padded; and the activation function is ReLU. The layer finally outputs 128 feature volumes of 64*64*16*1.
Pooling layer 10 receives the 128 feature volumes of 64*64*16*1. The pooling kernel size is 2*2*2; the weights are initialized from a normal distribution with mean 0 and variance 1; the stride is 1; and max pooling is applied. Pooling layer 10 therefore outputs 128 feature volumes of 32*32*8*1.
Convolutional layer 3 receives the 128 feature volumes of 32*32*8*1 and, with the same 3*3*3 kernels, initialization, stride, zero padding, and ReLU activation, finally outputs 256 feature volumes of 32*32*8*1.
Convolutional layer 4 receives the 256 feature volumes of 32*32*8*1 and, configured identically, finally outputs 256 feature volumes of 32*32*8*1.
Pooling layer 11 receives the 256 feature volumes of 32*32*8*1 and, with a 2*2*2 pooling kernel and max pooling as above, outputs 256 feature volumes of 16*16*4*1.
Convolutional layer 5 receives the 256 feature volumes of 16*16*4*1 and, configured as above, finally outputs 512 feature volumes of 16*16*4*1.
Convolutional layer 6 receives the 512 feature volumes of 16*16*4*1 and finally outputs 512 feature volumes of 16*16*4*1.
Pooling layer 12 receives the 512 feature volumes of 16*16*4*1 and, with max pooling as above, outputs 512 feature volumes of 8*8*2*1.
Convolutional layer 7 receives the 512 feature volumes of 8*8*2*1 and finally outputs 512 feature volumes of 8*8*2*1.
Convolutional layer 8 receives the 512 feature volumes of 8*8*2*1 and finally outputs 512 feature volumes of 8*8*2*1.
Pooling layer 13 receives the 512 feature volumes of 8*8*2*1 and, with max pooling as above, outputs 512 feature volumes of 4*4*1*1.
Fully connected layer 14 receives the 512 feature volumes of 4*4*1*1. It has 4096 nodes in total; the weights are initialized from a normal distribution with mean 0 and variance 1, and the ReLU activation function is used. Fully connected layer 14 outputs 4096 feature values.
Long short-term memory layer 15 receives the 4096 feature values. It contains 4096 units, each with an input gate, a forget gate, and an output gate, and outputs 1000 feature values to Softmax layer 16. The weights are initialized from a normal distribution with mean 0 and variance 1. Although a 3D convolutional layer can receive temporal input, the judgments it can make over time are relatively fixed, so its effect is limited for actions with unstable rates. Long short-term memory, by contrast, is a kind of recurrent neural network suited to processing and predicting events separated by relatively long and variable intervals and delays in a time series. The long short-term memory layer is therefore used to output 1000 feature values to Softmax for motion classification.
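The layer sizes above can be cross-checked with a short shape walk. Zero-padded 3*3*3 convolutions leave the width, height, and frame count unchanged, so only the pooling kernels shrink the volume; the sketch below rests on that assumption:

```python
def pooled_shape(shape, kernel):
    """Shape after a max pool that divides each dimension by its kernel size."""
    return tuple(s // k for s, k in zip(shape, kernel))

shape = (128, 128, 16)                 # width, height, frames of one input clip
pools = [(2, 2, 1), (2, 2, 2), (2, 2, 2), (2, 2, 2), (2, 2, 2)]  # Pool1..Pool5
for k in pools:
    shape = pooled_shape(shape, k)
# shape is now (4, 4, 1): the 4*4*1 feature volumes (512 of them) that
# pooling layer 13 hands to the 4096-node fully connected layer 14
```

Only Pool1 preserves the 16-frame temporal extent (kernel depth 1); each later pool halves it, which is why the temporal dimension collapses to 1 by Pool5.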
Softmax layer 16 has N nodes, each corresponding to one action type, and outputs the probability that the target belongs to that category. For node n, the Softmax is:
P_n = e^(y_n) / Σ_k e^(y_k)
where P_n is the probability output by Softmax that the sample belongs to the n-th class, and y_n is the value node n receives from the previous layer of the network.
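A minimal numerical sketch of the Softmax node follows; subtracting the maximum before exponentiating is the usual numerical-stability device and is an assumption here, not taken from the patent text:

```python
import numpy as np

def softmax(y):
    """Turn the previous layer's outputs y_n into class probabilities P_n."""
    e = np.exp(y - np.max(y))   # subtracting the max avoids overflow
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))
# p sums to 1; p[n] is the probability that the sample is the n-th class
```

The largest input y_n always yields the largest probability, so the predicted class is simply the argmax of the Softmax output.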
The training process uses a cross-entropy loss; taking numerical stability into account, the softmax loss for a sample i whose correct class is j is:
L_i = -log(P̂_j)
If the model's output P̂_j equals 1, the classification is correct and the sample contributes nothing to the loss. If the classification is wrong, P̂_j is less than 1 and the loss increases, so training optimizes the weights to drive P̂_j toward 1 and thereby reduce the loss. Before training, the weights are randomly generated and every class has probability 1/N, so without regularization the loss starts near ln N.
After an L1 regularization penalty is introduced over all samples, the loss function becomes:
L = -(1/M) Σ_i log(P̂_{j(i)}) + λ Σ |w|
where M is the number of samples, j(i) is the correct class of sample i, and λ weights the L1 penalty on the weights w.
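A sketch of that loss follows; `lam` is an assumed regularization strength, since the patent does not give its value. It also shows that with uniform untrained outputs the unregularized loss starts at ln N:

```python
import numpy as np

def loss_with_l1(probs, labels, weights, lam=1e-4):
    """Mean cross-entropy over a batch plus an L1 penalty on the weights."""
    ce = -np.mean(np.log(probs[np.arange(len(labels)), labels]))
    return ce + lam * np.sum(np.abs(weights))

# Untrained, every class is equally likely, so the loss starts near ln N:
N = 5
uniform = np.full((30, N), 1.0 / N)          # one batch of B = 30 samples
labels = np.zeros(30, dtype=int)
start = loss_with_l1(uniform, labels, weights=np.zeros(1))
# start == ln 5 (about 1.609); the zero-weight L1 term contributes nothing
```

As training pushes each correct-class probability toward 1, the cross-entropy term falls toward 0 while the L1 term keeps the weights sparse.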
The training process uses stochastic gradient descent (SGD) with a batch size B of 30 samples. The learning rate starts at 0.003 and is halved after every 100,000 iterations; each iteration back-propagates and updates the weights of every layer. The gradient direction obtained from the loss function is:
∂L/∂y = P_N - P_{i,N}
where P_{i,N} is the one-hot label vector of sample i, of dimension N*1, whose j-th element is 1 and whose other elements are 0, and P_N is the network model's output probabilities for sample i over the N classes. Once the loss variation stabilizes as training proceeds, training stops.
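The update direction and schedule above can be sketched as follows; this is a hedged illustration, and `learning_rate` is a hypothetical helper expressing the halve-every-100,000-iterations rule:

```python
import numpy as np

def softmax_xent_grad(p_out, one_hot):
    """Gradient of the cross-entropy at the softmax inputs:
    predicted probabilities minus the one-hot label vector."""
    return p_out - one_hot

def learning_rate(step, base=0.003, halve_every=100_000):
    """Start at 0.003 and halve after every 100,000 iterations."""
    return base * 0.5 ** (step // halve_every)

g = softmax_xent_grad(np.array([0.7, 0.2, 0.1]), np.array([0.0, 1.0, 0.0]))
# g == [0.7, -0.8, 0.1]: the true class is pushed up, the others pushed down
```

Each SGD step would move the weights opposite to this gradient, scaled by the current `learning_rate(step)`.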
After training is complete, the model can be used for action recognition. First, an infrared camera captures a segment of action; the segment is then input into the action recognition model, which outputs a judgment: either a certain action in the action library, or "other" if the action does not belong to the library. Action recognition is thereby achieved.
Optionally, the videos of different types of operation actions are obtained as follows: the original video is divided into a set of segments, each consisting of 16 consecutive frames, which are then input into the deep learning network model in sequence. The video contains its temporal information and the spatial position information of the subject performing the action in each picture.
Building on the above embodiment, the spatial position information of the driver in the picture, for example, serves to record actions performed at different movement rates, thereby achieving an accurate action recognition result.
Optionally, the action types include at least: pointing (point-and-call) operations, push operations, pull operations, safety-check operations, and gesture operations.
Building on the above embodiment, the action types include at least the driver's pointing operations, push operations, pull operations, safety-check operations, and gesture operations, and these action types are stored in the operation action database.
The operation action recognition method provided in the embodiment of the present invention, based on a deep learning network, fuses a 3D convolutional neural network with an LSTM network. Compared with traditional action recognition algorithms, deep learning extracts information layer by layer, from pixel-level raw data up to abstract semantic concepts, and the features extracted in this way have stronger expressive power than hand-engineered features, giving it a prominent advantage in image recognition. In addition, a 3D convolutional neural network takes consecutive image frames as input and thus captures more temporal information than a convolutional neural network that reads only single images, while the LSTM network copes with motions performed at different rates. On the basis of realizing motion detection, the network provided by the invention therefore has a clear structure and low complexity and runs end to end, greatly simplifying the recognition pipeline.
Fig. 6 is a structural diagram of the operation action recognition device provided in an embodiment of the present invention. As shown in Fig. 6, the device includes an acquisition module 10 and a recognition module 20, where:
the acquisition module 10 is used to obtain a video clip to be recognized, where the clip contains one type of action;
the recognition module 20 is used to recognize the action type of the video according to the clip and a pre-established action recognition model.
The operation action recognition device provided in the embodiment of the present invention includes the acquisition module 10 and the recognition module 20: the acquisition module 10 obtains the video clip to be recognized, where the clip contains one type of action, and the recognition module 20 recognizes the action type of the video according to the clip and the pre-established action recognition model.
The operation action recognition device provided by the invention extracts information layer by layer, from pixel-level raw data up to abstract semantic concepts; the features extracted in this way have stronger expressive power than hand-engineered features, so operation actions can be recognized quickly and accurately.
Optionally, the action identifying model is established using following steps:
The video of different types of operational motion is selected according to the video of acquisition, establishes operational motion database;
According to the operational motion database, the deep learning network model pre-established is trained, is determined described Action identifying model.
On the basis of above-described embodiment, the flow diagram of operational motion discrimination method shown in Figure 3, to regarding When type of action in frequency segment is identified, needs to establish action identifying model in advance, it is as follows specifically to establish process:
An uninterrupted video of a subway driver at work is captured with a single infrared camera, collecting at least one week of footage. Then, according to the train operation rules, correct relevant operation actions are screened and clipped out and classified into N classes to construct the driver operation action database. When screening videos to build the database, each single sample within an action class should be a video file or set of video frames containing only one action.
When training the network model, the model structure requires that the input data, i.e., the sample videos, be formatted.
For example, consider a sample i of the sample database belonging to class j; sample i should be a video containing one action, and suppose it has a frames in total. It is first divided into ⌊a/16⌋ segments (⌊·⌋ denotes rounding down), each containing 16 frames; if the last segment has fewer than 16 frames, it is discarded. The resolution of each frame is adjusted to 128*128 using linear interpolation, building a 128*128*16 frame stream. Meanwhile, the label j of the sample is one-hot encoded (One-Hot Encoding), generating an N*1 vector whose j-th element is 1 and whose remaining elements are all zero. Each frame stream of sample i is then bound to the label j, so one sample becomes ⌊a/16⌋ inputs. During training, 80% of the samples are used as the training set, 10% as the validation set, and 10% as the test set.
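The formatting steps above can be sketched in Python with NumPy. This is a minimal illustration only: the function name is an assumption, and a nearest-neighbour index mapping stands in for the patent's linear-interpolation resize.

```python
import numpy as np

def preprocess_sample(frames: np.ndarray, label_j: int, num_classes: int):
    """Split a video of shape (a, H, W) into floor(a/16) clips of 16 frames,
    resize each frame to 128x128, and one-hot encode the class label."""
    a = frames.shape[0]
    num_clips = a // 16                      # floor division; leftover frames are discarded
    clips = []
    for k in range(num_clips):
        clip = frames[k * 16:(k + 1) * 16]   # 16 consecutive frames
        # Resize every frame to 128x128 via nearest-neighbour index mapping
        # (a simple stand-in for the linear interpolation used in the patent).
        h, w = clip.shape[1], clip.shape[2]
        rows = np.arange(128) * h // 128
        cols = np.arange(128) * w // 128
        resized = clip[:, rows][:, :, cols]  # shape (16, 128, 128)
        clips.append(resized)
    # One-hot label: an N x 1 vector with a 1 at position j, zeros elsewhere.
    one_hot = np.zeros((num_classes, 1))
    one_hot[label_j, 0] = 1.0
    # Every clip is bound to the same label, so one sample yields floor(a/16) inputs.
    return [(c, one_hot) for c in clips]

# A 70-frame dummy video yields floor(70/16) = 4 clips of shape (16, 128, 128).
video = np.zeros((70, 96, 96))
inputs = preprocess_sample(video, label_j=2, num_classes=5)
```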
Training is performed using the driver operation action database, so as to obtain a model that can be used for action classification, i.e., the action recognition model.
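The 80%/10%/10% partition described above might be realized as a simple shuffled split; the sketch below is illustrative only, and the helper name is an assumption rather than part of the patent.

```python
import random

def split_dataset(samples, seed=0):
    """Shuffle the samples and split them 80% train / 10% validation / 10% test."""
    rng = random.Random(seed)      # fixed seed for a reproducible split
    samples = samples[:]           # copy so the caller's list is untouched
    rng.shuffle(samples)
    n = len(samples)
    n_train = int(0.8 * n)
    n_val = int(0.1 * n)
    train = samples[:n_train]
    val = samples[n_train:n_train + n_val]
    test = samples[n_train + n_val:]
    return train, val, test

# 100 dummy samples split into 80 / 10 / 10 without losing or duplicating any.
train, val, test = split_dataset(list(range(100)))
```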
The operation action recognition apparatus provided by the invention is based on a deep learning network that fuses a 3D convolutional neural network and a long short-term memory (LSTM) network. Compared with traditional action recognition algorithms, deep learning can extract information layer by layer from pixel-level raw data up to abstract semantic concepts, and the extracted features have more efficient expressive power than hand-engineered features, giving it a prominent advantage in image recognition. In addition, a 3D convolutional neural network takes continuous image frames as input, obtaining more temporal information than a convolutional neural network that reads only single images. Moreover, the LSTM network can cope with motion at different rates. Therefore, on the basis of realizing action detection, the network provided by the invention is clear in structure, low in complexity, and operates end-to-end, which greatly simplifies the recognition algorithm flow.
Optionally, the deep learning network model includes a 3D convolutional neural network and a long short-term memory network.
Optionally, the specific structure of the deep learning network model includes: multiple convolutional layers, multiple pooling layers, one fully connected layer, one long short-term memory layer, and one Softmax output layer.
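As a rough illustration of how a 128*128*16 frame stream flows through such a stack: the sketch below traces tensor shapes through repeated 3D convolution and pooling stages. The kernel sizes, strides, padding, and the count of four stages are assumptions for illustration; the patent specifies only the layer types.

```python
def conv3d_shape(shape, kernel=3, stride=1, pad=1):
    """Output spatial shape of a 3D convolution, applied per axis."""
    return tuple((d + 2 * pad - kernel) // stride + 1 for d in shape)

def pool3d_shape(shape, kernel=2, stride=2):
    """Output shape of a 3D max-pooling layer, applied per axis."""
    return tuple((d - kernel) // stride + 1 for d in shape)

# Input frame stream: 16 frames of 128x128 (depth, height, width).
shape = (16, 128, 128)
for _ in range(4):                 # four conv+pool stages (an assumed count)
    shape = conv3d_shape(shape)    # kernel 3, pad 1, stride 1 keeps the size
    shape = pool3d_shape(shape)    # each pooling halves every dimension
# After four halvings the stream is reduced to (1, 8, 8); flattened, it feeds
# the fully connected layer, whose per-clip outputs form the sequence consumed
# by the LSTM layer, followed by an N-way Softmax over the action classes.
print(shape)
```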
The specific training process is introduced in the method embodiments and is not described in detail here.
The method and apparatus provided by the invention work best for recognizing the actions of a person directly in front of the camera device, and the in-cab monitoring of a subway driver is exactly the kind of scene this algorithm is good at. In addition, the method and apparatus provided by the invention are based on monocular infrared vision, so the equipment architecture is simple and easy to retrofit into a subway driver's cab. Therefore, at a stage when subway driver operation recognition systems are still immature, the method and apparatus provided by the invention can provide a solution for systems that identify subway drivers' violation operations.
Through the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be realized by means of software plus a necessary general hardware platform, and of course also by hardware. Based on this understanding, the above technical solution, or in other words the part that contributes to the prior art, can essentially be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in each embodiment or in certain parts of the embodiments.
The apparatus and system embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the embodiment's scheme. Those of ordinary skill in the art can understand and implement this without creative effort.

Claims (10)

1. An operation action recognition method, characterized by comprising:
obtaining a video clip to be identified, wherein the video clip to be identified contains one type of action;
identifying the type of action in the video clip to be identified according to the video clip to be identified and a pre-established action recognition model.
2. The method according to claim 1, characterized in that the action recognition model is established using the following steps:
selecting videos of different types of operation actions from the collected video to establish an operation action database;
training a pre-established deep learning network model according to the operation action database to determine the action recognition model.
3. The method according to claim 2, characterized in that the deep learning network model includes a 3D convolutional neural network and a long short-term memory network.
4. The method according to claim 3, characterized in that the specific structure of the deep learning network model includes: multiple convolutional layers, multiple pooling layers, one fully connected layer, one long short-term memory layer, and one Softmax output layer.
5. The method according to claim 2, characterized in that the step of acquiring the videos of different types of operation actions is as follows:
the original video is divided into sets of segments of multiple consecutive 16-frame pictures, which are then sequentially input into the deep learning network model; the video includes temporal information and the spatial position information, within the picture, of the subject performing the action.
6. The method according to claim 2, characterized in that the types of action include at least: pointing operation, pushing operation, pulling operation, safety-check operation, and gesture operation.
7. An operation action recognition apparatus, characterized by comprising:
an acquisition module, configured to obtain a video clip to be identified, wherein the video clip to be identified contains one type of action;
an identification module, configured to identify the type of action of the video to be identified according to the video clip to be identified and a pre-established action recognition model.
8. The apparatus according to claim 7, characterized in that the action recognition model is established using the following steps:
selecting videos of different types of operation actions from the collected video to establish an operation action database;
training a pre-established deep learning network model according to the operation action database to determine the action recognition model.
9. The apparatus according to claim 8, characterized in that the deep learning network model includes a 3D convolutional neural network and a long short-term memory network.
10. The apparatus according to claim 9, characterized in that the specific structure of the deep learning network model includes: multiple convolutional layers, multiple pooling layers, one fully connected layer, one long short-term memory layer, and one Softmax output layer.
CN201711387866.2A 2017-12-20 2017-12-20 A kind of operational motion discrimination method and device Pending CN108171134A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711387866.2A CN108171134A (en) 2017-12-20 2017-12-20 A kind of operational motion discrimination method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711387866.2A CN108171134A (en) 2017-12-20 2017-12-20 A kind of operational motion discrimination method and device

Publications (1)

Publication Number Publication Date
CN108171134A true CN108171134A (en) 2018-06-15

Family

ID=62523168

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711387866.2A Pending CN108171134A (en) 2017-12-20 2017-12-20 A kind of operational motion discrimination method and device

Country Status (1)

Country Link
CN (1) CN108171134A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109164910A (en) * 2018-07-05 2019-01-08 北京航空航天大学合肥创新研究院 For the multiple signals neural network architecture design method of electroencephalogram
CN109782906A (en) * 2018-12-28 2019-05-21 深圳云天励飞技术有限公司 A kind of gesture identification method of advertisement machine, exchange method, device and electronic equipment
CN110009640A (en) * 2018-11-20 2019-07-12 腾讯科技(深圳)有限公司 Handle method, equipment and the readable medium of heart video
CN110866427A (en) * 2018-08-28 2020-03-06 杭州海康威视数字技术股份有限公司 Vehicle behavior detection method and device
CN111460971A (en) * 2020-03-27 2020-07-28 北京百度网讯科技有限公司 Video concept detection method and device and electronic equipment
WO2021008018A1 (en) * 2019-07-18 2021-01-21 平安科技(深圳)有限公司 Vehicle identification method and device employing artificial intelligence, and program and storage medium
TWI760769B (en) * 2020-06-12 2022-04-11 國立中央大學 Computing device and method for generating a hand gesture recognition model, and hand gesture recognition device
CN114943324A (en) * 2022-05-26 2022-08-26 中国科学院深圳先进技术研究院 Neural network training method, human motion recognition method and device, and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106203283A (en) * 2016-06-30 2016-12-07 重庆理工大学 Based on Three dimensional convolution deep neural network and the action identification method of deep video
CN106909887A (en) * 2017-01-19 2017-06-30 南京邮电大学盐城大数据研究院有限公司 A kind of action identification method based on CNN and SVM
CN107273800A (en) * 2017-05-17 2017-10-20 大连理工大学 A kind of action identification method of the convolution recurrent neural network based on attention mechanism
US20190251337A1 (en) * 2017-02-06 2019-08-15 Tencent Technology (Shenzhen) Company Limited Facial tracking method and apparatus, storage medium, and electronic device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106203283A (en) * 2016-06-30 2016-12-07 重庆理工大学 Based on Three dimensional convolution deep neural network and the action identification method of deep video
CN106909887A (en) * 2017-01-19 2017-06-30 南京邮电大学盐城大数据研究院有限公司 A kind of action identification method based on CNN and SVM
US20190251337A1 (en) * 2017-02-06 2019-08-15 Tencent Technology (Shenzhen) Company Limited Facial tracking method and apparatus, storage medium, and electronic device
CN107273800A (en) * 2017-05-17 2017-10-20 大连理工大学 A kind of action identification method of the convolution recurrent neural network based on attention mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
QIN Yang et al.: "Combination of 3D CNNs and LSTMs in Action Recognition and Its Application", 《测控技术》 (Measurement & Control Technology) *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109164910A (en) * 2018-07-05 2019-01-08 北京航空航天大学合肥创新研究院 For the multiple signals neural network architecture design method of electroencephalogram
CN109164910B (en) * 2018-07-05 2021-09-21 北京航空航天大学合肥创新研究院 Multi-signal neural network architecture design method for electroencephalogram
CN110866427A (en) * 2018-08-28 2020-03-06 杭州海康威视数字技术股份有限公司 Vehicle behavior detection method and device
CN110009640A (en) * 2018-11-20 2019-07-12 腾讯科技(深圳)有限公司 Handle method, equipment and the readable medium of heart video
CN110009640B (en) * 2018-11-20 2023-09-26 腾讯科技(深圳)有限公司 Method, apparatus and readable medium for processing cardiac video
CN109782906A (en) * 2018-12-28 2019-05-21 深圳云天励飞技术有限公司 A kind of gesture identification method of advertisement machine, exchange method, device and electronic equipment
WO2021008018A1 (en) * 2019-07-18 2021-01-21 平安科技(深圳)有限公司 Vehicle identification method and device employing artificial intelligence, and program and storage medium
CN111460971A (en) * 2020-03-27 2020-07-28 北京百度网讯科技有限公司 Video concept detection method and device and electronic equipment
CN111460971B (en) * 2020-03-27 2023-09-12 北京百度网讯科技有限公司 Video concept detection method and device and electronic equipment
TWI760769B (en) * 2020-06-12 2022-04-11 國立中央大學 Computing device and method for generating a hand gesture recognition model, and hand gesture recognition device
CN114943324A (en) * 2022-05-26 2022-08-26 中国科学院深圳先进技术研究院 Neural network training method, human motion recognition method and device, and storage medium
CN114943324B (en) * 2022-05-26 2023-10-13 中国科学院深圳先进技术研究院 Neural network training method, human motion recognition method and device, and storage medium

Similar Documents

Publication Publication Date Title
CN108171134A (en) A kind of operational motion discrimination method and device
Zendel et al. Railsem19: A dataset for semantic rail scene understanding
CN109919031B (en) Human behavior recognition method based on deep neural network
CN105787458B (en) The infrared behavior recognition methods adaptively merged based on artificial design features and deep learning feature
CN111598030A (en) Method and system for detecting and segmenting vehicle in aerial image
CN105574550B (en) A kind of vehicle identification method and device
CN113688652B (en) Abnormal driving behavior processing method and device
CN108830252A (en) A kind of convolutional neural networks human motion recognition method of amalgamation of global space-time characteristic
CN109101914A (en) It is a kind of based on multiple dimensioned pedestrian detection method and device
CN107085696A (en) A kind of vehicle location and type identifier method based on bayonet socket image
CN107945153A (en) A kind of road surface crack detection method based on deep learning
CN107220603A (en) Vehicle checking method and device based on deep learning
CN112633149B (en) Domain-adaptive foggy-day image target detection method and device
KR102035592B1 (en) A supporting system and method that assist partial inspections of suspicious objects in cctv video streams by using multi-level object recognition technology to reduce workload of human-eye based inspectors
CN108154102A (en) A kind of traffic sign recognition method
CN110532925B (en) Driver fatigue detection method based on space-time graph convolutional network
CN110111565A (en) A kind of people's vehicle flowrate System and method for flowed down based on real-time video
CN108875754B (en) Vehicle re-identification method based on multi-depth feature fusion network
CN109961037A (en) A kind of examination hall video monitoring abnormal behavior recognition methods
CN109902676A (en) A kind of separated based on dynamic background stops detection algorithm
CN108229300A (en) Video classification methods, device, computer readable storage medium and electronic equipment
CN110222604A (en) Target identification method and device based on shared convolutional neural networks
CN112287827A (en) Complex environment pedestrian mask wearing detection method and system based on intelligent lamp pole
CN112200089B (en) Dense vehicle detection method based on vehicle counting perception attention
CN109935080A (en) The monitoring system and method that a kind of vehicle flowrate on traffic route calculates in real time

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180615