CN106980826A - An action recognition method based on neural networks - Google Patents

An action recognition method based on neural networks

Info

Publication number
CN106980826A
CN106980826A
Authority
CN
China
Prior art keywords
video
trained
feature vector
network
feature extractor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710156415.1A
Other languages
Chinese (zh)
Inventor
苏育挺
安阳
聂为之
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University
Priority to CN201710156415.1A priority Critical patent/CN106980826A/en
Publication of CN106980826A publication Critical patent/CN106980826A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a human action recognition method based on neural networks, comprising the following steps: training N mutually independent 3D convolutional neural networks on a video database to serve as a video feature extractor; training a multi-instance learning classifier using the video feature extractor; and inputting a video to be recognized, extracting its features with the trained networks, and performing action classification with the classifier. The invention avoids the influence of noisy features on the classification result and the negative effect of a fixed sample length on the action recognition result.

Description

An action recognition method based on neural networks
Technical field
The present invention relates to the field of human action recognition, and in particular to an action recognition method based on neural networks.
Background technology
With the development of the mobile Internet, the carrier of information has gradually expanded from text to many forms such as audio, images, and video. In recent years, the volume of video data has grown explosively and its application fields have diversified, covering security, surveillance, entertainment, and other areas [1]. Faced with such massive data, traditional manual processing can no longer meet demand. Therefore, using the powerful storage and computing capabilities of computers to recognize and understand video information has important research value and broad application prospects.
In fact, research on video in the computer vision field has been conducted for decades, on topics including action recognition, anomaly detection, and video retrieval. Human action recognition is an important research direction that has made considerable progress, with results applied in intelligent surveillance, medical care, video retrieval, human-computer interaction, behavior analysis, virtual reality, and other fields [2]. Among these, human-computer interaction is the most mature; for example, Microsoft's Kinect (motion-sensing) camera can capture and understand human actions. However, major difficulties and challenges remain in human action recognition research, such as action recognition in natural scenes and group action recognition. These problems leave human action recognition a long way from being effectively applied in real scenarios.
With the development of parallel computing devices (GPUs, CPU clusters) and the emergence of large-scale training data, convolutional neural networks (CNNs) have risen again and achieved breakthroughs in object recognition, natural language processing, speech classification, human-computer interaction, human tracking, image restoration, denoising, and segmentation. In the field of video recognition, however, applications of convolutional neural networks are still rare.
Summary of the invention
The present invention provides an action recognition method based on neural networks. The invention avoids the influence of noisy features on the classification result and the negative effect of a fixed sample length on the action recognition result, as described below:
A human action recognition method based on neural networks, comprising the following steps:
training N mutually independent 3D convolutional neural networks on a video database to serve as a video feature extractor;
training a multi-instance learning classifier using the video feature extractor;
inputting a video to be recognized, extracting video features with the trained networks, and performing action classification with the classifier.
The step of training N mutually independent 3D convolutional neural networks on a video database to serve as a video feature extractor is specifically:
dividing each video in the video library into several video segments of frame length Fi, with each segment serving as one training sample for network i, and training the 3D convolutional neural networks; the N independent 3D convolutional neural networks together constitute the video feature extractor.
The step of training the multi-instance learning classifier using the video feature extractor is specifically:
feeding each video in the database into the video feature extractor and extracting feature vectors; each video is then regarded as a bag in multi-instance learning, with its feature vectors as the instances in the bag, and multi-instance learning is performed.
The step of feeding each video in the database into the video feature extractor and extracting feature vectors is specifically:
given a video M, dividing it into Mi video segments of frame length Fi, using them as the input of network i, and extracting Mi n-dimensional feature vectors; video M thus yields (M1+M2+...+MN) feature vectors in total.
The step of inputting a video to be recognized, extracting video features with the trained networks, and performing action classification with the classifier is specifically:
extracting P n-dimensional feature vectors with the trained networks, regarding the whole video as a bag in multi-instance learning with each feature vector as an instance in the bag, and classifying the action by multi-instance learning.
The beneficial effects of the technical scheme provided by the present invention are:
1. On the basis of C3D (3D convolution) features, a method that produces multiple features for the same video is introduced, and multi-instance learning is used to reduce the influence of noisy features on the classification result;
2. Considering the influence of sequence length on the action recognition result, video segments of different lengths are used for feature learning, avoiding the negative effect of a fixed sample length on the action recognition result.
Brief description of the drawings
Fig. 1 is a flowchart of the action recognition method based on neural networks;
Fig. 2 shows the structure of the 3D convolutional neural network;
Fig. 3 is a schematic diagram of 3D convolutional neural network training;
Fig. 4 is a schematic diagram of C3D feature extraction.
Embodiments
To make the objects, technical solutions, and advantages of the present invention clearer, embodiments of the present invention are described in further detail below.
To solve the above problems, video features must be extracted comprehensively, automatically, and accurately, and then classified. Studies show that C3D features achieve high accuracy in video classification, and multi-instance learning can eliminate the influence of noisy features on the classification result.
Embodiment 1
The embodiment of the present invention proposes an action recognition method based on neural networks. Referring to Fig. 1, the method comprises the following steps:
101: training N mutually independent 3D convolutional neural networks on a video database to serve as a video feature extractor;
102: training a multi-instance learning classifier using the video feature extractor;
103: inputting a video to be recognized, extracting video features with the trained networks, and performing action classification with the classifier.
The step in 101 of training N mutually independent 3D convolutional neural networks on a video database to serve as a video feature extractor is specifically:
dividing each video in the video library into several video segments of frame length Fi, with each segment serving as one training sample for network i, and training the 3D convolutional neural networks; the N independent 3D convolutional neural networks together constitute the video feature extractor.
The step in 102 of training the multi-instance learning classifier using the video feature extractor is specifically:
feeding each video in the database into the video feature extractor and extracting feature vectors; each video is then regarded as a bag in multi-instance learning, with its feature vectors as the instances in the bag, and multi-instance learning is performed.
The above step of feeding each video in the database into the video feature extractor and extracting feature vectors is specifically:
given a video M, dividing it into Mi video segments of frame length Fi, using them as the input of network i, and extracting Mi n-dimensional feature vectors; video M thus yields (M1+M2+...+MN) feature vectors in total.
The step in 103 of inputting a video to be recognized, extracting video features with the trained networks, and performing action classification with the classifier is specifically:
extracting P n-dimensional feature vectors with the trained networks, regarding the whole video as a bag in multi-instance learning with each feature vector as an instance in the bag, and classifying the action by multi-instance learning.
In summary, through steps 101-103, the embodiment of the present invention avoids the influence of noisy features on the classification result and the negative effect of a fixed sample length on the action recognition result, substantially improving the robustness and accuracy of human action recognition.
Embodiment 2
The scheme of Embodiment 1 is further described below with reference to specific examples and Figs. 2-4:
201: building a video database and training N mutually independent 3D convolutional neural networks on it to serve as the video feature extractor, i.e., to learn C3D features;
C3D feature learning is carried out on 3D ConvNets (3D convolutional neural networks), whose structure is shown in Fig. 2. All convolution filters are 3*3*3 with a spatiotemporal stride of 1. All pooling layers are 2*2*2 except Pool1 (1*2*2), with stride 1. Finally, the fully connected layers fc6 and fc7 each produce a 4096-dimensional output.
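For intuition, the spatiotemporal shrinkage produced by these pooling layers can be traced numerically. The sketch below assumes the standard C3D setup from reference [5] (an input of 16 frames at 112x112 resolution, five pooling stages, 512 channels before fc6, and a padded final pooling layer); none of these specifics are stated in the text above, so they are assumptions for illustration only.

```python
import math

def after_pool(shape, kernel, pad=False):
    """Shape of a (frames, height, width) volume after max pooling.
    pad=True zero-pads the borders so that partial windows still count."""
    div = (lambda a, k: math.ceil(a / k)) if pad else (lambda a, k: a // k)
    return tuple(div(a, k) for a, k in zip(shape, kernel))

# 3*3*3 convolutions with stride 1 and "same" padding leave the shape
# unchanged, so only the pooling layers shrink the volume.
shape = (16, 112, 112)                  # assumed C3D input (frames, H, W)
shape = after_pool(shape, (1, 2, 2))    # Pool1: halves space, keeps time
for _ in range(3):                      # Pool2-Pool4: halve all three axes
    shape = after_pool(shape, (2, 2, 2))
shape = after_pool(shape, (2, 2, 2), pad=True)  # final (padded) pooling
print(shape)  # (1, 4, 4); with 512 channels this flattens and feeds fc6
```

The walk-through shows why fc6 can be a plain fully connected layer of 4096 units: after the last pooling stage, the feature volume is small and fixed-size regardless of which convolution stage produced it.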
The video feature extractor requires training N mutually independent 3D ConvNets, and the training procedure is identical for each network. Referring to Fig. 3 and taking network i (i = 1, 2, 3, ..., N) as an example, the procedure is: each video in the database is divided into several video segments of frame length Fi, and each segment serves as one training sample for network i, which is trained as a 3D ConvNet. Varying the frame length Fi and repeating the above procedure yields N different 3D ConvNets, which together constitute the video feature extractor of the human action recognition system.
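As a toy illustration of this segmentation scheme, the sketch below splits one video's frame range into non-overlapping clips for each of N hypothetical frame lengths Fi. The particular lengths (8, 16, 32), the non-overlapping layout, and the dropping of any trailing remainder are assumptions, since the text does not fix them.

```python
def clip_ranges(num_frames, clip_len):
    """Start/end frame indices of non-overlapping clips of length clip_len.
    Trailing frames that do not fill a whole clip are dropped (an assumption)."""
    return [(s, s + clip_len)
            for s in range(0, num_frames - clip_len + 1, clip_len)]

frame_lengths = [8, 16, 32]   # hypothetical Fi for N = 3 networks
video_frames = 200            # one video from the database
for i, F in enumerate(frame_lengths, start=1):
    clips = clip_ranges(video_frames, F)
    # Each clip becomes one training sample for network i.
    print(f"network {i}: {len(clips)} samples of {F} frames")
```

Running this for a 200-frame video gives 25, 12, and 6 training samples for the three networks respectively, which is exactly why each network sees the same video at a different temporal granularity.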
202: training the multi-instance learning classifier using the video feature extractor;
The feature vectors of each video in the database are extracted with the N trained 3D ConvNets, and the feature extraction procedure is identical for each network. Referring to Fig. 4 and taking network i (i = 1, 2, 3, ..., N) as an example, the procedure is: video M is divided into Mi video segments of frame length Fi, which serve as the input of network i, from which Mi feature vectors are extracted. Video M therefore yields (M1+M2+...+MN) feature vectors in total through the feature extractor (the N 3D ConvNets).
Finally, each video in the video library is regarded as a bag in multi-instance learning, and each feature vector produced by the video feature extractor is regarded as an instance in that bag; multi-instance learning is then performed to train the classifier model.
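The bag/instance formulation above can be made concrete with a minimal sketch. The patent does not name a specific multi-instance learning algorithm, so the classic MIL rule is used here purely for illustration: a bag (video) is scored by its best-scoring instance (feature vector) under a hypothetical linear instance scorer.

```python
import numpy as np

def bag_score(instances, w, b=0.0):
    """Score a video (bag) as the maximum score of its feature vectors
    (instances): the classic MIL assumption that a bag is positive if at
    least one instance is. The linear scorer (w, b) is illustrative only."""
    return float(np.max(instances @ w + b))

# A video that yielded 4 feature vectors of dimension 3 (toy numbers).
bag = np.array([[0.1, 0.0, 0.2],
                [0.9, 0.4, 0.1],   # one strongly "positive" instance...
                [0.0, 0.1, 0.0],
                [0.2, 0.0, 0.3]])
w = np.array([1.0, 1.0, -1.0])
label = int(bag_score(bag, w) > 0.5)  # ...is enough to label the whole bag
print(label)  # 1
```

Under this rule, noisy feature vectors in the bag cannot drag down the score of a video that contains even one clearly discriminative segment, which matches the stated motivation of reducing the influence of noisy features.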
203: inputting a video to be recognized, extracting video features with the trained networks, and performing action classification with the classifier.
During action recognition, a video K to be recognized is input. It is first passed through each of the N trained 3D ConvNets, extracting (K1+K2+...+KN) feature vectors. Video K is regarded as a bag in multi-instance learning, with the feature vectors as the instances in the bag; the classifier trained in step 202 then produces the classification result.
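The recognition step can be sketched end to end. Here `extractors` and `classifier` are hypothetical stand-ins for the N trained 3D ConvNets and the trained MIL classifier; only the bag-building logic (stacking the K1+...+KN feature vectors into one bag) follows the text above.

```python
import numpy as np

def recognize(video, extractors, classifier):
    """Pass the video through every trained extractor, pool all resulting
    feature vectors into a single multi-instance bag, classify the bag."""
    instances = np.vstack([extract(video) for extract in extractors])
    return classifier(instances)

# Toy stand-ins: extractor i returns Ki fixed 2-d "feature vectors", and the
# classifier labels the bag by its best instance score (see step 202).
extractors = [lambda v: np.full((5, 2), 0.1),   # K1 = 5 vectors
              lambda v: np.full((3, 2), 0.8)]   # K2 = 3 vectors
w = np.array([1.0, 1.0])
classifier = lambda bag: int(np.max(bag @ w) > 1.0)
print(recognize(None, extractors, classifier))  # bag of 8 instances -> 1
```

Because the bag mixes clips of different frame lengths, no single fixed sample length dominates the decision, which is the second benefit the patent claims.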
In summary, through steps 201-203, the embodiment of the present invention avoids the influence of noisy features on the classification result and the negative effect of a fixed sample length on the action recognition result, substantially improving the robustness and accuracy of human action recognition.
References:
[1] Turaga P, Chellappa R, Subrahmanian V S, et al. Machine recognition of human activities: A survey[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2008, 18(11): 1473-1488.
[2] Aggarwal J K, Ryoo M S. Human activity analysis: A review[J]. ACM Computing Surveys (CSUR), 2011, 43(3): 16.
[3] Laptev I. On space-time interest points[J]. International Journal of Computer Vision, 2005, 64(2-3): 107-123.
[4] Ji S, Xu W, Yang M, et al. 3D convolutional neural networks for human action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(1): 221-231.
[5] Tran D, Bourdev L, Fergus R, et al. C3D: generic features for video analysis[J]. CoRR, abs/1412.0767, 2014, 2: 7.
Those skilled in the art will appreciate that the drawings are schematic diagrams of a preferred embodiment, and that the serial numbers of the embodiments are for description only and do not indicate their relative merit.
The foregoing are merely preferred embodiments of the present invention and are not intended to limit the invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims (5)

1. A human action recognition method based on neural networks, characterized in that the method comprises the following steps:
training N mutually independent 3D convolutional neural networks on a video database to serve as a video feature extractor;
training a multi-instance learning classifier using the video feature extractor;
inputting a video to be recognized, extracting video features with the trained networks, and performing action classification with the classifier.
2. The human action recognition method based on neural networks according to claim 1, characterized in that the step of training N mutually independent 3D convolutional neural networks on a video database to serve as a video feature extractor is specifically:
dividing each video in the video library into several video segments of frame length Fi, with each segment serving as one training sample for network i, and training the 3D convolutional neural networks; the N independent 3D convolutional neural networks together constitute the video feature extractor.
3. The human action recognition method based on neural networks according to claim 1, characterized in that the step of training the multi-instance learning classifier using the video feature extractor is specifically:
feeding each video in the database into the video feature extractor and extracting feature vectors; each video is then regarded as a bag in multi-instance learning, with its feature vectors as the instances in the bag, and multi-instance learning is performed.
4. The human action recognition method based on neural networks according to claim 3, characterized in that the step of feeding each video in the database into the video feature extractor and extracting feature vectors is specifically:
given a video M, dividing it into Mi video segments of frame length Fi, using them as the input of network i, and extracting Mi n-dimensional feature vectors; video M thus yields (M1+M2+...+MN) feature vectors in total.
5. The human action recognition method based on neural networks according to claim 1, characterized in that the step of inputting a video to be recognized, extracting video features with the trained networks, and performing action classification with the classifier is specifically:
extracting P n-dimensional feature vectors with the trained networks, regarding the whole video as a bag in multi-instance learning with each feature vector as an instance in the bag, and classifying the action by multi-instance learning.
CN201710156415.1A 2017-03-16 2017-03-16 An action recognition method based on neural networks Pending CN106980826A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710156415.1A CN106980826A (en) 2017-03-16 2017-03-16 An action recognition method based on neural networks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710156415.1A CN106980826A (en) 2017-03-16 2017-03-16 An action recognition method based on neural networks

Publications (1)

Publication Number Publication Date
CN106980826A true CN106980826A (en) 2017-07-25

Family

ID=59338802

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710156415.1A Pending CN106980826A (en) An action recognition method based on neural networks

Country Status (1)

Country Link
CN (1) CN106980826A (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102509084A (en) * 2011-11-18 2012-06-20 中国科学院自动化研究所 Multi-examples-learning-based method for identifying horror video scene
CN104778457A (en) * 2015-04-18 2015-07-15 吉林大学 Video face identification algorithm on basis of multi-instance learning
CN105930792A (en) * 2016-04-19 2016-09-07 武汉大学 Human action classification method based on video local feature dictionary
CN106203283A (en) * 2016-06-30 2016-12-07 重庆理工大学 Based on Three dimensional convolution deep neural network and the action identification method of deep video
CN106407903A (en) * 2016-08-31 2017-02-15 四川瞳知科技有限公司 Multiple dimensioned convolution neural network-based real time human body abnormal behavior identification method
CN106504255A (en) * 2016-11-02 2017-03-15 南京大学 A kind of multi-Target Image joint dividing method based on multi-tag multi-instance learning


Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108734095A (en) * 2018-04-10 2018-11-02 南京航空航天大学 A kind of motion detection method based on 3D convolutional neural networks
CN108734095B (en) * 2018-04-10 2022-05-20 南京航空航天大学 Motion detection method based on 3D convolutional neural network
CN108846852A (en) * 2018-04-11 2018-11-20 杭州电子科技大学 Monitor video accident detection method based on more examples and time series
CN108846852B (en) * 2018-04-11 2022-03-08 杭州电子科技大学 Monitoring video abnormal event detection method based on multiple examples and time sequence
CN108960059A (en) * 2018-06-01 2018-12-07 众安信息技术服务有限公司 A kind of video actions recognition methods and device
CN108965585A (en) * 2018-06-22 2018-12-07 成都博宇科技有限公司 A kind of method for identifying ID based on intelligent mobile phone sensor
CN109376696A (en) * 2018-11-28 2019-02-22 北京达佳互联信息技术有限公司 Method, apparatus, computer equipment and the storage medium of video actions classification
CN109376696B (en) * 2018-11-28 2020-10-23 北京达佳互联信息技术有限公司 Video motion classification method and device, computer equipment and storage medium
CN111160117A (en) * 2019-12-11 2020-05-15 青岛联合创智科技有限公司 Abnormal behavior detection method based on multi-example learning modeling
WO2022116479A1 (en) * 2020-12-01 2022-06-09 南京智谷人工智能研究院有限公司 End-to-end multi-instance learning method based on automatic instance selection
CN113011322A (en) * 2021-03-17 2021-06-22 南京工业大学 Detection model training method and detection method for specific abnormal behaviors of monitoring video
CN113011322B (en) * 2021-03-17 2023-09-05 贵州安防工程技术研究中心有限公司 Detection model training method and detection method for monitoring specific abnormal behavior of video


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170725