CN106980826A - Action recognition method based on a neural network - Google Patents
Action recognition method based on a neural network
- Publication number
- CN106980826A (application CN201710156415.1A)
- Authority
- CN
- China
- Prior art keywords
- video
- trained
- feature vector
- network
- feature extractor
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G06V20/42—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a human action recognition method based on a neural network, the method comprising the following steps: training N mutually independent 3D convolutional neural networks on a video database to serve as a video feature extractor; training a multiple-instance learning classifier according to the video feature extractor; and inputting a video to be recognized, extracting video features with the trained networks, and classifying the action with the classifier. The invention avoids the influence of numerous noise features on the classification result and the negative effect of a fixed sample length on the action recognition result.
Description
Technical field
The present invention relates to the field of human action recognition, and more particularly to an action recognition method based on a neural network.
Background art
With the development of the mobile Internet, the carriers of information have gradually expanded from text to audio, images, video, and other forms. In recent years, the volume of video data has grown explosively, and its applications have become increasingly diverse, spanning security, surveillance, entertainment, and many other fields [1]. Faced with data of such magnitude, traditional manual processing can no longer meet people's needs. Using the powerful storage and computing capabilities of computers to recognize and understand video information therefore has important research value and broad application prospects.
In fact, research on video in the field of computer vision has been under way for decades, with topics including action recognition, anomaly detection, and video retrieval. Human action recognition is an important research direction among them and has made considerable progress; its results are applied in intelligent surveillance, medical care, video retrieval, human-computer interaction, behavior analysis, virtual reality, and other fields [2]. Among these, human-computer interaction is the most mature; for example, Microsoft's Kinect (motion-sensing) camera can capture and understand human actions. However, research on human action recognition still faces great difficulties and challenges, such as action recognition in natural scenes and group action recognition. These problems keep human action recognition a long way from being applied effectively in real scenarios.
With the development of parallel computing hardware (GPUs, CPU clusters) and the emergence of large-scale training data, convolutional neural networks (CNNs) have risen again and achieved breakthroughs in object recognition, natural language processing, speech classification, human-computer interaction, human tracking, image restoration, denoising, segmentation, and other directions. In the field of video recognition, however, applications of convolutional neural networks are still rare.
Summary of the invention
The invention provides an action recognition method based on a neural network. The invention avoids the influence of numerous noise features on the classification result and the negative effect of a fixed sample length on the action recognition result, as described below.
A human action recognition method based on a neural network comprises the following steps:
Training N mutually independent 3D convolutional neural networks on a video database to serve as a video feature extractor;
Training a multiple-instance learning classifier according to the video feature extractor;
Inputting a video to be recognized, extracting video features with the trained networks, and classifying the action with the classifier.
Wherein, the step of training N mutually independent 3D convolutional neural networks on a video database to serve as a video feature extractor is specifically:
Each video in the video library is divided into several video clips of frame length Fi; each clip serves as one training sample for network i, with which a 3D convolutional neural network is trained. The N independent 3D convolutional neural networks together form the video feature extractor.
Wherein, the step of training the multiple-instance learning classifier according to the video feature extractor is specifically:
Each video in the database is fed into the video feature extractor and feature vectors are extracted; each video is then regarded as one bag of multiple-instance learning, with its feature vectors as the instances in the bag, and multiple-instance learning is carried out.
Wherein, the step of feeding each video in the database into the video feature extractor and extracting feature vectors is specifically:
Given a video M, it is divided into Mi video clips of frame length Fi; taking these clips as the input of network i, Mi n-dimensional feature vectors are extracted. Video M thus yields (M1+M2+...+MN) feature vectors in total.
Wherein, the step of inputting a video to be recognized, extracting video features with the trained networks, and classifying the action with the classifier is specifically:
P n-dimensional feature vectors are extracted with the trained networks; the whole video is regarded as one bag of multiple-instance learning, with each feature vector as an instance in the bag, and the action is classified by multiple-instance learning.
The beneficial effects of the technical scheme provided by the present invention are:
1. On the basis of C3D (3D convolution) features, a method of producing multiple features for the same video is introduced, and multiple-instance learning is used to reduce the influence of numerous noise features on the classification result;
2. Considering the influence of sequence length on the action recognition result, video clips of several different lengths are used for feature learning, avoiding the negative effect of a fixed sample length on the action recognition result.
Brief description of the drawings
Fig. 1 is a flow chart of the action recognition method based on a neural network;
Fig. 2 is a structural diagram of the 3D convolutional neural network;
Fig. 3 is a schematic diagram of 3D convolutional neural network training;
Fig. 4 is a schematic diagram of C3D feature extraction.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the present invention clearer, embodiments of the present invention are described in further detail below.
To solve the above problems, video features must be extracted comprehensively, automatically, and accurately, and then classified. Studies show that C3D features achieve high accuracy in video classification, and that multiple-instance learning can eliminate the influence of numerous noise features on the classification result.
Embodiment 1
The embodiment of the present invention proposes an action recognition method based on a neural network. Referring to Fig. 1, the action recognition method comprises the following steps:
101: Train N mutually independent 3D convolutional neural networks on a video database to serve as a video feature extractor;
102: Train a multiple-instance learning classifier according to the video feature extractor;
103: Input a video to be recognized, extract video features with the trained networks, and classify the action with the classifier.
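At a high level, the three steps combine into a single recognition pipeline. The following is a minimal sketch of that flow; the `extractors` and `mil_classifier` callables are hypothetical stand-ins for the trained 3D ConvNets and the multiple-instance learning classifier, not the patent's actual implementation:

```python
def recognize_action(video, extractors, mil_classifier):
    """Classify the action in `video` using N trained clip-feature
    extractors and a multiple-instance learning (MIL) classifier."""
    bag = []                         # one MIL bag per video
    for extract in extractors:       # N independent 3D ConvNets
        bag.extend(extract(video))   # each yields several feature vectors
    return mil_classifier(bag)       # one label for the whole bag
```

The point of the design is that all feature vectors from all N networks land in a single bag, so the classifier can down-weight noisy instances instead of being forced to trust every clip.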
Wherein, the step in 101 of training N mutually independent 3D convolutional neural networks on a video database to serve as a video feature extractor is specifically:
Each video in the video library is divided into several video clips of frame length Fi; each clip serves as one training sample for network i, with which a 3D convolutional neural network is trained. The N independent 3D convolutional neural networks together form the video feature extractor.
Wherein, the step in 102 of training the multiple-instance learning classifier according to the video feature extractor is specifically:
Each video in the database is fed into the video feature extractor and feature vectors are extracted; each video is then regarded as one bag of multiple-instance learning, with its feature vectors as the instances in the bag, and multiple-instance learning is carried out.
Wherein, the above step of feeding each video in the database into the video feature extractor and extracting feature vectors is specifically:
Given a video M, it is divided into Mi video clips of frame length Fi; taking these clips as the input of network i, Mi n-dimensional feature vectors are extracted. Video M thus yields (M1+M2+...+MN) feature vectors in total.
Wherein, the step in 103 of inputting a video to be recognized, extracting video features with the trained networks, and classifying the action with the classifier is specifically:
P n-dimensional feature vectors are extracted with the trained networks; the whole video is regarded as one bag of multiple-instance learning, with each feature vector as an instance in the bag, and the action is classified by multiple-instance learning.
In summary, through steps 101-103 the embodiment of the present invention avoids the influence of numerous noise features on the classification result and the negative effect of a fixed sample length on the action recognition result, substantially improving the robustness and accuracy of human action recognition.
Embodiment 2
The scheme in Embodiment 1 is further described below with reference to specific examples and Figs. 2-4:
201: A video database is established, and N mutually independent 3D convolutional neural networks are trained on it to serve as the video feature extractor, producing C3D features;
Wherein, the C3D features are learned with 3D ConvNets (3D convolutional neural networks), whose structure is shown in Fig. 2. All convolution filters are of size 3*3*3 with a spatio-temporal stride of 1. Except for Pool1 (1*2*2), all pooling layers are of size 2*2*2 with a stride of 1. Finally, 4096-dimensional outputs are obtained at the fully connected layers fc6 and fc7.
Wherein, the video feature extractor requires training N mutually independent 3D ConvNets, and the training process of each network is the same. Referring to Fig. 3 and taking network i (i=1, 2, 3, ..., N) as an example, the process is: each video in the database is divided into several video clips of frame length Fi, and each clip serves as one training sample for network i, with which a 3D ConvNet is trained. Varying the frame length Fi and repeating this procedure yields N different 3D ConvNets, which together form the video feature extractor of the human action recognition system.
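The segmentation of each video into fixed-length training clips can be sketched as follows. This is a minimal illustration under the author's own naming; the concrete clip lengths, and what the patent does with a trailing remainder shorter than Fi, are assumptions (the remainder is dropped here):

```python
def split_into_clips(num_frames, clip_len):
    """Return (start, end) frame-index pairs for non-overlapping clips
    of length clip_len. A trailing remainder shorter than clip_len is
    dropped; the patent does not specify remainder handling."""
    return [(s, s + clip_len)
            for s in range(0, num_frames - clip_len + 1, clip_len)]

# Network i trains on clips of its own frame length Fi, e.g.:
clip_lengths = [8, 16, 32]   # hypothetical choices of Fi
clips_per_network = {F: split_into_clips(100, F) for F in clip_lengths}
```

Because each network sees a different Fi, the ensemble covers several temporal scales of the same action, which is what lets the method avoid committing to one fixed sample length.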
202: A multiple-instance learning classifier is trained according to the video feature extractor;
Wherein, the feature vectors of each video in the database are extracted with the N trained 3D ConvNets, and the feature extraction process of each network is the same. Referring to Fig. 4 and taking network i (i=1, 2, 3, ..., N) as an example, the process is: video M is divided into Mi video clips of frame length Fi, which serve as the input of network i, from which Mi feature vectors are extracted. Video M therefore yields (M1+M2+...+MN) feature vectors in total through the feature extractor (the N 3D ConvNets).
Finally, each video in the video library is regarded as one bag of multiple-instance learning, each feature vector extracted by the video feature extractor is regarded as an instance in the bag, and multiple-instance learning is carried out to train the classifier model.
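The bag construction described above can be sketched as follows. The extractor functions here are hypothetical stand-ins for the trained networks (in the patent each would be a forward pass through a 3D ConvNet up to fc6/fc7):

```python
def build_bag(frames, extractors_by_len):
    """Collect all feature vectors of one video into a single MIL bag.
    `extractors_by_len` maps a clip length Fi to a function that maps a
    clip (a slice of frames) to one feature vector; both are stand-ins."""
    bag = []
    for F, extract in extractors_by_len.items():
        # network i consumes non-overlapping clips of its length Fi
        for start in range(0, len(frames) - F + 1, F):
            bag.append(extract(frames[start:start + F]))
    return bag   # M1 + M2 + ... + MN instances for this video
```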
203: A video to be recognized is input, video features are extracted with the trained networks, and the action is classified with the classifier.
When performing action recognition, a video K to be recognized is input. It is first passed through each of the N trained 3D ConvNets to extract (K1+K2+...+KN) feature vectors; video K is regarded as one bag of multiple-instance learning, with the feature vectors as the instances in the bag; and the classification result is obtained with the classifier trained in step 202.
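The bag-level decision in step 203 can be sketched as below. The patent does not name a specific multiple-instance learning algorithm, so the max-instance aggregation rule used here (a bag gets the class of its highest-scoring instance, a standard MIL assumption) and the `instance_scores` callable are both the author's assumptions:

```python
def classify_bag(bag, instance_scores, num_classes):
    """Assign the bag the class whose best instance score is highest.
    `instance_scores(x)` is assumed to return a per-class score list
    for one instance x (e.g. from a trained instance-level model)."""
    best = [max(instance_scores(x)[c] for x in bag)
            for c in range(num_classes)]
    return best.index(max(best))
```

Under this rule a few noisy instances with uniformly low scores cannot drag the bag toward a wrong class, which mirrors the patent's stated goal of suppressing noise features.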
In summary, through steps 201-203 the embodiment of the present invention avoids the influence of numerous noise features on the classification result and the negative effect of a fixed sample length on the action recognition result, substantially improving the robustness and accuracy of human action recognition.
References:
[1]Turaga P,Chellappa R,Subrahmanian V S,et al.Machine recognition of
human activities:A survey[J].IEEE Transactions on Circuits and Systems for
Video Technology,2008,18(11):1473-1488.
[2]Aggarwal J K,Ryoo M S.Human activity analysis:A review[J].ACM
Computing Surveys(CSUR),2011,43(3):16.
[3]Laptev I.On space-time interest points[J].International Journal of
Computer Vision,2005,64(2-3):107-123.
[4]Ji S,Xu W,Yang M,et al.3D convolutional neural networks for human
action recognition[J].IEEE transactions on pattern analysis and machine
intelligence,2013,35(1):221-231.
[5]Tran D,Bourdev L,Fergus R,et al.C3D:generic features for video
analysis[J].CoRR,abs/1412.0767,2014,2:7.
Those skilled in the art will appreciate that the accompanying drawings are schematic diagrams of a preferred embodiment, and that the serial numbers of the embodiments of the present invention are for description only and do not indicate the merits of the embodiments.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the invention; any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (5)
1. A human action recognition method based on a neural network, characterized in that the method comprises the following steps:
Training N mutually independent 3D convolutional neural networks on a video database to serve as a video feature extractor;
Training a multiple-instance learning classifier according to the video feature extractor;
Inputting a video to be recognized, extracting video features with the trained networks, and classifying the action with the classifier.
2. The human action recognition method based on a neural network according to claim 1, characterized in that the step of training N mutually independent 3D convolutional neural networks on a video database to serve as a video feature extractor is specifically:
Each video in the video library is divided into several video clips of frame length Fi; each clip serves as one training sample for network i, with which a 3D convolutional neural network is trained. The N independent 3D convolutional neural networks together form the video feature extractor.
3. The human action recognition method based on a neural network according to claim 1, characterized in that the step of training the multiple-instance learning classifier according to the video feature extractor is specifically:
Each video in the database is fed into the video feature extractor and feature vectors are extracted; each video is then regarded as one bag of multiple-instance learning, with its feature vectors as the instances in the bag, and multiple-instance learning is carried out.
4. The human action recognition method based on a neural network according to claim 3, characterized in that the step of feeding each video in the database into the video feature extractor and extracting feature vectors is specifically:
Given a video M, it is divided into Mi video clips of frame length Fi; taking these clips as the input of network i, Mi n-dimensional feature vectors are extracted. Video M thus yields (M1+M2+...+MN) feature vectors in total.
5. The human action recognition method based on a neural network according to claim 1, characterized in that the step of inputting a video to be recognized, extracting video features with the trained networks, and classifying the action with the classifier is specifically:
P n-dimensional feature vectors are extracted with the trained networks; the whole video is regarded as one bag of multiple-instance learning, with each feature vector as an instance in the bag, and the action is classified by multiple-instance learning.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710156415.1A CN106980826A (en) | 2017-03-16 | 2017-03-16 | Action recognition method based on a neural network
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710156415.1A CN106980826A (en) | 2017-03-16 | 2017-03-16 | Action recognition method based on a neural network
Publications (1)
Publication Number | Publication Date |
---|---|
CN106980826A true CN106980826A (en) | 2017-07-25 |
Family
ID=59338802
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710156415.1A Pending CN106980826A (en) | 2017-03-16 | 2017-03-16 | Action recognition method based on a neural network
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106980826A (en) |
- 2017-03-16: Application CN201710156415.1A filed; CN106980826A (en) published, status Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102509084A (en) * | 2011-11-18 | 2012-06-20 | 中国科学院自动化研究所 | Multi-instance-learning-based method for identifying horror video scenes
CN104778457A (en) * | 2015-04-18 | 2015-07-15 | 吉林大学 | Video face recognition algorithm based on multi-instance learning
CN105930792A (en) * | 2016-04-19 | 2016-09-07 | 武汉大学 | Human action classification method based on a video local feature dictionary
CN106203283A (en) * | 2016-06-30 | 2016-12-07 | 重庆理工大学 | Action recognition method based on a 3D convolutional deep neural network and depth video
CN106407903A (en) * | 2016-08-31 | 2017-02-15 | 四川瞳知科技有限公司 | Multi-scale convolutional neural network-based real-time human abnormal behavior recognition method
CN106504255A (en) * | 2016-11-02 | 2017-03-15 | 南京大学 | Multi-target image joint segmentation method based on multi-label multi-instance learning
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108734095A (en) * | 2018-04-10 | 2018-11-02 | 南京航空航天大学 | Motion detection method based on a 3D convolutional neural network
CN108734095B (en) * | 2018-04-10 | 2022-05-20 | 南京航空航天大学 | Motion detection method based on 3D convolutional neural network
CN108846852A (en) * | 2018-04-11 | 2018-11-20 | 杭州电子科技大学 | Surveillance video abnormal event detection method based on multiple instances and time series
CN108846852B (en) * | 2018-04-11 | 2022-03-08 | 杭州电子科技大学 | Monitoring video abnormal event detection method based on multiple examples and time sequence
CN108960059A (en) * | 2018-06-01 | 2018-12-07 | 众安信息技术服务有限公司 | Video action recognition method and device
CN108965585A (en) * | 2018-06-22 | 2018-12-07 | 成都博宇科技有限公司 | User identification method based on smartphone sensors
CN109376696A (en) * | 2018-11-28 | 2019-02-22 | 北京达佳互联信息技术有限公司 | Video action classification method, apparatus, computer device, and storage medium
CN109376696B (en) * | 2018-11-28 | 2020-10-23 | 北京达佳互联信息技术有限公司 | Video motion classification method and device, computer equipment and storage medium
CN111160117A (en) * | 2019-12-11 | 2020-05-15 | 青岛联合创智科技有限公司 | Abnormal behavior detection method based on multi-instance learning modeling
WO2022116479A1 (en) * | 2020-12-01 | 2022-06-09 | 南京智谷人工智能研究院有限公司 | End-to-end multi-instance learning method based on automatic instance selection
CN113011322A (en) * | 2021-03-17 | 2021-06-22 | 南京工业大学 | Detection model training method and detection method for specific abnormal behaviors in surveillance video
CN113011322B (en) * | 2021-03-17 | 2023-09-05 | 贵州安防工程技术研究中心有限公司 | Detection model training method and detection method for monitoring specific abnormal behavior of video
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106980826A (en) | Action recognition method based on a neural network | |
CN107944442B (en) | Object detection device and method based on an improved convolutional neural network | |
CN110353675B (en) | Electroencephalogram signal emotion recognition method and device based on picture generation | |
WO2021139324A1 (en) | Image recognition method and apparatus, computer-readable storage medium and electronic device | |
CN109034210A (en) | Object detection method based on hyper-feature fusion and a multi-scale pyramid network | |
Srinivasan et al. | Interpretable human action recognition in compressed domain | |
CN108921037B (en) | Emotion recognition method based on a BN-Inception two-stream network | |
CN111488805B (en) | Video behavior recognition method based on salient feature extraction | |
CN101866429A (en) | Training method of multi-moving object action identification and multi-moving object action identification method | |
CN106355171A (en) | Networked video surveillance system | |
CN115240117A (en) | Helmet wearing detection method in construction site construction scene | |
CN115578770A (en) | Small sample facial expression recognition method and system based on self-supervision | |
Koli et al. | Human action recognition using deep neural networks | |
CN116403286A (en) | Social grouping method for large-scene video | |
CN114187546B (en) | Combined action recognition method and system | |
CN111862031A (en) | Face synthetic image detection method and device, electronic equipment and storage medium | |
US20220027688A1 (en) | Image identification device, method for performing semantic segmentation, and storage medium | |
Monisha et al. | Enhanced automatic recognition of human emotions using machine learning techniques | |
Liang et al. | Mask-guided multiscale feature aggregation network for hand gesture recognition | |
CN114155572A (en) | Facial expression recognition method and system | |
CN111881803B (en) | Face recognition method based on improved YOLOv3 | |
CN103778445B (en) | Cold-rolled strip steel surface defect reason analyzing method and system | |
Lin et al. | Micro-expression recognition based on spatiotemporal Gabor filters | |
CN114565772A (en) | Set feature extraction method and device, electronic equipment and storage medium | |
CN110555342B (en) | Image identification method and device and image equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20170725 |