CN110399822A - Hand-raising action recognition method, device and storage medium based on deep learning - Google Patents
- Publication number
- CN110399822A CN110399822A CN201910647658.4A CN201910647658A CN110399822A CN 110399822 A CN110399822 A CN 110399822A CN 201910647658 A CN201910647658 A CN 201910647658A CN 110399822 A CN110399822 A CN 110399822A
- Authority
- CN
- China
- Prior art keywords
- identified
- hand
- convolutional neural networks
- feature point
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Biology (AREA)
- Health & Medical Sciences (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Human Computer Interaction (AREA)
- General Health & Medical Sciences (AREA)
- Image Analysis (AREA)
Abstract
This application discloses a hand-raising action recognition method, device and storage medium based on deep learning. The method comprises: obtaining an image containing the upper limb of an object to be identified; using a feature extraction model trained by deep learning, determining the position information of multiple feature points of the upper limb of the object to be identified in the image; and, according to the determined position information, using a preset classification model to determine whether the object to be identified performs a hand-raising action. The disclosed solution can obtain image data from multiple image acquisition devices simultaneously, obtain in real time the position information of the multiple upper-limb feature points of the object to be identified in the image using the deep-learning-trained feature extraction model, and then, according to the determined position information and using the preset classification model, judge in real time whether the object to be identified performs a hand-raising action. No repeated computation is required, hand-raising intention can be identified, and detection accuracy is high.
Description
Technical field
This application relates to the field of behavior recognition methods, and in particular to a hand-raising action recognition method, device and storage medium based on deep learning.
Background art
With the development of science and technology and the growth of computing resources, applications based on computer vision are increasingly widespread in daily life, and many deep-learning-based motion detection applications have appeared on the market. However, existing hand-raising recognition algorithms based on object detection have several problems: the detection rate is low, occluded samples cannot be identified, and the motion intention of the target cannot be accurately predicted. Motion detection based on tracking algorithms consumes considerable computing resources, cannot cope with large volumes of input data, and suffers from target confusion and target loss when tracking multiple targets. Therefore, achieving accurate and fast hand-raising detection in complex scenes while maintaining high robustness is the decisive factor in measuring the quality of an algorithm.
Commonly used hand-raising detection methods generally fall into the following categories. The first is visual observation, which relies entirely on real-time human monitoring: a single person cannot continuously monitor multiple video channels, the task cannot be completed when manpower is short or unavailable, and labor costs are relatively high. The second is hand-raising detection based on object detection and tracking algorithms: such methods track and merge hand detections across successive video frames, which results in a large computational load and poor real-time performance; they cannot support real-time monitoring of multiple cameras, are difficult to apply in multi-camera environments, and cannot accurately identify hand position and hand posture, so the accuracy rate is low and there are many false detections and missed detections.
For the above problems of existing hand-raising recognition algorithms in the prior art — high data preparation cost, long computation time, a large amount of repeated computation, inability to support multi-channel image input, inability to identify hand-raising intention, and low detection accuracy — no effective solution has yet been proposed.
Summary of the invention
Embodiments of the present disclosure provide a hand-raising action recognition method, device and storage medium based on deep learning, so as to at least solve the technical problems of existing hand-raising recognition algorithms in the prior art: high data preparation cost, long computation time, a large amount of repeated computation, inability to support multi-channel image input, inability to identify hand-raising intention, and low detection accuracy.
According to one aspect of the embodiments of the present disclosure, a hand-raising action recognition method based on deep learning is provided, comprising: obtaining an image containing the upper limb of an object to be identified; using a feature extraction model trained by deep learning, determining the position information of multiple feature points of the upper limb of the object to be identified in the image; and, according to the determined position information, using a preset classification model to determine whether the object to be identified performs a hand-raising action.
According to another aspect of the embodiments of the present disclosure, a storage medium is further provided. The storage medium comprises a stored program, wherein a processor executes the method described in any one of the above when the program runs.
According to another aspect of the embodiments of the present disclosure, a hand-raising action recognition device based on deep learning is further provided, comprising: an acquisition module for obtaining an image containing the upper limb of an object to be identified; a determining module for determining, using a feature extraction model trained by deep learning, the position information of multiple feature points of the upper limb of the object to be identified in the image; and a judging module for determining, according to the determined position information and using a preset classification model, whether the object to be identified performs a hand-raising action.
According to another aspect of the embodiments of the present disclosure, a hand-raising action recognition device based on deep learning is further provided, comprising: a processor; and a memory connected to the processor and configured to provide the processor with instructions for the following processing steps: obtaining an image containing the upper limb of an object to be identified; using a feature extraction model trained by deep learning, determining the position information of multiple feature points of the upper limb of the object to be identified in the image; and, according to the determined position information, using a preset classification model to determine whether the object to be identified performs a hand-raising action.
In the embodiments of the present disclosure, a hand-raising action recognition method based on deep learning is provided. A processor first obtains an image containing the upper limb of an object to be identified, then uses a feature extraction model trained by deep learning to determine the position information of multiple feature points of the upper limb of the object to be identified in the image, and then, according to the determined position information and using a preset classification model, determines whether the object to be identified performs a hand-raising action. The technical solution of the present application can obtain image data from multiple image acquisition devices simultaneously, obtain in real time the position information of the multiple upper-limb feature points of the object to be identified in the image using the deep-learning-trained feature extraction model, and then judge in real time whether the object to be identified performs a hand-raising action. No repeated computation is required, hand-raising intention can be identified, and detection accuracy is high. Moreover, since this solution uses a feature extraction model trained by deep learning, compared with traditional hand-raising recognition methods it requires no sensors, is easy to use, occupies few computing resources, is low in cost, and is easy to popularize. The solution thereby solves the technical problems of existing hand-raising recognition algorithms in the prior art: high data preparation cost, long computation time, a large amount of repeated computation, inability to support multi-channel image input, inability to identify hand-raising intention, and low detection accuracy.
Brief description of the drawings
The drawings described herein are used to provide a further understanding of the disclosure and constitute a part of the present application. The illustrative embodiments of the disclosure and their descriptions are used to explain the disclosure and do not constitute an improper limitation of the disclosure. In the drawings:
Fig. 1 is a schematic flowchart of the hand-raising action recognition method based on deep learning according to the first aspect of Embodiment 1 of the present disclosure;
Fig. 2a is a schematic diagram of the distribution of the multiple upper-limb feature points under a hand-raising posture according to the first aspect of Embodiment 1 of the present disclosure;
Fig. 2b is one schematic diagram of the distribution of the multiple upper-limb feature points under a non-hand-raising posture according to the first aspect of Embodiment 1 of the present disclosure;
Fig. 2c is another schematic diagram of the distribution of the multiple upper-limb feature points under a non-hand-raising posture according to the first aspect of Embodiment 1 of the present disclosure;
Fig. 3 is a schematic structural diagram of the feature extraction model according to the first aspect of Embodiment 1 of the present disclosure;
Fig. 4 is a schematic diagram of the first feature map according to the first aspect of Embodiment 1 of the present disclosure;
Fig. 5 is one schematic diagram of the second feature map according to the first aspect of Embodiment 1 of the present disclosure;
Fig. 6 is another schematic diagram of the second feature map according to the first aspect of Embodiment 1 of the present disclosure;
Fig. 7 is a schematic diagram of the hand-raising action recognition device based on deep learning according to Embodiment 2 of the present disclosure; and
Fig. 8 is a schematic diagram of the hand-raising action recognition device based on deep learning according to Embodiment 3 of the present disclosure.
Specific embodiments
In order to enable those skilled in the art to better understand the technical solutions of the disclosure, the technical solutions in the embodiments of the disclosure are described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the disclosure, not all of them. Based on the embodiments in the disclosure, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the scope of protection of the disclosure.
It should be noted that the terms "first", "second", etc. in the specification, the claims and the above drawings of the disclosure are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data so used are interchangeable under appropriate circumstances, so that the embodiments of the disclosure described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "comprising" and "having" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device containing a series of steps or units is not necessarily limited to the steps or units explicitly listed, but may include other steps or units that are not explicitly listed or that are inherent to such a process, method, product or device.
Embodiment 1
According to the first aspect of this embodiment, a hand-raising action recognition method based on deep learning is provided. Fig. 1 shows a schematic flowchart of the method. Referring to Fig. 1, the method comprises:
S102: obtaining an image containing the upper limb of an object to be identified;
S104: using a feature extraction model trained by deep learning, determining the position information of multiple feature points of the upper limb of the object to be identified in the image; and
S106: according to the determined position information, using a preset classification model to determine whether the object to be identified performs a hand-raising action.
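The three steps above can be sketched as a minimal pipeline. This is a hypothetical illustration, not the patented implementation: the synthetic heatmaps stand in for the output of the trained feature extraction model, and the wrist/shoulder comparison stands in for the preset classification model.

```python
import numpy as np

def extract_keypoints(heatmaps: np.ndarray) -> np.ndarray:
    """S104 sketch: one (x, y) location per channel, taken at each channel's peak."""
    n_channels, h, w = heatmaps.shape
    coords = np.zeros((n_channels, 2), dtype=int)
    for c in range(n_channels):
        idx = np.argmax(heatmaps[c])       # flat index of the greatest value
        coords[c] = (idx % w, idx // w)    # (x, y)
    return coords

def is_hand_raised(coords: np.ndarray, wrist: int = 0, shoulder: int = 1) -> bool:
    """S106 stand-in for the preset classifier: a wrist above the shoulder
    (smaller y in image coordinates) is treated as a raised hand."""
    return bool(coords[wrist, 1] < coords[shoulder, 1])

# S102 stand-in: a synthetic 2-channel heatmap (wrist, shoulder)
heatmaps = np.zeros((2, 64, 64))
heatmaps[0, 10, 30] = 0.9   # wrist peak at (x=30, y=10)
heatmaps[1, 40, 32] = 0.8   # shoulder peak at (x=32, y=40)
print(is_hand_raised(extract_keypoints(heatmaps)))  # → True
```

In the method itself, the per-channel peak extraction and the classification step are described in detail below.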
As described in the Background section above, with the development of science and technology and the growth of computing resources, computer vision applications are increasingly widespread in daily life, and many deep-learning-based motion detection applications have appeared on the market. At this stage, hand-raising recognition algorithms based on object detection still have problems including a low detection rate, inability to identify occluded samples, and inability to accurately predict the motion intention of the target; motion detection based on tracking algorithms consumes considerable computing resources, cannot cope with large volumes of input data, and suffers from target confusion and target loss when tracking multiple targets. Achieving accurate and fast hand-raising detection in complex scenes while maintaining high robustness is therefore the decisive factor in measuring the quality of an algorithm. Likewise, the commonly used detection methods — visual observation, and detection based on object detection and tracking algorithms — suffer from the high labor cost, large computational load, poor real-time performance and low accuracy described in the Background section.
To address the problems described in the above background, this embodiment provides a hand-raising action recognition method based on deep learning. A processor first obtains an image containing the upper limb of an object to be identified, then uses a feature extraction model trained by deep learning to determine the position information of multiple feature points of the upper limb of the object to be identified in the image, and then, according to the determined position information and using a preset classification model, determines whether the object to be identified performs a hand-raising action. The technical solution of the present application can obtain image data from multiple image acquisition devices simultaneously, obtain in real time the position information of the multiple upper-limb feature points of the object to be identified in the image using the deep-learning-trained feature extraction model, and then judge in real time whether the object to be identified performs a hand-raising action. No repeated computation is required, hand-raising intention can be identified, and detection accuracy is high.
Further, under different postures, the distribution of the multiple feature points and the positional relationship between every two adjacent feature points differ. Therefore, the feature extraction model can continuously improve its accuracy by learning, from a large number of images containing human upper limbs, the distribution characteristics of the multiple feature points and the positional relationships between adjacent feature points. Fig. 2a shows a schematic diagram of the distribution of the multiple upper-limb feature points under a hand-raising posture, Fig. 2b shows one schematic diagram of the distribution of the multiple upper-limb feature points under a non-hand-raising posture, and Fig. 2c shows another such schematic diagram. Each circular dot in Figs. 2a, 2b and 2c denotes a feature point. Referring to Figs. 2a, 2b and 2c, the multiple feature points contained in an image of one human upper limb comprise: 21 hand feature points, 1 wrist joint feature point, 1 elbow joint feature point and 1 shoulder joint feature point. Moreover, since this solution uses a feature extraction model trained by deep learning, compared with traditional hand-raising recognition methods it requires no sensors, is easy to use, occupies few computing resources, is low in cost, and is easy to popularize.
Thus, the technical solution of this embodiment solves the technical problems of existing hand-raising recognition algorithms: high data preparation cost, long computation time, a large amount of repeated computation, inability to support multi-channel image input, inability to identify hand-raising intention, and low detection accuracy.
In addition, when obtaining the image containing the upper limb of the object to be identified, the technical solution of this embodiment may, for example, use an image acquisition device such as a camera and send the image to the processor. The image containing the upper limb of the object to be identified may, for example, be derived from video captured by the camera: the processor obtains video data from the video stream and decodes the video data into image frame data, where the image frame data is the above-mentioned image containing the upper limb of the object to be identified. Screen-corruption problems during video decoding are solved by modifying the transport protocol, and a timed-reconnection function after a camera interruption is provided, which improves the robustness of the system and indirectly improves the accuracy of the algorithm.
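The timed-reconnection behavior can be sketched generically as follows. This is a hypothetical illustration under stated assumptions: `open_capture` stands in for whatever opens the camera or RTSP stream, and the simulated camera below is test scaffolding, not part of the described system.

```python
import time

def read_frames(open_capture, max_retries=3, retry_delay=0.0):
    """Yield frames from open_capture(); when the stream fails to open or is
    interrupted, wait and reconnect, giving up after max_retries consecutive
    failed reopen attempts."""
    failures = 0
    while failures <= max_retries:
        cap = open_capture()
        if cap is None:                 # open failed
            failures += 1
            time.sleep(retry_delay)
            continue
        failures = 0
        for frame in cap:               # deliver frames while the stream lives
            yield frame
        failures += 1                   # stream ended/interrupted: reconnect
        time.sleep(retry_delay)

# Simulated camera: fails to open once, then delivers two frames per session.
attempts = {"n": 0}
def fake_camera():
    attempts["n"] += 1
    return None if attempts["n"] == 1 else iter(["f1", "f2"])

frames = []
for f in read_frames(fake_camera, max_retries=1):
    frames.append(f)
    if len(frames) >= 4:
        break
print(frames)  # → ['f1', 'f2', 'f1', 'f2']
```

In a deployment, the reconnect delay and retry budget would be tuned to the camera and transport protocol in use.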
Optionally, the feature extraction model comprises a first convolutional neural network, and the operation of determining, using the feature extraction model trained by deep learning, the position information of the multiple feature points of the upper limb of the object to be identified in the image comprises: using the first convolutional neural network to generate first feature maps of multiple channels from the image; determining the point with the greatest value in the first feature map of each channel as a feature point; and determining the position information of each feature point in the image according to its position in the respective first feature map.
Fig. 3 shows the structure of the feature extraction model. Specifically, the feature extraction model comprises a first convolutional neural network. Referring to Fig. 3, the feature extraction model comprises two branches, where, for example but without limitation, the left branch is the first convolutional neural network. The first convolutional neural network uses the first 8 layers of a fine-tuned VGG16 pre-trained network model, whose output is a 512-dimensional feature map whose width and height are each 1/8 of those of the input image. Assuming the input image size is 512 × 512, the output is a 512-dimensional feature map of size 64 × 64. This is followed by a 1×1 convolutional layer that generates a 128-dimensional feature map of size 64 × 64, which is then input as the feature extraction result into the feature point computation layers (referring to Fig. 3, the feature point computation layers consist of multiple convolutional layers). The number of convolution kernels in the last convolutional layer of the left branch can be set according to the number of upper-limb feature points contained in the image; since the number of feature points of one upper limb described above is 24, the number of convolution kernels in the last convolutional layer of the left branch (the first convolutional neural network) can be set to 24 + 1 = 25, including 1 background layer. The first convolutional neural network therefore finally outputs first feature maps of 25 channels, each of size 64 × 64.
Further, the point with the greatest value in the first feature map of each channel is determined as a feature point. Fig. 4 illustratively shows a first feature map; referring to Fig. 4, the point with the greatest value is the point corresponding to the value 0.9, so that point is determined as the feature point. Then, according to the position of the feature point in its first feature map, the position information of the feature point in the image is determined. For example, in Fig. 4 the feature point occupies the last cell of the first feature map; combining this with the correspondence between the first feature map and the image, the position information of the feature point in the image can be determined. In this way, the position information of every feature point contained in the image can be determined accurately and quickly.
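Mapping a feature map cell back to image coordinates can be sketched as follows. The ×8 stride is an assumption consistent with the 1/8 downscaling described above; the actual mapping would depend on the trained network's geometry.

```python
import numpy as np

def peak_to_image_coords(heatmap: np.ndarray, stride: int = 8):
    """Locate the channel's greatest value and map it back to input-image
    pixel coordinates, assuming the map is 1/stride of the image per side."""
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return int(x) * stride, int(y) * stride  # (x, y) in the original image

hm = np.zeros((64, 64))
hm[63, 63] = 0.9  # peak in the last cell, as in the Fig. 4 example
print(peak_to_image_coords(hm))  # → (504, 504)
```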
Optionally, the feature extraction model further comprises a second convolutional neural network, and the operation of determining, using the feature extraction model trained by deep learning, the position information of the multiple feature points of the upper limb of the object to be identified in the image further comprises: using the second convolutional neural network to generate second feature maps of multiple channels from the image; determining, according to the second feature maps, the positional relationships between adjacent pairs of the multiple feature points; and screening the feature points determined using the first feature maps according to the positional relationships determined from the second feature maps, thereby determining the feature points to be classified.
Specifically, the feature extraction model comprises a second convolutional neural network. Referring to Fig. 3, the feature extraction model comprises two branches, where, for example but without limitation, the right branch is the second convolutional neural network. The second convolutional neural network likewise uses the first 8 layers of the fine-tuned VGG16 pre-trained network model, outputting a 512-dimensional feature map whose width and height are each 1/8 of those of the input image; assuming the input image size is 512 × 512, the output is a 512-dimensional feature map of size 64 × 64. This is followed by a 1×1 convolutional layer that generates a 128-dimensional feature map of size 64 × 64, which is then input as the feature extraction result into the feature point computation layers (referring to Fig. 3). The number of convolution kernels in the last convolutional layer of the right branch can be set according to the number of connections between the upper-limb feature points contained in the image; since the number of connections between the feature points of one upper limb described above is 22, the number of convolution kernels in the last convolutional layer of the right branch (the second convolutional neural network) can be set to 22. The second convolutional neural network therefore finally outputs second feature maps of 22 channels, each of size 64 × 64. The second feature map of each channel corresponds to one connection between two feature points of the same upper limb; that is, the second feature map of each channel describes the positional relationship between two feature points of the same upper limb. The 22 channels of feature maps can thus describe the positional relationships between the 24 feature points of the same upper limb.
Further, the positional relationships between adjacent pairs of the multiple feature points are determined according to the second feature maps. For example but without limitation: in the image, the multiple feature points of the upper limb of the object to be identified include one wrist joint feature point and two candidate elbow joint feature points (elbow joint feature point 1 and elbow joint feature point 2). Fig. 5 illustratively shows one schematic diagram of a second feature map. For example, Fig. 5 shows the positional relationship between the adjacent wrist joint feature point and elbow joint feature point 1, where the feature point corresponding to the upper end of the second feature map is the wrist joint feature point and the feature point corresponding to the lower end is elbow joint feature point 1. In Fig. 5, the orientation of the cells with value "1" indicates the positional relationship between the adjacent wrist joint feature point and elbow joint feature point 1. Referring to Fig. 5, the cells with value "1" run in the vertical direction, so the positional relationship is that elbow joint feature point 1 is located directly below the wrist joint feature point.
Further, Fig. 6 illustratively shows another schematic diagram of a second feature map. Specifically, Fig. 6 shows the positional relationship between the adjacent wrist joint feature point and elbow joint feature point 2, where the feature point corresponding to the left side of the second feature map is the wrist joint feature point and the feature point corresponding to the right side is elbow joint feature point 2. In Fig. 6, the cells with value "1" run in the horizontal direction, so the positional relationship is that elbow joint feature point 2 is located directly to the right of the wrist joint feature point.
Further, in conjunction with shown in Fig. 2 a, Fig. 2 a is shown raise one's hand gesture under positional relationship between each characteristic point, In
In Fig. 2 a, the positional relationship between adjacent wrist joint characteristic point and elbow joint characteristic point is also that elbow joint characteristic point is located at wrist pass
Save the underface of characteristic point.Therefore, it is possible to filter out wrist joint characteristic point and elbow joint characteristic point 1 is the spy of same upper limb
Point is levied, elbow joint characteristic point 2 is not belonging to the characteristic point of same upper limb.To by wrist joint characteristic point and elbow joint characteristic point 1
Determine the characteristic point to be sorted for classifying.And so on, 24 features that same upper limb is included can be filtered out
Point, and then filtered out 24 characteristic points are determined as to the characteristic point to be sorted for being used to classify.The present embodiment is ensured
The accuracy of the action identification method of raising one's hand.
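The screening step above can be sketched as a cosine-similarity test: given the direction encoded in the second feature map (for example, "directly below"), the elbow candidate whose offset from the wrist best matches that direction is kept as belonging to the same upper limb. This is only an illustrative sketch; the function name, coordinates and direction vector are all invented for the example and are not taken from the patent text.

```python
import numpy as np

def pick_same_limb(wrist, elbow_candidates, expected_dir):
    """Select the elbow candidate whose direction from the wrist best
    matches the orientation read from the second feature map.
    wrist, elbow_candidates: (x, y) pixel coordinates (y grows downward).
    expected_dir: unit vector, e.g. (0, 1) for "directly below"."""
    best, best_score = None, -np.inf
    for cand in elbow_candidates:
        v = np.array(cand, float) - np.array(wrist, float)
        n = np.linalg.norm(v)
        if n == 0:
            continue
        score = float(np.dot(v / n, expected_dir))  # cosine similarity
        if score > best_score:
            best, best_score = cand, score
    return best

# Elbow 1 lies directly below the wrist; elbow 2 lies to its right.
wrist = (50, 40)
elbow1, elbow2 = (50, 80), (90, 40)
print(pick_same_limb(wrist, [elbow1, elbow2], np.array([0.0, 1.0])))  # → (50, 80)
```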
Wherein, the computation performed by the above feature point computation layer can be carried out according to the following formulas:
$$J^{t}=\varphi^{t}\left(F, J^{t-1}, A^{t-1}\right), \qquad A^{t}=\rho^{t}\left(F, J^{t-1}, A^{t-1}\right)$$
wherein F is the 128-dimensional feature map output by the 1x1 convolutional layer, $\varphi^{t}$ and $\rho^{t}$ are respectively the convolutional computations of the left branch and the right branch at stage t, $J^{t}$ is the result computed by the left branch at stage t, and $A^{t}$ is the result computed by the right branch at stage t.
Optionally, the operation of determining, according to the determined location information and by means of the preset classification model, whether the object to be identified makes a hand-raising action further includes: determining, according to the location information of the feature points to be classified and by means of a preset support vector machine model, whether the object to be identified makes the hand-raising action.
Specifically, the preset support vector machine model takes the location information (for example, the position coordinates) of the feature points contained in a single upper limb as input and "raised hand or not" as output, and thereby determines whether the object to be identified makes the hand-raising action. For example, the positions of the 24 feature points of the same upper limb can be represented by their respective coordinates in the image, (x1, y1), (x2, y2), (x3, y3) ... (x24, y24). By inputting the position coordinates of the 24 feature points of the same upper limb into the preset support vector machine model in this way, it can be determined quickly and accurately whether the object to be identified makes a hand-raising action.
Wherein, the support vector machine is a relatively common binary classification model and is not described in detail in this embodiment.
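As a sketch of how the classification stage might consume the 24 feature points, the coordinates can be flattened into a 48-dimensional vector and fed to the SVM's decision function. The weights `w` and bias `b` below are toy placeholders standing in for parameters a real, trained support vector machine would supply; only the shape of the interface reflects the text.

```python
import numpy as np

def to_feature_vector(points):
    # Flatten the 24 (x, y) keypoints of one upper limb into
    # (x1, y1, x2, y2, ..., x24, y24), the SVM input described above.
    return np.asarray(points, dtype=float).reshape(-1)

def svm_predict(x, w, b):
    # A linear SVM's decision rule: the sign of w . x + b.
    # 1 = hand-raising action, 0 = other action.
    return 1 if float(np.dot(w, x) + b) > 0 else 0

points = [(i, 2 * i) for i in range(24)]   # dummy keypoint coordinates
x = to_feature_vector(points)
print(x.shape)                             # → (48,)
w = np.zeros(48)
w[3] = 1.0                                 # toy weights: look only at y2
print(svm_predict(x, w, b=-1.0))
```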
Optionally, the method further includes training the first convolutional neural network through the following operations: acquiring a plurality of sample images containing upper limbs; constructing the first convolutional neural network; generating, by means of the first convolutional neural network, a first output vector corresponding to a sample image, wherein the first output vector is used to indicate the location information, in the sample image, of the feature points contained in the sample image; and comparing the first output vector with a preset first label vector corresponding to the sample image, and adjusting the first convolutional neural network according to the comparison result.
Specifically, a sufficiently large sample set (a set of sample images) must first be acquired, containing various postures of the human body in a variety of scenes. The content annotated in producing the training set is mainly the feature points of the human upper limb: 21 hand feature points, 1 wrist joint feature point, 1 elbow joint feature point and 1 shoulder joint feature point, together with the action category, where the hand-raising action is labeled 1 and other actions are labeled 0. For example, but without limitation, nearly 10,000 images are first annotated manually. Data augmentation includes horizontal flipping, scaling, cropping, translation and noise addition, but vertical flipping must not be used. After a certain number of image samples containing upper limbs have been acquired and annotated and calibrated both algorithmically and manually, a training sample database is constructed, and size normalization and data augmentation are applied to the produced sample database.
Further, the first convolutional neural network is constructed, and the constructed first convolutional neural network is used to generate a first output vector corresponding to the sample image, the first output vector indicating the location information, in the sample image, of the feature points contained therein. The first output vector is then compared with the preset first label vector corresponding to the sample image, and the first convolutional neural network is adjusted according to the comparison result, so as to reach the optimal recognition effect.
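The augmentation constraint above (horizontal flip allowed, vertical flip forbidden because it would invert the hand-raising pose) can be sketched as follows; the function name and the noise parameters are illustrative assumptions, not part of the patent.

```python
import numpy as np

def augment(image, keypoints, flip=True, noise_std=1.0, seed=0):
    """One illustrative augmentation pass: an optional horizontal flip
    (mirroring keypoint x-coordinates) plus additive Gaussian noise.
    A vertical flip is deliberately absent: it would turn a raised
    hand into a lowered one and corrupt the labels.
    image: (H, W) array; keypoints: (N, 2) array of (x, y)."""
    h, w = image.shape
    out_img = image.astype(float).copy()
    out_kp = np.asarray(keypoints, dtype=float).copy()
    if flip:
        out_img = out_img[:, ::-1]
        out_kp[:, 0] = (w - 1) - out_kp[:, 0]   # mirror x, keep y
    rng = np.random.default_rng(seed)
    out_img = out_img + rng.normal(0.0, noise_std, out_img.shape)
    return out_img, out_kp

img = np.zeros((4, 6))
kp = np.array([[0, 1], [5, 3]])
_, flipped_kp = augment(img, kp, flip=True, noise_std=0.0)
print(flipped_kp.tolist())  # → [[5.0, 1.0], [0.0, 3.0]]
```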
Optionally, the method further includes training the second convolutional neural network through the following operations: acquiring a plurality of sample images containing upper limbs; constructing the second convolutional neural network; generating, by means of the second convolutional neural network, a second output vector corresponding to a sample image, wherein the second output vector is used to indicate the positional relationship between two adjacent feature points contained in the sample image; and comparing the second output vector with a preset second label vector corresponding to the sample image, and adjusting the second convolutional neural network according to the comparison result.
Specifically, as above, a sufficiently large sample set (a set of sample images) must be acquired, containing various postures of the human body in several scenes. The content annotated in producing the training set is mainly the feature points of the human upper limb: 21 hand feature points, 1 wrist joint feature point, 1 elbow joint feature point and 1 shoulder joint feature point, together with the action category, where the hand-raising action is labeled 1 and other actions are labeled 0. For example, but without limitation, nearly 10,000 images are first annotated manually; data augmentation includes horizontal flipping, scaling, cropping, translation and noise addition, but vertical flipping must not be used. After a certain number of image samples containing upper limbs have been acquired and annotated and calibrated both algorithmically and manually, a training sample database is constructed, and size normalization and data augmentation are applied to the produced sample database.
Further, the second convolutional neural network is constructed, and the constructed second convolutional neural network is used to generate a second output vector corresponding to the sample image, the second output vector indicating the positional relationship between two adjacent feature points contained in the sample image. The second output vector is then compared with the preset second label vector corresponding to the sample image, and the second convolutional neural network is adjusted according to the comparison result, so as to reach the optimal recognition effect.
Optionally, the operation of comparing the first output vector with the preset first label vector corresponding to the sample image includes calculating a first L2 spatial distance between the first output vector and the first label vector, and the operation of adjusting the first convolutional neural network according to the comparison result includes: taking the first L2 spatial distance as a first loss function and calculating a first gradient of the first loss function; and, based on the first gradient, adjusting the first convolutional neural network according to the stochastic gradient descent principle.
Specifically, the first L2 spatial distance between the first output vector and the first label vector is calculated, the first L2 spatial distance is taken as the first loss function, and the first gradient of the first loss function is calculated. For example, the first loss function uses the L2 spatial distance between the predicted value and the true value, with respect to which the back-propagation derivatives are computed. The specific calculation is as follows:
$$f=\sum_{z=1}^{Z}\sum_{p}\left\lVert \hat{x}_{z}(p)-x_{z}^{*}(p)\right\rVert_{2}^{2}$$
wherein Z corresponds to the z feature maps, p is the p-th pixel of a feature map, $x_{z}^{*}$ is the true value and $\hat{x}_{z}$ is the predicted value.
Then the network parameters of the first convolutional neural network are optimized according to the stochastic gradient descent (SGD) principle.
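A minimal numeric sketch of the L2 loss and a gradient step: for illustration, the descent is applied directly to the predicted heatmaps rather than to network weights, which the real training would update through back-propagation; the toy shapes and learning rate are assumptions.

```python
import numpy as np

def l2_loss(pred, target):
    # Sum of squared differences over all Z feature maps and pixels p,
    # matching the L2 spatial distance used as the loss above.
    return float(np.sum((pred - target) ** 2))

def l2_grad(pred, target):
    # dL/dpred; in the real network this gradient is back-propagated
    # into the convolutional parameters.
    return 2.0 * (pred - target)

def sgd_step(pred, target, lr=0.1):
    return pred - lr * l2_grad(pred, target)

target = np.ones((2, 4, 4))       # toy ground-truth heatmaps (Z = 2)
pred = np.zeros_like(target)      # toy predictions
for _ in range(100):
    pred = sgd_step(pred, target)
print(round(l2_loss(pred, target), 6))  # → 0.0
```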
Optionally, the operation of comparing the second output vector with the preset second label vector corresponding to the sample image includes calculating a second L2 spatial distance between the second output vector and the second label vector, and the operation of adjusting the second convolutional neural network according to the comparison result includes: taking the second L2 spatial distance as a second loss function and calculating a second gradient of the second loss function; and, based on the second gradient, adjusting the second convolutional neural network according to the stochastic gradient descent principle. Specifically, the second convolutional neural network can be adjusted in the same way as described above for adjusting the first convolutional neural network.
In addition, the hand-raising identification method described in this embodiment specifically includes the following steps:
Step 1: collect a certain number of image samples containing upper limbs, perform algorithmic and manual annotation and calibration, and construct a training sample database. Apply size normalization and data augmentation to the produced sample database.
Step 2: establish a low-accuracy feature point detector using a deep learning algorithm, train on the produced classification sample set to generate an SVM classification model, and effectively separate hand-raising actions from non-hand-raising actions.
Step 3: collect new data and use the trained simple model to automatically annotate the unlabeled samples.
Step 4: manually calibrate the data set annotated by the simple model to expand the training samples.
Step 5: continue training the feature point detector and the hand-raising classification model with the expanded training samples.
Step 6: repeat Steps 3 to 5 until the model accuracy meets the requirement.
Step 7: read image data from each IP camera and generate, through the decoding module, the image data required by the algorithm.
Step 8: input the data generated in Step 7 into the trained neural network model and return the recognition result to the terminal.
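Steps 3 to 6 form a bootstrap loop that can be sketched as follows. Every callback (`auto_label`, `correct`, `retrain`, `accuracy`) is a placeholder standing in for the detector, the manual calibration and the training machinery described in the text; the toy instantiation at the bottom exists only to make the control flow executable.

```python
def expand_and_retrain(model, train_set, new_batches, auto_label,
                       correct, retrain, accuracy, target=0.95):
    # Steps 3-6: auto-label new data with the current model (step 3),
    # correct the labels by hand (step 4), expand the sample set and
    # retrain (step 5), and repeat until the accuracy requirement is
    # met (step 6).
    for batch in new_batches:
        labeled = correct(auto_label(model, batch))
        train_set = train_set + labeled
        model = retrain(model, train_set)
        if accuracy(model) >= target:
            break
    return model, train_set

# Toy instantiation: the "model" is just the number of samples it has
# seen, and its accuracy grows with the training-set size.
model, data = expand_and_retrain(
    model=0, train_set=[1] * 10, new_batches=[[1] * 30, [1] * 60],
    auto_label=lambda m, b: b, correct=lambda b: b,
    retrain=lambda m, s: len(s), accuracy=lambda m: m / 100.0)
print(model, len(data))  # → 100 100
```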
Wherein, the data acquisition in Step 1 specifically includes the following steps: a sufficiently large sample set must first be collected, containing various postures of the human body in several scenes. The content annotated in producing the training set is mainly the feature points of the human upper limb: 21 hand feature points, 1 wrist joint feature point, 1 elbow joint feature point and 1 shoulder joint feature point, together with the action category, where the hand-raising action is labeled 1 and other actions are labeled 0. For annotation, nearly 10,000 images are first labeled manually; data augmentation includes horizontal flipping, scaling, cropping, translation and noise addition, and vertical flipping must not be used.
The model construction in Step 2 specifically includes the following steps: establish the deep learning algorithm model shown in Fig. 2. In this embodiment, the feature extraction network uses the first 8 layers of a fine-tuned VGG16 pre-trained network model, which outputs 512-dimensional feature maps whose width and height are each 1/8 of the input image. A following 1x1 convolutional layer generates 128-dimensional feature maps, which are fed as the feature extraction result into the feature point computation layer. That layer is divided into 2 branches: the left branch outputs the location information and confidence of the detected human upper limb feature points, and the right branch outputs the positional relationship between each pair of adjacent feature points. The computation process is as follows:
$$J^{t}=\varphi^{t}\left(F, J^{t-1}, A^{t-1}\right), \qquad A^{t}=\rho^{t}\left(F, J^{t-1}, A^{t-1}\right)$$
wherein F is the 128-dimensional feature map output by the 1x1 convolutional layer, $\varphi^{t}$ and $\rho^{t}$ are respectively the convolutional computations of the left branch and the right branch at stage t, $J^{t}$ is the result computed by the left branch at stage t, and $A^{t}$ is the result computed by the right branch at stage t. The loss function uses the L2 spatial distance between the predicted value and the true value, with respect to which the back-propagation derivatives are computed. The specific calculation is as follows:
$$f=\sum_{z=1}^{Z}\sum_{p}\left\lVert \hat{x}_{z}(p)-x_{z}^{*}(p)\right\rVert_{2}^{2}$$
wherein Z corresponds to the z feature maps, p is the p-th pixel of a feature map, $x_{z}^{*}$ is the true value and $\hat{x}_{z}$ is the predicted value.
The classification model part uses a support vector machine, taking the coordinate information of the feature points of a single upper limb and a single hand as input and "raised hand or not" as output for training; the coordinates of missing feature points are filled in as 0.
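The shapes flowing through the architecture just described can be sketched with NumPy. The channel counts of the two branches (24 keypoint maps on the left, 23 adjacency maps on the right), the number of refinement stages, and the stand-in linear maps for φ^t and ρ^t are all assumptions of this sketch; only F's 128 channels at 1/8 resolution come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 28                        # 1/8 of a 224x224 input (assumption)
F = rng.normal(size=(128, H, W))  # the 1x1 convolution's 128 feature maps

def stage(n_out, *inputs):
    # Stand-in for one branch's convolutional stage: the inputs are
    # stacked along the channel axis and mixed by a random linear map.
    x = np.concatenate(inputs, axis=0)
    weights = rng.normal(size=(n_out, x.shape[0]))
    return np.tensordot(weights, x, axes=1)

J = stage(24, F)                  # left branch: keypoint heatmaps
A = stage(23, F)                  # right branch: adjacency relations
for _ in range(3):                # later stages refine from F, J, A
    J_prev, A_prev = J, A
    J = stage(24, F, J_prev, A_prev)   # J^t = phi^t(F, J^{t-1}, A^{t-1})
    A = stage(23, F, J_prev, A_prev)   # A^t = rho^t(F, J^{t-1}, A^{t-1})
print(J.shape, A.shape)  # → (24, 28, 28) (23, 28, 28)
```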
The model construction in Steps 3, 4, 5 and 6 specifically includes the following steps: collect completely new image data and feed it into the feature point detector and hand-raising classifier trained in Step 2, thereby obtaining coarse training samples. On the basis of this sample annotation, perform manual annotation correction; after the annotation is completed, continue to expand the data set with the data augmentation methods of Step 1. This step helps to save a large amount of manual annotation cost. The old model is then retrained with the expanded data samples, and continuing this operation can continuously improve the robustness and precision of the model.
The video stream reading in Step 7 specifically includes the following steps: decide, according to the hardware composition of the algorithm platform, whether to adopt a hardware video decoding scheme; solve the screen-corruption problem during video decoding by modifying the transport protocol; and provide a timed-reconnection function after a camera connection is interrupted, which improves the robustness of the system and indirectly improves the accuracy of the algorithm.
The video stream reading in Step 7 further includes the following steps: taking the hand-raising detection model obtained by the above steps as the core, establish a hand-raising detection application system; use a real-time data acquisition method that reads the IP cameras, and determine the number of images processed by the algorithm in a single batch according to the computing capability of the terminal. After the algorithm processing, the display mode of the hand-raising detection results is varied according to the application scenario, including real-time display of the detection results, pushing an alarm signal when a hand-raising behavior is detected, and so on.
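The timed-reconnection behaviour of Step 7 can be sketched independently of any camera library: `open_stream` and `handle_frame` are placeholders for the real decoder and algorithm modules, and the retry count and delay are assumptions.

```python
import time

def read_with_reconnect(open_stream, handle_frame, retries=3, delay=0.0):
    """Pull frames from a stream; when the connection drops (the
    iterator ends), wait and re-open it, up to `retries` connections.
    Returns the total number of frames handled."""
    frames = 0
    for _ in range(retries):
        for frame in open_stream():
            handle_frame(frame)
            frames += 1
        time.sleep(delay)  # timed reconnection after an interruption
    return frames

# Toy stream: every "connection" yields two frames, then drops.
seen = []
total = read_with_reconnect(lambda: iter(["f1", "f2"]), seen.append)
print(total)  # → 6
```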
Thus, the hand-raising identification method provided by this embodiment achieves the following effects: a small amount of labeled data is used and the data set is expanded automatically in a loop; a deep neural network is established to extract feature images and to compute the coordinates and confidence of the upper limb and hand feature points; the computed upper limb and hand feature points are judged for hand-raising by a support vector machine; and the feature extraction model and the SVM are updated in reverse by the deep learning algorithm, which accelerates detection and improves computational accuracy.
Compared with hand-raising detection realized by object detection algorithms, this method achieves a similar speed while being greatly improved in accuracy, robustness and the recognition rate of hard-to-identify objects.
Compared with dynamic hand-raising recognition and detection that relies on tracking algorithms, this method avoids problems common to many tracking algorithms, such as target loss, target confusion and the difficulty of multi-target tracking; at the same time, the speed of the present disclosure substantially exceeds that of tracking algorithms, and multiple video streams can be detected in real time.
Compared with other algorithms, this method is efficient to deploy, low in cost, and capable of real-time multi-channel video detection when deployed on a mobile terminal.
In addition, because the improved deep-learning-based hand-raising identification method proposed by the present disclosure compresses the amount of computation to a certain extent, its computation speed is fast and the method can be deployed on a vision mobile terminal. The automatic-annotation sample-expansion scheme saves a large amount of annotation cost and yields a large number of training samples, which greatly enhances the robustness and accuracy of the model; the hardware decoding scheme saves CPU computing resources; the modification of the transport protocol solves the video-corruption problem; and the choice of model structure solves both the excessive computational pressure caused by overly large hand-raising detection scenes with too many people, and the false and missed detections caused by inaccurate recognition in complicated and difficult scenes.
In addition, referring to Fig. 1, a storage medium 104 is provided according to the third aspect of this embodiment. The storage medium 104 includes a stored program, wherein, when the program runs, a processor executes the method of any one of the above.
It should be noted that, for simplicity of description, the foregoing method embodiments are expressed as a series of action combinations, but those skilled in the art should understand that the present invention is not limited by the described sequence of actions, because according to the present invention some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be realized by software plus the necessary general hardware platform, and naturally also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, or the part of it that contributes to the prior art, can be embodied in the form of a software product. The software product is stored in a storage medium (such as a ROM/RAM, magnetic disk or optical disc) and includes instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device or the like) to execute the methods described in the embodiments of the present invention.
Embodiment 2
Fig. 7 shows a deep-learning-based hand-raising action recognition device 700 according to this embodiment, which corresponds to the method according to the first aspect of Embodiment 1. Referring to Fig. 7, the device 700 includes: an acquisition module 710 for acquiring an image containing the upper limb of an object to be identified; a determining module 720 for determining, by means of a feature extraction model trained by deep learning, the location information in the image of a plurality of feature points of the upper limb of the object to be identified; and a determination module 730 for determining, according to the determined location information and by means of a preset classification model, whether the object to be identified makes a hand-raising action.
Optionally, the feature extraction model includes a first convolutional neural network, and the determining module 720 includes: a first generation submodule for generating, by means of the first convolutional neural network, first feature maps of a plurality of channels from the image; a first determination submodule for determining the point of greatest value in the first feature map of each channel as a feature point; and a second determination submodule for determining the location information of the feature points in the image according to their positions in the respective first feature maps.
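The first-determination submodule's rule (take the greatest-value point of each channel's first feature map as that channel's feature point) can be sketched as an argmax per channel; the stride of 8 used to map back to image coordinates follows the 1/8-scale feature maps described earlier, and the helper name is invented.

```python
import numpy as np

def keypoints_from_heatmaps(fmaps, stride=8):
    # fmaps: (C, H, W) first feature maps, one channel per keypoint.
    # The peak of each channel is taken as the feature point and
    # scaled back to input-image coordinates.
    pts = []
    for fmap in fmaps:
        y, x = np.unravel_index(int(np.argmax(fmap)), fmap.shape)
        pts.append((int(x) * stride, int(y) * stride))
    return pts

fmaps = np.zeros((2, 28, 28))
fmaps[0, 5, 7] = 1.0    # channel 0 peaks at (x=7, y=5)
fmaps[1, 20, 3] = 1.0   # channel 1 peaks at (x=3, y=20)
print(keypoints_from_heatmaps(fmaps))  # → [(56, 40), (24, 160)]
```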
Optionally, the feature extraction model further includes a second convolutional neural network, and the determining module 720 further includes: a second generation submodule for generating, by means of the second convolutional neural network, second feature maps of a plurality of channels from the image; a third determination submodule for determining, according to the second feature maps, the positional relationship between two adjacent feature points among the plurality of feature points; and a fourth determination submodule for screening the feature points determined by means of the first feature maps according to the positional relationship determined by the second feature maps, and determining the feature points to be classified.
Optionally, the determination module 730 includes: a decision submodule for determining, according to the location information of the feature points to be classified and by means of a preset support vector machine model, whether the object to be identified makes the hand-raising action.
Optionally, the device further includes a first training module for training the first convolutional neural network through the following submodules: a first acquisition submodule for acquiring a plurality of sample images containing upper limbs; a first construction submodule for constructing the first convolutional neural network; a first generation submodule for generating, by means of the first convolutional neural network, a first output vector corresponding to a sample image, wherein the first output vector is used to indicate the location information, in the sample image, of the feature points contained in the sample image; and a first comparison submodule for comparing the first output vector with a preset first label vector corresponding to the sample image, and adjusting the first convolutional neural network according to the comparison result.
Optionally, the device further includes a second training module for training the second convolutional neural network through the following submodules: a second acquisition submodule for acquiring a plurality of sample images containing upper limbs; a second construction submodule for constructing the second convolutional neural network; a second generation submodule for generating, by means of the second convolutional neural network, a second output vector corresponding to a sample image, wherein the second output vector is used to indicate the positional relationship between two adjacent feature points contained in the sample image; and a second comparison submodule for comparing the second output vector with a preset second label vector corresponding to the sample image, and adjusting the second convolutional neural network according to the comparison result.
Optionally, the first comparison submodule includes: a first calculation unit for calculating the first L2 spatial distance between the first output vector and the first label vector; a second calculation unit for taking the first L2 spatial distance as the first loss function and calculating the first gradient of the first loss function; and a first adjustment unit for adjusting the first convolutional neural network, based on the first gradient, according to the stochastic gradient descent principle.
Optionally, the second comparison submodule includes: a third calculation unit for calculating the second L2 spatial distance between the second output vector and the second label vector; a fourth calculation unit for taking the second L2 spatial distance as the second loss function and calculating the second gradient of the second loss function; and a second adjustment unit for adjusting the second convolutional neural network, based on the second gradient, according to the stochastic gradient descent principle.
Thus, according to this embodiment, an image containing the upper limb of an object to be identified is first acquired; then, by means of a feature extraction model trained by deep learning, the location information in the image of a plurality of feature points of the upper limb of the object to be identified is determined; and then, according to the determined location information and by means of a preset classification model, it is determined whether the object to be identified makes a hand-raising action. The technical solution of the present application can acquire image data from multiple image acquisition devices simultaneously, obtain in real time, with the deep-learning-trained feature extraction model, the location information in the image of the plurality of feature points of the upper limb of the object to be identified, and then judge in real time, according to the determined location information and by means of the preset classification model, whether the object to be identified makes a hand-raising action, without repeated computation, recognizing the hand-raising intention with high detection accuracy. Moreover, since this solution uses a feature extraction model trained by deep learning, it requires no sensors, in contrast to traditional hand-raising recognition methods; it is convenient to use, occupies few computing resources, is low in cost and easy to popularize. It thereby solves the technical problems of existing hand-raising recognition algorithms in the prior art: high data preparation cost, long computation time, a large amount of repeated computation, inability to support multi-channel image input, and inability to recognize hand-raising intention with sufficient detection accuracy.
Embodiment 3
Fig. 8 shows a deep-learning-based hand-raising action recognition device 800 according to this embodiment, which corresponds to the method according to the first aspect of Embodiment 1. Referring to Fig. 8, the device 800 includes: a processor 810; and a memory 820 connected to the processor 810 and configured to provide the processor 810 with instructions for the following processing steps: acquiring an image containing the upper limb of an object to be identified; determining, by means of a feature extraction model trained by deep learning, the location information in the image of a plurality of feature points of the upper limb of the object to be identified; and determining, according to the determined location information and by means of a preset classification model, whether the object to be identified makes a hand-raising action.
Optionally, the feature extraction model includes a first convolutional neural network, and the operation of determining, by means of the deep-learning-trained feature extraction model, the location information in the image of the plurality of feature points of the upper limb of the object to be identified includes: generating, by means of the first convolutional neural network, first feature maps of a plurality of channels from the image; determining the point of greatest value in the first feature map of each channel as a feature point; and determining the location information of the feature points in the image according to their positions in the respective first feature maps.
Optionally, the feature extraction model further includes a second convolutional neural network, and the operation of determining, by means of the deep-learning-trained feature extraction model, the location information in the image of the plurality of feature points of the upper limb of the object to be identified further includes: generating, by means of the second convolutional neural network, second feature maps of a plurality of channels from the image; determining, according to the second feature maps, the positional relationship between two adjacent feature points among the plurality of feature points; and screening the feature points determined by means of the first feature maps according to the positional relationship determined by the second feature maps, and determining the feature points to be classified.
Optionally, the operation of determining, according to the determined location information and by means of the preset classification model, whether the object to be identified makes a hand-raising action further includes: determining, according to the location information of the feature points to be classified and by means of a preset support vector machine model, whether the object to be identified makes the hand-raising action.
Optionally, the memory 820 is further configured to provide the processor 810 with instructions for the following processing steps: training the first convolutional neural network through the following operations: acquiring a plurality of sample images containing upper limbs; constructing the first convolutional neural network; generating, by means of the first convolutional neural network, a first output vector corresponding to a sample image, wherein the first output vector is used to indicate the location information, in the sample image, of the feature points contained in the sample image; and comparing the first output vector with a preset first label vector corresponding to the sample image, and adjusting the first convolutional neural network according to the comparison result.
Optionally, the memory 820 is further configured to provide the processor 810 with instructions for the following processing steps: training the second convolutional neural network through the following operations: acquiring a plurality of sample images containing upper limbs; constructing the second convolutional neural network; generating, by means of the second convolutional neural network, a second output vector corresponding to a sample image, wherein the second output vector is used to indicate the positional relationship between two adjacent feature points contained in the sample image; and comparing the second output vector with a preset second label vector corresponding to the sample image, and adjusting the second convolutional neural network according to the comparison result.
Optionally, the operation of comparing the first output vector with the preset first label vector corresponding to the sample image includes calculating the first L2 spatial distance between the first output vector and the first label vector, and the operation of adjusting the first convolutional neural network according to the comparison result includes: taking the first L2 spatial distance as the first loss function and calculating the first gradient of the first loss function; and, based on the first gradient, adjusting the first convolutional neural network according to the stochastic gradient descent principle.
Optionally, the operation of comparing the second output vector with the preset second label vector corresponding to the sample image includes calculating the second L2 spatial distance between the second output vector and the second label vector, and the operation of adjusting the second convolutional neural network according to the comparison result includes: taking the second L2 spatial distance as the second loss function and calculating the second gradient of the second loss function; and, based on the second gradient, adjusting the second convolutional neural network according to the stochastic gradient descent principle.
Thus, according to this embodiment, the processor first obtains an image containing the upper limbs of the object to be identified, then determines the location information of multiple feature points of the upper limbs of the object to be identified in the image using a feature extraction model trained by deep learning, and then determines, according to the determined location information and using a preset classification model, whether the object to be identified makes a hand-raising action. The technical solution of the present application can obtain image data from multiple image acquisition devices simultaneously, obtain in real time the location information of the feature points of the upper limbs of the object to be identified in the image using the deep-learning-trained feature extraction model, and then judge in real time, using the preset classification model, whether the object makes a hand-raising action. No repeated computation is required, the hand-raising intention can be identified, and the detection accuracy is high. Moreover, since this solution is based on a feature extraction model trained by deep learning, it needs no sensor, unlike traditional hand-raising recognition methods; it is convenient to use, occupies few computing resources, has low cost, and is easy to popularize. It thereby solves the technical problems of the prior art that existing hand-raising recognition algorithms have high data-preparation costs, long computation times, and large amounts of repeated computation, cannot support multi-channel image input, cannot identify hand-raising intention, and have low detection accuracy.
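The overall pipeline summarized above — feature-point locations in, hand-raised/not-raised decision out — can be illustrated with the following NumPy sketch, in which a linear classifier trained on the hinge loss stands in for the preset support vector machine model. The keypoint layout (shoulder, elbow, wrist y-coordinates) and all data are synthetic and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)

def synth_keypoints(raised, n):
    """Synthetic (shoulder_y, elbow_y, wrist_y) coordinates. Image y grows
    downward, so a raised wrist has a smaller y value than the shoulder."""
    shoulder = rng.normal(0.5, 0.05, n)
    if raised:
        wrist = shoulder - rng.uniform(0.2, 0.4, n)   # wrist above shoulder
    else:
        wrist = shoulder + rng.uniform(0.2, 0.4, n)   # wrist below shoulder
    elbow = (shoulder + wrist) / 2 + rng.normal(0, 0.02, n)
    return np.stack([shoulder, elbow, wrist], axis=1)

X = np.vstack([synth_keypoints(True, 100), synth_keypoints(False, 100)])
y = np.array([1] * 100 + [-1] * 100)  # +1 = hand raised

# Linear SVM trained by sub-gradient descent on the hinge loss.
w, b, lr, lam = np.zeros(3), 0.0, 0.01, 0.01
for _ in range(300):
    for xi, yi in zip(X, y):
        if yi * (w @ xi + b) < 1:          # margin violated: hinge sub-gradient
            w += lr * (yi * xi - lam * w)
            b += lr * yi
        else:
            w -= lr * lam * w              # regularization only

def predict_raised(keypoints):
    return w @ keypoints + b > 0

acc = np.mean([predict_raised(xi) == (yi > 0) for xi, yi in zip(X, y)])
print(f"training accuracy: {acc:.2f}")
```

A linear model suffices here because "wrist above shoulder" is a linearly separable property of the coordinates; the patent's classification model may of course use different features or kernels.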
The serial numbers of the above embodiments of the present invention are for description only and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of the other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technical content may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a logical functional division; in actual implementation there may be other division manners — multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through interfaces, units, or modules, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the various embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention — in essence, or the part that contributes to the prior art, or all or part of the technical solution — may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the various embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a removable hard disk, a magnetic disk, or an optical disc.
The above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art may make various improvements and modifications without departing from the principles of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.
Claims (10)
1. A hand-raising action recognition method based on deep learning, characterized by comprising:
obtaining an image containing the upper limbs of an object to be identified;
determining, using a feature extraction model trained based on deep learning, location information of multiple feature points of the upper limbs of the object to be identified in the image; and
determining, according to the determined location information and using a preset classification model, whether the object to be identified makes a hand-raising action.
2. the method according to claim 1, wherein the Feature Selection Model includes the first convolution nerve net
Network, and using the Feature Selection Model based on deep learning training, determine multiple features of the upper limb of the object to be identified
The operation of location information of the point in described image, comprising:
Using first convolutional neural networks, the fisrt feature figure in multiple channels is generated according to described image;
Greatest measure point in the fisrt feature figure in each channel is determined as the characteristic point;And
According to position of the characteristic point in respective fisrt feature figure, position of the characteristic point in described image is determined
Information.
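The localization recited in claim 2 — taking the maximum-value point of each channel's first feature map as a feature point and mapping it back to image coordinates — can be illustrated as follows. The heatmap size, stride, and peak positions are invented for this example; it is a sketch of the general heatmap-argmax technique, not the patent's implementation:

```python
import numpy as np

def keypoints_from_heatmaps(heatmaps, image_size):
    """heatmaps: (channels, h, w) array, one channel per feature point.
    Returns the (x, y) image coordinates of each channel's maximum-value point."""
    c, h, w = heatmaps.shape
    img_h, img_w = image_size
    points = []
    for ch in range(c):
        # Index of the maximum-value point in this channel's feature map.
        row, col = np.unravel_index(np.argmax(heatmaps[ch]), (h, w))
        # Scale feature-map coordinates back to image coordinates.
        points.append((col * img_w / w, row * img_h / h))
    return points

# Two 4x4 heatmaps for a 64x64 image, each with one clear peak.
hm = np.zeros((2, 4, 4))
hm[0, 1, 2] = 1.0   # peak at row 1, col 2
hm[1, 3, 0] = 1.0   # peak at row 3, col 0
pts = keypoints_from_heatmaps(hm, (64, 64))
print(pts)  # [(32.0, 16.0), (0.0, 48.0)]
```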
3. The method according to claim 2, characterized in that the feature extraction model further comprises a second convolutional neural network, and the operation of determining, using the feature extraction model trained based on deep learning, the location information of the multiple feature points of the upper limbs of the object to be identified in the image further comprises:
generating, using the second convolutional neural network, second feature maps of multiple channels from the image;
determining, according to the second feature maps, the positional relationship between adjacent feature points among the multiple feature points; and
screening the feature points determined using the first feature maps according to the positional relationships determined from the second feature maps, and determining the feature points to be classified for classification.
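The screening recited in claim 3 — filtering the candidate feature points from the first feature maps using the pairwise positional relationships derived from the second feature maps — might look like the following simplified sketch. The per-link affinity score and the threshold are assumptions of this example (inspired by how adjacent-keypoint relationships are commonly scored), not details from the patent:

```python
import numpy as np

def screen_candidates(points, affinity, threshold=0.5):
    """points: list of (x, y) candidate feature points, in skeleton order.
    affinity: affinity[i] scores the positional relationship between
    points[i] and points[i + 1] (e.g. derived from second feature maps).
    A candidate is kept only if at least one of its adjacent links is plausible."""
    n = len(points)
    keep = []
    for i, p in enumerate(points):
        left_ok = i > 0 and affinity[i - 1] >= threshold
        right_ok = i < n - 1 and affinity[i] >= threshold
        if left_ok or right_ok:
            keep.append(p)
    return keep

# Shoulder, elbow, wrist candidates plus one spurious detection.
pts = [(10, 40), (15, 30), (20, 20), (90, 5)]
# Strong shoulder-elbow and elbow-wrist links; no plausible link to the last point.
aff = np.array([0.9, 0.8, 0.1])
print(screen_candidates(pts, aff))  # the spurious (90, 5) candidate is dropped
```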
4. The method according to claim 3, characterized in that the operation of determining, according to the determined location information and using the preset classification model, whether the object to be identified makes a hand-raising action further comprises: determining, according to the location information of the feature points to be classified and using a preset support vector machine model, whether the object to be identified makes a hand-raising action.
5. The method according to claim 4, characterized by further comprising training the first convolutional neural network by the following operations:
obtaining multiple sample images containing upper limbs;
constructing the first convolutional neural network;
generating, using the first convolutional neural network, a first output vector corresponding to the sample image, wherein the first output vector is used to indicate the location information of the feature points included in the sample image; and
comparing the first output vector with a preset first label vector corresponding to the sample image, and adjusting the first convolutional neural network according to the comparison result.
6. The method according to claim 4, characterized by further comprising training the second convolutional neural network by the following operations:
obtaining multiple sample images containing upper limbs;
constructing the second convolutional neural network;
generating, using the second convolutional neural network, a second output vector corresponding to the sample image, wherein the second output vector is used to indicate the positional relationship between adjacent feature points included in the sample image; and
comparing the second output vector with a preset second label vector corresponding to the sample image, and adjusting the second convolutional neural network according to the comparison result.
7. The method according to claim 5, characterized in that the operation of comparing the first output vector with the preset first label vector corresponding to the sample image comprises calculating a first L2 spatial distance between the first output vector and the first label vector, and the operation of adjusting the first convolutional neural network according to the comparison result comprises:
taking the first L2 spatial distance as a first loss function, and calculating a first gradient of the first loss function; and
adjusting the first convolutional neural network according to the stochastic gradient descent principle based on the first gradient.
8. The method according to claim 6, characterized in that the operation of comparing the second output vector with the preset second label vector corresponding to the sample image comprises calculating a second L2 spatial distance between the second output vector and the second label vector, and the operation of adjusting the second convolutional neural network according to the comparison result comprises:
taking the second L2 spatial distance as a second loss function, and calculating a second gradient of the second loss function; and
adjusting the second convolutional neural network according to the stochastic gradient descent principle based on the second gradient.
9. A storage medium, characterized in that the storage medium comprises a stored program, wherein, when the program runs, a processor executes the method according to any one of claims 1 to 8.
10. A hand-raising action recognition device based on deep learning, characterized by comprising:
a first obtaining module, configured to obtain an image containing the upper limbs of an object to be identified;
a determining module, configured to determine, using a feature extraction model trained based on deep learning, location information of multiple feature points of the upper limbs of the object to be identified in the image; and
a determination module, configured to determine, according to the determined location information and using a preset classification model, whether the object to be identified makes a hand-raising action.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910647658.4A CN110399822A (en) | 2019-07-17 | 2019-07-17 | Action identification method of raising one's hand, device and storage medium based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110399822A true CN110399822A (en) | 2019-11-01 |
Family
ID=68324493
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910647658.4A Pending CN110399822A (en) | 2019-07-17 | 2019-07-17 | Action identification method of raising one's hand, device and storage medium based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110399822A (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107644190A (en) * | 2016-07-20 | 2018-01-30 | 北京旷视科技有限公司 | Pedestrian's monitoring method and device |
US20180047196A1 (en) * | 2016-08-11 | 2018-02-15 | Integem Inc. | Intelligent augmented reality (iar) platform-based communication system |
CN107808376A (en) * | 2017-10-31 | 2018-03-16 | 上海交通大学 | A kind of detection method of raising one's hand based on deep learning |
CN108038452A (en) * | 2017-12-15 | 2018-05-15 | 厦门瑞为信息技术有限公司 | A kind of quick detection recognition method of household electrical appliances gesture based on topography's enhancing |
CN108304819A (en) * | 2018-02-12 | 2018-07-20 | 北京易真学思教育科技有限公司 | Gesture recognition system and method, storage medium |
CN109165552A (en) * | 2018-07-14 | 2019-01-08 | 深圳神目信息技术有限公司 | A kind of gesture recognition method based on human body key point, system and memory |
CN109508661A (en) * | 2018-10-31 | 2019-03-22 | 上海交通大学 | A kind of person's of raising one's hand detection method based on object detection and Attitude estimation |
CN109558865A (en) * | 2019-01-22 | 2019-04-02 | 郭道宁 | A kind of abnormal state detection method to the special caregiver of need based on human body key point |
CN109740446A (en) * | 2018-12-14 | 2019-05-10 | 深圳壹账通智能科技有限公司 | Classroom students ' behavior analysis method and device |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110991533A (en) * | 2019-12-03 | 2020-04-10 | Oppo广东移动通信有限公司 | Image recognition method, recognition device, terminal device and readable storage medium |
CN110991533B (en) * | 2019-12-03 | 2023-08-04 | Oppo广东移动通信有限公司 | Image recognition method, recognition device, terminal device and readable storage medium |
CN111339905A (en) * | 2020-02-22 | 2020-06-26 | 郑州铁路职业技术学院 | CIM well lid state visual detection system based on deep learning and multi-view angle |
CN111339905B (en) * | 2020-02-22 | 2022-07-08 | 郑州铁路职业技术学院 | CIM well lid state visual detection system based on deep learning and multiple visual angles |
CN112818802A (en) * | 2021-01-26 | 2021-05-18 | 四川天翼网络服务有限公司 | Bank counter personnel hand-lifting identification method and system |
CN112818802B (en) * | 2021-01-26 | 2022-07-05 | 四川天翼网络服务有限公司 | Bank counter personnel hand-lifting identification method and system |
WO2022237481A1 (en) * | 2021-05-12 | 2022-11-17 | 北京百度网讯科技有限公司 | Hand-raising recognition method and apparatus, electronic device, and storage medium |
CN117670259A (en) * | 2024-01-31 | 2024-03-08 | 天津师范大学 | Sample detection information management method |
CN117670259B (en) * | 2024-01-31 | 2024-04-19 | 天津师范大学 | Sample detection information management method |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20191101 |