CN110399822A - Hand-raising action recognition method, device and storage medium based on deep learning - Google Patents

Hand-raising action recognition method, device and storage medium based on deep learning

Info

Publication number
CN110399822A
CN110399822A (Application CN201910647658.4A)
Authority
CN
China
Prior art keywords
to be identified
hand
convolutional neural networks
feature point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910647658.4A
Other languages
Chinese (zh)
Inventor
田志博
朱博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sparta Internet Of Things Technology (Beijing) Co Ltd
Original Assignee
Sparta Internet Of Things Technology (Beijing) Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sparta Internet Of Things Technology (Beijing) Co Ltd
Priority to CN201910647658.4A
Publication of CN110399822A
Legal status: Pending (current)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/28Recognition of hand or arm movements, e.g. recognition of deaf sign language

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present application discloses a hand-raising action recognition method, device and storage medium based on deep learning. The method comprises: obtaining an image containing the upper limb of an object to be identified; using a feature extraction model trained by deep learning to determine the positions, in the image, of multiple feature points of the upper limb of the object to be identified; and, according to the determined position information, using a preset classification model to determine whether the object to be identified performs a hand-raising action. The disclosure can obtain image data from multiple image acquisition devices simultaneously, obtain in real time the positions of the upper-limb feature points of the object to be identified in the image with the deep-learning-trained feature extraction model, and then, according to the determined position information, judge in real time with the preset classification model whether the object to be identified raises a hand. No repeated computation is required, the hand-raising intention can be recognized, and the detection accuracy is high.

Description

Hand-raising action recognition method, device and storage medium based on deep learning
Technical field
The present application relates to the field of behavior recognition methods, and in particular to a hand-raising action recognition method, device and storage medium based on deep learning.
Background art
With the development of science and technology and the growth of computing resources, applications based on computer vision are used more and more widely in daily life, and many deep-learning-based action detection applications have appeared on the market. However, current hand-raising recognition algorithms based on object detection have several problems: the detection rate is low, occluded samples cannot be recognized, and the action intention of the target cannot be accurately predicted. Action detection based on tracking algorithms consumes considerable computing resources, cannot cope with large amounts of input data, and suffers from target confusion and target loss when tracking multiple targets. Therefore, achieving accurate and fast hand-raising detection in complex scenes while maintaining high robustness is the decisive factor in judging the quality of an algorithm.
Commonly used hand-raising detection methods generally fall into the following categories. The first is visual observation, which relies entirely on human eyes watching in real time; a single person cannot continuously monitor multiple scenes at the same time, the task cannot be completed when manpower is short or unavailable, and labor cost is relatively high. The second is hand-raising detection based on object detection and tracking algorithms; this kind of method tracks and merges hands over successive video frames, which makes the computation heavy and the real-time performance poor, cannot support real-time monitoring of multiple cameras, is difficult to use in multi-camera environments, and cannot accurately identify hand position and hand posture, so its accuracy is low and there are many false detections and missed detections.
For the above technical problems of existing hand-raising recognition algorithms, namely high data preparation cost, long computation time, a large amount of repeated computation, inability to support multi-channel image input, inability to recognize hand-raising intention and low detection accuracy, no effective solution has been proposed so far.
Summary of the invention
Embodiments of the present disclosure provide a hand-raising action recognition method, device and storage medium based on deep learning, to at least solve the technical problems of existing hand-raising recognition algorithms: high data preparation cost, long computation time, a large amount of repeated computation, inability to support multi-channel image input, inability to recognize hand-raising intention, and low detection accuracy.
According to one aspect of the embodiments of the present disclosure, a hand-raising action recognition method based on deep learning is provided, comprising: obtaining an image containing the upper limb of an object to be identified; using a feature extraction model trained by deep learning to determine the positions of multiple feature points of the upper limb of the object to be identified in the image; and, according to the determined position information, using a preset classification model to determine whether the object to be identified performs a hand-raising action.
According to another aspect of the embodiments of the present disclosure, a storage medium is further provided. The storage medium contains a stored program, wherein when the program runs, a processor executes the method described in any one of the above.
According to another aspect of the embodiments of the present disclosure, a hand-raising action recognition device based on deep learning is further provided, comprising: an obtaining module, configured to obtain an image containing the upper limb of an object to be identified; a determining module, configured to use a feature extraction model trained by deep learning to determine the positions of multiple feature points of the upper limb of the object to be identified in the image; and a determination module, configured to determine, according to the determined position information and using a preset classification model, whether the object to be identified performs a hand-raising action.
According to another aspect of the embodiments of the present disclosure, a hand-raising action recognition device based on deep learning is further provided, comprising: a processor; and a memory connected with the processor and configured to provide the processor with instructions for the following processing steps: obtaining an image containing the upper limb of an object to be identified; using a feature extraction model trained by deep learning to determine the positions of multiple feature points of the upper limb of the object to be identified in the image; and, according to the determined position information, using a preset classification model to determine whether the object to be identified performs a hand-raising action.
In the embodiments of the present disclosure, a hand-raising action recognition method based on deep learning is provided. The processor first obtains an image containing the upper limb of an object to be identified, then uses a feature extraction model trained by deep learning to determine the positions of multiple feature points of the upper limb of the object to be identified in the image, and finally, according to the determined position information, uses a preset classification model to determine whether the object to be identified performs a hand-raising action. The technical solution of the present application can obtain image data from multiple image acquisition devices simultaneously, obtain in real time the positions of the upper-limb feature points of the object to be identified in the image with the deep-learning-trained feature extraction model, and then judge in real time with the preset classification model whether the object to be identified raises a hand. No repeated computation is required, the hand-raising intention can be recognized, and the detection accuracy is high. In addition, because this solution uses a feature extraction model trained by deep learning, compared with traditional hand-raising recognition methods it does not need sensors, is easy to use, occupies few computing resources, has low cost, and is easy to popularize. It therefore solves the technical problems of existing hand-raising recognition algorithms: high data preparation cost, long computation time, a large amount of repeated computation, inability to support multi-channel image input, inability to recognize hand-raising intention, and low detection accuracy.
Brief description of the drawings
The accompanying drawings described herein are used to provide a further understanding of the present disclosure and constitute a part of the present application. The exemplary embodiments of the present disclosure and their descriptions are used to explain the present disclosure and do not constitute an improper limitation of the present disclosure. In the drawings:
Fig. 1 is a flow diagram of the hand-raising action recognition method based on deep learning according to the first aspect of Embodiment 1 of the present disclosure;
Fig. 2a is a schematic diagram of the distribution of multiple upper-limb feature points under a hand-raising gesture according to the first aspect of Embodiment 1 of the present disclosure;
Fig. 2b is a schematic diagram of the distribution of multiple upper-limb feature points under a non-hand-raising gesture according to the first aspect of Embodiment 1 of the present disclosure;
Fig. 2c is another schematic diagram of the distribution of multiple upper-limb feature points under a non-hand-raising gesture according to the first aspect of Embodiment 1 of the present disclosure;
Fig. 3 is a structural schematic diagram of the feature extraction model according to the first aspect of Embodiment 1 of the present disclosure;
Fig. 4 is a schematic diagram of the first feature map according to the first aspect of Embodiment 1 of the present disclosure;
Fig. 5 is a schematic diagram of the second feature map according to the first aspect of Embodiment 1 of the present disclosure;
Fig. 6 is another schematic diagram of the second feature map according to the first aspect of Embodiment 1 of the present disclosure;
Fig. 7 is a schematic diagram of the hand-raising action recognition device based on deep learning according to Embodiment 2 of the present disclosure; and
Fig. 8 is a schematic diagram of the hand-raising action recognition device based on deep learning according to Embodiment 3 of the present disclosure.
Detailed description of embodiments
In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the drawings in the embodiments of the present disclosure. Obviously, the described embodiments are only a part of the embodiments of the present disclosure, not all of them. Based on the embodiments in the present disclosure, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present disclosure.
It should be noted that the terms "first", "second", etc. in the specification, claims and the above drawings of the present disclosure are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data used in this way are interchangeable under appropriate circumstances, so that the embodiments of the disclosure described herein can be implemented in an order other than those illustrated or described herein. In addition, the terms "comprising" and "having" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to such process, method, product or device.
Embodiment 1
According to the first aspect of this embodiment, a hand-raising action recognition method based on deep learning is provided. Fig. 1 shows a flow diagram of the method. Referring to Fig. 1, the method comprises:
S102: obtaining an image containing the upper limb of an object to be identified;
S104: using a feature extraction model trained by deep learning, determining the positions in the image of multiple feature points of the upper limb of the object to be identified; and
S106: according to the determined position information, determining whether the object to be identified performs a hand-raising action using a preset classification model (an illustrative sketch of these steps is given below).
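As a rough, non-limiting illustration of steps S102 to S106 (not part of the original disclosure), the following Python sketch assumes a hypothetical feature extraction model that returns 24 keypoint coordinates and a pretrained SVM classifier; the names and interfaces are illustrative only.

```python
# Hypothetical sketch of S102-S106; model and classifier interfaces are assumed, not from the patent.
import cv2
import numpy as np

def recognize_hand_raise(frame_bgr, feature_model, svm_classifier):
    """Return True if the object in the frame appears to perform a hand-raising action."""
    # S102: image containing the upper limb of the object to be identified
    image = cv2.resize(frame_bgr, (512, 512))

    # S104: feature extraction model predicts the 24 upper-limb keypoint positions
    keypoints = feature_model.predict(image)          # assumed shape (24, 2): (x, y) per keypoint

    # S106: preset classification model decides hand-raising from the keypoint coordinates
    features = np.asarray(keypoints).reshape(1, -1)   # 48-dim vector (x1, y1, ..., x24, y24)
    label = svm_classifier.predict(features)[0]       # 1 = hand raised, 0 = not raised
    return bool(label)
```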
As described in the background above, with the development of science and technology and the growth of computing resources, applications based on computer vision are used more and more widely in daily life, and many deep-learning-based action detection applications have appeared on the market. However, current hand-raising recognition algorithms based on object detection have several problems: the detection rate is low, occluded samples cannot be recognized, and the action intention of the target cannot be accurately predicted. Action detection based on tracking algorithms consumes considerable computing resources, cannot cope with large amounts of input data, and suffers from target confusion and target loss when tracking multiple targets. Therefore, achieving accurate and fast hand-raising detection in complex scenes while maintaining high robustness is the decisive factor in judging the quality of an algorithm.
Commonly used hand-raising detection methods generally fall into the following categories. The first is visual observation, which relies entirely on human eyes watching in real time; a single person cannot continuously monitor multiple scenes at the same time, the task cannot be completed when manpower is short or unavailable, and labor cost is relatively high. The second is hand-raising detection based on object detection and tracking algorithms; this kind of method tracks and merges hands over successive video frames, which makes the computation heavy and the real-time performance poor, cannot support real-time monitoring of multiple cameras, is difficult to use in multi-camera environments, and cannot accurately identify hand position and hand posture, so its accuracy is low and there are many false detections and missed detections.
To address the problems described in the background above, this embodiment provides a hand-raising action recognition method based on deep learning. The processor first obtains an image containing the upper limb of an object to be identified, then uses a feature extraction model trained by deep learning to determine the positions of multiple feature points of the upper limb of the object to be identified in the image, and finally, according to the determined position information, uses a preset classification model to determine whether the object to be identified performs a hand-raising action. The technical solution of the present application can obtain image data from multiple image acquisition devices simultaneously, obtain in real time the positions of the upper-limb feature points of the object to be identified in the image with the deep-learning-trained feature extraction model, and then judge in real time with the preset classification model whether the object to be identified raises a hand. No repeated computation is required, the hand-raising intention can be recognized, and the detection accuracy is high.
Further, under different gestures the distribution of the multiple feature points, and the positional relationships between adjacent feature points, differ from one another. Therefore, the feature extraction model can continuously improve its accuracy by learning, from a large number of images containing human upper limbs, the distribution characteristics of the multiple feature points and the positional relationships between adjacent feature points. Fig. 2a shows a schematic diagram of the distribution of the upper-limb feature points under a hand-raising gesture, Fig. 2b shows a schematic diagram of the distribution of the upper-limb feature points under a non-hand-raising gesture, and Fig. 2c shows another schematic diagram of the distribution of the upper-limb feature points under a non-hand-raising gesture. Each circular dot in Figs. 2a, 2b and 2c indicates a feature point. Referring to Figs. 2a, 2b and 2c, an image containing one human upper limb includes the following feature points: 21 hand feature points, 1 wrist joint feature point, 1 elbow joint feature point and 1 shoulder joint feature point. In addition, because this solution uses a feature extraction model trained by deep learning, compared with traditional hand-raising recognition methods it does not need sensors, is easy to use, occupies few computing resources, has low cost, and is easy to popularize.
Thus, the technical solution of this embodiment solves the problems of existing hand-raising recognition algorithms: high data preparation cost, long computation time, a large amount of repeated computation, inability to support multi-channel image input, inability to recognize hand-raising intention, and low detection accuracy.
In addition, when obtaining the image containing the upper limb of the object to be identified, the technical solution of this embodiment may, for example, use an image acquisition device such as a camera and send the image containing the upper limb of the object to be identified to the processor. The image may, for example, come from a video captured by the camera: the processor obtains video data from the video stream and decodes the video data into image frame data, where the image frame data is the above image containing the upper limb of the object to be identified. The problem of corrupted (garbled) frames during video decoding is solved by modifying the transport protocol, and a timed-reconnection function after camera interruption is provided, which improves system robustness and, indirectly, computation accuracy.
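A minimal sketch of reading and decoding frames from a network camera is given below; it is not part of the original disclosure, uses OpenCV as an assumed decoding backend, and the URL format and reconnection interval are illustrative values only.

```python
# Illustrative only: RTSP frame grabbing with timed reconnection (URL and timing are assumptions).
import time
import cv2

def frame_stream(rtsp_url, reconnect_delay_s=5.0):
    """Yield decoded BGR frames; reconnect after the camera drops the connection."""
    while True:
        cap = cv2.VideoCapture(rtsp_url)
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:                      # decode failure or stream interruption
                break
            yield frame                     # one decoded image frame
        cap.release()
        time.sleep(reconnect_delay_s)       # timed reconnection after interruption
```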
Optionally, the feature extraction model includes a first convolutional neural network, and the operation of determining, using the feature extraction model trained by deep learning, the positions of the multiple feature points of the upper limb of the object to be identified in the image comprises: using the first convolutional neural network to generate first feature maps of multiple channels from the image; determining the point with the maximum value in the first feature map of each channel as a feature point; and determining the position of each feature point in the image according to its position in the corresponding first feature map.
Fig. 3 shows the structural schematic diagram of the feature extraction model. Specifically, the feature extraction model includes a first convolutional neural network. Referring to Fig. 3, the feature extraction model contains two branches, where, for example but not limited to, the left branch is the first convolutional neural network. The first convolutional neural network uses the first 8 layers of a fine-tuned VGG16 pre-trained network and outputs a 512-dimensional feature map whose width and height are each 1/8 of the input image. Assuming the input image size is 512 × 512, the output is a 512-dimensional feature map of size 64 × 64. A following 1×1 convolutional layer produces a 128-dimensional feature map of size 64 × 64, which is then fed as the feature extraction result into the feature point computation layers (referring to Fig. 3, the feature point computation layers consist of multiple convolutional layers). The number of convolution kernels in the last convolutional layer of the left branch can be set according to the number of upper-limb feature points contained in the image; since one upper limb contains 24 feature points, the number of convolution kernels in the last convolutional layer of the left branch (the first convolutional neural network) can be set to 24 + 1 = 25, including one background channel. The first convolutional neural network therefore finally outputs first feature maps of 25 channels with size 64 × 64.
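Purely as an illustration of the architecture just described (the exact backbone truncation, head depth and module names are assumptions, not the original implementation), a PyTorch sketch of the left branch could look like this:

```python
# Sketch of the left (keypoint heatmap) branch under the stated assumptions; not the patented code.
import torch
import torch.nn as nn
import torchvision

class KeypointBranch(nn.Module):
    def __init__(self, num_keypoints=24):
        super().__init__()
        vgg = torchvision.models.vgg16(weights="IMAGENET1K_V1").features
        self.backbone = vgg[:23]                      # truncated VGG16 giving a 512-channel, 1/8-resolution map
        self.reduce = nn.Conv2d(512, 128, kernel_size=1)
        self.head = nn.Sequential(                    # "feature point computation layers" (depth assumed)
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, num_keypoints + 1, 1),     # 24 keypoint channels + 1 background channel
        )

    def forward(self, x):                             # x: (N, 3, 512, 512)
        f = self.reduce(self.backbone(x))             # (N, 128, 64, 64)
        return self.head(f)                           # (N, 25, 64, 64) first feature maps

heatmaps = KeypointBranch()(torch.randn(1, 3, 512, 512))
print(heatmaps.shape)  # torch.Size([1, 25, 64, 64])
```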
Further, the point with the maximum value in the first feature map of each channel is determined as a feature point. Fig. 4 exemplarily shows a schematic diagram of a first feature map. Referring to Fig. 4, the maximum-value point is the point with the value 0.9, i.e. the point with the value 0.9 is determined as the feature point. Then, according to the position of the feature point in its first feature map, the position of the feature point in the image is determined. For example, in Fig. 4 the feature point is located in the last cell of the first feature map; combining this with the position of the first feature map relative to the image, the position of the feature point in the image can be determined. In this way, the position in the image of each feature point contained in the image can be determined accurately and quickly.
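The per-channel maximum search and the mapping back to image coordinates can be sketched as follows (an illustrative NumPy fragment assuming the 1/8 scale factor stated above; not taken from the patent):

```python
# Illustrative: take the per-channel maximum of the heatmaps and scale back to image coordinates.
import numpy as np

def heatmaps_to_keypoints(heatmaps, stride=8):
    """heatmaps: (C, H, W) first feature maps; returns (C, 2) array of (x, y) image coordinates."""
    coords = []
    for channel in heatmaps:
        y, x = np.unravel_index(np.argmax(channel), channel.shape)  # maximum-value point
        coords.append((x * stride, y * stride))                     # map 64x64 grid -> 512x512 image
    return np.asarray(coords, dtype=np.float32)
```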
Optionally, the feature extraction model further includes a second convolutional neural network, and the operation of determining, using the feature extraction model trained by deep learning, the positions of the multiple feature points of the upper limb of the object to be identified in the image further comprises: using the second convolutional neural network to generate second feature maps of multiple channels from the image; determining, according to the second feature maps, the positional relationships between adjacent feature points among the multiple feature points; and, according to the positional relationships determined from the second feature maps, screening the feature points determined from the first feature maps and determining the to-be-classified feature points used for classification.
Specifically, the feature extraction model includes a second convolutional neural network. Referring to Fig. 3, the feature extraction model contains two branches, where, for example but not limited to, the right branch is the second convolutional neural network. The second convolutional neural network also uses the first 8 layers of the fine-tuned VGG16 pre-trained network and outputs a 512-dimensional feature map whose width and height are each 1/8 of the input image; assuming the input image size is 512 × 512, the output is a 512-dimensional feature map of size 64 × 64. A following 1×1 convolutional layer produces a 128-dimensional feature map of size 64 × 64, which is then fed as the feature extraction result into the feature point computation layers (referring to Fig. 3). The number of convolution kernels in the last convolutional layer of the right branch can be set according to the number of connections between the upper-limb feature points contained in the image; since one upper limb contains 22 connections between feature points, the number of convolution kernels in the last convolutional layer of the right branch (the second convolutional neural network) can be set to 22. The second convolutional neural network therefore finally outputs second feature maps of 22 channels with size 64 × 64. The second feature map of each channel corresponds to one connection between two feature points of the same upper limb; that is, the second feature map of each channel describes the positional relationship between two feature points of the same upper limb. The feature maps of the 22 channels can thus describe the positional relationships among the 24 feature points of the same upper limb.
Further, the positional relationships between adjacent feature points among the multiple feature points are determined according to the second feature maps. For example, but not limited to, the multiple upper-limb feature points in the image containing the object to be identified include one wrist joint feature point and two elbow joint feature points (elbow joint feature point 1 and elbow joint feature point 2). Fig. 5 exemplarily shows a schematic diagram of a second feature map. Fig. 5 shows the positional relationship between the adjacent wrist joint feature point and elbow joint feature point 1, where the feature point corresponding to the upper end of the second feature map is the wrist joint feature point and the feature point corresponding to the lower end is elbow joint feature point 1. In Fig. 5, the orientation of the region whose values are "1" indicates the positional relationship between the adjacent wrist joint feature point and elbow joint feature point 1. Referring to Fig. 5, the "1" values are distributed along the vertical direction, so the positional relationship between the adjacent wrist joint feature point and elbow joint feature point 1 is that elbow joint feature point 1 is located directly below the wrist joint feature point.
Further, Fig. 6 exemplarily shows another schematic diagram of a second feature map. Specifically, Fig. 6 shows the positional relationship between the adjacent wrist joint feature point and elbow joint feature point 2, where the feature point corresponding to the left side of the second feature map is the wrist joint feature point and the feature point corresponding to the right side is elbow joint feature point 2. In Fig. 6 the "1" values are distributed along the horizontal direction, so the positional relationship between the adjacent wrist joint feature point and elbow joint feature point 2 is that elbow joint feature point 2 is located directly to the right of the wrist joint feature point.
Further, referring to Fig. 2a, which shows the positional relationships between the feature points under a hand-raising gesture: in Fig. 2a the positional relationship between the adjacent wrist joint feature point and elbow joint feature point is also that the elbow joint feature point is located directly below the wrist joint feature point. Therefore, it can be screened out that the wrist joint feature point and elbow joint feature point 1 are feature points of the same upper limb, while elbow joint feature point 2 does not belong to the same upper limb. The wrist joint feature point and elbow joint feature point 1 are thus determined as to-be-classified feature points used for classification. By analogy, the 24 feature points belonging to the same upper limb can be screened out, and the screened-out 24 feature points are determined as the to-be-classified feature points used for classification. This ensures the accuracy of the hand-raising action recognition method of this embodiment.
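One possible way to realize such screening (an assumed sketch only, not the disclosed algorithm) is to sample the relevant second feature map along the segment between two candidate feature points and keep the pairing with the highest response:

```python
# Assumed sketch: score candidate keypoint pairs by sampling the connection map between them.
import numpy as np

def connection_score(second_map, p_a, p_b, num_samples=10):
    """Average response of one 64x64 second feature map along the segment p_a -> p_b (grid coords)."""
    xs = np.linspace(p_a[0], p_b[0], num_samples).round().astype(int)
    ys = np.linspace(p_a[1], p_b[1], num_samples).round().astype(int)
    return float(second_map[ys, xs].mean())

def pick_partner(second_map, wrist, elbow_candidates):
    """Return the elbow candidate most consistent with the wrist according to the connection map."""
    scores = [connection_score(second_map, wrist, e) for e in elbow_candidates]
    return elbow_candidates[int(np.argmax(scores))]
```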
The computation of the above feature point computation layers can be expressed by the following formulas:

$$J^{t} = \rho^{t}\!\left(F,\, J^{t-1},\, A^{t-1}\right), \qquad A^{t} = \phi^{t}\!\left(F,\, J^{t-1},\, A^{t-1}\right)$$

where $F$ is the 128-dimensional feature map output by the 1×1 convolutional layer, $\rho^{t}$ and $\phi^{t}$ are respectively the convolutional computations of the left branch and the right branch at stage $t$, $J^{t}$ is the output of the left branch at stage $t$, and $A^{t}$ is the output of the right branch at stage $t$.
Optionally, the operation of determining, according to the determined position information and using a preset classification model, whether the object to be identified performs a hand-raising action further comprises: according to the position information of the to-be-classified feature points, determining whether the object to be identified performs a hand-raising action using a preset support vector machine model.
Specifically, according to the position information of the to-be-classified feature points, the preset support vector machine model takes the position information (for example the position coordinates) of the feature points contained in a single upper limb as input and whether the hand is raised as output, and thereby determines whether the object to be identified performs a hand-raising action. For example, the positions of the 24 feature points of the same upper limb can be represented by the position coordinates of each feature point in the image, (x1, y1), (x2, y2), (x3, y3), ..., (x24, y24). In this way, by inputting the position coordinates of the 24 feature points of the same upper limb into the preset support vector machine model, whether the object to be identified performs a hand-raising action can be determined quickly and accurately.
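A minimal sketch of such a coordinate-based SVM classifier, using scikit-learn as an assumed implementation (the training data, kernel and lack of normalization here are illustrative, not from the patent), might be:

```python
# Illustrative SVM on flattened keypoint coordinates; 1 = hand raised, 0 = not raised.
import numpy as np
from sklearn.svm import SVC

# X: (n_samples, 48) flattened (x1, y1, ..., x24, y24); y: (n_samples,) action labels.
X_train = np.random.rand(200, 48)             # placeholder data for the sketch
y_train = np.random.randint(0, 2, size=200)

svm = SVC(kernel="rbf")
svm.fit(X_train, y_train)

keypoints = np.random.rand(24, 2)             # 24 detected upper-limb feature points
is_raising = bool(svm.predict(keypoints.reshape(1, -1))[0])
```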
The support vector machine is a relatively common binary classification model and is not described in detail again in this embodiment.
Optionally, the method further includes training the first convolutional neural network by the following operations: obtaining multiple sample images containing upper limbs; constructing the first convolutional neural network; generating, using the first convolutional neural network, a first output vector corresponding to a sample image, where the first output vector indicates the positions, in the sample image, of the feature points contained in the sample image; and comparing the first output vector with a preset first label vector corresponding to the sample image, and adjusting the first convolutional neural network according to the comparison result.
Specifically, it is first necessary to obtain a sufficient sample set (a set of sample images) containing various human postures in a variety of scenes. The training set is annotated mainly with the feature points of the human upper limb: 21 hand feature points, 1 wrist joint feature point, 1 elbow joint feature point and 1 shoulder joint feature point, together with the action category, where the hand-raising action is labeled 1 and other actions are labeled 0. For example, but not limited to, nearly 10,000 data sets are first annotated manually; data augmentation includes horizontal flipping, scaling, cropping, translation and adding noise, but vertical flipping cannot be used. After a certain number of image samples containing upper limbs are obtained, algorithmic and manual annotation and calibration are carried out and a training sample database is constructed. Size normalization and data augmentation are then applied to the constructed sample database.
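For illustration only (the library choice and parameter values are assumptions, not part of the disclosure), an augmentation policy of this kind, without vertical flipping, could be expressed with torchvision transforms:

```python
# Assumed augmentation pipeline: horizontal flip, scale/crop, translation, noise; no vertical flip.
# Note: keypoint labels must be transformed consistently with the image (not shown in this sketch).
import torch
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomResizedCrop(512, scale=(0.8, 1.0)),          # scaling and cropping
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),     # translation
    transforms.ToTensor(),
    transforms.Lambda(lambda t: (t + 0.02 * torch.randn_like(t)).clamp(0, 1)),  # additive noise
])
```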
Further, the first convolutional neural network is constructed. The constructed first convolutional neural network is then used to generate a first output vector corresponding to a sample image, where the first output vector indicates the positions in the sample image of the feature points contained in the sample image. The first output vector is then compared with the preset first label vector corresponding to the sample image, and the first convolutional neural network is adjusted according to the comparison result, so as to reach the optimal recognition effect.
Optionally, the method further includes training the second convolutional neural network by the following operations: obtaining multiple sample images containing upper limbs; constructing the second convolutional neural network; generating, using the second convolutional neural network, a second output vector corresponding to a sample image, where the second output vector indicates the positional relationships between adjacent feature points contained in the sample image; and comparing the second output vector with a preset second label vector corresponding to the sample image, and adjusting the second convolutional neural network according to the comparison result.
Specifically, as above, a sufficient sample set (a set of sample images) containing various human postures in a variety of scenes also needs to be obtained. The training set is annotated mainly with the feature points of the human upper limb: 21 hand feature points, 1 wrist joint feature point, 1 elbow joint feature point and 1 shoulder joint feature point, together with the action category, where the hand-raising action is labeled 1 and other actions are labeled 0. For example, but not limited to, nearly 10,000 data sets are first annotated manually; data augmentation includes horizontal flipping, scaling, cropping, translation and adding noise, but vertical flipping cannot be used. After a certain number of image samples containing upper limbs are obtained, algorithmic and manual annotation and calibration are carried out and a training sample database is constructed. Size normalization and data augmentation are then applied to the constructed sample database.
Further, the second convolutional neural network is constructed. The constructed second convolutional neural network is then used to generate a second output vector corresponding to a sample image, where the second output vector indicates the positional relationships between adjacent feature points contained in the sample image. The second output vector is then compared with the preset second label vector corresponding to the sample image, and the second convolutional neural network is adjusted according to the comparison result, so as to reach the optimal recognition effect.
Optionally, the operation of comparing the first output vector with the preset first label vector corresponding to the sample image includes calculating a first L2 spatial distance between the first output vector and the first label vector, and the operation of adjusting the first convolutional neural network according to the comparison result comprises: using the first L2 spatial distance as a first loss function and calculating a first gradient of the first loss function; and adjusting the first convolutional neural network based on the first gradient according to the principle of stochastic gradient descent.
Specifically, the first L2 spatial distance between the first output vector and the first label vector is calculated, the first L2 spatial distance is used as the first loss function, and the first gradient of the first loss function is calculated. For example, the first loss function uses the L2 spatial distance between the predicted value and the true value, and back-propagation derivation is carried out on this basis. The specific calculation is as follows:

$$f = \sum_{z} \sum_{p} \left\| J_{z}(p) - J_{z}^{*}(p) \right\|_{2}^{2}$$

where $z$ indexes the $z$-th feature map, $p$ is the $p$-th pixel of the feature map, $J_{z}^{*}(p)$ is the true value and $J_{z}(p)$ is the predicted value.
Then, the network parameters of the first convolutional neural network are optimized according to the principle of stochastic gradient descent (SGD).
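Under the architectural assumptions made in the sketch following the description of Fig. 3, one training step with an L2 loss and stochastic gradient descent might look as follows; this is illustrative only and not the original training code.

```python
# Assumed training step: L2 distance between predicted and ground-truth heatmaps, optimized with SGD.
import torch

model = KeypointBranch()                                   # from the earlier illustrative sketch
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

def train_step(images, target_heatmaps):
    """images: (N, 3, 512, 512); target_heatmaps: (N, 25, 64, 64) ground-truth first feature maps."""
    optimizer.zero_grad()
    pred = model(images)
    loss = ((pred - target_heatmaps) ** 2).sum(dim=(1, 2, 3)).mean()  # L2 spatial distance
    loss.backward()                                        # back-propagation of the gradient
    optimizer.step()                                       # stochastic gradient descent update
    return loss.item()
```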
Optionally, the operation of comparing the second output vector with the preset second label vector corresponding to the sample image includes calculating a second L2 spatial distance between the second output vector and the second label vector, and the operation of adjusting the second convolutional neural network according to the comparison result comprises: using the second L2 spatial distance as a second loss function and calculating a second gradient of the second loss function; and adjusting the second convolutional neural network based on the second gradient according to the principle of stochastic gradient descent. Specifically, the second convolutional neural network can be adjusted with reference to the above manner of adjusting the first convolutional neural network.
In addition, the hand-raising recognition method described in this embodiment specifically includes the following steps:
Step 1: collect a certain number of image samples containing upper limbs, carry out algorithmic and manual annotation and calibration, and construct a training sample database. Apply size normalization and data augmentation to the constructed sample database.
Step 2: build a low-accuracy feature point detector with the deep learning algorithm, train it on the constructed classification sample set to generate an SVM classification model, and effectively classify hand-raising and non-hand-raising actions.
Step 3: collect new data and use the trained simple model to automatically annotate the unlabeled samples.
Step 4: manually calibrate the data set annotated by the simple model to expand the training samples.
Step 5: continue training the feature point detection and hand-raising classification models with the expanded training samples.
Step 6: repeat steps 3 to 5 until the model accuracy meets the requirement.
Step 7: read image data from each IP camera respectively and generate the image data required by the algorithm through the decoding module.
Step 8: input the data generated in step 7 into the trained neural network model and return the recognition results to the terminal (a sketch combining steps 7 and 8 is given below).
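Steps 7 and 8 could be wired together roughly as below; this is an assumed sketch that reuses the earlier illustrative helpers `frame_stream` and `recognize_hand_raise`, and the camera URLs are placeholders only.

```python
# Assumed glue for steps 7-8: read frames from several IP cameras and run the trained models.
CAMERA_URLS = ["rtsp://camera-1/stream", "rtsp://camera-2/stream"]   # placeholder addresses

def run_detection(feature_model, svm_classifier):
    streams = [frame_stream(url) for url in CAMERA_URLS]             # step 7: decode each camera
    while True:
        for cam_id, stream in enumerate(streams):
            frame = next(stream)
            raised = recognize_hand_raise(frame, feature_model, svm_classifier)  # step 8
            if raised:
                print(f"camera {cam_id}: hand-raising detected")      # e.g. push an alarm signal
```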
The data acquisition in step 1 above specifically includes the following steps:
A sufficient sample set must first be collected, containing various human postures in a variety of scenes. The content mainly annotated when producing the training set is the feature points of the human upper limb: 21 hand feature points, 1 wrist joint feature point, 1 elbow joint feature point and 1 shoulder joint feature point, together with the action category, where the hand-raising action is labeled 1 and other actions are labeled 0. For annotation, nearly 10,000 data sets are first labeled manually; data augmentation includes horizontal flipping, scaling, cropping, translation, adding noise, etc., and vertical flipping cannot be used.
The model construction in step 2 above specifically includes the following steps: establish the deep learning algorithm model. Referring to Fig. 3, in this embodiment the feature extraction network uses the first 8 layers of the fine-tuned VGG16 pre-trained network model and outputs a 512-dimensional feature map whose width and height are each 1/8 of the input image. A following 1×1 convolutional layer generates a 128-dimensional feature map, which is fed as the feature extraction result into the feature point computation layers. These are divided into two branches: the left branch outputs the positions and confidences of the detected human upper-limb feature points, and the right branch outputs the positional relationships between adjacent feature points. The computation process is as follows:
$$J^{t} = \rho^{t}\!\left(F,\, J^{t-1},\, A^{t-1}\right), \qquad A^{t} = \phi^{t}\!\left(F,\, J^{t-1},\, A^{t-1}\right)$$

where $F$ is the 128-dimensional feature map output by the 1×1 convolutional layer, $\rho^{t}$ and $\phi^{t}$ are respectively the convolutional computations of the left branch and the right branch at stage $t$, $J^{t}$ is the output of the left branch at stage $t$, and $A^{t}$ is the output of the right branch at stage $t$. The loss function uses the L2 spatial distance between the predicted value and the true value, and back-propagation derivation is carried out on this basis. The specific calculation is as follows:

$$f = \sum_{z} \sum_{p} \left\| J_{z}(p) - J_{z}^{*}(p) \right\|_{2}^{2}$$

where $z$ indexes the $z$-th feature map, $p$ is the $p$-th pixel of the feature map, $J_{z}^{*}(p)$ is the true value and $J_{z}(p)$ is the predicted value.
The classification model part uses a support vector machine, taking the coordinate information of the feature points of a single upper limb and single hand as input and whether the hand is raised as output for training; the coordinates of missing (undetected) feature points are set to 0.
The model construction in steps 3, 4, 5 and 6 above specifically includes the following steps:
Collect completely new image data and feed the collected data into the feature point detector and hand-raising classifier trained in step 2, thereby obtaining coarse training samples. On the basis of these sample annotations, manual annotation correction is carried out; after annotation is completed, the data set continues to be expanded with the data augmentation methods of step 1. This step helps save a large amount of manual annotation cost. The old model is then further trained with the expanded data samples; repeating this operation continuously improves the robustness and precision of the model.
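The bootstrap loop of steps 3 to 6 can be summarized by the following assumed Python outline; all helper callables here (collection, automatic labeling, manual correction, augmentation, retraining and evaluation) are hypothetical placeholders passed in by the caller, not functions from the patent.

```python
# Assumed outline of the semi-automatic labeling loop (steps 3-6); all helpers are hypothetical callables.
def expand_and_retrain(model, svm, labeled_set, target_accuracy,
                       collect_new, auto_label, manual_correct, augment, retrain, evaluate):
    """Loop until the evaluated accuracy meets the requirement (step 6)."""
    while evaluate(model, svm) < target_accuracy:
        new_images = collect_new()                      # step 3: gather new unlabeled data
        coarse = auto_label(model, svm, new_images)     # step 3: automatic annotation by the simple model
        corrected = manual_correct(coarse)              # step 4: manual calibration of the coarse labels
        labeled_set = augment(labeled_set + corrected)  # step 4: expand the training samples
        model, svm = retrain(model, svm, labeled_set)   # step 5: continue training detector and classifier
    return model, svm
```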
The video stream reading in step 7 above specifically includes the following steps: according to the hardware composition of the algorithm platform, decide whether to adopt a hardware video decoding scheme. The problem of corrupted (garbled) frames during video decoding is solved by modifying the transport protocol, and a timed-reconnection function after camera interruption is provided, which improves system robustness and, indirectly, computation accuracy.
The video stream reading in step 7 above further specifically includes the following steps: with the hand-raising detection model obtained by the above steps as the core, a hand-raising detection application system is established. The real-time data acquisition method of reading IP cameras is used, and the number of images processed by the algorithm in a single batch is determined according to the computing capability of the terminal. After algorithm processing, the display mode of the hand-raising detection results is changed according to different application scenarios, including real-time display of detection results and pushing an alarm signal when hand-raising behavior is detected.
Thus, the hand-raising recognition method provided by this embodiment achieves the following effects: a small amount of labeled data is used for automatic cyclic data expansion; a deep neural network is established to extract feature images and calculate the coordinates and confidences of upper-limb and hand feature points; the calculated upper-limb and hand feature points are judged for hand raising by the support vector machine; and the feature extraction model and SVM are updated through back-propagation by the deep learning algorithm, which speeds up detection and improves calculation accuracy.
Compared with hand-raising detection realized by object detection algorithms, this method can achieve a similar speed and is greatly improved in accuracy, robustness, and the recognition rate of hard-to-identify objects.
Compared with dynamic hand-raising recognition detection that relies on tracking algorithms, this method can avoid many problems of tracking algorithms, such as target loss, target confusion and the difficulty of multi-target tracking. Meanwhile, the speed of the disclosure substantially leads tracking algorithms, and multi-channel video streams can be detected in real time.
Compared with other algorithms, this method has high deployment efficiency and low cost, and multi-channel video can be detected in real time on mobile terminals.
In addition, because the improved hand-raising recognition method based on deep learning proposed by the disclosure compresses the amount of computation to a certain extent, the calculation speed is fast and the method can be deployed on vision mobile terminals. Expanding samples through automatic labeling saves a large amount of labeling cost and yields a large number of training samples, which greatly enhances the robustness and accuracy of the model. CPU computing resources are saved by the hardware decoding scheme, the modification of the transport protocol solves the problem of garbled video, and the choice of the model structure relieves the heavy computing pressure caused by too many hand-raising detection scenes and too many people, as well as the false detections and missed detections caused by inaccurate recognition of complex and difficult scenes.
In addition, according to the third aspect of this embodiment and referring to Fig. 1, a storage medium 104 is provided. The storage medium 104 contains a stored program, wherein when the program runs, a processor executes the method described in any one of the above.
It should be noted that, for the sake of simple description, the foregoing method embodiments are all expressed as a series of action combinations, but those skilled in the art should know that the present invention is not limited by the described action sequence, because according to the present invention some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be realized by means of software plus the necessary general hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or the part contributing to the prior art, can be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, magnetic disk or optical disk) and including several instructions to make a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) execute the methods described in the embodiments of the present invention.
Embodiment 2
Fig. 7 shows a hand-raising action recognition device 700 based on deep learning according to this embodiment; the device 700 corresponds to the method according to the first aspect of Embodiment 1. Referring to Fig. 7, the device 700 comprises: an obtaining module 710, configured to obtain an image containing the upper limb of an object to be identified; a determining module 720, configured to use a feature extraction model trained by deep learning to determine the positions of multiple feature points of the upper limb of the object to be identified in the image; and a determination module 730, configured to determine, according to the determined position information and using a preset classification model, whether the object to be identified performs a hand-raising action.
Optionally, the feature extraction model includes a first convolutional neural network, and the determining module 720 includes: a first generating submodule, configured to use the first convolutional neural network to generate first feature maps of multiple channels from the image; a first determining submodule, configured to determine the point with the maximum value in the first feature map of each channel as a feature point; and a second determining submodule, configured to determine the position of each feature point in the image according to its position in the corresponding first feature map.
Optionally, the feature extraction model further includes a second convolutional neural network, and the determining module 720 further includes: a second generating submodule, configured to use the second convolutional neural network to generate second feature maps of multiple channels from the image; a third determining submodule, configured to determine, according to the second feature maps, the positional relationships between adjacent feature points among the multiple feature points; and a fourth determining submodule, configured to screen, according to the positional relationships determined from the second feature maps, the feature points determined from the first feature maps, and to determine the to-be-classified feature points used for classification.
Optionally, the determination module 730 includes: a judging submodule, configured to determine, according to the position information of the to-be-classified feature points and using a preset support vector machine model, whether the object to be identified performs a hand-raising action.
Optionally, the device further includes a first training module, configured to train the first convolutional neural network through the following submodules: a first obtaining submodule, configured to obtain multiple sample images containing upper limbs; a first constructing submodule, configured to construct the first convolutional neural network; a first generating submodule, configured to use the first convolutional neural network to generate a first output vector corresponding to a sample image, where the first output vector indicates the positions, in the sample image, of the feature points contained in the sample image; and a first comparing submodule, configured to compare the first output vector with a preset first label vector corresponding to the sample image and adjust the first convolutional neural network according to the comparison result.
Optionally, the device further includes a second training module, configured to train the second convolutional neural network through the following submodules: a second obtaining submodule, configured to obtain multiple sample images containing upper limbs; a second constructing submodule, configured to construct the second convolutional neural network; a second generating submodule, configured to use the second convolutional neural network to generate a second output vector corresponding to a sample image, where the second output vector indicates the positional relationships between adjacent feature points contained in the sample image; and a second comparing submodule, configured to compare the second output vector with a preset second label vector corresponding to the sample image and adjust the second convolutional neural network according to the comparison result.
Optionally, the first comparing submodule includes: a first computing unit, configured to calculate a first L2 spatial distance between the first output vector and the first label vector; a second computing unit, configured to use the first L2 spatial distance as a first loss function and calculate a first gradient of the first loss function; and a first adjusting unit, configured to adjust the first convolutional neural network based on the first gradient according to the principle of stochastic gradient descent.
Optionally, the second comparison submodule includes: a third computing unit, configured to compute the second L2 distance between the second output vector and the second label vector; a fourth computing unit, configured to take the second L2 distance as the second loss function and compute the second gradient of the second loss function; and a second adjustment unit, configured to adjust the second convolutional neural network based on the second gradient according to the stochastic gradient descent principle.
Thus, according to this embodiment, the image containing the upper limb of the object to be identified is obtained first; then, the location information of multiple feature points of the upper limb of the object to be identified in the image is determined by using the feature extraction model trained through deep learning; and then, according to the determined location information and by using a preset classification model, it is determined whether the object to be identified makes a hand-raising action. The technical solution of this application can obtain image data from multiple image acquisition devices simultaneously, use the feature extraction model trained through deep learning to obtain in real time the location information of the multiple feature points of the upper limb of the object to be identified in the image, and then judge in real time, according to the determined location information and the preset classification model, whether the object to be identified makes a hand-raising action. No repeated computation is required, the hand-raising intention can be identified, and the detection accuracy is high. Moreover, since this solution uses a feature extraction model trained through deep learning, it does not require sensors as traditional hand-raising recognition methods do; it is convenient to use, occupies few computing resources, is low in cost, and is easy to popularize. It thereby solves the technical problems of existing hand-raising recognition algorithms in the prior art: high data preparation cost, long computation time, a large amount of repeated computation, inability to support multi-channel image input, inability to identify the hand-raising intention, and low detection accuracy.
Embodiment 3
Fig. 8 shows a hand-raising action recognition device 800 based on deep learning according to this embodiment; the device 800 corresponds to the method according to the first aspect of Embodiment 1. Referring to Fig. 8, the device 800 includes: a processor 810; and a memory 820 connected to the processor 810, configured to provide the processor 810 with instructions for handling the following processing steps: obtaining an image containing the upper limb of an object to be identified; determining, by using a feature extraction model trained through deep learning, the location information of multiple feature points of the upper limb of the object to be identified in the image; and determining, according to the determined location information and by using a preset classification model, whether the object to be identified makes a hand-raising action.
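At this level the device simply chains the three processing steps together; the hedged sketch below shows only the control flow, with all three callables standing in for the components described in the embodiments above rather than a definitive API.

from typing import Callable
import numpy as np

def recognize_hand_raise(frame: np.ndarray,
                         extract_keypoints: Callable[[np.ndarray], np.ndarray],
                         classify: Callable[[np.ndarray], bool]) -> bool:
    """frame: an image containing the upper limb of the object to be identified."""
    keypoints = extract_keypoints(frame)   # (K, 2) feature-point locations in the image
    return classify(keypoints)             # preset classification model decision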
Optionally, the feature extraction model includes a first convolutional neural network, and the operation of determining, by using the feature extraction model trained through deep learning, the location information of the multiple feature points of the upper limb of the object to be identified in the image includes: generating first feature maps of multiple channels from the image by using the first convolutional neural network; determining the point with the maximum value in the first feature map of each channel as a feature point; and determining the location information of the feature points in the image according to the positions of the feature points in their respective first feature maps.
Optionally, the feature extraction model further includes a second convolutional neural network, and the operation of determining, by using the feature extraction model trained through deep learning, the location information of the multiple feature points of the upper limb of the object to be identified in the image further includes: generating second feature maps of multiple channels from the image by using the second convolutional neural network; determining, according to the second feature maps, the positional relationship between two adjacent feature points among the multiple feature points; and screening the feature points determined from the first feature maps according to the positional relationships determined from the second feature maps, and determining the feature points to be classified.
Optionally, the operation of determining, according to the determined location information and by using the preset classification model, whether the object to be identified makes a hand-raising action further includes: determining, according to the location information of the feature points to be classified and by using a preset support vector machine model, whether the object to be identified makes a hand-raising action.
Optionally, the memory 820 is further configured to provide the processor 810 with instructions for handling the following processing steps: training the first convolutional neural network through the following operations: obtaining multiple sample images containing an upper limb; constructing the first convolutional neural network; generating, by using the first convolutional neural network, a first output vector corresponding to a sample image, where the first output vector indicates the location information, in the sample image, of the feature points contained in the sample image; and comparing the first output vector with a preset first label vector corresponding to the sample image, and adjusting the first convolutional neural network according to the comparison result.
Optionally, the memory 820 is further configured to provide the processor 810 with instructions for handling the following processing steps: training the second convolutional neural network through the following operations: obtaining multiple sample images containing an upper limb; constructing the second convolutional neural network; generating, by using the second convolutional neural network, a second output vector corresponding to a sample image, where the second output vector indicates the positional relationship between two adjacent feature points contained in the sample image; and comparing the second output vector with a preset second label vector corresponding to the sample image, and adjusting the second convolutional neural network according to the comparison result.
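One possible, purely illustrative encoding of the second label vector describing the positional relationship between adjacent feature points is the normalised offset between annotated keypoints, as sketched below; the patent does not fix a particular encoding, so this is an assumption.

import numpy as np

def pairwise_relation_label(points: np.ndarray, pairs: list) -> np.ndarray:
    """points: (K, 2) annotated keypoints; returns one unit offset per adjacent pair."""
    labels = []
    for a, b in pairs:
        offset = points[b] - points[a]
        norm = np.linalg.norm(offset)
        labels.append(offset / norm if norm > 0 else offset)
    return np.concatenate(labels)   # flattened second label vector

# Example: three annotated upper-limb keypoints and two adjacent pairs.
label = pairwise_relation_label(np.array([[10., 40.], [30., 38.], [52., 20.]]),
                                [(0, 1), (1, 2)])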
Optionally, the operation of comparing the first output vector with the preset first label vector corresponding to the sample image includes computing the first L2 distance between the first output vector and the first label vector, and the operation of adjusting the first convolutional neural network according to the comparison result includes: taking the first L2 distance as the first loss function and computing the first gradient of the first loss function; and adjusting the first convolutional neural network based on the first gradient according to the stochastic gradient descent principle.
Optionally, the operation of comparing the second output vector with the preset second label vector corresponding to the sample image includes computing the second L2 distance between the second output vector and the second label vector, and the operation of adjusting the second convolutional neural network according to the comparison result includes: taking the second L2 distance as the second loss function and computing the second gradient of the second loss function; and adjusting the second convolutional neural network based on the second gradient according to the stochastic gradient descent principle.
Thus, according to this embodiment, the processor first obtains the image containing the upper limb of the object to be identified, then determines, by using the feature extraction model trained through deep learning, the location information of multiple feature points of the upper limb of the object to be identified in the image, and then determines, according to the determined location information and by using a preset classification model, whether the object to be identified makes a hand-raising action. The technical solution of this application can obtain image data from multiple image acquisition devices simultaneously, use the feature extraction model trained through deep learning to obtain in real time the location information of the multiple feature points of the upper limb of the object to be identified in the image, and then judge in real time, according to the determined location information and the preset classification model, whether the object to be identified makes a hand-raising action. No repeated computation is required, the hand-raising intention can be identified, and the detection accuracy is high. Moreover, since this solution uses a feature extraction model trained through deep learning, it does not require sensors as traditional hand-raising recognition methods do; it is convenient to use, occupies few computing resources, is low in cost, and is easy to popularize. It thereby solves the technical problems of existing hand-raising recognition algorithms in the prior art: high data preparation cost, long computation time, a large amount of repeated computation, inability to support multi-channel image input, inability to identify the hand-raising intention, and low detection accuracy.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis. For parts not described in detail in a certain embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed technical content may be implemented in other ways. The device embodiments described above are merely exemplary; for example, the division of the units is only a logical functional division, and there may be other division manners in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Further, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or in other forms.
The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a removable hard disk, a magnetic disk, or an optical disk.
The above are only preferred embodiments of the present invention. It should be noted that, for those of ordinary skill in the art, various improvements and modifications may be made without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A hand-raising action recognition method based on deep learning, characterized by comprising:
obtaining an image containing an upper limb of an object to be identified;
determining, by using a feature extraction model trained through deep learning, location information of multiple feature points of the upper limb of the object to be identified in the image; and
determining, according to the determined location information and by using a preset classification model, whether the object to be identified makes a hand-raising action.
2. The method according to claim 1, characterized in that the feature extraction model comprises a first convolutional neural network, and the operation of determining, by using the feature extraction model trained through deep learning, the location information of the multiple feature points of the upper limb of the object to be identified in the image comprises:
generating first feature maps of multiple channels from the image by using the first convolutional neural network;
determining the point with the maximum value in the first feature map of each channel as a feature point; and
determining the location information of the feature points in the image according to the positions of the feature points in their respective first feature maps.
3. The method according to claim 2, characterized in that the feature extraction model further comprises a second convolutional neural network, and the operation of determining, by using the feature extraction model trained through deep learning, the location information of the multiple feature points of the upper limb of the object to be identified in the image further comprises:
generating second feature maps of multiple channels from the image by using the second convolutional neural network;
determining, according to the second feature maps, the positional relationship between two adjacent feature points among the multiple feature points; and
screening the feature points determined from the first feature maps according to the positional relationships determined from the second feature maps, and determining the feature points to be classified.
4. The method according to claim 3, characterized in that the operation of determining, according to the determined location information and by using the preset classification model, whether the object to be identified makes a hand-raising action further comprises: determining, according to the location information of the feature points to be classified and by using a preset support vector machine model, whether the object to be identified makes a hand-raising action.
5. The method according to claim 4, characterized by further comprising training the first convolutional neural network through the following operations:
obtaining multiple sample images containing an upper limb;
constructing the first convolutional neural network;
generating, by using the first convolutional neural network, a first output vector corresponding to a sample image, wherein the first output vector indicates location information, in the sample image, of the feature points contained in the sample image; and
comparing the first output vector with a preset first label vector corresponding to the sample image, and adjusting the first convolutional neural network according to the comparison result.
6. The method according to claim 4, characterized by further comprising training the second convolutional neural network through the following operations:
obtaining multiple sample images containing an upper limb;
constructing the second convolutional neural network;
generating, by using the second convolutional neural network, a second output vector corresponding to a sample image, wherein the second output vector indicates the positional relationship between two adjacent feature points contained in the sample image; and
comparing the second output vector with a preset second label vector corresponding to the sample image, and adjusting the second convolutional neural network according to the comparison result.
7. The method according to claim 5, characterized in that the operation of comparing the first output vector with the preset first label vector corresponding to the sample image comprises computing a first L2 distance between the first output vector and the first label vector, and the operation of adjusting the first convolutional neural network according to the comparison result comprises:
taking the first L2 distance as a first loss function, and computing a first gradient of the first loss function; and
adjusting the first convolutional neural network based on the first gradient according to the stochastic gradient descent principle.
8. The method according to claim 6, characterized in that the operation of comparing the second output vector with the preset second label vector corresponding to the sample image comprises computing a second L2 distance between the second output vector and the second label vector, and the operation of adjusting the second convolutional neural network according to the comparison result comprises:
taking the second L2 distance as a second loss function, and computing a second gradient of the second loss function; and
adjusting the second convolutional neural network based on the second gradient according to the stochastic gradient descent principle.
9. A storage medium, characterized in that the storage medium comprises a stored program, wherein, when the program runs, a processor is caused to perform the method according to any one of claims 1 to 8.
10. A hand-raising action recognition device based on deep learning, characterized by comprising:
a first obtaining module, configured to obtain an image containing an upper limb of an object to be identified;
a determining module, configured to determine, by using a feature extraction model trained through deep learning, location information of multiple feature points of the upper limb of the object to be identified in the image; and
a determination module, configured to determine, according to the determined location information and by using a preset classification model, whether the object to be identified makes a hand-raising action.
CN201910647658.4A 2019-07-17 2019-07-17 Action identification method of raising one's hand, device and storage medium based on deep learning Pending CN110399822A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910647658.4A CN110399822A (en) 2019-07-17 2019-07-17 Action identification method of raising one's hand, device and storage medium based on deep learning

Publications (1)

Publication Number Publication Date
CN110399822A (en) 2019-11-01

Family

ID=68324493

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910647658.4A Pending CN110399822A (en) 2019-07-17 2019-07-17 Action identification method of raising one's hand, device and storage medium based on deep learning

Country Status (1)

Country Link
CN (1) CN110399822A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107644190A (en) * 2016-07-20 2018-01-30 北京旷视科技有限公司 Pedestrian's monitoring method and device
US20180047196A1 (en) * 2016-08-11 2018-02-15 Integem Inc. Intelligent augmented reality (iar) platform-based communication system
CN107808376A (en) * 2017-10-31 2018-03-16 上海交通大学 A kind of detection method of raising one's hand based on deep learning
CN108038452A (en) * 2017-12-15 2018-05-15 厦门瑞为信息技术有限公司 A kind of quick detection recognition method of household electrical appliances gesture based on topography's enhancing
CN108304819A (en) * 2018-02-12 2018-07-20 北京易真学思教育科技有限公司 Gesture recognition system and method, storage medium
CN109165552A (en) * 2018-07-14 2019-01-08 深圳神目信息技术有限公司 A kind of gesture recognition method based on human body key point, system and memory
CN109508661A (en) * 2018-10-31 2019-03-22 上海交通大学 A kind of person's of raising one's hand detection method based on object detection and Attitude estimation
CN109740446A (en) * 2018-12-14 2019-05-10 深圳壹账通智能科技有限公司 Classroom students ' behavior analysis method and device
CN109558865A (en) * 2019-01-22 2019-04-02 郭道宁 A kind of abnormal state detection method to the special caregiver of need based on human body key point

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110991533A (en) * 2019-12-03 2020-04-10 Oppo广东移动通信有限公司 Image recognition method, recognition device, terminal device and readable storage medium
CN110991533B (en) * 2019-12-03 2023-08-04 Oppo广东移动通信有限公司 Image recognition method, recognition device, terminal device and readable storage medium
CN111339905A (en) * 2020-02-22 2020-06-26 郑州铁路职业技术学院 CIM well lid state visual detection system based on deep learning and multi-view angle
CN111339905B (en) * 2020-02-22 2022-07-08 郑州铁路职业技术学院 CIM well lid state visual detection system based on deep learning and multiple visual angles
CN112818802A (en) * 2021-01-26 2021-05-18 四川天翼网络服务有限公司 Bank counter personnel hand-lifting identification method and system
CN112818802B (en) * 2021-01-26 2022-07-05 四川天翼网络服务有限公司 Bank counter personnel hand-lifting identification method and system
WO2022237481A1 (en) * 2021-05-12 2022-11-17 北京百度网讯科技有限公司 Hand-raising recognition method and apparatus, electronic device, and storage medium
CN117670259A (en) * 2024-01-31 2024-03-08 天津师范大学 Sample detection information management method
CN117670259B (en) * 2024-01-31 2024-04-19 天津师范大学 Sample detection information management method

Similar Documents

Publication Publication Date Title
CN110399822A (en) Action identification method of raising one's hand, device and storage medium based on deep learning
CN108520229A (en) Image detecting method, device, electronic equipment and computer-readable medium
CN109670441A (en) A kind of realization safety cap wearing knows method for distinguishing, system, terminal and computer readable storage medium
CN109447169A (en) The training method of image processing method and its model, device and electronic system
CN107844753A (en) Pedestrian in video image recognition methods, device, storage medium and processor again
CN107657249A (en) Method, apparatus, storage medium and the processor that Analysis On Multi-scale Features pedestrian identifies again
CN109815770A (en) Two-dimentional code detection method, apparatus and system
CN109670452A (en) Method for detecting human face, device, electronic equipment and Face datection model
CN109522967A (en) A kind of commodity attribute recognition methods, device, equipment and storage medium
CN108470172A (en) A kind of text information identification method and device
CN108304835A (en) character detecting method and device
CN108197250B (en) Picture retrieval method, electronic equipment and storage medium
CN110222611A (en) Human skeleton Activity recognition method, system, device based on figure convolutional network
CN107808143A (en) Dynamic gesture identification method based on computer vision
CN109978918A (en) A kind of trajectory track method, apparatus and storage medium
CN109389599A (en) A kind of defect inspection method and device based on deep learning
CN110991435A (en) Express waybill key information positioning method and device based on deep learning
CN110084161A (en) A kind of rapid detection method and system of skeleton key point
CN110378235A (en) A kind of fuzzy facial image recognition method, device and terminal device
CN106469302A (en) A kind of face skin quality detection method based on artificial neural network
CN108460362A (en) A kind of system and method for detection human body
CN110135476A (en) A kind of detection method of personal safety equipment, device, equipment and system
CN108304820A (en) A kind of method for detecting human face, device and terminal device
CN109145766A (en) Model training method, device, recognition methods, electronic equipment and storage medium
CN108256404A (en) Pedestrian detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191101