CN107423721A - Interactive action detection method, device, storage medium and processor - Google Patents
Interactive action detection method, device, storage medium and processor
- Publication number
- CN107423721A (application CN201710670075.4A)
- Authority
- CN
- China
- Prior art keywords
- default
- target
- classification
- target photo
- position coordinates
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/23—Recognition of whole body movements, e.g. for sport training
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses an interactive action detection method, device, storage medium and processor. The method includes: detecting a target picture with a preset multilayer convolutional neural network to obtain the class and bounding-box coordinates of at least one target object present in the target picture; determining the target object with the highest confidence among the at least one target object as the target detection object; inputting the class and bounding-box coordinates of the target detection object into a preset multistage regression convolutional neural network, which detects the positions of the human joint parts of the target detection object and obtains the position coordinates of the human joints; and normalizing the position coordinates, then detecting the normalized position coordinates with a preset multilayer recurrent neural network to obtain the class label of the target picture. The present invention solves the technical problems of limited accuracy and low efficiency in interactive action detection in the prior art.
Description
Technical field
The present invention relates to the field of human-computer interaction, and in particular to an interactive action detection method, device, storage medium and processor.
Background art
Interactive action detection and classification is a basic technology of human-computer interaction, and is significant in scenes where electronic equipment interacts with people, such as smart homes, security systems and patient care. In the medical industry, for example, with the help of gesture recognition a deaf-mute patient can, when no nurse is present, communicate a demand to the hospital through a camera and a simple gesture, solving problems such as the expense of dedicated electronic equipment and patients who cannot use a computer.
The method currently used for human action recognition is based on two-stream convolutional neural networks: the optical-flow field, which carries the temporal information, and the RGB image are input into convolutional neural networks simultaneously and their information is fused, and the class label of the whole video segment is finally output. Because the temporal information contained in the optical-flow field is confined to a few nearby frames, the accuracy of the result is limited; and because the output is the class label of a whole video segment, a sliding time window must be applied frame by frame, which computes a large amount of duplicate information and limits the efficiency and real-time performance of the system. In summary, interactive action detection in the prior art suffers from the technical problems of limited accuracy and low efficiency.
No effective solution to the above problems has yet been proposed.
Summary of the invention
The embodiments of the invention provide an interactive action detection method, device, storage medium and processor, so as at least to solve the technical problems of limited accuracy and low efficiency in interactive action detection in the prior art.
According to one aspect of the embodiments of the present invention, an interactive action detection method is provided. The method includes: detecting a target picture with a preset multilayer convolutional neural network to obtain the class and bounding-box coordinates of at least one target object present in the target picture; determining the target object with the highest confidence among the at least one target object as the target detection object; inputting the class and bounding-box coordinates of the target detection object into a preset multistage regression convolutional neural network, then detecting the positions of the human joint parts of the target detection object with the preset multistage regression convolutional neural network to obtain the position coordinates of the human joints in the target detection object; and normalizing the position coordinates, then detecting the normalized position coordinates with a preset multilayer recurrent neural network to obtain the detection result of the target picture, where the detection result includes at least the class label of the target picture.
Further, before the normalized position coordinates are detected with the preset multilayer recurrent neural network, the method also includes: training the preset multilayer recurrent neural network with a preset loss function and a preset algorithm, where the preset loss function is a classification function and the preset algorithm is a back-propagation-through-time (BPTT) algorithm.
Further, detecting the normalized position coordinates with the preset multilayer recurrent neural network to obtain the detection result of the target picture includes: detecting the normalized position coordinates with the preset multilayer recurrent neural network to obtain multiple classes for the target picture and multiple activation values for each of the classes; obtaining, within a preset time window, the average of the activation values of each class; and determining the class with the largest average as the class label of the target picture, thereby obtaining the detection result.
Further, before the target picture is detected with the preset multilayer convolutional neural network, the method also includes: obtaining a human-posture video image captured by a preset camera, and taking any one frame of the human-posture video image as the target picture.
According to another aspect of the embodiments of the present invention, an interactive action detection device is also provided. The device includes: a detection unit for detecting a target picture with a preset multilayer convolutional neural network to obtain the class and bounding-box coordinates of at least one target object present in the target picture; a first determining unit for determining the target object with the highest confidence among the at least one target object as the target detection object; a first processing unit for inputting the class and bounding-box coordinates of the target detection object into a preset multistage regression convolutional neural network and then detecting the positions of the human joint parts of the target detection object with the preset multistage regression convolutional neural network to obtain the position coordinates of the human joints in the target detection object; and a second processing unit for normalizing the position coordinates and then detecting the normalized position coordinates with a preset multilayer recurrent neural network to obtain the detection result of the target picture, where the detection result includes at least the class label of the target picture.
Further, the device also includes: a training unit for training the preset multilayer recurrent neural network with a preset loss function and a preset algorithm, where the preset loss function is a classification function and the preset algorithm is a back-propagation-through-time algorithm.
Further, the second processing unit includes: a detection subunit for detecting the normalized position coordinates with the preset multilayer recurrent neural network to obtain multiple classes for the target picture and multiple activation values for each of the classes; an obtaining subunit for obtaining, within a preset time window, the average of the activation values of each class; and a determining subunit for determining the class with the largest average as the class label of the target picture, thereby obtaining the detection result.
Further, the device also includes: an acquiring unit for obtaining a human-posture video image captured by a preset camera, and a second determining unit for taking any one frame of the human-posture video image as the target picture.
According to another aspect of the embodiments of the present invention, a storage medium is also provided. The storage medium includes a stored program which, when run, controls the equipment on which the storage medium resides to perform the above interactive action detection method.
According to another aspect of the embodiments of the present invention, a processor is also provided. The processor is used to run a program which, when run, performs the above interactive action detection method.
In the embodiments of the present invention, a target picture is detected with a preset multilayer convolutional neural network to obtain the class and bounding-box coordinates of at least one target object present in the target picture, and the target object with the highest confidence among the at least one target object is determined as the target detection object. The class and bounding-box coordinates of the target detection object are input into a preset multistage regression convolutional neural network, which detects the positions of the human joint parts of the target detection object and obtains the position coordinates of the human joints. The position coordinates are normalized, and the normalized position coordinates are then detected with a preset multilayer recurrent neural network to obtain the detection result of the target picture, which includes at least the class label of the target picture. The embodiments of the present invention thereby achieve the technical effect of improving both the accuracy and the efficiency of interactive action detection, and solve the technical problems of limited accuracy and low efficiency in interactive action detection in the prior art.
Brief description of the drawings
The accompanying drawings described here provide a further understanding of the present invention and form a part of the application. The schematic embodiments of the present invention and their description are used to explain the present invention and do not constitute an improper limitation of it. In the drawings:
Fig. 1 is a schematic flowchart of an optional interactive action detection method according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of another optional interactive action detection method according to an embodiment of the present invention;
Fig. 3 is a schematic flowchart of yet another optional interactive action detection method according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an optional interactive action detection device according to an embodiment of the present invention.
Detailed description of the embodiments
In order that those skilled in the art may better understand the present solution, the technical scheme in the embodiments of the present invention is described clearly and completely below in conjunction with the accompanying drawings. The described embodiments are obviously only a part of the embodiments of the present invention rather than all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention, without creative work, shall fall within the scope of protection of the present invention.
It should be noted that the terms "first", "second" and the like in the description, claims and accompanying drawings are used to distinguish similar objects and not to describe a specific order or precedence. It should be understood that data so used may be interchanged where appropriate, so that the embodiments of the invention described here can be implemented in orders other than those illustrated or described. In addition, the terms "comprising" and "having" and any variants of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product or equipment containing a series of steps or units is not necessarily limited to the steps or units explicitly listed, but may include other steps or units not explicitly listed or inherent to the process, method, product or equipment.
Embodiment 1
According to an embodiment of the present invention, an embodiment of an interactive action detection method is provided. It should be noted that the steps illustrated in the flowcharts of the accompanying drawings may be performed in a computer system such as a set of computer-executable instructions, and although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from that given here.
Fig. 1 is a schematic flowchart of an optional interactive action detection method according to an embodiment of the present invention. As shown in Fig. 1, the method comprises the following steps:
Step S102: detect a target picture with a preset multilayer convolutional neural network to obtain the class and bounding-box coordinates of at least one target object present in the target picture;
Step S104: determine the target object with the highest confidence among the at least one target object as the target detection object;
Step S106: input the class and bounding-box coordinates of the target detection object into a preset multistage regression convolutional neural network, then detect the positions of the human joint parts of the target detection object with the preset multistage regression convolutional neural network to obtain the position coordinates of the human joints in the target detection object;
Step S108: normalize the position coordinates, then detect the normalized position coordinates with a preset multilayer recurrent neural network to obtain the detection result of the target picture, where the detection result includes at least the class label of the target picture.
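As an illustration only, the following Python sketch shows how steps S102-S108 might be chained together. The model objects (`detector`, `joint_net`, `rnn`) and the helpers `normalize_joints` and `window_label` (sketched in later sections) are hypothetical placeholders, not part of the disclosure.

```python
def detect_action(frames, detector, joint_net, rnn, window=16):
    """Hypothetical end-to-end sketch of steps S102-S108."""
    joints_seq = []
    for frame in frames:
        # S102: the multilayer CNN returns (class, box, confidence) triples.
        detections = detector(frame)
        if not detections:
            continue
        # S104: keep the detection with the highest confidence.
        cls, box, _ = max(detections, key=lambda d: d[2])
        # S106: the multistage regression CNN estimates 14 joint coordinates.
        joints_seq.append(joint_net(frame, cls, box))
    # S108: normalize the joint sequence, run the multilayer RNN over it,
    # and label the picture from the window-averaged activations.
    activations = rnn(normalize_joints(joints_seq))   # (T, num_classes)
    return window_label(activations, window)
```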
In the embodiments of the present invention, a target picture is detected with a preset multilayer convolutional neural network to obtain the class and bounding-box coordinates of at least one target object present in the target picture, and the target object with the highest confidence among the at least one target object is determined as the target detection object. The class and bounding-box coordinates of the target detection object are input into a preset multistage regression convolutional neural network, which detects the positions of the human joint parts of the target detection object and obtains the position coordinates of the human joints. The position coordinates are normalized, and the normalized position coordinates are then detected with a preset multilayer recurrent neural network to obtain the detection result of the target picture, which includes at least the class label of the target picture. The embodiments of the present invention thereby achieve the technical effect of improving both the accuracy and the efficiency of interactive action detection, and solve the technical problems of limited accuracy and low efficiency in interactive action detection in the prior art.
Optionally, convolutional neural network technology has in recent years shown good performance on a large number of computer-vision problems such as object classification, recognition and detection, and is mainly suited to analysing static spatial patterns in visual signals. Recurrent neural networks have in recent years achieved leading results on machine translation and video classification problems, and are mainly suited to modelling the dynamic characteristics of time series. The embodiment of the present application therefore combines deep convolutional neural networks with recurrent neural networks to describe the spatio-temporal dynamic patterns in visual signals, improving the accuracy of action detection and classification.
Optionally, before step S102 is performed, a class can be labelled for every frame of the target picture so as to build a training sample set. For example, the interactive action classes can be divided into six kinds: raising a hand, waving, swinging an arm, drawing a circle, crossing both hands, and other actions not belonging to these five.
Optionally, in the course of performing step S102, a multilayer convolutional neural network can be constructed to carry out target detection on the target picture and obtain the classes and bounding-box coordinates of multiple targets. The network includes a feature extraction network that extracts spatial features from the picture, a Region Proposal Network that proposes candidate bounding-box locations of possible targets, and a classification-regression network that classifies the candidate regions and regresses their bounding boxes. The spatial feature extraction network can be a Zeiler&Fergus network, a VGG-16/19 network or a residual neural network. The Region Proposal Network consists of three convolutional layers, with kernel sizes of 512 × 3 × 3, 18 × 1 × 1 and 36 × 1 × 1 respectively; the input of the second and third convolutional layers is the output of the first, and their outputs are the bounding-box coordinates and scores of the candidate regions. The classification-regression network consists of a region-of-interest pooling layer, two 4096-dimensional fully connected layers, and parallel classification and regression fully connected layers. The region-of-interest pooling layer maps region features of arbitrary size to a fixed-length vector representation, and the outputs of the classification and regression fully connected layers are, respectively, the per-class scores and the bounding-box offsets for the input boxes. The bounding-box coordinates are fine-tuned according to the box offsets, and after screening by score the detection result of this step is obtained.
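As a minimal sketch only, the candidate-region network described above could be written in PyTorch as follows. The channel counts follow the text; the 3 × 3 padding and the reading of the 18 and 36 output channels as 2 scores and 4 box offsets over 9 anchors are assumptions borrowed from the standard Faster R-CNN design.

```python
import torch
import torch.nn as nn

class RegionProposalHead(nn.Module):
    """Sketch of the three-convolution candidate-region network; the
    second and third layers both consume the output of the first."""
    def __init__(self, in_channels=512):
        super().__init__()
        # 512 x 3 x 3 shared convolution over the feature map.
        self.shared = nn.Conv2d(in_channels, 512, kernel_size=3, padding=1)
        # 18 x 1 x 1: objectness scores (2 per anchor, 9 anchors assumed).
        self.score = nn.Conv2d(512, 18, kernel_size=1)
        # 36 x 1 x 1: box offsets (4 per anchor, 9 anchors assumed).
        self.bbox = nn.Conv2d(512, 36, kernel_size=1)

    def forward(self, features):
        x = torch.relu(self.shared(features))
        return self.score(x), self.bbox(x)

# Example: proposals over a 512-channel feature map from the extractor.
scores, boxes = RegionProposalHead()(torch.randn(1, 512, 38, 50))
```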
Optionally, in the course of performing step S106, a multistage regression convolutional neural network can be constructed to estimate human joint positions and obtain the coordinates of the important joints. The network is formed by stacking multiple identical sub-networks. The input of each sub-network is a Gaussian distribution with the target centre as its mean, the output of a preprocessing network, and the output of the previous sub-network. The preprocessing network consists of four convolutional layers with kernel sizes of 128 × 9 × 9 (pad = 4), 128 × 9 × 9 (pad = 4), 128 × 9 × 9 (pad = 4) and 32 × 5 × 5 (pad = 2); a max-pooling operation of size 9 × 9 with stride 2 is carried out after each of the first three layers. Each sub-network adds three convolutional layers on top of the preprocessing network, with sizes of 512 × 9 × 9 (pad = 4), 512 × 1 × 1 and 512 × 1 × 1 respectively. The higher the number of stages, the better the corrective effect of the joint estimation.
Optionally, in the course of performing step S106, the preset multistage regression convolutional neural network can be stacked from convolutional neural networks of the same structure, with each stage fine-tuning the output in turn, and obtains the two-dimensional coordinates of 14 joints: head, neck, left and right shoulders, left and right elbows, left and right wrists, left and right hips, left and right knees, and left and right ankles.
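The following is a sketch of the preprocessing network and one refinement stage, assuming a PyTorch implementation, an RGB input, and padding for the 9 × 9 pooling; only the layer sizes come from the text, and the readout from the final 512-channel map to the 14 joint coordinates is not specified in the disclosure.

```python
import torch
import torch.nn as nn

class PreprocessNet(nn.Module):
    """Four convolutions (128x9x9 x3, then 32x5x5), with a 9x9 stride-2
    max pooling after each of the first three layers."""
    def __init__(self):
        super().__init__()
        layers = []
        for cin in (3, 128, 128):                       # RGB input assumed
            layers += [nn.Conv2d(cin, 128, 9, padding=4), nn.ReLU(),
                       nn.MaxPool2d(9, stride=2, padding=4)]  # padding assumed
        layers += [nn.Conv2d(128, 32, 5, padding=2), nn.ReLU()]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

class Stage(nn.Module):
    """One sub-network: consumes the preprocessed features, a Gaussian
    map centred on the target, and the previous stage's output, and
    applies the three added convolutions (512x9x9, 512x1x1, 512x1x1)."""
    def __init__(self, in_channels):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 512, 9, padding=4), nn.ReLU(),
            nn.Conv2d(512, 512, 1), nn.ReLU(),
            nn.Conv2d(512, 512, 1),
        )

    def forward(self, feats, gauss, prev):
        return self.net(torch.cat([feats, gauss, prev], dim=1))
```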
Optionally, in the course of performing step S108, the output joint coordinates can be normalized. The head position $h_{i,0}$ of the first frame is taken as the origin. Considering the scale invariance in the vertical direction at the frontal and each side viewing angle, the sum of the head-to-neck distance and the hip-to-knee distances is taken as the scale factor $s_i$; the normalized coordinate is then:

$$\hat{p}_{i,t} = \frac{p_{i,t} - h_{i,0}}{s_i}$$

where $p_{i,t}$ denotes a joint coordinate of sample $i$ at frame $t$.
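A sketch of this normalization in NumPy, assuming a (frames, 14, 2) coordinate array; the joint index assignments are illustrative assumptions, while the origin and scale rule follow the text.

```python
import numpy as np

def normalize_joints(seq, head=0, neck=1, lhip=9, rhip=10, lknee=11, rknee=12):
    """Subtract the first-frame head position and divide by the
    per-sequence scale factor (head-to-neck plus hip-to-knee distances)."""
    seq = np.asarray(seq, dtype=float)
    origin = seq[0, head]                        # first-frame head position
    s = (np.linalg.norm(seq[0, head] - seq[0, neck])
         + np.linalg.norm(seq[0, lhip] - seq[0, lknee])
         + np.linalg.norm(seq[0, rhip] - seq[0, rknee]))
    return (seq - origin) / s
```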
Optionally, before the normalized position coordinates are detected with the preset multilayer recurrent neural network, the method also includes: training the preset multilayer recurrent neural network with a preset loss function and a preset algorithm, where the preset loss function is a classification function and the preset algorithm is a back-propagation-through-time algorithm. Specifically, the loss function can be the softmax classification function.
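For illustration, a per-frame softmax classification loss averaged over all frames could look like this in PyTorch; the batch, frame and class dimensions below are assumptions.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()                     # softmax + negative log-likelihood
logits = torch.randn(32, 120, 6, requires_grad=True)  # (batch, frames, classes), assumed
targets = torch.randint(0, 6, (32, 120))              # per-frame class labels
# Loss for every frame's prediction, averaged as the total loss.
loss = criterion(logits.reshape(-1, 6), targets.reshape(-1))
loss.backward()                                       # gradients for the BPTT update
```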
Optionally, a multilayer recurrent neural network can be built and trained. According to the 3σ (Pauta) criterion, the videos whose lengths lie within three standard deviations of the mean are selected as the training set, and the largest frame length is taken as the number of time steps of the recurrent neural network. Data augmentation is carried out by adding random noise to the joint coordinates, and a grid-search method is used to optimize hyperparameters such as the number of network layers, the number of neurons per layer and the dropout ratio.
Specifically, while the preset multilayer recurrent neural network is trained, videos whose length lies more than three standard deviations from the mean length can be discarded according to the 3σ (Pauta) criterion, and the network can be unrolled with the largest frame length as its number of time steps. Video picture features shorter than the maximum frame length are padded with all-zero values, and their class labels are set to 0, representing membership of no class. The loss function is set to the softmax function; a loss value is computed for the predicted class of each frame, and the average is taken as the total loss, also referred to as the perplexity. The sample set is randomly divided 7:3 by sample count into a training set and a test set, and random noise is added to the joint coordinates to increase the sample count for data augmentation. During training, the network weights are updated with the BPTT algorithm. A grid-search method is used to optimize hyperparameters such as the number of network layers, the number of neurons per layer and the dropout ratio, with the iteration count set to 200. The model under the hyperparameters and iteration count that perform best on the test set is chosen for testing. The evaluation index is the F1 score, computed as:

$$F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$

where precision is the classification precision and recall is the classification recall.
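As a sketch under stated assumptions (the noise scale, per-frame label arrays of shape (T,), and joint arrays of shape (T, 14, 2) are all assumptions), the described data preparation and the F1 evaluation could be written as:

```python
import numpy as np

def f1_score(precision, recall):
    """F1 = 2 * precision * recall / (precision + recall)."""
    return 2.0 * precision * recall / (precision + recall)

def prepare_samples(videos, labels, max_len, noise_std=0.01, seed=0):
    """Zero-pad each joint sequence to max_len frames (padded frames get
    label 0, 'no class'), enlarge the set with noisy copies, and randomly
    split 7:3 into training and test sets."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for seq, lab in zip(videos, labels):
        pad = max_len - len(seq)
        X.append(np.pad(np.asarray(seq, float), ((0, pad), (0, 0), (0, 0))))
        y.append(np.pad(np.asarray(lab), (0, pad)))    # label 0 marks padding
    X, y = np.stack(X), np.stack(y)
    X = np.concatenate([X, X + rng.normal(0.0, noise_std, X.shape)])
    y = np.concatenate([y, y])                         # noisy copies enlarge the set
    idx = rng.permutation(len(X))
    split = int(0.7 * len(X))                          # 7:3 random division
    return (X[idx[:split]], y[idx[:split]]), (X[idx[split:]], y[idx[split:]])
```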
Optionally, Fig. 2 is a schematic flowchart of another optional interactive action detection method according to an embodiment of the present invention. As shown in Fig. 2, performing step S108, that is, detecting the normalized position coordinates with the preset multilayer recurrent neural network to obtain the detection result of the target picture, includes:
Step S202: detect the normalized position coordinates with the preset multilayer recurrent neural network to obtain multiple classes for the target picture and multiple activation values for each of the classes;
Step S204: obtain, within a preset time window, the average of the activation values of each class;
Step S206: determine the class with the largest average as the class label of the target picture, thereby obtaining the detection result.
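Steps S202-S206 as a NumPy sketch; the (frames, classes) activation layout is an assumption.

```python
import numpy as np

def window_label(activations, window):
    """Average each class's activation over the most recent `window`
    frames (S204) and return the class with the largest mean (S206)."""
    recent = np.asarray(activations)[-window:]   # S202: per-frame activations
    return int(recent.mean(axis=0).argmax())
```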
Optionally, Fig. 3 is a schematic flowchart of yet another optional interactive action detection method according to an embodiment of the present invention. As shown in Fig. 3, before step S102 is performed, that is, before the target picture is detected with the preset multilayer convolutional neural network, the method can also include:
Step S302: obtain the human-posture video image captured by a preset camera;
Step S304: take any one frame of the human-posture video image as the target picture.
Specifically, the preset camera can be a USB camera or an IP camera. A human-posture video image generally includes multiple frames of the target image.
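For illustration, steps S302-S304 with OpenCV; device index 0 is an assumption (an IP camera stream URL can be passed instead).

```python
import cv2

cap = cv2.VideoCapture(0)          # S302: open the USB camera (index assumed)
ok, target_picture = cap.read()    # S304: any one frame serves as the target picture
if not ok:
    raise RuntimeError("camera frame could not be read")
cap.release()
```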
Optionally, the present invention extracts the spatial visual features of images with a convolutional neural network and dynamically models the time series with a trained recurrent neural network. At test time, the activation values of each class within a specified window before a given moment are accumulated, and the class corresponding to the maximum value is taken as the result for that moment. The present invention can carry out human action detection and recognition simultaneously, and has good real-time performance and robustness.
Optionally, the human-computer-interaction human action detection and classification method based on a deep spatio-temporal neural network involved in the present invention can extract the spatial visual features of images with convolutional neural networks and can model and predict human dynamics with recurrent neural networks. It extends the range of application of deep-learning methods, improves the utilization of time-scale information, and can also detect the starting moment of human behaviour, expanding the range of application of the technology.
In the embodiments of the present invention, a target picture is detected with a preset multilayer convolutional neural network to obtain the class and bounding-box coordinates of at least one target object present in the target picture, and the target object with the highest confidence among the at least one target object is determined as the target detection object. The class and bounding-box coordinates of the target detection object are input into a preset multistage regression convolutional neural network, which detects the positions of the human joint parts of the target detection object and obtains the position coordinates of the human joints. The position coordinates are normalized, and the normalized position coordinates are then detected with a preset multilayer recurrent neural network to obtain the detection result of the target picture, which includes at least the class label of the target picture. The embodiments of the present invention thereby achieve the technical effect of improving both the accuracy and the efficiency of interactive action detection, and solve the technical problems of limited accuracy and low efficiency in interactive action detection in the prior art.
Embodiment 2
According to another aspect of the embodiments of the present invention, an interactive action detection device is also provided. As shown in Fig. 4, the device includes: a detection unit 401, a first determining unit 403, a first processing unit 405 and a second processing unit 407.
The detection unit 401 is used to detect a target picture with a preset multilayer convolutional neural network and obtain the class and bounding-box coordinates of at least one target object present in the target picture. The first determining unit 403 is used to determine the target object with the highest confidence among the at least one target object as the target detection object. The first processing unit 405 is used to input the class and bounding-box coordinates of the target detection object into a preset multistage regression convolutional neural network and then detect the positions of the human joint parts of the target detection object with the preset multistage regression convolutional neural network, obtaining the position coordinates of the human joints in the target detection object. The second processing unit 407 is used to normalize the position coordinates and then detect the normalized position coordinates with a preset multilayer recurrent neural network to obtain the detection result of the target picture, where the detection result includes at least the class label of the target picture.
Optionally, the device can also include: a training unit for training the preset multilayer recurrent neural network with a preset loss function and a preset algorithm, where the preset loss function is a classification function and the preset algorithm is a back-propagation-through-time algorithm.
Optionally, the second processing unit 407 can include: a detection subunit for detecting the normalized position coordinates with the preset multilayer recurrent neural network to obtain multiple classes for the target picture and multiple activation values for each of the classes; an obtaining subunit for obtaining, within a preset time window, the average of the activation values of each class; and a determining subunit for determining the class with the largest average as the class label of the target picture, thereby obtaining the detection result.
Optionally, the device can also include: an acquiring unit for obtaining a human-posture video image captured by a preset camera, and a second determining unit for taking any one frame of the human-posture video image as the target picture.
According to another aspect of the embodiments of the present invention, a storage medium is also provided. The storage medium includes a stored program which, when run, controls the equipment on which the storage medium resides to perform the above interactive action detection method.
According to another aspect of the embodiments of the present invention, a processor is also provided. The processor is used to run a program which, when run, performs the above interactive action detection method.
In the embodiments of the present invention, a target picture is detected with a preset multilayer convolutional neural network to obtain the class and bounding-box coordinates of at least one target object present in the target picture, and the target object with the highest confidence among the at least one target object is determined as the target detection object. The class and bounding-box coordinates of the target detection object are input into a preset multistage regression convolutional neural network, which detects the positions of the human joint parts of the target detection object and obtains the position coordinates of the human joints. The position coordinates are normalized, and the normalized position coordinates are then detected with a preset multilayer recurrent neural network to obtain the detection result of the target picture, which includes at least the class label of the target picture. The embodiments of the present invention thereby achieve the technical effect of improving both the accuracy and the efficiency of interactive action detection, and solve the technical problems of limited accuracy and low efficiency in interactive action detection in the prior art.
The sequence numbers of the embodiments of the present invention are for description only and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis; for parts not detailed in one embodiment, reference may be made to the relevant description of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed technical content can be realized in other ways. The device embodiments described above are only schematic; for example, the division of the units may be a division by logical function, and other divisions are possible in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. Furthermore, the couplings, direct couplings or communication connections shown or discussed between components may be indirect couplings or communication connections through interfaces, units or modules, and may be electrical or of other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the scheme of this embodiment.
In addition, the functional units in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be realized either in the form of hardware or in the form of a software functional unit.
If the integrated unit is realized in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical scheme of the present invention, or the part of it contributing to the prior art, or all or part of the technical scheme, may in essence be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) to perform all or part of the steps of the method described in each embodiment of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash disk, a read-only memory (ROM), a random access memory (RAM), a mobile hard disk, a magnetic disk or an optical disk.
The above is only the preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art can make a number of improvements and modifications without departing from the principles of the present invention, and these improvements and modifications should also be regarded as falling within the scope of protection of the present invention.
Claims (10)
- 1. An interactive action detection method, characterised by comprising: detecting a target picture with a preset multilayer convolutional neural network to obtain the class and bounding-box coordinates of at least one target object present in the target picture; determining the target object with the highest confidence among the at least one target object as the target detection object; inputting the class and bounding-box coordinates of the target detection object into a preset multistage regression convolutional neural network, then detecting the positions of the human joint parts of the target detection object with the preset multistage regression convolutional neural network to obtain the position coordinates of the human joints in the target detection object; and normalizing the position coordinates, then detecting the normalized position coordinates with a preset multilayer recurrent neural network to obtain the detection result of the target picture, wherein the detection result includes at least the class label of the target picture.
- 2. The method according to claim 1, characterised in that before the normalized position coordinates are detected with the preset multilayer recurrent neural network, the method further comprises: training the preset multilayer recurrent neural network with a preset loss function and a preset algorithm, wherein the preset loss function is a classification function and the preset algorithm is a back-propagation-through-time algorithm.
- 3. The method according to claim 1, characterised in that detecting the normalized position coordinates with the preset multilayer recurrent neural network to obtain the detection result of the target picture comprises: detecting the normalized position coordinates with the preset multilayer recurrent neural network to obtain multiple classes for the target picture and multiple activation values for each of the classes; obtaining, within a preset time window, the average of the activation values of each class; and determining the class with the largest average as the class label of the target picture, thereby obtaining the detection result.
- 4. The method according to claim 1, characterised in that before the target picture is detected with the preset multilayer convolutional neural network, the method further comprises: obtaining a human-posture video image captured by a preset camera; and taking any one frame of the human-posture video image as the target picture.
- 5. An interactive action detection device, characterised by comprising: a detection unit for detecting a target picture with a preset multilayer convolutional neural network to obtain the class and bounding-box coordinates of at least one target object present in the target picture; a first determining unit for determining the target object with the highest confidence among the at least one target object as the target detection object; a first processing unit for inputting the class and bounding-box coordinates of the target detection object into a preset multistage regression convolutional neural network and then detecting the positions of the human joint parts of the target detection object with the preset multistage regression convolutional neural network to obtain the position coordinates of the human joints in the target detection object; and a second processing unit for normalizing the position coordinates and then detecting the normalized position coordinates with a preset multilayer recurrent neural network to obtain the detection result of the target picture, wherein the detection result includes at least the class label of the target picture.
- 6. The device according to claim 5, characterised in that the device further comprises: a training unit for training the preset multilayer recurrent neural network with a preset loss function and a preset algorithm, wherein the preset loss function is a classification function and the preset algorithm is a back-propagation-through-time algorithm.
- 7. The device according to claim 5, characterised in that the second processing unit comprises: a detection subunit for detecting the normalized position coordinates with the preset multilayer recurrent neural network to obtain multiple classes for the target picture and multiple activation values for each of the classes; an obtaining subunit for obtaining, within a preset time window, the average of the activation values of each class; and a determining subunit for determining the class with the largest average as the class label of the target picture, thereby obtaining the detection result.
- 8. The device according to claim 5, characterised in that the device further comprises: an acquiring unit for obtaining a human-posture video image captured by a preset camera; and a second determining unit for taking any one frame of the human-posture video image as the target picture.
- 9. A storage medium, characterised in that the storage medium includes a stored program which, when run, controls the equipment on which the storage medium resides to perform the interactive action detection method of any one of claims 1 to 4.
- 10. A processor, characterised in that the processor is used to run a program which, when run, performs the interactive action detection method of any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710670075.4A CN107423721A (en) | 2017-08-08 | 2017-08-08 | Interactive action detection method, device, storage medium and processor |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107423721A true CN107423721A (en) | 2017-12-01 |
Family
ID=60437481
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710670075.4A Pending CN107423721A (en) | 2017-08-08 | 2017-08-08 | Interactive action detection method, device, storage medium and processor |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107423721A (en) |
2017-08-08: CN application CN201710670075.4A filed; published as CN107423721A (en), status pending
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105389549A (en) * | 2015-10-28 | 2016-03-09 | 北京旷视科技有限公司 | Object recognition method and device based on human body action characteristic |
CN105678297A (en) * | 2015-12-29 | 2016-06-15 | 南京大学 | Portrait semantic analysis method and system based on label transfer and LSTM model |
CN105976400A (en) * | 2016-05-10 | 2016-09-28 | 北京旷视科技有限公司 | Object tracking method and device based on neural network model |
CN106845351A (en) * | 2016-05-13 | 2017-06-13 | 苏州大学 | It is a kind of for Activity recognition method of the video based on two-way length mnemon in short-term |
CN106570480A (en) * | 2016-11-07 | 2017-04-19 | 南京邮电大学 | Posture-recognition-based method for human movement classification |
CN106682697A (en) * | 2016-12-29 | 2017-05-17 | 华中科技大学 | End-to-end object detection method based on convolutional neural network |
CN106845374A (en) * | 2017-01-06 | 2017-06-13 | 清华大学 | Pedestrian detection method and detection means based on deep learning |
CN106897670A (en) * | 2017-01-19 | 2017-06-27 | 南京邮电大学 | A kind of express delivery violence sorting recognition methods based on computer vision |
Non-Patent Citations (1)
Title |
---|
Yong Du et al., "Representation Learning of Temporal Dynamics for Skeleton-Based Action Recognition", IEEE Transactions on Image Processing * |
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110046537A (en) * | 2017-12-08 | 2019-07-23 | 辉达公司 | The system and method for carrying out dynamic face analysis using recurrent neural network |
CN110046537B (en) * | 2017-12-08 | 2023-12-29 | 辉达公司 | System and method for dynamic facial analysis using recurrent neural networks |
CN108629306A (en) * | 2018-04-28 | 2018-10-09 | 北京京东金融科技控股有限公司 | Human posture recognition method and device, electronic equipment, storage medium |
CN108629306B (en) * | 2018-04-28 | 2020-05-15 | 京东数字科技控股有限公司 | Human body posture recognition method and device, electronic equipment and storage medium |
CN108830185B (en) * | 2018-05-28 | 2020-11-10 | 四川瞳知科技有限公司 | Behavior identification and positioning method based on multi-task joint learning |
CN108830185A (en) * | 2018-05-28 | 2018-11-16 | 四川瞳知科技有限公司 | Activity recognition and localization method based on multitask combination learning |
CN108848389B (en) * | 2018-07-27 | 2021-03-30 | 恒信东方文化股份有限公司 | Panoramic video processing method and playing system |
CN109922354A (en) * | 2019-03-29 | 2019-06-21 | 广州虎牙信息科技有限公司 | Living broadcast interactive method, apparatus, live broadcast system and electronic equipment |
CN109936774A (en) * | 2019-03-29 | 2019-06-25 | 广州虎牙信息科技有限公司 | Virtual image control method, device and electronic equipment |
CN109922354B (en) * | 2019-03-29 | 2020-07-03 | 广州虎牙信息科技有限公司 | Live broadcast interaction method and device, live broadcast system and electronic equipment |
CN109922354B9 (en) * | 2019-03-29 | 2020-08-21 | 广州虎牙信息科技有限公司 | Live broadcast interaction method and device, live broadcast system and electronic equipment |
WO2020200082A1 (en) * | 2019-03-29 | 2020-10-08 | 广州虎牙信息科技有限公司 | Live broadcast interaction method and apparatus, live broadcast system and electronic device |
CN110135246A (en) * | 2019-04-03 | 2019-08-16 | 平安科技(深圳)有限公司 | A kind of recognition methods and equipment of human action |
CN110135246B (en) * | 2019-04-03 | 2023-10-20 | 平安科技(深圳)有限公司 | Human body action recognition method and device |
CN111898622A (en) * | 2019-05-05 | 2020-11-06 | 阿里巴巴集团控股有限公司 | Information processing method, information display method, model training method, information display system, model training system and equipment |
CN111898622B (en) * | 2019-05-05 | 2022-07-15 | 阿里巴巴集团控股有限公司 | Information processing method, information display method, model training method, information display system, model training system and equipment |
CN110987189A (en) * | 2019-11-21 | 2020-04-10 | 北京都是科技有限公司 | Method, system and device for detecting temperature of target object |
CN111638791A (en) * | 2020-06-03 | 2020-09-08 | 北京字节跳动网络技术有限公司 | Virtual character generation method and device, electronic equipment and storage medium |
CN111638791B (en) * | 2020-06-03 | 2021-11-09 | 北京火山引擎科技有限公司 | Virtual character generation method and device, electronic equipment and storage medium |
CN112102235A (en) * | 2020-08-07 | 2020-12-18 | 上海联影智能医疗科技有限公司 | Human body part recognition method, computer device, and storage medium |
CN112102235B (en) * | 2020-08-07 | 2023-10-27 | 上海联影智能医疗科技有限公司 | Human body part recognition method, computer device, and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107423721A (en) | Interactive action detection method, device, storage medium and processor | |
CN110135375B (en) | Multi-person attitude estimation method based on global information integration | |
CN111291739B (en) | Face detection and image detection neural network training method, device and equipment | |
CN108491880B (en) | Object classification and pose estimation method based on neural network | |
CN108960337B (en) | Multi-modal complex activity recognition method based on deep learning model | |
CN106570477B (en) | Vehicle cab recognition model building method and model recognizing method based on deep learning | |
CN105160400B (en) | The method of lifting convolutional neural networks generalization ability based on L21 norms | |
CN108090408A (en) | For performing the method and apparatus of Facial expression recognition and training | |
CN109559300A (en) | Image processing method, electronic equipment and computer readable storage medium | |
CN110555481A (en) | Portrait style identification method and device and computer readable storage medium | |
CN107516127B (en) | Method and system for service robot to autonomously acquire attribution semantics of human-worn carried articles | |
CN108460356A (en) | A kind of facial image automated processing system based on monitoring system | |
DE112019005671T5 (en) | DETERMINING ASSOCIATIONS BETWEEN OBJECTS AND PERSONS USING MACHINE LEARNING MODELS | |
CN111274916A (en) | Face recognition method and face recognition device | |
CN110348572A (en) | The processing method and processing device of neural network model, electronic equipment, storage medium | |
CN109948526A (en) | Image processing method and device, detection device and storage medium | |
CN109410168A (en) | For determining the modeling method of the convolutional neural networks model of the classification of the subgraph block in image | |
CN107330750A (en) | A kind of recommended products figure method and device, electronic equipment | |
CN109919085B (en) | Human-human interaction behavior identification method based on light-weight convolutional neural network | |
CN107169954A (en) | A kind of image significance detection method based on parallel-convolution neutral net | |
CN113065576A (en) | Feature extraction method and device | |
CN112419326B (en) | Image segmentation data processing method, device, equipment and storage medium | |
CN106909938A (en) | Viewing angle independence Activity recognition method based on deep learning network | |
CN106971145A (en) | A kind of various visual angles action identification method and device based on extreme learning machine | |
CN108875456A (en) | Object detection method, object detecting device and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20171201 |