CN114582030A - Behavior recognition method based on service robot - Google Patents
Behavior recognition method based on service robot
- Publication number: CN114582030A
- Application number: CN202210484610.8A
- Authority: CN (China)
- Legal status: Granted
Classifications
- G06F18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06N3/045: Combinations of networks
- G06N3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
- G06N3/08: Learning methods
- Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The application relates to a behavior recognition method based on a service robot, comprising the following specific steps: extracting human joint point sequences for 13 common behavior categories in the service robot application scene to form a training data set; preprocessing the training data set; weighting and optimizing the joint point data in combination with the actual application scene to output 17 main joint points; constructing a lightweight multi-scale aggregation space-time graph convolution deep learning neural network model from multi-scale space-time graph convolution and temporal convolution modules; training and testing the model on the data set; recognizing human behaviors in video images of the real scene to be recognized with the trained model; and having the service robot receive the human behavior recognition result and make a corresponding response. The invention can accurately recognize human behaviors in the scene and ensures the service quality of the service robot.
Description
Technical Field
The application relates to the technical field of human behavior recognition, in particular to a behavior recognition method based on a service robot.
Background
With the development of science and technology and deepening research into artificial intelligence, the application field of robotics is no longer limited to industrial robots but is spreading toward civilian and household use, and service robots are gradually entering people's daily lives. In recent years, service robots have become increasingly intelligent and rich in function, and are widely applied in cleaning, medical treatment, rescue, logistics, maintenance, security and other areas. Developing the service robot industry can relieve the pressure of caring for the elderly and disabled, improve quality of life, and promote the rapid development of civilian technology; it is a strategic measure for bringing the benefits of advanced scientific achievements to the public, so countries around the world attach great importance to this industry and invest substantial resources in its research and development. Although the core technologies of service robots are relatively mature, the complex external environment remains a great challenge for research on positioning and navigation, human-computer interaction, computer vision, reasoning tasks and the like. By algorithmically analyzing the video images captured by the service robot, the behavior of people in the scene can be judged and a response made. To recognize human behavior in video, information highly relevant to the target person's behavior must first be extracted from the video; key information is then obtained through algorithmic processing, and finally the key information is used to recognize the human behavior.
With the miniaturization, integration and intelligence of cameras and their flexible interfaces, a service robot can capture indoor scenes in real time through an onboard camera. Traditional feature extraction methods obtain high-dimensional visual features through spatio-temporal keypoint sampling, dense trajectory sampling, body part sampling and similar techniques, and predict behavior with classifiers such as SVM (Support Vector Machine) and RF (Random Forest). Deep learning methods instead learn features automatically and perform end-to-end feature extraction and recognition; in particular, applying graph convolution networks to the human skeleton avoids, as far as possible, the influence of complex background, shape, RGB color and other information on recognition accuracy. A keypoint recognition algorithm (such as OpenPose or MediaPipe) is applied to the captured video frames to obtain human keypoint sequence information, which is fed into the constructed multi-scale aggregation space-time graph convolution network model to obtain the behavior of the corresponding person, so that the service robot can respond accordingly (for example, when a person waves, the robot recognizes the motion and approaches the person).
In existing schemes, human skeleton behavior recognition methods based on graph convolution mostly treat the skeleton sequence as a series of disjoint graphs, extracting features with a graph convolution (GCN) module in the spatial dimension and a temporal convolution (TCN) module in the time dimension. In the complex working environment of a service robot, the recognition efficiency of behavior recognition models built from ordinary graph convolutions is not high, and wrong recognition causes wrong interaction by the robot, affecting the robot's service quality and the service object's experience. A lightweight human behavior recognition model is therefore urgently needed for service robots.
Disclosure of Invention
The embodiments of the application aim to provide a behavior recognition method based on a service robot, designing a lightweight graph-convolution human behavior recognition model that spans space-time relations, so that the overall recognition effect is ensured, false recognition of similar actions is reduced, and the quality of the service robot's remote visual interaction is improved.
In order to achieve the above purpose, the present application provides the following technical solutions:
the embodiment of the application provides a behavior identification method based on a service robot, which comprises the following specific steps:
s1, extracting human body joint point sequences of 13 behavior categories commonly used in the service robot application scene to form a training data set;
s2, preprocessing the training data set, firstly extracting key frames of the joint point sequence, and then optimizing the joint point data by combining with the actual application scene;
s3, for a video shot in a real scene, first performing keypoint estimation with the body_25 human pose estimation model in OpenPose to obtain 25 keypoint coordinates and confidence values, then filling missing keypoint values in the obtained data with a K-nearest-neighbour method, and finally weighting and optimizing the joint point data in combination with the actual application scene to output 17 main joint points;
s4, constructing a lightweight multi-scale aggregation space-time graph convolution deep learning neural network model from multi-scale space-time graph convolution and temporal convolution modules;
s5, training and testing the data set by using the constructed network model;
s6, identifying human body behaviors in the video image under the real scene to be identified by using the trained model;
and S7, the service robot receives the human behavior recognition result and responds correspondingly.
In step S1, the training data set is derived from the NTU RGB+D human behavior data set, from which 13 behavior categories are selected: drinking, picking up, throwing away, sitting down, standing up, jumping, shaking the head, falling, chest pain, waving, kicking, hugging and walking, 12324 skeleton files in total.
The step S2 of extracting key frames from the skeleton sequence includes:
On the premise that one frame is extracted every 30 frames from each video segment corresponding to the different behavior categories in the service robot application scene, 300 frames of data are retained as a training sample; videos shorter than 300 frames are re-extracted cyclically from the beginning. The number of persons in the joint data is then judged, and only joint data containing a single person is retained for training and validating the model.
The step S3 specifically includes:
s31, detecting person keypoints in video images of the real scene with the OpenPose human keypoint detection algorithm, obtaining the horizontal and vertical coordinate values (x, y) of the 25 skeletal joint points from the body_25 human joint labeling model, splicing the discrete joint points according to the physical connections of the human joints to form a human skeleton spatial topology model, and then splicing the spatial topology graphs of successive frames in time order to finally obtain the space-time graph of human skeleton structure change;
s32, for missed detection of a whole frame, defining joint points 0, 1 and 8 as main keypoints: if any of these three groups of data is missing from the joint data output for a video frame, the whole frame is judged to be a missed detection and the joint data of that video frame is deleted; for a frame missing only some keypoints, a 2-order K-nearest-neighbour fill is used, which requires no training or parameter estimation: the missing point is supplemented directly with the mean of the horizontal and vertical coordinate values (x, y) of the preceding and following frames.
The step S4 specifically includes:
s41, graph convolution calculation process: after the joint point coordinates are obtained, the joint points are taken as vertices and the natural connections between joints as bone edges, so the human skeleton is represented as a graph $G=(V,E)$. The $T$ frame skeleton graphs are arranged in time order and same-position joint points are connected, forming a space-time skeleton graph. The node set $V=\{v_{ti} \mid t=1,\dots,T;\ i=1,\dots,N\}$ is the set of all joint points in each skeleton graph, where $N$ is the number of joints per frame. The edge set $E$ is represented by two subsets: the first, $E_S=\{v_{ti}v_{tj} \mid (i,j)\in H\}$, represents the intra-frame skeleton connections of each frame, where $H$ is the set of naturally connected human joints; the second, $E_F=\{v_{ti}v_{(t+1)i}\}$, represents the connecting edges between same-position joint points in adjacent frames, with $i$ the serial number of the joint point. From the node set $V$ and the edge set $E$ an adjacency matrix $A$ is obtained, and the graph convolution is calculated as follows:
$$f_{out}=\sum_{k=1}^{K_v} W_k\,(f_{in} A_k)$$
where $f_{in}$ is the input, $f_{out}$ is the output, $A_k$ is the adjacency matrix, $W_k$ is a learnable weight, and $K_v$ is the spatial kernel size;
s42, adaptive graph convolution calculation process: as shown in the following formula, on the basis of the adjacency matrix $A_k$, two matrices $B_k$ and $C_k$ are added, where $B_k$ is a trainable weight matrix and $C_k$ learns a unique graph for each sample:
$$f_{out}=\sum_{k=1}^{K_v} W_k\,f_{in}\,(A_k+B_k+C_k)$$
s43, multi-scale space-time graph convolution calculation process: to better connect the spatial and temporal skeleton information, the $k$-hop adjacency matrices of the nodes are tiled into one large block matrix in which each node is directly connected to its corresponding neighbours on all frames, realizing hop connections between nodes; the calculation process follows the graph convolution formula above with $A_k$ replaced by the tiled multi-hop adjacency matrices;
s44, MS-GCN multi-scale space-time graph convolution module: the 1-hop through $K$-hop adjacency matrices of the input node information are extracted separately and the resulting $K$ matrices are finally spliced together, where $i$ is the serial number of the joint point, $v_i$ is the joint point coordinate, and $d(v_i,v_j)$ denotes the shortest hop distance between nodes; the $k$-hop adjacency matrix satisfies $[A_{(k)}]_{ij}=1$ if $d(v_i,v_j)=k$ and $0$ otherwise;
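The multi-hop extraction of step S44 can be sketched as follows. This is illustrative code, not part of the original disclosure; the function and variable names are assumptions. It builds the matrices $[A_{(k)}]_{ij}=1$ exactly when the shortest hop distance between joints $i$ and $j$ equals $k$, for a toy 4-joint chain skeleton:

```python
# Sketch (not the patent's code): building the k-hop adjacency matrices
# that a multi-scale graph convolution module splices together.
import numpy as np

def k_hop_adjacency(A, max_k):
    """Return [A_1 ... A_max_k], where A_k links joints exactly k hops apart."""
    n = A.shape[0]
    dist = np.full((n, n), np.inf)        # shortest hop distance d(v_i, v_j)
    np.fill_diagonal(dist, 0)
    reach = np.eye(n, dtype=bool)         # joints reached in <= k hops so far
    frontier = A.astype(bool)             # joints reachable in exactly k walks
    for k in range(1, max_k + 1):
        newly = frontier & ~reach         # first reached at hop distance k
        dist[newly] = k
        reach |= newly
        frontier = (frontier.astype(int) @ A.astype(int)) > 0
    return [np.where(dist == k, 1.0, 0.0) for k in range(1, max_k + 1)]

# toy 4-joint chain skeleton: 0-1-2-3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
hops = k_hop_adjacency(A, 2)
```

In an MS-GCN module the matrices in `hops` would then be concatenated so one layer aggregates neighbours at several graph scales at once.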
s45, MS-TCN temporal dilation convolution module: a pointwise convolution adjusts the channel number of the input information and a temporal convolution kernel processes the integrated information; the convolved features are processed in a dilated (atrous) convolution manner, the extracted features are spliced together, and finally a pointwise convolution with stride 2 outputs the processed features;
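The core operation of the temporal branch above, a dilated convolution along one joint's coordinate sequence, can be sketched as follows (illustrative code, not part of the disclosure; the 3-tap kernel and names are assumptions):

```python
# Sketch: a dilated ("atrous") 1-D temporal convolution. Spacing the taps
# `dilation` frames apart widens the receptive field without extra weights.
def dilated_conv1d(x, kernel, dilation):
    """'Valid' dilated 1-D convolution over a frame sequence x."""
    span = (len(kernel) - 1) * dilation
    return [sum(k * x[t + j * dilation] for j, k in enumerate(kernel))
            for t in range(len(x) - span)]

x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]   # e.g. one joint's x-coordinate per frame
taps = [1.0, 1.0, 1.0]               # toy 3-tap summing kernel
narrow = dilated_conv1d(x, taps, dilation=1)  # looks at 3 consecutive frames
wide   = dilated_conv1d(x, taps, dilation=2)  # same taps, 5-frame receptive field
```

Stacking such layers with growing dilation is what lets the MS-TCN branch attend to long-range temporal context cheaply.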
s46, lightweight multi-scale space-time graph convolutional network MS-SGTCN_S: to increase the robustness of the extracted features, two network branches perform inference on the input joint point data. The first branch consists of a convolution module, MS-GCN modules and a fully connected layer, with 4 MS-GCN modules in the middle extracting multi-scale spatio-temporal features, realized with different temporal and spatial sliding windows. The second branch consists of one MS-GCN module and two MS-TCN modules, using a long-range temporal module to strengthen the network's attention to contextual change of the joint points in the time dimension. The feature information obtained by the two branches is then sent together to an MS-TCN module, the features are spliced through a fully connected layer, and the category with the maximum probability after softmax classification is the predicted human behavior. To further improve the accuracy of the algorithm, a dual-stream network is designed to train on the joint point and skeleton sequences separately; confidence statistics are then taken over the predictions of the joint and skeleton streams, and the behavior with the higher confidence is the final output prediction.
In step S46, a dual-stream network is designed to train on the joint point and skeleton sequences separately; confidence statistics are then taken over the predictions of the joint and skeleton streams, and the human behavior with the higher confidence is the final output prediction value.
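The dual-stream confidence fusion described above can be sketched as follows (illustrative code, not part of the disclosure; names and scores are assumptions):

```python
# Sketch: keep the behavior label whose best stream confidence is highest,
# comparing the joint-stream and skeleton-stream softmax outputs.
def fuse_two_streams(joint_scores, bone_scores):
    """Each input maps behavior label -> softmax confidence."""
    best_joint = max(joint_scores, key=joint_scores.get)
    best_bone = max(bone_scores, key=bone_scores.get)
    if joint_scores[best_joint] >= bone_scores[best_bone]:
        return best_joint
    return best_bone

pred = fuse_two_streams({"waving": 0.71, "drinking": 0.29},
                        {"waving": 0.40, "standing up": 0.60})
```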
In step S7, to reduce the influence of the complex external environment on the working quality of the service robot, the robot is designed to respond to a behavior only after continuously receiving the corresponding behavior signal for more than 2 seconds; for dangerous behaviors, the service robot sends alarm information to remind a worker to handle the situation.
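The 2-second persistence rule of step S7 amounts to debouncing the per-frame recognition results; a minimal sketch (hypothetical class and names, not part of the disclosure):

```python
# Sketch: the robot acts only once the same behavior label has been
# received continuously for longer than the hold time (2 s in the patent).
class BehaviorDebouncer:
    def __init__(self, hold_seconds=2.0):
        self.hold = hold_seconds
        self.label = None    # label currently being observed
        self.since = None    # timestamp when that label first appeared

    def update(self, label, t):
        """Feed one recognition result at time t; return the label once it
        has persisted longer than the hold time, else None."""
        if label != self.label:
            self.label, self.since = label, t
            return None
        if t - self.since > self.hold:
            return label
        return None

d = BehaviorDebouncer()
out = [d.update(l, t) for t, l in [(0.0, "fall"), (1.0, "fall"),
                                   (2.0, "fall"), (2.5, "fall")]]
```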
Compared with the prior art, the invention has the beneficial effects that: the invention can accurately identify the human body behaviors in the scene, and ensures the service quality of the service robot.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
FIG. 1 is a schematic flow chart of a method according to an embodiment of the present invention;
FIG. 2 is a visualization diagram of a training data skeleton according to an embodiment of the present invention;
FIG. 3 is a body skeleton spatial topology model according to an embodiment of the present invention;
FIG. 4 is a time-space diagram illustrating the change of the skeleton structure of a human body according to an embodiment of the present invention;
FIG. 5 is a block diagram illustrating a MS-GCN multi-scale space-time graph convolution module according to an embodiment of the present invention;
FIG. 6 is a block diagram of an MS-TCN time-dilation convolution module in accordance with an embodiment of the present invention;
FIG. 7 is a multi-scale space-time graph convolution network according to an embodiment of the present invention;
FIG. 8 shows a test set RGB video image test result 1 according to an embodiment of the present invention;
FIG. 9 shows a test set RGB video image test result 2 according to an embodiment of the present invention;
fig. 10 is a result 1 of human behavior recognition in a real scene according to an embodiment of the present invention;
fig. 11 is a result 2 of human behavior recognition in a real scene according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
As shown in fig. 1, an embodiment of the present application provides a behavior recognition method based on a service robot, including the following specific steps:
s1, extracting human body joint point sequences of 13 behavior categories commonly used in the service robot application scene to form a training data set;
s2, preprocessing the training data set, firstly extracting key frames of the joint point sequence, and then optimizing the joint point data by combining with an actual application scene;
s3, for a video shot in a real scene, first performing keypoint estimation with the body_25 human pose estimation model in OpenPose to obtain 25 keypoint coordinates and confidence values, then filling missing keypoint values in the obtained data with a K-nearest-neighbour method, and finally weighting and optimizing the joint point data in combination with the actual application scene to output 17 main joint points;
s4, constructing a lightweight multi-scale aggregation space-time graph convolution deep learning neural network model from multi-scale space-time graph convolution and temporal convolution modules;
s5, training and testing the data set by using the constructed network model;
s6, identifying human body behaviors in the video image under the real scene to be identified by using the trained model;
and S7, the service robot receives the human behavior recognition result and responds correspondingly.
In step S1, the training data is derived from the NTU RGB+D human behavior data set produced by Nanyang Technological University, Singapore, from which 13 daily and medical behaviors are selected: drinking, picking up, throwing away, sitting down, standing up, jumping, shaking the head, falling, chest pain, waving, kicking, hugging and walking, 12324 skeleton files in total.
In step S2, because the video durations corresponding to the different action categories differ, the original data is processed by interval sampling and cyclic repetition from the starting frame: on the premise that one frame is extracted every 30 frames from each video segment, 200 frames of data are retained as a training sample, and videos shorter than 200 frames are re-extracted cyclically from the beginning. The algorithm judges the number of persons in the joint point data and retains only joint data containing a single person for training and validating the model; specifically, the joint points are counted, and if the total number of joint points exceeds 25, it is judged that an interfering person appears in the data, and that joint point data is deleted.
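The interval-sampling and cyclic-repeat preprocessing above can be sketched as follows (illustrative helper, not part of the disclosure; the small `target` in the example is only for demonstration):

```python
# Sketch: keep one frame every `step` frames, then loop from the start
# until `target` frames are retained (assumes a non-empty clip).
def sample_frames(frames, step=30, target=200):
    kept = frames[::step]
    while len(kept) < target:              # shorter clips: repeat from the beginning
        kept += kept[:target - len(kept)]
    return kept[:target]

clip = list(range(90))                     # a 90-frame toy clip (frame indices)
out = sample_frames(clip, step=30, target=5)
```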
To further improve computational efficiency and be compatible with the keypoint data of the body_25 human pose estimation model in OpenPose, the 25 joint points in the original data set are subjected to weighted optimization to remove joint points that have little influence on recognizing the behavior of the service robot's service object, and the joint point data are recoded.
The node set of the training data set is represented by the following formula:
$$V=\{v_{ti} \mid t=1,\dots,T;\ i=1,\dots,25\}$$
where $V$ is the training data set node set and $v_{ti}$ is the coordinate value of joint point $i$ at time $t$; after the frame-extraction processing of the data set, $T$ is set to 200, and $i$ is the serial number of the joint point, 25 joint points in total.
The set of 17 joint points after weighted optimization is represented by the following formula:
$$V'=\{v'_{ti} \mid t=1,\dots,T;\ i=1,\dots,17\}$$
where $V'$ is the joint point set after weighted optimization and $v'_{ti}$ is the coordinate value of joint point $i$ at time $t$ after weighted optimization; as in the formula above, $T$ is set to a maximum of 200, and $i$ is the serial number of the joint point, 17 joint points in total.
The joint points numbered 1, 3, 4, 6, 7, 9, 10, 11, 12, 13, 14, 15, 16 and 17 in $V'$ correspond respectively to the joint points numbered 1, 5, 6, 9, 10, 13, 14, 15, 16, 17, 18, 19, 20 and 21 in $V$.
The head joint points 3 and 4 in $V$ are integrated into joint point 2 of $V'$ by weighted optimization calculation; the left-hand joint points 7, 8, 22 and 23 in $V$ are integrated into joint point 5 by weighted optimization calculation; the right-hand joint points 11, 12, 24 and 25 in $V$ are integrated into joint point 8 by weighted optimization calculation. The weighted calculation formula is as follows:
$$v'=\sum_j \alpha_j\, v_j$$
where $v_j$ is from the joint point set $V$, $v'$ is from the joint point set $V'$, and $\alpha_j$ are the joint weighting optimization coefficients.
After recoding, the 17 joint points are finally output as the training set data; the resulting human skeleton topology structure is shown in fig. 2.
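The weighted merging of a joint group into a single joint can be sketched as follows. The equal weights below are an illustrative assumption: the patent defines weighting coefficients $\alpha_j$ but does not give their numerical values, and the helper names are not from the original.

```python
# Sketch: merge a group of joints into one weighted joint,
# e.g. head joints 3 and 4 -> new joint 2 (equal alphas assumed).
def merge_joints(points, group, alphas):
    """points: {joint_id: (x, y)}; returns the weighted (x, y) of the group."""
    assert abs(sum(alphas) - 1.0) < 1e-9   # coefficients should sum to 1
    x = sum(a * points[j][0] for j, a in zip(group, alphas))
    y = sum(a * points[j][1] for j, a in zip(group, alphas))
    return (x, y)

frame = {3: (0.0, 2.0), 4: (2.0, 4.0)}     # head joints of one frame
head = merge_joints(frame, group=[3, 4], alphas=[0.5, 0.5])
```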
The specific flow of step S3 is:
S31, detecting person keypoints in video images of the real scene with the OpenPose human keypoint detection algorithm, and obtaining the horizontal and vertical coordinate values (x, y) and confidence S of the 25 skeletal joint points from the body_25 human joint labeling model. The discrete joint points are spliced together according to the physical connections of the human joints to form a human skeleton spatial topology model, as in fig. 2; the spatial topology graphs of successive frames are then spliced together in time order to finally obtain the space-time graph of human skeleton structure change, as in fig. 3.
S32, owing to external factors such as lighting, occlusion and changes in person behavior, missed detections are difficult to avoid when estimating keypoints with the OpenPose human pose estimation algorithm. There are two cases: a whole frame is missed, or some keypoints within a frame are missed. For the first case, joint points 0, 1 and 8 are defined as main keypoints; if any of these three groups of data is missing in a frame, the whole frame is judged to be a missed detection and the frame's data is deleted. For the second case, a 2-order K-nearest-neighbour fill is used, taking the mean of the data in the frames before and after the point; this ensures reasonable filling at a small computational cost, since complete joint point data is closely tied to the accuracy of human behavior recognition.
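The frame-deletion and 2-order nearest-neighbour fill described in S32 can be sketched as follows (illustrative data structure and names, not part of the disclosure):

```python
# Sketch: drop frames missing any main keypoint (0, 1, 8), then fill a
# missing keypoint with the mean of the same joint in the adjacent frames.
MAIN = (0, 1, 8)

def clean_sequence(seq):
    """seq: list of {joint_id: (x, y) or None}; returns the cleaned list."""
    seq = [f for f in seq if all(f.get(j) is not None for j in MAIN)]
    for t in range(1, len(seq) - 1):
        for j, p in seq[t].items():
            if p is None and seq[t - 1].get(j) and seq[t + 1].get(j):
                (xa, ya), (xb, yb) = seq[t - 1][j], seq[t + 1][j]
                seq[t][j] = ((xa + xb) / 2, (ya + yb) / 2)
    return seq

frames = [{0: (0, 0), 1: (0, 1), 8: (0, 2), 4: (1.0, 1.0)},
          {0: (0, 0), 1: (0, 1), 8: (0, 2), 4: None},       # joint 4 missed
          {0: (0, 0), 1: (0, 1), 8: (0, 2), 4: (3.0, 5.0)}]
fixed = clean_sequence(frames)
```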
S33, as in the joint processing of step S2, the 25 joint points of fig. 3 are recoded by the weighted optimization algorithm and 17 joint points are output.
The set of 25 joint points output by the body_25 model of OpenPose human pose estimation is represented by the following formula:
$$V=\{v_{ti} \mid t\in[0,T];\ i=1,\dots,25\}$$
where $V$ is the node set and $v_{ti}$ is the joint point coordinate value taken at time $t$; since these data are used for testing, $T$ is the duration of the entire video segment, and $i$ is the serial number of the joint point, 25 joint points in total.
The joint points with serial numbers 1, 3, 4, 5, 6, 7, 8, 9, 10, 13, 14 and 17 in the output set correspond respectively to joint points 8, 5, 6, 7, 2, 3, 4, 12, 13, 9, 10 and 1 in the body_25 set.
The head joint points 0, 15, 16, 17 and 18 in the body_25 set are integrated into joint point 2 through weighted optimization calculation; the left foot joint points 14 and 21 are integrated into joint point 11, and joint points 19 and 20 are integrated into joint point 12; the right foot joint points 11 and 24 are integrated into joint point 15, and joint points 22 and 23 are integrated into joint point 16. The weighted calculation formula is as follows:

v'_j = Σ_k w_k v_k

wherein v'_j is the integrated joint point, v_k is taken from the corresponding set of body_25 joint points, and w_k is the weighted optimization coefficient of the joint.
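The joint re-encoding described above can be sketched as follows; the direct correspondences follow the mapping stated in the text, while the uniform merge weights are an assumption, since the text does not specify the weighted optimization coefficients.

```python
import numpy as np

# Direct correspondences (17-point index -> body_25 index), as read from the text.
DIRECT = {1: 8, 3: 5, 4: 6, 5: 7, 6: 2, 7: 3, 8: 4,
          9: 12, 10: 13, 13: 9, 14: 10, 17: 1}

# Weighted merges from the text; uniform weights are an assumption here.
MERGE = {2: [0, 15, 16, 17, 18],   # head joints -> joint 2
         11: [14, 21],             # left ankle/heel -> joint 11
         12: [19, 20],             # left toes -> joint 12
         15: [11, 24],             # right ankle/heel -> joint 15
         16: [22, 23]}             # right toes -> joint 16

def remap_25_to_17(frame25):
    """frame25: (25, 2) array of body_25 (x, y) coordinates for one frame.
    Returns a dict mapping each of the 17 output joint indices to (x, y)."""
    out = {k: frame25[src] for k, src in DIRECT.items()}
    for k, srcs in MERGE.items():
        w = np.full(len(srcs), 1.0 / len(srcs))    # assumed uniform weights w_k
        out[k] = (w[:, None] * frame25[srcs]).sum(axis=0)
    return out
```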
The specific flow of step S4 is:
S41, graph convolution calculation process: after the coordinates of the joint points are obtained, the joint points are taken as the vertices V and the natural connections of the joint points as the skeleton edges E, so that the human skeleton can be represented as a graph G = (V, E). The T frame skeleton graphs are arranged in time sequence and the joint points at the same positions are connected to form a space-time skeleton graph. The node set V is the set of all joint points in each skeleton graph, and the calculation process of the graph convolution is as follows:

f_out = Λ^(-1/2) (A + I) Λ^(-1/2) f_in W

wherein f_in is the input, f_out is the output, A is the adjacency matrix, I is the identity matrix, Λ is the diagonal degree matrix of (A + I), and W is a learnable weight.
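A minimal sketch of one such spatial graph-convolution step, assuming the standard ST-GCN-style normalization Λ^(-1/2)(A + I)Λ^(-1/2) and a plain (N, C) feature matrix:

```python
import numpy as np

def graph_conv(f_in, A, W):
    """One spatial graph-convolution step:
    f_out = D^(-1/2) (A + I) D^(-1/2) f_in W.

    f_in: (N, C_in) node features, A: (N, N) adjacency, W: (C_in, C_out).
    """
    A_hat = A + np.eye(A.shape[0])                          # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))  # D^(-1/2)
    return d_inv_sqrt @ A_hat @ d_inv_sqrt @ f_in @ W
```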
And S42, adaptive graph convolution calculation process: because the fixed topology of graph convolution is not friendly to joint points that are not physically connected but strongly correlated, researchers have successively proposed adaptive graph convolution. The calculation process is shown in the following formula; on the basis of the original adjacency matrix A_k, two matrices B_k and C_k are newly added:

f_out = Σ_k W_k f_in (A_k + B_k + C_k)

B_k is a trainable weight, and no constraints such as normalization are imposed on it; that is, B_k is a parameter learned entirely from the data, which can indicate not only whether two nodes are connected but also the strength of the connection. The difference from ST-GCN is the fusion mode: ST-GCN uses multiplication, whereas addition is used here, which can produce associations that do not exist in the physical skeleton. C_k learns a unique graph for each sample and adopts the very classical Gaussian embedding function, so that the similarity between joints can be captured.
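The adaptive variant can be illustrated as follows; the (N, C) data layout, the single-subset form, and the row softmax used for the Gaussian-embedding similarity matrix C are simplifying assumptions for this sketch.

```python
import numpy as np

def adaptive_graph_conv(f_in, A, B, theta, phi, W):
    """Adaptive graph convolution sketch: f_out = (A + B + C) f_in W, where
    A is the physical adjacency, B is a freely trainable matrix, and C is a
    data-dependent similarity graph from a Gaussian (softmax) embedding."""
    e = (f_in @ theta) @ (f_in @ phi).T            # (N, N) pairwise similarity logits
    C = np.exp(e - e.max(axis=1, keepdims=True))   # numerically stable row softmax
    C = C / C.sum(axis=1, keepdims=True)
    return (A + B + C) @ f_in @ W
```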
S43, multi-scale space-time graph convolution calculation process: to better connect the spatial and temporal skeleton information, the k-hop adjacency matrices of the nodes are tiled to form a (τN × τN) block matrix in which each node is directly connected to its corresponding k-hop neighbor nodes on all frames, thereby realizing skip connections between nodes. The calculation process is as follows:

[A_(k)]_(i,j) = 1 if d(v_i, v_j) = k, and 0 otherwise

wherein d(v_i, v_j) is the shortest hop distance between nodes v_i and v_j.
s44, MS-GCN multi-scale space-time graph convolution module: respectively to the input nodeFirst of informationExtracting the jump adjacency matrix and finally extracting the jump adjacency matrixThe matrices are spliced together.
S45, MS-TCN temporal dilated convolution module: a 1 × 1 convolution is used to adjust the number of channels of the input information, and a convolution kernel processes the integrated information; the features after convolution are processed in a manner similar to dilated convolution and the extracted features are concatenated; finally a convolution with stride 2 is added to output the processed information, which also has a certain correcting effect on the extracted features.
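The multi-dilation temporal filtering idea behind the MS-TCN module can be sketched as follows; the depthwise single-kernel form and the dilation rates (1, 2, 3, 4) are illustrative assumptions.

```python
import numpy as np

def dilated_temporal_conv(x, w, dilation):
    """Depthwise 1-D temporal convolution with a given dilation rate.

    x: (T, C) sequence of per-frame features, w: (K,) temporal kernel
    shared across channels (a simplification for illustration).
    """
    T, _ = x.shape
    K = w.shape[0]
    pad = (K - 1) * dilation // 2                      # keep output length T
    xp = np.pad(x, ((pad, pad), (0, 0)))
    out = np.zeros_like(x, dtype=float)
    for k in range(K):                                 # sum shifted, weighted copies
        out += w[k] * xp[k * dilation : k * dilation + T]
    return out

def ms_tcn_branches(x, w, dilations=(1, 2, 3, 4)):
    """Concatenate features extracted at several dilation rates, mirroring
    the 'process like dilated convolution, then concatenate' step."""
    return np.concatenate([dilated_temporal_conv(x, w, d) for d in dilations], axis=1)
```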
S46, lightweight multi-scale space-time graph convolutional network MS-SGTCN_S: in order to increase the robustness of the extracted features, two network branches are designed to perform inference on the input joint point data. The first network branch consists of a convolution module, MS-GCN modules and a fully connected layer, where 4 MS-GCN modules are used in the middle to extract multi-scale space-time features, realized with different temporal and spatial sliding windows. The second branch consists of an MS-GCN module and two MS-TCN modules, and a long-range temporal module is adopted to strengthen the network's attention to the contextual change of joint points in the time dimension. The feature information obtained by the two branches is then jointly sent to an MS-TCN module, the features are spliced together through a fully connected layer, and after processing by a softmax classifier the class with the maximum probability is the predicted human behavior. To further improve the accuracy of the algorithm, a dual-stream network is designed to train on the joint point and skeleton sequences respectively; confidence statistics are then performed on the prediction results of the joint-point and skeleton streams, and the human behavior with the higher confidence is taken as the final output prediction.
Step S5 trains and tests on the data set using the constructed network model.
The dual-stream behavior recognition model is implemented based on PyTorch and run under CUDA 11.1 on a 3080Ti GPU. A mini-batch stochastic gradient descent algorithm is used to learn the network parameters, with batch size 32, momentum 0.9 and an initial learning rate of 0.05; the learning rate is reduced at the 25th and 35th training iterations, and weight decay is set to 0.0005. The accuracy under the X-Sub and X-View split rules is shown in the following table:
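The training schedule above can be expressed as a small helper; the decay factor of 0.1 is an assumption, since the text only states that the learning rate is reduced at the 25th and 35th training iterations.

```python
def learning_rate(epoch, base_lr=0.05, decay_epochs=(25, 35), gamma=0.1):
    """Step schedule: start at 0.05 and reduce at the 25th and 35th epochs.
    The decay factor gamma = 0.1 is an assumed value."""
    lr = base_lr
    for e in decay_epochs:
        if epoch >= e:
            lr *= gamma
    return lr

# Optimizer settings stated in the text.
SGD_CONFIG = {"batch_size": 32, "momentum": 0.9,
              "initial_lr": 0.05, "weight_decay": 0.0005}
```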
In order to verify that the designed network reduces the misrecognition of similar human behaviors, the accuracy of each of the 13 human behaviors is counted separately, with the results shown in the following table:
the accuracy of recognizing three human behaviors with high similarity, namely drinking, shaking head and chest pain, still reaches more than 95%, and the designed lightweight multi-scale space-time diagram convolutional network can reduce the false recognition of similar human behaviors.
The results of testing the test set of RGB video images are shown in fig. 8 and 9 below;
The results of recognizing human behavior in the real scene are shown in fig. 10 and fig. 11. It can be seen that the multi-scale space-time graph convolution model constructed by the invention can quickly and effectively recognize human behavior in real scenes, guaranteeing the service quality of the service robot.
In order to reduce the influence of the complex external environment on the working quality of the service robot, the robot is designed to react to a behavior only after receiving the same behavior signal continuously for more than 2 seconds. For dangerous behaviors, the service robot sends alarm information to remind a worker to handle them. Combined with the positioning and obstacle-avoidance technologies of the service robot, when the robot receives a hand-waving action, for example, it can move to the customer to provide service; combined with face recognition technology, guest information can be registered and the guest guided to a designated seat.
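The 2-second confirmation rule can be sketched as a small debouncer; the class name and the clock interface are illustrative, not from the text.

```python
import time

class BehaviorDebouncer:
    """React only after the same behavior signal has been received
    continuously for more than `hold_s` seconds (2 s in the text)."""

    def __init__(self, hold_s=2.0):
        self.hold_s = hold_s
        self.current = None     # behavior currently being held
        self.since = None       # timestamp when it first appeared

    def update(self, behavior, now=None):
        """Feed one recognition result; returns the behavior to react to,
        or None while the signal is still unconfirmed."""
        now = time.monotonic() if now is None else now
        if behavior != self.current:
            self.current, self.since = behavior, now   # signal changed: restart clock
            return None
        if now - self.since > self.hold_s:
            return behavior                            # held long enough: react
        return None
```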
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Claims (7)
1. A behavior recognition method based on a service robot, characterized by comprising the following specific steps:
s1, extracting human body joint point sequences of 13 behavior categories commonly used in the service robot application scene to form a training data set;
s2, preprocessing the training data set, firstly extracting key frames of the joint point sequence, and then optimizing the joint point data by combining with an actual application scene;
s3, for a video shot in a real scene, firstly performing key point estimation by adopting the body_25 human posture estimation model in OpenPose to obtain 25 key point coordinates and confidence coefficients, then filling key point vacancy values in the obtained key point data by a K-nearest-neighbor method, and finally performing weighted optimization on the joint point data in combination with the actual application scene to output 17 main joint points;
s4, constructing a lightweight multi-scale aggregation space-time map convolution deep learning neural network model by using a multi-scale space-time map convolution and time convolution module;
s5, training and testing the data set by using the constructed network model;
s6, identifying human body behaviors in the video image under the real scene to be identified by using the trained model;
and S7, the service robot receives the human behavior recognition result and responds correspondingly.
2. The behavior recognition method based on the service robot as claimed in claim 1, wherein the training data set in step S1 is derived from NTU-RGB + D human behavior data set, and 13 behavior categories are selected: drinking, picking up, throwing away, sitting down, standing up, jumping, shaking head, tumbling, chest pain, waving hands, kicking, hugging and walking, 12324 skeleton files in total.
3. The service robot-based behavior recognition method as claimed in claim 1, wherein the step S2 of performing key frame extraction on the skeleton sequence comprises:
each video segment corresponding to a behavior category in the service robot application scene is sampled at an interval of 30 frames, and 300 frames of data are retained as the training set; videos with fewer than 300 frames are repeatedly sampled from the beginning; the number of persons in the joint data is checked, and only joint data containing a single person is retained for training and validating the model.
4. The service robot-based behavior recognition method according to claim 1, wherein the step S3 specifically comprises:
s31, detecting the human key points in the video image of the real scene by using the OpenPose human key point detection algorithm model, obtaining the horizontal and vertical coordinate values (x, y) of the 25 skeletal joint points by using the body_25 human joint point labeling model, splicing the discrete joint points according to the physical connection of the human joint points to form a human skeleton space topological model, and then splicing the space topological graphs of the frames in time sequence to finally obtain a space-time graph of the human skeleton structure change;
s32, for the case of missed detection of whole-frame data, defining joint points 0, 1 and 8 as the main key points; if any of these three groups of data is missing in a certain frame of the output joint point data corresponding to the video image, judging that the whole frame of data is missed and deleting the joint point data corresponding to that video frame; for the case in which part of the key points of a certain frame are missing, adopting a 2nd-order K-nearest-neighbor method for filling, which requires no training or parameter estimation: the mean of the horizontal and vertical coordinate values (x, y) of the frames before and after that point is directly taken as the supplement.
5. The service robot-based behavior recognition method according to claim 1, wherein the step S4 specifically comprises:
s41, graph convolution calculation process: after obtaining the coordinates of the joint points, the human skeleton is represented as a graph G = (V, E) with the joint points as vertices and the natural connections of the joint points as skeleton edges; the T frame skeleton graphs are arranged in time sequence and the same-position joint points are connected to form a space-time skeleton graph; the node set V = {v_ti | t = 1, …, T; i = 1, …, N} is the set of all joint points in each skeleton graph, wherein N is the number of joints per frame; the edge set E is represented by two subsets, the first subset representing the intra-skeleton connections of each frame, denoted E_S = {v_ti v_tj | (i, j) ∈ H}, wherein H is the set of naturally connected human joints, and the second subset representing the connecting edges of same-position joint points between adjacent frames, denoted E_F = {v_ti v_(t+1)i}, with i as the serial number of the joint point; from the node set V and the edge set E an adjacency matrix A can be obtained, and the graph convolution is calculated as follows:

f_out = Σ_(k=1)^(K_v) W_k f_in (Λ_k^(-1/2) A_k Λ_k^(-1/2))

wherein f_in is the input, f_out is the output, A_k is the adjacency matrix, W_k is a learnable weight, and K_v is the spatial-dimension kernel size;
s42, adaptive graph convolution calculation process: as shown in the following formula, on the basis of A_k, two matrices B_k and C_k are newly added, wherein B_k is a trainable weight and C_k learns a unique graph for each sample:

f_out = Σ_(k=1)^(K_v) W_k f_in (A_k + B_k + C_k);
s43, multi-scale space-time graph convolution calculation process: to better connect the spatial and temporal skeleton information, the k-hop adjacency matrices of the nodes are tiled to form a (τN × τN) block matrix in which each node is directly connected to its corresponding k-hop neighbor nodes on all τ frames of the window, thereby realizing skip connections between nodes; the calculation process is as follows:

[A_(k)]_(i,j) = 1 if d(v_i, v_j) = k, and 0 otherwise;
s44, MS-GCN multi-scale space-time graph convolution module: for the input node information, the 1st to K-th hop adjacency matrices are extracted respectively, and finally the K extracted matrices are spliced together, wherein i is the serial number of the joint point, v_i is the coordinate of the joint point, and d(v_i, v_j) represents the shortest hop distance between nodes v_i and v_j;
s45, MS-TCN temporal dilated convolution module: a 1 × 1 convolution is used to adjust the number of channels of the input information, and a convolution kernel processes the integrated information; the features after convolution are processed in a manner similar to dilated convolution, the extracted features are concatenated, and finally a convolution with stride 2 is added to output the processed features;
s46, lightweight multi-scale space-time graph convolutional network MS-SGTCN_S: in order to increase the robustness of the extracted features, two network branches are designed to perform inference on the input joint point data, wherein the first network branch consists of a convolution module, MS-GCN modules and a fully connected layer, with 4 MS-GCN modules used in the middle to extract multi-scale space-time features, realized with different temporal and spatial sliding windows; the second branch consists of an MS-GCN module and two MS-TCN modules, and a long-range temporal module is adopted to strengthen the network's attention to the contextual change of joint points in the time dimension; the feature information obtained by the two branches is then jointly sent to an MS-TCN module, the features are spliced together through a fully connected layer, and after processing by a softmax classifier the class with the maximum probability is the predicted human behavior; to further improve the accuracy of the algorithm, a dual-stream network is designed to train on the joint point and skeleton sequences respectively, confidence statistics are then performed on the prediction results of the joint-point and skeleton streams, and the human behavior with the higher confidence is taken as the final output prediction value.
6. The behavior recognition method based on the service robot as claimed in claim 5, wherein in step S46, a dual-stream network is designed to train on the joint point and skeleton sequences, confidence statistics are then performed on the prediction results of the joint-point and skeleton streams, and the human behavior with the higher confidence is the final predicted value.
7. The behavior recognition method based on the service robot as claimed in claim 1, wherein in step S7, in order to reduce the influence of the complex external environment on the working quality of the service robot, the robot is designed to react to a behavior only after receiving the same behavior signal continuously for more than 2 seconds; for dangerous behaviors, the service robot sends alarm information to remind a worker to handle them.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210484610.8A CN114582030B (en) | 2022-05-06 | 2022-05-06 | Behavior recognition method based on service robot |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210484610.8A CN114582030B (en) | 2022-05-06 | 2022-05-06 | Behavior recognition method based on service robot |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114582030A true CN114582030A (en) | 2022-06-03 |
CN114582030B CN114582030B (en) | 2022-07-22 |
Family
ID=81769365
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210484610.8A Active CN114582030B (en) | 2022-05-06 | 2022-05-06 | Behavior recognition method based on service robot |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114582030B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114881179A (en) * | 2022-07-08 | 2022-08-09 | 济南大学 | Intelligent experiment method based on intention understanding |
CN115035596A (en) * | 2022-06-05 | 2022-09-09 | 东北石油大学 | Behavior detection method and apparatus, electronic device, and storage medium |
CN115586834A (en) * | 2022-11-03 | 2023-01-10 | 天津大学温州安全(应急)研究院 | Intelligent cardio-pulmonary resuscitation training system |
CN115810203A (en) * | 2022-12-19 | 2023-03-17 | 天翼爱音乐文化科技有限公司 | Obstacle avoidance identification method, system, electronic equipment and storage medium |
CN116386087A (en) * | 2023-03-31 | 2023-07-04 | 阿里巴巴(中国)有限公司 | Target object processing method and device |
CN116665312A (en) * | 2023-08-02 | 2023-08-29 | 烟台大学 | Man-machine cooperation method based on multi-scale graph convolution neural network |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109064471A (en) * | 2018-07-18 | 2018-12-21 | 中北大学 | A kind of three-dimensional point cloud model dividing method based on skeleton |
CN111652124A (en) * | 2020-06-02 | 2020-09-11 | 电子科技大学 | Construction method of human behavior recognition model based on graph convolution network |
CN112949569A (en) * | 2021-03-25 | 2021-06-11 | 南京邮电大学 | Effective extraction method of human body posture points for tumble analysis |
CN113657349A (en) * | 2021-09-01 | 2021-11-16 | 重庆邮电大学 | Human body behavior identification method based on multi-scale space-time graph convolutional neural network |
WO2022000420A1 (en) * | 2020-07-02 | 2022-01-06 | 浙江大学 | Human body action recognition method, human body action recognition system, and device |
CN114187653A (en) * | 2021-11-16 | 2022-03-15 | 复旦大学 | Behavior identification method based on multi-stream fusion graph convolution network |
CN114220176A (en) * | 2021-12-22 | 2022-03-22 | 南京华苏科技有限公司 | Human behavior recognition method based on deep learning |
CN114399648A (en) * | 2022-01-17 | 2022-04-26 | Oppo广东移动通信有限公司 | Behavior recognition method and apparatus, storage medium, and electronic device |
US20220138536A1 (en) * | 2020-10-29 | 2022-05-05 | Hong Kong Applied Science And Technology Research Institute Co., Ltd | Actional-structural self-attention graph convolutional network for action recognition |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109064471A (en) * | 2018-07-18 | 2018-12-21 | 中北大学 | A kind of three-dimensional point cloud model dividing method based on skeleton |
CN111652124A (en) * | 2020-06-02 | 2020-09-11 | 电子科技大学 | Construction method of human behavior recognition model based on graph convolution network |
WO2022000420A1 (en) * | 2020-07-02 | 2022-01-06 | 浙江大学 | Human body action recognition method, human body action recognition system, and device |
US20220138536A1 (en) * | 2020-10-29 | 2022-05-05 | Hong Kong Applied Science And Technology Research Institute Co., Ltd | Actional-structural self-attention graph convolutional network for action recognition |
CN112949569A (en) * | 2021-03-25 | 2021-06-11 | 南京邮电大学 | Effective extraction method of human body posture points for tumble analysis |
CN113657349A (en) * | 2021-09-01 | 2021-11-16 | 重庆邮电大学 | Human body behavior identification method based on multi-scale space-time graph convolutional neural network |
CN114187653A (en) * | 2021-11-16 | 2022-03-15 | 复旦大学 | Behavior identification method based on multi-stream fusion graph convolution network |
CN114220176A (en) * | 2021-12-22 | 2022-03-22 | 南京华苏科技有限公司 | Human behavior recognition method based on deep learning |
CN114399648A (en) * | 2022-01-17 | 2022-04-26 | Oppo广东移动通信有限公司 | Behavior recognition method and apparatus, storage medium, and electronic device |
Non-Patent Citations (3)
Title |
---|
LEI SHI et al.: "Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition", arXiv:1805.07694v3 *
ZIYU LIU et al.: "Disentangling and Unifying Graph Convolutions for Skeleton-Based Action Recognition", IEEE *
ZHENG SHIYU: "Research on Human Action Recognition Based on Adaptive Spatio-Temporal Fusion Graph Convolutional Network", China Master's Theses Full-text Database, Information Science and Technology *
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115035596A (en) * | 2022-06-05 | 2022-09-09 | 东北石油大学 | Behavior detection method and apparatus, electronic device, and storage medium |
CN115035596B (en) * | 2022-06-05 | 2023-09-08 | 东北石油大学 | Behavior detection method and device, electronic equipment and storage medium |
CN114881179A (en) * | 2022-07-08 | 2022-08-09 | 济南大学 | Intelligent experiment method based on intention understanding |
CN114881179B (en) * | 2022-07-08 | 2022-09-06 | 济南大学 | Intelligent experiment method based on intention understanding |
CN115586834A (en) * | 2022-11-03 | 2023-01-10 | 天津大学温州安全(应急)研究院 | Intelligent cardio-pulmonary resuscitation training system |
CN115810203A (en) * | 2022-12-19 | 2023-03-17 | 天翼爱音乐文化科技有限公司 | Obstacle avoidance identification method, system, electronic equipment and storage medium |
CN115810203B (en) * | 2022-12-19 | 2024-05-10 | 天翼爱音乐文化科技有限公司 | Obstacle avoidance recognition method, system, electronic equipment and storage medium |
CN116386087A (en) * | 2023-03-31 | 2023-07-04 | 阿里巴巴(中国)有限公司 | Target object processing method and device |
CN116386087B (en) * | 2023-03-31 | 2024-01-09 | 阿里巴巴(中国)有限公司 | Target object processing method and device |
CN116665312A (en) * | 2023-08-02 | 2023-08-29 | 烟台大学 | Man-machine cooperation method based on multi-scale graph convolution neural network |
CN116665312B (en) * | 2023-08-02 | 2023-10-31 | 烟台大学 | Man-machine cooperation method based on multi-scale graph convolution neural network |
Also Published As
Publication number | Publication date |
---|---|
CN114582030B (en) | 2022-07-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114582030B (en) | Behavior recognition method based on service robot | |
CN109829436B (en) | Multi-face tracking method based on depth appearance characteristics and self-adaptive aggregation network | |
CN107463949B (en) | Video action classification processing method and device | |
CN108537743B (en) | Face image enhancement method based on generation countermeasure network | |
CN110472554A (en) | Table tennis action identification method and system based on posture segmentation and crucial point feature | |
CN110472612B (en) | Human behavior recognition method and electronic equipment | |
CN110472604B (en) | Pedestrian and crowd behavior identification method based on video | |
CN110569795A (en) | Image identification method and device and related equipment | |
CN110414432A (en) | Training method, object identifying method and the corresponding device of Object identifying model | |
CN109685037B (en) | Real-time action recognition method and device and electronic equipment | |
CN111274916A (en) | Face recognition method and face recognition device | |
CN107256386A (en) | Human behavior analysis method based on deep learning | |
CN113128424B (en) | Method for identifying action of graph convolution neural network based on attention mechanism | |
CN110070029A (en) | A kind of gait recognition method and device | |
CN110765839B (en) | Multi-channel information fusion and artificial intelligence emotion monitoring method for visible light facial image | |
CN116343330A (en) | Abnormal behavior identification method for infrared-visible light image fusion | |
CN112115775A (en) | Smoking behavior detection method based on computer vision in monitoring scene | |
CN114529984A (en) | Bone action recognition method based on learnable PL-GCN and ECLSTM | |
CN113516005A (en) | Dance action evaluation system based on deep learning and attitude estimation | |
CN113312973A (en) | Method and system for extracting features of gesture recognition key points | |
CN112906520A (en) | Gesture coding-based action recognition method and device | |
CN113239885A (en) | Face detection and recognition method and system | |
CN117218709A (en) | Household old man real-time state monitoring method based on time deformable attention mechanism | |
CN111797705A (en) | Action recognition method based on character relation modeling | |
CN113963202A (en) | Skeleton point action recognition method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |