CN112101219B - Intention understanding method and system for elderly accompanying robot - Google Patents
- Publication number
- CN112101219B (application CN202010970662.7A)
- Authority
- CN
- China
- Prior art keywords
- gesture
- intention
- gesture recognition
- cin
- probability set
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000000034 method Methods 0.000 title claims abstract description 44
- 239000011159 matrix material Substances 0.000 claims abstract description 71
- 230000004927 fusion Effects 0.000 claims abstract description 46
- 238000004422 calculation algorithm Methods 0.000 claims abstract description 24
- 238000003062 neural network model Methods 0.000 claims abstract description 22
- 238000003709 image segmentation Methods 0.000 claims abstract description 10
- 238000012549 training Methods 0.000 claims description 30
- 230000008569 process Effects 0.000 claims description 16
- 238000004364 calculation method Methods 0.000 claims description 15
- 230000008859 change Effects 0.000 claims description 12
- 238000011156 evaluation Methods 0.000 claims description 11
- 230000009471 action Effects 0.000 claims description 9
- 230000003993 interaction Effects 0.000 claims description 9
- 239000002131 composite material Substances 0.000 claims description 6
- 230000009466 transformation Effects 0.000 claims description 3
- 230000006399 behavior Effects 0.000 description 27
- 230000036544 posture Effects 0.000 description 11
- 238000010586 diagram Methods 0.000 description 9
- 238000013527 convolutional neural network Methods 0.000 description 7
- 230000000694 effects Effects 0.000 description 7
- 238000013528 artificial neural network Methods 0.000 description 6
- 238000013145 classification model Methods 0.000 description 4
- 238000013135 deep learning Methods 0.000 description 4
- 230000002452 interceptive effect Effects 0.000 description 3
- 238000013461 design Methods 0.000 description 2
- 238000010801 machine learning Methods 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 238000007781 pre-processing Methods 0.000 description 2
- 238000011160 research Methods 0.000 description 2
- 230000035945 sensitivity Effects 0.000 description 2
- 238000007619 statistical method Methods 0.000 description 2
- 238000013179 statistical model Methods 0.000 description 2
- 230000032683 aging Effects 0.000 description 1
- 230000004075 alteration Effects 0.000 description 1
- 238000004458 analytical method Methods 0.000 description 1
- 238000013459 approach Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000002902 bimodal effect Effects 0.000 description 1
- 238000013136 deep learning model Methods 0.000 description 1
- 230000008451 emotion Effects 0.000 description 1
- 238000007499 fusion processing Methods 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 238000005259 measurement Methods 0.000 description 1
- 230000000474 nursing effect Effects 0.000 description 1
- 238000003909 pattern recognition Methods 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 230000003068 static effect Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/29—Graphical models, e.g. Bayesian networks
- G06F18/295—Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Biology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Acoustics & Sound (AREA)
- Manipulator (AREA)
- Image Analysis (AREA)
Abstract
The invention provides an intention understanding method and system for an elderly accompanying robot. The method comprises the following steps: acquiring gesture images and posture information of the elderly in real time, and performing image segmentation on both to form a gesture data set and a posture data set respectively; inputting the gesture data set into a trained neural network model for gesture recognition to obtain a gesture recognition probability set, and inputting the posture data set into a trained hidden Markov model for posture recognition to obtain a posture recognition probability set; performing intention fusion on the two probability sets with a fusion algorithm based on a confusion matrix, using the F1 score under each intention classification to calculate the weight proportions of the two probability sets under different intentions; and then determining the final recognition intention. Based on the method, an intention understanding system is also provided. The invention improves the intention understanding rate of the elderly accompanying robot system and the satisfaction of the elderly when using a social accompanying robot.
Description
Technical Field
The invention belongs to the technical field of elderly accompanying robots, and particularly relates to an intention understanding method and system for an elderly accompanying robot.
Background
Aging is now a problem in many countries around the world, and adult children who are busy with work find it difficult to give their parents the necessary care at all times. Meanwhile, field research in nursing homes and studies of accompanying robots by Sari Merilampi et al. show that robot companionship is increasingly accepted by the elderly, and elderly accompanying robots already provide many services. However, the recognition rate and intention understanding rate of such robot accompanying systems still need to be improved. In particular, the movements of the elderly have many distinctive characteristics, which increase the interaction burden on the elderly when using an accompanying robot and easily cause negative emotions. The invention therefore proposes a model fusion algorithm (SDFM) that effectively improves the robot's understanding rate of elderly behaviour intentions, and applies the human-computer interaction design to real scenes.
Deep learning models and statistical models each have advantages and disadvantages in pattern recognition. A statistical method is characterized by high judgment efficiency (a neural network judges more slowly), its construction can be derived entirely from theory, and its recognition effect is clearly better suited to training on posture information when the data volume is small or training data is hard to collect. The structure and algorithm design of a neural network must rely on the designer's experience; a high recognition effect can be ensured when the data volume is large enough, which makes it suitable for gesture recognition where data is easy to collect, but when the data volume is small or training data is hard to collect the recognition effect is often unsatisfactory and the success of the system is largely a matter of chance.
Disclosure of Invention
In order to solve the above technical problems, the invention provides an intention understanding method and system for an elderly accompanying robot. The method adopts a fusion scheme that combines an intention recognition result set with a weight matrix: the F1-score under each classification in the confusion matrix of each sub-model's recognition results is used as a weight value to form the weight matrix, and a fuzzy evaluation operator is established with reference to fuzzy evaluation theory to fuse the two models' recognition results, thereby improving the accuracy and sensitivity of elderly intention recognition.
In order to achieve the purpose, the invention adopts the following technical scheme:
an intention understanding method for an elderly accompanying robot comprises the following steps:
acquiring a behavior image of the elderly in real time, wherein the behavior image comprises a gesture image and posture information, and performing image segmentation on the gesture image and the posture information to form a gesture data set and a posture data set respectively;
inputting the gesture data set into a trained neural network model for gesture recognition to obtain a gesture recognition probability set, and inputting the posture data set into a trained hidden Markov model for posture recognition to obtain a posture recognition probability set;
performing intention fusion on the gesture recognition probability set and the posture recognition probability set with a fusion algorithm based on a confusion matrix, using the F1 score under each intention classification to calculate the weight proportions of the two probability sets under different intentions; and then determining the final recognition intention.
Further, before the behavior image of the elderly is obtained in real time, voice channel information is obtained, and keywords of the voice channel information are extracted to start the robot.
Further, the method for training the neural network model and the hidden markov model comprises:
acquiring gesture image samples and posture information samples from a plurality of elderly people operating the robot; segmenting both the gesture image samples and the posture information samples with the Otsu algorithm to form an elderly behavior feature set;
training the neural network model with the elderly behavior feature set to obtain a neural network model capable of recognizing gesture intentions; and training the hidden Markov model with the elderly behavior feature set to obtain a hidden Markov model capable of recognizing posture intentions.
Further, the process of performing intention fusion on the gesture recognition probability set and the posture recognition probability set with the fusion algorithm based on the confusion matrix is as follows:
building an intention fusion model F = f(I, Cin, Hin), wherein f is the intention fusion model; I is an intention weight matrix; Cin is the gesture recognition probability set; Hin is the posture recognition probability set;
assigning weight values to Cin to form an n×1-dimensional weight matrix Cconfi, and assigning weight values to Hin to form an n×1-dimensional weight matrix Hconfi;
performing the fuzzy transformation on Cconfi and Hconfi to obtain the latest intention probability matrix C, wherein C = Cconfi∘Cin + Hconfi∘Hin and ∘ is called the composite evaluation operator.
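As an illustrative sketch (not part of the claims; all numbers and names below are ours), the composite evaluation C = Cconfi∘Cin + Hconfi∘Hin can be read as element-wise weighting of each model's intention probability set followed by summation:

```python
import numpy as np

def fuse(cin, hin, cconfi, hconfi):
    """Composite evaluation: weight each model's intention
    probabilities element-wise by its per-intention weight
    matrix, then sum the two weighted sets."""
    cin, hin = np.asarray(cin), np.asarray(hin)
    cconfi, hconfi = np.asarray(cconfi), np.asarray(hconfi)
    return cconfi * cin + hconfi * hin

# Three hypothetical intentions; CNN and HMM disagree on the top class.
cin = np.array([0.6, 0.3, 0.1])     # gesture (CNN) probability set
hin = np.array([0.2, 0.7, 0.1])     # posture (HMM) probability set
cconfi = np.array([0.9, 0.5, 0.8])  # per-intention weights of Cin
hconfi = np.array([0.6, 0.9, 0.7])  # per-intention weights of Hin
c = fuse(cin, hin, cconfi, hconfi)  # latest intention probability matrix C
```

Here the HMM's higher weight on the second intention outvotes the CNN's preference for the first, illustrating how per-intention weights arbitrate between the two sub-models.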
Further, the method for calculating the weight proportions under different intentions when fusing the gesture recognition probability set and the posture recognition probability set, using the F1 score under each intention classification, comprises the following steps:
assignment of F1score to different intentsAs the weight value under each intention classification of Cin; assign toAs the weight value under each intention classification of Hin; wherein
Based onObtaining n x1 dimensional weight matrix of CinBased onN x1 dimensional weight matrix of Hin can be obtained
Cin, Hin, Cconfi and Hconfi are subjected to the fuzzy transformation to obtain a one-dimensional matrix [λ1, λ2, …, λn]T; the maximum value λi in the matrix is selected; the i-th intention is the user's final recognition intention; wherein [λ1, λ2, …, λn]T = Cin × Cconfi + Hin × Hconfi.
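A hedged numerical sketch of this weighting-and-selection step, using hypothetical per-intention precision and recall figures for each sub-model, might look like:

```python
import numpy as np

def f1_weights(precision, recall):
    """Per-intention F1 scores used as fusion weights:
    F1 = 2*P*R / (P + R), computed per intention class."""
    p = np.asarray(precision, dtype=float)
    r = np.asarray(recall, dtype=float)
    return 2 * p * r / (p + r)

def final_intention(cin, hin, cconfi, hconfi):
    """lambda = Cin*Cconfi + Hin*Hconfi (element-wise); the index
    of the largest lambda_i is the final recognition intention."""
    lam = np.asarray(cin) * cconfi + np.asarray(hin) * hconfi
    return int(np.argmax(lam)), lam

# Hypothetical confusion-matrix statistics for two intentions.
cconfi = f1_weights([0.9, 0.6], [0.9, 0.8])  # CNN weight matrix
hconfi = f1_weights([0.7, 0.9], [0.7, 0.9])  # HMM weight matrix
i, lam = final_intention([0.8, 0.2], [0.3, 0.7], cconfi, hconfi)
```

The returned index `i` identifies the final recognition intention of the user.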
Further, when the robot does not complete the specified action according to the final intention:
starting target identification, and judging the distance of a specific obstacle by adopting image acquisition equipment;
after reaching the designated target area, the target object is captured through voice interaction, with initial coordinate (x, y); the robot moves the target object into the video frame to obtain the coordinate (x1, y1); the transformation process of the target object is (x → x1, y → y1);
after the target object is positioned, the grabbing operation is performed; after the grab is completed, the elderly person's intention is captured in real time and the robot is moved.
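The fallback flow above can be sketched as follows; `RobotStub`, `move_to` and `grasp` are invented stand-ins for the real robot interface, not APIs named in the patent:

```python
class RobotStub:
    """Stand-in for the real robot interface (assumed API)."""
    def __init__(self):
        self.visited = []

    def move_to(self, point):
        self.visited.append(point)

    def grasp(self):
        return True

def bring_object(robot, target_xy, frame_xy):
    """Move the target from its initial coordinate (x, y) into the
    video-frame coordinate (x1, y1), then perform the grab."""
    path = [target_xy, frame_xy]      # transformation (x -> x1, y -> y1)
    for waypoint in path:
        robot.move_to(waypoint)
    return robot.grasp()

robot = RobotStub()
grabbed = bring_object(robot, (2, 3), (5, 6))
```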
The invention also provides an intention understanding system for the elderly accompanying robot, which comprises an acquisition module, a training module and a building and calculating module;
the acquisition module is used for acquiring a behavior image of the elderly in real time, wherein the behavior image comprises a gesture image and posture information, and the gesture image and the posture information are subjected to image segmentation to form a gesture data set and a posture data set respectively;
the training module inputs the gesture data set into a trained neural network model for gesture recognition to obtain a gesture recognition probability set, and inputs the posture data set into a trained hidden Markov model for posture recognition to obtain a posture recognition probability set;
the building and calculating module is used for performing intention fusion on the gesture recognition probability set and the posture recognition probability set with a fusion algorithm based on a confusion matrix, using the F1 score under each intention classification to calculate the weight proportions of the two probability sets under different intentions; and then determining the final recognition intention.
Further, the device also comprises a starting module;
the starting module is used for acquiring voice channel information and extracting keywords of the voice channel information to start the robot.
Further, the execution process of the training module is as follows:
acquiring gesture image samples and posture information samples from a plurality of elderly people operating the robot; segmenting both the gesture image samples and the posture information samples with the Otsu algorithm to form an elderly behavior feature set;
training the neural network model with the elderly behavior feature set to obtain a neural network model capable of recognizing gesture intentions; and training the hidden Markov model with the elderly behavior feature set to obtain a hidden Markov model capable of recognizing posture intentions.
Further, the building calculation module comprises a building module and a calculation module;
the process of the building module is as follows: an intention fusion model F = f(I, Cin, Hin) is built, wherein f is the intention fusion model; I is an intention weight matrix; Cin is the gesture recognition probability set; Hin is the posture recognition probability set; weight values are assigned to Cin to form an n×1-dimensional weight matrix Cconfi, and to Hin to form an n×1-dimensional weight matrix Hconfi; the fuzzy transformation is performed on Cconfi and Hconfi to obtain the latest intention probability matrix C, wherein C = Cconfi∘Cin + Hconfi∘Hin and ∘ is called the composite evaluation operator;
the process of the calculation module is as follows: the F1 score fCi under the i-th intention classification is assigned as the weight value of that classification of Cin, and fHi as the weight value of that classification of Hin, wherein fi = 2 × precisioni × recalli / (precisioni + recalli); from fC1, …, fCn the n×1-dimensional weight matrix Cconfi of Cin is obtained, and from fH1, …, fHn the n×1-dimensional weight matrix Hconfi of Hin; Cin, Hin, Cconfi and Hconfi are subjected to the fuzzy transformation to obtain a one-dimensional matrix [λ1, λ2, …, λn]T; the maximum value λi is selected, and the i-th intention is the user's final recognition intention; wherein [λ1, λ2, …, λn]T = Cin × Cconfi + Hin × Hconfi.
The effects described in this summary are only those of the embodiments, not all effects of the invention. The above technical solution has the following advantages or beneficial effects:
the invention provides an intention understanding method and system for an elderly accompanying robot, wherein the method comprises the following steps: acquiring a behavior image of the old in real time, wherein the behavior image comprises a gesture image and gesture information, and the gesture image and the gesture information are subjected to image segmentation to respectively form a gesture data set and a gesture data set; inputting the gesture data set into a trained neural network model for gesture recognition to obtain a gesture recognition probability set, and inputting the gesture data set into a trained hidden Markov model for gesture recognition to obtain a gesture recognition probability set; performing intention fusion on the gesture recognition probability set and the gesture recognition probability set based on a fusion algorithm of a confusion matrix, and calculating weight proportions under different intentions when the gesture recognition probability set and the gesture recognition probability set are fused by adopting an F1score under different intention classifications; and then determining the final recognition intention. Based on the intention understanding method for the old accompanying robot provided by the invention, an intention understanding system for the old accompanying robot is also provided. The invention provides a novel gesture recognition and posture recognition method based on deep learning, and solves the key problems of poor recognition rate, low robustness, poor universality and the like in the traditional gesture recognition algorithm and posture recognition algorithm. 
The F1 score is used to calculate the weight proportions under different intentions when the gesture recognition probability set and the posture recognition probability set are fused, and the final recognition intention is then determined. This improves the current intention understanding rate of the elderly accompanying robot system and the satisfaction of the elderly when using a social accompanying robot.
Drawings
Fig. 1 is a flowchart of an intention understanding method for an elderly accompanying robot according to embodiment 1 of the present invention;
FIG. 2 is a schematic diagram of a CNN neural network recognition model framework;
FIG. 3 is a schematic diagram of an HMM posture recognition model framework;
FIG. 4 is a schematic diagram of a bimodal decision-level fusion algorithm according to embodiment 1 of the present invention;
fig. 5 is a diagram of a fusion model architecture of deep learning and statistical probability provided in embodiment 1 of the present invention;
FIG. 6 is a confusion matrix of multiple classification intentions corresponding to CNN and HMM, respectively, in embodiment 1 of the present invention;
fig. 7 is a schematic view of an intention understanding system for an elderly accompanying robot in embodiment 2 of the present invention.
Detailed Description
In order to clearly explain the technical features of the present invention, the following detailed description of the present invention is provided with reference to the accompanying drawings. The following disclosure provides many different embodiments, or examples, for implementing different features of the invention. To simplify the disclosure of the present invention, specific example components and arrangements are described below. Moreover, the present disclosure may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. It should be noted that the components illustrated in the figures are not necessarily drawn to scale. Descriptions of well-known components and processing techniques and procedures are omitted so as to not unnecessarily limit the invention.
Example 1
According to the intention understanding method for the aged accompanying robot, which is provided by the embodiment 1 of the invention, through a fusion scheme of fusion of an intention recognition result set and a weight matrix, F1-score under each classification in a confusion matrix of sub-model recognition results is used as a weight value to form the weight matrix, and a fuzzy evaluation operator is established by referring to a fuzzy evaluation theory to fuse two model recognition results. Fig. 1 shows a flowchart of an intention understanding method for an elderly accompanying robot in embodiment 1 of the present invention.
In step S101, voice channel information is acquired, and keywords of the voice channel information are extracted to start the robot. The robot's microphone captures speech that triggers the keyword start-up of the system; word-bank template matching is performed through the Baidu speech recognition interface of Baidu Intelligent Cloud, and when a preset keyword is captured, the robot interaction system is started or another human-computer interaction action is performed.
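As a minimal sketch of the keyword-spotting trigger, assuming the speech recognition interface has already returned a text transcript (the keyword list below is hypothetical, not from the patent):

```python
# Hypothetical wake-word lexicon; the real word bank is not given in the patent.
WAKE_KEYWORDS = {"hello robot", "come here", "help me"}

def should_wake(transcript, keywords=WAKE_KEYWORDS):
    """Return True when any preset keyword appears in the
    recognized transcript (simple template matching)."""
    text = transcript.lower()
    return any(kw in text for kw in keywords)
```

On a match the interaction system would be started; otherwise the robot keeps listening.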
In step S102, a behavior image of the elderly person is obtained in real time; the behavior image includes a gesture image and posture information, and both are subjected to image segmentation to form a gesture data set and a posture data set respectively.
However, most existing action data sets mainly capture the actions of middle-aged people and people of all ages, and the actions of the elderly differ markedly from these. Therefore, to improve the model's intention recognition rate for elderly actions, image data of the several gestures and postures used to operate the robot are collected from a number of elderly people (for example 10, 20 or more). After the elderly-person region is extracted from the collected image data, image segmentation, which is generally the precondition of image recognition, is performed on the hands and body posture: the images are segmented with the Otsu algorithm to form a gesture data set and a posture data set containing the gesture and posture characteristics of the elderly.
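For illustration, Otsu's method can be implemented directly from its definition (choose the threshold that maximizes the between-class variance of the grayscale histogram); the toy image below is ours, not from the patent's data set:

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: pick the threshold maximizing the
    between-class variance of the grayscale histogram."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                    # class-0 probability
    mu = np.cumsum(prob * np.arange(256))      # cumulative mean
    mu_t = mu[-1]                              # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    sigma_b2 = np.nan_to_num(sigma_b2)         # zero out empty classes
    return int(np.argmax(sigma_b2))

# Bimodal toy image: dark background (~20), bright hand region (~200).
img = np.full((32, 32), 20, dtype=np.uint8)
img[8:24, 8:24] = 200
t = otsu_threshold(img)
mask = img > t   # segmented gesture/posture region
```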
In step S103, the gesture data set is input into the trained neural network model for gesture recognition to obtain a gesture recognition probability set, and the posture data set is input into the trained hidden Markov model for posture recognition to obtain a posture recognition probability set.
The CNN plays an increasingly strong role in gesture recognition. Static gesture recognition on an interactive teaching platform has revealed the relation between deep-learning network training parameters and the model recognition rate. Gesture recognition based on a CNN solves the key problems of the traditional gesture recognition algorithms, such as poor recognition rate, low robustness and poor universality. A schematic diagram of the CNN recognition model framework is given in FIG. 2.
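The CNN itself is not reproduced here; the sketch below shows only the output stage that the later fusion steps depend on, a softmax turning the network's class logits into the gesture recognition probability set Cin (the logits are hypothetical):

```python
import numpy as np

def softmax(logits):
    """Convert CNN output logits into the gesture recognition
    probability set Cin (non-negative, sums to 1)."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

cin = softmax([2.0, 0.5, -1.0])  # hypothetical logits for 3 intentions
```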
A hidden Markov model (HMM) is a statistical model used to describe a Markov process with hidden, unknown parameters. Current HMM models can achieve a 96% average arm-motion recognition rate; the user's motion feature trajectories are input into an HMM-based classifier for training and recognition. After similar HMM models are built, model training is carried out with the elderly behavior data set to obtain HMM models capable of recognizing the behaviour intentions of the elderly. A schematic diagram of the HMM posture recognition model framework is given in FIG. 3.
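One common way to turn a bank of per-intention HMMs into a posture probability set Hin is to compare sequence log-likelihoods via the scaled forward algorithm; the sketch below uses toy two-state models of our own invention, not the patent's trained parameters:

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Scaled forward algorithm: log-likelihood of a discrete
    observation sequence under an HMM with initial probabilities pi,
    transition matrix A and emission matrix B[state, symbol]."""
    alpha = pi * B[:, obs[0]]
    logp = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        s = alpha.sum()
        logp += np.log(s)
        alpha = alpha / s
    return logp

# One toy HMM per intention; all parameter values are illustrative.
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1],
              [0.1, 0.9]])
B_wave = np.array([[0.8, 0.2],    # intention "wave": mostly symbol 0
                   [0.7, 0.3]])
B_point = np.array([[0.2, 0.8],   # intention "point": mostly symbol 1
                    [0.3, 0.7]])

obs = [0, 0, 1, 0]                # quantized posture feature sequence
lls = np.array([forward_loglik(obs, pi, A, B_wave),
                forward_loglik(obs, pi, A, B_point)])
hin = np.exp(lls - lls.max())
hin = hin / hin.sum()             # posture recognition probability set Hin
```

The observation sequence is dominated by symbol 0, so the "wave" model receives the larger probability.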
The neural network model is trained with the elderly behavior feature set to obtain a neural network model capable of recognizing gesture intentions; the hidden Markov model is trained with the elderly behavior feature set to obtain a hidden Markov model capable of recognizing posture intentions.
In step S104, intention fusion is performed on the gesture recognition probability set and the posture recognition probability set with a fusion algorithm based on a confusion matrix, and the F1 score under each intention classification is used to calculate the weight proportions of the two probability sets under different intentions; the final recognition intention is then determined.
An intention fusion model F = f(I, Cin, Hin) is built, wherein f is the intention fusion model; I is an intention weight matrix; Cin is the gesture recognition probability set; Hin is the posture recognition probability set.
Weight values are assigned to Cin to form an n×1-dimensional weight matrix Cconfi, and to Hin to form an n×1-dimensional weight matrix Hconfi.
The fuzzy transformation is performed on Cconfi and Hconfi to obtain the latest intention probability matrix C, wherein C = Cconfi∘Cin + Hconfi∘Hin and ∘ is called the composite evaluation operator.
In the invention, the weight matrices of the two sub-models under different intention classifications are calculated, and the recognition correctness of the two models under different intentions is evaluated by performing model evaluation with a multi-classification confusion matrix. The confusion matrix of a multi-classification task is a situation-analysis table that summarizes the predictions of a classification model in machine learning; records in the data set are summarized in matrix form according to two criteria, the true class and the class judged by the model. Because raw counts in a confusion matrix can make model quality hard to measure when the amount of data is large, four secondary indices are derived from the basic statistics: accuracy, precision, recall and specificity. These convert the counts into ratios between 0 and 1 and so facilitate standardized measurement. Expanding on these four indices yields a further, tertiary index called the F1 score, an index used in statistics to measure the accuracy of binary classification models. It balances the precision and the recall of the model: the F1 score is the harmonic mean of the model's precision and recall, with the calculation formula F1 = 2 × precision × recall / (precision + recall).
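Per-class precision, recall and F1 can be computed straight from a multi-class confusion matrix; a small sketch (the matrix values below are invented):

```python
import numpy as np

def per_class_f1(cm):
    """cm[i, j]: number of samples with true class i predicted as class j.
    Returns (precision, recall, f1) per class; F1 = 2*P*R / (P + R)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                     # true positives per class
    precision = tp / cm.sum(axis=0)      # over predicted-class columns
    recall = tp / cm.sum(axis=1)         # over true-class rows
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

cm = np.array([[8, 2],
               [1, 9]])
p, r, f1 = per_class_f1(cm)
```

The resulting per-class F1 vector is exactly the kind of per-intention weight matrix the fusion step uses.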
When machine-learning methods are applied to multi-class problems, the F1-score is often used as the final measure; this is consistent with its use herein as the weight of each intention classification for model fusion.
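As a concrete illustration (not part of the patent text), the four secondary indices and the per-class F1-score can be computed directly from a multi-class confusion matrix. The 4×4 matrix below is an invented toy example for the four control intentions, not the experimental data of embodiment 1:

```python
import numpy as np

def per_class_metrics(cm):
    """Compute precision, recall (sensitivity), specificity, accuracy and F1
    for each class of a multi-class confusion matrix (rows = true class,
    columns = predicted class)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp           # predicted as class k, actually another
    fn = cm.sum(axis=1) - tp           # actually class k, predicted another
    tn = cm.sum() - tp - fp - fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / cm.sum()
    f1 = 2 * precision * recall / (precision + recall)   # F1 = 2PR/(P+R)
    return precision, recall, specificity, accuracy, f1

# Toy confusion matrix for 4 intents: advance, stop, turn left, turn right.
cm = [[45, 2, 2, 1],
      [3, 44, 2, 1],
      [1, 2, 46, 1],
      [2, 1, 2, 45]]
p, r, s, a, f1 = per_class_metrics(cm)
print(np.round(f1, 3))
```

The resulting `f1` vector is exactly what the method uses as the per-intent weight values of a sub-model.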
The F1-score of the CNN sub-model under each intention, denoted here F1c_i, is assigned as the weight value under each intention classification of Cin; the F1-score of the HMM sub-model, denoted F1h_i, is assigned as the weight value under each intention classification of Hin, where i = 1, 2, …, n.
Based on F1c_i, the n×1 weight matrix of Cin is obtained as Cconfi = [F1c_1, F1c_2, …, F1c_n]^T; based on F1h_i, the n×1 weight matrix of Hin is obtained as Hconfi = [F1h_1, F1h_2, …, F1h_n]^T.
Cin, hin, cconfi and Hconfi are subjected to fuzzy change to obtain a one-dimensional matrix [ lambda ] 1 ,λ 2 ,…,λ n ] T (ii) a Selecting the maximum value gamma in the matrix i (ii) a The subscript i intent ultimately identifies the intent for the user; wherein the fusion process is as follows: [ lambda ] 1 ,λ 2 ,…,λ n ] T = Cin × Cconfi + Hin × Hconfi. Fig. 4 is a schematic diagram of a dual-model decision-level fusion algorithm according to embodiment 1 of the present invention.
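The decision-level fusion step can be sketched as follows, interpreting the composite evaluation operator as element-wise weighting followed by an argmax. All probability and weight values below are hypothetical placeholders, since the patent's actual F1-score table is not reproduced here:

```python
import numpy as np

# Hypothetical probability sets from the two sub-models for one observation,
# over n = 4 intents: advance, stop, turn left, turn right.
Cin = np.array([0.70, 0.10, 0.15, 0.05])   # gesture (CNN) probabilities
Hin = np.array([0.40, 0.35, 0.15, 0.10])   # posture (HMM) probabilities

# Per-intent weights taken from each sub-model's F1-scores (illustrative).
Cconfi = np.array([0.92, 0.88, 0.90, 0.85])
Hconfi = np.array([0.80, 0.91, 0.84, 0.87])

# Fusion: element-wise weighting of each probability set, then summation,
# giving the one-dimensional matrix [lambda_1, ..., lambda_n]^T.
lam = Cin * Cconfi + Hin * Hconfi
final_intent = int(np.argmax(lam))          # subscript i of the maximum value
print(lam, final_intent)
```

With these placeholder numbers both sub-models lean toward "advance", so the fused decision selects intent 0.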
In the present invention, if the robot does not complete the specified action according to the final intention: target recognition is started, and the distance to a specific obstacle is judged using the image acquisition equipment; after the robot reaches the designated target area, the target object is captured through voice interaction, its initial coordinates being (x, y); the robot moves the target object into the video frame, obtaining coordinates (x1, y1), so the transformation process of the target object is (x → x1, y → y1); after the target object is located, the grasping operation is performed; after grasping, the intention of the elderly user is captured in real time and the robot moves accordingly. After the target object is handed to the elderly user, the interaction ends.
Fig. 5 is a diagram of the fusion model architecture for deep learning and statistical probability provided in embodiment 1 of the present invention. The method first obtains the raw voice channel information and sends it to a preprocessing layer, which wakes the voice system and preprocesses the image information. The preprocessed information is sent to a recognition layer comprising a convolutional neural network (CNN) trained with an elderly-behavior data set (EIDS) and a hidden Markov model (HMM). Two intention probability sets are obtained in real time from the two sub-models and provided to a model fusion layer, where the real intention of the user is captured by the model fusion algorithm and passed to an interactive behavior layer, which performs the human-robot interaction so that the robot completes the operation and meets the user's needs. The system feeds back the user's intention and performs the interactive action through the Pepper robot.
In embodiment 1 of the present invention, the intentions of the elderly are divided into four robot-control intentions: advance (I), stop (II), turn left (III) and turn right (IV).
Each column of the confusion matrix represents a predicted category, and the column total is the number of samples predicted as that category; each row represents the true category of the data. In the experiment, 4 intention classifications were used with 200 samples in total, divided into 4 categories of 50 samples each, and a multi-class confusion matrix was built for each of the two sub-models. Fig. 6 is a schematic diagram of the multi-class-intention confusion matrices corresponding to the CNN and the HMM in embodiment 1. The four indices of the two confusion matrices, namely accuracy, precision, recall (sensitivity) and specificity, are then calculated, and the formula F1 = 2PR/(P + R) is further used to calculate the F1-score under each confusion matrix as the weight value under each intention of the sub-models. The following table shows the F1-score of each intention of the sub-models.
Embodiment 2
Embodiment 2 of the present invention provides an intention understanding system for an elderly accompanying robot; fig. 7 is a schematic diagram of this system. The system comprises an acquisition module, a training module and a building calculation module.
The acquisition module is used for acquiring behavior images of the elderly in real time; the behavior images comprise gesture images and posture information, which are subjected to image segmentation to form a gesture data set and a posture data set respectively.
The training module inputs the gesture data set into the trained neural network model for gesture recognition to obtain a gesture recognition probability set, and inputs the posture data set into the trained hidden Markov model for posture recognition to obtain a posture recognition probability set.
The building calculation module is used for performing intention fusion on the gesture recognition probability set and the posture recognition probability set with a fusion algorithm based on the confusion matrix, using the F1-score under different intention classifications to calculate the weight proportions of the two probability sets under different intentions during fusion, and then determining the final recognition intention.
The system also includes a start module; the starting module is used for acquiring the voice channel information and extracting keywords of the voice channel information to start the robot.
The execution process of the training module comprises the following steps: acquiring gesture image samples and posture information samples of a plurality of elderly people operating the robot; segmenting both the gesture image samples and the posture information samples with the Otsu algorithm to form an elderly-behavior feature set; training the neural network model with the elderly-behavior feature set to obtain a neural network model capable of recognizing gesture intentions; and training the hidden Markov model with the elderly-behavior feature set to obtain a hidden Markov model capable of recognizing posture intentions.
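As an illustrative sketch of the segmentation step (not the patent's actual implementation), Otsu's method selects the threshold that maximizes the between-class variance of a grayscale histogram; the synthetic bimodal image below stands in for a real hand/background frame:

```python
import numpy as np

def otsu_threshold(gray):
    """Return the Otsu threshold of an 8-bit grayscale image by maximizing
    the between-class variance over all candidate thresholds."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    cum_w = np.cumsum(prob)                     # class-0 weight w0(t)
    cum_mu = np.cumsum(prob * np.arange(256))   # cumulative mean
    mu_total = cum_mu[-1]
    best_t, best_var = 0, -1.0
    for t in range(255):
        w0, w1 = cum_w[t], 1.0 - cum_w[t]
        if w0 == 0 or w1 == 0:
            continue
        mu0 = cum_mu[t] / w0
        mu1 = (mu_total - cum_mu[t]) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Synthetic bimodal "hand vs background" image: dark background, bright hand.
rng = np.random.default_rng(0)
img = np.clip(np.concatenate([rng.normal(60, 10, 5000),
                              rng.normal(180, 10, 5000)]), 0, 255).astype(np.uint8)
t = otsu_threshold(img)
mask = img > t   # foreground (hand-region) mask
```

In practice a library routine such as OpenCV's `cv2.threshold(..., cv2.THRESH_OTSU)` would typically replace this hand-rolled loop.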
The building calculation module comprises a building module and a calculation module. The process of the building module is as follows: building an intention fusion model F = f(I, Cin, Hin), where f is the intention fusion model, I is the intention weight matrix, Cin is the gesture recognition probability set, and Hin is the posture recognition probability set; assigning weight values to Cin to form an n×1 weight matrix Cconfi, and assigning weight values to Hin to form an n×1 weight matrix Hconfi; applying a fuzzy transformation to Cconfi and Hconfi to obtain the updated intention probability matrix C, where C = Cconfi ∘ Cin + Hconfi ∘ Hin and ∘ is called the composite evaluation operator. The process of the calculation module is as follows: assigning the F1-score of the CNN sub-model under each intention, denoted F1c_i, as the weight value under each intention classification of Cin, and the F1-score of the HMM sub-model, denoted F1h_i, as the weight value under each intention classification of Hin, where i = 1, 2, …, n; based on F1c_i, obtaining the n×1 weight matrix Cconfi = [F1c_1, F1c_2, …, F1c_n]^T of Cin; based on F1h_i, obtaining the n×1 weight matrix Hconfi = [F1h_1, F1h_2, …, F1h_n]^T of Hin; applying a fuzzy transformation to Cin, Hin, Cconfi and Hconfi to obtain a one-dimensional matrix [λ1, λ2, …, λn]^T; selecting the maximum value λi in the matrix, the intention i being the final recognition intention of the user; where [λ1, λ2, …, λn]^T = Cin × Cconfi + Hin × Hconfi.
The invention improves the intention understanding rate of current elderly accompanying robot systems and increases the elderly's satisfaction with social accompanying robots.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, the scope of the present invention is not limited thereto. Various modifications and alterations will occur to those skilled in the art based on the foregoing description; it is neither necessary nor possible to exhaust all embodiments here. On the basis of the technical solution of the present invention, the various modifications or variations that those skilled in the art can make without creative effort still fall within the scope of the present invention.
Claims (7)
1. An intention understanding method for an elderly accompanying robot is characterized by comprising the following steps:
acquiring a behavior image of the elderly in real time, wherein the behavior image comprises a gesture image and posture information, and performing image segmentation on the gesture image and the posture information to respectively form a gesture data set and a posture data set;
inputting the gesture data set into a trained neural network model for gesture recognition to obtain a gesture recognition probability set, and inputting the posture data set into a trained hidden Markov model for posture recognition to obtain a posture recognition probability set;
performing intention fusion on the gesture recognition probability set and the posture recognition probability set with a fusion algorithm based on the confusion matrix, and using the F1-score under different intention classifications to calculate the weight proportions of the two probability sets under different intentions during fusion; further determining a final recognition intention; the process of performing intention fusion on the gesture recognition probability set and the posture recognition probability set with the confusion-matrix-based fusion algorithm is as follows:
building an intention fusion model F = F (I, cin, hin); wherein f is the model of the intent fusion; i is an intention weight matrix; cin is a gesture recognition probability set; hin is a posture recognition probability set;
assigning weight values to Cin to form an n×1 weight matrix Cconfi, and assigning weight values to Hin to form an n×1 weight matrix Hconfi;
applying a fuzzy transformation to Cconfi and Hconfi to obtain the updated intention probability matrix C; wherein C = Cconfi ∘ Cin + Hconfi ∘ Hin, and ∘ is called a composite evaluation operator;
the method for using the F1-score under different intention classifications to calculate the weight proportions of the gesture recognition probability set and the posture recognition probability set under different intentions during fusion comprises the following steps:
assigning the F1-score of the first sub-model under each intention, denoted F1c_i, as the weight value under each intention classification of Cin, and the F1-score of the second sub-model, denoted F1h_i, as the weight value under each intention classification of Hin, wherein i = 1, 2, …, n;
based on F1c_i, obtaining the n×1 weight matrix Cconfi = [F1c_1, F1c_2, …, F1c_n]^T of Cin; based on F1h_i, obtaining the n×1 weight matrix Hconfi = [F1h_1, F1h_2, …, F1h_n]^T of Hin;
applying a fuzzy transformation to Cin, Hin, Cconfi and Hconfi to obtain a one-dimensional matrix [λ1, λ2, …, λn]^T; selecting the maximum value λi in the matrix; the intention i being the final recognition intention of the user; wherein [λ1, λ2, …, λn]^T = Cin × Cconfi + Hin × Hconfi.
2. The method for understanding the intention of the elderly accompanying robot as claimed in claim 1, further comprising obtaining voice channel information and extracting keywords of the voice channel information to start the robot before the real-time obtaining of the behavior image of the elderly.
3. The method for understanding the intention of an elderly accompanying robot as claimed in claim 1, wherein the method for training the neural network model and the hidden Markov model comprises:
acquiring gesture image samples and posture information samples of a plurality of elderly people operating the robot; segmenting both the gesture image samples and the posture information samples with the Otsu algorithm to form an elderly-behavior feature set;
training the neural network model with the elderly-behavior feature set to obtain a neural network model capable of recognizing gesture intentions; and training the hidden Markov model with the elderly-behavior feature set to obtain a hidden Markov model capable of recognizing posture intentions.
4. The intent understanding method for an elderly accompanying robot as claimed in claim 1, wherein when the robot does not complete the designated action according to the final intent:
starting target identification, and judging the distance of a specific obstacle by adopting image acquisition equipment;
after the robot reaches the designated target area, the target object is captured through voice interaction, its initial coordinates being (x, y); the robot moves the target object into the video frame to obtain coordinates (x1, y1); the transformation process of the target object is (x → x1, y → y1);
after the target object is located, the grasping operation is performed; after grasping, the intention of the elderly is captured in real time, and the robot is moved.
5. An intention understanding system for an elderly accompanying robot is characterized by comprising an acquisition module, a training module and a building calculation module;
the acquisition module is used for acquiring a behavior image of the elderly in real time, wherein the behavior image comprises a gesture image and posture information, and the gesture image and the posture information are subjected to image segmentation to respectively form a gesture data set and a posture data set;
the training module inputs the gesture data set into a trained neural network model for gesture recognition to obtain a gesture recognition probability set, and inputs the posture data set into a trained hidden Markov model for posture recognition to obtain a posture recognition probability set;
the building calculation module is used for performing intention fusion on the gesture recognition probability set and the posture recognition probability set with a fusion algorithm based on the confusion matrix, and using the F1-score under different intention classifications to calculate the weight proportions of the two probability sets under different intentions during fusion; further determining a final recognition intention; the building calculation module comprises a building module and a calculation module;
the process of the building module comprises the following steps: building an intention fusion model F = f(I, Cin, Hin), wherein f is the intention fusion model, I is the intention weight matrix, Cin is the gesture recognition probability set, and Hin is the posture recognition probability set; assigning weight values to Cin to form an n×1 weight matrix Cconfi, and assigning weight values to Hin to form an n×1 weight matrix Hconfi; applying a fuzzy transformation to Cconfi and Hconfi to obtain the updated intention probability matrix C; wherein C = Cconfi ∘ Cin + Hconfi ∘ Hin, and ∘ is called a composite evaluation operator;
the process of the calculation module is as follows: assigning the F1-score of the first sub-model under each intention, denoted F1c_i, as the weight value under each intention classification of Cin, and the F1-score of the second sub-model, denoted F1h_i, as the weight value under each intention classification of Hin, wherein i = 1, 2, …, n; based on F1c_i, obtaining the n×1 weight matrix Cconfi = [F1c_1, F1c_2, …, F1c_n]^T of Cin; based on F1h_i, obtaining the n×1 weight matrix Hconfi = [F1h_1, F1h_2, …, F1h_n]^T of Hin; applying a fuzzy transformation to Cin, Hin, Cconfi and Hconfi to obtain a one-dimensional matrix [λ1, λ2, …, λn]^T; selecting the maximum value λi in the matrix; the intention i being the final recognition intention of the user; wherein [λ1, λ2, …, λn]^T = Cin × Cconfi + Hin × Hconfi.
6. The elderly accompanying robot oriented intention understanding system of claim 5, further comprising a starting module;
the starting module is used for acquiring voice channel information and extracting keywords of the voice channel information to start the robot.
7. The intention understanding system for an elderly accompanying robot as claimed in claim 5, wherein the execution process of the training module is as follows:
acquiring gesture image samples and posture information samples of a plurality of elderly people operating the robot; segmenting both the gesture image samples and the posture information samples with the Otsu algorithm to form an elderly-behavior feature set;
training the neural network model with the elderly-behavior feature set to obtain a neural network model capable of recognizing gesture intentions; and training the hidden Markov model with the elderly-behavior feature set to obtain a hidden Markov model capable of recognizing posture intentions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010970662.7A CN112101219B (en) | 2020-09-15 | 2020-09-15 | Intention understanding method and system for elderly accompanying robot |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112101219A CN112101219A (en) | 2020-12-18 |
CN112101219B true CN112101219B (en) | 2022-11-04 |
Family
ID=73759249
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010970662.7A Active CN112101219B (en) | 2020-09-15 | 2020-09-15 | Intention understanding method and system for elderly accompanying robot |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112101219B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112684711B (en) * | 2020-12-24 | 2022-10-11 | 青岛理工大学 | Interactive recognition method for human behavior and intention |
CN112766041B (en) * | 2020-12-25 | 2022-04-22 | 北京理工大学 | Method for identifying hand washing action of senile dementia patient based on inertial sensing signal |
CN113780750B (en) * | 2021-08-18 | 2024-03-01 | 同济大学 | Medical risk assessment method and device based on medical image segmentation |
CN113705440B (en) * | 2021-08-27 | 2023-09-01 | 华中师范大学 | Head posture estimation method and system for visual understanding of educational robot |
CN113848790A (en) * | 2021-09-28 | 2021-12-28 | 德州学院 | Intelligent nursing type robot system and control method thereof |
CN114092967A (en) * | 2021-11-19 | 2022-02-25 | 济南大学 | Real-time multi-mode accompanying robot intention understanding method and system |
CN116028880B (en) * | 2023-02-07 | 2023-07-04 | 支付宝(杭州)信息技术有限公司 | Method for training behavior intention recognition model, behavior intention recognition method and device |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103593680A (en) * | 2013-11-19 | 2014-02-19 | 南京大学 | Dynamic hand gesture recognition method based on self incremental learning of hidden Markov model |
CN105787471A (en) * | 2016-03-25 | 2016-07-20 | 南京邮电大学 | Gesture identification method applied to control of mobile service robot for elder and disabled |
CN108986801A (en) * | 2017-06-02 | 2018-12-11 | 腾讯科技(深圳)有限公司 | A kind of man-machine interaction method, device and human-computer interaction terminal |
WO2019204186A1 (en) * | 2018-04-18 | 2019-10-24 | Sony Interactive Entertainment Inc. | Integrated understanding of user characteristics by multimodal processing |
CN110554774A (en) * | 2019-07-22 | 2019-12-10 | 济南大学 | AR-oriented navigation type interactive normal form system |
CN110717381A (en) * | 2019-08-28 | 2020-01-21 | 北京航空航天大学 | Human intention understanding method facing human-computer cooperation and based on deep stacking Bi-LSTM |
CN111222341A (en) * | 2020-01-16 | 2020-06-02 | 中国平安人寿保险股份有限公司 | Method, device, equipment and storage medium for training hidden Markov model |
CN111582108A (en) * | 2020-04-28 | 2020-08-25 | 河北工业大学 | Gait recognition and intention perception method |
CN111596767A (en) * | 2020-05-27 | 2020-08-28 | 广州市大湾区虚拟现实研究院 | Gesture capturing method and device based on virtual reality |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190272764A1 (en) * | 2018-03-03 | 2019-09-05 | Act, Inc. | Multidimensional assessment scoring using machine learning |
Non-Patent Citations (5)
Title |
---|
A Method of Fusing Gesture and Speech for Human-robot Interaction; Junhong Meng et al.; ICCDE 2020; 2020-03-07 *
A Multimodal Framework Based on Integration of Cortical and Muscular Activities for Decoding Human Intentions About Lower Limb Motions; Chengkun Cui et al.; IEEE Transactions on Biomedical Circuits and Systems; 2017-08-31 *
Continuous action segmentation and recognition using hybrid convolutional neural network-hidden Markov model model; Jun Lei et al.; IET Computer Vision; 2016-12-31 *
Data fusion methods in multimodal human computer dialog; Ming-Hao Yang et al.; Virtual Reality & Intelligent Hardware; 2019-12-31 *
Research on a contact-based human-robot collaboration intention understanding method based on a GA-BP neural network; Zhang Rui et al.; Modular Machine Tool & Automatic Manufacturing Technique; 2019-11 (No. 11) *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112101219B (en) | Intention understanding method and system for elderly accompanying robot | |
CN104077579B (en) | Facial expression recognition method based on expert system | |
CN105739688A (en) | Man-machine interaction method and device based on emotion system, and man-machine interaction system | |
CN109101108B (en) | Method and system for optimizing human-computer interaction interface of intelligent cabin based on three decisions | |
CN110781829A (en) | Light-weight deep learning intelligent business hall face recognition method | |
CN111402928B (en) | Attention-based speech emotion state evaluation method, device, medium and equipment | |
US20230206928A1 (en) | Audio processing method and apparatus | |
CN107016046A (en) | The intelligent robot dialogue method and system of view-based access control model displaying | |
CN111666845B (en) | Small sample deep learning multi-mode sign language recognition method based on key frame sampling | |
CN104156729B (en) | A kind of classroom demographic method | |
CN111274978B (en) | Micro expression recognition method and device | |
CN112101243A (en) | Human body action recognition method based on key posture and DTW | |
CN111128157A (en) | Wake-up-free voice recognition control method for intelligent household appliance, computer readable storage medium and air conditioner | |
CN111444488A (en) | Identity authentication method based on dynamic gesture | |
CN111428666A (en) | Intelligent family accompanying robot system and method based on rapid face detection | |
CN111128240B (en) | Voice emotion recognition method based on anti-semantic-erasure | |
CN114495211A (en) | Micro-expression identification method, system and computer medium based on graph convolution network | |
CN114093028A (en) | Human-computer cooperation method and system based on intention analysis and robot | |
WO2024001539A1 (en) | Speaking state recognition method and apparatus, model training method and apparatus, vehicle, medium, computer program and computer program product | |
CN112580527A (en) | Facial expression recognition method based on convolution long-term and short-term memory network | |
CN111339878A (en) | Eye movement data-based correction type real-time emotion recognition method and system | |
CN111191510A (en) | Relation network-based remote sensing image small sample target identification method in complex scene | |
CN111898473B (en) | Driver state real-time monitoring method based on deep learning | |
CN114663910A (en) | Multi-mode learning state analysis system | |
CN109977777B (en) | Novel RF-Net model-based gesture recognition method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||