CN112101219B - Intention understanding method and system for elderly accompanying robot - Google Patents

Intention understanding method and system for elderly accompanying robot

Info

Publication number
CN112101219B
Authority
CN
China
Prior art keywords
gesture
intention
gesture recognition
cin
probability set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010970662.7A
Other languages
Chinese (zh)
Other versions
CN112101219A (en)
Inventor
冯志全
豆少松
郭庆北
杨晓晖
徐涛
田京兰
范雪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Jinan
Original Assignee
University of Jinan
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Jinan filed Critical University of Jinan
Priority to CN202010970662.7A priority Critical patent/CN112101219B/en
Publication of CN112101219A publication Critical patent/CN112101219A/en
Application granted granted Critical
Publication of CN112101219B publication Critical patent/CN112101219B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/29 - Graphical models, e.g. Bayesian networks
    • G06F18/295 - Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G06V40/28 - Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G10L2015/088 - Word spotting
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 - Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Manipulator (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an intention understanding method and system for an elderly accompanying robot. The method comprises the following steps: acquiring gesture images and posture information of the elderly person in real time, and performing image segmentation on both to form a gesture data set and a posture data set respectively; inputting the gesture data set into a trained neural network model for gesture recognition to obtain a gesture recognition probability set, and inputting the posture data set into a trained hidden Markov model for posture recognition to obtain a posture recognition probability set; performing intention fusion on the gesture recognition probability set and the posture recognition probability set with a fusion algorithm based on the confusion matrix, in which the F1-score under each intention classification is used to calculate the weight proportions of the two probability sets under different intentions when they are fused; and then determining the final recognized intention. Based on the method, an intention understanding system is also provided. The invention improves the intention understanding rate of the elderly accompanying robot system and the satisfaction of the elderly with the social accompanying robot.

Description

Intention understanding method and system for elderly accompanying robot
Technical Field
The invention belongs to the technical field of elderly accompanying robots, and particularly relates to an intention understanding method and system for an elderly accompanying robot.
Background
Population aging has become a problem in many countries around the world, and adult children who are busy with work find it difficult to give their parents the necessary care at all times. Meanwhile, field research in nursing homes and studies of accompanying robots by Sari Merilampi and others show that robot companionship is increasingly accepted by the elderly, and elderly accompanying robots already provide many services. However, the recognition rate and intention understanding rate of existing accompanying robot systems for the elderly still need to be improved; in particular, the movements of the elderly have many distinctive characteristics, which increases the interaction burden when the elderly use an accompanying robot and easily causes negative emotions. The invention therefore provides a model fusion algorithm (SDFM) that effectively improves the robot's understanding rate of the behavioral intentions of the elderly, and applies the human-computer interaction design to real scenes.
Deep learning models and statistical models each have advantages and disadvantages in pattern recognition. A statistical method offers high judgment efficiency, whereas the judgment efficiency of a neural network is slower; a statistical model can be established entirely from theory, and its recognition effect is clearly better when the data volume is small or training data is difficult to collect, which makes it well suited to the training of posture information. The structure and algorithm of a neural network, by contrast, must be designed from the designer's experience; it can guarantee a high recognition effect when the data volume is large enough and is therefore suitable for gesture recognition, where information is easy to collect, but its recognition effect is often unsatisfactory when the data volume is small or training data is hard to collect, and the success of such a system involves a considerable element of chance.
Disclosure of Invention
In order to solve the above technical problems, the invention provides an intention understanding method and system for an elderly accompanying robot. The method adopts a fusion scheme that fuses an intention recognition result set with a weight matrix: the F1-score under each classification in the confusion matrix of each sub-model's recognition results is used as a weight value to form the weight matrix, and a fuzzy evaluation operator is established with reference to fuzzy evaluation theory to fuse the recognition results of the two models, thereby improving the accuracy and sensitivity of recognizing the intentions of the elderly.
In order to achieve the purpose, the invention adopts the following technical scheme:
an intention understanding method for an elderly accompanying robot comprises the following steps:
acquiring a behavior image of the elderly person in real time, wherein the behavior image comprises a gesture image and posture information, and performing image segmentation on the gesture image and the posture information to form a gesture data set and a posture data set respectively;
inputting the gesture data set into a trained neural network model for gesture recognition to obtain a gesture recognition probability set, and inputting the posture data set into a trained hidden Markov model for posture recognition to obtain a posture recognition probability set;
performing intention fusion on the gesture recognition probability set and the posture recognition probability set with a fusion algorithm based on the confusion matrix, and calculating the weight proportions of the two probability sets under different intentions when they are fused by adopting the F1-score under different intention classifications; and then determining the final recognized intention.
Further, before the behavior image of the elderly person is acquired in real time, voice channel information is acquired, and keywords are extracted from the voice channel information to start the robot.
Further, the method for training the neural network model and the hidden Markov model comprises:
acquiring gesture image samples and posture information samples of a plurality of elderly people operating the robot; segmenting both the gesture image samples and the posture information samples with the Otsu algorithm to form an elderly behavior feature set;
training the neural network model with the elderly behavior feature set to obtain a neural network model capable of recognizing gesture intentions; and training the hidden Markov model with the elderly behavior feature set to obtain a hidden Markov model capable of recognizing posture intentions.
Further, the process of performing intention fusion on the gesture recognition probability set and the posture recognition probability set with the fusion algorithm based on the confusion matrix is as follows:
building an intention fusion model F = f(I, Cin, Hin); wherein f is the intention fusion model; I is the intention weight matrix; Cin is the gesture recognition probability set; Hin is the posture recognition probability set;
assigning weight values to Cin to form an n×1-dimensional weight matrix Cconfi, and assigning weight values to Hin to form an n×1-dimensional weight matrix Hconfi;
carrying out a fuzzy transformation on Cconfi and Hconfi to obtain the latest intention probability matrix C, wherein C = Cconfi ∘ Cin + Hconfi ∘ Hin and ∘ is called the composite evaluation operator.
Further, the method for calculating the weight proportions of the gesture recognition probability set and the posture recognition probability set under different intentions, using the F1-score under each intention classification, is as follows:
the F1-score of the gesture sub-model under the i-th intention classification, F1_Ci, is assigned as the weight value of Cin under that intention classification, and the F1-score of the posture sub-model under the i-th intention classification, F1_Hi, is assigned as the weight value of Hin under that intention classification, wherein F1 = 2 × Precision × Recall / (Precision + Recall);
from F1_C1, …, F1_Cn the n×1-dimensional weight matrix of Cin is obtained as Cconfi = [F1_C1, F1_C2, …, F1_Cn]^T, and from F1_H1, …, F1_Hn the n×1-dimensional weight matrix of Hin is obtained as Hconfi = [F1_H1, F1_H2, …, F1_Hn]^T;
Cin, Hin, Cconfi and Hconfi are subjected to the fuzzy transformation to obtain a one-dimensional matrix [λ1, λ2, …, λn]^T = Cin × Cconfi + Hin × Hconfi; the maximum value λi in this matrix is selected, and the i-th intention is the final recognized intention of the user.
Further, when the robot does not complete the specified action according to the final intention:
target recognition is started, and image acquisition equipment is used to judge the distance to the specific obstacle;
after the designated target area is reached, the target object is captured through voice interaction, and the initial coordinates of the target object are (x, y); the robot moves until the target object is in the video frame, obtaining coordinates (x1, y1), so that the transformation of the target object is (x → x1, y → y1);
after the target object is located, a grasping operation is performed; after the grasp is completed, the intention of the elderly person is captured in real time and the robot moves accordingly.
The invention also provides an intention understanding system for the elderly accompanying robot, which comprises an acquisition module, a training module and a building calculation module;
the acquisition module is used for acquiring a behavior image of the elderly person in real time, wherein the behavior image comprises a gesture image and posture information, and the gesture image and the posture information are subjected to image segmentation to form a gesture data set and a posture data set respectively;
the training module inputs the gesture data set into a trained neural network model for gesture recognition to obtain a gesture recognition probability set, and inputs the posture data set into a trained hidden Markov model for posture recognition to obtain a posture recognition probability set;
the building calculation module is used for performing intention fusion on the gesture recognition probability set and the posture recognition probability set with a fusion algorithm based on the confusion matrix, and for calculating the weight proportions of the two probability sets under different intentions when they are fused by adopting the F1-score under different intention classifications; and then determining the final recognized intention.
Further, the system also comprises a starting module;
the starting module is used for acquiring voice channel information and extracting keywords of the voice channel information to start the robot.
Further, the execution process of the training module is as follows:
acquiring gesture image samples and posture information samples of a plurality of elderly people operating the robot; segmenting both the gesture image samples and the posture information samples with the Otsu algorithm to form an elderly behavior feature set;
training the neural network model with the elderly behavior feature set to obtain a neural network model capable of recognizing gesture intentions; and training the hidden Markov model with the elderly behavior feature set to obtain a hidden Markov model capable of recognizing posture intentions.
Further, the building calculation module comprises a building module and a calculation module;
the process of the building module is as follows: building an intention fusion model F = f(I, Cin, Hin), wherein f is the intention fusion model, I is the intention weight matrix, Cin is the gesture recognition probability set and Hin is the posture recognition probability set; assigning weight values to Cin to form an n×1-dimensional weight matrix Cconfi, and assigning weight values to Hin to form an n×1-dimensional weight matrix Hconfi; carrying out a fuzzy transformation on Cconfi and Hconfi to obtain the latest intention probability matrix C, wherein C = Cconfi ∘ Cin + Hconfi ∘ Hin and ∘ is called the composite evaluation operator;
the process of the calculation module is as follows: the F1-score of the gesture sub-model under the i-th intention classification, F1_Ci, is assigned as the weight value of Cin under that intention classification, and the F1-score of the posture sub-model under the i-th intention classification, F1_Hi, is assigned as the weight value of Hin under that intention classification, wherein F1 = 2 × Precision × Recall / (Precision + Recall); from F1_C1, …, F1_Cn the n×1-dimensional weight matrix of Cin is obtained as Cconfi = [F1_C1, F1_C2, …, F1_Cn]^T, and from F1_H1, …, F1_Hn the n×1-dimensional weight matrix of Hin is obtained as Hconfi = [F1_H1, F1_H2, …, F1_Hn]^T; Cin, Hin, Cconfi and Hconfi are subjected to the fuzzy transformation to obtain a one-dimensional matrix [λ1, λ2, …, λn]^T = Cin × Cconfi + Hin × Hconfi; the maximum value λi in this matrix is selected, and the i-th intention is the final recognized intention of the user.
The effects described in this summary are only those of the embodiments, not all effects of the invention; one of the above technical solutions has the following advantages or beneficial effects:
the invention provides an intention understanding method and system for an elderly accompanying robot, wherein the method comprises the following steps: acquiring a behavior image of the old in real time, wherein the behavior image comprises a gesture image and gesture information, and the gesture image and the gesture information are subjected to image segmentation to respectively form a gesture data set and a gesture data set; inputting the gesture data set into a trained neural network model for gesture recognition to obtain a gesture recognition probability set, and inputting the gesture data set into a trained hidden Markov model for gesture recognition to obtain a gesture recognition probability set; performing intention fusion on the gesture recognition probability set and the gesture recognition probability set based on a fusion algorithm of a confusion matrix, and calculating weight proportions under different intentions when the gesture recognition probability set and the gesture recognition probability set are fused by adopting an F1score under different intention classifications; and then determining the final recognition intention. Based on the intention understanding method for the old accompanying robot provided by the invention, an intention understanding system for the old accompanying robot is also provided. The invention provides a novel gesture recognition and posture recognition method based on deep learning, and solves the key problems of poor recognition rate, low robustness, poor universality and the like in the traditional gesture recognition algorithm and posture recognition algorithm. Calculating weight proportions under different intentions when the gesture recognition probability set and the gesture recognition probability set are fused by adopting the F1 score; and then final recognition intentions are determined, the current intention understanding rate of the old accompanying robot system is improved, and the use satisfaction of the old to the social accompanying robot is improved.
Drawings
Fig. 1 is a flowchart of an intention understanding method for an elderly accompanying robot according to embodiment 1 of the present invention;
FIG. 2 is a schematic diagram of a CNN neural network recognition model framework;
FIG. 3 is a schematic diagram of the HMM posture recognition model framework;
FIG. 4 is a schematic diagram of a bimodal decision-level fusion algorithm according to embodiment 1 of the present invention;
fig. 5 is a diagram of a fusion model architecture of deep learning and statistical probability provided in embodiment 1 of the present invention;
FIG. 6 is a confusion matrix of multiple classification intentions corresponding to CNN and HMM, respectively, in embodiment 1 of the present invention;
fig. 7 is a schematic view of an intention understanding system for an elderly accompanying robot in embodiment 2 of the present invention.
Detailed Description
In order to clearly explain the technical features of the present invention, the following detailed description of the present invention is provided with reference to the accompanying drawings. The following disclosure provides many different embodiments, or examples, for implementing different features of the invention. To simplify the disclosure of the present invention, specific example components and arrangements are described below. Moreover, the present disclosure may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. It should be noted that the components illustrated in the figures are not necessarily drawn to scale. Descriptions of well-known components and processing techniques and procedures are omitted so as to not unnecessarily limit the invention.
Example 1
The intention understanding method for the elderly accompanying robot provided by embodiment 1 of the invention adopts a fusion scheme that fuses an intention recognition result set with a weight matrix: the F1-score under each classification in the confusion matrix of each sub-model's recognition results is used as a weight value to form the weight matrix, and a fuzzy evaluation operator is established with reference to fuzzy evaluation theory to fuse the recognition results of the two models. Fig. 1 shows a flowchart of the intention understanding method for the elderly accompanying robot in embodiment 1 of the present invention.
In step S101, voice channel information is acquired, and keywords are extracted from the voice channel information to start the robot. The robot's microphone captures speech that triggers the keyword-based start-up of the system; the Baidu speech recognition interface of the Baidu intelligent cloud is used to match word-stock templates, and when a preset keyword is captured, the robot interaction system is started or another human-computer interaction action is performed.
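As a minimal sketch of this wake-up step (the keyword set and the recognize_speech and start_interaction callables below are illustrative placeholders; the actual system matches word-stock templates through the Baidu intelligent cloud speech recognition interface, whose API is not reproduced here):

```python
# Minimal wake-up sketch; WAKE_KEYWORDS, recognize_speech and start_interaction
# are hypothetical placeholders, not the actual word stock or Baidu cloud API.
WAKE_KEYWORDS = {"robot", "start", "hello"}

def should_wake(transcript: str) -> bool:
    """Return True if any preset keyword appears in the recognized transcript."""
    return any(keyword in transcript.lower() for keyword in WAKE_KEYWORDS)

def wake_loop(recognize_speech, start_interaction):
    """recognize_speech() -> str stands in for the cloud ASR call on microphone
    audio; start_interaction() stands in for launching the interaction system."""
    while True:
        transcript = recognize_speech()
        if should_wake(transcript):
            start_interaction()
            break
```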
In step S102, a behavior image of the elderly person is acquired in real time; the behavior image comprises a gesture image and posture information, and both are subjected to image segmentation to form a gesture data set and a posture data set respectively.
However, most existing action data sets are captured mainly from middle-aged and younger people, and the actions of the elderly differ noticeably from theirs. Therefore, to improve the model's intention recognition rate for the actions of the elderly, image data of the gestures and postures used to operate the robot are collected from a number of elderly people, for example ten, twenty or more. After the region containing the elderly person is extracted from the collected image data, image segmentation is carried out on the hands and the body posture; image segmentation is generally the prerequisite of image recognition, and here the images are segmented with the Otsu algorithm to form a gesture data set and a posture data set that are specific to the elderly and contain the gesture and posture features.
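For illustration, the Otsu segmentation step can be sketched with OpenCV as below; the grayscale input and the Gaussian pre-blur are assumptions, since this embodiment only states that the Otsu algorithm is applied after the elderly-person region has been extracted:

```python
import cv2

def segment_region(gray_frame):
    """Otsu-threshold an 8-bit grayscale frame so that the hand or body region
    is separated from the background. The Gaussian blur is an assumed
    preprocessing choice, not a step stated in the embodiment."""
    blurred = cv2.GaussianBlur(gray_frame, (5, 5), 0)
    # Otsu's method automatically selects the threshold that maximizes the
    # between-class variance of foreground and background pixels.
    _, mask = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return mask
```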
In step S103, the gesture data set is input into the trained neural network model for gesture recognition to obtain a gesture recognition probability set, and the posture data set is input into the trained hidden Markov model for posture recognition to obtain a posture recognition probability set.
The CNN neural network performs better and better in gesture recognition. Work on static gesture recognition based on an interactive teaching platform has revealed the relationship between the training parameters of a deep learning network and the model recognition rate. Gesture recognition based on a CNN neural network addresses the key problems of poor recognition rate, low robustness and poor universality in traditional gesture recognition algorithms. A schematic diagram of the CNN neural network recognition model framework is given in fig. 2.
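A toy sketch of such a gesture channel is given below. The use of PyTorch, the 64×64 input resolution and the layer sizes are illustrative assumptions rather than the architecture of fig. 2; the point is only that the network outputs the gesture recognition probability set Cin, one probability per intention:

```python
import torch
import torch.nn as nn

class GestureCNN(nn.Module):
    """Illustrative CNN for the gesture channel: maps a segmented 64x64 gesture
    image to a probability set Cin over n intentions (layer sizes assumed)."""
    def __init__(self, n_intents: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, n_intents)

    def forward(self, x):
        x = self.features(x)
        logits = self.classifier(x.flatten(1))
        return torch.softmax(logits, dim=1)   # Cin: one probability per intention

# Example: a single 64x64 segmented gesture image -> a 1x4 probability set
cin = GestureCNN()(torch.randn(1, 1, 64, 64))
```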
A hidden Markov model (HMM) is a statistical model used to describe a Markov process with hidden, unknown parameters. Current HMM models can achieve an average arm-motion recognition rate of 96%, and the motion feature trajectories of the user are fed into an HMM-based classifier for training and recognition. After similar HMM models are built, they are trained with the elderly behavior data set to obtain HMM models capable of recognizing the behavioral intentions of the elderly. A schematic diagram of the HMM posture recognition model framework is given in fig. 3.
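A hedged sketch of the posture channel is shown below: one Gaussian HMM per intention, with the log-likelihoods of an observed posture feature trajectory normalized into the posture recognition probability set Hin. The hmmlearn library, the state count and the softmax normalization are assumptions not fixed by the embodiment:

```python
import numpy as np
from hmmlearn import hmm   # assumed HMM library; not named in the patent

class PostureHMMBank:
    """One Gaussian HMM per intention; the log-likelihoods of an observed
    posture feature trajectory are turned into the probability set Hin."""
    def __init__(self, n_intents: int = 4, n_states: int = 5):
        self.models = [hmm.GaussianHMM(n_components=n_states,
                                       covariance_type="diag", n_iter=30)
                       for _ in range(n_intents)]

    def fit(self, sequences_per_intent):
        # sequences_per_intent[i] is a list of (T_k, d) feature trajectories for intent i
        for model, seqs in zip(self.models, sequences_per_intent):
            X = np.concatenate(seqs)
            model.fit(X, lengths=[len(s) for s in seqs])

    def predict_proba(self, trajectory):
        scores = np.array([m.score(trajectory) for m in self.models])
        scores -= scores.max()                        # numerical stability
        hin = np.exp(scores) / np.exp(scores).sum()   # Hin: probability per intention
        return hin
```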
Training the neural network model with the elderly behavior feature set yields a neural network model capable of recognizing gesture intentions; training the hidden Markov model with the elderly behavior feature set yields a hidden Markov model capable of recognizing posture intentions.
In step S104, intention fusion is performed on the gesture recognition probability set and the posture recognition probability set with the fusion algorithm based on the confusion matrix, and the weight proportions of the two probability sets under different intentions are calculated by adopting the F1-score under different intention classifications; the final recognized intention is then determined.
An intention fusion model F = f(I, Cin, Hin) is built, wherein f is the intention fusion model, I is the intention weight matrix, Cin is the gesture recognition probability set and Hin is the posture recognition probability set.
Weight values are assigned to Cin to form an n×1-dimensional weight matrix Cconfi, and weight values are assigned to Hin to form an n×1-dimensional weight matrix Hconfi.
A fuzzy transformation is carried out on Cconfi and Hconfi to obtain the latest intention probability matrix C, where C = Cconfi ∘ Cin + Hconfi ∘ Hin and ∘ is called the composite evaluation operator.
In the invention, the weight matrices of the two sub-models under the different intention classifications are calculated, and the recognition correctness of the two models under the different intentions is evaluated by means of a multi-class confusion matrix. The confusion matrix of a multi-class task is a situation-analysis table used in machine learning to summarize the prediction results of a classification model: the records of a data set are summarized in matrix form according to two criteria, the true class and the class predicted by the classification model. Because the raw counts in a confusion matrix make it hard to judge the quality of a model when the amount of data is large, four secondary indexes are derived from the basic counts, namely accuracy, precision, recall and specificity, which convert the counts into ratios between 0 and 1 and thus allow a standardized measurement. On the basis of these four indexes a further, third-level index is defined, the F1-score, which in statistics is used to measure the accuracy of a classification model while taking both its precision and its recall into account. The F1-score is the harmonic mean of the model's precision and recall, and its calculation formula is
F1 = 2 × Precision × Recall / (Precision + Recall).
Machine learning approaches to multi-class problems often use the F1-score as the final measure, which is consistent with using it here as the weight of each intention class for model fusion.
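For illustration, the per-intention F1-scores can be computed directly from such a multi-class confusion matrix (rows are true intentions and columns are predicted intentions, matching the description of fig. 6); the counts in the example matrix are invented for the sketch and are not the experimental results:

```python
import numpy as np

def per_intent_f1(conf_matrix: np.ndarray) -> np.ndarray:
    """Per-class F1-scores from a multi-class confusion matrix with true
    intentions on the rows and predicted intentions on the columns. The
    resulting n-element vector serves as a sub-model weight matrix."""
    tp = np.diag(conf_matrix).astype(float)
    precision = tp / np.maximum(conf_matrix.sum(axis=0), 1e-12)
    recall = tp / np.maximum(conf_matrix.sum(axis=1), 1e-12)
    return 2 * precision * recall / np.maximum(precision + recall, 1e-12)

# Illustrative 4-intention confusion matrix (50 samples per class, made-up counts)
cm = np.array([[46, 2, 1, 1],
               [3, 44, 2, 1],
               [1, 2, 45, 2],
               [0, 1, 3, 46]])
cconfi = per_intent_f1(cm)   # e.g. the weight matrix Cconfi of the CNN sub-model
```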
The F1-score of the CNN sub-model under the i-th intention classification, F1_Ci, is assigned as the weight value of Cin under that intention classification, and the F1-score of the HMM sub-model under the i-th intention classification, F1_Hi, is assigned as the weight value of Hin under that intention classification.
From F1_C1, …, F1_Cn the n×1-dimensional weight matrix of Cin is obtained as Cconfi = [F1_C1, F1_C2, …, F1_Cn]^T, and from F1_H1, …, F1_Hn the n×1-dimensional weight matrix of Hin is obtained as Hconfi = [F1_H1, F1_H2, …, F1_Hn]^T.
Cin, Hin, Cconfi and Hconfi are subjected to the fuzzy transformation, the fusion process being [λ1, λ2, …, λn]^T = Cin × Cconfi + Hin × Hconfi, a one-dimensional matrix. The maximum value λi in this matrix is selected, and the intention with subscript i is the final recognized intention of the user. Fig. 4 is a schematic diagram of the dual-model decision-level fusion algorithm according to embodiment 1 of the present invention.
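A small numerical sketch of this decision-level fusion is given below. Treating the composite evaluation as an element-wise weighting followed by a sum is an assumption (the embodiment only names a fuzzy composite evaluation operator), and the probability and weight values are invented for illustration:

```python
import numpy as np

def fuse_intentions(cin, hin, cconfi, hconfi):
    """Decision-level fusion sketch: each sub-model's probability set is weighted
    element-wise by its per-intention F1 weights and the results are summed.
    Returns the fused vector [λ1, ..., λn] and the final intention index."""
    lam = np.asarray(cin) * np.asarray(cconfi) + np.asarray(hin) * np.asarray(hconfi)
    return lam, int(np.argmax(lam))

# Four intentions: forward (I), stop (II), turn left (III), turn right (IV)
cin = np.array([0.70, 0.10, 0.15, 0.05])      # gesture recognition probability set
hin = np.array([0.40, 0.35, 0.15, 0.10])      # posture recognition probability set
cconfi = np.array([0.92, 0.88, 0.90, 0.89])   # per-intention F1 weights (illustrative)
hconfi = np.array([0.85, 0.90, 0.84, 0.87])
lam, index = fuse_intentions(cin, hin, cconfi, hconfi)
print(lam, "-> final intention index:", index)
```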
In the present invention, if the robot does not complete the specified action according to the final intention: target recognition is started, and image acquisition equipment is used to judge the distance to the specific obstacle; after the designated target area is reached, the target object is captured through voice interaction, and the initial coordinates of the target object are (x, y); the robot moves until the target object is in the video frame, obtaining coordinates (x1, y1), so that the transformation of the target object is (x → x1, y → y1); after the target object is located, the grasping operation is performed; after the grasp is completed, the intention of the elderly person is captured in real time and the robot moves accordingly. After the target object is handed to the elderly person, the interaction ends.
Fig. 5 is a diagram of the fusion model architecture of deep learning and statistical probability provided in embodiment 1 of the present invention. The method first acquires the original voice channel information and sends it to a preprocessing layer, which wakes the voice system and preprocesses the image information. The preprocessed information is sent to a recognition layer comprising a CNN and a hidden Markov model (HMM), both trained with the elderly behavior data set (EIDS). The two sub-models produce two intention probability sets in real time, which are provided to a model fusion layer; the model fusion algorithm captures the real intention of the user and passes it to an interaction behavior layer, where human-computer interaction completes the user's operation of the robot and satisfies the user's needs. The system feeds back the user's intention and carries out the interactive action through the Pepper robot.
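Tying the sketches above together, the recognition and fusion layers of fig. 5 can be outlined as follows; segment_region, GestureCNN, PostureHMMBank and fuse_intentions are the illustrative components defined earlier, and the wake-up and interaction layers are omitted:

```python
import cv2
import torch

def understand_intention(frame_gray, posture_trajectory,
                         gesture_cnn, posture_hmms, cconfi, hconfi):
    """End-to-end sketch of the recognition and fusion layers of fig. 5,
    reusing the hypothetical components sketched above."""
    mask = cv2.resize(segment_region(frame_gray), (64, 64))          # preprocessing
    x = torch.from_numpy(mask).float().div(255.0).view(1, 1, 64, 64)
    cin = gesture_cnn(x).detach().numpy().ravel()                    # Cin from the CNN channel
    hin = posture_hmms.predict_proba(posture_trajectory)             # Hin from the HMM channel
    _, intent_index = fuse_intentions(cin, hin, cconfi, hconfi)      # decision-level fusion
    return intent_index                                              # handed to the interaction layer
```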
In embodiment 1 of the invention, the intentions of the elderly are divided into four intentions for controlling the robot, namely controlling the robot to move forward (I), stop (II), turn left (III) and turn right (IV).
Each column of the confusion matrix represents a predicted class, and the total of each column is the number of samples predicted as that class; each row represents the true class of the data. In the experiment, 4 intention classifications are used with 200 samples in total, i.e. 4 classes of 50 samples each, and a multi-class confusion matrix is established for each of the two sub-models. Fig. 6 shows the multi-class intention confusion matrices corresponding to the CNN and the HMM, respectively, in embodiment 1. The indexes of the two confusion matrices, namely accuracy, precision, recall (sensitivity) and specificity, are then calculated, and the formula F1 = 2 × Precision × Recall / (Precision + Recall) is used to calculate the F1-score under each intention of each confusion matrix as the weight value of the corresponding sub-model under that intention. The per-intention F1-scores of the two sub-models obtained in this way are the weight values used for fusion.
Example 2
Fig. 7 is a schematic diagram of the intention understanding system for an elderly accompanying robot in embodiment 2 of the present invention. The system comprises an acquisition module, a training module and a building calculation module.
The acquisition module is used for acquiring a behavior image of the elderly person in real time; the behavior image comprises a gesture image and posture information, and the gesture image and the posture information are subjected to image segmentation to form a gesture data set and a posture data set respectively.
The training module inputs the gesture data set into the trained neural network model for gesture recognition to obtain a gesture recognition probability set, and inputs the posture data set into the trained hidden Markov model for posture recognition to obtain a posture recognition probability set.
The building calculation module is used for performing intention fusion on the gesture recognition probability set and the posture recognition probability set with the fusion algorithm based on the confusion matrix, and for calculating the weight proportions of the two probability sets under different intentions when they are fused by adopting the F1-score under different intention classifications; the final recognized intention is then determined.
The system also includes a start module; the starting module is used for acquiring the voice channel information and extracting keywords of the voice channel information to start the robot.
The execution process of the training module comprises the following steps: acquiring gesture image samples and posture information samples of a plurality of elderly people operating the robot; segmenting both the gesture image samples and the posture information samples with the Otsu algorithm to form an elderly behavior feature set; training the neural network model with the elderly behavior feature set to obtain a neural network model capable of recognizing gesture intentions; and training the hidden Markov model with the elderly behavior feature set to obtain a hidden Markov model capable of recognizing posture intentions.
The building calculation module comprises a building module and a calculation module. The process of the building module is as follows: building an intention fusion model F = f(I, Cin, Hin), wherein f is the intention fusion model, I is the intention weight matrix, Cin is the gesture recognition probability set and Hin is the posture recognition probability set; assigning weight values to Cin to form an n×1-dimensional weight matrix Cconfi, and assigning weight values to Hin to form an n×1-dimensional weight matrix Hconfi; carrying out the fuzzy transformation on Cconfi and Hconfi to obtain the latest intention probability matrix C, wherein C = Cconfi ∘ Cin + Hconfi ∘ Hin and ∘ is called the composite evaluation operator. The process of the calculation module is as follows: the F1-score of the CNN sub-model under the i-th intention classification, F1_Ci, is assigned as the weight value of Cin under that intention classification, and the F1-score of the HMM sub-model under the i-th intention classification, F1_Hi, is assigned as the weight value of Hin under that intention classification, wherein F1 = 2 × Precision × Recall / (Precision + Recall); from F1_C1, …, F1_Cn the n×1-dimensional weight matrix of Cin is obtained as Cconfi = [F1_C1, F1_C2, …, F1_Cn]^T, and from F1_H1, …, F1_Hn the n×1-dimensional weight matrix of Hin is obtained as Hconfi = [F1_H1, F1_H2, …, F1_Hn]^T; Cin, Hin, Cconfi and Hconfi are subjected to the fuzzy transformation to obtain a one-dimensional matrix [λ1, λ2, …, λn]^T = Cin × Cconfi + Hin × Hconfi; the maximum value λi in this matrix is selected, and the i-th intention is the final recognized intention of the user.
The invention improves the current intention understanding rate of the elderly accompanying robot system and improves the satisfaction of the elderly with the social accompanying robot.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, the scope of the present invention is not limited thereto. Various modifications and alterations will occur to those skilled in the art based on the foregoing description, and it is neither necessary nor possible to enumerate all embodiments exhaustively. Any modifications or variations that those skilled in the art can make on the basis of the technical solution of the present invention without creative effort remain within the scope of the present invention.

Claims (7)

1. An intention understanding method for an elderly accompanying robot, characterized by comprising the following steps:
acquiring a behavior image of the elderly person in real time, wherein the behavior image comprises a gesture image and posture information, and performing image segmentation on the gesture image and the posture information to form a gesture data set and a posture data set respectively;
inputting the gesture data set into a trained neural network model for gesture recognition to obtain a gesture recognition probability set, and inputting the posture data set into a trained hidden Markov model for posture recognition to obtain a posture recognition probability set;
performing intention fusion on the gesture recognition probability set and the posture recognition probability set with a fusion algorithm based on the confusion matrix, and calculating the weight proportions of the two probability sets under different intentions when they are fused by adopting the F1-score under different intention classifications; further determining a final recognized intention; wherein the process of performing intention fusion on the gesture recognition probability set and the posture recognition probability set with the confusion-matrix-based fusion algorithm is as follows:
building an intention fusion model F = f(I, Cin, Hin), wherein f is the intention fusion model, I is the intention weight matrix, Cin is the gesture recognition probability set and Hin is the posture recognition probability set;
assigning weight values to Cin to form an n×1-dimensional weight matrix Cconfi, and assigning weight values to Hin to form an n×1-dimensional weight matrix Hconfi;
carrying out a fuzzy transformation on Cconfi and Hconfi to obtain the latest intention probability matrix C, wherein C = Cconfi ∘ Cin + Hconfi ∘ Hin and ∘ is called the composite evaluation operator;
wherein, under the different intention classifications, the method for calculating the weight proportions of the gesture recognition probability set and the posture recognition probability set when they are fused by adopting the F1-score comprises the following steps:
the F1-score of the gesture sub-model under the i-th intention classification, F1_Ci, is assigned as the weight value of Cin under that intention classification, and the F1-score of the posture sub-model under the i-th intention classification, F1_Hi, is assigned as the weight value of Hin under that intention classification, wherein F1 = 2 × Precision × Recall / (Precision + Recall);
from F1_C1, …, F1_Cn the n×1-dimensional weight matrix of Cin is obtained as Cconfi = [F1_C1, F1_C2, …, F1_Cn]^T, and from F1_H1, …, F1_Hn the n×1-dimensional weight matrix of Hin is obtained as Hconfi = [F1_H1, F1_H2, …, F1_Hn]^T;
Cin, Hin, Cconfi and Hconfi are subjected to the fuzzy transformation to obtain a one-dimensional matrix [λ1, λ2, …, λn]^T = Cin × Cconfi + Hin × Hconfi; the maximum value λi in this matrix is selected, and the i-th intention is the final recognized intention of the user.
2. The intention understanding method for an elderly accompanying robot according to claim 1, characterized by further comprising, before the behavior image of the elderly person is acquired in real time, acquiring voice channel information and extracting keywords from the voice channel information to start the robot.
3. The intention understanding method for an elderly accompanying robot according to claim 1, wherein the method for training the neural network model and the hidden Markov model comprises:
acquiring gesture image samples and posture information samples of a plurality of elderly people operating the robot; segmenting both the gesture image samples and the posture information samples with the Otsu algorithm to form an elderly behavior feature set;
training the neural network model with the elderly behavior feature set to obtain a neural network model capable of recognizing gesture intentions; and training the hidden Markov model with the elderly behavior feature set to obtain a hidden Markov model capable of recognizing posture intentions.
4. The intention understanding method for an elderly accompanying robot according to claim 1, wherein, when the robot does not complete the specified action according to the final intention:
target recognition is started, and image acquisition equipment is used to judge the distance to the specific obstacle;
after the designated target area is reached, the target object is captured through voice interaction, and the initial coordinates of the target object are (x, y); the robot moves until the target object is in the video frame, obtaining coordinates (x1, y1), so that the transformation of the target object is (x → x1, y → y1);
after the target object is located, a grasping operation is performed; after the grasp is completed, the intention of the elderly person is captured in real time and the robot moves accordingly.
5. An intention understanding system for an elderly accompanying robot, characterized by comprising an acquisition module, a training module and a building calculation module;
the acquisition module is used for acquiring a behavior image of the elderly person in real time, wherein the behavior image comprises a gesture image and posture information, and the gesture image and the posture information are subjected to image segmentation to form a gesture data set and a posture data set respectively;
the training module inputs the gesture data set into a trained neural network model for gesture recognition to obtain a gesture recognition probability set, and inputs the posture data set into a trained hidden Markov model for posture recognition to obtain a posture recognition probability set;
the building calculation module is used for performing intention fusion on the gesture recognition probability set and the posture recognition probability set with a fusion algorithm based on the confusion matrix, and for calculating the weight proportions of the two probability sets under different intentions when they are fused by adopting the F1-score under different intention classifications; further determining a final recognized intention; the building calculation module comprises a building module and a calculation module;
the process of the building module is as follows: building an intention fusion model F = f(I, Cin, Hin), wherein f is the intention fusion model, I is the intention weight matrix, Cin is the gesture recognition probability set and Hin is the posture recognition probability set; assigning weight values to Cin to form an n×1-dimensional weight matrix Cconfi, and assigning weight values to Hin to form an n×1-dimensional weight matrix Hconfi; carrying out a fuzzy transformation on Cconfi and Hconfi to obtain the latest intention probability matrix C, wherein C = Cconfi ∘ Cin + Hconfi ∘ Hin and ∘ is called the composite evaluation operator;
the process of the calculation module is as follows: the F1-score of the gesture sub-model under the i-th intention classification, F1_Ci, is assigned as the weight value of Cin under that intention classification, and the F1-score of the posture sub-model under the i-th intention classification, F1_Hi, is assigned as the weight value of Hin under that intention classification, wherein F1 = 2 × Precision × Recall / (Precision + Recall); from F1_C1, …, F1_Cn the n×1-dimensional weight matrix of Cin is obtained as Cconfi = [F1_C1, F1_C2, …, F1_Cn]^T, and from F1_H1, …, F1_Hn the n×1-dimensional weight matrix of Hin is obtained as Hconfi = [F1_H1, F1_H2, …, F1_Hn]^T; Cin, Hin, Cconfi and Hconfi are subjected to the fuzzy transformation to obtain a one-dimensional matrix [λ1, λ2, …, λn]^T = Cin × Cconfi + Hin × Hconfi; the maximum value λi in this matrix is selected, and the i-th intention is the final recognized intention of the user.
6. The elderly accompanying robot oriented intention understanding system of claim 5, further comprising a starting module;
the starting module is used for acquiring voice channel information and extracting keywords of the voice channel information to start the robot.
7. The intention understanding system for an elderly accompanying robot according to claim 5, wherein the training module is implemented by the following steps:
acquiring gesture image samples and posture information samples of a plurality of elderly people operating the robot; segmenting both the gesture image samples and the posture information samples with the Otsu algorithm to form an elderly behavior feature set;
training the neural network model with the elderly behavior feature set to obtain a neural network model capable of recognizing gesture intentions; and training the hidden Markov model with the elderly behavior feature set to obtain a hidden Markov model capable of recognizing posture intentions.
CN202010970662.7A 2020-09-15 2020-09-15 Intention understanding method and system for elderly accompanying robot Active CN112101219B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010970662.7A CN112101219B (en) 2020-09-15 2020-09-15 Intention understanding method and system for elderly accompanying robot

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010970662.7A CN112101219B (en) 2020-09-15 2020-09-15 Intention understanding method and system for elderly accompanying robot

Publications (2)

Publication Number Publication Date
CN112101219A CN112101219A (en) 2020-12-18
CN112101219B true CN112101219B (en) 2022-11-04

Family

ID=73759249

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010970662.7A Active CN112101219B (en) 2020-09-15 2020-09-15 Intention understanding method and system for elderly accompanying robot

Country Status (1)

Country Link
CN (1) CN112101219B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112684711B (en) * 2020-12-24 2022-10-11 青岛理工大学 Interactive recognition method for human behavior and intention
CN112766041B (en) * 2020-12-25 2022-04-22 北京理工大学 Method for identifying hand washing action of senile dementia patient based on inertial sensing signal
CN113780750B (en) * 2021-08-18 2024-03-01 同济大学 Medical risk assessment method and device based on medical image segmentation
CN113705440B (en) * 2021-08-27 2023-09-01 华中师范大学 Head posture estimation method and system for visual understanding of educational robot
CN113848790A (en) * 2021-09-28 2021-12-28 德州学院 Intelligent nursing type robot system and control method thereof
CN114092967A (en) * 2021-11-19 2022-02-25 济南大学 Real-time multi-mode accompanying robot intention understanding method and system
CN116028880B (en) * 2023-02-07 2023-07-04 支付宝(杭州)信息技术有限公司 Method for training behavior intention recognition model, behavior intention recognition method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190272764A1 (en) * 2018-03-03 2019-09-05 Act, Inc. Multidimensional assessment scoring using machine learning

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103593680A (en) * 2013-11-19 2014-02-19 南京大学 Dynamic hand gesture recognition method based on self incremental learning of hidden Markov model
CN105787471A (en) * 2016-03-25 2016-07-20 南京邮电大学 Gesture identification method applied to control of mobile service robot for elder and disabled
CN108986801A (en) * 2017-06-02 2018-12-11 腾讯科技(深圳)有限公司 A kind of man-machine interaction method, device and human-computer interaction terminal
WO2019204186A1 (en) * 2018-04-18 2019-10-24 Sony Interactive Entertainment Inc. Integrated understanding of user characteristics by multimodal processing
CN110554774A (en) * 2019-07-22 2019-12-10 济南大学 AR-oriented navigation type interactive normal form system
CN110717381A (en) * 2019-08-28 2020-01-21 北京航空航天大学 Human intention understanding method facing human-computer cooperation and based on deep stacking Bi-LSTM
CN111222341A (en) * 2020-01-16 2020-06-02 中国平安人寿保险股份有限公司 Method, device, equipment and storage medium for training hidden Markov model
CN111582108A (en) * 2020-04-28 2020-08-25 河北工业大学 Gait recognition and intention perception method
CN111596767A (en) * 2020-05-27 2020-08-28 广州市大湾区虚拟现实研究院 Gesture capturing method and device based on virtual reality

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
A Method of Fusing Gesture and Speech for Human-robot Interaction; Junhong Meng et al.; ICCDE 2020; 2020-03-07; full text *
A Multimodal Framework Based on Integration of Cortical and Muscular Activities for Decoding Human Intentions About Lower Limb Motions; Chengkun Cui et al.; IEEE Transactions on Biomedical Circuits and Systems; 2017-08-31; full text *
Continuous action segmentation and recognition using hybrid convolutional neural network-hidden Markov model model; Jun Lei et al.; IET Computer Vision; 2016-12-31; full text *
Data fusion methods in multimodal human computer dialog; Ming-Hao Yang et al.; 虚拟现实与智能硬件 (Virtual Reality & Intelligent Hardware); 2019-12-31; full text *
Research on a contact-based human-robot collaboration intention understanding method based on GA-BP neural network (基于GA-BP神经网络的接触式人机协作意图理解方法研究); 张蕊 et al.; 组合机床与自动化加工技术; 2019-11-30, No. 11; full text *

Also Published As

Publication number Publication date
CN112101219A (en) 2020-12-18

Similar Documents

Publication Publication Date Title
CN112101219B (en) Intention understanding method and system for elderly accompanying robot
CN104077579B (en) Facial expression recognition method based on expert system
CN105739688A (en) Man-machine interaction method and device based on emotion system, and man-machine interaction system
CN109101108B (en) Method and system for optimizing human-computer interaction interface of intelligent cabin based on three decisions
CN110781829A (en) Light-weight deep learning intelligent business hall face recognition method
CN111402928B (en) Attention-based speech emotion state evaluation method, device, medium and equipment
US20230206928A1 (en) Audio processing method and apparatus
CN107016046A (en) The intelligent robot dialogue method and system of view-based access control model displaying
CN111666845B (en) Small sample deep learning multi-mode sign language recognition method based on key frame sampling
CN104156729B (en) A kind of classroom demographic method
CN111274978B (en) Micro expression recognition method and device
CN112101243A (en) Human body action recognition method based on key posture and DTW
CN111128157A (en) Wake-up-free voice recognition control method for intelligent household appliance, computer readable storage medium and air conditioner
CN111444488A (en) Identity authentication method based on dynamic gesture
CN111428666A (en) Intelligent family accompanying robot system and method based on rapid face detection
CN111128240B (en) Voice emotion recognition method based on anti-semantic-erasure
CN114495211A (en) Micro-expression identification method, system and computer medium based on graph convolution network
CN114093028A (en) Human-computer cooperation method and system based on intention analysis and robot
WO2024001539A1 (en) Speaking state recognition method and apparatus, model training method and apparatus, vehicle, medium, computer program and computer program product
CN112580527A (en) Facial expression recognition method based on convolution long-term and short-term memory network
CN111339878A (en) Eye movement data-based correction type real-time emotion recognition method and system
CN111191510A (en) Relation network-based remote sensing image small sample target identification method in complex scene
CN111898473B (en) Driver state real-time monitoring method based on deep learning
CN114663910A (en) Multi-mode learning state analysis system
CN109977777B (en) Novel RF-Net model-based gesture recognition method

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant