CN112101219B - Intention understanding method and system for elderly accompanying robot - Google Patents
- Publication number
- CN112101219B (application CN202010970662.7A)
- Authority
- CN
- China
- Prior art keywords
- gesture
- intention
- gesture recognition
- cin
- probability set
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000000034 method Methods 0.000 title claims abstract description 44
- 239000011159 matrix material Substances 0.000 claims abstract description 71
- 230000004927 fusion Effects 0.000 claims abstract description 46
- 238000004422 calculation algorithm Methods 0.000 claims abstract description 24
- 238000003062 neural network model Methods 0.000 claims abstract description 22
- 238000003709 image segmentation Methods 0.000 claims abstract description 10
- 238000012549 training Methods 0.000 claims description 30
- 230000008569 process Effects 0.000 claims description 16
- 238000004364 calculation method Methods 0.000 claims description 15
- 230000008859 change Effects 0.000 claims description 12
- 238000011156 evaluation Methods 0.000 claims description 11
- 230000009471 action Effects 0.000 claims description 9
- 230000003993 interaction Effects 0.000 claims description 9
- 239000002131 composite material Substances 0.000 claims description 6
- 230000009466 transformation Effects 0.000 claims description 3
- 230000006399 behavior Effects 0.000 description 27
- 230000036544 posture Effects 0.000 description 11
- 238000010586 diagram Methods 0.000 description 9
- 238000013527 convolutional neural network Methods 0.000 description 7
- 230000000694 effects Effects 0.000 description 7
- 238000013528 artificial neural network Methods 0.000 description 6
- 238000013145 classification model Methods 0.000 description 4
- 238000013135 deep learning Methods 0.000 description 4
- 230000002452 interceptive effect Effects 0.000 description 3
- 238000013461 design Methods 0.000 description 2
- 238000010801 machine learning Methods 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 238000007781 pre-processing Methods 0.000 description 2
- 238000011160 research Methods 0.000 description 2
- 230000035945 sensitivity Effects 0.000 description 2
- 238000007619 statistical method Methods 0.000 description 2
- 238000013179 statistical model Methods 0.000 description 2
- 230000032683 aging Effects 0.000 description 1
- 230000004075 alteration Effects 0.000 description 1
- 238000004458 analytical method Methods 0.000 description 1
- 238000013459 approach Methods 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000002902 bimodal effect Effects 0.000 description 1
- 238000013136 deep learning model Methods 0.000 description 1
- 230000008451 emotion Effects 0.000 description 1
- 238000007499 fusion processing Methods 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 238000005259 measurement Methods 0.000 description 1
- 230000000474 nursing effect Effects 0.000 description 1
- 238000003909 pattern recognition Methods 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
- 230000003068 static effect Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/29—Graphical models, e.g. Bayesian networks
- G06F18/295—Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L2015/088—Word spotting
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Biology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Psychiatry (AREA)
- Social Psychology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Acoustics & Sound (AREA)
- Manipulator (AREA)
- Image Analysis (AREA)
Abstract
The invention provides an intention understanding method and system for an elderly accompanying robot. The method comprises the following steps: acquiring gesture images and posture information of the elderly in real time, and performing image segmentation on both to form a gesture data set and a posture data set respectively; inputting the gesture data set into a trained neural network model for gesture recognition to obtain a gesture recognition probability set, and inputting the posture data set into a trained hidden Markov model for posture recognition to obtain a posture recognition probability set; performing intention fusion on the two probability sets with a fusion algorithm based on a confusion matrix, using the F1 score under each intention classification to calculate the weight proportions of the two probability sets under different intentions; and then determining the final recognition intention. Based on the method, an intention understanding system is also provided. The invention improves the intention understanding rate of the elderly accompanying robot system and the satisfaction of the elderly when using a social accompanying robot.
Description
Technical Field
The invention belongs to the technical field of elderly accompanying robots, and particularly relates to an intention understanding method and system for an elderly accompanying robot.
Background
Aging is now a problem in many countries around the world, and adult children who are busy with work find it difficult to give their parents the necessary care at all times. Meanwhile, field research in nursing homes and studies of accompanying robots by Sari Merilampi et al. show that robot companionship is increasingly accepted by the elderly, and elderly accompanying robots already provide many services. However, the recognition rate and intention understanding rate of such robot accompanying systems still need to be improved. In particular, the movements of the elderly have many distinctive characteristics, which increase the interaction burden on the elderly when using an accompanying robot and easily cause negative emotions. The invention therefore proposes a model fusion algorithm (SDFM) that effectively improves the robot's understanding rate of elderly behaviour intentions, and applies the human-computer interaction design to real scenes.
Deep learning models and statistical models each have advantages and disadvantages in pattern recognition. A statistical method is characterized by high judgment efficiency (a neural network judges more slowly), its construction can be derived entirely from theory, and its recognition effect is clearly better suited to training on posture information when the data volume is small or training data is hard to collect. The structure and algorithm design of a neural network must rely on the designer's experience; a high recognition effect can be ensured when the data volume is large enough, which makes it suitable for gesture recognition where data is easy to collect, but when the data volume is small or training data is hard to collect the recognition effect is often unsatisfactory and the success of the system is largely a matter of chance.
Disclosure of Invention
In order to solve the above technical problems, the invention provides an intention understanding method and system for an elderly accompanying robot. The method adopts a fusion scheme that combines an intention recognition result set with a weight matrix: the F1-score under each classification in the confusion matrix of each sub-model's recognition results is used as a weight value to form the weight matrix, and a fuzzy evaluation operator is established with reference to fuzzy evaluation theory to fuse the two models' recognition results, thereby improving the accuracy and sensitivity of elderly intention recognition.
In order to achieve the purpose, the invention adopts the following technical scheme:
an intention understanding method for an elderly accompanying robot comprises the following steps:
acquiring a behavior image of the elderly in real time, wherein the behavior image comprises a gesture image and posture information, and performing image segmentation on the gesture image and the posture information to form a gesture data set and a posture data set respectively;
inputting the gesture data set into a trained neural network model for gesture recognition to obtain a gesture recognition probability set, and inputting the posture data set into a trained hidden Markov model for posture recognition to obtain a posture recognition probability set;
performing intention fusion on the gesture recognition probability set and the posture recognition probability set with a fusion algorithm based on a confusion matrix, using the F1 score under each intention classification to calculate the weight proportions of the two probability sets under different intentions; and then determining the final recognition intention.
Further, before the behavior image of the elderly is obtained in real time, voice channel information is obtained, and keywords of the voice channel information are extracted to start the robot.
Further, the method for training the neural network model and the hidden markov model comprises:
acquiring gesture image samples and posture information samples from a plurality of elderly people operating the robot; segmenting both the gesture image samples and the posture information samples with the Otsu algorithm to form an elderly behavior feature set;
training the neural network model with the elderly behavior feature set to obtain a neural network model capable of recognizing gesture intentions; and training the hidden Markov model with the elderly behavior feature set to obtain a hidden Markov model capable of recognizing posture intentions.
Further, the process of performing intention fusion on the gesture recognition probability set and the posture recognition probability set with the fusion algorithm based on the confusion matrix is as follows:
building an intention fusion model F = f(I, Cin, Hin), wherein f is the intention fusion model; I is an intention weight matrix; Cin is the gesture recognition probability set; Hin is the posture recognition probability set;
assigning weight values to Cin to form an n×1-dimensional weight matrix Cconfi, and assigning weight values to Hin to form an n×1-dimensional weight matrix Hconfi;
performing the fuzzy transformation on Cconfi and Hconfi to obtain the latest intention probability matrix C, wherein C = Cconfi∘Cin + Hconfi∘Hin and ∘ is called the composite evaluation operator.
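As an illustrative sketch (not part of the claims; all numbers and names below are ours), the composite evaluation C = Cconfi∘Cin + Hconfi∘Hin can be read as element-wise weighting of each model's intention probability set followed by summation:

```python
import numpy as np

def fuse(cin, hin, cconfi, hconfi):
    """Composite evaluation: weight each model's intention
    probabilities element-wise by its per-intention weight
    matrix, then sum the two weighted sets."""
    cin, hin = np.asarray(cin), np.asarray(hin)
    cconfi, hconfi = np.asarray(cconfi), np.asarray(hconfi)
    return cconfi * cin + hconfi * hin

# Three hypothetical intentions; CNN and HMM disagree on the top class.
cin = np.array([0.6, 0.3, 0.1])     # gesture (CNN) probability set
hin = np.array([0.2, 0.7, 0.1])     # posture (HMM) probability set
cconfi = np.array([0.9, 0.5, 0.8])  # per-intention weights of Cin
hconfi = np.array([0.6, 0.9, 0.7])  # per-intention weights of Hin
c = fuse(cin, hin, cconfi, hconfi)  # latest intention probability matrix C
```

Here the HMM's higher weight on the second intention outvotes the CNN's preference for the first, illustrating how per-intention weights arbitrate between the two sub-models.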
Further, the method for calculating the weight proportions under different intentions when fusing the gesture recognition probability set and the posture recognition probability set, using the F1 score under each intention classification, comprises the following steps:
assignment of F1score to different intentsAs the weight value under each intention classification of Cin; assign toAs the weight value under each intention classification of Hin; wherein
Based onObtaining n x1 dimensional weight matrix of CinBased onN x1 dimensional weight matrix of Hin can be obtained
Cin, Hin, Cconfi and Hconfi are subjected to the fuzzy transformation to obtain a one-dimensional matrix [λ1, λ2, …, λn]T; the maximum value λi in the matrix is selected; the i-th intention is the user's final recognition intention; wherein [λ1, λ2, …, λn]T = Cin × Cconfi + Hin × Hconfi.
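A hedged numerical sketch of this weighting-and-selection step, using hypothetical per-intention precision and recall figures for each sub-model, might look like:

```python
import numpy as np

def f1_weights(precision, recall):
    """Per-intention F1 scores used as fusion weights:
    F1 = 2*P*R / (P + R), computed per intention class."""
    p = np.asarray(precision, dtype=float)
    r = np.asarray(recall, dtype=float)
    return 2 * p * r / (p + r)

def final_intention(cin, hin, cconfi, hconfi):
    """lambda = Cin*Cconfi + Hin*Hconfi (element-wise); the index
    of the largest lambda_i is the final recognition intention."""
    lam = np.asarray(cin) * cconfi + np.asarray(hin) * hconfi
    return int(np.argmax(lam)), lam

# Hypothetical confusion-matrix statistics for two intentions.
cconfi = f1_weights([0.9, 0.6], [0.9, 0.8])  # CNN weight matrix
hconfi = f1_weights([0.7, 0.9], [0.7, 0.9])  # HMM weight matrix
i, lam = final_intention([0.8, 0.2], [0.3, 0.7], cconfi, hconfi)
```

The returned index `i` identifies the final recognition intention of the user.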
Further, when the robot does not complete the specified action according to the final intention:
starting target identification, and judging the distance of a specific obstacle by adopting image acquisition equipment;
after reaching the designated target area, the target object is captured through voice interaction, with initial coordinate (x, y); the robot moves the target object into the video frame to obtain the coordinate (x1, y1); the transformation process of the target object is (x → x1, y → y1);
after the target object is positioned, the grabbing operation is performed; after the grab is completed, the elderly person's intention is captured in real time and the robot is moved.
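The fallback flow above can be sketched as follows; `RobotStub`, `move_to` and `grasp` are invented stand-ins for the real robot interface, not APIs named in the patent:

```python
class RobotStub:
    """Stand-in for the real robot interface (assumed API)."""
    def __init__(self):
        self.visited = []

    def move_to(self, point):
        self.visited.append(point)

    def grasp(self):
        return True

def bring_object(robot, target_xy, frame_xy):
    """Move the target from its initial coordinate (x, y) into the
    video-frame coordinate (x1, y1), then perform the grab."""
    path = [target_xy, frame_xy]      # transformation (x -> x1, y -> y1)
    for waypoint in path:
        robot.move_to(waypoint)
    return robot.grasp()

robot = RobotStub()
grabbed = bring_object(robot, (2, 3), (5, 6))
```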
The invention also provides an intention understanding system for the elderly accompanying robot, which comprises an acquisition module, a training module and a building and calculating module;
the acquisition module is used for acquiring a behavior image of the elderly in real time, wherein the behavior image comprises a gesture image and posture information, and the gesture image and the posture information are subjected to image segmentation to form a gesture data set and a posture data set respectively;
the training module inputs the gesture data set into a trained neural network model for gesture recognition to obtain a gesture recognition probability set, and inputs the posture data set into a trained hidden Markov model for posture recognition to obtain a posture recognition probability set;
the building and calculating module is used for performing intention fusion on the gesture recognition probability set and the posture recognition probability set with a fusion algorithm based on a confusion matrix, using the F1 score under each intention classification to calculate the weight proportions of the two probability sets under different intentions; and then determining the final recognition intention.
Further, the device also comprises a starting module;
the starting module is used for acquiring voice channel information and extracting keywords of the voice channel information to start the robot.
Further, the execution process of the training module is as follows:
acquiring gesture image samples and posture information samples from a plurality of elderly people operating the robot; segmenting both the gesture image samples and the posture information samples with the Otsu algorithm to form an elderly behavior feature set;
training the neural network model with the elderly behavior feature set to obtain a neural network model capable of recognizing gesture intentions; and training the hidden Markov model with the elderly behavior feature set to obtain a hidden Markov model capable of recognizing posture intentions.
Further, the building calculation module comprises a building module and a calculation module;
the process of the building module is as follows: an intention fusion model F = f(I, Cin, Hin) is built, wherein f is the intention fusion model; I is an intention weight matrix; Cin is the gesture recognition probability set; Hin is the posture recognition probability set; weight values are assigned to Cin to form an n×1-dimensional weight matrix Cconfi, and to Hin to form an n×1-dimensional weight matrix Hconfi; the fuzzy transformation is performed on Cconfi and Hconfi to obtain the latest intention probability matrix C, wherein C = Cconfi∘Cin + Hconfi∘Hin and ∘ is called the composite evaluation operator;
the process of the calculation module is as follows: the F1 score fCi under the i-th intention classification is assigned as the weight value of that classification of Cin, and fHi as the weight value of that classification of Hin, wherein fi = 2 × precisioni × recalli / (precisioni + recalli); from fC1, …, fCn the n×1-dimensional weight matrix Cconfi of Cin is obtained, and from fH1, …, fHn the n×1-dimensional weight matrix Hconfi of Hin; Cin, Hin, Cconfi and Hconfi are subjected to the fuzzy transformation to obtain a one-dimensional matrix [λ1, λ2, …, λn]T; the maximum value λi is selected, and the i-th intention is the user's final recognition intention; wherein [λ1, λ2, …, λn]T = Cin × Cconfi + Hin × Hconfi.
The effects described in this summary are only those of the embodiments, not all effects of the invention. The above technical solution has the following advantages or beneficial effects:
the invention provides an intention understanding method and system for an elderly accompanying robot, wherein the method comprises the following steps: acquiring a behavior image of the old in real time, wherein the behavior image comprises a gesture image and gesture information, and the gesture image and the gesture information are subjected to image segmentation to respectively form a gesture data set and a gesture data set; inputting the gesture data set into a trained neural network model for gesture recognition to obtain a gesture recognition probability set, and inputting the gesture data set into a trained hidden Markov model for gesture recognition to obtain a gesture recognition probability set; performing intention fusion on the gesture recognition probability set and the gesture recognition probability set based on a fusion algorithm of a confusion matrix, and calculating weight proportions under different intentions when the gesture recognition probability set and the gesture recognition probability set are fused by adopting an F1score under different intention classifications; and then determining the final recognition intention. Based on the intention understanding method for the old accompanying robot provided by the invention, an intention understanding system for the old accompanying robot is also provided. The invention provides a novel gesture recognition and posture recognition method based on deep learning, and solves the key problems of poor recognition rate, low robustness, poor universality and the like in the traditional gesture recognition algorithm and posture recognition algorithm. 
The F1 score is used to calculate the weight proportions under different intentions when the gesture recognition probability set and the posture recognition probability set are fused, and the final recognition intention is then determined. This improves the current intention understanding rate of the elderly accompanying robot system and the satisfaction of the elderly when using a social accompanying robot.
Drawings
Fig. 1 is a flowchart of an intention understanding method for an elderly accompanying robot according to embodiment 1 of the present invention;
FIG. 2 is a schematic diagram of a CNN neural network recognition model framework;
FIG. 3 is a schematic diagram of an HMM posture recognition model framework;
FIG. 4 is a schematic diagram of a bimodal decision-level fusion algorithm according to embodiment 1 of the present invention;
fig. 5 is a diagram of a fusion model architecture of deep learning and statistical probability provided in embodiment 1 of the present invention;
FIG. 6 is a confusion matrix of multiple classification intentions corresponding to CNN and HMM, respectively, in embodiment 1 of the present invention;
fig. 7 is a schematic view of an intention understanding system for an elderly accompanying robot in embodiment 2 of the present invention.
Detailed Description
In order to clearly explain the technical features of the present invention, the following detailed description of the present invention is provided with reference to the accompanying drawings. The following disclosure provides many different embodiments, or examples, for implementing different features of the invention. To simplify the disclosure of the present invention, specific example components and arrangements are described below. Moreover, the present disclosure may repeat reference numerals and/or letters in the various examples. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed. It should be noted that the components illustrated in the figures are not necessarily drawn to scale. Descriptions of well-known components and processing techniques and procedures are omitted so as to not unnecessarily limit the invention.
Example 1
According to the intention understanding method for the aged accompanying robot, which is provided by the embodiment 1 of the invention, through a fusion scheme of fusion of an intention recognition result set and a weight matrix, F1-score under each classification in a confusion matrix of sub-model recognition results is used as a weight value to form the weight matrix, and a fuzzy evaluation operator is established by referring to a fuzzy evaluation theory to fuse two model recognition results. Fig. 1 shows a flowchart of an intention understanding method for an elderly accompanying robot in embodiment 1 of the present invention.
In step S101, voice channel information is acquired, and keywords of the voice channel information are extracted to start the robot. The robot's microphone captures speech that triggers the keyword start-up of the system; word-bank template matching is performed through the Baidu speech recognition interface of Baidu Intelligent Cloud, and when a preset keyword is captured, the robot interaction system is started or another human-computer interaction action is performed.
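As a minimal sketch of the keyword-spotting trigger, assuming the speech recognition interface has already returned a text transcript (the keyword list below is hypothetical, not from the patent):

```python
# Hypothetical wake-word lexicon; the real word bank is not given in the patent.
WAKE_KEYWORDS = {"hello robot", "come here", "help me"}

def should_wake(transcript, keywords=WAKE_KEYWORDS):
    """Return True when any preset keyword appears in the
    recognized transcript (simple template matching)."""
    text = transcript.lower()
    return any(kw in text for kw in keywords)
```

On a match the interaction system would be started; otherwise the robot keeps listening.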
In step S102, a behavior image of the elderly person is obtained in real time; the behavior image includes a gesture image and posture information, and both are subjected to image segmentation to form a gesture data set and a posture data set respectively.
However, most existing action data sets mainly capture the actions of middle-aged people and people of all ages, and the actions of the elderly differ markedly from these. Therefore, to improve the model's intention recognition rate for elderly actions, image data of the several gestures and postures used to operate the robot are collected from a number of elderly people (for example 10, 20 or more). After the elderly-person region is extracted from the collected image data, image segmentation, which is generally the precondition of image recognition, is performed on the hands and body posture: the images are segmented with the Otsu algorithm to form a gesture data set and a posture data set containing the gesture and posture characteristics of the elderly.
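For illustration, Otsu's method can be implemented directly from its definition (choose the threshold that maximizes the between-class variance of the grayscale histogram); the toy image below is ours, not from the patent's data set:

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method: pick the threshold maximizing the
    between-class variance of the grayscale histogram."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                    # class-0 probability
    mu = np.cumsum(prob * np.arange(256))      # cumulative mean
    mu_t = mu[-1]                              # global mean
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu_t * omega - mu) ** 2 / (omega * (1 - omega))
    sigma_b2 = np.nan_to_num(sigma_b2)         # zero out empty classes
    return int(np.argmax(sigma_b2))

# Bimodal toy image: dark background (~20), bright hand region (~200).
img = np.full((32, 32), 20, dtype=np.uint8)
img[8:24, 8:24] = 200
t = otsu_threshold(img)
mask = img > t   # segmented gesture/posture region
```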
In step S103, the gesture data set is input into the trained neural network model for gesture recognition to obtain a gesture recognition probability set, and the posture data set is input into the trained hidden Markov model for posture recognition to obtain a posture recognition probability set.
The CNN plays an increasingly strong role in gesture recognition. Static gesture recognition on an interactive teaching platform has revealed the relation between deep-learning network training parameters and the model recognition rate. Gesture recognition based on a CNN solves the key problems of the traditional gesture recognition algorithms, such as poor recognition rate, low robustness and poor universality. A schematic diagram of the CNN recognition model framework is given in FIG. 2.
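The CNN itself is not reproduced here; the sketch below shows only the output stage that the later fusion steps depend on, a softmax turning the network's class logits into the gesture recognition probability set Cin (the logits are hypothetical):

```python
import numpy as np

def softmax(logits):
    """Convert CNN output logits into the gesture recognition
    probability set Cin (non-negative, sums to 1)."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

cin = softmax([2.0, 0.5, -1.0])  # hypothetical logits for 3 intentions
```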
A hidden Markov model (HMM) is a statistical model used to describe a Markov process with hidden, unknown parameters. Current HMM models can achieve a 96% average arm-motion recognition rate; the user's motion feature trajectories are input into an HMM-based classifier for training and recognition. After similar HMM models are built, model training is carried out with the elderly behavior data set to obtain HMM models capable of recognizing the behaviour intentions of the elderly. A schematic diagram of the HMM posture recognition model framework is given in FIG. 3.
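One common way to turn a bank of per-intention HMMs into a posture probability set Hin is to compare sequence log-likelihoods via the scaled forward algorithm; the sketch below uses toy two-state models of our own invention, not the patent's trained parameters:

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Scaled forward algorithm: log-likelihood of a discrete
    observation sequence under an HMM with initial probabilities pi,
    transition matrix A and emission matrix B[state, symbol]."""
    alpha = pi * B[:, obs[0]]
    logp = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        s = alpha.sum()
        logp += np.log(s)
        alpha = alpha / s
    return logp

# One toy HMM per intention; all parameter values are illustrative.
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1],
              [0.1, 0.9]])
B_wave = np.array([[0.8, 0.2],    # intention "wave": mostly symbol 0
                   [0.7, 0.3]])
B_point = np.array([[0.2, 0.8],   # intention "point": mostly symbol 1
                    [0.3, 0.7]])

obs = [0, 0, 1, 0]                # quantized posture feature sequence
lls = np.array([forward_loglik(obs, pi, A, B_wave),
                forward_loglik(obs, pi, A, B_point)])
hin = np.exp(lls - lls.max())
hin = hin / hin.sum()             # posture recognition probability set Hin
```

The observation sequence is dominated by symbol 0, so the "wave" model receives the larger probability.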
The neural network model is trained with the elderly behavior feature set to obtain a neural network model capable of recognizing gesture intentions; the hidden Markov model is trained with the elderly behavior feature set to obtain a hidden Markov model capable of recognizing posture intentions.
In step S104, intention fusion is performed on the gesture recognition probability set and the posture recognition probability set with a fusion algorithm based on a confusion matrix, and the F1 score under each intention classification is used to calculate the weight proportions of the two probability sets under different intentions; the final recognition intention is then determined.
An intention fusion model F = f(I, Cin, Hin) is built, wherein f is the intention fusion model; I is an intention weight matrix; Cin is the gesture recognition probability set; Hin is the posture recognition probability set.
Weight values are assigned to Cin to form an n×1-dimensional weight matrix Cconfi, and to Hin to form an n×1-dimensional weight matrix Hconfi.
The fuzzy transformation is performed on Cconfi and Hconfi to obtain the latest intention probability matrix C, wherein C = Cconfi∘Cin + Hconfi∘Hin and ∘ is called the composite evaluation operator.
In the invention, the weight matrices of the two sub-models under different intention classifications are calculated, and the recognition correctness of the two models under different intentions is evaluated by performing model evaluation with a multi-classification confusion matrix. The confusion matrix of a multi-classification task is a situation-analysis table that summarizes the predictions of a classification model in machine learning; records in the data set are summarized in matrix form according to two criteria, the true class and the class judged by the model. Because raw counts in a confusion matrix can make model quality hard to measure when the amount of data is large, four secondary indices are derived from the basic statistics: accuracy, precision, recall and specificity. These convert the counts into ratios between 0 and 1 and so facilitate standardized measurement. Expanding on these four indices yields a further, tertiary index called the F1 score, an index used in statistics to measure the accuracy of binary classification models. It balances the precision and the recall of the model: the F1 score is the harmonic mean of the model's precision and recall, with the calculation formula F1 = 2 × precision × recall / (precision + recall).
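Per-class precision, recall and F1 can be computed straight from a multi-class confusion matrix; a small sketch (the matrix values below are invented):

```python
import numpy as np

def per_class_f1(cm):
    """cm[i, j]: number of samples with true class i predicted as class j.
    Returns (precision, recall, f1) per class; F1 = 2*P*R / (P + R)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                     # true positives per class
    precision = tp / cm.sum(axis=0)      # over predicted-class columns
    recall = tp / cm.sum(axis=1)         # over true-class rows
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

cm = np.array([[8, 2],
               [1, 9]])
p, r, f1 = per_class_f1(cm)
```

The resulting per-class F1 vector is exactly the kind of per-intention weight matrix the fusion step uses.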
When machine-learning methods are applied to multi-class problems, the F1-score is often used as the final measure; this is consistent with its use herein as the weight of each intention classification for model fusion.
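As a concrete illustration (not part of the patent text), the four secondary indices and the per-class F1-score can be computed directly from a multi-class confusion matrix. The 4×4 matrix below is an invented toy example for the four control intentions, not the experimental data of embodiment 1:

```python
import numpy as np

def per_class_metrics(cm):
    """Compute precision, recall (sensitivity), specificity, accuracy and F1
    for each class of a multi-class confusion matrix (rows = true class,
    columns = predicted class)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp           # predicted as class k, actually another
    fn = cm.sum(axis=1) - tp           # actually class k, predicted another
    tn = cm.sum() - tp - fp - fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / cm.sum()
    f1 = 2 * precision * recall / (precision + recall)   # F1 = 2PR/(P+R)
    return precision, recall, specificity, accuracy, f1

# Toy confusion matrix for 4 intents: advance, stop, turn left, turn right.
cm = [[45, 2, 2, 1],
      [3, 44, 2, 1],
      [1, 2, 46, 1],
      [2, 1, 2, 45]]
p, r, s, a, f1 = per_class_metrics(cm)
print(np.round(f1, 3))
```

The resulting `f1` vector is exactly what the method uses as the per-intent weight values of a sub-model.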
The F1-score of the CNN sub-model under each intention, denoted here F1c_i, is assigned as the weight value under each intention classification of Cin; the F1-score of the HMM sub-model, denoted F1h_i, is assigned as the weight value under each intention classification of Hin, where i = 1, 2, …, n.
Based on F1c_i, the n×1 weight matrix of Cin is obtained as Cconfi = [F1c_1, F1c_2, …, F1c_n]^T; based on F1h_i, the n×1 weight matrix of Hin is obtained as Hconfi = [F1h_1, F1h_2, …, F1h_n]^T.
Cin, hin, cconfi and Hconfi are subjected to fuzzy change to obtain a one-dimensional matrix [ lambda ] 1 ,λ 2 ,…,λ n ] T (ii) a Selecting the maximum value gamma in the matrix i (ii) a The subscript i intent ultimately identifies the intent for the user; wherein the fusion process is as follows: [ lambda ] 1 ,λ 2 ,…,λ n ] T = Cin × Cconfi + Hin × Hconfi. Fig. 4 is a schematic diagram of a dual-model decision-level fusion algorithm according to embodiment 1 of the present invention.
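The decision-level fusion step can be sketched as follows, interpreting the composite evaluation operator as element-wise weighting followed by an argmax. All probability and weight values below are hypothetical placeholders, since the patent's actual F1-score table is not reproduced here:

```python
import numpy as np

# Hypothetical probability sets from the two sub-models for one observation,
# over n = 4 intents: advance, stop, turn left, turn right.
Cin = np.array([0.70, 0.10, 0.15, 0.05])   # gesture (CNN) probabilities
Hin = np.array([0.40, 0.35, 0.15, 0.10])   # posture (HMM) probabilities

# Per-intent weights taken from each sub-model's F1-scores (illustrative).
Cconfi = np.array([0.92, 0.88, 0.90, 0.85])
Hconfi = np.array([0.80, 0.91, 0.84, 0.87])

# Fusion: element-wise weighting of each probability set, then summation,
# giving the one-dimensional matrix [lambda_1, ..., lambda_n]^T.
lam = Cin * Cconfi + Hin * Hconfi
final_intent = int(np.argmax(lam))          # subscript i of the maximum value
print(lam, final_intent)
```

With these placeholder numbers both sub-models lean toward "advance", so the fused decision selects intent 0.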
In the present invention, if the robot does not complete the specified action according to the final intention: target recognition is started, and the distance to a specific obstacle is judged using the image acquisition equipment; after the robot reaches the designated target area, the target object is captured through voice interaction, its initial coordinates being (x, y); the robot moves the target object into the video frame, obtaining coordinates (x1, y1), so the transformation process of the target object is (x → x1, y → y1); after the target object is located, the grasping operation is performed; after grasping, the intention of the elderly user is captured in real time and the robot moves accordingly. After the target object is handed to the elderly user, the interaction ends.
Fig. 5 is a diagram of the fusion model architecture for deep learning and statistical probability provided in embodiment 1 of the present invention. The method first obtains the raw voice channel information and sends it to a preprocessing layer, which wakes the voice system and preprocesses the image information. The preprocessed information is sent to a recognition layer comprising a convolutional neural network (CNN) trained with an elderly-behavior data set (EIDS) and a hidden Markov model (HMM). Two intention probability sets are obtained in real time from the two sub-models and provided to a model fusion layer, where the real intention of the user is captured by the model fusion algorithm and passed to an interactive behavior layer, which performs the human-robot interaction so that the robot completes the operation and meets the user's needs. The system feeds back the user's intention and performs the interactive action through the Pepper robot.
In embodiment 1 of the present invention, the intentions of the elderly are divided into four robot-control intentions: advance (I), stop (II), turn left (III) and turn right (IV).
Each column of the confusion matrix represents a predicted category, and the column total is the number of samples predicted as that category; each row represents the true category of the data. In the experiment, 4 intention classifications were used with 200 samples in total, divided into 4 categories of 50 samples each, and a multi-class confusion matrix was built for each of the two sub-models. Fig. 6 is a schematic diagram of the multi-class-intention confusion matrices corresponding to the CNN and the HMM in embodiment 1. The four indices of the two confusion matrices, namely accuracy, precision, recall (sensitivity) and specificity, are then calculated, and the formula F1 = 2PR/(P + R) is further used to calculate the F1-score under each confusion matrix as the weight value under each intention of the sub-models. The following table shows the F1-score of each intention of the sub-models.
Embodiment 2
Embodiment 2 of the present invention provides an intention understanding system for an elderly accompanying robot; fig. 7 is a schematic diagram of this system. The system comprises an acquisition module, a training module and a building calculation module.
The acquisition module is used for acquiring behavior images of the elderly in real time; the behavior images comprise gesture images and posture information, which are subjected to image segmentation to form a gesture data set and a posture data set respectively.
The training module inputs the gesture data set into the trained neural network model for gesture recognition to obtain a gesture recognition probability set, and inputs the posture data set into the trained hidden Markov model for posture recognition to obtain a posture recognition probability set.
The building calculation module is used for performing intention fusion on the gesture recognition probability set and the posture recognition probability set with a fusion algorithm based on the confusion matrix, using the F1-score under different intention classifications to calculate the weight proportions of the two probability sets under different intentions during fusion, and then determining the final recognition intention.
The system also includes a start module; the starting module is used for acquiring the voice channel information and extracting keywords of the voice channel information to start the robot.
The execution process of the training module comprises the following steps: acquiring gesture image samples and posture information samples of a plurality of elderly people operating the robot; segmenting both the gesture image samples and the posture information samples with the Otsu algorithm to form an elderly-behavior feature set; training the neural network model with the elderly-behavior feature set to obtain a neural network model capable of recognizing gesture intentions; and training the hidden Markov model with the elderly-behavior feature set to obtain a hidden Markov model capable of recognizing posture intentions.
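As an illustrative sketch of the segmentation step (not the patent's actual implementation), Otsu's method selects the threshold that maximizes the between-class variance of a grayscale histogram; the synthetic bimodal image below stands in for a real hand/background frame:

```python
import numpy as np

def otsu_threshold(gray):
    """Return the Otsu threshold of an 8-bit grayscale image by maximizing
    the between-class variance over all candidate thresholds."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    cum_w = np.cumsum(prob)                     # class-0 weight w0(t)
    cum_mu = np.cumsum(prob * np.arange(256))   # cumulative mean
    mu_total = cum_mu[-1]
    best_t, best_var = 0, -1.0
    for t in range(255):
        w0, w1 = cum_w[t], 1.0 - cum_w[t]
        if w0 == 0 or w1 == 0:
            continue
        mu0 = cum_mu[t] / w0
        mu1 = (mu_total - cum_mu[t]) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Synthetic bimodal "hand vs background" image: dark background, bright hand.
rng = np.random.default_rng(0)
img = np.clip(np.concatenate([rng.normal(60, 10, 5000),
                              rng.normal(180, 10, 5000)]), 0, 255).astype(np.uint8)
t = otsu_threshold(img)
mask = img > t   # foreground (hand-region) mask
```

In practice a library routine such as OpenCV's `cv2.threshold(..., cv2.THRESH_OTSU)` would typically replace this hand-rolled loop.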
The building calculation module comprises a building module and a calculation module. The process of the building module is as follows: building an intention fusion model F = f(I, Cin, Hin), where f is the intention fusion model, I is the intention weight matrix, Cin is the gesture recognition probability set, and Hin is the posture recognition probability set; assigning weight values to Cin to form an n×1 weight matrix Cconfi, and assigning weight values to Hin to form an n×1 weight matrix Hconfi; applying a fuzzy transformation to Cconfi and Hconfi to obtain the updated intention probability matrix C, where C = Cconfi ∘ Cin + Hconfi ∘ Hin and ∘ is called the composite evaluation operator. The process of the calculation module is as follows: assigning the F1-score of the CNN sub-model under each intention, denoted F1c_i, as the weight value under each intention classification of Cin, and the F1-score of the HMM sub-model, denoted F1h_i, as the weight value under each intention classification of Hin, where i = 1, 2, …, n; based on F1c_i, obtaining the n×1 weight matrix Cconfi = [F1c_1, F1c_2, …, F1c_n]^T of Cin; based on F1h_i, obtaining the n×1 weight matrix Hconfi = [F1h_1, F1h_2, …, F1h_n]^T of Hin; applying a fuzzy transformation to Cin, Hin, Cconfi and Hconfi to obtain a one-dimensional matrix [λ1, λ2, …, λn]^T; selecting the maximum value λi in the matrix, the intention i being the final recognition intention of the user; where [λ1, λ2, …, λn]^T = Cin × Cconfi + Hin × Hconfi.
The invention improves the intention understanding rate of current elderly accompanying robot systems and increases the elderly's satisfaction with social accompanying robots.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, the scope of the present invention is not limited thereto. Various modifications and alterations will occur to those skilled in the art based on the foregoing description; it is neither necessary nor possible to exhaust all embodiments here. On the basis of the technical solution of the present invention, the various modifications or variations that those skilled in the art can make without creative effort still fall within the scope of the present invention.
Claims (7)
1. An intention understanding method for an elderly accompanying robot is characterized by comprising the following steps:
acquiring a behavior image of the elderly in real time, wherein the behavior image comprises a gesture image and posture information, and performing image segmentation on the gesture image and the posture information to respectively form a gesture data set and a posture data set;
inputting the gesture data set into a trained neural network model for gesture recognition to obtain a gesture recognition probability set, and inputting the posture data set into a trained hidden Markov model for posture recognition to obtain a posture recognition probability set;
performing intention fusion on the gesture recognition probability set and the posture recognition probability set with a fusion algorithm based on the confusion matrix, and using the F1-score under different intention classifications to calculate the weight proportions of the two probability sets under different intentions during fusion; further determining a final recognition intention; the process of performing intention fusion on the gesture recognition probability set and the posture recognition probability set with the confusion-matrix-based fusion algorithm is as follows:
building an intention fusion model F = F (I, cin, hin); wherein f is the model of the intent fusion; i is an intention weight matrix; cin is a gesture recognition probability set; hin is a posture recognition probability set;
assigning weight values to Cin to form an n×1 weight matrix Cconfi, and assigning weight values to Hin to form an n×1 weight matrix Hconfi;
applying a fuzzy transformation to Cconfi and Hconfi to obtain the updated intention probability matrix C; wherein C = Cconfi ∘ Cin + Hconfi ∘ Hin, and ∘ is called a composite evaluation operator;
the method for using the F1-score under different intention classifications to calculate the weight proportions of the gesture recognition probability set and the posture recognition probability set under different intentions during fusion comprises the following steps:
assigning the F1-score of the first sub-model under each intention, denoted F1c_i, as the weight value under each intention classification of Cin, and the F1-score of the second sub-model, denoted F1h_i, as the weight value under each intention classification of Hin, wherein i = 1, 2, …, n;
based on F1c_i, obtaining the n×1 weight matrix Cconfi = [F1c_1, F1c_2, …, F1c_n]^T of Cin; based on F1h_i, obtaining the n×1 weight matrix Hconfi = [F1h_1, F1h_2, …, F1h_n]^T of Hin;
applying a fuzzy transformation to Cin, Hin, Cconfi and Hconfi to obtain a one-dimensional matrix [λ1, λ2, …, λn]^T; selecting the maximum value λi in the matrix; the intention i being the final recognition intention of the user; wherein [λ1, λ2, …, λn]^T = Cin × Cconfi + Hin × Hconfi.
2. The method for understanding the intention of the elderly accompanying robot as claimed in claim 1, further comprising obtaining voice channel information and extracting keywords of the voice channel information to start the robot before the real-time obtaining of the behavior image of the elderly.
3. The method for understanding the intention of an elderly accompanying robot as claimed in claim 1, wherein the method for training the neural network model and the hidden Markov model comprises:
acquiring gesture image samples and posture information samples of a plurality of elderly people operating the robot; segmenting both the gesture image samples and the posture information samples with the Otsu algorithm to form an elderly-behavior feature set;
training the neural network model with the elderly-behavior feature set to obtain a neural network model capable of recognizing gesture intentions; and training the hidden Markov model with the elderly-behavior feature set to obtain a hidden Markov model capable of recognizing posture intentions.
4. The intent understanding method for an elderly accompanying robot as claimed in claim 1, wherein when the robot does not complete the designated action according to the final intent:
starting target identification, and judging the distance of a specific obstacle by adopting image acquisition equipment;
after the robot reaches the designated target area, the target object is captured through voice interaction, its initial coordinates being (x, y); the robot moves the target object into the video frame to obtain coordinates (x1, y1); the transformation process of the target object is (x → x1, y → y1);
after the target object is located, the grasping operation is performed; after grasping, the intention of the elderly is captured in real time, and the robot is moved.
5. An intention understanding system for an elderly accompanying robot is characterized by comprising an acquisition module, a training module and a building calculation module;
the acquisition module is used for acquiring a behavior image of the elderly in real time, wherein the behavior image comprises a gesture image and posture information, and the gesture image and the posture information are subjected to image segmentation to respectively form a gesture data set and a posture data set;
the training module inputs the gesture data set into a trained neural network model for gesture recognition to obtain a gesture recognition probability set, and inputs the posture data set into a trained hidden Markov model for posture recognition to obtain a posture recognition probability set;
the building calculation module is used for performing intention fusion on the gesture recognition probability set and the posture recognition probability set with a fusion algorithm based on the confusion matrix, and using the F1-score under different intention classifications to calculate the weight proportions of the two probability sets under different intentions during fusion; further determining a final recognition intention; the building calculation module comprises a building module and a calculation module;
the process of the building module comprises the following steps: building an intention fusion model F = f(I, Cin, Hin), wherein f is the intention fusion model, I is the intention weight matrix, Cin is the gesture recognition probability set, and Hin is the posture recognition probability set; assigning weight values to Cin to form an n×1 weight matrix Cconfi, and assigning weight values to Hin to form an n×1 weight matrix Hconfi; applying a fuzzy transformation to Cconfi and Hconfi to obtain the updated intention probability matrix C; wherein C = Cconfi ∘ Cin + Hconfi ∘ Hin, and ∘ is called a composite evaluation operator;
the process of the calculation module is as follows: assigning the F1-score of the first sub-model under each intention, denoted F1c_i, as the weight value under each intention classification of Cin, and the F1-score of the second sub-model, denoted F1h_i, as the weight value under each intention classification of Hin, wherein i = 1, 2, …, n; based on F1c_i, obtaining the n×1 weight matrix Cconfi = [F1c_1, F1c_2, …, F1c_n]^T of Cin; based on F1h_i, obtaining the n×1 weight matrix Hconfi = [F1h_1, F1h_2, …, F1h_n]^T of Hin; applying a fuzzy transformation to Cin, Hin, Cconfi and Hconfi to obtain a one-dimensional matrix [λ1, λ2, …, λn]^T; selecting the maximum value λi in the matrix; the intention i being the final recognition intention of the user; wherein [λ1, λ2, …, λn]^T = Cin × Cconfi + Hin × Hconfi.
6. The elderly accompanying robot oriented intention understanding system of claim 5, further comprising a starting module;
the starting module is used for acquiring voice channel information and extracting keywords of the voice channel information to start the robot.
7. The intention understanding system for an elderly accompanying robot as claimed in claim 5, wherein the execution process of the training module is as follows:
acquiring gesture image samples and posture information samples of a plurality of elderly people operating the robot; segmenting both the gesture image samples and the posture information samples with the Otsu algorithm to form an elderly-behavior feature set;
training the neural network model with the elderly-behavior feature set to obtain a neural network model capable of recognizing gesture intentions; and training the hidden Markov model with the elderly-behavior feature set to obtain a hidden Markov model capable of recognizing posture intentions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010970662.7A CN112101219B (en) | 2020-09-15 | 2020-09-15 | Intention understanding method and system for elderly accompanying robot |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112101219A CN112101219A (en) | 2020-12-18 |
CN112101219B true CN112101219B (en) | 2022-11-04 |
Family
ID=73759249
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010970662.7A Active CN112101219B (en) | 2020-09-15 | 2020-09-15 | Intention understanding method and system for elderly accompanying robot |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112101219B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112684711B (en) * | 2020-12-24 | 2022-10-11 | 青岛理工大学 | Interactive recognition method for human behavior and intention |
CN112766041B (en) * | 2020-12-25 | 2022-04-22 | 北京理工大学 | Method for identifying hand washing action of senile dementia patient based on inertial sensing signal |
CN113780750B (en) * | 2021-08-18 | 2024-03-01 | 同济大学 | Medical risk assessment method and device based on medical image segmentation |
CN113705440B (en) * | 2021-08-27 | 2023-09-01 | 华中师范大学 | Head posture estimation method and system for visual understanding of educational robot |
CN113848790A (en) * | 2021-09-28 | 2021-12-28 | 德州学院 | Intelligent nursing type robot system and control method thereof |
CN114092967A (en) * | 2021-11-19 | 2022-02-25 | 济南大学 | Real-time multi-mode accompanying robot intention understanding method and system |
CN116028880B (en) * | 2023-02-07 | 2023-07-04 | 支付宝(杭州)信息技术有限公司 | Method for training behavior intention recognition model, behavior intention recognition method and device |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103593680A (en) * | 2013-11-19 | 2014-02-19 | 南京大学 | Dynamic hand gesture recognition method based on self incremental learning of hidden Markov model |
CN105787471A (en) * | 2016-03-25 | 2016-07-20 | 南京邮电大学 | Gesture identification method applied to control of mobile service robot for elder and disabled |
CN108986801A (en) * | 2017-06-02 | 2018-12-11 | 腾讯科技(深圳)有限公司 | A kind of man-machine interaction method, device and human-computer interaction terminal |
WO2019204186A1 (en) * | 2018-04-18 | 2019-10-24 | Sony Interactive Entertainment Inc. | Integrated understanding of user characteristics by multimodal processing |
CN110554774A (en) * | 2019-07-22 | 2019-12-10 | 济南大学 | AR-oriented navigation type interactive normal form system |
CN110717381A (en) * | 2019-08-28 | 2020-01-21 | 北京航空航天大学 | Human intention understanding method facing human-computer cooperation and based on deep stacking Bi-LSTM |
CN111222341A (en) * | 2020-01-16 | 2020-06-02 | 中国平安人寿保险股份有限公司 | Method, device, equipment and storage medium for training hidden Markov model |
CN111582108A (en) * | 2020-04-28 | 2020-08-25 | 河北工业大学 | Gait recognition and intention perception method |
CN111596767A (en) * | 2020-05-27 | 2020-08-28 | 广州市大湾区虚拟现实研究院 | Gesture capturing method and device based on virtual reality |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190272764A1 (en) * | 2018-03-03 | 2019-09-05 | Act, Inc. | Multidimensional assessment scoring using machine learning |
Non-Patent Citations (5)
Title |
---|
A Method of Fusing Gesture and Speech for Human-robot Interaction; Junhong Meng et al.; ICCDE 2020; 2020-03-07 *
A Multimodal Framework Based on Integration of Cortical and Muscular Activities for Decoding Human Intentions About Lower Limb Motions; Chengkun Cui et al.; IEEE Transactions on Biomedical Circuits and Systems; 2017-08-31 *
Continuous action segmentation and recognition using hybrid convolutional neural network-hidden Markov model model; Jun Lei et al.; IET Computer Vision; 2016-12-31 *
Data fusion methods in multimodal human computer dialog; Ming-Hao Yang et al.; Virtual Reality & Intelligent Hardware; 2019-12-31 *
Research on a contact-based human-robot collaboration intention understanding method based on a GA-BP neural network; Zhang Rui et al.; Modular Machine Tool & Automatic Manufacturing Technique; 2019-11 (No. 11) *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112101219B (en) | Intention understanding method and system for elderly accompanying robot | |
CN104077579B (en) | Facial expression recognition method based on expert system | |
CN105739688A (en) | Man-machine interaction method and device based on emotion system, and man-machine interaction system | |
CN109101108B (en) | Method and system for optimizing human-computer interaction interface of intelligent cabin based on three decisions | |
CN110781829A (en) | Light-weight deep learning intelligent business hall face recognition method | |
CN111402928B (en) | Attention-based speech emotion state evaluation method, device, medium and equipment | |
US20230206928A1 (en) | Audio processing method and apparatus | |
CN107016046A (en) | The intelligent robot dialogue method and system of view-based access control model displaying | |
CN111666845B (en) | Small sample deep learning multi-mode sign language recognition method based on key frame sampling | |
CN104156729B (en) | A kind of classroom demographic method | |
CN111274978B (en) | Micro expression recognition method and device | |
CN112101243A (en) | Human body action recognition method based on key posture and DTW | |
CN111128157A (en) | Wake-up-free voice recognition control method for intelligent household appliance, computer readable storage medium and air conditioner | |
CN111444488A (en) | Identity authentication method based on dynamic gesture | |
CN111428666A (en) | Intelligent family accompanying robot system and method based on rapid face detection | |
CN111128240B (en) | Voice emotion recognition method based on anti-semantic-erasure | |
CN114495211A (en) | Micro-expression identification method, system and computer medium based on graph convolution network | |
CN114093028A (en) | Human-computer cooperation method and system based on intention analysis and robot | |
WO2024001539A1 (en) | Speaking state recognition method and apparatus, model training method and apparatus, vehicle, medium, computer program and computer program product | |
CN112580527A (en) | Facial expression recognition method based on convolution long-term and short-term memory network | |
CN111339878A (en) | Eye movement data-based correction type real-time emotion recognition method and system | |
CN111191510A (en) | Relation network-based remote sensing image small sample target identification method in complex scene | |
CN111898473B (en) | Driver state real-time monitoring method based on deep learning | |
CN114663910A (en) | Multi-mode learning state analysis system | |
CN109977777B (en) | Novel RF-Net model-based gesture recognition method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||