WO2021039641A1

WO2021039641A1 - Motion verbalization device, motion verbalization method, program, and motion recording device

Info

Publication number: WO2021039641A1
Application number: PCT/JP2020/031664
Authority: WO
Inventors: 渉 ▲高▼野
Original assignee: 国立大学法人大阪大学
Priority date: 2019-08-30
Filing date: 2020-08-21
Publication date: 2021-03-04

Abstract

A motion verbalization device (1) is provided with: a learning data storage unit (22) that stores, as learning data, time-series motion data and motion symbols in association with a description consisting of a set of words corresponding to each motion symbol; a motion pattern classification unit (103) that classifies input motion data into a corresponding motion pattern; a motion verbalization computation unit (104) that calculates a probability representing the connection strength of each word from the classified motion pattern by applying a multi-layer statistical model, the learning data, and a parameter, the multi-layer statistical model having an input layer configured by motion symbols and an output layer configured by language, the input layer and the output layer being connected via two hidden variable layers, the multi-layer statistical model having a parameter in which the probability of connection between the motion symbols and each word in the description has been acquired through learning; and a word order computation unit (105) that produces a sentence using the probability of each word calculated and the probability of the order of each word. As a result, the motion verbalization device (1) extracts words associated with a motion pattern in a more optimized manner, and produces a sentence by changing the order of words.

Description

Motion verbalization device, motion verbalization method, program, and motion recording device

The present invention relates to a technique for linking motion patterns to language by means of a multi-layer statistical model.

Verbalizing motion is indispensable for building human-machine systems that communicate smoothly. Incorporating into robots an intelligent computational function that verbalizes and understands human behavior brings us closer to a society in which robots are part of daily life.

Gesture recognition technology, which aims to identify human movement, has been studied for a long time. Verbalizing gestures is expected to enable robots and man-machine interfaces with more advanced communication capabilities. At present, however, such technology can only identify the motion category (“walking”, “running”, etc.); it does not yet generate a sentence describing the motion from the motion data. Research on automatically annotating images using image recognition technology is also known, but it has not been extended to verbalizing behavior from three-dimensional data of body motion.

Non-Patent Document 1 outlines a motion language model from research on verbalizing the body movements of humans and robots. The model is a graph structure in which the nodes of the first, second, and third layers represent motion symbols indicating motion patterns, latent variables indicating sentences, and words, respectively. The association between motion symbols and words is expressed through the sentence latent variables of the second layer.

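Patent Document 1: Japanese Unexamined Patent Publication No. 2019-3683
Patent Document 2: Japanese Unexamined Patent Publication No. 2018-195053
Patent Document 3: Japanese Translation of PCT International Application Publication No. 2017-527035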

However, while Patent Documents 1 and 2 measure motion, they do not verbalize and output the content of the motion, and none of Patent Documents 1 to 3 uses machine learning to perform motion verbalization processing. In addition, Non-Patent Document 1 generates words from motion symbols using a motion language model whose graph structure has only a single latent variable layer representing sentences, so there may still be room for improvement in the process of optimizing and extracting associated words.

The present invention has been made in view of the above, and provides a motion verbalization device, a motion verbalization method, a program, and a motion recording device that use a multi-layer statistical model to optimize and extract associated words from motion symbols representing motion patterns, and that rearrange the words into sentences.

The motion verbalization device according to the present invention includes: a learning data storage unit that stores, as learning data and in association with one another, three-dimensional time-series motion data, motion symbols representing motion patterns (the types of motion), and explanatory text consisting of a set of words for each motion symbol; a classification means that classifies input three-dimensional motion data into the corresponding motion pattern; a multi-layer statistical model in which an input layer composed of the motion symbols and an output layer composed of the language are connected via a predetermined number of hidden variable layers, and which has parameters obtained by learning the connection probability between each motion symbol and each word in the explanatory text corresponding to that motion symbol; a first calculation means that applies the learning data and the parameters of the multi-layer statistical model to calculate, from the motion pattern classified by the classification means, a probability expressing the strength of the connection to each word; and a second calculation means that creates a sentence using the probability of each word obtained by the first calculation means and the probability of the ordering of the words.

Further, the motion verbalization method according to the present invention includes: a learning data storage step of storing, as learning data and in association with one another, three-dimensional time-series motion data, motion symbols representing motion patterns (the types of motion), and explanatory text consisting of a set of words for each motion symbol; a multi-layer statistical model execution step in which an input layer composed of the motion symbols and an output layer composed of the language are connected via a predetermined number of hidden variable layers, and parameters are acquired by learning the connection probability between each motion symbol and each word in the explanatory text corresponding to that motion symbol; a classification step of classifying input three-dimensional motion data into the corresponding motion pattern; a first calculation step of applying the learning data and the parameters of the multi-layer statistical model to calculate, from the motion pattern classified in the classification step, a probability expressing the strength of the connection to each word; and a second calculation step of creating a sentence using the probability of each word obtained in the first calculation step and the probability of the ordering of the words.

Further, the program according to the present invention causes a computer to function as the motion verbalization device.

Further, the motion recording device according to the present invention includes the motion verbalization device, a recording unit that records a sentence generated by the second calculation means, and an output unit that outputs the sentence recorded in the recording unit.

According to these inventions, in pre-learning, the learning data storage unit stores, as learning data and in association with one another, three-dimensional time-series motion data, motion symbols representing motion patterns (the types of motion), and explanatory text consisting of a set of words for each motion symbol. For the multi-layer statistical model, in which the input layer composed of the motion symbols and the output layer composed of the language are connected via a predetermined number of hidden variable layers, the parameters obtained by learning the connection probability between each motion symbol and each word in the corresponding explanatory text are stored. The input three-dimensional motion data is then classified into the corresponding motion pattern by the classification means. The first calculation means applies the learning data and the parameters of the multi-layer statistical model to calculate, from the classified motion pattern, a probability expressing the strength of the connection to each word, and the second calculation means creates a sentence using the probability of each word obtained by the first calculation means and the probability of the ordering of the words. Extracting the associative relationship between motion and words with a multi-layer statistical model in this way makes it possible to calculate, from a body motion, the words closely related to it. Furthermore, because the ordering of words in a sentence is expressed as a statistical model, the words associated with the motion can be rearranged and their validity as a sentence can be evaluated probabilistically. Computing a word sequence with high probability turns the motion into a sentence.
Further, the same learning process can be performed regardless of the number of hidden variable layers, for example when the predetermined number of layers is 0, 1, 2, 3, 4, and so on.

According to the present invention, words associated with motion patterns can be extracted in a more optimized manner and rearranged into sentences.

FIG. 1 is a block diagram showing an embodiment of the motion verbalization device according to the present invention.
FIG. 2 is a diagram showing the memory map of the learning data.
FIG. 3 is a graph structure diagram showing an embodiment of the multi-layer statistical model.
FIG. 4 is a diagram showing an example of a language model.
FIG. 5 is an explanatory diagram visualizing verbalized sentences.
FIG. 6 is a flowchart showing an example of the machine learning process.
FIG. 7 is a flowchart showing an example of the motion verbalization process.

FIG. 1 is a block diagram showing an embodiment of the motion verbalization device according to the present invention. The motion verbalization device 1 includes a control unit 10 and a storage unit 20, and further includes a motion measurement unit 31 and an input unit 32. As described later, the motion verbalization device 1 executes processing in two stages. First, a pre-learning process, for example a machine learning process, is executed. Then, in the measurement mode, a motion verbalization process is executed that measures human body motion and verbalizes it. The input unit 32 and some of the functional units described later are used in the pre-learning process, the motion measurement unit 31 is used in both processes, and the other functional units are used in the motion verbalization process.

The motion measurement unit 31 performs motion analysis on captured images containing the movement of a human body and transmits the resulting motion data to the control unit 10 by wire or wirelessly. In the present embodiment, the motion measurement unit 31 employs optical motion capture. The input unit 32 accepts character input; preferably a keyboard or the like is used to enter words that form sentences. An entered sentence is passed to the control unit 10, where it is decomposed into words by known morphological analysis. A word ID is assigned to each analyzed word, and the word IDs are used in subsequent processing. Specifically, the motion measurement unit 31 captures the whole-body motion of a human, and the input unit 32 is used to enter a plurality of simple or arbitrary sentences describing the state or type of the motion obtained by the motion measurement unit 31. Examples of types of motion include "jumping," "running," "playing badminton," and "walking." The motion measurement unit 31 may be provided separately from the control unit 10; in that case, the captured images and the measured motion data are first stored in a built-in storage member (not shown) and then transferred offline to the storage unit 20 via the control unit 10.
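The word-ID step described above can be sketched as follows. This is only an illustration: the regular-expression tokenizer is a stand-in for the morphological analysis mentioned in the text (which is not specified further), and the names `tokenize` and `build_vocabulary` are hypothetical.

```python
import re

def tokenize(sentence):
    """Stand-in for morphological analysis: lowercase, keep alphabetic runs."""
    return re.findall(r"[a-z]+", sentence.lower())

def build_vocabulary(sentences):
    """Assign an integer word ID to each distinct word, in order of first appearance."""
    word_to_id = {}
    for sentence in sentences:
        for word in tokenize(sentence):
            if word not in word_to_id:
                word_to_id[word] = len(word_to_id)
    return word_to_id

# Example sentences of the kind entered through the input unit.
sentences = ["He runs.", "A player runs.", "A student runs."]
vocab = build_vocabulary(sentences)

# Each sentence becomes a list of word IDs for later processing.
encoded = [[vocab[w] for w in tokenize(s)] for s in sentences]
```

Later stages can then work with integer IDs instead of strings, which is what allows the words to index rows and columns of probability tables.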

The motion measurement unit 31, which is a known optical motion capture system, includes markers, a plurality of cameras that capture the markers, and an image processing unit. In optical motion capture, markers are attached to predetermined parts of a moving human body, typically the joints, and are imaged by a plurality of cameras placed in advance at a plurality of locations. As is well known, the image processing unit measures the three-dimensional position of each marker, that is, the joint positions of the human body, as time-series motion data from the captured marker images.

The storage unit 20 has a processing program storage unit 21 that stores each processing program executed by the control unit 10, as well as a work area that temporarily stores information during processing. The storage unit 20 further includes a learning data storage unit 22, a classification parameter storage unit 23, a multi-layer statistical model storage unit 24, and a grammar model storage unit 25.

The processing program storage unit 21 stores the processing programs that execute the pre-learning process and the motion verbalization process.

The learning data storage unit 22 stores the learning data created in advance by the control unit 10. FIG. 2 shows an example of the learning data. For each motion pattern, the following items are stored: time-series motion data obtained by measuring the motion of the human body with the motion measurement unit 31; the motion pattern, which is the type of that motion data; and manually entered explanatory text describing the content of the motion. The motion data is at least one of the positions and the angles of specific parts of the human body, here the joints. The motion pattern indicates the type of motion and is entered manually as the motion symbol λ. A plurality of sentences are entered manually as the explanatory text. For example, sentences s11 and s12 are created for the motion symbol λ1, and sentences s21, s22, and s23 for the motion symbol λ2. More specifically, when the type of motion is "run", the sentences are "He runs.", "A player runs.", "A student runs.", and so on.
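One way to picture the learning data of FIG. 2 is as a list of records, each holding a motion symbol, its time-series motion data, and its explanatory sentences. The sketch below is an assumption about layout only; the class name `LearningRecord`, the field names, and the sample values are hypothetical and are not taken from the memory map itself.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class LearningRecord:
    motion_symbol: str              # manually entered label, e.g. "lambda_1"
    motion_data: List[List[float]]  # one row of joint values per time step
    descriptions: List[str] = field(default_factory=list)  # explanatory sentences

def symbol_word_pairs(records: List[LearningRecord]) -> List[Tuple[str, str]]:
    """Flatten the records into (motion symbol, word) pairs for model training."""
    pairs = []
    for record in records:
        for sentence in record.descriptions:
            # Whitespace split is a stand-in for morphological analysis.
            for word in sentence.lower().rstrip(".").split():
                pairs.append((record.motion_symbol, word))
    return pairs

records = [
    LearningRecord("lambda_1", [[0.10, 0.95], [0.12, 0.97]], ["He runs.", "A player runs."]),
    LearningRecord("lambda_2", [[0.10, 0.95], [0.10, 1.30]], ["He jumps."]),
]
training_pairs = symbol_word_pairs(records)
```

The flattened pairs are the form in which the motion symbols and words enter the learning of the multi-layer statistical model.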

The classification parameter storage unit 23 stores, in association with the motion symbol λ, classification parameters for classifying motion data; these parameters are generated from the motion data acquired during pre-learning and the corresponding motion patterns.

The multi-layer statistical model storage unit 24 stores a multi-layer statistical model that expresses the associative relationship (strength of connection) between the motion symbol λ and the word w in the explanatory text s acquired in pre-learning. In this model, the input layer composed of the motion symbols λ and the output layer composed of the words w are connected via K hidden variable layers.

FIG. 3 is a graph structure diagram showing an embodiment of the multi-layer statistical model. In FIG. 3, there are two hidden variable layers, $z^{(1)}$ and $z^{(2)}$.

The multi-layer statistical model storage unit 24 stores Equation 1, which gives the probability that the word w is generated from the motion symbol λ. Here, it is assumed that the probability that a word is generated from a motion symbol depends only on that motion symbol.

Here, $z^{(k)}$ denotes the state in the k-th hidden variable layer.

Using the learning data set $(\lambda^{(i)}, w^{(i)})$ of motion symbols and words in the explanatory text, an optimization problem is solved to find the probability parameters $P(z^{(1)} \mid \lambda)$, $P(z^{(k)} \mid z^{(k-1)})$, ..., $P(w \mid z^{(K)})$ that maximize the objective function $\Phi(\lambda, w)$ shown in Equation 2. This optimization problem is solved by the EM algorithm.
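The images for Equations 1 and 2 are not reproduced in this text. A form consistent with the parameters named above would be the following, assuming that each layer depends only on the layer before it, that K denotes the number of hidden variable layers, and that Φ is the log-likelihood of the learning pairs. This is a reconstruction under those assumptions, not the published equations.

```latex
\begin{align}
P(w \mid \lambda)
  &= \sum_{z^{(1)}} \cdots \sum_{z^{(K)}}
     P\bigl(w \mid z^{(K)}\bigr)
     \Biggl[\prod_{k=2}^{K} P\bigl(z^{(k)} \mid z^{(k-1)}\bigr)\Biggr]
     P\bigl(z^{(1)} \mid \lambda\bigr) \tag{1} \\
\Phi(\lambda, w)
  &= \sum_{i} \log P\bigl(w^{(i)} \mid \lambda^{(i)}\bigr) \tag{2}
\end{align}
```

With K = 2, as in FIG. 3, the product in (1) contains the single factor $P(z^{(2)} \mid z^{(1)})$.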

The parameters of the motion language model are optimized by the EM algorithm so that the objective function of Equation 2 is maximized. The EM algorithm is a method for maximum likelihood estimation of the parameters of a probabilistic model, used when the model depends on unobservable latent variables. It is an iterative method that alternates between two steps: one computes the distribution of the hidden variable layers based on the previously estimated model parameters, and the other estimates the parameter values that maximize the objective function based on that distribution. The optimized parameters are written to the multi-layer statistical model storage unit 24. For each word, a probability is set based on the number of times it appears in the explanatory text obtained in pre-learning.
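The alternation between the two EM steps can be sketched for a model with two hidden variable layers. This is a minimal illustration, assuming the factorization P(w | λ) = Σ P(z1 | λ) P(z2 | z1) P(w | z2) and a log-likelihood objective; the table names `A`, `B`, `C`, the tiny pseudo-count `eps`, and the toy data are all hypothetical, and the sketch is not the applicant's implementation.

```python
import math
import random

def normalize(row):
    total = sum(row)
    return [v / total for v in row]

def random_table(n_rows, n_cols, rng):
    """Random row-stochastic table (each row sums to 1)."""
    return [normalize([rng.random() + 0.1 for _ in range(n_cols)]) for _ in range(n_rows)]

def word_prob(A, B, C, lam, w):
    """P(w | lam) = sum over z1, z2 of A[lam][z1] * B[z1][z2] * C[z2][w]."""
    return sum(A[lam][a] * B[a][b] * C[b][w]
               for a in range(len(B)) for b in range(len(C)))

def log_likelihood(A, B, C, pairs):
    return sum(math.log(word_prob(A, B, C, lam, w)) for lam, w in pairs)

def em_step(A, B, C, pairs, eps=1e-12):
    n1, n2 = len(B), len(C)
    # Expected counts; eps keeps every row normalizable even if unused.
    cA = [[eps] * n1 for _ in A]
    cB = [[eps] * n2 for _ in B]
    cC = [[eps] * len(C[0]) for _ in C]
    for lam, w in pairs:
        # E-step: posterior over (z1, z2) for this (symbol, word) pair.
        joint = [[A[lam][a] * B[a][b] * C[b][w] for b in range(n2)] for a in range(n1)]
        total = sum(sum(row) for row in joint)
        for a in range(n1):
            for b in range(n2):
                q = joint[a][b] / total
                cA[lam][a] += q
                cB[a][b] += q
                cC[b][w] += q
    # M-step: renormalize expected counts into probability tables.
    return ([normalize(r) for r in cA],
            [normalize(r) for r in cB],
            [normalize(r) for r in cC])

rng = random.Random(0)
# Toy (symbol ID, word ID) pairs: 2 symbols, 5 words.
pairs = [(0, 0), (0, 1), (0, 2), (0, 0), (0, 3), (0, 2), (1, 0), (1, 1), (1, 4)]
A = random_table(2, 3, rng)   # P(z1 | symbol), 3 states in layer 1
B = random_table(3, 2, rng)   # P(z2 | z1), 2 states in layer 2
C = random_table(2, 5, rng)   # P(word | z2)
ll_start = log_likelihood(A, B, C, pairs)
for _ in range(50):
    A, B, C = em_step(A, B, C, pairs)
ll_end = log_likelihood(A, B, C, pairs)
```

Enumerating every (z1, z2) combination is affordable only for small tables; larger models would normally compute the same posteriors with forward and backward messages instead.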

Further, the grammar model storage unit 25 stores a grammar model describing the ordering of the words that make up a sentence. In this embodiment, an N-gram model is used; that is, the probability of each word appearing in the next position given the preceding N-1 words is learned.

FIG. 4 shows an example of a language model, in which nodes represent words and edges represent transitions between words. The value of N is preferably about 2 to 4. The 2-gram model shown in FIG. 4 assumes that each word in a sentence depends only on the word immediately before it, and expresses sentence structure through the transition probabilities between words and the initial state probabilities with which words appear at the beginning of a sentence. In FIG. 4, for example, P(“ball” | “a”) is the probability that the word “ball” follows the word “a”. Each transition probability is set based on the number of times the corresponding word pair appears in the explanatory text, and each initial state probability is set based on the number of times the corresponding word appears at the beginning of an explanatory sentence. Such a grammar model is simple to build because it eliminates the laborious work of manually entering parts of speech.
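The counting procedure for the 2-gram model can be sketched as below. The function name `train_bigram` and the four-sentence corpus are hypothetical; the sketch uses plain relative frequencies with no smoothing, which is an assumption, since the text only says the probabilities are set based on counts.

```python
from collections import Counter, defaultdict

def train_bigram(sentences):
    """Estimate initial-state and transition probabilities by relative frequency."""
    initial = Counter()
    transitions = defaultdict(Counter)
    for words in sentences:
        if not words:
            continue
        initial[words[0]] += 1                  # word at the start of a sentence
        for prev, nxt in zip(words, words[1:]):
            transitions[prev][nxt] += 1         # adjacent word pair
    n_sentences = sum(initial.values())
    initial_prob = {w: c / n_sentences for w, c in initial.items()}
    transition_prob = {
        prev: {w: c / sum(counts.values()) for w, c in counts.items()}
        for prev, counts in transitions.items()
    }
    return initial_prob, transition_prob

corpus = [
    ["a", "student", "runs"],
    ["a", "player", "runs"],
    ["he", "runs"],
    ["a", "player", "throws", "a", "ball"],
]
initial_prob, transition_prob = train_bigram(corpus)
print(transition_prob["a"]["ball"])  # P("ball" | "a") = 1 of 4 occurrences of "a" = 0.25
```

Word pairs that never occur in the corpus simply have no entry, so any code that consumes these tables has to decide how to treat unseen transitions.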

The data stored in each of these storage units 22 to 25 is generated by a processing program for pre-learning.

The control unit 10 is typically composed of a processor (computer) with a built-in CPU. By reading the processing program from the processing program storage unit 21 into main memory (not shown) and executing it, the control unit 10 functions as a motion data input unit 101, an input character analysis unit 102, a motion pattern classification unit 103, a motion verbalization calculation unit 104, a word sequence calculation unit 105, a display image creation unit 106, and a recording processing unit 107.

The motion data input unit 101 takes in the motion data measured by the motion measurement unit 31 in both the pre-learning process and the motion verbalization process.

The input character analysis unit 102 takes in the explanatory text data entered through the input unit 32 in the pre-learning process, in association with the corresponding motion data.

The motion pattern classification unit 103 identifies the motion pattern, that is, the motion symbol, of the motion data input during the motion verbalization process. Various methods can be adopted by the motion pattern classification unit 103. In the present embodiment, the classification parameters are used to calculate the difference between the motion data captured this time and the motion data corresponding to each motion pattern obtained during pre-learning; the magnitudes of the differences are compared across motion patterns, the motion pattern with the smallest difference is identified, and its motion symbol is assigned.
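The smallest-difference rule can be sketched as a nearest-template classifier. The text does not specify the classification parameters or the difference measure, so this sketch assumes, purely for illustration, that each pattern is summarized by a time-averaged feature vector and that the difference is Euclidean distance. The names `mean_pose`, `fit_templates`, and `classify` and the sample values are hypothetical.

```python
import math

def mean_pose(motion):
    """Average each feature over time (rows are time steps)."""
    n_frames = len(motion)
    return [sum(frame[j] for frame in motion) / n_frames for j in range(len(motion[0]))]

def fit_templates(training):
    """training maps symbol -> list of motions; returns symbol -> mean feature vector."""
    templates = {}
    for symbol, motions in training.items():
        features = [mean_pose(m) for m in motions]
        templates[symbol] = [sum(col) / len(features) for col in zip(*features)]
    return templates

def classify(motion, templates):
    """Return the symbol whose template differs least from the input motion."""
    feature = mean_pose(motion)
    return min(templates, key=lambda s: math.dist(feature, templates[s]))

# Toy data: two features per frame (for example, forward and vertical position).
training = {
    "walk": [[[0.0, 1.0], [0.2, 1.0]]],
    "jump": [[[0.0, 1.0], [0.0, 1.6]]],
}
templates = fit_templates(training)
label = classify([[0.1, 1.0], [0.1, 1.1]], templates)
```

Averaging over time discards the dynamics of the motion, so a practical classifier would likely use a sequence model; the sketch only shows the compare-and-pick-minimum structure described above.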

The motion verbalization calculation unit 104 uses Equations 1 and 2, the multi-layer statistical model, and the parameters stored in the multi-layer statistical model storage unit 24 to calculate the probability with which each word is associated with the motion symbol λ of the motion pattern classified by the motion pattern classification unit 103.
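Once the parameters are learned, the word probabilities for a classified motion symbol can be obtained by pushing a distribution through the layers one table at a time. The sketch below assumes each layer is stored as a row-stochastic table and that layers depend only on the preceding layer; the function names and the numeric tables are hypothetical.

```python
def vec_mat(vec, mat):
    """Multiply a row vector by a matrix given as a list of rows."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]

def word_distribution(symbol, layers):
    """layers = [P(z1 | symbol), P(z2 | z1), ..., P(w | zK)], each row-stochastic."""
    dist = layers[0][symbol]
    for table in layers[1:]:
        dist = vec_mat(dist, table)
    return dist

# Hypothetical parameters: 2 symbols, 2 states per hidden layer, 3 words.
A = [[0.9, 0.1], [0.2, 0.8]]            # P(z1 | symbol)
B = [[0.8, 0.2], [0.3, 0.7]]            # P(z2 | z1)
C = [[0.5, 0.5, 0.0], [0.1, 0.1, 0.8]]  # P(word | z2)

probs = word_distribution(0, [A, B, C])
# Word IDs ordered from most to least strongly associated.
ranking = sorted(range(len(probs)), key=lambda w: probs[w], reverse=True)
```

Because the loop runs over however many tables are supplied, the same function covers any number of hidden layers, including the case with none, where `layers` holds a single symbol-to-word table.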

The word sequence calculation unit 105 applies the association probability of each word calculated by the motion verbalization calculation unit 104 and the probability values of the grammar statistical model, and creates the sentence whose overall probability is highest when the words are arranged in sequence. It also creates the second-ranked and subsequent sentences, up to a predetermined or required rank.

The probability values of the grammar statistical model are applied to the plurality of words w extracted by the motion verbalization calculation unit 104, and the words are rearranged to create the sentence with the highest probability. When the motion verbalization calculation unit 104 extracts second-ranked and subsequent candidates up to a predetermined or required rank, these are also turned into sentences.
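The rearrangement step can be sketched as a search over word orderings scored by both models. The text does not state the search procedure or how the two probabilities are combined, so this sketch assumes a brute-force enumeration of fixed-length orderings and a score equal to the product of the initial-state probability, the 2-gram transition probabilities, and the association probabilities. All names and numbers are hypothetical.

```python
import math
from itertools import permutations

def safe_log(p, floor=1e-12):
    """Log with a floor so unseen words or transitions are penalized, not fatal."""
    return math.log(max(p, floor))

def score(words, assoc, initial, transition):
    total = safe_log(initial.get(words[0], 0.0))
    for prev, nxt in zip(words, words[1:]):
        total += safe_log(transition.get(prev, {}).get(nxt, 0.0))
    total += sum(safe_log(assoc.get(w, 0.0)) for w in words)
    return total

def rank_sentences(assoc, initial, transition, length, top_n=3):
    """Score every ordering of `length` distinct candidate words; return the best few."""
    candidates = permutations(assoc, length)
    ranked = sorted(candidates,
                    key=lambda ws: score(ws, assoc, initial, transition),
                    reverse=True)
    return [" ".join(ws) for ws in ranked[:top_n]]

# Hypothetical association probabilities for one motion symbol.
assoc = {"a": 0.3, "student": 0.25, "player": 0.2, "runs": 0.2, "walks": 0.05}
# Hypothetical 2-gram grammar model.
initial = {"a": 1.0}
transition = {
    "a": {"student": 0.5, "player": 0.5},
    "student": {"runs": 0.7, "walks": 0.3},
    "player": {"runs": 0.6, "walks": 0.4},
}
best = rank_sentences(assoc, initial, transition, length=3)
```

Enumerating permutations grows factorially with the number of candidate words, so a beam search or dynamic programming over the 2-gram graph would be the usual choice for larger vocabularies.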

The display image creation unit 106 displays the sentence created by the word sequence calculation unit 105 on the display unit 12. The recording processing unit 107 stores the created sentence in a predetermined area of the storage unit 20 in association with the motion data. The recording processing unit 107 may also store the created sentence in association with the captured image.

FIG. 5 is an explanatory diagram visualizing verbalized sentences. FIG. 5 (A) shows the motion pattern “jumping”, FIG. 5 (B) the motion pattern “running”, and FIG. 5 (C) the motion pattern “playing badminton”. For example, in FIG. 5 (B), the optimum sentence “a student runs.” appears on the first line, the next candidate “a player runs.” on the second line, and the following candidate “a player walks.” on the third line.

FIG. 6 is a flowchart showing an example of the machine learning process. First, the motion data input unit 101 acquires motion data from the motion measurement unit 31 (step S1). Next, the motion pattern classification unit 103 associates the input motion data with the motion pattern information entered through the input unit 32 (step S3). Further, the input character analysis unit 102 acquires the explanatory text data entered through the input unit 32 in association with the corresponding motion data (step S5).

Then, the motion verbalization calculation unit 104 decomposes the explanatory text acquired during pre-learning into words by morphological analysis, applies Equations 1 and 2, the multi-layer statistical model, and the parameters being learned, and extracts the words w in the explanatory text from the motion symbol λ of the motion pattern in association with their occurrence probabilities (step S7).

Subsequently, for the plurality of words w extracted by the motion verbalization calculation unit 104, the word sequence calculation unit 105 calculates the word ordering probabilities based on the grammar statistical model, here the transition probabilities between words and the initial state probabilities with which words appear at the beginning of a sentence (step S9). Next, the obtained data is stored in the grammar model storage unit 25 (step S11). Such machine learning may be continued as necessary, in which case more accurate motion verbalization can be achieved.

FIG. 7 is a flowchart showing an example of the motion verbalization process. First, the motion data input unit 101 acquires motion data from the motion measurement unit 31 (step S21). Next, the motion pattern classification unit 103 classifies the input motion data into a motion pattern (step S23). Next, the motion verbalization calculation unit 104 applies Equations 1 and 2, the multi-layer statistical model, and the parameters acquired during pre-learning to calculate the probability with which each word is associated with the motion pattern (step S25).

Next, based on the word association probabilities calculated by the motion verbalization calculation unit 104 and the grammar statistical model, the word sequence calculation unit 105 calculates the word ordering probabilities, here the transition probabilities between words and the initial state probabilities with which words appear at the beginning of a sentence, and creates a sentence (step S27). The created sentence is then output to the storage unit 20 or the display unit 12 (step S29).

<Experimental example>
157 kinds of motion patterns were measured with the optical motion capture system serving as the motion measurement unit 31. Three trials were measured for each motion pattern, and 765 explanatory sentences were manually attached to the resulting 471 motion data items. The vocabulary used in the explanatory text comprised 259 words. Using these as learning data, the probability parameters of the multi-layer statistical model were optimized. In this experiment, the graph structure of FIG. 3 was used with two hidden layers; the number of states was set to 100 for the first hidden layer and 50 for the second. Knowledge about the ordering of words in a sentence was expressed as a 2-gram statistical model. FIG. 5 shows examples of sentences created in this experiment. By connecting the multi-layer statistical model of motion-word association probabilities, which has the graph structure above, with the statistical model of word ordering, sentences (word sequences) with high probability could be created from motion symbols.
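The figures in this experiment can be checked with a few lines of arithmetic. The table sizes below assume one input node per motion pattern and one output node per vocabulary word, with each layer stored as a full probability table; that layout is an assumption for illustration, as the text gives only the layer sizes.

```python
n_patterns, trials_per_pattern = 157, 3
n_motions = n_patterns * trials_per_pattern        # 471 motion data items
n_sentences, n_vocab = 765, 259
states_1, states_2 = 100, 50                       # hidden layer sizes

# Entries in each probability table: P(z1 | symbol), P(z2 | z1), P(word | z2).
table_entries = n_patterns * states_1 + states_1 * states_2 + states_2 * n_vocab

# Each row sums to 1, so one entry per row is determined by the others.
free_parameters = (n_patterns * (states_1 - 1)
                   + states_1 * (states_2 - 1)
                   + states_2 * (n_vocab - 1))

sentences_per_motion = n_sentences / n_motions     # about 1.6 sentences per motion
```

Under these assumptions the model has 33,650 table entries, of which 33,343 are free, fitted from 765 explanatory sentences.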

The present invention includes the following embodiments.

(1) As a technique for measuring human whole-body motion, instead of optical motion capture, an IMU (inertial measurement unit) sensor with a built-in acceleration sensor may be used, or a method that uses no sensors or markers and instead applies image processing to captured images (for example, Japanese Patent Application No. 2019-157640 by the present applicant) may be adopted.

(2) Motion data may be created from images of a performer on stage or an athlete, and the content of the motion may be displayed as sentences or output as speech to provide automatic commentary.

(3) At medical and long-term care sites, the movement of a patient may be measured, and sentences created from the measurement results may be automatically compiled into a diary. In this case, the motion measurement unit 31 identifies the target patient and measures the patient's movements and exercise, such as rehabilitation, and the recording processing unit 107 saves the sentences created from the measurement results as a rehabilitation diary, a long-term care diary, or the like. When a sentence describing unusual movement is observed, the device can also function as a monitoring means that determines that an abnormality has occurred.

(4) By mounting this motion verbalization device on a robot as a man-machine interface and measuring and verbalizing the movements (gestures and the like) of the person facing the robot, effective communication with humans becomes possible. Moreover, the measurement target is not limited to people.

(5) In the word sequence operation in the present invention, grammatical rules corresponding to each language can be applied. Further, the present invention may apply deep learning as pre-learning. Further, instead of the N-gram statistical model, a statistical model representing a sequence of other words may be adopted.

(6) The present invention performs processing in the direction of constructing a sentence from a motion, but it can also be applied in the reverse direction: when a sentence is input to a movable robot equipped with actuators at its joints, the robot performs the corresponding motion. In this way, when a human gives motion command information to a robot by text or voice, the robot can reproduce the corresponding motion.

As described above, the motion verbalization device according to the present invention preferably includes: a learning data storage unit that stores, as learning data and in association with one another, three-dimensional time-series motion data, motion symbols representing motion patterns (the types of motion), and explanatory text consisting of a set of words for each motion symbol; a classification means that classifies input three-dimensional motion data into the corresponding motion pattern; a multi-layer statistical model in which an input layer composed of the motion symbols and an output layer composed of the language are connected via a predetermined number of hidden variable layers, and which has parameters obtained by learning the connection probability between each motion symbol and each word in the explanatory text corresponding to that motion symbol; a first calculation means that applies the learning data and the parameters of the multi-layer statistical model to calculate, from the motion pattern classified by the classification means, a probability expressing the strength of the connection to each word; and a second calculation means that creates a sentence using the probability of each word obtained by the first calculation means and the probability of the ordering of the words.

Further, the motion verbalization method according to the present invention preferably includes: a learning data storage step of storing, as learning data and in association with one another, three-dimensional time-series motion data, motion symbols representing motion patterns (the types of motion), and explanatory text consisting of a set of words for each motion symbol; a multi-layer statistical model execution step in which an input layer composed of the motion symbols and an output layer composed of the language are connected via a predetermined number of hidden variable layers, and parameters are acquired by learning the connection probability between each motion symbol and each word in the explanatory text corresponding to that motion symbol; a classification step of classifying input three-dimensional motion data into the corresponding motion pattern; a first calculation step of applying the learning data and the parameters of the multi-layer statistical model to calculate, from the motion pattern classified in the classification step, a probability expressing the strength of the connection to each word; and a second calculation step of creating a sentence using the probability of each word obtained in the first calculation step and the probability of the ordering of the words.

Further, the motion recording device according to the present invention preferably includes the motion verbalization device, a recording unit that records a sentence generated by the second calculation means, and an output unit that outputs the sentence recorded in the recording unit.

Further, in the present invention, the multi-layer statistical model preferably has two hidden variable layers, which allows words with a stronger associative relationship to be selected.

Further, in the present invention, the parameters of the multi-layer statistical model are preferably acquired through machine learning. This improves the accuracy of the associative relationship (strength of connection) between motions and words.

Further, the first calculation means preferably expresses the probability P(w | λ) that the word w is generated from the motion symbol λ by equation (1) and, using a learning data set (λ^(i), w^(i)) of motion symbols and words in the explanatory text, obtains the probability parameters P(z^(1) | λ), P(z^(k) | z^(k-1)), ..., P(w | z^(k)) such that the objective function Φ(λ, w) represented by equation (2) is maximized.

Here, z^(k) denotes the state in the k-th hidden variable layer. According to this configuration, the accuracy of the associative relationship between motions and words can be improved. Further, the number k of hidden variable layers can be set freely as k = 0, 1, 2, 3, 4, ..., and the operation given by Math. 3 can be executed regardless of the number of layers.
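Equations (1) and (2) are not reproduced in this text, so the following sketch rests on two stated assumptions: that equation (1) sums the product P(z^(1) | λ) P(z^(2) | z^(1)) P(w | z^(2)) over the hidden states, and that the objective Φ is the log-likelihood of the training pairs. It uses expectation-maximization (EM) as one common way to maximize such an objective for a model with two hidden variable layers; the text here does not state which optimizer is used, and all names and counts are hypothetical.

```python
import math
import random


def normalize_rows(table):
    """Scale each row to sum to 1 (uniform fallback for an all-zero row)."""
    out = []
    for row in table:
        s = sum(row)
        out.append([v / s for v in row] if s > 0 else [1.0 / len(row)] * len(row))
    return out


def random_table(rows, cols, rng):
    return normalize_rows([[rng.random() + 0.1 for _ in range(cols)]
                           for _ in range(rows)])


def word_prob(A, B, C, l, w):
    """Assumed form of equation (1): sum over both hidden layers."""
    return sum(A[l][a] * B[a][b] * C[b][w]
               for a in range(len(B)) for b in range(len(C)))


def objective(counts, A, B, C):
    """Assumed form of equation (2): log-likelihood of (symbol, word) counts."""
    return sum(n * math.log(word_prob(A, B, C, l, w))
               for l, row in enumerate(counts) for w, n in enumerate(row) if n > 0)


def em_step(counts, A, B, C):
    """One EM update of A = P(z1 | symbol), B = P(z2 | z1), C = P(word | z2)."""
    na, nb = len(B), len(C)
    A_acc = [[0.0] * na for _ in A]
    B_acc = [[0.0] * nb for _ in B]
    C_acc = [[0.0] * len(C[0]) for _ in C]
    for l, row in enumerate(counts):
        for w, n in enumerate(row):
            if n == 0:
                continue
            total = word_prob(A, B, C, l, w)
            for a in range(na):
                for b in range(nb):
                    # E-step: posterior of (z1 = a, z2 = b), weighted by the count.
                    r = n * A[l][a] * B[a][b] * C[b][w] / total
                    A_acc[l][a] += r
                    B_acc[a][b] += r
                    C_acc[b][w] += r
    # M-step: renormalize the expected counts into probability tables.
    return normalize_rows(A_acc), normalize_rows(B_acc), normalize_rows(C_acc)


# Made-up data: counts[l][w] = how often word w describes motion symbol l.
counts = [[3, 1, 0], [0, 1, 4]]
rng = random.Random(0)
A, B, C = random_table(2, 2, rng), random_table(2, 2, rng), random_table(2, 3, rng)
history = [objective(counts, A, B, C)]
for _ in range(30):
    A, B, C = em_step(counts, A, B, C)
    history.append(objective(counts, A, B, C))
```

Under these assumptions each EM step cannot decrease the objective, so `history` is a non-decreasing sequence; a gradient-based optimizer would be an equally reasonable choice for the same objective.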

Further, the second calculation means preferably calculates the probability of the arrangement of the words by using an N-gram statistical model. According to this configuration, words can be rearranged to create a sentence in a simple manner.
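As an illustration of ordering words with an N-gram model, the sketch below uses a bigram table (N = 2) to pick the best arrangement of the most probable words. The scoring rule (sum of log word probabilities and log bigram probabilities), the exhaustive search over permutations, the `<s>` and `</s>` boundary tokens, the floor value for unseen bigrams, and the toy tables are all assumptions made for this example, not details from the specification.

```python
import math
from itertools import permutations


def sentence_score(words, word_prob, bigram, floor=1e-6):
    """Log score of one word order: word probabilities plus bigram transitions."""
    tokens = ["<s>"] + list(words) + ["</s>"]
    score = sum(math.log(word_prob[w]) for w in words)
    score += sum(math.log(bigram.get((a, b), floor))
                 for a, b in zip(tokens, tokens[1:]))
    return score


def best_sentence(word_prob, bigram, length):
    """Keep the `length` most probable words, then try every ordering of them."""
    candidates = sorted(word_prob, key=word_prob.get, reverse=True)[:length]
    return max(permutations(candidates),
               key=lambda ws: sentence_score(ws, word_prob, bigram))


# Made-up word probabilities (output of the first calculation) and bigram table.
word_prob = {"a": 0.35, "player": 0.3, "runs": 0.25, "slowly": 0.1}
bigram = {("<s>", "a"): 0.8, ("a", "player"): 0.7,
          ("player", "runs"): 0.6, ("runs", "</s>"): 0.5}

best = best_sentence(word_prob, bigram, length=3)
# best == ("a", "player", "runs")
```

Exhaustive search is only practical for short sentences; for longer ones a beam search or a shortest-path search over the same scores would be the usual substitute.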

1 Motion verbalization device
10 Control unit
101 Motion data input unit
102 Input character analysis unit
103 Motion pattern classification unit (classification means)
104 Motion verbalization computation unit (first calculation means)
105 Word order computation unit (second calculation means)
106 Display image creation unit
107 Recording processing unit
20 Storage unit
22 Learning data storage unit
23 Classification parameter storage unit
24 Multi-layer statistical model storage unit
25 Grammar model storage unit
31 Motion measurement unit
32 Input unit

Claims

1. A motion verbalization device comprising:
a learning data storage unit that stores, as learning data, three-dimensional time-series motion data, motion symbols representing motion patterns that are the types of motion, and explanatory text consisting of a set of words for each motion symbol;
a classification means that classifies input three-dimensional motion data into the corresponding motion pattern;
a multi-layer statistical model in which an input layer composed of the motion symbols and an output layer composed of language are connected via a predetermined number of hidden variable layers, the model having parameters acquired by learning the probability of connection between each motion symbol and each word in the explanatory text corresponding to that motion symbol;
a first calculation means that applies the learning data and the parameters of the multi-layer statistical model to calculate, from the motion pattern classified by the classification means, a probability expressing the strength of connection of each word; and
a second calculation means that creates a sentence by using the probability of each word obtained by the first calculation means and the probability of the arrangement of the words.
2. The motion verbalization device according to claim 1, wherein the multi-layer statistical model has two hidden variable layers.
3. The motion verbalization device according to claim 1 or 2, wherein the multi-layer statistical model acquires the parameters through machine learning.
4. The motion verbalization device according to any one of claims 1 to 3, wherein the first calculation means expresses the probability P(w | λ) that the word w is generated from the motion symbol λ by equation (1) and, using a learning data set (λ^(i), w^(i)) of motion symbols and words in the explanatory text, obtains the probability parameters P(z^(1) | λ), P(z^(k) | z^(k-1)), ..., P(w | z^(k)) such that the objective function Φ(λ, w) represented by equation (2) is maximized,

where z^(k) denotes the state in the k-th hidden variable layer.
5. The motion verbalization device according to any one of claims 1 to 4, wherein the second calculation means calculates the probability of the arrangement of the words by using an N-gram statistical model.
6. A motion verbalization method comprising:
a learning data storage step of storing, in association with one another as learning data, three-dimensional time-series motion data, motion symbols representing motion patterns that are the types of motion, and explanatory text consisting of a set of words for each motion symbol;
a multi-layer statistical model execution step of acquiring parameters by learning, in a model in which an input layer composed of the motion symbols and an output layer composed of language are connected via a predetermined number of hidden variable layers, the probability of connection between each motion symbol and each word in the explanatory text corresponding to that motion symbol;
a classification step of classifying input three-dimensional motion data into the corresponding motion pattern;
a first calculation step of applying the learning data and the parameters of the multi-layer statistical model to calculate, from the motion pattern classified by the classification means, a probability expressing the strength of connection of each word; and
a second calculation step of creating a sentence by using the probability of each word obtained by the first calculation means and the probability of the arrangement of the words.
7. A program for causing a computer to function as the motion verbalization device according to any one of claims 1 to 5.
8. A motion recording device comprising:
the motion verbalization device according to any one of claims 1 to 5;
a recording unit that records sentences created by the second calculation means; and
an output unit that outputs the sentences recorded in the recording unit.