WO2021039641A1 - Motion verbalization device, motion verbalization method, program, and motion recording device


Info

Publication number
WO2021039641A1
Authority
WO
WIPO (PCT)
Prior art keywords
motion
word
probability
verbalization
statistical model
Prior art date
Application number
PCT/JP2020/031664
Other languages
French (fr)
Japanese (ja)
Inventor
渉 ▲高▼野
Original Assignee
国立大学法人大阪大学
Priority date
Filing date
Publication date
Application filed by 国立大学法人大阪大学
Publication of WO2021039641A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding

Definitions

  • The present invention relates to a technique for linking motion patterns to language by means of a multi-layer statistical model.
  • Gesture recognition technology aimed at identifying human motion has been studied for a long time.
  • Verbalizing gestures is expected to open the way to robots and man-machine interfaces with more advanced communication capabilities.
  • Research on automatically annotating video using image recognition is known, but it has not been extended to verbalizing behavior from three-dimensional body motion data.
  • Non-Patent Document 1 outlines a motion language model from research on verbalizing the body motions of humans and robots.
  • In that model, the nodes of the first, second, and third layers form a graph structure representing, respectively, motion symbols indicating motion patterns, latent variables indicating sentences, and words. The association between motion symbols and words is expressed through the sentence latent variables of the second layer.
  • Non-Patent Document 1 generates words from motion symbols using a motion language model whose graph structure has a single latent variable layer representing sentences, so there may still be room to improve how associated words are optimized and extracted.
  • The present invention has been made in view of the above. It provides a motion verbalization device, a motion verbalization method, a program, and a motion recording device that use a multi-layer statistical model to extract optimized associated words from the motion symbol representing a motion pattern and then reorder those words into a sentence.
  • The motion verbalization device according to the present invention comprises: a learning data storage unit that stores, as learning data, three-dimensional time-series motion data and motion symbols representing motion patterns (the types of motion) in association with explanatory text consisting of a set of words for each motion symbol; classification means that classifies input three-dimensional motion data into the corresponding motion pattern; a multi-layer statistical model in which an input layer composed of the motion symbols and an output layer composed of the language are connected via a predetermined number of hidden variable layers, and which has parameters acquired by learning the connection probability between each motion symbol and each word in the corresponding explanatory text; first calculation means that applies the learning data and the parameters of the multi-layer statistical model to calculate, from the motion pattern classified by the classification means, a probability representing the strength of connection of each word; and second calculation means that creates a sentence using the probability of each word obtained by the first calculation means and the probability of the ordering of the words.
  • The motion verbalization method according to the present invention comprises: a learning data storage step that stores, as learning data, three-dimensional time-series motion data and motion symbols representing motion patterns (the types of motion) in association with explanatory text consisting of a set of words for each motion symbol; a multi-layer statistical model execution step in which an input layer composed of the motion symbols and an output layer composed of the language are connected via a predetermined number of hidden variable layers and parameters are acquired by learning the connection probability between each motion symbol and each word in the corresponding explanatory text; a classification step that classifies input three-dimensional motion data into the corresponding motion pattern; and [...]
  • The program according to the present invention causes a computer to function as the motion verbalization device.
  • The motion recording device includes the motion verbalization device, a recording unit that records the sentences generated by the second calculation means, and an output unit that outputs the sentences recorded in the recording unit.
  • According to these inventions, the learning data storage unit stores, as learning data, three-dimensional time-series motion data and motion symbols representing motion patterns (the types of motion) in association with explanatory text consisting of a set of words for each motion symbol.
  • In the multi-layer statistical model, the input layer composed of the motion symbols and the output layer composed of the language are connected via a predetermined number of hidden variable layers, and the parameters acquired by learning the connection probability between each motion symbol and each word in the corresponding explanatory text are stored.
  • The classification means classifies input three-dimensional motion data into the corresponding motion pattern.
  • A probability representing the strength of connection of each word is then calculated from the motion pattern classified by the classification means.
  • The second calculation means creates a sentence using the probability of each word obtained by the first calculation means and the probability of the ordering of the words.
  • FIG. 1 is a block diagram showing an embodiment of the motion verbalization device according to the present invention.
  • The motion verbalization device 1 includes a control unit 10 and a storage unit 20, and further includes a motion measurement unit 31 and an input unit 32.
  • The motion verbalization device 1 executes processing in two stages. First, a pre-learning process, for example a machine learning process, is executed. Then, in measurement mode, a motion verbalization process measures human body motion and verbalizes it.
  • The input unit 32 and some of the functional units described later are used in the pre-learning process, the motion measurement unit 31 is used in both processes, and the remaining functional units are used in the motion verbalization process.
  • The motion measurement unit 31 performs motion analysis on captured images of a moving human body and transmits the resulting motion data to the control unit 10 by wire or wirelessly.
  • In this embodiment, the motion measurement unit 31 uses optical motion capture.
  • The input unit 32 accepts character input; preferably a keyboard or similar device is used to enter words and compose sentences.
  • An entered sentence is passed to the control unit 10, which decomposes it into words by known morphological analysis.
  • Each word obtained by the analysis is given a word ID, and the word ID is used in subsequent processing.
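The word-ID step above can be sketched as follows. This is a minimal illustration rather than the method of the publication: `tokenize` is a hypothetical stand-in that lowercases and splits on whitespace, whereas the text refers to a known morphological analysis, and the names `build_vocabulary` and `encode` are assumptions introduced for the example.

```python
def tokenize(sentence):
    """Stand-in for morphological analysis (assumption): lowercase the
    sentence, drop a trailing period, and split on whitespace."""
    return sentence.lower().rstrip(".").split()


def build_vocabulary(sentences):
    """Give each distinct word an integer ID in order of first appearance."""
    word_to_id = {}
    for sentence in sentences:
        for word in tokenize(sentence):
            word_to_id.setdefault(word, len(word_to_id))
    return word_to_id


def encode(sentence, word_to_id):
    """Replace each word of a sentence with its word ID."""
    return [word_to_id[word] for word in tokenize(sentence)]


# Example sentences of the kind given in the text for the motion type "run".
vocabulary = build_vocabulary(["He runs.", "A player runs.", "A student runs."])
encoded = encode("A student runs.", vocabulary)
```

Working with integer IDs rather than strings lets the later probability tables be indexed directly by word.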
  • The motion measurement unit 31 captures the whole-body motion of a human body.
  • Through the input unit 32, a plurality of simple or arbitrary sentences are entered to describe the state or type of motion captured by the motion measurement unit 31.
  • The motion measurement unit 31 may be provided separately from the control unit 10. In that case the captured images and measured motion data may first be stored in a built-in storage member (not shown) and later transferred offline to the storage unit 20 via the control unit 10.
  • The optical motion capture system serving as the motion measurement unit 31 is of a known type and includes markers, a plurality of cameras that image the markers, and an image processing unit.
  • Markers are attached to predetermined parts of the moving human body, typically the joints, and are imaged by a plurality of cameras placed in advance at several locations. Using known techniques, the image processing unit measures the three-dimensional position of each marker, that is, the joint positions of the human body, from the captured marker images as time-series motion data.
  • In addition to a work area that temporarily stores data during processing, the storage unit 20 has a processing program storage unit 21 that stores the processing programs executed by the control unit 10. The storage unit 20 also includes a learning data storage unit 22, a classification parameter storage unit 23, a multi-layer statistical model storage unit 24, and a grammar model storage unit 25.
  • The processing program storage unit 21 stores the processing programs that execute the pre-learning process and the motion verbalization process.
  • The learning data storage unit 22 stores learning data created in advance by the control unit 10.
  • FIG. 2 shows an example of learning data.
  • The learning data consist of time-series motion data obtained by measuring human body motion with the motion measurement unit 31, the motion pattern indicating the type of that motion, and manually entered explanatory text describing the motion, stored for each motion pattern.
  • The motion data are at least one of the positions of specific body parts, here the joints, and the joint angles.
  • The motion pattern indicates the type of motion and is entered manually as a motion symbol.
  • A plurality of explanatory sentences are entered manually for each motion symbol. For example, sentences s11 and s12 are created for the first motion symbol, and sentences s21, s22, and s23 for the second. More concretely, when the type of motion is "run", the sentences are "He runs.", "A player runs.", "A student runs.", and so on.
  • The classification parameter storage unit 23 stores classification parameters for classifying motion data. They are generated from the motion data acquired during pre-learning and the corresponding motion patterns, and are stored in association with the motion symbol.
  • The multi-layer statistical model storage unit 24 stores a multi-layer statistical model that expresses the associative relationship (strength of connection) between a motion symbol and each word w in the explanatory text s acquired in pre-learning. In the model, an input layer composed of motion symbols and an output layer composed of words w are connected via K hidden variable layers.
  • FIG. 3 is a graph structure diagram showing an embodiment of a multi-layer statistical model.
  • The multi-layer statistical model storage unit 24 stores Equation 1, which gives the probability that a word w is generated from a motion symbol.
  • Here z(k) denotes the state of the k-th hidden layer.
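Equation 1 itself is not reproduced in this text. The sketch below shows one plausible reading, assuming the layers form a chain so that the word probability is obtained by summing over the states of every hidden layer in turn. The function name, the table layout, and all numbers are illustrative assumptions, not values from the publication.

```python
def word_probabilities(symbol, layers):
    """Return the distribution over words for one motion symbol.

    Assumed layout: layers[0][symbol][i] is P(z(1)=i | symbol), each
    following table maps the states of one layer to the next, and the
    last table maps the final hidden layer to words. Every row of every
    table is a probability distribution.
    """
    distribution = list(layers[0][symbol])
    for table in layers[1:]:
        n_out = len(table[0])
        # Marginalize over the current layer's states.
        distribution = [
            sum(distribution[i] * table[i][j] for i in range(len(distribution)))
            for j in range(n_out)
        ]
    return distribution


# Toy model: 2 motion symbols, two hidden layers with 2 states each, 3 words.
symbol_to_z1 = [[0.9, 0.1], [0.2, 0.8]]
z1_to_z2 = [[0.7, 0.3], [0.4, 0.6]]
z2_to_word = [[0.5, 0.5, 0.0], [0.1, 0.1, 0.8]]

probabilities = word_probabilities(0, [symbol_to_z1, z1_to_z2, z2_to_word])
```

Because each table row sums to 1, the returned values also sum to 1, so they can be read directly as association strengths between the motion symbol and each word.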
  • The probability parameters are determined so that the objective function shown in Equation 2 is maximized over the learning data set, which pairs each motion symbol with the words of its explanatory text.
  • The parameters of the motion language model are optimized by the EM algorithm so as to maximize the objective function of Equation 2.
  • The EM algorithm is a method for maximum likelihood estimation of the parameters of a probabilistic model, used when the model depends on unobservable latent variables.
  • Applied to machine learning, the EM algorithm is an iterative method that alternates two steps: computing the distribution over the hidden variable layers from the previously estimated model parameters, and re-estimating the model parameters so that the objective function is maximized under that distribution.
  • The optimized parameters are also written to the multi-layer statistical model storage unit 24. For each word, a probability is set based on the number of times it appears in the explanatory text obtained in pre-learning.
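The alternating procedure can be illustrated with a small EM loop for a model with two hidden layers. This is a sketch under stated assumptions: the objective is taken to be the log-likelihood of the observed (motion symbol, word) counts, since Equation 2 is not reproduced in this text, and the table names `a`, `b`, `c`, the function names, and the toy counts are all hypothetical.

```python
import math
import random


def normalize(row):
    """Scale non-negative numbers to sum to 1 (uniform if all are zero)."""
    total = sum(row)
    if total == 0.0:
        return [1.0 / len(row)] * len(row)
    return [value / total for value in row]


def random_table(rows, cols, rng):
    """Random conditional probability table; each row is a distribution."""
    return [normalize([rng.random() + 0.1 for _ in range(cols)]) for _ in range(rows)]


def log_likelihood(counts, a, b, c):
    """Sum of n(symbol, word) * log P(word | symbol) over observed pairs."""
    total = 0.0
    for (s, w), n in counts.items():
        p = sum(a[s][i] * b[i][j] * c[j][w]
                for i in range(len(b)) for j in range(len(c)))
        total += n * math.log(p)
    return total


def em_step(counts, a, b, c):
    """One EM iteration for symbol -> z1 -> z2 -> word."""
    n1, n2 = len(b), len(c)
    new_a = [[0.0] * n1 for _ in a]
    new_b = [[0.0] * n2 for _ in b]
    new_c = [[0.0] * len(c[0]) for _ in c]
    for (s, w), n in counts.items():
        # E-step: posterior over (z1, z2) for this pair under current parameters.
        joint = [[a[s][i] * b[i][j] * c[j][w] for j in range(n2)] for i in range(n1)]
        total = sum(sum(row) for row in joint)
        for i in range(n1):
            for j in range(n2):
                expected = n * joint[i][j] / total
                new_a[s][i] += expected
                new_b[i][j] += expected
                new_c[j][w] += expected
    # M-step: renormalize the expected counts into probability tables.
    return ([normalize(r) for r in new_a],
            [normalize(r) for r in new_b],
            [normalize(r) for r in new_c])


def fit(counts, n_symbols, n_words, n1=2, n2=2, iterations=20, seed=0):
    """Run EM from a random start and record the log-likelihood history."""
    rng = random.Random(seed)
    a = random_table(n_symbols, n1, rng)
    b = random_table(n1, n2, rng)
    c = random_table(n2, n_words, rng)
    history = [log_likelihood(counts, a, b, c)]
    for _ in range(iterations):
        a, b, c = em_step(counts, a, b, c)
        history.append(log_likelihood(counts, a, b, c))
    return (a, b, c), history


# counts[(symbol_id, word_id)] = times the word appears in that symbol's text.
counts = {(0, 0): 3, (0, 1): 1, (1, 1): 1, (1, 2): 4}
parameters, history = fit(counts, n_symbols=2, n_words=3)
```

The recorded log-likelihood should never decrease from one iteration to the next, which is the standard property of EM and a convenient check on an implementation.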
  • The grammar model storage unit 25 stores a grammar model describing the ordering of the words that make up a sentence.
  • In this embodiment an N-gram model is used: the probability of the word appearing in the next position is learned given the preceding N-1 words.
  • FIG. 4 shows an example of a language model, where nodes represent words and edges represent transitions between words.
  • The value of N is preferably about 2 to 4.
  • In the 2-gram model shown in FIG. 4, which assumes that each word in a sentence depends only on the word immediately before it, sentence structure is expressed by the transition probabilities between words and the initial state probability with which a word appears at the beginning of a sentence.
  • For example, P(“ball” | “a”) denotes the probability that the word “ball” follows the word “a”.
  • A transition probability is set from the number of times the corresponding word sequence appears in the explanatory text, and an initial state probability from the number of times the corresponding word appears at the beginning of an explanatory sentence.
  • Such grammar rules are simple to build because they remove the laborious work of entering parts of speech by hand.
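The count-based 2-gram estimation described above can be sketched as follows, assuming sentences have already been split into words. The function name `train_bigram` and the returned dictionary layout are illustrative choices, and no smoothing is applied.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Estimate initial-state and transition probabilities from word lists."""
    initial_counts = Counter()
    transition_counts = defaultdict(Counter)
    for words in sentences:
        if not words:
            continue
        initial_counts[words[0]] += 1  # word that starts the sentence
        for previous, following in zip(words, words[1:]):
            transition_counts[previous][following] += 1

    n_sentences = sum(initial_counts.values())
    initial = {word: count / n_sentences for word, count in initial_counts.items()}
    transition = {
        previous: {word: count / sum(followers.values())
                   for word, count in followers.items()}
        for previous, followers in transition_counts.items()
    }
    return initial, transition


sentences = [["he", "runs"], ["a", "player", "runs"], ["a", "student", "runs"]]
initial, transition = train_bigram(sentences)
# initial["a"] is 2/3 and transition["a"]["player"] is 1/2 for these sentences.
```

Only word order is counted, which matches the point made in the text that no part-of-speech labels have to be entered by hand.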
  • The data stored in storage units 22 to 25 are generated by the pre-learning processing program.
  • The control unit 10 is typically a processor (computer) with a built-in CPU.
  • By reading a processing program from the processing program storage unit 21 into main memory (not shown) and executing it, the control unit 10 functions as a motion data input unit 101, an input character analysis unit 102, a motion pattern classification unit 103, a motion verbalization calculation unit 104, a word sequence calculation unit 105, a display image creation unit 106, and a recording processing unit 107.
  • The motion data input unit 101 takes in the motion data measured by the motion measurement unit 31 during both pre-learning and motion verbalization.
  • During pre-learning, the input character analysis unit 102 takes in the explanatory text entered through the input unit 32 and associates it with the corresponding motion data.
  • During motion verbalization, the motion pattern classification unit 103 identifies the motion pattern, that is, the motion symbol, of the input motion data.
  • Various classification methods can be adopted. In this embodiment, the classification parameters are used to calculate the difference between the newly captured motion data and the motion data of each motion pattern obtained during pre-learning; the differences are compared across motion patterns, the pattern with the smallest difference is selected, and its motion symbol is assigned.
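The smallest-difference rule can be sketched with a simple template comparison. The text does not specify how the difference is computed, so the mean squared distance between equal-length sequences used here is an assumption, as are the names `sequence_distance`, `classify`, and the toy templates.

```python
def sequence_distance(seq_a, seq_b):
    """Mean squared difference between two equal-length frame sequences.

    Each frame is a list of numbers such as joint positions or joint angles.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same number of frames")
    total = 0.0
    for frame_a, frame_b in zip(seq_a, seq_b):
        total += sum((x - y) ** 2 for x, y in zip(frame_a, frame_b))
    return total / len(seq_a)


def classify(motion, templates):
    """Return the motion symbol whose template differs least from the input."""
    return min(templates, key=lambda symbol: sequence_distance(motion, templates[symbol]))


# Hypothetical templates: one representative sequence per motion symbol.
templates = {
    "walk": [[0.0, 0.0], [0.1, 0.0]],
    "run": [[0.0, 0.0], [0.5, 0.2]],
}
symbol = classify([[0.0, 0.0], [0.45, 0.15]], templates)
```

A practical system would need to handle sequences of different lengths and speeds, for example with time alignment or a probabilistic sequence model. This sketch leaves that out.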
  • Using Equations 1 and 2, the multi-layer statistical model, and the parameters stored in the multi-layer statistical model storage unit 24, the motion verbalization calculation unit 104 calculates the probability that each word is associated with the motion symbol of the motion pattern classified by the motion pattern classification unit 103.
  • The word sequence calculation unit 105 applies the association probability of each word calculated by the motion verbalization calculation unit 104 and the probability values of the grammar statistical model, and creates the sentence whose overall probability is highest when the words are arranged in sequence. It also creates the second-ranked and lower-ranked sentences, down to a predetermined or required rank.
  • That is, the probability values of the grammar statistical model are applied to the words w extracted by the motion verbalization calculation unit 104, and the words are reordered to create the sentence with the highest probability.
  • When the motion verbalization calculation unit 104 extracts second-ranked and lower-ranked candidates, down to a predetermined or required rank, those are also turned into sentences.
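One way to combine the two probabilities is to score each candidate word sequence by the product of its 2-gram probabilities and the association probabilities of its words, then rank the candidates. The sketch below does this by exhaustive search over sequences of a fixed length. The scoring formula, the fixed length, the function name `rank_sentences`, and all numbers are assumptions made for illustration.

```python
import math


def rank_sentences(word_prob, initial, transition, length, top_n=3):
    """Rank word sequences of a given length by total log probability.

    word_prob[w]      association probability of word w with the motion symbol
    initial[w]        probability that w starts a sentence
    transition[p][w]  probability that w follows p
    """
    results = []

    def extend(words, score):
        if len(words) == length:
            results.append((score, words))
            return
        for following, p in transition.get(words[-1], {}).items():
            association = word_prob.get(following, 0.0)
            if association > 0.0:
                extend(words + [following], score + math.log(p) + math.log(association))

    for first, p in initial.items():
        association = word_prob.get(first, 0.0)
        if association > 0.0:
            extend([first], math.log(p) + math.log(association))

    results.sort(key=lambda item: item[0], reverse=True)
    return [(" ".join(words), score) for score, words in results[:top_n]]


# Illustrative values only, chosen to resemble a "running" motion.
word_prob = {"a": 0.3, "student": 0.25, "player": 0.2, "runs": 0.2, "walks": 0.05}
initial = {"a": 1.0}
transition = {
    "a": {"student": 0.5, "player": 0.5},
    "student": {"runs": 1.0},
    "player": {"runs": 0.5, "walks": 0.5},
}
ranked = rank_sentences(word_prob, initial, transition, length=3)
```

Exhaustive search is adequate for a small vocabulary. With more words or longer sentences, a beam search over the same score would be the usual substitute.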
  • The display image creation unit 106 displays the sentences created by the word sequence calculation unit 105 on the display unit 12.
  • The recording processing unit 107 stores the created sentences in a predetermined area of the storage unit 20 in association with the motion data.
  • The recording processing unit 107 may also store the created sentences in association with the captured images.
  • FIG. 5 is an explanatory diagram that visualizes the verbalized sentences.
  • FIG. 5(A) shows the motion pattern “jumping”.
  • FIG. 5(B) shows the motion pattern “running”.
  • FIG. 5(C) shows the motion pattern “playing badminton”.
  • For example, the best sentence, “a student runs.”, appears on the first line, the next candidate, “a player runs.”, on the second line, and the following candidate, “a player walks.”, on the third line.
  • FIG. 6 is a flowchart showing an example of the machine learning process.
  • The motion data input unit 101 acquires motion data from the motion measurement unit 31 (step S1).
  • The motion pattern classification unit 103 associates the input motion data with the motion pattern information entered through the input unit 32 (step S3).
  • The input character analysis unit 102 acquires the explanatory text entered through the input unit 32 in association with the corresponding motion data (step S5).
  • The motion verbalization calculation unit 104 decomposes the explanatory text acquired during pre-learning into words by morphological analysis and, applying Equations 1 and 2, the multi-layer statistical model, and the parameters being learned, extracts the words w in the explanatory text from the motion symbol of the motion pattern in association with their generation probabilities (step S7).
  • For the words w extracted by the motion verbalization calculation unit 104, the word sequence calculation unit 105 calculates the word ordering probabilities based on the grammar statistical model, here the transition probabilities between words and the initial state probabilities with which words appear at the beginning of a sentence (step S9).
  • The obtained data are stored in the grammar model storage unit 25 (step S11). Such machine learning may be continued as needed, which makes the motion verbalization more accurate.
  • FIG. 7 is a flowchart showing an example of the motion verbalization process.
  • The motion data input unit 101 acquires motion data from the motion measurement unit 31 (step S21).
  • The motion pattern classification unit 103 classifies the input motion data into a motion pattern (step S23).
  • The motion verbalization calculation unit 104 applies Equations 1 and 2, the multi-layer statistical model, and the parameters acquired during pre-learning to calculate the probability that each word is associated with the motion pattern (step S25).
  • Based on the word association probabilities calculated by the motion verbalization calculation unit 104 and the grammar statistical model, the word sequence calculation unit 105 calculates the word ordering probabilities, here the transition probabilities between words and the initial state probabilities with which words appear at the beginning of a sentence, and composes the sentence (step S27).
  • The created sentence is output to the storage unit 20 or the display unit 12 (step S29).
  • The present invention includes the following embodiments.
  • Motion data may be created from images of a director or an athlete on the stage, and the content of the motion may be displayed as sentences or output by voice to provide automatic commentary.
  • The motion of a patient may be measured, and sentences created from the measurement results may be compiled automatically as a diary.
  • For example, the motion measurement unit 31 identifies the target patient and measures the patient's movements and exercise, such as rehabilitation, and the recording processing unit 107 saves the sentences created from the measurement results as a rehabilitation diary, a nursing care diary, or the like.
  • If a sentence describing an unusual motion is observed, the device can also serve as a monitoring means that determines that an abnormality has occurred.
  • Grammar rules corresponding to each language can be applied. The present invention may use deep learning for pre-learning. Instead of the N-gram statistical model, another statistical model representing word sequences may be adopted.
  • The present invention processes in the direction of constructing a sentence from a motion. Conversely, it can also be applied in the direction of producing the corresponding motion when a sentence is given to a movable robot with actuators at its joints. In this way, when a human gives a robot motion commands as text or speech, the robot can reproduce the corresponding motion.
  • As described above, the motion verbalization device preferably comprises: a learning data storage unit that stores, as learning data, three-dimensional time-series motion data and motion symbols representing motion patterns (the types of motion) in association with explanatory text consisting of a set of words for each motion symbol; classification means that classifies input three-dimensional motion data into the corresponding motion pattern; a multi-layer statistical model in which an input layer composed of the motion symbols and an output layer composed of the language are connected via a predetermined number of hidden variable layers, and which has parameters acquired by learning the connection probability between each motion symbol and each word in the corresponding explanatory text; first calculation means that applies the learning data and the parameters of the multi-layer statistical model to calculate, from the motion pattern classified by the classification means, a probability representing the strength of connection of each word; and second calculation means that creates a sentence using the probability of each word obtained by the first calculation means and the probability of the ordering of the words.
  • The motion recording device preferably includes the motion verbalization device, a recording unit that records the sentences generated by the second calculation means, and an output unit that outputs the sentences recorded in the recording unit.
  • Preferably, the multi-layer statistical model has two hidden variable layers. With this configuration, words with a stronger associative relationship can be selected.
  • Preferably, the parameters of the multi-layer statistical model are acquired through machine learning, which improves the accuracy of the associative relationship (strength of connection) between motions and words.
  • Preferably, the first calculation means expresses the probability that a word w is generated from a motion symbol in terms of the probability parameters of the hidden variable layers, P(z(1) | [...]
  • Preferably, the second calculation means calculates the probability of the word ordering using an N-gram statistical model. With this configuration, words can be reordered into a sentence in a simple manner.
  • 1 Motion verbalization device
  • 10 Control unit
  • 101 Motion data input unit
  • 102 Input character analysis unit
  • 103 Motion pattern classification unit (classification means)
  • 104 Motion verbalization calculation unit (first calculation means)
  • 105 Word sequence calculation unit (second calculation means)
  • 106 Display image creation unit
  • 107 Recording processing unit
  • 20 Storage unit
  • 22 Learning data storage unit
  • 23 Classification parameter storage unit
  • 24 Multi-layer statistical model storage unit
  • 25 Grammar model storage unit
  • 31 Motion measurement unit
  • 32 Input unit

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

A motion verbalization device (1) is provided with: a learning data storage unit (22) that stores, as learning data, time-series motion data and motion symbols in association with a description consisting of a set of words corresponding to each motion symbol; a motion pattern classification unit (103) that classifies input motion data into a corresponding motion pattern; a motion verbalization computation unit (104) that calculates a probability representing the connection strength of each word from the classified motion pattern by applying a multi-layer statistical model, the learning data, and a parameter, the multi-layer statistical model having an input layer configured by motion symbols and an output layer configured by language, the input layer and the output layer being connected via two hidden variable layers, the multi-layer statistical model having a parameter in which the probability of connection between the motion symbols and each word in the description has been acquired through learning; and a word order computation unit (105) that produces a sentence using the probability of each word calculated and the probability of the order of each word. As a result, the motion verbalization device (1) extracts words associated with a motion pattern in a more optimized manner, and produces a sentence by changing the order of words.

Description

Motion verbalization device, motion verbalization method, program, and motion recording device
 The present invention relates to a technique for linking motion patterns to language by means of a multi-layer statistical model.
 Verbalizing motion is indispensable for building human-machine systems that communicate smoothly. Building into robots an intelligent computing function that verbalizes and understands human behavior brings us closer to a society in which robots are part of daily life.
 Gesture recognition technology aimed at identifying human motion has been studied for a long time. Verbalizing gestures is expected to open the way to robots and man-machine interfaces with more advanced communication capabilities. At present, however, such technology only determines the category of a motion ("walk", "run", and so on) and does not go as far as computing a sentence that describes the motion from motion data. Research on automatically annotating video using image recognition is also known, but it has not been extended to verbalizing behavior from three-dimensional body motion data.
 Non-Patent Document 1 outlines a motion language model from research on verbalizing the body motions of humans and robots. In this model, the nodes of the first, second, and third layers form a graph structure representing, respectively, motion symbols indicating motion patterns, latent variables indicating sentences, and words. The association between motion symbols and words is expressed through the sentence latent variables of the second layer.
Patent Document 1: Japanese Unexamined Patent Publication No. 2019-3683
Patent Document 2: Japanese Unexamined Patent Publication No. 2018-195053
Patent Document 3: Japanese Translation of PCT International Application Publication No. 2017-527035
However, while Patent Documents 1 and 2 measure motion, they do not verbalize and output the content of the motion, and none of Patent Documents 1 to 3 uses machine learning to verbalize motion. Non-Patent Document 1 generates words from motion symbols using a motion language model whose graph structure has a single latent variable layer representing sentences, and there is still room for improvement in optimally extracting associated words.
The present invention has been made in view of the above, and provides a motion verbalization device, a motion verbalization method, a program, and a motion recording device that use a multi-layer statistical model to extract, in an optimized manner, words associated with a motion symbol representing a motion pattern, and that rearrange the words into a sentence.
A motion verbalization device according to the present invention includes: a learning data storage unit that stores, as learning data, three-dimensional time-series motion data and motion symbols representing motion patterns, which are the types of the motion, in association with descriptive sentences each consisting of a set of words for each motion symbol; a classification means that classifies input three-dimensional motion data into the corresponding motion pattern; a multi-layer statistical model in which an input layer composed of the motion symbols and an output layer composed of the language are connected via a predetermined number of hidden variable layers, and which has parameters obtained by learning the probability of association between each motion symbol and each word in the descriptive sentences corresponding to that motion symbol; a first calculation means that applies the learning data and the parameters of the multi-layer statistical model to calculate, from the motion pattern classified by the classification means, a probability representing the strength of association of each word; and a second calculation means that creates a sentence using the probability of each word obtained by the first calculation means and the probability of the arrangement of the words.
A motion verbalization method according to the present invention includes: a learning data storage step of storing, as learning data, three-dimensional time-series motion data and motion symbols representing motion patterns, which are the types of the motion, in association with descriptive sentences each consisting of a set of words for each motion symbol; a multi-layer statistical model execution step in which an input layer composed of the motion symbols and an output layer composed of the language are connected via a predetermined number of hidden variable layers, and parameters are obtained by learning the probability of association between each motion symbol and each word in the descriptive sentences corresponding to that motion symbol; a classification step of classifying input three-dimensional motion data into the corresponding motion pattern; a first calculation step of applying the learning data and the parameters of the multi-layer statistical model to calculate, from the motion pattern classified by the classification means, a probability representing the strength of association of each word; and a second calculation step of creating a sentence using the probability of each word obtained by the first calculation means and the probability of the arrangement of the words.
A program according to the present invention causes a computer to function as the motion verbalization device.
A motion recording device according to the present invention includes the motion verbalization device, a recording unit that records the sentence generated by the second calculation means, and an output unit that outputs the sentence recorded in the recording unit.
According to these inventions, in pre-learning, the learning data storage unit stores, as learning data, three-dimensional time-series motion data and motion symbols representing motion patterns, which are the types of the motion, in association with descriptive sentences each consisting of a set of words for each motion symbol. The multi-layer statistical model, in which an input layer composed of the motion symbols and an output layer composed of the language are connected via a predetermined number of hidden variable layers, stores parameters obtained by learning the probability of association between each motion symbol and each word in the corresponding descriptive sentences. The classification means classifies input three-dimensional motion data into the corresponding motion pattern. The first calculation means then applies the learning data and the parameters of the multi-layer statistical model to calculate, from the classified motion pattern, a probability representing the strength of association of each word, and the second calculation means creates a sentence using the probability of each word obtained by the first calculation means and the probability of the arrangement of the words.
By extracting the associative relationship between motion and words with a multi-layer statistical model in this way, words closely related to a body motion can be computed from that motion. Furthermore, by also expressing the arrangement of words in a sentence as a statistical model, the words related to the motion can be rearranged and the validity of the result as a sentence can be evaluated probabilistically. Computing a word sequence with a high probability turns the motion into a sentence. The same learning process can be applied regardless of the number of hidden variable layers, for example when the predetermined number of layers is 0, 1, 2, 3, 4, and so on.
According to the present invention, words associated with a motion pattern can be extracted in a more optimized manner and rearranged to create a sentence.
FIG. 1 is a block diagram showing an embodiment of the motion verbalization device according to the present invention.
FIG. 2 is a diagram showing a memory map of the learning data.
FIG. 3 is a graph structure diagram showing an embodiment of the multi-layer statistical model.
FIG. 4 is a diagram showing an example of the language model.
FIG. 5 is an explanatory diagram visualizing verbalized sentences.
FIG. 6 is a flowchart showing an example of the machine learning process.
FIG. 7 is a flowchart showing an example of the motion verbalization process.
FIG. 1 is a block diagram showing an embodiment of the motion verbalization device according to the present invention. The motion verbalization device 1 includes a control unit 10 and a storage unit 20, and further includes a motion measurement unit 31 and an input unit 32. As described later, the motion verbalization device 1 performs processing in two stages. First, a pre-learning process, for example a machine learning process, is executed. Next, a motion verbalization process is executed, in which the human body motion to be measured is measured and verbalized. The motion measurement unit 31, the input unit 32, and some of the functional units described later are used in the pre-learning process; the motion measurement unit 31 is used in both processes, and the remaining functional units are used in the motion verbalization process.
The motion measurement unit 31 performs motion analysis on captured images containing the movement of a human body and transmits the resulting motion data to the control unit 10 by wire or wirelessly. In the present embodiment, optical motion capture is used as the motion measurement unit 31. The input unit 32 accepts character input, preferably via a keyboard or the like, and is used to enter words to compose sentences. An entered sentence is passed to the control unit 10, where it is decomposed into words by known morphological analysis. Each resulting word is assigned a word ID, which is used in subsequent processing. Specifically, the motion measurement unit 31 captures whole-body motion of a human body, and the input unit 32 is used to enter a plurality of concise or arbitrary sentences describing the situation or type of the motion obtained by the motion measurement unit 31. Examples of motion types include "jump", "run", "play badminton", and "walk". The motion measurement unit 31 may also be provided separately from the control unit 10, in which case the captured images and measured motion data are first stored in a built-in storage member (not shown) and then sent offline to the storage unit 20 via the control unit 10.
The known optical motion capture system serving as the motion measurement unit 31 includes markers, a plurality of cameras that image the markers, and an image processing unit. Markers are attached to predetermined locations on the moving human body, typically the joints, and are imaged by a plurality of cameras arranged in advance at multiple locations. The image processing unit then measures, in a known manner, the three-dimensional position of each marker, that is, the joint positions of the human body, from the captured marker images as time-series motion data.
The storage unit 20 has a work area that temporarily stores information during processing, and a processing program storage unit 21 that stores the processing programs executed by the control unit 10. The storage unit 20 also has a learning data storage unit 22, a classification parameter storage unit 23, a multi-layer statistical model storage unit 24, and a grammar model storage unit 25.
The processing program storage unit 21 stores the processing programs that execute the pre-learning process and the motion verbalization process.
The learning data storage unit 22 stores learning data created in advance by the control unit 10. FIG. 2 shows an example of the learning data. For each motion pattern, it holds the following items: time-series motion data obtained by measuring human body motion with the motion measurement unit 31; the motion pattern, which is the type of that motion data; and manually entered descriptive sentences that describe the content of the motion. The motion data is at least one of the positions and the angles of specific parts of the human body, here the joints. The motion pattern indicates the type of motion and is entered manually as a motion symbol λ. A plurality of descriptive sentences are entered manually. For example, sentences s11 and s12 are created for motion symbol λ1, and sentences s21, s22, and s23 for motion symbol λ2. More specifically, when the motion type is "run", the sentences are, for example, "he runs .", "a player runs .", and "a student runs .".
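The layout of the learning data described above can be illustrated with a minimal Python sketch. The class name `LearningRecord`, the function `build_word_ids`, and the toy motion values are hypothetical and are not taken from the publication; whitespace splitting is used here only as a stand-in for the morphological analysis mentioned in the description.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LearningRecord:
    """One entry of learning data: a motion symbol, its motion data, and its sentences."""
    motion_symbol: str                  # label of the motion pattern, e.g. "run"
    motion_data: List[List[float]]      # time series: one feature vector per frame
    sentences: List[str] = field(default_factory=list)


def build_word_ids(records: List[LearningRecord]) -> Dict[str, int]:
    """Assign an integer ID to each distinct word, in order of first appearance."""
    word_ids: Dict[str, int] = {}
    for record in records:
        for sentence in record.sentences:
            # Assumption: whitespace splitting replaces morphological analysis.
            for word in sentence.split():
                word_ids.setdefault(word, len(word_ids))
    return word_ids


# Toy data: the motion values are invented for illustration only.
records = [
    LearningRecord(
        "run",
        [[0.0, 0.1], [0.2, 0.3]],
        ["he runs .", "a player runs .", "a student runs ."],
    ),
]
word_ids = build_word_ids(records)
```

In this sketch each distinct word receives one ID, so the three example sentences yield six IDs in total.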
The classification parameter storage unit 23 stores classification parameters for classifying motion data, which are generated from the motion data acquired during pre-learning and the corresponding motion patterns, in association with the motion symbols λ.
The multi-layer statistical model storage unit 24 stores a multi-layer statistical model that expresses the associative relationship (strength of association), acquired in pre-learning, between a motion symbol λ and a word w in a descriptive sentence s. In the model, an input layer composed of the motion symbols λ and an output layer composed of the words w are connected via K hidden variable layers.
FIG. 3 is a graph structure diagram showing an embodiment of the multi-layer statistical model. In FIG. 3, there are two hidden variable layers, z(1) and z(2).
The multi-layer statistical model storage unit 24 stores Equation 1, which gives the probability that a word w is generated from a motion symbol λ. Here, it is assumed that the probability of a word being generated from a motion symbol depends only on that motion symbol.
[Equation 1: image JPOXMLDOC01-appb-M000002 in the original publication]
Here, z(k) denotes the state in the k-th hidden layer.
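The equation itself is available only as an image in the publication and is not reproduced here. As a hedged illustration, one form that would be consistent with the layered structure and with the probability parameters named in the description (a chain from λ through K hidden layers to w) is the marginalization below. This is an assumption about the general shape, not the published formula, and the final factor is written with the last hidden layer denoted z^{(K)}.

```latex
P(w \mid \lambda)
  = \sum_{z^{(1)}} \cdots \sum_{z^{(K)}}
    P\bigl(w \mid z^{(K)}\bigr)
    \left[ \prod_{k=2}^{K} P\bigl(z^{(k)} \mid z^{(k-1)}\bigr) \right]
    P\bigl(z^{(1)} \mid \lambda\bigr)
```

For K = 0 this form would reduce to a direct table P(w | λ), which matches the statement that the number of hidden variable layers may be 0.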
Using the learning data set (λi, w(i)) of motion symbols and words in the descriptive sentences, an optimization problem is solved to find the probability parameters P(z(1)|λ), P(z(k)|z(k-1)), ..., P(w|z(k)) that maximize the objective function Φ(λ, w) shown in Equation 2 below. This optimization problem is solved with the EM algorithm.
[Equation 2: image JPOXMLDOC01-appb-M000003 in the original publication]
The parameters of the motion language model are optimized with the EM algorithm so that the objective function of Equation 2 is maximized. The EM algorithm is a statistical method for maximum likelihood estimation of the parameters of a probabilistic model, and is used when the model depends on unobservable latent variables. Applied to machine learning, it is an iterative method that alternates between two steps: computing the distribution of the hidden variables from the previously estimated model parameters, and estimating the model parameters that maximize the objective function given that distribution. The optimized parameters are also written to the multi-layer statistical model storage unit 24. Each word is assigned a probability based on the number of times it appears in the descriptive sentences obtained in pre-learning.
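The alternation between the two EM steps can be sketched in Python for the simplest non-trivial case of a single hidden layer. This is an illustrative sketch only: the log-likelihood of the observed (motion symbol, word) counts is assumed as the objective because Equation 2 is not reproduced here, and the function names, the random initialization, and the small floor value are assumptions rather than details from the publication.

```python
import math
import random


def normalize(values):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(values)
    return [v / total for v in values]


def log_likelihood(counts, p_z_given_s, p_w_given_z):
    """Sum of n(s, w) * log P(w | s) over the observed (symbol, word) pairs."""
    n_states = len(p_w_given_z)
    total = 0.0
    for (s, w), n in counts.items():
        p = sum(p_z_given_s[s][z] * p_w_given_z[z][w] for z in range(n_states))
        total += n * math.log(p)
    return total


def em_fit(counts, n_symbols, n_words, n_states, n_iter=50, seed=0):
    """Fit P(z | s) and P(w | z) for one hidden layer by EM."""
    rng = random.Random(seed)
    p_z_given_s = [normalize([rng.random() + 0.1 for _ in range(n_states)])
                   for _ in range(n_symbols)]
    p_w_given_z = [normalize([rng.random() + 0.1 for _ in range(n_words)])
                   for _ in range(n_states)]
    history = []
    for _ in range(n_iter):
        # Tiny floor (an assumption) so that no row sums to zero.
        new_zs = [[1e-12] * n_states for _ in range(n_symbols)]
        new_wz = [[1e-12] * n_words for _ in range(n_states)]
        for (s, w), n in counts.items():
            # E-step: posterior of the hidden state given symbol s and word w.
            post = normalize([p_z_given_s[s][z] * p_w_given_z[z][w]
                              for z in range(n_states)])
            # Accumulate expected counts for the M-step.
            for z in range(n_states):
                new_zs[s][z] += n * post[z]
                new_wz[z][w] += n * post[z]
        # M-step: re-estimate the parameters from the expected counts.
        p_z_given_s = [normalize(row) for row in new_zs]
        p_w_given_z = [normalize(row) for row in new_wz]
        history.append(log_likelihood(counts, p_z_given_s, p_w_given_z))
    return p_z_given_s, p_w_given_z, history


# Toy counts n(symbol, word): 2 motion symbols, 3 words (invented values).
counts = {(0, 0): 3, (0, 1): 1, (1, 1): 2, (1, 2): 2}
p_zs, p_wz, history = em_fit(counts, n_symbols=2, n_words=3, n_states=2, n_iter=30)
```

The list `history` records the objective after each iteration, which makes it easy to check that the value does not decrease as the iterations proceed. Extending the sketch to several hidden layers would require posteriors over chains of hidden states, which is omitted here.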
The grammar model storage unit 25 stores a description of the grammar governing the arrangement of words in a sentence. The present embodiment uses an N-gram model, that is, a model that has learned the probability of the word appearing at the next position given the preceding N-1 words.
FIG. 4 shows an example of the language model, in which nodes represent words and edges represent transitions between words. The value of N is preferably about 2 to 4. The 2-gram model shown in FIG. 4 is based on the assumption that each word in a sentence depends only on the immediately preceding word, and expresses sentence structure through the transition probabilities between words and the initial state probabilities of words appearing at the beginning of a sentence. In FIG. 4, for example, P("ball"|"a") is the probability that the word "ball" follows the word "a". A transition probability is set based on the number of times the corresponding word pair appears in the descriptive sentences, and an initial state probability is set based on the number of times the word appears at the beginning of a descriptive sentence. Because this approach dispenses with the tedious work of manually entering parts of speech, it is a simple way to obtain grammar rules.
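The count-based setting of initial state probabilities and transition probabilities in a 2-gram model can be sketched as follows. The function name `train_bigram` and the use of plain relative frequencies without smoothing are assumptions made for illustration; the example sentences are the ones given for the motion type "run".

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Estimate initial and transition probabilities by relative frequency."""
    initial_counts = Counter()
    transition_counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        if not words:
            continue
        # Count the word that opens the sentence.
        initial_counts[words[0]] += 1
        # Count each adjacent word pair (previous word, next word).
        for prev, nxt in zip(words, words[1:]):
            transition_counts[prev][nxt] += 1

    total_initial = sum(initial_counts.values())
    initial = {w: c / total_initial for w, c in initial_counts.items()}
    transition = {
        prev: {nxt: c / sum(followers.values()) for nxt, c in followers.items()}
        for prev, followers in transition_counts.items()
    }
    return initial, transition


sentences = ["he runs .", "a player runs .", "a student runs ."]
initial, transition = train_bigram(sentences)
```

With these three sentences, two of the three sentences start with "a", and "a" is followed equally often by "player" and "student", so `transition["a"]` splits its probability evenly between those two words.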
The data stored in the storage units 22 to 25 is generated by the processing program for pre-learning.
The control unit 10 is typically a processor (computer) with a built-in CPU. By reading the processing programs from the processing program storage unit 21 into a main memory (not shown) and executing them, the control unit 10 functions as a motion data input unit 101, an input character analysis unit 102, a motion pattern classification unit 103, a motion verbalization calculation unit 104, a word sequence calculation unit 105, a display image creation unit 106, and a recording processing unit 107.
The motion data input unit 101 takes in the motion data measured by the motion measurement unit 31 in both the pre-learning process and the motion verbalization process.
The input character analysis unit 102 takes in the descriptive sentence data entered via the input unit 32 in the pre-learning process, in association with the corresponding motion data.
The motion pattern classification unit 103 identifies the motion pattern, that is, the motion symbol, of the motion data input during the motion verbalization process. Various methods can be used. In the present embodiment, the unit uses the classification parameters to calculate the difference between the newly acquired motion data and the motion data corresponding to each motion pattern obtained during pre-learning, compares the differences across motion patterns, identifies the motion pattern with the smallest difference, and assigns its motion symbol.
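The smallest-difference classification can be sketched in Python as a nearest-reference search. The description does not specify how the difference is computed, so the sum of squared differences over frames of equal length is an assumption, as are the function names and the toy reference data.

```python
def sequence_distance(a, b):
    """Sum of squared differences between two motion sequences.

    Assumption: both sequences have the same number of frames and features.
    """
    return sum(
        (x - y) ** 2
        for frame_a, frame_b in zip(a, b)
        for x, y in zip(frame_a, frame_b)
    )


def classify(motion, references):
    """Return the motion symbol whose reference sequence is closest to `motion`."""
    return min(references, key=lambda symbol: sequence_distance(motion, references[symbol]))


# Toy reference data per motion symbol (invented values).
references = {
    "walk": [[0.0, 0.0], [0.1, 0.1]],
    "run": [[0.0, 0.0], [1.0, 1.0]],
}
symbol = classify([[0.0, 0.0], [0.9, 1.1]], references)
```

In practice, motion sequences differ in length and timing, so a real classifier would need some form of time alignment or a learned sequence model; that part is outside this sketch.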
The motion verbalization calculation unit 104 uses Equation 1, Equation 2, the multi-layer statistical model, and the parameters stored in the multi-layer statistical model storage unit 24 to calculate the probability that each word is associated with the motion symbol λ of the motion pattern classified by the motion pattern classification unit 103.
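Given learned conditional probability tables, the probability of each word for a motion symbol can be obtained by propagating a distribution through the layers one at a time. The Python sketch below assumes that each layer is stored as a row-stochastic table and that the layers form a simple chain; the function names and the numeric values are hypothetical.

```python
def vec_mat(vec, mat):
    """Multiply a row vector by a matrix given as a list of rows."""
    return [
        sum(vec[i] * mat[i][j] for i in range(len(vec)))
        for j in range(len(mat[0]))
    ]


def word_probabilities(symbol_index, layers):
    """Propagate from a motion symbol through the hidden layers to the words.

    layers[0][s][z1] is P(z1 | s), intermediate tables map one hidden layer
    to the next, and the last table maps the final hidden layer to the words.
    With a single table (no hidden layer) this returns that table's row.
    """
    dist = layers[0][symbol_index]
    for table in layers[1:]:
        dist = vec_mat(dist, table)
    return dist


# Toy model: 2 motion symbols, two hidden layers with 2 states each, 3 words.
layers = [
    [[0.9, 0.1], [0.2, 0.8]],            # P(z1 | symbol)
    [[0.5, 0.5], [0.0, 1.0]],            # P(z2 | z1)
    [[0.6, 0.3, 0.1], [0.1, 0.1, 0.8]],  # P(word | z2)
]
probs = word_probabilities(0, layers)
```

Because every table is row-stochastic, the resulting word probabilities sum to 1, and the cost grows with the sizes of adjacent layers rather than with the number of hidden state combinations.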
The word sequence calculation unit 105 applies the association probability of each word calculated by the motion verbalization calculation unit 104 and the probability values of the grammar statistical model, and creates the sentence whose total probability value is highest when the words are arranged in sequence. It also creates the second-ranked and lower-ranked sentences up to a predetermined or required rank.
The probability values of the grammar statistical model are applied to the plurality of words w extracted by the motion verbalization calculation unit 104, and the words are rearranged in sequence to create the sentence with the highest probability. When the motion verbalization calculation unit 104 has extracted words down to a predetermined or required rank, sentences are created for those as well.
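One way to combine the word association probabilities with a 2-gram model and rank candidate sentences is sketched below in Python. The scoring rule (a sum of log probabilities), the fixed sentence length, and the exhaustive search over permutations are assumptions chosen to keep the example small; the description does not state the search procedure, and all numeric values are invented.

```python
import math
from itertools import permutations


def safe_log(p):
    """Logarithm that maps zero probability to negative infinity."""
    return math.log(p) if p > 0.0 else float("-inf")


def sentence_score(words, assoc, initial, transition):
    """Log score: initial word, word-to-word transitions, and word associations."""
    total = safe_log(initial.get(words[0], 0.0))
    for prev, nxt in zip(words, words[1:]):
        total += safe_log(transition.get(prev, {}).get(nxt, 0.0))
    for word in words:
        total += safe_log(assoc.get(word, 0.0))
    return total


def rank_sentences(assoc, initial, transition, length, top_n=3):
    """Return up to top_n word sequences of the given length, best first."""
    candidates = []
    for words in permutations(assoc, length):
        score = sentence_score(words, assoc, initial, transition)
        if score > float("-inf"):
            candidates.append((score, " ".join(words)))
    candidates.sort(key=lambda item: item[0], reverse=True)
    return [text for _, text in candidates[:top_n]]


# Hypothetical association probabilities for one motion symbol.
assoc = {"a": 0.3, "student": 0.2, "player": 0.15, "runs": 0.25, ".": 0.1}
initial = {"a": 1.0}
transition = {
    "a": {"student": 0.5, "player": 0.5},
    "student": {"runs": 1.0},
    "player": {"runs": 1.0},
    "runs": {".": 1.0},
}
ranked = rank_sentences(assoc, initial, transition, length=4)
```

Exhaustive permutation search is feasible only for a handful of candidate words. For a realistic vocabulary, a Viterbi-style or beam search over the 2-gram graph would be the more practical choice.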
The display image creation unit 106 displays the sentences created by the word sequence calculation unit 105 on a display unit 12. The recording processing unit 107 stores the created sentences in a predetermined area of the storage unit 20 in association with the motion data. The recording processing unit 107 may also store the created sentences in association with the captured images.
FIG. 5 is an explanatory diagram visualizing verbalized sentences. FIG. 5(A) shows the motion pattern "jump", FIG. 5(B) "run", and FIG. 5(C) "play badminton". In FIG. 5(B), for example, the best sentence is "a student runs." on the first line, the second candidate is "a player runs." on the second line, and the third candidate is "a player walks." on the third line.
FIG. 6 is a flowchart showing an example of the machine learning process. First, the motion data input unit 101 acquires motion data from the motion measurement unit 31 (step S1). Next, the motion pattern classification unit 103 associates the input motion data with the motion pattern information entered via the input unit 32 (step S3). The input character analysis unit 102 then acquires the descriptive sentence data entered via the input unit 32 in association with the corresponding motion data (step S5).
The motion verbalization calculation unit 104 then decomposes the descriptive sentences acquired during pre-learning into words by morphological analysis and, applying Equation 1, Equation 2, the multi-layer statistical model, and the parameters being learned, extracts the words w in the descriptive sentences from the motion symbol λ of the motion pattern, each in association with its probability of occurrence (step S7).
Subsequently, for the plurality of words w extracted by the motion verbalization calculation unit 104, the word sequence calculation unit 105 calculates, based on the grammar statistical model, the word arrangement probabilities, here the transition probabilities between words and the initial state probabilities of words appearing at the beginning of a sentence (step S9). The resulting data is then stored in the grammar model storage unit 25 (step S11). This machine learning may be continued as necessary, in which case more accurate motion verbalization can be achieved.
FIG. 7 is a flowchart showing an example of the motion verbalization process. First, the motion data input unit 101 acquires motion data from the motion measurement unit 31 (step S21). Next, the motion pattern classification unit 103 classifies the input motion data into a motion pattern (step S23). The motion verbalization calculation unit 104 then applies Equation 1, Equation 2, the multi-layer statistical model, and the parameters acquired during pre-learning to calculate the probability that each word is associated with the motion pattern (step S25).
Next, based on the word association probabilities calculated by the motion verbalization calculation unit 104 and the grammar statistical model, the word sequence calculation unit 105 calculates the word arrangement probabilities, here the transition probabilities between words and the initial state probabilities of words appearing at the beginning of a sentence, and composes sentences (step S27). The created sentences are then output to the storage unit 20 or the display unit 12 (step S29).
<Experimental example>
A total of 157 types of motion patterns were measured with the optical motion capture system serving as the motion measurement unit 31. Three trials were measured for each motion pattern, and 765 descriptive sentences were manually attached to the resulting 471 motion data items. The vocabulary used in the descriptive sentences comprised 259 words. Using these as learning data, the probability parameters of the multi-layer statistical model were optimized. In this experiment, the graph structure of FIG. 3 was used with two hidden layers, with the number of states set to 100 in the first hidden layer and 50 in the second. Knowledge about the arrangement of words in sentences was expressed as a 2-gram statistical model. FIG. 5 shows example sentences created in this experiment. By connecting the multi-layer statistical model of motion-word association probabilities, which has the graph structure above, with the statistical model of word arrangement, sentences (word sequences) with high probability could be created from motion symbols.
The present invention also includes the following embodiments.
(1) As techniques for measuring human whole-body motion, instead of the optical motion capture described above, a method using IMU (inertial measurement unit) sensors with built-in accelerometers, or a method that uses image processing of captured images without sensors or markers (for example, Japanese Patent Application No. 2019-157640 by the present applicant), may also be adopted.
(2) Motion data may be created from video of performers on a stage or of athletes, and the content of the motion may be displayed as text or output as speech to provide automatic commentary.
(3) In medical and nursing care settings, a patient's movements may be measured and the sentences created from the measurement results may be compiled automatically into a log. In this case, the motion measurement unit 31 identifies the target patient and measures the patient's movements and exercise, such as rehabilitation, and the recording processing unit 107 saves sentences created from the measurement results in the form of a rehabilitation log, a nursing care log, or the like. In this arrangement, the device also serves as a monitoring means that determines that an abnormality has occurred when a sentence describing unusual movement is observed.
(4)本運動言語化装置をマンマシンインターフェース用ロボットに搭載し、このロボットと対面する人の動き(身振り手振りなど)を計測して言語化することで、人との間で効果的なコミュニケーションを取ることが可能となる。また、計測対象は人に限定されない。 (4) Effective communication with humans by mounting this motor verbalization device on a robot for man-machine interface and measuring and verbalizing the movements (gestures, gestures, etc.) of the person facing this robot. It becomes possible to take. Moreover, the measurement target is not limited to people.
(5)本発明における単語並び演算は、それぞれの言語に応じた文法ルールを適用することができる。また、本発明は、事前学習としてディープラーニングを適用したものでもよい。また、N-gram統計モデルに代えて、他の単語の並びを表す統計モデルを採用してもよい。 (5) In the word sequence operation in the present invention, grammatical rules corresponding to each language can be applied. Further, the present invention may apply deep learning as pre-learning. Further, instead of the N-gram statistical model, a statistical model representing a sequence of other words may be adopted.
(6)なお、本発明は、運動から文章を構築する方向の処理であるが、逆に、関節部位にアクチュエータが搭載された、運動可能なロボットに対して文章を入力すると、対応する運動(動作)を行う方向の処理にも適用可能である。これによれば、人間がロボットに文字乃至音声によって運動指令情報を提供することで、対応する運動をロボットに再現させることが可能となる。 (6) The present invention is a process in the direction of constructing a sentence from a motion, but conversely, when a sentence is input to a movable robot equipped with an actuator at a joint part, the corresponding motion (6) It can also be applied to processing in the direction of performing operation). According to this, when a human provides motion command information to a robot by characters or voice, it becomes possible for the robot to reproduce the corresponding motion.
 As described above, the motion verbalization device according to the present invention preferably comprises: a learning data storage unit that stores, as learning data, three-dimensional time-series motion data and motion symbols representing motion patterns, which are the types of the motion, in association with explanatory sentences each consisting of a set of words for the respective motion symbol; a classification means that classifies input three-dimensional motion data into the corresponding motion pattern; a multi-layer statistical model in which an input layer composed of the motion symbols and an output layer composed of the language are connected via a predetermined number of hidden variable layers, and which has parameters, acquired through learning, for the association probability between each motion symbol and each word in the explanatory sentences corresponding to that motion symbol; a first calculation means that applies the learning data and the parameters of the multi-layer statistical model to calculate, from the motion pattern classified by the classification means, a probability representing the strength of association of each word; and a second calculation means that creates a sentence using the probability of each word obtained by the first calculation means and the probability of the order of the words.
 The motion verbalization method according to the present invention preferably comprises: a learning data storage step of storing, as learning data, three-dimensional time-series motion data and motion symbols representing motion patterns, which are the types of the motion, in association with explanatory sentences each consisting of a set of words for the respective motion symbol; a multi-layer statistical model execution step in which an input layer composed of the motion symbols and an output layer composed of the language are connected via a predetermined number of hidden variable layers, and parameters for the association probability between each motion symbol and each word in the explanatory sentences corresponding to that motion symbol are acquired through learning; a classification step of classifying input three-dimensional motion data into the corresponding motion pattern; a first calculation step of applying the learning data and the parameters of the multi-layer statistical model to calculate, from the motion pattern classified by the classification means, a probability representing the strength of association of each word; and a second calculation step of creating a sentence using the probability of each word obtained by the first calculation means and the probability of the order of the words.
 The program according to the present invention causes a computer to function as the motion verbalization device.
 The motion recording device according to the present invention preferably comprises the motion verbalization device, a recording unit that records sentences generated by the second calculation means, and an output unit that outputs the sentences recorded in the recording unit.
 According to these inventions, in pre-learning, the learning data storage unit stores, as learning data, three-dimensional time-series motion data and motion symbols representing motion patterns, which are the types of the motion, in association with explanatory sentences each consisting of a set of words for the respective motion symbol. In the multi-layer statistical model, an input layer composed of the motion symbols and an output layer composed of the language are connected via a predetermined number of hidden variable layers, and the parameters acquired through learning for the association probability between each motion symbol and each word in the corresponding explanatory sentences are stored. The classification means classifies input three-dimensional motion data into the corresponding motion pattern. The first calculation means then applies the learning data and the parameters of the multi-layer statistical model to calculate, from the motion pattern classified by the classification means, a probability representing the strength of association of each word, and the second calculation means creates a sentence using the probability of each word obtained by the first calculation means and the probability of the order of the words. By extracting the associative relationship between motion and words with the multi-layer statistical model in this way, words closely related to a body motion can be computed from that motion. Furthermore, by also expressing the order of words in a sentence as a statistical model, the words related to the motion can be rearranged and their validity as a sentence evaluated probabilistically. Computing a word sequence for which this probability is high turns the motion into a sentence. In addition, the same learning process can be applied regardless of the number of hidden variable layers, for example whether the predetermined number of layers is 0, 1, 2, 3, 4, and so on.
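The combination of word probabilities and word-order probabilities described above can be illustrated with a small sketch. It scores a candidate sentence as the sum of the log-probability of each word given the motion and the log-probability of each adjacent word pair, then picks the best ordering of the most relevant words. All numbers, the toy vocabulary, the `FLOOR` constant, and the exhaustive search over permutations are illustrative assumptions; the source does not state which search procedure is used.

```python
import itertools
import math

# Hypothetical toy values; the source does not publish its learned parameters.
word_given_motion = {"a": 0.30, "person": 0.30, "walks": 0.35, "runs": 0.05}
bigram = {
    ("<s>", "a"): 0.8, ("<s>", "person"): 0.1, ("<s>", "walks"): 0.1,
    ("a", "person"): 0.9, ("a", "walks"): 0.1,
    ("person", "walks"): 0.7, ("person", "runs"): 0.2, ("person", "a"): 0.1,
    ("walks", "</s>"): 0.8, ("walks", "a"): 0.2,
    ("runs", "</s>"): 0.9,
}
FLOOR = 1e-6  # probability assigned to unseen words or word pairs (assumption)

def sentence_log_score(words):
    """Log of (product of word relevance) times (product of bigram probabilities)."""
    relevance = sum(math.log(word_given_motion.get(w, FLOOR)) for w in words)
    padded = ["<s>", *words, "</s>"]
    order = sum(math.log(bigram.get(pair, FLOOR)) for pair in zip(padded, padded[1:]))
    return relevance + order

def best_sentence(n_words=3):
    """Order the n_words most relevant words so that the combined score is highest."""
    top = sorted(word_given_motion, key=word_given_motion.get, reverse=True)[:n_words]
    return max(itertools.permutations(top), key=sentence_log_score)

print(" ".join(best_sentence()))  # prints "a person walks"
```

Exhaustive permutation search is only practical for a handful of words; a dynamic-programming or beam search would be the usual substitute for longer sentences.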
 In the present invention, the multi-layer statistical model preferably has two hidden variable layers, which makes it possible to select words with a stronger associative relationship.
 In the present invention, the parameters are preferably acquired through machine learning applied to the multi-layer statistical model, which improves the accuracy of the associative relationship (strength of association) between motion and words.
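The source does not name the learning algorithm. As one possible illustration, the sketch below fits a model with a single hidden layer by expectation-maximization (EM), which increases the log-likelihood of observed (motion symbol, word) pairs. The choice of EM, the single hidden layer, the function name `fit_em`, the small count offset, and the toy data are all assumptions made for this example.

```python
import math
import random

def normalize(values):
    """Scale non-negative values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]

def fit_em(pairs, n_motions, n_states, n_words, n_iters=20, seed=0):
    """pairs: list of (motion_index, word_index) observations.
    Returns (P(z | motion), P(word | z), log-likelihood per iteration)."""
    rng = random.Random(seed)
    p_z = [normalize([rng.random() + 0.1 for _ in range(n_states)])
           for _ in range(n_motions)]
    p_w = [normalize([rng.random() + 0.1 for _ in range(n_words)])
           for _ in range(n_states)]
    history = []
    for _ in range(n_iters):
        # Tiny offsets keep every row normalizable even if it receives no counts.
        z_counts = [[1e-12] * n_states for _ in range(n_motions)]
        w_counts = [[1e-12] * n_words for _ in range(n_states)]
        log_likelihood = 0.0
        for m, w in pairs:
            # E-step: posterior over the hidden state for this (motion, word) pair.
            joint = [p_z[m][z] * p_w[z][w] for z in range(n_states)]
            evidence = sum(joint)  # P(word | motion) under the current parameters
            log_likelihood += math.log(evidence)
            for z in range(n_states):
                resp = joint[z] / evidence
                z_counts[m][z] += resp
                w_counts[z][w] += resp
        # M-step: re-estimate both tables from the expected counts.
        p_z = [normalize(row) for row in z_counts]
        p_w = [normalize(row) for row in w_counts]
        history.append(log_likelihood)  # value before this iteration's update
    return p_z, p_w, history

# Toy data (hypothetical): motion 0 co-occurs with words 0 and 1, motion 1 with 2 and 3.
pairs = [(0, 0), (0, 1), (0, 0), (1, 2), (1, 3), (1, 3)]
p_z, p_w, history = fit_em(pairs, n_motions=2, n_states=2, n_words=4)
```

With more hidden layers the E-step would compute a joint posterior over all hidden states of a pair, but the alternation between expected counts and renormalization stays the same.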
 Preferably, the first calculation means expresses the probability P(w|λ) that the word w is generated from the motion symbol λ by equation (1) and, using the learning data set (λ_i, w^(i)) of motion symbols and words in the explanatory sentences, obtains the probability parameters P(z^(1)|λ), P(z^(k)|z^(k-1)), …, P(w|z^(k)) that maximize the objective function Φ(λ, w) given by equation (2).
[Equations (1) and (2): Figure JPOXMLDOC01-appb-M000004, available only as an image in the original publication]
 Here, z^(k) denotes the state in the k-th hidden variable layer. This configuration improves the accuracy of the associative relationship between motion and words. The number of hidden variable layers k can be set freely, as k = 0, 1, 2, 3, 4, …, and the computation of Math. 3 can be executed regardless of the number of layers.
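Equations (1) and (2) are available only as images in this copy of the publication. One form consistent with the probability parameters named in the text is given below as a hedged reconstruction, not as the published formulas. It assumes that each hidden layer depends only on the layer before it and that the objective function is the log-likelihood of the learning data set.

```latex
% Assumed form of equation (1): marginalize over the states of all k hidden layers.
P(w \mid \lambda)
  = \sum_{z^{(1)}} \cdots \sum_{z^{(k)}}
    P\bigl(w \mid z^{(k)}\bigr)
    \Biggl[\prod_{j=2}^{k} P\bigl(z^{(j)} \mid z^{(j-1)}\bigr)\Biggr]
    P\bigl(z^{(1)} \mid \lambda\bigr)
\tag{1}

% Assumed form of equation (2): log-likelihood over the learning data set.
\Phi(\lambda, w) = \sum_{i} \log P\bigl(w^{(i)} \mid \lambda_{i}\bigr)
\tag{2}
```

Under this reading, k = 0 reduces equation (1) to a direct table P(w|λ), which matches the statement that the number of layers can be chosen freely.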
 The second calculation means preferably calculates the probability of the order of the words using an N-gram statistical model. With this configuration, words can be rearranged into a sentence by a simple method.
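As an illustration of the N-gram statistical model for N = 2, the sketch below estimates bigram probabilities from a few example sentences by relative frequency and uses them to score word orders. The corpus, the function names, and the sentence boundary markers `<s>` and `</s>` are hypothetical; the source does not describe how its N-gram parameters are estimated or smoothed.

```python
from collections import Counter, defaultdict

def train_bigram(sentences):
    """Estimate P(current word | previous word) by relative frequency."""
    pair_counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>", *sentence.split(), "</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            pair_counts[prev][cur] += 1
    model = {}
    for prev, counter in pair_counts.items():
        total = sum(counter.values())
        model[prev] = {cur: n / total for cur, n in counter.items()}
    return model

def sequence_probability(model, sentence):
    """Probability of the word order under the bigram model (0.0 for unseen pairs)."""
    tokens = ["<s>", *sentence.split(), "</s>"]
    prob = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        prob *= model.get(prev, {}).get(cur, 0.0)
    return prob

# Hypothetical explanatory sentences standing in for the manually attached texts.
corpus = ["a person walks", "a person runs", "a person walks slowly"]
model = train_bigram(corpus)
```

With this toy corpus, "a person walks" receives a higher probability than any reordering of the same three words, which is the property used to arrange words into a sentence.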
 1 Motion verbalization device
 10 Control unit
 101 Motion data input unit
 102 Input character analysis unit
 103 Motion pattern classification unit (classification means)
 104 Motion verbalization calculation unit (first calculation means)
 105 Word sequence calculation unit (second calculation means)
 106 Display image creation unit
 107 Recording processing unit
 20 Storage unit
 22 Learning data storage unit
 23 Classification parameter storage unit
 24 Multi-layer statistical model storage unit
 25 Grammar model storage unit
 31 Motion measurement unit
 32 Input unit

Claims (8)

  1.  A motion verbalization device comprising:
     a learning data storage unit that stores, as learning data, three-dimensional time-series motion data and motion symbols representing motion patterns, which are the types of the motion, in association with explanatory sentences each consisting of a set of words for the respective motion symbol;
     a classification means that classifies input three-dimensional motion data into the corresponding motion pattern;
     a multi-layer statistical model in which an input layer composed of the motion symbols and an output layer composed of the language are connected via a predetermined number of hidden variable layers, and which has parameters, acquired through learning, for the association probability between each motion symbol and each word in the explanatory sentences corresponding to that motion symbol;
     a first calculation means that applies the learning data and the parameters of the multi-layer statistical model to calculate, from the motion pattern classified by the classification means, a probability representing the strength of association of each word; and
     a second calculation means that creates a sentence using the probability of each word obtained by the first calculation means and the probability of the order of the words.
  2.  The motion verbalization device according to claim 1, wherein the multi-layer statistical model has two hidden variable layers.
  3.  The motion verbalization device according to claim 1 or 2, wherein the multi-layer statistical model has acquired the parameters through machine learning.
  4.  The motion verbalization device according to any one of claims 1 to 3, wherein the first calculation means expresses the probability P(w|λ) that the word w is generated from the motion symbol λ by equation (1) and, using the learning data set (λ_i, w^(i)) of motion symbols and words in the explanatory sentences, obtains the probability parameters P(z^(1)|λ), P(z^(k)|z^(k-1)), …, P(w|z^(k)) that maximize the objective function Φ(λ, w) given by equation (2).
     [Equations (1) and (2): Figure JPOXMLDOC01-appb-M000001, available only as an image in the original publication]
     Here, z^(k) denotes the state in the k-th hidden variable layer.
  5.  The motion verbalization device according to any one of claims 1 to 4, wherein the second calculation means calculates the probability of the order of the words using an N-gram statistical model.
  6.  A motion verbalization method comprising:
     a learning data storage step of storing, as learning data, three-dimensional time-series motion data and motion symbols representing motion patterns, which are the types of the motion, in association with explanatory sentences each consisting of a set of words for the respective motion symbol;
     a multi-layer statistical model execution step in which an input layer composed of the motion symbols and an output layer composed of the language are connected via a predetermined number of hidden variable layers, and parameters for the association probability between each motion symbol and each word in the explanatory sentences corresponding to that motion symbol are acquired through learning;
     a classification step of classifying input three-dimensional motion data into the corresponding motion pattern;
     a first calculation step of applying the learning data and the parameters of the multi-layer statistical model to calculate, from the motion pattern classified by the classification means, a probability representing the strength of association of each word; and
     a second calculation step of creating a sentence using the probability of each word obtained by the first calculation means and the probability of the order of the words.
  7.  A program for causing a computer to function as the motion verbalization device according to any one of claims 1 to 5.
  8.  A motion recording device comprising:
     the motion verbalization device according to any one of claims 1 to 5;
     a recording unit that records sentences created by the second calculation means; and
     an output unit that outputs the sentences recorded in the recording unit.
PCT/JP2020/031664 2019-08-30 2020-08-21 Motion verbalization device, motion verbalization method, program, and motion recording device WO2021039641A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2019-157721 2019-08-30
JP2019157721 2019-08-30

Publications (1)

Publication Number Publication Date
WO2021039641A1 true WO2021039641A1 (en) 2021-03-04

Family

ID=74685484

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/031664 WO2021039641A1 (en) 2019-08-30 2020-08-21 Motion verbalization device, motion verbalization method, program, and motion recording device

Country Status (1)

Country Link
WO (1) WO2021039641A1 (en)

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ANONYMOUS: "Humanoid robots for everyday life,~ The computational theory that connects words and movements opens the door", HUMANOID ROBOTS FOR EVERYDAY LIFE,~ THE COMPUTATIONAL THEORY THAT CONNECTS WORDS AND MOVEMENTS OPENS THE DOOR, 7 June 2018 (2018-06-07), pages 1 - 7, XP055796690, Retrieved from the Internet <URL:https://resou.osaka-u.ac.jp/ja/feature/2018/pshw4z> [retrieved on 20201007] *
MAEDA, SHINICHI, 7TH: "Deep Learning", THE JAPANESE SOCIETY FOR ARTIFICIAL INTELLIGENCE, vol. 29, no. 4, 1 July 2014 (2014-07-01), pages 366 - 380 *
TAKANO WATARU, NAKAMURA YOSHIHIKO: "Statistical mutual conversion between whole body motion primitives and linguistic sentences for human motions", THE INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH, vol. 34, no. 10, 2015, pages 1314 - 1328, XP055796678 *

Similar Documents

Publication Publication Date Title
Ahuja et al. Language2pose: Natural language grounded pose forecasting
Zhu et al. AR-mentor: Augmented reality based mentoring system
CN112053690B (en) Cross-mode multi-feature fusion audio/video voice recognition method and system
JP6583537B2 (en) Operation information generator
JP7146247B2 (en) Motion recognition method and device
US20090066641A1 (en) Methods and Systems for Interpretation and Processing of Data Streams
Avola et al. Deep temporal analysis for non-acted body affect recognition
JP2023076426A (en) Machine learning system for technical knowledge capture
JP5252393B2 (en) Motion learning device
Gürpınar et al. Sign recognition system for an assistive robot sign tutor for children
Menegozzo et al. Surgical gesture recognition with time delay neural network based on kinematic data
Wagner et al. Building a robust system for multimodal emotion recognition
KR20210054349A (en) Method for predicting clinical functional assessment scale using feature values derived by upper limb movement of patients
Shurid et al. Bangla sign language recognition and sentence building using deep learning
Sarma et al. Real-Time Indian Sign Language Recognition System using YOLOv3 Model
JP7192860B2 (en) Motion estimation system, motion estimation method, and motion estimation program
Françoise et al. Movement sequence analysis using hidden Markov models: a case study in Tai Chi performance
WO2021039641A1 (en) Motion verbalization device, motion verbalization method, program, and motion recording device
Trejo et al. Recognition of Yoga poses through an interactive system with Kinect based on confidence value
Siby et al. Gesture based real-time sign language recognition system
Rehman et al. A Real-Time Approach for Finger Spelling Interpretation Based on American Sign Language Using Neural Networks
Argyropoulos et al. Multimodal user interface for the communication of the disabled
Rodríguez-Moreno et al. A Hierarchical Approach for Spanish Sign Language Recognition: From Weak Classification to Robust Recognition System
Cutugno et al. Interacting with robots via speech and gestures, an integrated architecture.
Goutsu et al. Multi-modal gesture recognition using integrated model of motion, audio and video

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20857449

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20857449

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: JP