JP2976811B2

JP2976811B2 - Human Body Motion Speech Generation System from Text

Info

Publication number: JP2976811B2
Application number: JP12600294A
Authority: JP
Inventors: 山呂
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1994-06-08
Filing date: 1994-06-08
Publication date: 1999-11-10
Anticipated expiration: 2014-11-10
Also published as: JPH07334507A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Industrial Field of Application] The present invention relates to a system for generating human body motion and speech from text, and in particular to a system that uses a computer system to generate human body motion and speech, and to create animations of human movement and speech, from a text file written in natural language.

[0002]

[Prior Art] As one conventional technique, Japanese Patent Laid-Open No. 4-264972 discloses a method of generating human body motion using computer-based natural language parsing. This method parses the text, uses specific words found in it that represent motions to generate a motion program, and generates human body motion from that program.

[0003] As another conventional technique, Japanese Patent Laid-Open No. 2-234285 discloses a method of generating changes in the shape of a human mouth using phonemes from a speech synthesizer. It takes the phonemes of speech generated from a text file by a rule-based speech synthesizer and controls the mouth-shape feature parameters corresponding to each phoneme, thereby generating changes in the human mouth shape.

[0004]

[Problems to be Solved by the Invention] The conventional techniques described above have the following problems.

[0005] (1) In the former case, motion can be generated from text, but speech synchronized with that motion cannot be output.

[0006] (2) Furthermore, the former generates an animation-specific motion program that describes the motion, but because that motion program is written in a form similar to a computer program, it is not suited to editing by general users who are not programmers.

[0007] (3) In the latter case, only changes in mouth shape, which are just one part of human motion, are generated from the speech output of the text. It is difficult to generate the movement of other parts, such as the body, that is needed to produce more natural human motion video.

[0008]

[Means for Solving the Problems] To solve the problems described in items (1) and (3) above, the system for generating human body motion and speech from text according to the present invention comprises:
natural language analysis means for extracting words such as verbs and adverbs from a text;
a verb/motion pattern dictionary describing the correspondence between words that represent motions, such as verbs, and human body motion patterns;
motion pattern generation means for searching the verb/motion pattern dictionary using the verb extracted by the natural language analysis means and generating a human body motion pattern;
a modifier/motion degree dictionary describing the correspondence between modifiers, such as adverbs that modify verbs, and degrees of motion;
motion degree generation means for searching the modifier/motion degree dictionary using the modifier extracted by the natural language analysis means and generating the motion degree of the human body motion pattern;
motion-speech synchronization means for outputting, in order to synchronize the motion video output with the synthesized speech output, motion time data in which the position at which a verb appears in the text is associated with the start time of motion generation, and speech time data including the reading time of the text calculated from the length of the text;
motion video generation means for receiving a motion generation command including the human body motion pattern, the motion degree, and the motion time data, generating time-series video data of the human body motion, and outputting it to display means; and
speech synthesis means for receiving a speech generation command including the text and the speech time data and outputting speech by a rule-based speech synthesis method.

[0009] To solve the problem described in item (2) above, the following are added to the above configuration:
motion/speech generation command-to-text conversion means for converting the motion generation command and the speech generation command output from the motion pattern generation means, the motion degree generation means, and the motion-speech synchronization means into a sentence motion description file, which is human-readable text that can be modified with an external editor;
sentence motion description file storage means for storing the sentence motion description file produced by the motion/speech generation command-to-text conversion means; and
text-to-motion/speech generation command conversion means for converting the sentence motion description file from the sentence motion description file storage means into the motion generation command and the speech generation command and outputting them to the motion video generation means and the speech synthesis means.
This makes it possible to modify the sentence motion description file with an external editor.

[0010]

[Operation] In the present invention, the input text is analyzed and divided into phrases, and words such as verbs and verb modifiers are extracted. The timing of motion generation is then determined according to the rule that the position at which a verb appears serves as the signal to start a motion.

[0011] The key points are that the human body motion pattern is determined according to the kind of verb, and that modifiers and the like are used to determine the degree of the movement. As a result, given a text, speech output and smooth human motion synchronized with that speech can be created automatically.

[0012]

[Embodiment] Next, the present invention will be described with reference to the drawings.

[0013] FIG. 1 is a block diagram showing one embodiment of the system for generating human body motion and speech from text according to the present invention. It shows the configuration of an embodiment of the first invention.

[0014] As shown in FIG. 1, the system for generating human body motion and speech from text according to this embodiment consists of a device A 100 and a device B 200. Device A 100 comprises:
a natural language analysis device 1 that extracts words such as verbs and adverbs from a text;
a verb/motion pattern dictionary 21 describing the correspondence between words that represent motions, such as verbs, and human body motion patterns;
a motion pattern generation device 2 that searches the verb/motion pattern dictionary 21 using the verb extracted by the natural language analysis device 1 and generates a human body motion pattern;
a modifier/motion degree dictionary 31 describing the correspondence between modifiers, such as adverbs that modify verbs, and degrees of motion;
a motion degree generation device 3 that searches the modifier/motion degree dictionary 31 using the modifier extracted by the natural language analysis device 1 and generates the motion degree of the human body motion pattern; and
a motion-speech synchronization device 4 that outputs, in order to synchronize the motion video output with the synthesized speech output, motion time data in which the position at which a verb appears in the text is associated with the start time of motion generation, and speech time data including the reading time of the text calculated from the length of the text.

[0015] Device B 200 comprises a motion video generation device 5, which receives a motion generation command including the human body motion pattern, the motion degree, and the motion time data, generates time-series video data of the human body motion, and outputs it to a display device; and a speech synthesis device 6, which receives a speech generation command including the text and the speech time data and outputs speech by a rule-based speech synthesis method.

[0016] The natural language analysis device 1 extracts individual words from text input from outside. It extracts the words from each sentence using conventional parsing techniques. Here, a parsing method based on the CYK method (Ryoichi Sugimura, Koji Akasaka, Yukihiro Kubo: "Logic-based morphological analysis LAX", Proc. of the Logic Programming Conf., ICOT, pp. 213-222, 1988) or the like is used. Because the natural language analysis device 1 relies on existing technology, a detailed description is omitted here.

[0017] With this natural language analysis device 1, a sentence such as 「彼が気持ち良く笑った」 ("He laughed pleasantly") can be decomposed into the individual words 「彼・が・気持ち良く・笑った」. In addition, the grammatical role of each word within the sentence is obtained, as shown below.

[0018]
彼 (he): subject
が (ga): particle
気持ち良く (pleasantly): modifier
笑った (laughed): verb

The verb output from the natural language analysis device 1 is input to the motion pattern generation device 2, which generates a human body motion pattern. Specifically, it looks up the verb/motion pattern dictionary 21, which stores the correspondence between verbs and human body motion patterns, and generates and outputs the motion pattern corresponding to the input verb.

[0019] Next, the process of generating a human body motion pattern is described in detail.

[0020] The verb/motion pattern dictionary 21 associates multiple human body motion patterns with a single verb, and each of those patterns is assigned a priority. When human body motion patterns are looked up for a verb, they are therefore output in descending order of priority.
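The priority-ordered lookup described above can be sketched as follows. This is only an illustration under stated assumptions: the dictionary contents, the pattern names, the convention that a smaller number means a higher priority, and the function name `patterns_by_priority` are all hypothetical and do not come from the patent.

```python
# Hypothetical verb/motion pattern dictionary (dictionary 21 in the text).
# Each verb maps to (priority, pattern name) pairs; by assumption here,
# a smaller number means a higher priority.
VERB_MOTION_DICT = {
    "笑った": [(2, "laugh_with_shoulders"), (1, "smile"), (3, "nod")],
    "歩いた": [(1, "walk")],
}


def patterns_by_priority(verb):
    """Return the motion patterns for a verb, highest priority first."""
    entries = VERB_MOTION_DICT.get(verb, [])
    return [name for _, name in sorted(entries)]


print(patterns_by_priority("笑った"))
```

Returning the whole ordered list, rather than only the top entry, leaves room for a later step to fall back to the next candidate when the first one turns out to be unsuitable.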

[0021] FIG. 5 is a block diagram showing the details of the motion pattern generation device 2 of FIG. 1. In FIG. 5, the motion pattern constraint dictionary 26 stores information on whether a given ordered pair of consecutive human body motion patterns is appropriate. For example, a motion pattern in which the head is shaken back and forth while it is tilted to the left is clearly unnatural, and is defined as inappropriate. Knowledge of this kind, that is, whether the human body motion pattern about to be generated is appropriate given the preceding one, is defined in the motion pattern constraint dictionary 26.

[0022] FIG. 6 is a flowchart showing the process by which the motion pattern generation device 2 of FIG. 5 generates a human body motion pattern.

[0023] First, the verb obtained from the natural language analysis device 1 is input to the motion pattern search unit 22. At this point, the count generator 23 adds 1 to the current count value (whose initial value is 0) and sends the resulting count value to the motion pattern search unit 22.

[0024] The operation of the motion pattern generation device 2 is described below separately for the case where the count value = 1 and the case where the count value > 1.

When the count value = 1: First, based on the input verb, the motion pattern search unit 22 retrieves the highest-priority human body motion pattern from the verb/motion pattern dictionary 21. It then outputs the retrieved pattern and, at the same time, stores the pattern together with the count value in the history storage unit 24.

When the count value > 1: Based on the input verb, the motion pattern search unit 22 retrieves the highest-priority human body motion pattern (MP_i) from the verb/motion pattern dictionary 21. Next, it retrieves from the history storage unit 24 the human body motion pattern (MP_(i-1)) stored for the count value immediately preceding the current one. The human body motion patterns (MP_i) and (MP_(i-1)) are then sent to the motion pattern matching unit 25.

[0025] The motion pattern matching unit 25 consults the motion pattern constraint dictionary 26 and judges whether the current human body motion pattern (MP_i) is appropriate given the preceding human body motion pattern (MP_(i-1)).

[0026] If the pattern is judged appropriate, the current human body motion pattern (MP_i) is output, and the current count value and the current human body motion pattern (MP_i) are stored in the history storage unit 24. If it is judged inappropriate, the motion pattern search unit 22 retrieves the human body motion pattern with the next-highest priority (MP'_(i-1)).

[0027] Next, this human body motion pattern (MP'_(i-1)) is matched against the preceding human body motion pattern (MP_(i-1)), and the search-and-match process is repeated until a pattern is judged appropriate. If every human body motion pattern found is judged inappropriate, the highest-priority human body motion pattern is output.
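The search-and-match loop of FIG. 6 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: it assumes the constraint dictionary can be modeled as a set of forbidden (previous, current) pattern pairs, that each verb's candidates are already sorted by priority, and that the fallback pattern is also recorded in the history. The class, function, and pattern names are hypothetical.

```python
def select_pattern(candidates, previous, is_appropriate):
    """Pick a pattern from candidates (highest priority first).

    previous is the pattern output for the preceding verb, or None
    when this is the first verb (count value = 1).
    """
    if not candidates:
        return None
    if previous is None:
        return candidates[0]
    for pattern in candidates:
        if is_appropriate(previous, pattern):
            return pattern
    # Every candidate was judged inappropriate: fall back to the
    # highest-priority pattern, as the text describes.
    return candidates[0]


class MotionPatternGenerator:
    """Sketch of device 2: search unit, count generator, history, matcher."""

    def __init__(self, verb_dict, forbidden_pairs):
        self.verb_dict = verb_dict             # verb -> patterns by priority
        self.forbidden = set(forbidden_pairs)  # assumed form of dictionary 26
        self.count = 0                         # count generator, starts at 0
        self.history = {}                      # count value -> pattern

    def generate(self, verb):
        self.count += 1
        previous = self.history.get(self.count - 1)
        chosen = select_pattern(
            self.verb_dict.get(verb, []),
            previous,
            lambda prev, cur: (prev, cur) not in self.forbidden,
        )
        self.history[self.count] = chosen
        return chosen


verb_dict = {
    "tilt": ["tilt_head_left"],
    "shake": ["shake_head_back_forth", "shake_head_sideways"],
}
forbidden = [("tilt_head_left", "shake_head_back_forth")]
generator = MotionPatternGenerator(verb_dict, forbidden)
print(generator.generate("tilt"), generator.generate("shake"))
```

Here the second verb skips its top-priority pattern because the constraint set rules it out after a left head tilt, mirroring the example of an unnatural combination given for the constraint dictionary.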

[0028] Furthermore, the modifier output from the natural language analysis device 1 is input to the motion degree generation device 3, which generates motion degree data describing the degree of the human body motion pattern. Specifically, it searches the modifier/motion degree dictionary 31, which stores the correspondence between modifiers and the motion degree of human body motion patterns, and generates and outputs the motion degree corresponding to the input modifier. Numerical data can be used to express the motion degree.
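As a small illustration of a numerical motion degree, the lookup might look like the sketch below. The modifiers, the 0.0 to 1.0 scale, the default value used when a modifier is missing or unknown, and the function name are all assumptions made for this example; the patent only states that numerical data can be used.

```python
# Hypothetical modifier/motion degree dictionary (dictionary 31 in the text).
# The 0.0-1.0 scale is an assumption made for this sketch.
MODIFIER_DEGREE_DICT = {
    "少し": 0.25,       # "a little"
    "気持ち良く": 0.75,  # "pleasantly"
    "激しく": 1.0,      # "intensely"
}
DEFAULT_DEGREE = 0.5  # assumed fallback when no modifier applies


def motion_degree(modifier=None):
    """Return the numeric motion degree for a modifier."""
    if modifier is None:
        return DEFAULT_DEGREE
    return MODIFIER_DEGREE_DICT.get(modifier, DEFAULT_DEGREE)


print(motion_degree("気持ち良く"))
```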

[0029] The motion-speech synchronization device 4 aligns the timing of motion generation with the timing of synthesized speech output, based on the phrases obtained from the natural language analysis device 1. In the present invention, a stretch of text delimited by punctuation marks, as received from the natural language analysis device 1, is treated as the basic unit of motion generation and speech synthesis, and the position in that text at which a verb representing a motion appears is taken as the start position of motion generation. The specific processing is described below.
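The idea of a punctuation-delimited basic unit, with the verb position inside it, can be sketched as below. The punctuation set, the choice of a 1-based character position for the verb's first character, and the function names are assumptions for illustration; in the system described here, the segmentation and the verb itself come from the natural language analysis device.

```python
import re

# Assumed set of delimiting punctuation marks (Japanese and ASCII).
_PUNCTUATION = r"[、。，．,.!?！？]"


def split_into_units(text):
    """Split text into punctuation-delimited units, dropping empty ones."""
    return [unit for unit in re.split(_PUNCTUATION, text) if unit.strip()]


def verb_position(unit, verb):
    """Return the 1-based character position of verb in unit, or None."""
    index = unit.find(verb)
    return index + 1 if index >= 0 else None


units = split_into_units("彼が来て、気持ち良く笑った。")
print(units, verb_position(units[1], "笑った"))
```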

[0030] FIG. 4 shows an example of the motion and speech time data output by the motion-speech synchronization device 4. In FIG. 4, the total time needed to output one phrase is first calculated from the speech output rate of the speech synthesis device 6. For example, if the speech synthesis device 6 takes t seconds to output one character, the output time of a phrase of n characters is

T_S = n × t seconds.

If the speech start time corresponding to the first word of the phrase is t_S0 seconds (a relative time of 0 seconds), then

speech end time = t_S0 + T_S seconds.

If the verb appears at the m-th character of the phrase, then

motion generation start time t_m0 = t_S0 + m × t seconds,
motion duration T_m = (n - m) × t seconds.

FIG. 4 shows the motion time data and speech time data calculated in this way.
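The timing formulas above can be checked with a short worked example. The function name, the dictionary keys, and the rate of 0.25 seconds per character are assumptions chosen for illustration; m is taken here as the 1-based position of the verb's first character, which is one reading of "the m-th character".

```python
def sync_times(n_chars, verb_pos, secs_per_char, speech_start=0.0):
    """Compute speech and motion timing for one phrase.

    n_chars is n, verb_pos is m, secs_per_char is t,
    and speech_start is t_S0 in the formulas of the text.
    """
    speech_duration = n_chars * secs_per_char               # T_S = n * t
    motion_duration = (n_chars - verb_pos) * secs_per_char  # T_m = (n - m) * t
    return {
        "speech_start": speech_start,
        "speech_end": speech_start + speech_duration,
        "motion_start": speech_start + verb_pos * secs_per_char,
        "motion_duration": motion_duration,
    }


phrase = "彼が気持ち良く笑った"     # 10 characters
m = phrase.index("笑った") + 1      # verb starts at the 8th character
times = sync_times(len(phrase), m, 0.25)
print(times)
```

With these definitions, the motion start time plus the motion duration equals the speech end time, so the motion finishes exactly when the phrase has been read out.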

[0031] Next, the motion video generation device 5 receives a motion generation command that includes the human body motion pattern from the motion pattern generation device 2, the motion degree from the motion degree generation device 3, and the motion time data from the motion-speech synchronization device 4, and outputs time-series images of the human body motion to a display device, a VTR, or the like. The motion video generation device 5 uses a method that generates human body motion patterns by combining multiple motion modules (for example, 呂山, 吉坂主旬, 宮井均: "A proposal for a human body motion generation system", Proceedings of the 47th National Convention of the Information Processing Society of Japan (2), pp. 345-346, 1993).

[0032] For the speech synthesis device 6, an existing rule-based speech synthesis technique can be used (Seiichi Yamamoto, Yoshio Higuchi, Toru Shimizu: "Prototype of a rule-based speech synthesizer with a text editing function", IEICE Technical Report SP87-137, March 1988). The device receives a speech output command that includes the phrases from the natural language analysis device 1 and the time data from the motion-speech synchronization device 4, then synthesizes and outputs the speech.

[0033] FIG. 2 is a block diagram showing the configuration of one embodiment of the second invention. As shown in FIG. 2, this embodiment comprises a motion/speech generation command-to-text conversion device 7, which converts the motion generation command and the speech generation command output from device A 100 into a sentence motion description file that is human-readable text; a sentence motion description file storage device 8, which stores the sentence motion description file produced by the conversion device 7; and a text-to-motion/speech generation command conversion device 9, which converts the sentence motion description file from the storage device 8 into a motion generation command and a speech generation command and outputs them to device B 200.

[0034] Device A 100 includes the motion pattern generation device 2, the motion degree generation device 3, and the motion-speech synchronization device 4, and device B 200 includes the motion video generation device 5 and the speech synthesis device 6. Devices A 100 and B 200 were already described in the embodiment of the first invention, so they are omitted here to avoid repetition, and only the remaining parts of FIG. 2 are described.

[0035] In this embodiment, the motion/speech generation command-to-text conversion device 7 converts the motion generation command and the speech generation command output from device A 100 into a text file that conforms to the format of the sentence motion description file.

[0036] FIG. 3 shows an example of this format. In FIG. 3, for the text sentence written in the text file, an underline mark is placed at the same position as each verb contained in the text. Below the underline mark are written the name of the human body motion pattern generated by device A 100, the parameter p giving the motion degree of that pattern, and the motion duration t.

[0037] Next, the sentence motion description file produced by the motion/speech generation command-to-text conversion device 7 is stored in the sentence motion description file storage device 8, which consists of an external storage device such as a magnetic disk drive. Because the stored sentence motion description file is in the form of a readable text file, motions can easily be modified using a commercially available text editor.

[0038] The text-to-motion/speech generation command conversion device 9 works in the reverse direction to the motion/speech generation command-to-text conversion device 7. It reads the human body motion pattern, the motion degree of that pattern, and the motion time data written in the sentence motion description file, and converts them into the motion generation command that is input to the motion video generation device 5 in device B 200. It then reads the text written in the sentence motion description file, generates the text for speech output and the speech time data using the same method of generating speech time data as the motion-speech synchronization device 4, and inputs them to the speech synthesis device 6, which outputs the speech.
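The round trip between generation commands and an editable text file can be sketched as follows. The exact layout of FIG. 3 is not reproduced in this text, so the three-line record used here (sentence, underline marks under the verb, then `name p=... t=...`) is a hypothetical stand-in, as are the function names. The sketch also assumes one verb per record and pattern names without spaces.

```python
def to_description(text, verb_start, verb_len, pattern, p, t):
    """Serialize one sentence and its motion annotation as three text lines."""
    underline = " " * verb_start + "_" * verb_len
    annotation = " " * verb_start + f"{pattern} p={p} t={t}"
    return "\n".join([text, underline, annotation])


def from_description(record):
    """Parse a three-line record back into its fields."""
    text, underline, annotation = record.split("\n")
    name, p_field, t_field = annotation.split()
    return {
        "text": text,
        "verb_start": underline.index("_"),  # 0-based offset of the verb
        "verb_len": underline.count("_"),
        "pattern": name,
        "p": float(p_field.split("=")[1]),
        "t": float(t_field.split("=")[1]),
    }


record = to_description("彼が気持ち良く笑った", 7, 3, "smile", 0.7, 0.5)
print(record)

# A user edits the motion degree in an ordinary text editor:
edited = record.replace("p=0.7", "p=0.9")
print(from_description(edited)["p"])
```

Keeping the annotation as plain text next to the sentence is what lets a non-programmer adjust the motion, which is the stated purpose of the description file.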

[0039]

[Effects of the Invention] As described above, the system for generating human body motion and speech from text according to the present invention outputs the input natural language text as synthesized speech and can automatically generate human body motion synchronized with that speech.

[0040] In addition, by creating a motion/speech description file whose form is close to the original text and editing that file with an ordinary text editor, the human body motion that is finally generated can be adjusted.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the configuration of one embodiment of the first invention.

FIG. 2 is a block diagram showing the configuration of one embodiment of the second invention.

FIG. 3 is a diagram showing an example of the format of the sentence motion description file.

FIG. 4 is a diagram showing an example of the motion and speech time data output by the motion-speech synchronization device of the embodiment.

FIG. 5 is a block diagram showing the details of the motion pattern generation device of FIG. 1.

FIG. 6 is a flowchart showing the process by which the motion pattern generation device of FIG. 5 generates a human body motion pattern.

[Explanation of symbols]

1: natural language analysis device
2: motion pattern generation device
3: motion degree generation device
4: motion-speech synchronization device
5: motion video generation device
6: speech synthesis device
7: motion/speech generation command-to-text conversion device
8: sentence motion description file storage device
9: text-to-motion/speech generation command conversion device
21: verb/motion pattern dictionary
22: motion pattern search unit
23: count generator
24: history storage unit
25: motion pattern matching unit
26: motion pattern constraint dictionary
31: modifier/motion degree dictionary
100: device A
200: device B

Continuation of the front page

(56) References
JP-A-6-52290 (JP, A)
JP-A-4-264972 (JP, A)
JP-A-63-59665 (JP, A)
光永知生, 三浦恒, 倉内喜孝, "Speech-synchronized animation generation system", Technical Report of the Institute of Television Engineers of Japan, Vol. 17, No. 28, pp. 17-22 (1993)
吉坂主旬, 呂山, 神谷俊之, 一色敬, 宮井均, "A study of anthropomorphic agents that use human body motion: modular human body motion generation and its applications", Information Processing Society of Japan Research Report, Vol. 94, No. 31 (HI-54), pp. 41-48 (1994)

(58) Fields searched (Int. Cl.⁶, DB name)
G06F 17/00
G06T 13/00
G10L 3/00
G06F 17/28
JICST file (JOIS)

Claims

(57) [Claims]

1. A system for generating human body motion and speech from text, comprising: natural language analysis means for extracting words such as verbs and adverbs from a text; a verb/motion pattern dictionary describing the correspondence between words that represent motions, such as verbs, and human body motion patterns; motion pattern generation means for searching the verb/motion pattern dictionary using the verb extracted by the natural language analysis means and generating a human body motion pattern; a modifier/motion degree dictionary describing the correspondence between modifiers, such as adverbs that modify verbs, and degrees of motion; motion degree generation means for searching the modifier/motion degree dictionary using the modifier extracted by the natural language analysis means and generating the motion degree of the human body motion pattern; motion-speech synchronization means for outputting, in order to synchronize the motion video output with the synthesized speech output, motion time data in which the position at which a verb appears in the text is associated with the start time of motion generation, and speech time data including the reading time of the text calculated from the length of the text; motion video generation means for receiving a motion generation command including the human body motion pattern, the motion degree, and the motion time data, generating time-series video data of the human body motion, and outputting it to display means; and speech synthesis means for receiving a speech generation command including the text and the speech time data and outputting speech by a rule-based speech synthesis method.

2. The system for generating human body motion and speech from text according to claim 1, further comprising: motion/speech generation command-to-text conversion means for converting the motion generation command and the speech generation command output from the motion pattern generation means, the motion degree generation means, and the motion-speech synchronization means into a sentence motion description file in a text file format that can be modified with an external editor; sentence motion description file storage means for storing the sentence motion description file produced by the motion/speech generation command-to-text conversion means; and text-to-motion/speech generation command conversion means for converting the sentence motion description file from the sentence motion description file storage means into the motion generation command and the speech generation command and outputting them to the motion video generation means and the speech synthesis means, whereby the sentence motion description file can be modified with an external editor.

3. The system according to claim 2, wherein the sentence motion description file is human-readable text.