JP2017009685A

JP2017009685A - Information processing device, information processing method, and program

Info

Publication number: JP2017009685A
Application number: JP2015122667A
Authority: JP
Inventors: Nobuo Fukushima (福島 伸夫); Koichi Nishio (西尾 浩一); Kazuyoshi Soga (曽我 和由); Koji Nakamura (中村 浩司); Hideaki Kimata (木全 英明)
Original assignee: NTT Comware Corp
Current assignee: NTT Comware Corp
Priority date: 2015-06-18
Filing date: 2015-06-18
Publication date: 2017-01-12

Abstract

PROBLEM TO BE SOLVED: To perform translation appropriately.
SOLUTION: An information processing device includes an acquisition unit that acquires first sentence information indicating a sentence in a first language; a translation unit that executes translation processing to generate second sentence information indicating the sentence translated into a second language different from the first language; a second evaluation unit that evaluates whether there is an abnormality in the translation processing; and an action control unit that causes a character to act based on a second evaluation made by the second evaluation unit. The information processing device may further include, for example, a first evaluation unit that evaluates whether there is an abnormality in voice recognition processing, which generates the first sentence information from voice information indicating speech in the first language. In that case, the acquisition unit acquires the first sentence information by executing the voice recognition processing, and the action control unit causes the character to act based on either or both of the first evaluation made by the first evaluation unit and the second evaluation made by the second evaluation unit.
SELECTED DRAWING: Figure 1

Description

The present invention relates to an information processing apparatus, an information processing method, and a program.

Conventionally, apparatuses that translate a sentence in one language into a sentence in another language have been developed. To make such machine translation more convenient, approaches such as improving translation accuracy and evaluating translation quality have been considered.
For example, Patent Document 1 discloses a data conversion suitability evaluation method that uses a data conversion device in which a data conversion means (from first data to second data) and a data reverse conversion means (from second data to first data) coexist, and that evaluates the suitability of the first data for conversion by the data conversion means to calculate a conversion suitability value. The method includes: a data conversion step of converting the first data by the data conversion means to obtain converted second data; a data reverse conversion step of reversely converting the converted second data by the data reverse conversion means to obtain reverse-converted first data; a similarity calculation step of inputting the first data and the reverse-converted first data to a similarity calculation means and calculating a similarity by a predetermined similarity calculation formula; and a conversion suitability value output step of outputting the similarity from an output means as the conversion suitability value of the first data for the data conversion means.

Patent Document 1: JP 2006-252323 A

When speech recognition and speech synthesis technologies are applied to provide an interpreter-like translation service, translation can become difficult for many different reasons. For example, an utterance may be too unclear for its content to be identified, or the content may be identified but still be difficult to translate. In such cases, showing the speaker what is making translation difficult and how to deal with it prompts the speaker to speak again in a way that is easier to translate, so that the translation follows the speaker's intention. Until now, however, the cause of the difficulty and its remedy have not been presented to the user in an easy-to-understand manner. As a result, translation could not always be performed appropriately.

An object of some aspects of the present invention is to provide an information processing apparatus, an information processing method, and a program that can perform translation appropriately.

An object of other aspects of the present invention is to provide an information processing apparatus, an information processing method, and a program that can achieve the operations and effects described in the embodiments below.

(1) To solve the problem described above, one aspect of the present invention is an information processing apparatus including: an acquisition unit that acquires first sentence information indicating a sentence in a first language; a translation unit that executes translation processing to generate second sentence information indicating a sentence obtained by translating the sentence indicated by the first sentence information into a second language different from the first language; a second evaluation unit that evaluates whether there is an abnormality in the translation processing; and an action control unit that causes a character to act based on a second evaluation made by the second evaluation unit.

(2) One aspect of the present invention is the information processing apparatus further including a first evaluation unit that evaluates whether there is an abnormality in speech recognition processing that generates the first sentence information from voice information indicating speech in the first language, wherein the acquisition unit acquires the first sentence information by executing the speech recognition processing, and the action control unit causes the character to act based on either or both of a first evaluation made by the first evaluation unit and the second evaluation made by the second evaluation unit.

(3) One aspect of the present invention is the information processing apparatus according to (2), wherein the action control unit causes the character to act based on the second sentence information when there is no abnormality in the speech recognition processing and no abnormality in the translation processing.

(4) One aspect of the present invention is the information processing apparatus according to (2) or (3), wherein the action control unit causes the character to act based on the first sentence information when there is no abnormality in the speech recognition processing and there is an abnormality in the translation processing.

(5) One aspect of the present invention is the information processing apparatus according to any one of (2) to (4), further including a language action information storage unit that stores action definition information defining a correspondence between either or both of the first evaluation and the second evaluation and an action of the character, the action definition information differing for each language, wherein the action control unit refers to the action definition information corresponding to the first language or the second language, selects an action corresponding to either or both of the first evaluation and the second evaluation, and causes the character to perform the selected action.

(6) One aspect of the present invention is the information processing apparatus according to (5), further including a scene action storage unit that stores action definition information that differs for each usage scene and that defines correspondences not defined in the action definition information for each language, wherein the action control unit refers to the action definition information corresponding to the usage scene of the apparatus itself, selects an action corresponding to either or both of the first evaluation and the second evaluation, and causes the character to perform the selected action.

(7) One aspect of the present invention is the information processing apparatus according to (6), further including: an operation reception unit that receives an operation for setting the action definition information; and an action registration unit that, based on the operation received by the operation reception unit, stores the action definition information in either or both of the language action information storage unit and the scene action storage unit.

(8) One aspect of the present invention is an information processing method including: a first step in which an information processing apparatus acquires first sentence information indicating a sentence in a first language; a second step in which the information processing apparatus executes translation processing to generate second sentence information indicating a sentence obtained by translating the sentence indicated by the first sentence information into a second language different from the first language; a third step in which the information processing apparatus evaluates whether there is an abnormality in the translation processing; and a fourth step in which the information processing apparatus causes a character to act based on the evaluation in the third step.

(9) One aspect of the present invention is a program that causes a computer to execute: a first step of acquiring first sentence information indicating a sentence in a first language; a second step of executing translation processing to generate second sentence information indicating a sentence obtained by translating the sentence indicated by the first sentence information into a second language different from the first language; a third step of evaluating whether there is an abnormality in the translation processing; and a fourth step of causing a character to act based on the evaluation in the third step.
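As an illustration of the lookup described in aspects (5) and (6), the following minimal Python sketch first consults a language-specific table and then falls back to a scene-specific table for correspondences the language table does not define. The table contents, the language and scene keys, the action names, and the function name `select_action` are all hypothetical and are not taken from the specification.

```python
from typing import Dict, Optional, Tuple

# Key: (speech recognition abnormal?, translation abnormal?)
EvaluationKey = Tuple[bool, bool]

# Hypothetical language-specific action definition information.
LANGUAGE_ACTIONS: Dict[str, Dict[EvaluationKey, str]] = {
    "ja": {(True, False): "cup_hand_to_ear"},
    "en": {(True, False): "lean_forward"},
}

# Hypothetical scene-specific action definition information, holding
# correspondences that the language-specific tables do not define.
SCENE_ACTIONS: Dict[str, Dict[EvaluationKey, str]] = {
    "reception": {(False, True): "tilt_head"},
}


def select_action(language: str, scene: str,
                  first_abnormal: bool, second_abnormal: bool) -> Optional[str]:
    """Return the action for the given evaluations, or None if undefined."""
    key = (first_abnormal, second_abnormal)
    action = LANGUAGE_ACTIONS.get(language, {}).get(key)
    if action is None:
        # Fall back to the scene-specific definitions.
        action = SCENE_ACTIONS.get(scene, {}).get(key)
    return action
```

In this sketch the language table takes priority, which is one possible reading of aspect (6); the specification itself does not fix a lookup order in this passage.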

According to the embodiments of the present invention, translation can be performed appropriately.

FIG. 1 is a diagram for explaining an overview of the information processing apparatus according to the first embodiment of the present invention.
FIG. 2 is a diagram showing an example of action definition information according to the embodiment.
FIG. 3 is a diagram showing an example of actions of the character according to the embodiment.
FIG. 4 is a diagram showing an overview of action patterns of the character according to the embodiment.
FIG. 5 is a diagram showing an example of the hardware configuration of the information processing apparatus according to the embodiment.
FIG. 6 is a diagram showing an example of the functional configuration of the information processing apparatus according to the embodiment.
FIG. 7 is a flowchart showing an example of the flow of processing by the information processing apparatus according to the embodiment.
FIG. 8 is a diagram for explaining an overview of the information processing apparatus according to the second embodiment of the present invention.
FIG. 9 is a diagram showing an example of the functional configuration of the information processing apparatus according to the embodiment.
FIG. 10 is a diagram for explaining an overview of the information processing apparatus according to the third embodiment of the present invention.
FIG. 11 is a diagram showing an example of the functional configuration of the information processing apparatus according to the embodiment.

Hereinafter, an embodiment of the present invention will be described with reference to the drawings.
[First Embodiment]
[Overview of the information processing apparatus]
A first embodiment of the present invention will be described. First, an overview of the information processing apparatus 10 according to the present embodiment will be described.
The information processing apparatus 10 (FIG. 6) according to the present embodiment is an apparatus that supports interpretation of conversations held in two or more mutually different languages. That is, the information processing apparatus 10 has a plurality of users, and these users converse in different languages. When there are three or more users, however, it is sufficient that two or more languages are used among them. For example, if there are three users, two of them may use the same language.

FIG. 1 is a diagram for explaining an overview of the information processing apparatus 10 according to the present embodiment.
FIG. 1 shows three scenes C11 to C13 in which the information processing apparatus 10 is used. Each of the first to third scenes C11 to C13 shows voice reception units 131-1 and 131-2, a character CR, a speaker SP, and a listener RE.
The speaker SP is, among the plurality of users of the information processing apparatus 10, the user who is speaking in the conversation among those users.
The listener RE is, among the plurality of users of the information processing apparatus 10, the user who is listening in the conversation among those users. FIG. 1 shows an example with one speaker SP and one listener RE, but the information processing apparatus 10 may have three or more users. Even in that case, there is basically one speaker SP. The roles are not fixed; the speaker SP and the listener RE may switch as the conversation progresses. In the present embodiment, as an example, a case will be described in which the speaker SP converses in Japanese and the listener RE converses in English.

The character CR is a character that plays the role of an interpreter in a conversation among the plurality of users of the information processing apparatus 10. The actions of the character CR are controlled by the information processing apparatus 10. The character CR may be a physical object such as a robot, or may be, for example, a character animation displayed on a screen. When the character CR is displayed on a screen, it may be drawn two-dimensionally or three-dimensionally. The character CR may have any shape as long as it can express movement. In the following, as an example, a case will be described in which the character CR is an image of a humanoid robot displayed on a display unit of the information processing apparatus 10.

The voice reception units 131-1 and 131-2 are devices such as microphones that pick up the conversation among the plurality of users of the information processing apparatus 10. FIG. 1 shows two voice reception units 131-1 and 131-2, one installed for each user, but a voice reception unit need not be provided for each user. In the following, when the voice reception units 131-1 and 131-2 need not be distinguished, they are collectively referred to as the voice reception unit 131.

Next, the first scene C11 to the third scene C13 shown in FIG. 1 will be described. In the example shown in FIG. 1, the scenes proceed in chronological order: the first scene C11, the second scene C12, and then the third scene C13.
In the first scene C11, the speaker SP speaks in Japanese. The utterance is picked up by the voice reception unit 131-1. The information processing apparatus 10 analyzes what the speaker SP said from the collected voice and generates first sentence information representing a first sentence (text) of the analyzed utterance content. In other words, the first sentence expresses what the speaker SP said in the language used by the speaker SP. The information processing apparatus 10 also generates second sentence information representing a second sentence (text) obtained by translating the Japanese first sentence into English. In other words, the second sentence expresses what the speaker SP said in the language used by the listener RE. The information processing apparatus 10 evaluates whether there is an abnormality in the speech recognition processing and in the translation processing.

Here, evaluating whether there is an abnormality in the speech recognition processing means evaluating how reliably the utterance content has been converted from the voice information into text. In other words, it means evaluating the accuracy of the first sentence. The accuracy of the first sentence is the degree to which the first sentence matches the utterance content. For example, the accuracy of the first sentence is high when the utterance content is considered to match the first sentence well, and low when the utterance content is likely not to match the first sentence or when it is difficult to generate the first sentence information. The information processing apparatus 10 determines that there is an abnormality in the speech recognition processing when the accuracy of the first sentence is lower than a predetermined threshold. In the following, the evaluation of whether there is an abnormality in the speech recognition processing may be referred to as speech recognition evaluation processing.

Similarly, evaluating whether there is an abnormality in the translation processing means evaluating how reliable the translation from the first sentence to the second sentence is. In other words, it means evaluating the accuracy of the second sentence, that is, the accuracy of the translation. For example, the accuracy of the second sentence is high when the meanings of the first sentence and the second sentence are considered to agree in most respects. It is low when the meanings of the two sentences may not agree well, or when there are several candidate translations of the first sentence and the translation may therefore be wrong. The information processing apparatus 10 determines that there is an abnormality in the translation processing when the accuracy of the second sentence is lower than a predetermined threshold. In the following, the evaluation of whether there is an abnormality in the translation processing may be referred to as translation evaluation processing.
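The threshold comparison described for both evaluations can be sketched as follows. This is only an illustration: the threshold values, the `Evaluation` type, and the function name `evaluate` are assumptions, since the specification states only that an accuracy lower than a predetermined threshold is treated as an abnormality.

```python
from dataclasses import dataclass

# Assumed threshold values; the specification says only "predetermined".
SPEECH_RECOGNITION_THRESHOLD = 0.6
TRANSLATION_THRESHOLD = 0.5


@dataclass(frozen=True)
class Evaluation:
    accuracy: float   # estimated accuracy of the sentence, 0.0 to 1.0
    abnormal: bool    # True when accuracy is lower than the threshold


def evaluate(accuracy: float, threshold: float) -> Evaluation:
    """Flag an abnormality when the accuracy is strictly below the threshold."""
    return Evaluation(accuracy=accuracy, abnormal=accuracy < threshold)


# First evaluation (speech recognition) and second evaluation (translation).
first_evaluation = evaluate(0.4, SPEECH_RECOGNITION_THRESHOLD)
second_evaluation = evaluate(0.8, TRANSLATION_THRESHOLD)
```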

Any previously disclosed method may be used to evaluate the accuracy of the first sentence and the accuracy of the second sentence. For example, a confidence score can be used as an index of the accuracy of the first sentence or the second sentence. The confidence score in the speech recognition evaluation processing is an index representing how likely the first sentence generated from the voice information is to be correct. The information processing apparatus 10 identifies the utterance content by, for example, comparing the voice information received by the voice reception unit 131 with voice information of templates prepared in advance. In doing so, the information processing apparatus 10 can obtain the confidence score by calculating, for example, the degree of agreement between the voice information received by the voice reception unit 131 and the voice information of the template.

The confidence score in the translation evaluation processing is an index representing how likely the second sentence generated from the first sentence is to be correct. For example, the information processing apparatus 10 back-translates the second sentence from the second language into the first language. The information processing apparatus 10 then compares the back-translated sentence in the first language with the original first sentence and calculates the degree of agreement between the two sentences, thereby obtaining the confidence score. In the following, the evaluation made by the speech recognition evaluation processing may be referred to as the first evaluation, and the evaluation made by the translation evaluation processing may be referred to as the second evaluation.
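The back-translation check described above could be sketched as below. The specification does not state how the degree of agreement is calculated; this sketch substitutes the character-level ratio of Python's standard `difflib.SequenceMatcher` as an assumed stand-in. The `back_translate` callable and the toy dictionary are hypothetical placeholders for a real translation engine.

```python
from difflib import SequenceMatcher
from typing import Callable


def back_translation_confidence(first_sentence: str,
                                second_sentence: str,
                                back_translate: Callable[[str], str]) -> float:
    """Back-translate the second sentence and compare it with the original.

    Returns a value between 0.0 and 1.0; 1.0 means the back-translated
    sentence is identical to the first sentence.
    """
    restored = back_translate(second_sentence)
    return SequenceMatcher(None, first_sentence, restored).ratio()


# Toy stand-in for a second-language to first-language translation engine.
TOY_BACK_TRANSLATIONS = {
    "Where is the station?": "駅はどこですか",
    "Station where?": "駅はどこ",
}


def toy_back_translate(text: str) -> str:
    return TOY_BACK_TRANSLATIONS[text]


exact = back_translation_confidence(
    "駅はどこですか", "Where is the station?", toy_back_translate)
partial = back_translation_confidence(
    "駅はどこですか", "Station where?", toy_back_translate)
```

A character-level ratio is a crude measure of agreement in meaning, so a practical system would likely use a similarity measure suited to the languages involved.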

Next, in the second scene C12, the character CR reacts to what the speaker SP said in the first scene C11. Here, the information processing apparatus 10 selects the action of the character CR according to the first evaluation and the second evaluation. For example, in the first scene C11 described above, the speaker SP spoke too quietly, so the information processing apparatus 10 determines that the accuracy of the first sentence is low. Accordingly, in the second scene C12, the information processing apparatus 10 causes the character CR to, for example, spread its hands in front of its mouth toward the speaker SP, prompting the speaker SP to speak more loudly. This lets the speaker SP recognize that his or her voice was too quiet and respond, for example, by repeating the utterance more loudly.

Next, in the third scene C13, because the speaker SP repeated the utterance in the second scene C12, the information processing apparatus 10 has been able to recognize the utterance content, and it therefore generates a sentence translating that content into English. The information processing apparatus 10 then outputs the English translation as speech while causing the character CR to make gestures that fit the sentence. This allows the listener RE to understand, in correct English, what the speaker SP said in Japanese.

As described above, the information processing apparatus 10 according to the present embodiment executes speech recognition processing that generates first sentence information indicating a sentence in a first language from voice information indicating speech in the first language (for example, Japanese). The information processing apparatus 10 also executes translation processing that generates second sentence information indicating a sentence obtained by translating the sentence indicated by the first sentence information into a second language (for example, English) different from the first language. The information processing apparatus 10 further evaluates whether there is an abnormality in the speech recognition processing and in the translation processing, and causes the character CR to act based on the evaluation results.
As a result, the information processing apparatus 10 can cause the character CR to perform an appropriate action and can show the user, in an easy-to-understand manner, what is making translation difficult and how to deal with it. Therefore, the information processing apparatus 10 can perform translation appropriately.
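The overall flow summarized above, combined with the branching in aspects (3) and (4), might look like the following sketch. The `recognize` and `translate` callables, the `(text, confidence)` return shape, the threshold values, and the action names are assumptions made for illustration; they are not interfaces defined in the specification.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class CharacterAction:
    kind: str   # which kind of action the character performs
    text: str   # the sentence the action is based on (may be empty)


def process_utterance(audio: bytes,
                      recognize: Callable[[bytes], Tuple[str, float]],
                      translate: Callable[[str], Tuple[str, float]],
                      asr_threshold: float = 0.6,
                      mt_threshold: float = 0.5) -> CharacterAction:
    """Run recognition and translation, then pick the character's action."""
    first_sentence, asr_confidence = recognize(audio)
    if asr_confidence < asr_threshold:
        # Speech recognition abnormality: prompt the speaker to try again.
        return CharacterAction("prompt_retry", "")
    second_sentence, mt_confidence = translate(first_sentence)
    if mt_confidence < mt_threshold:
        # Recognition succeeded but translation did not: act on the first sentence.
        return CharacterAction("act_on_first_sentence", first_sentence)
    # Both succeeded: act on the translated second sentence.
    return CharacterAction("act_on_second_sentence", second_sentence)
```

Passing the engines in as callables keeps the sketch independent of any particular speech recognition or translation library.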

[Action definition information]
Next, the action definition information processed by the information processing apparatus 10 according to the present embodiment will be described.
The action definition information is information that defines an action according to the evaluation results of the speech recognition processing and the translation processing.
First, the data structure of the action definition information will be described.
FIG. 2 is a diagram showing an example of the action definition information according to the present embodiment.
The action definition information defines the action of the character CR according to the state of the conversation. As an example, FIG. 2 shows action definition information for the case where the character CR is made to act according to the first evaluation and the second evaluation. In the example shown in FIG. 2, the action definition information associates the following items with one another: classification information ("classification" in FIG. 2), physical input information ("physical input" in FIG. 2), voice identification information ("voice identification" in FIG. 2), speech recognition semantic analysis information ("speech recognition semantic analysis" in FIG. 2), text analysis information ("text analysis" in FIG. 2), grammar analysis information ("grammar analysis" in FIG. 2), target language decoding information ("target language decoding" in FIG. 2), completion determination information ("completion determination information" in FIG. 2), an action ID (IDentifier) ("action ID" in FIG. 2), and assumed situation information ("assumed situation" in FIG. 2).
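One record of the action definition information listed above could be modeled as a simple data class. The field names below mirror the items named for FIG. 2, but the types, the convention that `None` means "no abnormality", and the sample values are assumptions, because the contents of FIG. 2 are not reproduced in this text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ActionDefinitionRecord:
    classification: str
    # For each evaluation stage: None means no abnormality, otherwise
    # a description of the abnormality (assumed convention).
    physical_input: Optional[str]
    voice_identification: Optional[str]
    speech_recognition_semantic_analysis: Optional[str]
    text_analysis: Optional[str]
    grammar_analysis: Optional[str]
    target_language_decoding: Optional[str]
    completion_determination: str
    action_id: str
    assumed_situation: str

    def has_abnormality(self) -> bool:
        """True if any evaluation stage in this record reports an abnormality."""
        stages = (self.physical_input, self.voice_identification,
                  self.speech_recognition_semantic_analysis,
                  self.text_analysis, self.grammar_analysis,
                  self.target_language_decoding)
        return any(stage is not None for stage in stages)


# Hypothetical sample record, not taken from FIG. 2.
sample = ActionDefinitionRecord(
    classification="1",
    physical_input="input level too low",
    voice_identification=None,
    speech_recognition_semantic_analysis=None,
    text_analysis=None,
    grammar_analysis=None,
    target_language_decoding=None,
    completion_determination="not completed",
    action_id="A001",
    assumed_situation="speaker's voice is too quiet",
)
```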

The classification information is information representing the classification of the first evaluation and the second evaluation. In other words, the classification information is information that identifies each record of the action definition information.
The physical input information is information indicating whether there is an abnormality in the physical characteristics of the collected voice when the first sentence information is generated. When there is an abnormality in the quality of the collected voice, the details of the abnormality are described in the physical input information. Whether there is an abnormality in the voice quality is determined, for example, by measuring the voice quality and comparing the measured value with a predetermined threshold value. Thus, the physical input information is an example of information representing the first evaluation. Note that, in this embodiment, “no abnormality” does not necessarily mean that there is no abnormality whatsoever. For example, a slight abnormality that does not interfere with processing, or a case in which subsequent processing is partially possible, may be determined to be no abnormality.
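The threshold comparison described above can be sketched as follows. This is only an illustration: the use of a signal-to-noise ratio as the measured quality, the value of `MIN_SNR_DB`, and the names `Evaluation` and `evaluate_physical_input` are assumptions that do not appear in the embodiment, which only states that a measured value is compared with a predetermined threshold.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold; the embodiment only says "a predetermined threshold value".
MIN_SNR_DB = 10.0


@dataclass(frozen=True)
class Evaluation:
    abnormal: bool
    detail: Optional[str] = None  # filled in only when an abnormality is found


def evaluate_physical_input(snr_db: float, threshold_db: float = MIN_SNR_DB) -> Evaluation:
    """Compare a measured quality value with a threshold (assumed metric: SNR in dB)."""
    if snr_db < threshold_db:
        return Evaluation(True, f"SNR {snr_db:.1f} dB is below {threshold_db:.1f} dB")
    # Degradation that stays above the threshold is treated as "no abnormality".
    return Evaluation(False)
```

The same shape would apply to the other threshold-based evaluations (clarity, linguistic expression, likelihood, grammatical correctness), each with its own measured value and threshold.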

The voice identification information is information indicating whether there is an abnormality in the clarity of the collected voice when the first sentence information is generated. When there is an abnormality in the clarity of the collected voice, the details of the abnormality are described in the voice identification information. Whether there is an abnormality in the clarity of the voice is determined, for example, by measuring the clarity of the voice and comparing the measured value with a predetermined threshold value. Thus, the voice identification information is an example of information representing the first evaluation.

The speech recognition semantic analysis information is information indicating whether there is an abnormality in the linguistic expression of the first sentence generated from the voice. When there is an abnormality in the linguistic expression of the first sentence, the details of the abnormality are described in the speech recognition semantic analysis information. Whether there is an abnormality in the linguistic expression of the first sentence is determined, for example, by measuring the linguistic expression of the first sentence and comparing the measured value with a predetermined threshold value. Thus, the speech recognition semantic analysis information is an example of information representing the first evaluation.

The text analysis information is information indicating whether there is an abnormality in the text analysis performed for the translation of the first sentence when the second sentence information is generated. Whether there is an abnormality in the text analysis is determined, for example, by measuring the likelihood of the first sentence and comparing the measured value with a predetermined threshold value. Thus, the text analysis information is an example of information representing the second evaluation.
The grammar analysis information is information indicating whether there is an abnormality in the grammatical analysis of the first sentence when the second sentence information is generated. Whether there is an abnormality in the grammatical analysis of the first sentence is determined, for example, by measuring the grammatical correctness of the first sentence and comparing the measured value with a predetermined threshold value. Thus, the grammar analysis information is an example of information representing the second evaluation.

The target language decoding information is information indicating whether there is an abnormality in the translation conversion from the first sentence to the second sentence when the second sentence information is generated. When there is an abnormality in the translation conversion, the details of the abnormality are described in the target language decoding information. Thus, the target language decoding information is an example of information representing the second evaluation.
The completion determination information is an example of information indicating whether there is an abnormality at the stage of outputting the second sentence as voice.

The action ID is information that identifies an action of the character CR.
The assumed situation information is information representing the state of the conversation and translation that is assumed from the values of the physical input information, the voice identification information, the speech recognition semantic analysis information, the text analysis information, the grammar analysis information, the target language decoding information, and the completion determination information.
As will be described later, the information processing apparatus 10 may make the character CR act based on the first sentence or the second sentence. In this case, actions corresponding to expressions such as words, phrases, and clauses are described in the action definition information.

FIG. 3 is a diagram illustrating an example of the actions of the character CR according to the present embodiment.
FIG. 3 shows the correspondence between each action and its meaning for each of the action IDs described above. Based on the action definition information, the information processing apparatus 10 specifies the action ID corresponding to the first evaluation and the second evaluation, and causes the character CR to perform the action corresponding to that action ID.

As described above, each record of the action definition information defines an action according to the utterance situation or processing abnormality assumed from the first evaluation and the second evaluation. Therefore, by making the character CR act based on the action definition information, the information processing apparatus 10 can respond to the situation that is making translation difficult.
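As a rough illustration of how such records could be held and searched, the following sketch models one record per row of FIG. 2 and returns the action ID of the first record that matches the evaluation results. The field encoding (`True` for abnormal, `None` for "any value"), the first-match policy, and the sample action IDs and situations are assumptions made for this sketch; the actual contents of FIG. 2 and FIG. 3 are not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence

# Evaluation columns of a record (True = abnormal, False = normal).
EVAL_FIELDS = (
    "physical_input", "voice_identification", "semantic_analysis",
    "text_analysis", "grammar_analysis", "target_decoding", "completion",
)


@dataclass(frozen=True)
class ActionRule:
    classification: str
    physical_input: Optional[bool] = None       # None = matches any value (assumption)
    voice_identification: Optional[bool] = None
    semantic_analysis: Optional[bool] = None
    text_analysis: Optional[bool] = None
    grammar_analysis: Optional[bool] = None
    target_decoding: Optional[bool] = None
    completion: Optional[bool] = None
    action_id: str = ""
    assumed_situation: str = ""


def find_action_id(rules: Sequence[ActionRule], evaluation: Dict[str, bool]) -> Optional[str]:
    """Return the action ID of the first record whose columns all match."""
    for rule in rules:
        if all(
            getattr(rule, name) is None or getattr(rule, name) == evaluation.get(name, False)
            for name in EVAL_FIELDS
        ):
            return rule.action_id
    return None


# Made-up records for illustration only.
RULES = [
    ActionRule("1", physical_input=True, action_id="A01", assumed_situation="surroundings too noisy"),
    ActionRule("2", grammar_analysis=True, action_id="A02", assumed_situation="sentence is ungrammatical"),
    ActionRule("3", *([False] * 7), action_id="A00", assumed_situation="no abnormality"),
]
```

For example, `find_action_id(RULES, {"grammar_analysis": True})` selects the second record, while a combination that no record covers yields `None`, which a caller would have to handle separately.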

FIG. 4 is a diagram showing an outline of the action patterns of the character CR according to the present embodiment.
Here, an action pattern is a classification of a plurality of actions based on a specific criterion.
In general, language and cultural sphere are correlated. Gestures also tend to be similar within each cultural sphere. Accordingly, by defining the action pattern of the character CR for each language used in translation, the character CR can be made to act appropriately.

Gestures may also differ depending on the situation in which the conversation takes place, not only on the cultural sphere. That is, gestures may differ depending on the scene in which the information processing apparatus 10 is used. Specifically, the gestures used between friends differ from those used between an employee and a customer in a store. Accordingly, by defining the action pattern of the character CR for each usage scene (service, situation), the character CR can be made to act appropriately.

From the above, by preparing action definition information for each language and each usage scene, and selecting the action definition information according to the language used in the conversation and the scene in which the information processing apparatus 10 is used, the character CR can be made to act appropriately.
Since differences in gestures depend largely on the cultural sphere, the action definition information may be prepared for each language, and the action definition information for each usage scene may be prepared as a difference file. That is, as shown in FIG. 4, the action patterns may be defined hierarchically. Specifically, the action pattern for each language may serve as the basis, and the characteristic action pattern for each usage scene, that is, the action pattern specific to a service, may be defined as an exception or an application of it. This allows the action patterns of the character CR to be defined efficiently. In defining action patterns in this way, there may be three or more layers. The classification used for each layer is not limited to language and service and may be chosen freely.
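The layered definition described above, a per-language base with per-scene differences applied on top, can be sketched as a dictionary overlay. The language codes, situation keys, action names, and the function `resolve_actions` are all hypothetical; the embodiment does not specify a file format or merge rule beyond the scene-specific definitions being differences from the language-specific ones.

```python
from typing import Dict, Optional

# Base layer: situation -> action, one table per language (made-up entries).
LANGUAGE_BASE: Dict[str, Dict[str, str]] = {
    "ja": {"too_noisy": "bow_and_cup_ear", "grammar_error": "tilt_head"},
    "en": {"too_noisy": "cup_ear", "grammar_error": "shrug"},
}

# Difference layer: only the entries that a usage scene overrides or adds.
SCENE_DIFF: Dict[str, Dict[str, str]] = {
    "store": {"grammar_error": "polite_ask_again"},
}


def resolve_actions(language: str, scene: Optional[str] = None) -> Dict[str, str]:
    """Start from the language base, then apply the scene difference if any."""
    merged = dict(LANGUAGE_BASE[language])  # copy so the base table is not modified
    if scene is not None:
        merged.update(SCENE_DIFF.get(scene, {}))
    return merged
```

With this arrangement, adding a third layer (for example, per-user overrides) would amount to one more `update` step applied after the scene layer.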

[Configuration of information processing device]
Next, the configuration of the information processing apparatus 10 will be described.
First, the hardware configuration of the information processing apparatus 10 will be described.
FIG. 5 is a diagram illustrating an example of the hardware configuration of the information processing apparatus 10 according to the present embodiment.
The information processing apparatus 10 is an electronic device such as a personal computer, a mobile phone, a tablet, a smartphone, a PHS (Personal Handy-phone System) terminal device, or a PDA (Personal Digital Assistant).

The information processing apparatus 10 includes a CPU (Central Processing Unit) 11, a storage unit 12, an input unit 13, an audio output unit 14, and a display unit 15.
The storage unit 12 includes, for example, an HDD (Hard Disk Drive), an SSD (Solid State Drive), an EEPROM (Electrically Erasable Programmable Read-Only Memory), a ROM (Read-Only Memory), and a RAM (Random Access Memory), and stores programs executed by the CPU 11, various information processed by the CPU 11, results of processing by the CPU 11, and the like.

The input unit 13 includes various input devices such as a pointing device (for example, a mouse or a touch panel), a keyboard, and a microphone. The operations that the input unit 13 can accept may be displayed, for example, by a display device included in the display unit 15.
The audio output unit 14 includes, for example, an audio device such as a speaker.
The display unit 15 includes, for example, a display device such as a liquid crystal display or an organic EL (Electro-Luminescence) display.
The above is the description of the hardware configuration of the information processing apparatus 10.

Next, the functional configuration of the information processing apparatus 10 will be described.
FIG. 6 is a diagram illustrating an example of the functional configuration of the information processing apparatus 10 according to the present embodiment.
The information processing apparatus 10 includes the storage unit 12, the audio output unit 14, the display unit 15, a control unit 110, a voice reception unit 131, and an operation reception unit 132.
The storage unit 12 includes an action information storage unit 121 that stores the action definition information. The action information storage unit 121 includes a language action information storage unit 122 and a scene action information storage unit 123.

The language action information storage unit 122 stores action definition information for each language. The action definition information for each language defines, for example, the basic action patterns in general that are common to the cultural sphere corresponding to each language.
The scene action information storage unit 123 stores action definition information for each usage scene. The action definition information for each usage scene defines, for example, among the action patterns common to the service corresponding to each usage scene, the service-specific action patterns that are not described in the action definition information for each language. The action definition information for each usage scene may be prepared, for example, as a difference file separate from the action definition information for each language.

The voice reception unit 131 receives voice input. Specifically, for example, the voice reception unit 131 collects sound around the information processing apparatus 10. That is, the voice reception unit 131 collects the utterance of the speaker SP. The voice reception unit 131 outputs voice information indicating the collected voice to the control unit 110.
The operation reception unit 132 receives operation input. Specifically, for example, the operation reception unit 132 receives a selection of the language used by the user of the information processing apparatus 10, the scene in which the information processing apparatus 10 is used, and the like. The operation reception unit 132 may also receive input of action definition information that the user defines independently. That is, the operation reception unit 132 may receive an operation for setting the action definition information. The operation reception unit 132 outputs operation information indicating the received operation to the control unit 110.

The control unit 110 controls the operation of the various components of the information processing apparatus 10. The control unit 110 is realized, for example, by the CPU 11 executing a program stored in the storage unit 12.
The control unit 110 includes a speech recognition unit (acquisition unit) 111, a translation unit 112, a speech synthesis unit 113, a first evaluation unit 114, a second evaluation unit 115, an operation control unit 116, and an operation registration unit 117.

The speech recognition unit 111 executes the speech recognition process. Specifically, the speech recognition unit 111 acquires the voice information from the voice reception unit 131. The speech recognition unit 111 analyzes the acquired voice information and converts the utterance of the speaker SP contained in the voice information into text. That is, the speech recognition unit 111 generates first sentence information indicating the first sentence. The speech recognition unit 111 outputs the generated first sentence information to the translation unit 112.

The translation unit 112 executes the translation process. Specifically, the translation unit 112 acquires the first sentence information from the speech recognition unit 111. The translation unit 112 analyzes the acquired first sentence information and translates the sentence indicated by the first sentence information into the language of the listener RE. That is, the translation unit 112 generates second sentence information indicating the second sentence. The translation unit 112 outputs the generated second sentence information to the speech synthesis unit 113.

The speech synthesis unit 113 outputs the translation result. Specifically, the speech synthesis unit 113 acquires the second sentence information from the translation unit 112. The speech synthesis unit 113 generates voice information by converting the second sentence indicated by the acquired second sentence information into speech. The speech synthesis unit 113 outputs the generated voice information to the audio output unit 14. As a result, the first-language voice input to the voice reception unit 131 is emitted from the audio output unit 14 as voice in a second language different from the first language.
In addition, when an abnormality is found in the first evaluation or the second evaluation, the speech synthesis unit 113 outputs the content of the abnormality as voice in time with the action of the character CR.

The first evaluation unit 114 executes the speech recognition evaluation process. Specifically, at each stage of the speech recognition process, the first evaluation unit 114 generates first evaluation information such as the physical input information, the voice identification information, and the speech recognition semantic analysis information described above. The first evaluation unit 114 outputs the generated first evaluation information to the operation control unit 116.

The second evaluation unit 115 executes the translation evaluation process. Specifically, at each stage of the translation process, the second evaluation unit 115 generates second evaluation information such as the text analysis information, the grammar analysis information, and the target language decoding information described above. The second evaluation unit 115 outputs the generated second evaluation information to the operation control unit 116.

The operation control unit 116 controls the action of the character CR. The operation control unit 116 mainly selects the action definition information and specifies the action.
First, the selection of the action definition information will be described. Based on the operation information acquired from the operation reception unit 132, the operation control unit 116 selects the action definition information used to determine the action of the character CR. This operation information includes, for example, information on the language used by the speaker SP, information on the language used by the listener RE, and information on the usage scene of the information processing apparatus 10, as selected by the user of the information processing apparatus 10. From the action definition information for each language and each usage scene stored in the action information storage unit 121, the operation control unit 116 acquires the action definition information for the language and usage scene selected by the user. This allows the operation control unit 116 to cause the character CR to perform actions suited to each language sphere and usage scene.

Note that the language of the speaker SP may be identified in the speech recognition process performed by the speech recognition unit 111. In this case as well, the operation control unit 116 can select the action definition information corresponding to the language of the speaker SP, so the character CR can be made to perform an action appropriate to the language of the speaker SP without requiring the user of the information processing apparatus 10 to perform an extra operation.

Next, the specification of the action will be described. The operation control unit 116 acquires the first evaluation information from the first evaluation unit 114 and the second evaluation information from the second evaluation unit 115. The operation control unit 116 refers to the action definition information selected by the process described above and specifies the action ID corresponding to the values of the first evaluation information and the second evaluation information. That is, the operation control unit 116 specifies an action according to the first evaluation and the second evaluation. When the character CR is to act toward the speaker SP, the operation control unit 116 specifies the action by referring to the action definition information for the language of the speaker SP. When the character CR is to act toward the listener RE, it specifies the action by referring to the action definition information for the language of the listener RE. The operation control unit 116 then causes the character CR to perform the specified action. This action is coordinated, for example, with the voice output by the audio output unit 14. This allows the information processing apparatus 10 to show the user more clearly why translation is difficult and how to deal with it.

The operation control unit 116 may also refer to the first sentence information and the second sentence information and select an action according to their content. For example, the operation control unit 116 may display an image of a noun appearing in the first sentence or the second sentence (for example, a dish), or may cause the character CR to perform an action expressing emotion in response to an interjection. Since the information processing apparatus 10 can thus make the character CR act according to what the speaker SP said, the conversation can proceed more smoothly.
The selection of an action based on the first sentence information may be performed, for example, when an abnormality is detected in the translation process. Even when an abnormality is detected in the speech recognition process or the translation process, the character CR may be made to act based on the portion that could be converted into text. As a result, even when the utterance of the speaker SP cannot be translated, the information processing apparatus 10 can convey at least part of its content to the listener RE.

The operation control unit 116 may also make the character CR act before the speech recognition process and the translation process are fully completed. For example, in the speech recognition process, the character CR may be made to act based on the portion of the voice information acquired from the voice reception unit 131 for which speech recognition has been completed (that is, the portion that could be heard).

The operation control unit 116 may also control the orientation of the character CR. In this case, when a user of the information processing apparatus 10 speaks, the operation control unit 116 controls the orientation of the character CR so that the character CR faces that user. When the utterance ends, the operation control unit 116 controls the character CR so that it faces the listener RE. The position of each of the plurality of users of the information processing apparatus 10 may be identified by any method. For example, the position of each user may be identified based on the strength of the voice input received by the voice reception unit 131, or by image recognition from an image captured by an imaging unit included in the information processing apparatus 10.
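One way to picture the orientation control based on input strength is the following sketch. It assumes, purely for illustration, that the voice reception unit reports one input level per position label (such as one microphone per side); the function names and the position labels are hypothetical and are not part of the embodiment.

```python
from typing import Mapping


def locate_speaker(levels: Mapping[str, float]) -> str:
    """Return the position whose input level is strongest (assumed layout)."""
    if not levels:
        raise ValueError("no input levels available")
    return max(levels, key=lambda position: levels[position])


def facing(levels: Mapping[str, float], utterance_finished: bool, listener_position: str) -> str:
    """Face the speaker while they talk, then turn to the listener."""
    if utterance_finished:
        return listener_position
    return locate_speaker(levels)
```

An image-recognition-based variant would replace `locate_speaker` with a function that derives positions from a captured image, leaving `facing` unchanged.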

The number of characters CR need not be one per information processing apparatus 10. For example, the information processing apparatus 10 may control a character CR corresponding to each of a plurality of users. The information processing apparatus 10 may also control a character CR corresponding to each of the first evaluation and the second evaluation. Specifically, for example, one character CR that acts when an abnormality is detected in the speech recognition process and another character CR that acts when an abnormality is detected in the translation process may be prepared. This allows the information processing apparatus 10 to show the user more clearly why translation is difficult and how to deal with it.

The operation registration unit 117 generates action definition information based on the operation received by the operation reception unit 132. At this time, the operation registration unit 117, for example, displays on the display unit 15 the types of action and the types of abnormality detectable in the speech recognition process and the translation process in a selectable manner, and generates the action definition information based on the user's selection. The operation registration unit 117 then stores the generated action definition information in the action information storage unit 121. The operation registration unit 117 may allow registration of action definition information corresponding to a language, a usage scene, or the like, or of action definition information corresponding to each user. The action definition information already stored in the action information storage unit 121 may also be made editable.
The above is the description of the configuration of the information processing apparatus 10.

[Operation of information processing device]
Next, the operation of the information processing apparatus 10 will be described.
FIG. 7 is a flowchart showing an example of the flow of processing by the information processing apparatus 10 according to the present embodiment.
Here, as an example, the flowchart shows the flow of processing from when the speaker SP speaks until the character CR is made to perform an action corresponding to the utterance.
(Step S100) The voice reception unit 131 receives voice input from the speaker SP. The voice reception unit 131 outputs voice information indicating the received voice to the speech recognition unit 111. Thereafter, the control unit 110 advances the process to step S102.

(Step S102) The speech recognition unit 111 acquires the voice information from the voice reception unit 131. The speech recognition unit 111 analyzes the acquired voice information and generates the first sentence information. The speech recognition unit 111 outputs the generated first sentence information to the translation unit 112. Thereafter, the control unit 110 advances the process to step S104.
(Step S104) The first evaluation unit 114 evaluates whether there is an abnormality in the speech recognition process and generates the first evaluation information. The first evaluation unit 114 outputs the generated first evaluation information to the operation control unit 116. Thereafter, the control unit 110 advances the process to step S106.

（ステップＳ１０６）翻訳部１１２は、音声認識部１１１から第１文章情報を取得する。翻訳部１１２は、取得した第１文章情報を解析し、第２文章を示す第２文章情報を生成する。その後、制御部１１０は、ステップＳ１０８に処理を進める。
（ステップＳ１０８）第２評価部１１５は、翻訳処理における異常の有無を評価し、第２評価情報を生成する。翻訳部１１２は、生成した第２評価情報を動作制御部１１６に出力する。その後、制御部１１０は、ステップＳ１１０に処理を進める。 (Step S106) The translation unit 112 acquires first sentence information from the speech recognition unit 111. The translation unit 112 analyzes the acquired first sentence information and generates second sentence information indicating the second sentence. Thereafter, the control unit 110 advances the process to step S108.
(Step S108) The second evaluation unit 115 evaluates the presence or absence of abnormality in the translation process, and generates second evaluation information. The translation unit 112 outputs the generated second evaluation information to the operation control unit 116. Thereafter, the control unit 110 advances the process to step S110.

（ステップＳ１１０）動作制御部１１６は、動作情報記憶部１２１に記憶されている動作規定情報を参照し、第１評価情報、第２評価情報に基づいて動作を選択する。このとき、動作制御部１１６は、第１文章情報、第２文章情報を参照し、その内容に応じた動作を選択してもよい。その後、制御部１１０は、ステップＳ１１２に処理を進める。
（ステップＳ１１２）動作制御部１１６は、ステップＳ１１０の処理で選択した動作を、キャラクタＣＲに行わせる。また、音声合成部１１３は、異常の情報、翻訳結果等を表す音声情報を生成し、キャラクタＣＲの動作と協調させて音声を出力させる。そして、制御部１１０は、図７に示す処理を終了する。
以上が情報処理装置１０の動作についての説明である。 (Step S110) The operation control unit 116 refers to the operation definition information stored in the operation information storage unit 121, and selects an operation based on the first evaluation information and the second evaluation information. At this time, the operation control unit 116 may refer to the first sentence information and the second sentence information and select an operation according to the contents. Thereafter, the control unit 110 proceeds with the process to step S112.
(Step S112) The motion control unit 116 causes the character CR to perform the motion selected in the process of step S110. The voice synthesizer 113 also generates voice information representing abnormality information, translation results, and the like, and outputs the voice in coordination with the action of the character CR. Then, the control unit 110 ends the process shown in FIG. 7.
The above is the description of the operation of the information processing apparatus 10.
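The flow of steps S100 to S112 can be sketched as a small pipeline. This is only an illustrative sketch, not the implementation described in the publication: the `Evaluation` class, `run_pipeline`, the stub functions, the abnormality labels, and the action names are all hypothetical and are introduced here only to show the order of processing.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Evaluation:
    """Hypothetical evaluation result; `abnormality` is None when nothing was detected."""
    abnormality: Optional[str] = None

    @property
    def ok(self) -> bool:
        return self.abnormality is None


def run_pipeline(
    audio: bytes,
    recognize: Callable[[bytes], Tuple[str, Evaluation]],
    translate: Callable[[str], Tuple[str, Evaluation]],
    select_action: Callable[[Evaluation, Evaluation, str, str], str],
) -> Tuple[str, str]:
    # S100-S104: recognize the received speech and evaluate the recognition.
    first_sentence, first_eval = recognize(audio)
    # S106-S108: translate the first sentence and evaluate the translation.
    second_sentence, second_eval = translate(first_sentence)
    # S110: select an action from the two evaluations (and, optionally, the sentences).
    action = select_action(first_eval, second_eval, first_sentence, second_sentence)
    # S112: a real device would animate the character and synthesize speech here.
    return action, second_sentence


# --- Stubs below are assumptions used only to make the sketch runnable. ---
def stub_recognize(audio: bytes) -> Tuple[str, Evaluation]:
    text = audio.decode("utf-8").strip()
    return text, Evaluation(None if text else "voice_too_quiet")


def stub_translate(sentence: str) -> Tuple[str, Evaluation]:
    glossary = {"こんにちは": "Hello"}
    if sentence in glossary:
        return glossary[sentence], Evaluation()
    return "", Evaluation("unsuitable_phrasing")


def stub_select_action(first_eval: Evaluation, second_eval: Evaluation,
                       first_sentence: str, second_sentence: str) -> str:
    if not first_eval.ok:
        return "cup_ear"
    if not second_eval.ok:
        return "tilt_head"
    return "nod"


if __name__ == "__main__":
    print(run_pipeline("こんにちは".encode("utf-8"),
                       stub_recognize, stub_translate, stub_select_action))
```

Passing the three processing stages in as callables mirrors the separation between the voice recognition unit, the translation unit, and the motion control unit, so each stage could be replaced independently.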

〔第１の実施形態のまとめ〕
以上説明してきたように、本実施形態による情報処理装置１０は、第１言語（例えば、日本語）の文章を示す第１文章情報を取得する音声認識部１１１（取得部の一例）と、第１文章情報が示す文章を、第１言語とは異なる第２言語（例えば、英語）に翻訳した文章を示す第２文章情報を生成する翻訳処理を実行する翻訳部１１２と、翻訳処理における異常の有無を評価する第２評価部１１５と、第２評価部１１５による第２評価に基づいてキャラクタ（例えば、キャラクタＣＲ）を動作させる動作制御部１１６と、を備える。 [Summary of First Embodiment]
As described above, the information processing apparatus 10 according to the present embodiment includes: the voice recognition unit 111 (an example of an acquisition unit) that acquires first sentence information indicating a sentence in a first language (for example, Japanese); the translation unit 112 that executes a translation process for generating second sentence information indicating a sentence obtained by translating the sentence indicated by the first sentence information into a second language (for example, English) different from the first language; the second evaluation unit 115 that evaluates the presence or absence of an abnormality in the translation process; and the motion control unit 116 that moves a character (for example, the character CR) based on the second evaluation by the second evaluation unit 115.

これにより、情報処理装置１０は、翻訳処理において、翻訳の精度を低下させる異常を検出し、その原因や対処法をユーザに分かり易く示すため、翻訳を適切に行うことができる。具体的には、情報処理装置１０は、言い回しが翻訳に適していない等の問題や、問題ごとの対処法をユーザに分かり易く示すことができる。 Thereby, in the translation process, the information processing apparatus 10 detects an abnormality that lowers the accuracy of translation, and indicates the cause and countermeasure to the user in an easy-to-understand manner, so that the translation can be performed appropriately. Specifically, the information processing apparatus 10 can easily show the user a problem such as the phrase being unsuitable for translation and a countermeasure for each problem.

また、情報処理装置１０は、第１言語の音声を示す音声情報から、第１文章情報を生成する音声認識処理における異常の有無を評価する第１評価部１１４、を備え、音声認識部１１１は、音声認識処理を実行することにより、第１文章情報を取得し、動作制御部１１６は、第１評価部による第１評価と、第２評価部による第２評価とのいずれか又は両方に基づいてキャラクタを動作させる。
これにより、情報処理装置１０は、音声認識処理において、翻訳の精度を低下させる異常を検出し、その原因や対処法をユーザに分かり易く示す。従って、翻訳を適切に行うことができる。具体的には、情報処理装置１０は、発言が不明瞭である、声が小さい等の問題や問題ごとの対処法をユーザに分かり易く示すことができる。 In addition, the information processing apparatus 10 includes the first evaluation unit 114 that evaluates the presence or absence of an abnormality in the speech recognition process that generates the first sentence information from speech information indicating speech in the first language. The voice recognition unit 111 acquires the first sentence information by executing the speech recognition process, and the motion control unit 116 moves the character based on one or both of the first evaluation by the first evaluation unit and the second evaluation by the second evaluation unit.
Thereby, the information processing apparatus 10 detects an abnormality that lowers the accuracy of translation in the speech recognition process, and easily shows the cause and the countermeasure to the user. Therefore, translation can be performed appropriately. Specifically, the information processing apparatus 10 can easily show the user a problem such as an unclear statement or a low voice, and a countermeasure for each problem.

また、動作制御部１１６は、音声認識処理において異常が無く、且つ、翻訳処理において異常が無い場合に、第２文章情報に基づいて、キャラクタを動作させる。
これにより、動作制御部１１６は、音声認識処理や翻訳処理に異常が無い場合には、翻訳語の文章の内容に応じてキャラクタを動作させること等ができるため、ユーザ同士の会話をより円滑に進めることができる。 The motion control unit 116 moves the character based on the second sentence information when there is no abnormality in the speech recognition process and there is no abnormality in the translation process.
As a result, when there is no abnormality in the speech recognition process or the translation process, the motion control unit 116 can, for example, move the character according to the content of the translated sentence, so that the conversation between users can proceed more smoothly.

また、動作制御部１１６は、第１評価において異常が無く、且つ、第２評価において異常がある場合に、第１文章情報に基づいて、キャラクタを動作させる。
これにより、動作制御部１１６は、音声認識処理において異常が無く、翻訳処理に異常がある場合には、翻訳前の文章の内容に応じてキャラクタを動作させること等ができる。つまり、翻訳語の音声を出力することができない場合であっても、少なくとも部分的に発話者ＳＰの発言内容を受話者ＲＥに通知することができる。 The motion control unit 116 moves the character based on the first sentence information when there is no abnormality in the first evaluation and there is an abnormality in the second evaluation.
Thereby, when there is no abnormality in the speech recognition process and there is an abnormality in the translation process, the motion control unit 116 can, for example, move the character according to the content of the sentence before translation. That is, even when the translated speech cannot be output, the content of the speaker SP's utterance can be conveyed to the receiver RE at least partially.
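The two cases above, together with the case in which speech recognition itself is abnormal, can be written as one small decision function. This is a minimal sketch under assumptions: the function name and the returned labels are hypothetical, and handling a recognition abnormality by falling back to an action tied to the first evaluation is an interpretation of the summary, not something the publication states in this form.

```python
def choose_action_basis(recognition_abnormal: bool, translation_abnormal: bool) -> str:
    """Return which piece of information drives the character's action."""
    if recognition_abnormal:
        # Assumed: the first sentence is unreliable, so the action is based on
        # the first evaluation (for example, indicating that the voice is too quiet).
        return "first_evaluation"
    if translation_abnormal:
        # Recognition succeeded but translation did not: act on the
        # pre-translation sentence so that part of the meaning still gets through.
        return "first_sentence"
    # Neither process reported an abnormality: act on the translated sentence.
    return "second_sentence"


if __name__ == "__main__":
    for rec in (False, True):
        for tra in (False, True):
            print(rec, tra, choose_action_basis(rec, tra))
```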

また、情報処理装置１０は、第１評価と第２評価との両方又はいずれかと、キャラクタの動作との対応関係を定める動作規定情報であって、言語ごとに互いに異なる動作規定情報を記憶する言語動作情報記憶部を備え、動作制御部は、第１言語又は第２言語に応じた動作規定情報を参照して第１評価と第２評価とのいずれか又は両方に対応する動作を選択し、選択した動作をキャラクタに行わせる。
これにより、情報処理装置１０は、文化圏ごとに異なるジェスチャーを適切に選択し、キャラクタを動作させることができる。 In addition, the information processing apparatus 10 includes a language action information storage unit that stores action definition information defining the correspondence between either or both of the first evaluation and the second evaluation and the actions of the character, the action definition information differing from language to language. The motion control unit refers to the action definition information corresponding to the first language or the second language, selects an action corresponding to one or both of the first evaluation and the second evaluation, and causes the character to perform the selected action.
Thereby, the information processing apparatus 10 can appropriately select a different gesture for each cultural area and operate the character.

また、情報処理装置１０は、利用場面ごとに異なる動作規定情報であって、言語ごとの動作規定情報に定められていない対応関係を定める動作規定情報を記憶する場面動作記憶部を備え、動作制御部は、自装置の利用場面に応じた動作規定情報を参照して第１評価と第２評価とのいずれか又は両方に対応する動作を選択し、選択した動作をキャラクタに行わせる。
これにより、情報処理装置１０は、利用場面ごとに異なるパターンでキャラクタを動作させることができる。また、情報処理装置１０は、言語ごとの動作規定情報に定められている動作については、利用場面ごとの動作規定情報に定める必要がないため、キャラクタの動作パターンを効率的に規定することができる。 In addition, the information processing apparatus 10 includes a scene action storage unit that stores action definition information that differs for each usage scene and that defines correspondences not defined in the action definition information for each language. The motion control unit refers to the action definition information corresponding to the usage scene of the own apparatus, selects an action corresponding to one or both of the first evaluation and the second evaluation, and causes the character to perform the selected action.
Thereby, the information processing apparatus 10 can move the character in a different pattern for each usage scene. Further, because actions defined in the action definition information for each language need not be defined again in the action definition information for each usage scene, the information processing apparatus 10 can define the character's action patterns efficiently.
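The relationship between the per-language and per-scene action definition information can be illustrated as a two-level lookup. The sketch below is an assumption-laden example: the table contents, the evaluation labels, the action names, and the choice to consult the language table before the scene table are all hypothetical. Because the scene table only holds correspondences that the language table does not define, the lookup order does not change the result in this example.

```python
from typing import Dict, Optional

# Hypothetical per-language action definition information.
LANGUAGE_ACTIONS: Dict[str, Dict[str, str]] = {
    "ja": {"voice_too_quiet": "cup_ear", "no_abnormality": "bow"},
    "en": {"voice_too_quiet": "cup_ear", "no_abnormality": "thumbs_up"},
}

# Hypothetical per-scene action definition information; it only defines
# correspondences that the language tables leave undefined.
SCENE_ACTIONS: Dict[str, Dict[str, str]] = {
    "reception_desk": {"unsuitable_phrasing": "show_rephrase_card"},
    "casual_chat": {"unsuitable_phrasing": "tilt_head"},
}


def select_action(evaluation: str, language: str, scene: str) -> Optional[str]:
    """Look up the action for an evaluation, first by language, then by scene."""
    action = LANGUAGE_ACTIONS.get(language, {}).get(evaluation)
    if action is not None:
        return action
    return SCENE_ACTIONS.get(scene, {}).get(evaluation)


if __name__ == "__main__":
    print(select_action("no_abnormality", "ja", "casual_chat"))
    print(select_action("unsuitable_phrasing", "en", "reception_desk"))
```

Keeping the scene table limited to entries missing from the language table is what avoids duplicating definitions across scenes.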

また、情報処理装置１０は、動作規定情報を設定するキャラクタの動作を選択する操作を受け付ける操作受付部１３２と、操作受付部１３２が受け付けた操作に基づいて、言語動作情報記憶部１２２と場面動作情報記憶部１２３とのいずれか又は両方に、動作規定情報を記憶させる動作登録部１１７と、を備える。
これにより、情報処理装置１０は、キャラクタの動作を、ユーザにより設定可能とするため、より自由度の高いコミュニケーションを提供することができる。 In addition, the information processing apparatus 10 includes the operation reception unit 132 that receives an operation for selecting the character action to be set in the action definition information, and the operation registration unit 117 that stores the action definition information in either or both of the language motion information storage unit 122 and the scene motion information storage unit 123 based on the operation received by the operation reception unit 132.
Thereby, the information processing apparatus 10 allows the user to set the character's actions, and can therefore provide communication with a higher degree of freedom.

[第２の実施形態]
〔情報処理装置の概要〕
本発明の第２の実施形態について説明する。ここでは、上述した実施形態と同様の構成には、同一の符号を付し、説明を援用する。
本実施形態に係る情報処理装置１０Ａは、第１の実施形態に係る情報処理装置１０と同様に、互いに異なる２つ以上の言語における会話の通訳を支援する装置である。ただし、情報処理装置１０Ａは、音声認識処理や翻訳処理において異常が検出された場合に、発話者ＳＰだけでなく、受話者ＲＥに向けた動作をキャラクタＣＲに行わせる。 [Second Embodiment]
[Outline of information processing equipment]
A second embodiment of the present invention will be described. Here, components similar to those of the embodiment described above are given the same reference numerals, and their description is incorporated by reference.
Similar to the information processing apparatus 10 according to the first embodiment, the information processing apparatus 10A according to the present embodiment is an apparatus that supports interpretation of conversations in two or more different languages. However, the information processing apparatus 10A causes the character CR to perform an action directed not only to the speaker SP but also to the receiver RE when an abnormality is detected in the speech recognition process or the translation process.

図８は、本実施形態に係る情報処理装置１０Ａの概要を説明するための図である。
図８には、情報処理装置１０Ａの利用に係る３つの場面Ｃ２１〜Ｃ２３を示す。これら第１〜第３場面Ｃ２１〜Ｃ２３には、音声受付部１３１−１、１３１−２、キャラクタＣＲと、発話者ＳＰと、受話者ＲＥと、がそれぞれ示されている。
第１場面Ｃ２１では、発話者ＳＰが発言を行っている。
次に、第２場面Ｃ２２では、発話者ＳＰが話している最中に、受話者ＲＥ（発話者ＳＰとは別のユーザ）が発言し始めている。この場合、発話者ＳＰの発言と受話者ＲＥとの発言が混ざるため、音声認識の精度が低下する恐れがあり、好ましくない。そこで、情報処理装置１０Ａは、受話者ＲＥの発言を制止する動作をキャラクタＣＲに行わせる。
その後の第３場面では、キャラクタＣＲの動作により受話者ＲＥは発言を取り止めている。このように、情報処理装置１０Ａは、発話者ＳＰだけでなく、受話者ＲＥに対しても翻訳を困難にしている原因やその対処法を示すことができるため、翻訳を適切に行うことができる。 FIG. 8 is a diagram for explaining an overview of the information processing apparatus 10A according to the present embodiment.
FIG. 8 shows three scenes C21 to C23 related to use of the information processing apparatus 10A. In these first to third scenes C21 to C23, voice reception units 131-1 and 131-2, a character CR, a speaker SP, and a receiver RE are shown, respectively.
In the first scene C21, the speaker SP is speaking.
Next, in the second scene C22, the listener RE (a user different from the speaker SP) starts to speak while the speaker SP is still speaking. In this case, the utterances of the speaker SP and the listener RE are mixed, which may lower the accuracy of speech recognition and is not preferable. Therefore, the information processing apparatus 10A causes the character CR to perform an action that stops the listener RE from speaking.
In the subsequent third scene, the listener RE stops speaking in response to the action of the character CR. In this way, the information processing apparatus 10A can indicate the cause that makes translation difficult, and the way to deal with it, not only to the speaker SP but also to the listener RE, so that translation can be performed appropriately.

〔情報処理装置の構成〕
次に、情報処理装置１０Ａの構成について説明する。
図９は、本実施形態に係る情報処理装置１０Ａの機能構成の一例を示すブロック図である。
情報処理装置１０Ａは、第１の実施形態に係る情報処理装置１０が備える制御部１１０に代えて制御部１１０Ａを備える。また、制御部１１０Ａは、制御部１１０が備える動作制御部１１６に代えて動作制御部１１６Ａを備える。
動作制御部１１６Ａは、動作制御部１１６と同様に動作規定情報を参照して、キャラクタＣＲの動作を選択する。ただし、動作制御部１１６Ａが参照する動作規定情報には、例えば、発話者ＳＰ向け、受話者ＲＥ向け等、キャラクタＣＲが行う動作の対象者の情報が記述されている。
以上が情報処理装置１０Ａの構成についての説明である。 [Configuration of information processing device]
Next, the configuration of the information processing apparatus 10A will be described.
FIG. 9 is a block diagram illustrating an example of a functional configuration of the information processing apparatus 10A according to the present embodiment.
The information processing apparatus 10A includes a control unit 110A instead of the control unit 110 included in the information processing apparatus 10 according to the first embodiment. In addition, the control unit 110A includes an operation control unit 116A instead of the operation control unit 116 included in the control unit 110.
Similar to the motion control unit 116, the motion control unit 116A refers to the motion regulation information and selects the motion of the character CR. However, in the action defining information referred to by the action control unit 116A, for example, information on the target person of the action performed by the character CR, such as for the speaker SP and the receiver RE, is described.
The above is the description of the configuration of the information processing apparatus 10A.

〔第２の実施形態のまとめ〕
以上説明してきたように、本実施形態による情報処理装置１０Ａにおいて、動作制御部１１６Ａは、受話者ＲＥに対する動作をキャラクタに行わせる。
これにより、情報処理装置１０Ａは、音声認識処理や翻訳処理において異常が検出された場合に、発話者ＳＰだけでなく、受話者ＲＥに向けた動作をキャラクタＣＲに行わせる。従って、情報処理装置１０Ａは、発話者ＳＰの発言を翻訳しやすい環境を整え、翻訳を適切に行うことができる。 [Summary of Second Embodiment]
As described above, in the information processing apparatus 10A according to the present embodiment, the motion control unit 116A causes the character to perform a motion on the listener RE.
Thus, the information processing apparatus 10A causes the character CR to perform an action not only for the speaker SP but also for the receiver RE when an abnormality is detected in the speech recognition process or the translation process. Therefore, the information processing apparatus 10A can prepare an environment in which the speech of the speaker SP is easily translated and can appropriately perform the translation.

[第３の実施形態]
〔情報処理装置の概要〕
本発明の第３の実施形態について説明する。ここでは、上述した実施形態と同様の構成には、同一の符号を付し、説明を援用する。
本実施形態に係る情報処理装置１０Ｂは、第１の実施形態に係る情報処理装置１０と同様に、互いに異なる２つ以上の言語における会話の通訳を支援する装置である。ただし、情報処理装置１０Ｂは、自装置から出力した音声に対する受話者ＲＥの反応に応じて翻訳の誤りを検出する。 [Third embodiment]
[Outline of information processing equipment]
A third embodiment of the present invention will be described. Here, components similar to those of the embodiments described above are given the same reference numerals, and their description is incorporated by reference.
Similar to the information processing apparatus 10 according to the first embodiment, the information processing apparatus 10B according to the present embodiment is an apparatus that supports interpretation of conversations in two or more different languages. However, the information processing apparatus 10B detects a translation error in accordance with the response of the listener RE to the voice output from the own apparatus.

図１０は、本実施形態に係る情報処理装置１０Ｂの概要を説明するための図である。
図１０には、情報処理装置１０Ｂの利用に係る３つの場面Ｃ３１〜Ｃ３３を示す。これら第１〜第３場面Ｃ３１〜Ｃ３３には、音声受付部１３１−１、１３１−２、キャラクタＣＲと、発話者ＳＰと、受話者ＲＥと、がそれぞれ示されている。
第１場面Ｃ３１では、情報処理装置１０Ｂが、発話者ＳＰの発言を翻訳した音声を出力している。
次に、第２場面Ｃ３２では、受話者ＲＥが情報処理装置１０Ｂからの発言を理解できなかったため、聞き直している。
その後の第３場面Ｃ３３では、情報処理装置１０Ｂは、第１場面Ｃ３１において出力した音声の翻訳に誤りがあった可能性があるため、受話者ＲＥに謝罪を示す動作を行い、謝罪および発話者に確認を行う音声を出力している。このように情報処理装置１０Ｂは、自装置から出力した音声に対する受話者ＲＥの反応により翻訳結果の確認を行い、適切な意図伝達のための対応を行うことができる。 FIG. 10 is a diagram for explaining an overview of the information processing apparatus 10B according to the present embodiment.
FIG. 10 shows three scenes C31 to C33 related to use of the information processing apparatus 10B. In these first to third scenes C31 to C33, voice receiving units 131-1 and 131-2, a character CR, a speaker SP, and a receiver RE are shown, respectively.
In the first scene C31, the information processing apparatus 10B outputs a voice obtained by translating the speech of the speaker SP.
Next, in the second scene C32, the listener RE could not understand the speech output from the information processing apparatus 10B, and is therefore asking for it to be repeated.
In the subsequent third scene C33, because the translation of the voice output in the first scene C31 may have contained an error, the information processing apparatus 10B performs an action expressing an apology to the listener RE and outputs speech that apologizes and asks the speaker for confirmation. In this way, the information processing apparatus 10B can check the translation result based on the reaction of the listener RE to the voice output from the own apparatus, and can respond so that the intended meaning is conveyed appropriately.

〔情報処理装置の構成〕
次に、情報処理装置１０Ｂの構成について説明する。
図１１は、本実施形態に係る情報処理装置１０Ｂの機能構成の一例を示すブロック図である。
情報処理装置１０Ｂは、第１の実施形態に係る情報処理装置１０が備える制御部１１０に代えて制御部１１０Ｂを備える。また、制御部１１０Ｂは、制御部１１０が備える動作制御部１１６に代えて動作制御部１１６Ｂを備える。 [Configuration of information processing device]
Next, the configuration of the information processing apparatus 10B will be described.
FIG. 11 is a block diagram illustrating an example of a functional configuration of the information processing apparatus 10B according to the present embodiment.
The information processing apparatus 10B includes a control unit 110B instead of the control unit 110 included in the information processing apparatus 10 according to the first embodiment. Further, the control unit 110B includes an operation control unit 116B instead of the operation control unit 116 included in the control unit 110.

動作制御部１１６Ｂは、動作制御部１１６と同様に動作規定情報を参照して、キャラクタＣＲの動作を選択する。ただし、動作制御部１１６Ｂは、第１文章の精度や第２文章の精度が所定の閾値に比して低い場合に出力した翻訳音声に対して、受話者ＲＥが所定の言葉や文章を発した場合に、特定の動作を選択する。ここで、所定の言葉や文章とは、例えば、聞き返しに用いられる言葉や文章である。また、特定の動作とは、例えば、受話者ＲＥに対して謝罪を示す動作や、発話者ＳＰに対して再度の発言を促す動作である。所定の言葉や文章や、これらに対応する動作は、例えば、動作規定情報に記述されている。 Similar to the motion control unit 116, the motion control unit 116B refers to the action definition information and selects the action of the character CR. However, the motion control unit 116B selects a specific action when the listener RE utters a predetermined word or sentence in response to translated speech that was output while the accuracy of the first sentence or the second sentence was lower than a predetermined threshold. Here, the predetermined words and sentences are, for example, words and sentences used to ask for something to be repeated. The specific action is, for example, an action expressing an apology to the listener RE or an action prompting the speaker SP to speak again. The predetermined words and sentences, and the actions corresponding to them, are described in, for example, the action definition information.

なお、動作制御部１１６Ｂは、例えば、音声出力部１４から音声を出力してからの所定期間に、上記所定の言葉や文章が入力された場合にのみ、上記特定の動作をキャラクタＣＲに行わせるようにしてもよい。また、音声合成部１１３は、動作に合わせて、その動作に対応する音声を音声出力部１４に出力させてもよい。
以上が情報処理装置１０Ｂの構成についての説明である。 For example, the motion control unit 116B causes the character CR to perform the specific motion only when the predetermined word or sentence is input during a predetermined period after the voice is output from the voice output unit 14. You may do it. Further, the voice synthesis unit 113 may cause the voice output unit 14 to output a voice corresponding to the operation in accordance with the operation.
The above is the description of the configuration of the information processing apparatus 10B.
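The behavior of the motion control unit 116B can be sketched as a check on three conditions: the output translation had low accuracy, the listener responded within a predetermined period, and the response contains an ask-back phrase. Everything concrete in this sketch is an assumption made for illustration: the phrase list, the threshold of 0.6, the 5-second window, the substring matching, and the returned action name do not come from the publication.

```python
from typing import Optional

# Hypothetical phrases a listener might use to ask for repetition.
ASK_BACK_PHRASES = ("pardon", "sorry?", "what did you say", "say that again")


def select_reaction(
    listener_utterance: str,
    seconds_since_output: float,
    output_accuracy: float,
    accuracy_threshold: float = 0.6,   # assumed value
    window_seconds: float = 5.0,       # assumed value
) -> Optional[str]:
    """Return a specific action name, or None when no special reaction is needed."""
    # Only translations that were output with low accuracy are candidates.
    if output_accuracy >= accuracy_threshold:
        return None
    # Optional variant: react only within a predetermined period after output.
    if seconds_since_output > window_seconds:
        return None
    text = listener_utterance.lower()
    if any(phrase in text for phrase in ASK_BACK_PHRASES):
        return "apologize_and_prompt_speaker_again"
    return None


if __name__ == "__main__":
    print(select_reaction("Pardon?", 2.0, 0.4))
```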

〔第３の実施形態のまとめ〕
以上説明してきたように、本実施形態による情報処理装置１０Ｂにおいて、動作制御部１１６Ｂは、音声出力部１４が出力した音声に対して特定の音声入力が行われた場合に、キャラクタを動作させる。
これにより、情報処理装置１０Ｂは、自装置から出力した音声に対する受話者ＲＥの反応に応じて翻訳の誤りを検出する。 [Summary of Third Embodiment]
As described above, in the information processing apparatus 10B according to the present embodiment, the motion control unit 116B moves the character when a specific voice input is made in response to the voice output from the voice output unit 14.
Thereby, the information processing apparatus 10B detects an error in translation according to the reaction of the listener RE with respect to the voice output from the own apparatus.

［変形例］
以上、この発明の実施形態について図面を参照して詳述してきたが、具体的な構成は上述の実施形態に限られるものではなく、この発明の要旨を逸脱しない範囲の設計等も含まれる。例えば、上述の第１から第３の実施形態において説明した各構成は、任意に組み合わせたり、分離したりすることができる。 [Modification]
The embodiment of the present invention has been described in detail with reference to the drawings. However, the specific configuration is not limited to the above-described embodiment, and includes a design and the like within a scope not departing from the gist of the present invention. For example, the configurations described in the first to third embodiments described above can be arbitrarily combined or separated.

例えば、情報処理装置１０、１０Ａ、１０Ｂは、ネットワークを介して接続された端末装置であってよい。この場合、発話者ＳＰと受話者ＲＥがそれぞれ情報処理装置１０、１０Ａ、１０Ｂを所有し、各情報処理装置１０、１０Ａ、１０Ｂ間で第１文章情報、第２文章情報、第１評価情報、第２評価情報等を送受信することにより、上述した各実施形態に係る情報処理装置１０、１０Ａ、１０Ｂの動作を実現してもよい。また、情報処理装置１０、１０Ａ、１０Ｂはサーバ装置であってもよい。この場合、発話者ＳＰと受話者ＲＥとは、それぞれ、サーバ装置と通信する端末装置を使用し、該端末装置に表示されたキャラクタＣＲを介して翻訳サービスを受けてもよい。 For example, the information processing devices 10, 10A, and 10B may be terminal devices connected via a network. In this case, the speaker SP and the listener RE own the information processing apparatuses 10, 10A, and 10B, respectively, and the first sentence information, the second sentence information, the first evaluation information, between the information processing apparatuses 10, 10A, and 10B, The operations of the information processing apparatuses 10, 10A, and 10B according to the above-described embodiments may be realized by transmitting and receiving the second evaluation information and the like. Further, the information processing apparatuses 10, 10A, 10B may be server apparatuses. In this case, each of the speaker SP and the receiver RE may use a terminal device that communicates with the server device and receive a translation service via the character CR displayed on the terminal device.

また、第１評価部１１４は、音声認識評価処理のうちの一部のみを行ってもよい。この場合、例えば、音声認識部１１１が確信度等の評価指標の測定を行い、測定結果を第１評価部１１４に出力する。そして、第１評価部１１４は、取得した測定結果と、所定の閾値とを比較して音声認識処理における異常の有無を判定する。また、別の例では、第１評価部１１４が確信度等の評価指標の測定を行い、測定結果を動作制御部１１６、１１６Ａ、１１６Ｂに出力する。そして、動作制御部１１６、１１６Ａ、１１６Ｂは、取得した測定結果と、所定の閾値とを比較して音声認識処理における異常の有無を判定する。 The first evaluation unit 114 may perform only a part of the speech recognition evaluation process. In this case, for example, the voice recognition unit 111 measures an evaluation index such as a certainty factor and outputs the measurement result to the first evaluation unit 114. Then, the first evaluation unit 114 compares the acquired measurement result with a predetermined threshold value to determine whether there is an abnormality in the voice recognition process. In another example, the first evaluation unit 114 measures an evaluation index such as a certainty factor and outputs the measurement result to the operation control units 116, 116A, and 116B. Then, the operation control units 116, 116A, and 116B compare the acquired measurement results with a predetermined threshold value to determine whether or not there is an abnormality in the voice recognition process.

同様に、第２評価部１１５は、翻訳評価処理のうちの一部のみを行ってもよい。この場合、例えば、翻訳部１１２が確信度等の評価指標の測定を行い、測定結果を第２評価部１１５に出力する。そして、第２評価部１１５は、取得した測定結果と、所定の閾値とを比較して翻訳処理における異常の有無を判定する。また、別の例では、第２評価部１１５が確信度等の評価指標の測定を行い、測定結果を動作制御部１１６、１１６Ａ、１１６Ｂに出力する。そして、動作制御部１１６、１１６Ａ、１１６Ｂは、取得した測定結果と、所定の閾値とを比較して翻訳処理における異常の有無を判定する。
このように、上述した各実施形態における情報処理装置１０、１０Ａ、１０Ｂの各構成が行う処理は、任意に分離されたり、他の構成により行われてもよい。 Similarly, the second evaluation unit 115 may perform only a part of the translation evaluation process. In this case, for example, the translation unit 112 measures an evaluation index such as a certainty factor and outputs the measurement result to the second evaluation unit 115. Then, the second evaluation unit 115 compares the acquired measurement result with a predetermined threshold value to determine whether there is an abnormality in the translation process. In another example, the second evaluation unit 115 measures an evaluation index such as a certainty factor and outputs the measurement result to the operation control units 116, 116A, and 116B. Then, the operation control units 116, 116A, and 116B compare the acquired measurement results with a predetermined threshold value to determine whether there is an abnormality in the translation process.
As described above, the processing performed by each configuration of the information processing apparatuses 10, 10A, and 10B in the above-described embodiments may be arbitrarily separated or performed by another configuration.

また、上述した実施形態において、音声認識評価処理や翻訳評価処理に用いる各種閾値は、固定でなくてもよい。例えば、これらの閾値は言語や利用場面に応じて予め定められたものを採用してもよい。この場合、これらの閾値は、言語ごとの動作規定情報や利用場面ごとの動作規定情報に予め記述されていてもよい。 In the above-described embodiment, various threshold values used for the speech recognition evaluation process and the translation evaluation process may not be fixed. For example, these threshold values may be determined in advance according to the language or usage scene. In this case, these threshold values may be described in advance in the action definition information for each language or the action definition information for each use scene.
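A threshold that depends on language and usage scene can be sketched as a table lookup with a default. The table keys, the numeric values, the default of 0.5, and the function name are hypothetical; the sketch only shows how a measured confidence might be compared with a threshold chosen per language and scene.

```python
from typing import Dict, Tuple

DEFAULT_THRESHOLD = 0.5  # assumed fallback when no specific entry exists

# Hypothetical thresholds keyed by (language, usage scene).
THRESHOLDS: Dict[Tuple[str, str], float] = {
    ("ja", "reception_desk"): 0.7,
    ("ja", "casual_chat"): 0.5,
    ("en", "reception_desk"): 0.65,
}


def has_abnormality(confidence: float, language: str, scene: str) -> bool:
    """Judge an abnormality when the confidence falls below the applicable threshold."""
    threshold = THRESHOLDS.get((language, scene), DEFAULT_THRESHOLD)
    return confidence < threshold


if __name__ == "__main__":
    print(has_abnormality(0.6, "ja", "reception_desk"))
    print(has_abnormality(0.6, "ja", "casual_chat"))
```

The same function could be called by an evaluation unit or by the motion control unit, matching the variations in which the measurement and the threshold comparison are performed by different components.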

また、上述した実施形態では、発話者ＳＰの音声による発言を翻訳する場合について説明したが、これには限られない。情報処理装置１０は、例えば、音声認識することなく、直接的に第１文章情報を取得してもよい。具体的には、情報処理装置１０は、キーボード等の文字入力用の入力装置を備え、該入力装置への入力に基づいて、第１文章情報を取得してもよい。また、例えば、情報処理装置１０は、外部装置から送信された第１文章情報を取得してもよい。このように、情報処理装置１０は、文字入力や、他の装置との通信等を介して第１文章情報を取得する取得部を備えてもよい。 In the above-described embodiments, the case where an utterance spoken by the speaker SP is translated has been described, but the present invention is not limited to this. For example, the information processing apparatus 10 may acquire the first sentence information directly, without performing voice recognition. Specifically, the information processing apparatus 10 may include an input device for character input such as a keyboard, and may acquire the first sentence information based on the input to the input device. Also, for example, the information processing apparatus 10 may acquire first sentence information transmitted from an external apparatus. As described above, the information processing apparatus 10 may include an acquisition unit that acquires the first sentence information through character input, communication with another apparatus, or the like.

また、上述の情報処理装置１０、１０Ａ、１０Ｂの機能を実現するためのプログラムをコンピュータ読み取り可能な記録媒体に記録して、この記録媒体に記録されたプログラムをコンピュータシステムに読み込ませ、実行することにより情報処理装置１０、１０Ａ、１０Ｂとしての処理を行ってもよい。ここで、「記録媒体に記録されたプログラムをコンピュータシステムに読み込ませ、実行する」とは、コンピュータシステムにプログラムをインストールすることを含む。 Also, a program for realizing the functions of the information processing apparatuses 10, 10A, and 10B described above is recorded on a computer-readable recording medium, and the program recorded on the recording medium is read into a computer system and executed. Thus, the processing as the information processing devices 10, 10A, 10B may be performed. Here, “loading and executing a program recorded on a recording medium into a computer system” includes installing the program in the computer system.

ここでいう「コンピュータシステム」とは、ＯＳや周辺機器等のハードウェアを含むものとする。また、「コンピュータシステム」は、インターネットやＷＡＮ、ＬＡＮ、専用回線等の通信回線を含むネットワークを介して接続された複数のコンピュータ装置を含んでもよい。また、「コンピュータ読み取り可能な記録媒体」とは、フレキシブルディスク、光磁気ディスク、ＲＯＭ、ＣＤ−ＲＯＭ等の可搬媒体、コンピュータシステムに内蔵されるハードディスク等の記憶装置のことをいう。 The “computer system” here includes an OS and hardware such as peripheral devices. The “computer system” may include a plurality of computer devices connected via a network including a communication line such as the Internet, a WAN, a LAN, or a dedicated line. The “computer-readable recording medium” refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into a computer system.

このように、プログラムを記憶した記録媒体は、ＣＤ−ＲＯＭ等の非一過性の記録媒体であってもよい。また、記録媒体には、当該プログラムを配信するために配信サーバからアクセス可能な内部又は外部に設けられた記録媒体も含まれる。配信サーバの記録媒体に記憶されるプログラムのコードは、端末装置で実行可能な形式のプログラムのコードと異なるものでもよい。すなわち、配信サーバからダウンロードされて端末装置で実行可能な形でインストールができるものであれば、配信サーバで記憶される形式は問わない。なお、プログラムを複数に分割し、それぞれ異なるタイミングでダウンロードした後に端末装置で合体される構成や、分割されたプログラムのそれぞれを配信する配信サーバが異なっていてもよい。 As described above, the recording medium storing the program may be a non-transitory recording medium such as a CD-ROM. The recording medium also includes a recording medium provided inside or outside that is accessible from the distribution server in order to distribute the program. The code of the program stored in the recording medium of the distribution server may be different from the code of the program that can be executed by the terminal device. That is, the format stored in the distribution server is not limited as long as it can be downloaded from the distribution server and installed in a form that can be executed by the terminal device. Note that the program may be divided into a plurality of parts, downloaded at different timings, and combined in the terminal device, or the distribution server that distributes each of the divided programs may be different.

さらに「コンピュータ読み取り可能な記録媒体」とは、ネットワークを介してプログラムが送信された場合のサーバやクライアントとなるコンピュータシステム内部の揮発性メモリ（ＲＡＭ）のように、一定時間プログラムを保持しているものも含むものとする。また、上記プログラムは、上述した機能の一部を実現するためのものであってもよい。さらに、上述した機能をコンピュータシステムに既に記録されているプログラムとの組み合わせで実現できるもの、いわゆる差分ファイル（差分プログラム）であってもよい。 Furthermore, the “computer-readable recording medium” also includes media that hold a program for a certain period of time, such as volatile memory (RAM) inside a computer system that serves as a server or a client when the program is transmitted via a network. The program may be one for realizing part of the functions described above. Furthermore, the program may be one that realizes the functions described above in combination with a program already recorded in the computer system, that is, a so-called difference file (difference program).

１０、１０Ａ、１０Ｂ…情報処理装置、１１…ＣＰＵ、１２…記憶部、１２１…動作情報記憶部、１３…入力部、１３１…音声受付部、１３２…操作受付部、１４…音声出力部、１５…表示部、１１０、１１０Ａ、１１０Ｂ…制御部、１１１…音声認識部、１１２…翻訳部、１１３…音声合成部、１１６、１１６Ａ、１１６Ｂ…動作制御部 DESCRIPTION OF SYMBOLS 10, 10A, 10B ... Information processing apparatus, 11 ... CPU, 12 ... Storage unit, 121 ... Operation information storage unit, 13 ... Input unit, 131 ... Voice reception unit, 132 ... Operation reception unit, 14 ... Voice output unit, 15 ... Display unit, 110, 110A, 110B ... Control unit, 111 ... Voice recognition unit, 112 ... Translation unit, 113 ... Voice synthesis unit, 116, 116A, 116B ... Motion control unit

Claims

An acquisition unit for acquiring first sentence information indicating a sentence in a first language;
A translation unit for executing a translation process for generating second sentence information indicating a sentence obtained by translating the sentence indicated by the first sentence information into a second language different from the first language;
A second evaluation unit for evaluating the presence or absence of abnormality in the translation process;
A motion control unit that moves the character based on the second evaluation by the second evaluation unit;
An information processing apparatus comprising:

The information processing apparatus according to claim 1, further comprising a first evaluation unit that evaluates the presence or absence of an abnormality in a speech recognition process for generating the first sentence information from speech information indicating speech in the first language,
wherein the acquisition unit acquires the first sentence information by executing the speech recognition process, and
the motion control unit moves the character based on one or both of a first evaluation by the first evaluation unit and the second evaluation by the second evaluation unit.

The information processing apparatus according to claim 2, wherein the motion control unit moves the character based on the second sentence information when there is no abnormality in the speech recognition process and no abnormality in the translation process.

The information processing apparatus, wherein the motion control unit moves the character based on the first sentence information when there is no abnormality in the speech recognition process and there is an abnormality in the translation process.
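
The branching recited in the two preceding claims can be illustrated with a minimal sketch. The function name `select_motion_source` and its parameters are hypothetical and do not appear in the publication. The claims do not say which sentence information is used when the speech recognition process is abnormal, so returning `None` in that case is an assumption made only for this illustration.

```python
from typing import Optional


def select_motion_source(recognition_abnormal: bool,
                         translation_abnormal: bool,
                         first_sentence: str,
                         second_sentence: str) -> Optional[str]:
    """Return the sentence information on which the character's motion is based."""
    if recognition_abnormal:
        # Assumption: this case is not covered by the two claims above,
        # so no sentence information is selected here.
        return None
    if translation_abnormal:
        # Recognition succeeded but translation failed:
        # fall back to the first sentence information (source language).
        return first_sentence
    # Neither process reported an abnormality:
    # use the second sentence information (translated sentence).
    return second_sentence
```

For example, with `first_sentence="こんにちは"` and `second_sentence="Hello"`, the sketch returns `"Hello"` when neither process is abnormal and `"こんにちは"` when only the translation process is abnormal.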

The information processing apparatus according to claim 2, further comprising
a language action information storage unit that stores action definition information that defines a correspondence relationship between one or both of the first evaluation and the second evaluation and an action of the character, the action definition information differing for each language, wherein
the motion control unit refers to the action definition information corresponding to the first language or the second language, selects an action corresponding to one or both of the first evaluation and the second evaluation, and causes the character to perform the selected action.

The information processing apparatus according to claim 5, further comprising
a scene action storage unit that stores action regulation information that differs for each use scene and that defines the correspondence relationship not defined in the action definition information for each language, wherein
the motion control unit refers to the action regulation information corresponding to a use scene of the apparatus, selects an action corresponding to one or both of the first evaluation and the second evaluation, and causes the character to perform the selected action.
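
One way to picture the per-language and per-scene information recited in the two preceding claims is as a pair of lookup tables keyed by the evaluation results. The sketch below is illustrative only: the table contents, the action names, the scene name, and the function `select_action` are all hypothetical. Consulting the per-language table first and the per-scene table only for correspondences the language table does not define is an assumption based on the claim wording, not a confirmed implementation.

```python
from typing import Dict, Optional, Tuple

# Key: (speech recognition abnormal?, translation abnormal?)
Evaluation = Tuple[bool, bool]

# Hypothetical per-language table: differs for each language.
LANGUAGE_ACTIONS: Dict[str, Dict[Evaluation, str]] = {
    "ja": {(False, False): "bow", (False, True): "tilt_head"},
    "en": {(False, False): "wave", (False, True): "shrug"},
}

# Hypothetical per-scene table: holds correspondences that the
# per-language table does not define.
SCENE_ACTIONS: Dict[str, Dict[Evaluation, str]] = {
    "reception": {(True, False): "cup_ear"},
}


def select_action(language: str, scene: str,
                  evaluation: Evaluation) -> Optional[str]:
    """Pick the character's action for the given evaluation results."""
    # Assumed precedence: the language-specific table is consulted first.
    action = LANGUAGE_ACTIONS.get(language, {}).get(evaluation)
    if action is None:
        # Fall back to the scene-specific table for undefined correspondences.
        action = SCENE_ACTIONS.get(scene, {}).get(evaluation)
    return action
```

With these sample tables, `select_action("ja", "reception", (False, False))` yields `"bow"`, while `select_action("en", "reception", (True, False))` falls through to the scene table and yields `"cup_ear"`.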

The information processing apparatus according to claim 6, further comprising:
an operation accepting unit that accepts an operation for setting the action regulation information; and
an operation registration unit that stores the action definition information in one or both of the language action information storage unit and the scene action storage unit based on the operation accepted by the operation accepting unit.

An information processing method comprising:
a first step in which an information processing apparatus acquires first sentence information indicating a sentence in a first language;
a second step in which the information processing apparatus executes a translation process for generating second sentence information indicating a sentence obtained by translating the sentence indicated by the first sentence information into a second language different from the first language;
a third step in which the information processing apparatus evaluates the presence or absence of an abnormality in the translation process; and
a fourth step in which the information processing apparatus moves a character based on the evaluation in the third step.
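
The four steps of the method above can be sketched as a short pipeline. Everything named here is hypothetical: `run_method`, the `translate` and `move_character` callbacks, and the motion names `"nod"` and `"shake_head"`. The claim does not define how an abnormality is detected, so treating a missing or empty translation result as abnormal is an assumption made only to keep the example runnable.

```python
from typing import Callable, Optional


def run_method(first_sentence: str,
               translate: Callable[[str], Optional[str]],
               move_character: Callable[[str], None]) -> str:
    """Run the four claimed steps once and return the chosen motion."""
    # First step: the first sentence information has been acquired
    # (here it is simply passed in by the caller).

    # Second step: execute the translation process.
    second_sentence = translate(first_sentence)

    # Third step: evaluate whether the translation process was abnormal.
    # Assumed criterion: no output, or only whitespace, counts as abnormal.
    abnormal = second_sentence is None or not second_sentence.strip()

    # Fourth step: move the character based on the evaluation.
    motion = "shake_head" if abnormal else "nod"
    move_character(motion)
    return motion
```

A caller could pass `lambda s: "Hello"` as `translate` and a list's `append` method as `move_character` to record the motions the character would perform.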

A program that causes a computer to execute:
a first step of acquiring first sentence information indicating a sentence in a first language;
a second step of executing a translation process for generating second sentence information indicating a sentence obtained by translating the sentence indicated by the first sentence information into a second language different from the first language;
a third step of evaluating the presence or absence of an abnormality in the translation process; and
a fourth step of moving a character based on the evaluation in the third step.