JP2015055773A

JP2015055773A - Information processing device, method, and program

Info

Publication number: JP2015055773A
Application number: JP2013189446A
Authority: JP
Inventors: 昌明遠藤; Masaaki Endo; 清幸鈴木; Kiyoyuki Suzuki; 雅巳中村; Masami Nakamura
Original assignee: CAREER & BRIDGE Inc; Advanced Media Inc
Current assignee: CAREER & BRIDGE Inc; Advanced Media Inc
Priority date: 2013-09-12
Filing date: 2013-09-12
Publication date: 2015-03-23

Abstract

PROBLEM TO BE SOLVED: To improve the efficiency of drawing work performed on a screen. SOLUTION: A touch operation recognition unit 51 recognizes the content of a touch operation performed by a user. A voice recognition control unit 52 controls execution of voice recognition processing on voice data representing the user's utterance. An operation recognition integration unit 53 integrates the content of the touch operation recognized by the touch operation recognition unit 51 with the result of the voice recognition processing executed under the control of the voice recognition control unit 52, and thereby recognizes the content of the user's operation as a whole. In particular, when the user performs the touch operation and the utterance simultaneously, the operation recognition integration unit 53 integrates the content of the touch operation and the result of the voice recognition processing and recognizes them as a single instruction operation. When the user performs only one of the touch operation and the utterance, the operation recognition integration unit 53 recognizes the content of the user's operation as a whole by dividing at least one of the type and the precision of the operation between the content of the touch operation and the result of the voice recognition processing.

Description

The present invention relates to an information processing apparatus, an information processing method, and a program.

As an input technique for information processing apparatuses, it has long been widely known to have a user touch the screen of a touch panel with a finger or the like (such an operation is hereinafter referred to as a "touch operation") and to input the coordinates of the touched position.
With the development of voice recognition technology, techniques have also appeared in recent years in which the content of a user's utterance is input as voice data, voice recognition processing is performed on the voice data, and the recognition result is used as input.
Further, Patent Document 1 discloses a technique that combines touch operation and voice input.

Patent Document 1: JP 2007-048177 A

However, the technique described in Patent Document 1 merely uses the content of the voice input as an aid in correcting the content input by the touch operation. In recent years there has been a demand for more efficient drawing work on a screen, but conventional techniques, including the technique described in Patent Document 1, cannot sufficiently meet this demand.

The present invention has been made in view of this situation, and its object is to improve the efficiency of drawing work performed on a screen.

To achieve the above object, an information processing apparatus according to one aspect of the present invention comprises:
touch operation recognition means for recognizing the content of a touch operation performed by a user;
voice recognition control means for controlling execution of voice recognition processing on voice data representing the user's utterance; and
operation recognition integration means for recognizing the content of the user's operation as a whole by integrating the content of the touch operation recognized by the touch operation recognition means with the result of the voice recognition processing executed under the control of the voice recognition control means,
wherein the operation recognition integration means:
when the touch operation and the utterance are performed by the user simultaneously, integrates the content of the touch operation and the result of the voice recognition processing and recognizes them as a single instruction operation; and
when only one of the touch operation and the utterance is performed by the user, recognizes the content of the user's operation as a whole by dividing at least one of the type and the precision of the operation between the content of the touch operation and the result of the voice recognition processing.

According to the present invention, the efficiency of drawing work performed on a screen can be improved.

FIG. 1 is a schematic diagram showing the configuration of an information processing system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the hardware configuration of a user terminal according to an embodiment of the present invention in the information processing system of FIG. 1.
FIG. 3 is a functional block diagram showing, among the functional configurations of the user terminal of FIG. 2, the functional configuration for executing the application execution process.
FIG. 4 is a diagram showing the instruction operation table stored in the storage unit of the user terminal of FIG. 2 having the functional configuration of FIG. 3.
FIG. 5 is a flowchart explaining the flow of the application execution process executed by the user terminal of FIG. 2 having the functional configuration of FIG. 3.
The remaining eight figures each illustrate a display example of an object displayed on the display unit of the user terminal of FIG. 2 having the functional configuration of FIG. 3.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 shows the configuration of an information processing system according to an embodiment of the present invention.
The information processing system shown in FIG. 1 is configured by connecting each of N user terminals 11-1 to 11-N (N is an arbitrary integer equal to or greater than 1) and a voice recognition server 12 to one another via a predetermined network 13 such as the Internet.

Each of the user terminals 11-1 to 11-N is a smartphone, a mobile phone, a personal computer, or the like carried by one of a plurality of users.
Hereinafter, when the user terminals 11-1 to 11-N need not be distinguished individually, they are collectively referred to as the "user terminal 11".
The user terminal 11 executes predetermined application software, for example software for creating presentation materials, and can accept both input via a touch panel and voice input as operation inputs that give commands to the software.
When the user terminal 11 accepts voice input, it converts the voice into data (hereinafter referred to as "voice data") and transmits the voice data to the voice recognition server 12 via the network 13.

When the voice recognition server 12 receives voice data from the user terminal 11 via the network 13, it performs voice recognition processing on the voice data and transmits the recognition result to the user terminal 11.
Even when voice data is transmitted from each of a plurality of user terminals 11, the voice recognition server 12 can perform voice recognition processing on the respective voice data in parallel and in real time.

FIG. 2 is a block diagram showing the hardware configuration of the user terminal 11, which is an embodiment of the information processing apparatus of the present invention, in the information processing system of FIG. 1.
The user terminal 11 includes a CPU (Central Processing Unit) 21, a ROM (Read Only Memory) 22, a RAM (Random Access Memory) 23, a bus 24, an input/output interface 25, a touch operation input unit 26, a display unit 27, a voice input unit 28, a storage unit 29, a communication unit 30, and a drive 31.

The CPU 21 executes various processes according to a program recorded in the ROM 22 or a program loaded from the storage unit 29 into the RAM 23.
The RAM 23 also stores, as appropriate, data and the like that the CPU 21 needs in order to execute the various processes.

The CPU 21, the ROM 22, and the RAM 23 are connected to one another via the bus 24. The input/output interface 25 is also connected to the bus 24. The touch operation input unit 26, the display unit 27, the voice input unit 28, the storage unit 29, the communication unit 30, and the drive 31 are connected to the input/output interface 25.

The touch operation input unit 26 is composed of, for example, a capacitive or resistive position input sensor laminated on the display area of the display unit 27, and detects the coordinates of the position where a touch operation is performed. Here, a touch operation means bringing an object (a user's finger, a stylus, or the like) into contact with or close to the touch operation input unit 26. Hereinafter, the position where a touch operation is performed is referred to as the "touch position", and the coordinates of the touch position are referred to as the "touch coordinates".
The display unit 27 is composed of a display and shows various images.
That is, in this embodiment, the touch operation input unit 26 and the display unit 27 together constitute a touch panel.

The voice input unit 28 is composed of, for example, a microphone, and inputs the voice of the user's utterance and the like as an analog signal.
The analog voice signal is subjected to A/D conversion and converted into a digital signal, that is, voice data. The place where the A/D conversion is executed is not particularly limited; it may be the CPU 21 or dedicated hardware (not shown), but in this embodiment it is assumed, for convenience of explanation, to be the voice input unit 28.
That is, in this embodiment, the voice input unit 28 outputs voice data representing the user's utterance and supplies it to the CPU 21 and the like.

The storage unit 29 is composed of a hard disk, a DRAM (Dynamic Random Access Memory), or the like, and stores various data.
The communication unit 30 controls communication with other devices (in this embodiment, mainly the voice recognition server 12 of FIG. 1) via the network 13.

A removable medium 41 is mounted on the drive 31 as needed. A program read from the removable medium 41 by the drive 31 is installed in the storage unit 29 as needed. The removable medium 41 can also store the various data stored in the storage unit 29, in the same manner as the storage unit 29.

FIG. 3 is a functional block diagram showing, among the functional configurations of the user terminal 11, the functional configuration for executing the application execution process.
The application execution process is the series of processes in which, when predetermined application software (for example, software for creating presentation materials) is started and executed, touch operations and voice input are accepted as appropriate, the accepted content is interpreted, and various processes are executed according to the interpretation result.

When the application execution process is executed, a touch operation recognition unit 51, a voice recognition control unit 52, an operation recognition integration unit 53, an application execution unit 54, a display image generation unit 55, and a display control unit 56 function in the CPU 21, as shown in FIG. 3.

When a touch operation is performed on the touch panel (more precisely, on the touch operation input unit 26), the touch operation recognition unit 51 recognizes the content of the touch operation, specifically the touch coordinates, and notifies the operation recognition integration unit 53 of the recognition result (hereinafter also referred to as the "touch operation recognition result").
When voice is input to the voice input unit 28, the voice recognition control unit 52 executes control for recognizing the voice. Specifically, for example, the voice recognition control unit 52 acquires the voice data output from the voice input unit 28 and transmits it to the voice recognition server 12 via the communication unit 30 and the network 13. When the voice recognition result for the voice data is transmitted from the voice recognition server 12, the voice recognition control unit 52 receives the result, recognizes the content of the voice input operation based on it, and notifies the operation recognition integration unit 53 of the recognition result.

The operation recognition integration unit 53 integrates the content of the touch operation (the touch coordinates) notified from the touch operation recognition unit 51 with the content of the voice input operation notified from the voice recognition control unit 52, and thereby recognizes the content of the user's operation as a whole.

The operation recognition integration unit 53 can, for example, divide the type and the precision of an operation between touch operation and voice input.
Specifically, suppose for example that an object used in a predetermined application is displayed on the display unit 27, and that selection, movement, enlargement, reduction, and color change exist as types of instruction operation for the object. In this case, the operation recognition integration unit 53 accepts instruction operations such as selecting, moving, enlarging, or reducing the object (or an operation that stops a continuous change) through a touch operation (that is, its touch operation recognition result, specifically the touch coordinates). On the other hand, the operation recognition integration unit 53 accepts instruction operations such as changing the color of the object or stopping an operation through voice input (that is, its voice recognition result) of keywords such as "change to ... color" or "stop". Specifically, the operation recognition integration unit 53 refers to the instruction operation table of FIG. 4 and recognizes a predetermined instruction operation from the touch operation and the voice input.

FIG. 4 is a diagram showing the instruction operation table stored in the storage unit 29 of the user terminal 11 of FIG. 2 having the functional configuration of FIG. 3. The instruction operation table of FIG. 4 defines a specific instruction operation (action) for each combination of a touch operation recognition result and a voice recognition result. For example, when the touch operation recognition result is "select object" and the voice recognition result is "change to ... color", the operation recognition integration unit 53 refers to the instruction operation table of FIG. 4 and recognizes that the user's specific instruction operation is "change the object to ... color".
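The table lookup described above could be sketched as follows. This is only an illustrative sketch: the identifiers (`INSTRUCTION_TABLE`, `resolve_action`, and the result strings) are hypothetical, and only the "select object" plus "change to ... color" row is taken from the text; the other rows are assumptions added to show how single-input cases might be keyed.

```python
from typing import Dict, Optional, Tuple

# Hypothetical instruction operation table keyed by
# (touch operation recognition result, voice recognition result).
# Only the first row is taken from the description; the rest are illustrative.
INSTRUCTION_TABLE: Dict[Tuple[Optional[str], Optional[str]], str] = {
    ("select_object", "change_color"): "change_object_color",
    ("select_object", None): "select_object",
    (None, "stop"): "stop_operation",
}


def resolve_action(touch_result: Optional[str],
                   voice_result: Optional[str]) -> Optional[str]:
    """Return the action defined for this combination, or None if undefined."""
    return INSTRUCTION_TABLE.get((touch_result, voice_result))
```

Keying the table on the pair of results keeps the combination rule in data rather than in branching logic, so new combinations could be added without changing the lookup function.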

As another example, regarding the precision of an operation that moves an object to a destination, the operation recognition integration unit 53 can accept a low-precision touch operation while the object is at least a certain distance away from the destination, and accept high-precision voice input such as "move a little more to the right" once the object has come within a certain distance of the destination.
In this way, the user can conveniently switch between touch operation and voice input (utterance) according to the precision required for moving the object.
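As a rough illustration of this distance-based division, the sketch below picks an input modality from the distance between the object and its destination. The function name, the `threshold` parameter, and the handling of the exact boundary are assumptions; the description does not specify how the "certain distance" is chosen.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def preferred_modality(position: Point, destination: Point,
                       threshold: float) -> str:
    """Pick the modality used for moving an object toward its destination.

    Far from the destination, coarse touch input is used; within the
    threshold, fine-grained voice input is used. Treating the exact
    boundary as "voice" is an assumption of this sketch.
    """
    distance = math.dist(position, destination)
    return "touch" if distance > threshold else "voice"
```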

Here, the operation recognition integration unit 53 can accept different operations depending on the control mode selected by the user.
For example, when a mode that performs dynamic control is selected and the voice input "a little more" is given, the operation recognition integration unit 53 recognizes an operation that slows down the rate of change of the object; if the keyword "stop" is then input by voice, or a touch operation is performed, it recognizes an operation that stops the object.
When a mode that performs static control is selected and the voice input "a little more" is given, the operation recognition integration unit 53 recognizes an operation that moves the object by a few percent on the screen of the display unit 27.
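The difference between the two modes can be illustrated with the following sketch. All names and constants here are hypothetical: the slowdown factor, the 3% step used for "a few percent", and the explicit `direction` argument are assumptions, since the description gives no concrete values.

```python
from dataclasses import dataclass

SLOWDOWN_FACTOR = 0.5   # assumption: how much "a little more" slows the change
STEP_FRACTION = 0.03    # assumption: "a few percent" of the screen width


@dataclass
class ObjectState:
    x: float       # horizontal position in pixels
    speed: float   # current rate of change in pixels per tick (dynamic mode)


def apply_keyword(state: ObjectState, keyword: str, mode: str,
                  screen_width: float, direction: int = 1) -> ObjectState:
    """Interpret a recognized keyword according to the selected control mode."""
    if mode == "dynamic":
        if keyword == "a little more":
            state.speed *= SLOWDOWN_FACTOR   # slow the ongoing change
        elif keyword == "stop":
            state.speed = 0.0                # stop the object
    elif mode == "static":
        if keyword == "a little more":
            # Move by a fixed fraction of the screen in the given direction.
            state.x += direction * STEP_FRACTION * screen_width
    return state
```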

Here, when the recognition result of the operation recognition integration unit 53 (for example, the resulting movement of an object) differs from what the user intended, the user can give feedback by touch operation or voice input using a negative keyword such as "wrong" or "twice that".
In this case, the operation recognition integration unit 53 recognizes a negative keyword such as "wrong" or "twice that" as a negative word based on the user's operation of the touch operation input unit 26 or the voice input unit 28, and redoes the recognition of the operation based on the content of the negative word. By learning this series of processes, that is, by learning the instruction given by the negative word as an initial value, the operation recognition integration unit 53 can improve its recognition accuracy.
In this way, the operation recognition integration unit 53 can improve its recognition accuracy through user feedback. In other words, simply by giving feedback on the recognition result of the operation recognition integration unit 53 (for example, the resulting movement of an object) through voice input or touch operation, the user can freely customize instruction operations made by voice input or touch operation.

Furthermore, the user can, for example, draw a picture of a person or the like freehand by touch operations on the touch operation input unit 26 while giving various instruction operations by voice input.
Specifically, for example, while touching a line (edge) of the object corresponding to the person's arm, the user can give an instruction such as "a little thicker" or "a little more to the right" by voice input. In this case, the operation recognition integration unit 53 recognizes an instruction operation that makes the line corresponding to the touch coordinates of the touch operation slightly thicker or moves it slightly to the right.
As another example, while touching a line (edge) of the object corresponding to the person's arm, the user can input a keyword such as "red" by voice. In this case, the operation recognition integration unit 53 recognizes an instruction operation that colors the line corresponding to the touch coordinates of the touch operation red.

The operation recognition integration unit 53 can also, for example, combine voice input and touch operation into a single instruction operation.
Specifically, for example, when drawing a line segment between two points on the screen of the display unit 27, the user can have touch operation and voice input work together by touching the first and second points on the screen while saying something like "draw a line segment connecting this point and this point". In this case, the operation recognition integration unit 53 recognizes the touch coordinates of the two points based on the content of the touch operation, accepts the input, based on the voice recognition result, as an instruction to draw a line segment connecting the two recognized touch coordinates, and notifies the display image generation unit 55 and the like. As described later, the display image generation unit 55 then generates a display image containing a line segment connecting the two recognized touch coordinates, and the image is displayed on the display unit 27 under the control of the display control unit 56.
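The line-segment example above might be sketched as follows. The function name, the returned dictionary layout, and the simple substring match on the recognized text are all assumptions made for illustration; the description does not state how the recognized utterance is parsed.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def integrate_line_command(touch_points: List[Point],
                           recognized_text: str) -> Optional[dict]:
    """Combine two touch coordinates and a spoken command into one instruction.

    Returns a draw-line instruction when the utterance mentions a line
    segment and at least two touch points are available; otherwise None.
    """
    wants_line = "line segment" in recognized_text.lower()
    if wants_line and len(touch_points) >= 2:
        start, end = touch_points[0], touch_points[1]
        return {"action": "draw_line", "start": start, "end": end}
    return None
```

For instance, passing the two touch coordinates `(10.0, 20.0)` and `(200.0, 80.0)` together with the recognized text "draw a line segment connecting this point and this point" yields a single `draw_line` instruction carrying both endpoints, which a renderer could then consume.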

The application execution unit 54 starts and executes predetermined application software, for example software for creating presentation materials, according to the content of the user's operation recognized by the operation recognition integration unit 53.

The display image generation unit 55 generates the data of the various display images required as the application execution unit 54 executes the application software. In doing so, the display image generation unit 55 updates the display image data as appropriate according to the content of the user's operation recognized by the operation recognition integration unit 53.

The display control unit 56 executes control for displaying, on the display unit 27, the display image generated as data by the display image generation unit 55.

Next, the application execution process executed by the user terminal 11 having this functional configuration is described.
FIG. 5 is a flowchart showing an example of the flow of the application execution process executed by the user terminal 11 of FIG. 2 having the functional configuration of FIG. 3.
When the application execution unit 54 starts the predetermined application software, the application execution process begins, and the following processing from step S1 onward is executed.

In step S1, the touch operation recognition unit 51 and the voice recognition control unit 52 determine whether an operation has been performed.
If neither a touch operation nor voice input has been performed, the determination in step S1 is NO, and the process returns to step S1. That is, until a touch operation or voice input occurs, the determination of step S1 is repeated and the application execution process remains in a standby state.
When at least one of a touch operation and voice input is performed, the determination in step S1 is YES, and the process proceeds to step S2.

In step S2, the touch operation recognition unit 51 and the voice recognition control unit 52 determine whether the operation input consists of both voice and touch.
If only one of a touch operation and voice input has been performed, the determination in step S2 is NO, and the process proceeds to step S10. The processing from step S10 onward is described later.
If both a touch operation and voice input have been performed, the determination in step S2 is YES, and the process proceeds to step S3.

In step S3, the voice recognition control unit 52 executes control of the voice recognition processing. That is, for example, the voice recognition control unit 52 acquires the voice data output from the voice input unit 28, transmits it to the voice recognition server 12 via the communication unit 30 and the network 13, and instructs execution of the voice recognition processing.
In step S4, when the voice recognition result for the voice data is transmitted from the voice recognition server 12, the voice recognition control unit 52 receives the result and recognizes the content of the voice input operation.

In step S5, the touch operation recognition unit 51 recognizes the content of the touch operation, specifically the touch coordinates.

In step S6, the operation recognition integration unit 53 integrates the content of the voice input operation recognized in step S4 with the content of the touch operation (touch coordinates) recognized in step S5, and thereby recognizes the content of the user's operation as a whole.

In step S7, the application execution unit 54 executes predetermined application software, for example software for creating presentation materials, according to the content of the user's operation integrated in step S6.
Note that executing software here includes executing, within the program of software that has already been started, the portion of the program that corresponds to a command or the like associated with the content of the user's operation.

In step S8, the display image generation unit 55 generates or updates the display image data according to the processing result of step S7, that is, according to the execution result of the application software. The generated or updated display image is displayed on the display unit 27.

In step S9, the operation recognition integration unit 53 determines whether an instruction to end the process has been given.
The instruction to end the process is not particularly limited, but in the present embodiment an instruction to turn off the power of the user terminal 11 is used. That is, in the present embodiment, until the power-off instruction is given, the determination in step S9 is NO, the process returns to step S1, and the subsequent processing is repeated. When the power-off instruction is given, the determination in step S9 is YES and the application execution process ends.

The flow of the application execution process when a touch operation and a voice input are performed simultaneously has been described above. Next, the flows for a voice input performed alone and for a touch operation performed alone are described in turn.

When a voice input or a touch operation is performed alone, the determination in step S2 is NO and the process proceeds to step S10.

In step S10, the operation recognition integration unit 53 determines whether the operation input is a voice input.

When the operation input is a voice input alone, the determination in step S10 is YES and the process proceeds to step S11.
In step S11, the voice recognition control unit 52 controls the voice recognition processing. For example, the voice recognition control unit 52 acquires the voice data output from the voice input unit 28, transmits it to the voice recognition server 12 via the communication unit 30 and the network 13, and instructs the server to execute the voice recognition processing.
In step S12, when the voice recognition server 12 returns the voice recognition result for that voice data, the voice recognition control unit 52 receives the result and recognizes the content of the voice input operation.
In step S7, the application execution unit 54 treats the content of the voice operation recognized in step S12 as the content of the user's operation and, according to that content, executes predetermined application software, for example software for creating presentation materials. As before, executing software here includes executing, within the program of software that has already been started, the portion of the program that corresponds to a command or the like associated with the content of the user's operation.
In step S8, the display image generation unit 55 generates or updates the display image data according to the processing result of step S7, that is, according to the execution result of the application software. The generated or updated display image is displayed on the display unit 27.

In contrast to this series of processes for a voice input alone, when the operation input is a touch operation alone, the determination in step S10 is NO, the process proceeds to step S13, and the following series of processes is executed.
That is, in step S13, the touch operation recognition unit 51 recognizes the content of the touch operation, specifically the touch coordinates.
In step S7, the application execution unit 54 treats the content of the touch operation (touch coordinates) recognized in step S13 as the content of the user's operation and, according to that content, executes predetermined application software, for example software for creating presentation materials. As before, executing software here includes executing, within the program of software that has already been started, the portion of the program that corresponds to a command or the like associated with the content of the user's operation.
In step S8, the display image generation unit 55 generates or updates the display image data according to the processing result of step S7, that is, according to the execution result of the application software. The generated or updated display image is displayed on the display unit 27.
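
The three branches described so far (both inputs, voice alone, touch alone) can be summarized in a small dispatch sketch. It is a minimal illustration of the decisions in steps S1, S2, and S10 of FIG. 5 only; the `OperationInput` container and the `select_branch` function are hypothetical names introduced here, not part of the described apparatus.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class OperationInput:
    """One observation of the user's input (hypothetical container)."""
    touch: Optional[Tuple[int, int]] = None  # touch coordinates, if any
    voice: Optional[bytes] = None            # raw voice data, if any


def select_branch(op: OperationInput) -> str:
    """Return which branch of FIG. 5 would handle the given input."""
    has_touch = op.touch is not None
    has_voice = op.voice is not None
    if not (has_touch or has_voice):
        return "wait"      # S1: NO, keep waiting for input
    if has_touch and has_voice:
        return "S3-S6"     # S2: YES, recognize both and integrate them
    if has_voice:
        return "S11-S12"   # S10: YES, voice input alone
    return "S13"           # S10: NO, touch operation alone
```

Whichever branch is taken, the result is then passed to steps S7 and S8 as described above.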

Here, with reference to FIGS. 6 to 13, the application execution process is described concretely for the case where application software for creating presentation materials is started and executed.
FIGS. 6 to 13 are diagrams illustrating display examples of objects displayed on the display unit 27 (see FIG. 2) of the user terminal 11 of FIG. 2, which has the functional configuration of FIG. 3.

In this example, FIGS. 7 to 13 each show the result of the application execution process (the screen displayed on the display unit 27) as the user performs the various instruction operations needed to finally complete the presentation sheet 100 shown in FIG. 6.
The final sheet 100 contains various objects, including an object 200 and an object 300. The object 200 is an elliptical object with the word "task" written inside it. The object 300 is a rectangular object with the word "conclusion" written inside it.

First, it is assumed that the user's instruction operations start from the state of FIG. 7.
That is, in the state of FIG. 7 the sheet 100 contains only the object 300, so the user must subsequently insert the object 200 as shown in FIG. 6 to complete the sheet 100.

When the application execution unit 54 starts the predetermined application software, the application execution process of FIG. 5 begins, and an application window 400 such as that shown in FIG. 8 is displayed on the display unit 27 of the user terminal 11 (FIG. 2).
The application window 400 is made up of a plurality of areas, including a button area 500 and an application area 600.
The button area 500 is located in the upper part of the application window 400 and consists of a plurality of software buttons, each associated with a function that is available while the application software is running.
The application area 600 displays the sheet 100 being edited. That is, the various objects created through the user's touch operations are displayed within the sheet 100.

In this example, the user needs to insert the elliptical object 200 into the sheet 100, and therefore touches, with the index finger 700, the software button 501 for selecting the circular object 200 from which the ellipse will be formed.
As a result, in FIG. 5, after YES in step S1, NO in step S2, and NO in step S10, the touch operation recognition unit 51 recognizes the content of the touch operation in step S13. In this example, the software button 501 is assumed to be associated in advance with an instruction operation for displaying the circular object 200 at a predetermined size at the upper left corner of the screen, and that instruction operation is recognized. Consequently, as a result of steps S7 and S8, the circular object 200 is displayed at the upper left of the sheet 100, as shown in FIG. 8.

At the stage where the circular object 200 has just been selected by the touch operation, the object 200 is displayed at the upper left corner of the screen, as shown in FIG. 8, which is far from its final position in FIG. 6.
The user therefore needs to move the circular object 200 a long way, bringing it close to its final position in FIG. 6. A touch operation is well suited to such a rough movement.
Accordingly, as one kind of touch operation, the user performs a drag operation: touching the object 200 with the index finger 700 and, while maintaining contact, moving it in the desired direction (in this embodiment, toward the lower right of the sheet 100).
As a result, in FIG. 5, after YES in step S1, NO in step S2, and NO in step S10, the content of the touch operation is recognized in step S13. Specifically, in step S13, the touch operation recognition unit 51 recognizes the transition of the touch coordinates moved by the drag operation. Consequently, as a result of steps S7 and S8, the circular object 200 moves a long way along the trajectory of the drag operation (in this example, a trajectory toward the lower right of the sheet 100), as shown in FIG. 9.
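
The drag handling can be sketched as a simple translation. This is a minimal illustration, assuming the transition of touch coordinates is available as a list of points and that the object follows the net displacement of the finger; the function name `drag_object` and the coordinate values are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def drag_object(position: Point, trajectory: List[Point]) -> Point:
    """Move an object by the net displacement of a drag trajectory."""
    if len(trajectory) < 2:
        return position  # a single touch point is not a drag
    (x0, y0), (x1, y1) = trajectory[0], trajectory[-1]
    # Translate the object by how far the finger travelled overall.
    return (position[0] + (x1 - x0), position[1] + (y1 - y0))


# Example: an object near the upper left is dragged toward the lower right.
new_position = drag_object((10, 10), [(15, 15), (100, 80), (315, 215)])
```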

At the stage of FIG. 9, the object 200 is circular and small, which differs greatly from its final large elliptical form in FIG. 6.
The user therefore needs to enlarge the circular object 200 considerably, mainly in the left-right direction, to bring it close to its final shape in FIG. 6. A touch operation is well suited to such a rough enlargement.
Accordingly, as one kind of touch operation, the user performs a pinch-out operation on the object 200, spreading the index finger 700 and the thumb 701 apart.
As a result, in FIG. 5, after YES in step S1, NO in step S2, and NO in step S10, the content of the touch operation is recognized in step S13. Specifically, in step S13, the touch operation recognition unit 51 recognizes, from the content of the pinch-out operation, an instruction operation for enlarging the circular object 200 mainly in the left-right direction. Consequently, as a result of steps S7 and S8, the circular object 200 is enlarged mainly in the left-right direction in accordance with the pinch-out operation (left-right in this example) and becomes elliptical, as shown in FIG. 10.
Although not illustrated, when the user wants to shrink the object 200 by a large factor, the user may perform, as the touch operation, a pinch-in operation in which the index finger 700 and the thumb 701 are brought together on the object 200.
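
One way to turn a pinch gesture into an anisotropic enlargement is to compare the spread of the two fingers along each axis before and after the gesture. The sketch below is an assumption about how such a mapping could work, not the method stated in the description; `pinch_scale` and the `min_span` guard are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]


def pinch_scale(start_a: Point, start_b: Point,
                end_a: Point, end_b: Point,
                min_span: float = 1.0) -> Tuple[float, float]:
    """Return (horizontal, vertical) scale factors for a two-finger pinch."""
    def span(p: Point, q: Point, axis: int) -> float:
        # Clamp to min_span so fingers aligned on one axis do not divide by zero.
        return max(abs(p[axis] - q[axis]), min_span)

    scale_x = span(end_a, end_b, 0) / span(start_a, start_b, 0)
    scale_y = span(end_a, end_b, 1) / span(start_a, start_b, 1)
    return scale_x, scale_y


# Fingers spread apart horizontally only: the circle becomes an ellipse.
sx, sy = pinch_scale((100, 100), (140, 110), (60, 100), (180, 110))
```

With these values the horizontal spread grows from 40 to 120 while the vertical spread stays at 10, so the object is stretched threefold in width and keeps its height. A pinch-in yields factors below 1 in the same way.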

At the stage of FIG. 10, the object 200 is white, which differs greatly from its final color (for example, blue) in FIG. 6.
The user therefore needs to paint the object 200 blue. Painting the inside of the object 200 evenly in blue without going outside its outline (edge) is poorly suited to a touch operation, whereas voice input is well suited because the user only has to say the single word "blue". However, the target to be painted blue must be specified, and for this a touch operation such as touching the target object 200 is preferable to describing the target by voice.
Accordingly, the user touches the object 200 and, at the same time, says "blue".
As a result, in FIG. 5, after YES in step S1 and YES in step S2, the voice recognition control unit 52 recognizes in steps S3 and S4 an instruction operation meaning "paint blue", and the touch operation recognition unit 51 recognizes in step S5 an instruction operation meaning that the target to be painted is the object 200. In step S6, the operation recognition integration unit 53 integrates these instruction operations and recognizes the instruction operation of painting the object 200 blue. Consequently, as a result of steps S7 and S8, the object 200 is filled with blue, as shown in FIG. 11.
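
The integration in step S6 for this example can be sketched as a hit test on the touch coordinates followed by attaching the touched object to the spoken command. The `Box` class, the `integrate` function, the dictionary layout of the command, and the coordinates are all hypothetical; the description does not specify a data format.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Box:
    """Axis-aligned bounding box of an on-screen object (hypothetical)."""
    left: float
    top: float
    width: float
    height: float

    def contains(self, point: Tuple[float, float]) -> bool:
        x, y = point
        return (self.left <= x <= self.left + self.width
                and self.top <= y <= self.top + self.height)


def integrate(voice_command: Dict[str, str],
              touch_point: Tuple[float, float],
              objects: Dict[int, Box]) -> Optional[Dict[str, object]]:
    """Sketch of step S6: attach the touched object to the spoken command."""
    for object_id, box in objects.items():
        if box.contains(touch_point):
            return {**voice_command, "target": object_id}
    return None  # the touch did not land on any object


objects = {200: Box(100, 100, 200, 80), 300: Box(400, 300, 100, 100)}
command = integrate({"action": "fill", "color": "blue"}, (150, 120), objects)
```

Here the voice input supplies what to do and the touch supplies what to do it to, which is the division of roles described above.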

Note that a color designation (change instruction) by voice input is not limited to a single color and may be given as a mixture, for example "mix 80% blue and 20% red". In this case, the operation recognition integration unit 53 determines, based on the CIE xy chromaticity diagram or the like, what color results from "mixing 80% blue and 20% red", and recognizes an instruction operation of painting with that color.
Because the operation recognition integration unit 53 can thus recognize an arbitrary color based on the CIE xy chromaticity diagram or the like, the user can also designate a color (give a change instruction) with a vague expression such as "a bit more bluish".
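
As an illustration of locating a spoken mixture on the CIE xy chromaticity diagram, the sketch below mixes primaries additively in CIE XYZ and projects the result to xy. The choice of the sRGB primaries (D65 white point) and of additive mixing is an assumption made here; the description does not say which primaries or mixing model are used, and `mix_to_xy` is a hypothetical name.

```python
from typing import Dict, Tuple

# CIE XYZ tristimulus values of the linear sRGB primaries (D65 white point).
SRGB_PRIMARIES_XYZ: Dict[str, Tuple[float, float, float]] = {
    "red": (0.4124, 0.2126, 0.0193),
    "green": (0.3576, 0.7152, 0.1192),
    "blue": (0.1805, 0.0722, 0.9505),
}


def mix_to_xy(ratios: Dict[str, float]) -> Tuple[float, float]:
    """Additively mix named primaries and return CIE xy chromaticity."""
    total = sum(ratios.values())
    if total <= 0:
        raise ValueError("ratios must sum to a positive value")
    big_x = big_y = big_z = 0.0
    for name, ratio in ratios.items():
        px, py, pz = SRGB_PRIMARIES_XYZ[name]
        weight = ratio / total  # normalize so that 80 and 20 mean 0.8 and 0.2
        big_x += weight * px
        big_y += weight * py
        big_z += weight * pz
    norm = big_x + big_y + big_z
    return big_x / norm, big_y / norm


# "Mix 80% blue and 20% red"
x, y = mix_to_xy({"blue": 80, "red": 20})
```

Under these assumptions the mixture lands at roughly x = 0.208, y = 0.092, on the line between the blue and red primaries and closer to blue. A vague request such as "a bit more bluish" could then be modeled as a small step along that line, with the step size left as a tunable parameter.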

At the stage of FIG. 11, no text has been entered in the object 200, which differs greatly from the final state in FIG. 6, where the word "task" has been entered.
The user therefore needs to enter the word "task" in the object 200. Selecting the object 200 and entering text may be done by touch for long or complex text, but for word-level input voice is well suited, because the user only has to say "insert 'task'". However, the target for the text input must be specified, and for this a touch operation such as touching the target object 200 is preferable to describing the target by voice.
Accordingly, the user touches the object 200 and, at the same time, says "insert 'task'".
As a result, in FIG. 5, after YES in step S1 and YES in step S2, the voice recognition control unit 52 recognizes in steps S3 and S4 an instruction operation meaning "enter the word 'task'", and the touch operation recognition unit 51 recognizes in step S5 an instruction operation meaning that the target of the text input is the object 200. In step S6, the operation recognition integration unit 53 integrates these instruction operations and recognizes the instruction operation of entering the word "task" in the object 200. Consequently, as a result of steps S7 and S8, the word "task" is entered in the object 200, as shown in FIG. 12.

Note that the user may want to fine-tune the size or position of the object 200. Voice input is better suited to such fine adjustment than a touch operation. The target of the fine adjustment must be specified; as described above, it is generally preferable to specify it by a touch operation such as touching the target object 200, but doing so may cause a malfunction in which the object 200 moves to an unexpected position. To prevent such a malfunction, voice input can also be used to specify the target.
Accordingly, the user says "circle, a little to the right".
As a result, in FIG. 5, after YES in step S1, NO in step S2, and YES in step S10, the voice recognition control unit 52 recognizes in steps S11 and S12 an instruction operation for moving the object 200 slightly to the right. As described above, "a little" can be interpreted either dynamically or statically, and the interpretation can be adjusted (customized) as desired through user feedback. Then, as a result of steps S7 and S8, the object 200 is moved slightly to the right and the sheet 100 takes its final form (the same form as in FIG. 6), as shown in FIG. 13.
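
The fine movement can be sketched as follows. How "a little" is defined statically or dynamically is given earlier in the description and is not reproduced here, so the interpretation below is an assumption: "static" is taken to mean a fixed step and "dynamic" a step proportional to the object's width. The function names and the default values (4 pixels, 5%) are hypothetical and stand in for the user-adjustable settings.

```python
from typing import Dict, Tuple

# Unit vectors in screen coordinates (y grows downward).
DIRECTIONS: Dict[str, Tuple[int, int]] = {
    "right": (1, 0), "left": (-1, 0), "up": (0, -1), "down": (0, 1),
}


def nudge_amount(object_width: float, mode: str = "static",
                 static_step: float = 4.0,
                 dynamic_ratio: float = 0.05) -> float:
    """Return how far "a little" moves an object, under the stated assumptions."""
    if mode == "static":
        return static_step                   # fixed distance
    if mode == "dynamic":
        return object_width * dynamic_ratio  # scales with the object
    raise ValueError(f"unknown mode: {mode}")


def apply_nudge(position: Tuple[float, float], direction: str,
                amount: float) -> Tuple[float, float]:
    """Shift a position by the given amount in the named direction."""
    dx, dy = DIRECTIONS[direction]
    return (position[0] + dx * amount, position[1] + dy * amount)


# "Circle, a little to the right" for a 200-pixel-wide object.
moved = apply_nudge((100.0, 50.0), "right", nudge_amount(200.0, "dynamic"))
```

Exposing the step and the ratio as parameters is one simple way to reflect the customization through user feedback mentioned above.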

The present invention is not limited to the embodiment described above; modifications, improvements, and the like within a scope that can achieve the object of the present invention are included in the present invention.

In other words, an information processing apparatus to which the present invention is applied, including the user terminal 11 of the embodiment described above, can take a wide variety of embodiments having the following configuration.
That is, an information processing apparatus to which the present invention is applied, including the user terminal 11, includes a touch operation recognition unit 51, a voice recognition control unit 52, and an operation recognition integration unit 53, as shown in FIG. 2.
The touch operation recognition unit 51 recognizes the content of a touch operation by the user.
The voice recognition control unit 52 controls execution of voice recognition processing on voice data representing the content of the user's utterance.
The operation recognition integration unit 53 integrates the content of the touch operation recognized by the touch operation recognition unit 51 with the result of the voice recognition processing executed under the control of the voice recognition control unit 52, and recognizes the content of the user's operation as a whole.
In particular, when a touch operation and an utterance by the user are performed simultaneously, the operation recognition integration unit 53 integrates the content of the touch operation with the result of the voice recognition processing and recognizes them as a single instruction operation. When only one of a touch operation and an utterance is performed, the operation recognition integration unit 53 recognizes the content of the user's operation as a whole by dividing at least one of the type and the precision of operations between the content of the touch operation and the result of the voice recognition processing.
This makes it possible to improve the efficiency of drawing work performed on the screen.
This effect is described concretely below.

For example, consider business application software that involves drawing work. Speed is important in such software. By applying the user terminal 11 according to an embodiment of the information processing apparatus of the present invention, voice input and touch operations can be combined, and the time required for drawing work can be shortened by, for example, about 20% compared with the conventional approach.

That is, voice input and touch operations each have their own strengths and weaknesses.

Specifically, instruction operations for which touch operations are well suited include, for example:
(A) position designation,
(B) target designation,
(C) various instructions when quick movement is required, and
(D) detailed instructions and designations when using menu buttons.

In contrast, instruction operations for which voice input is well suited include, for example:
(a) instructions to move a minute distance,
(b) ambiguous instructions, for example color change instructions such as "a little more red" and size change instructions such as "a little bigger",
(c) various activation-related instructions, for example instructions to open a color or figure menu, instructions to display a square or a triangle, instructions to bring up an N × M table (N and M being arbitrary integers independent of each other), and designations such as next, next page, last, and first, and
(d) delicate operations that are hard to perform with a finger, for example instructions such as "rotate" and "stop".

Among the instruction operations listed above, those that users find easy by touch but difficult (cumbersome) by voice are operations (A), (B), and (C).
Conversely, those that users find easy by voice but difficult (cumbersome) by touch are operations (a), (b), and (c).
Therefore, for example, the user can shorten drawing time by designating a position with a touch operation and then triggering the action by voice, by relying mainly on touch operations for rough instructions, and by combining voice input and touch operations for fine adjustments.
Such instruction operations cannot be realized with conventional apparatuses, including that of Patent Document 1 mentioned above; they require an information processing apparatus to which the present invention is applied, such as the user terminal 11 of the present embodiment.
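
The division of roles in lists (A) to (D) and (a) to (d) can be captured as a simple lookup table. The short category labels and the function `suggest_modality` are hypothetical, introduced only to illustrate the idea; an actual apparatus would accept either modality and merely finds one of them more convenient.

```python
from typing import Dict

# Modality for which each kind of instruction is described as well suited.
PREFERRED_MODALITY: Dict[str, str] = {
    # (A) to (D): touch operations
    "position designation": "touch",
    "target designation": "touch",
    "quick movement": "touch",
    "menu button detail": "touch",
    # (a) to (d): voice input
    "minute movement": "voice",
    "ambiguous instruction": "voice",
    "activation": "voice",
    "delicate operation": "voice",
}


def suggest_modality(kind: str) -> str:
    """Return the better-suited modality, or "either" if the kind is unlisted."""
    return PREFERRED_MODALITY.get(kind, "either")
```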

Furthermore, all of the instruction operations listed above, that is, (A) to (D) and (a) to (d), are commonly required in business applications. These instruction operations therefore call for an apparatus that can accept both touch operations and voice input. "Can accept both" here covers two meanings: that a touch operation and a voice input can be accepted simultaneously, and that either one can be accepted when only one of them is performed.
Conventional apparatuses, including that of Patent Document 1 mentioned above, are not sufficient to meet this requirement; an information processing apparatus to which the present invention is applied, such as the user terminal 11 of the present embodiment, is needed.

The form in which an information processing apparatus to which the present invention is applied, such as the user terminal 11 of the present embodiment, is implemented is not particularly limited as long as it has the functions described above, but it is desirable to build it by incorporating voice recognition into the drawing software.

In addition, the instruction operations that an information processing apparatus to which the present invention is applied, such as the user terminal 11 of the present embodiment, can accept are applicable not only to the "figures" described above but also to objects including "pictures (including photographs)", "three-dimensional models", "symbols (including characters)", and the like.

In the embodiment described above, the information processing apparatus to which the present invention is applied was described using the user terminal 11 as an example, but it is not limited to this.
For example, the present invention can be applied to electronic devices in general that can accept voice input and touch operations. Specifically, the present invention is applicable to, for example, portable terminals such as smartphones, portable navigation devices, mobile phones, portable game machines, digital cameras, notebook personal computers, printers, television receivers, and video cameras.

The series of processes described above can be executed by hardware or by software.
In other words, the functional configuration of FIG. 3 is merely an example and is not limiting. It is sufficient that the user terminal 11 has functions capable of executing the series of processes described above as a whole, and the functional blocks used to realize those functions are not limited to the example of FIG. 3.
A single functional block may be implemented by hardware alone, by software alone, or by a combination of the two.

When the series of processes is executed by software, the program constituting the software is installed on a computer or the like from a network or a recording medium.
The computer may be a computer built into dedicated hardware. Alternatively, the computer may be one that can execute various functions when various programs are installed on it, for example a general-purpose personal computer.

The recording medium containing such a program is constituted not only by the removable medium 41 of FIG. 2, which is distributed separately from the apparatus main body in order to provide the program to the user, but also by a recording medium or the like that is provided to the user in a state of being incorporated in the apparatus main body in advance. The removable medium 41 is, for example, a magnetic disk (including a floppy disk), an optical disk, or a magneto-optical disk. The optical disk is, for example, a CD-ROM (Compact Disk-Read Only Memory) or a DVD (Digital Versatile Disk). The magneto-optical disk is, for example, an MD (Mini-Disk). The recording medium provided to the user in a state of being incorporated in the apparatus main body in advance is, for example, the ROM 22 of FIG. 2 in which the program is recorded, or a hard disk included in the storage unit 29 of FIG. 2.

In this specification, the steps describing the program recorded on the recording medium include not only processing performed in time series in the described order, but also processing that is not necessarily performed in time series and is instead executed in parallel or individually.
Further, in this specification, the term "system" means an overall apparatus composed of a plurality of devices, a plurality of means, and the like.

DESCRIPTION OF SYMBOLS: 11, 11-1 to 11-N ... user terminal; 12 ... voice recognition server; 13 ... network; 21 ... CPU; 26 ... touch operation input unit; 27 ... display unit; 28 ... voice input unit; 29 ... storage unit; 59 ... communication unit; 60 ... GPS unit; 61 ... drive; 51 ... touch operation recognition unit; 52 ... voice recognition control unit; 53 ... operation recognition integration unit; 54 ... application execution unit; 55 ... display image generation unit; 56 ... display control unit

Claims

An information processing apparatus comprising:
touch operation recognition means for recognizing the content of a touch operation by a user;
voice recognition control means for controlling execution of voice recognition processing on voice data indicating the user's utterance content; and
operation recognition integration means for recognizing the content of the entire operation by the user by integrating the content of the touch operation recognized by the touch operation recognition means and the result of the voice recognition processing executed under the control of the voice recognition control means,
wherein the operation recognition integration means:
when the touch operation and the utterance by the user are performed at the same time, integrates the content of the touch operation and the result of the voice recognition processing and recognizes them as the same instruction operation; and
when only one of the touch operation and the utterance by the user is performed, recognizes the content of the entire operation by the user by differentiating at least one of the type and the accuracy of the operation between the content of the touch operation and the result of the voice recognition processing.

The information processing apparatus according to claim 1, further comprising:
application execution means for executing predetermined application software in accordance with the content of the entire operation by the user recognized by the operation recognition integration means; and
display image generation means for generating or updating display image data in accordance with the execution of the application software by the application execution means.

An information processing method executed by an information processing apparatus that accepts operations by a user, the method comprising:
a touch operation recognition step of recognizing the content of a touch operation by the user;
a voice recognition control step of controlling execution of voice recognition processing on voice data indicating the user's utterance content; and
an operation recognition integration step of recognizing the content of the entire operation by the user by integrating the content of the touch operation recognized in the touch operation recognition step and the result of the voice recognition processing executed under the control of the voice recognition control step,
wherein the operation recognition integration step includes a step of:
when the touch operation and the utterance by the user are performed at the same time, integrating the content of the touch operation and the result of the voice recognition processing and recognizing them as the same instruction operation; and
when only one of the touch operation and the utterance by the user is performed, recognizing the content of the entire operation by the user by differentiating at least one of the type and the accuracy of the operation between the content of the touch operation and the result of the voice recognition processing.

A program causing a computer that accepts operations by a user to function as:
touch operation recognition means for recognizing the content of a touch operation by the user;
voice recognition control means for controlling execution of voice recognition processing on voice data indicating the user's utterance content; and
operation recognition integration means for recognizing the content of the entire operation by the user by integrating the content of the touch operation recognized by the touch operation recognition means and the result of the voice recognition processing executed under the control of the voice recognition control means,
wherein the program causes the computer to function, as at least part of the operation recognition integration means, so as to:
when the touch operation and the utterance by the user are performed at the same time, integrate the content of the touch operation and the result of the voice recognition processing and recognize them as the same instruction operation; and
when only one of the touch operation and the utterance by the user is performed, recognize the content of the entire operation by the user by differentiating at least one of the type and the accuracy of the operation between the content of the touch operation and the result of the voice recognition processing.