WO2000019307A1 - Method and apparatus for processing interaction - Google Patents

Method and apparatus for processing interaction

Info

Publication number
WO2000019307A1
Authority
WO
WIPO (PCT)
Prior art keywords
output
information
data processing
input
output information
Prior art date
Application number
PCT/JP1998/004295
Other languages
French (fr)
Japanese (ja)
Inventor
Yasuharu Nanba
Tomohiro Murata
Hirokazu Aoshima
Original Assignee
Hitachi, Ltd.
Priority date
Filing date
Publication date
Application filed by Hitachi, Ltd. filed Critical Hitachi, Ltd.
Priority to PCT/JP1998/004295 priority Critical patent/WO2000019307A1/en
Publication of WO2000019307A1 publication Critical patent/WO2000019307A1/en

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 — Input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/14 — Digital output to display device; cooperation and interconnection of the display device with other functional units
    • G06F3/1423 — Digital output to display device; cooperation and interconnection of the display device with other functional units, controlling a plurality of local displays, e.g. CRT and flat panel display

Definitions

  • The present invention relates to technology for the user interface of devices such as personal computers and portable information devices. More specifically, it relates to a dialogue processing method and a dialogue processing apparatus that control a computer and its application software on behalf of the user, based on instructions given through multiple input modalities such as voice, handwritten and printed characters, images, gestures, and pen gestures, captured with input devices such as a microphone, camera, pen, mouse, and keyboard, and that respond through multiple output modalities such as synthesized voice, text, still images, moving images, and sound effects, using output devices such as a display, speaker, robot arm, and carrier. Background art
  • An object of the present invention is to provide a plurality of data processing means so that an output result requiring only simple calculation is returned to the user quickly, while an output result requiring complicated calculation is returned after an appropriate time. In this way, output results commensurate with their processing time are obtained from a single piece of data or a single instruction input by the user, increasing the efficiency of the interaction. It is thus an object of the present invention to provide a dialogue processing method and a dialogue processing apparatus with which the user can easily select the desired output result in real time.
  • Another object of the present invention is to provide a dialogue processing method and a dialogue processing apparatus that report intermediate results to the user one after another, thereby shortening the no-response time, reducing the anxiety felt by the user, making the behavior of the dialogue processing itself understandable, and allowing the user to learn an efficient data input method without conscious effort.
  • A plurality of data processing means capable of responding to a single piece of input data are prepared and operated in parallel. Each time output information is obtained from one of these data processing means, a response is output to the user. A quick response is therefore returned by a data processing means consisting only of sub-means that require little computation, while a means whose sub-means perform complicated calculations may respond after some time. Since each response is returned to the user in a time commensurate with its content, the no-response time is minimized, the user's irritation and inconvenience are reduced, and problem (1) is solved.
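The parallel-means idea above can be sketched in a few lines. This is a hypothetical illustration, not the patent's implementation: three stand-in data processing means of different cost receive the same input, run concurrently, and each response is collected as soon as it is ready, so the cheap echo-style means answers first.

```python
import concurrent.futures

def echo_means(data):          # stand-in for Example 1: just reflect the input
    return ("echo", data)

def recognition_means(data):   # stand-in for Example 2: recognition
    return ("recognized", data.upper())

def intention_means(data):     # stand-in for Example 4: intention analysis
    return ("intention", f"user wants: {data}")

def run_parallel(data, means):
    """Run every data processing means on the same input; collect each
    output in completion order, so fast means respond first."""
    results = []
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = [pool.submit(m, data) for m in means]
        for f in concurrent.futures.as_completed(futures):
            results.append(f.result())
    return results

responses = run_parallel("send mail",
                         [echo_means, recognition_means, intention_means])
```

In a real system each means would also stream its result to the output side immediately rather than waiting for the others.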
  • FIG. 1 is a diagram showing Embodiment 1 of the present invention
  • FIG. 2 is a diagram showing Embodiment 2 of the present invention
  • FIG. 3 is a diagram showing Embodiment 3 of the present invention
  • FIG. 4 is a diagram showing Embodiment 4 of the present invention.
  • FIG. 1 shows an embodiment of the present invention.
  • This embodiment comprises a plurality of input means (101 to 103), a plurality of data processing means (104 to 106) that process the input information from the plurality of input means (101 to 103), and a plurality of output means (107 to 109) for outputting the output information from the plurality of data processing means (104 to 106).
  • Each of these means (101 to 109) is realized by combining hardware and software in a computer system. Some means may be realized by sharing the resources of one computer system.
  • Each of these means (101 to 109) may be provided with a processor, storage means for temporarily storing the data to be processed by the processor (such as a memory or an external storage device), and communication means for transferring data to and from the other means.
  • Since the processors, storage means, and communication means may use existing technologies found in ordinary computer systems, their details are not described here. To use the present invention effectively, it is desirable that each of these means (101 to 109) operate independently, in parallel or concurrently. Techniques for running multiple means in parallel or concurrently are known as parallel processing and distributed processing; since existing techniques may be used, their details are likewise not described here.
  • The input means (101 to 103) in the present embodiment include, in addition to the input means of a general computer system (keyboard, mouse, etc.), means for inputting the user's voice, such as a microphone or handset; means for inputting the user's movements, such as a touch panel, pen, or data glove; and means for inputting information viewed by the user (e.g., printed matter) or information about the user (e.g., facial expressions and gestures), such as an image scanner or video camera.
  • The effects of the present invention are not limited to the input means enumerated here. For example, even if a sensor for detecting the user's body temperature, pulse, or brain waves is used as an input means, those skilled in the art will easily understand from this specification that the present invention can still be implemented.
  • The data processing means (104 to 106) in this embodiment receive the input information accepted by the input means (101 to 103). What is desired here is that the input information received by a given input means 101 is usually delivered to at least two data processing means. The present invention can be implemented even when it is delivered to only one data processing means, but this is undesirable with respect to the effects intended by the present invention. Each data processing means then processes independently, in parallel or concurrently. Examples of the data processing means (104 to 106) are described below. Between the sub-means that are the components of each data processing means described below, data may be passed via the storage means or communication means of the computer system.
  • (Example 1 of data processing means) Input information from input means having input devices such as a microphone, camera, pen, or keyboard is stored in a reception buffer, and that input information is then sent as output information from an output buffer to output means having output devices such as a speaker or display. For example, audio input from a microphone is output from a speaker; image data input from a camera is displayed on the display; stroke data entered with a pen is displayed in ink format as a trajectory (handwriting).
  • (Example 2 of data processing means) A recognition process is performed on input information obtained in the same manner as in Example 1 of the data processing means, and the recognition result is output in the same manner as in Example 1.
  • a speech recognition result is output for speech.
  • For image data, an image recognition result is output; in particular, an OCR character recognition result for character images captured in the image data.
  • a handwritten character recognition result is output from the stroke data.
  • These recognition results are first converted to codes (character codes, etc.) and then output in another modality.
  • For example, voice input is displayed as characters, or speech is synthesized and output from image input.
  • The term "modality" in the present application means "the type of exchange channel used by a person and a computer for communication". Specifically, for input it covers voice, handwritten strokes, images, keyboard typing, gestures, and so on; for output it covers synthesized voice, character display, graphic display, application program operation, and so on.
  • (Example 3 of data processing means) A semantic analysis process is performed on the recognition result obtained in the same manner as in Example 2 of the data processing means, and a command is generated based on the semantic analysis result.
  • Based on that command, the output devices and application programs are operated and respond in the same manner as in Example 2 of the data processing means. For example, if the user says "Send mail", the message "Start mail system" appears in a window, and the mail system, which is the application system, starts.
  • (Example 4 of data processing means) An intention analysis is performed on the semantic analysis result obtained in the same manner as in Example 3 of the data processing means, and a response strategy is determined based on the intention analysis result. The output is then produced, based on the determined response strategy, in the same manner as in Example 3. For example, if the character "Return" is input by hand and the voice input "How is this done?" arrives at the same time, the response strategy becomes "explain the method of returning": an animated icon is displayed, and a help message is output with synthesized speech synchronized to it.
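The four example data processing means form a natural nesting, where each deeper means reuses the shallower stages' results. A minimal sketch, with all function names and the command table invented for illustration:

```python
def recognize(raw):                 # Example 2: recognition process
    return raw.strip().lower()

def semantic_analysis(text):        # Example 3: map recognition result to a command
    commands = {"send mail": "START_MAIL_SYSTEM"}   # illustrative table
    return commands.get(text, "UNKNOWN")

def intention_analysis(command):    # Example 4: choose a response strategy
    if command == "START_MAIL_SYSTEM":
        return "show 'Start mail system' message, then launch the mailer"
    return "ask the user to rephrase"

def process(raw):
    """Run all four levels on one input, keeping each level's output."""
    out = {"echo": raw}                                    # Example 1
    out["recognition"] = recognize(raw)                    # Example 2
    out["command"] = semantic_analysis(out["recognition"]) # Example 3
    out["strategy"] = intention_analysis(out["command"])   # Example 4
    return out

result = process("  Send Mail ")
```

When run in parallel, each level's entry in `out` would be reported to the user as soon as it is computed.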
  • the output means (107 to 109) in the present embodiment includes, for example, a speech synthesizer, a robot arm, and the like in addition to the output means (display, speaker, etc.) of a normal computer system.
  • When displaying on a display, there are various formats such as text, still images, moving images, icons, and combinations thereof.
  • When using a speaker for output, there are likewise modes such as BGM, sound effects, voice, and combinations thereof. In the present invention, these may be divided among different output means, or several of them may be combined and implemented as one output means. Further, the effects of the present invention are not limited to the output means enumerated here.
  • A response may also be output indirectly, by controlling an application system or by issuing a command to a database or database system. In this sense, those skilled in the art will easily understand from this specification that the present invention can be implemented even if a television or an air conditioner is used as an output means.
  • The same effect as described above can be obtained when a data processing means performs processing that outputs the intermediate results of its sub-means.
  • Various interpretation processes are performed on the computer system side. By notifying the user of the results of these processes one after another, the no-response time can be shortened to reduce the anxiety felt by the user, and the behavior of the dialogue processing itself becomes understandable. Further, by receiving such responses, the user can learn which input means to use to obtain the desired response from the computer system.
  • each means can be implemented by an independent program (or device).
  • Each means can be individually distributed as a computer system.
  • Such implementations can usually make direct use of techniques such as distributed processing and agent-oriented programming. These techniques are well known to those skilled in the art and are not described further.
  • FIG. 2 shows another embodiment of the present invention.
  • This embodiment comprises a plurality of input means (101 to 103); input information management means (201) for receiving the input information from the plurality of input means (101 to 103); a plurality of data processing means (104 to 106) for processing the input information from the input information management means (201); output information management means (202); and a plurality of output means (107 to 109).
  • The main difference from Embodiment 1 is that input information management means (201) and output information management means (202) are provided. The other means are the same as those described in the first embodiment and are not described again here.
  • Each of these means may be provided with a processor, storage means for temporarily storing the data to be processed by the processor (such as a memory or an external storage device), and communication means for transferring data to the other means.
  • the input information management means (201) in this embodiment receives all input information from the plurality of input means (101 to 103).
  • The input information is passed to the appropriate (one or more) data processing means based on classification information determined in advance for each input means, or on classification information derived from the input information itself.
  • The classification information determined in advance for each input means specifies which data processing means should receive input from that input means.
  • For example, in the case of an input means such as a microphone, data processing means that include speech recognition or speaker recognition are associated with it in advance from among the data processing means (104 to 106), and it is determined in advance that there is no correspondence with data processing means that do not include these (for example, a data processing means comprising only character recognition means).
  • The classification information that can be derived from the input information associates specific input information in advance with the data processing means to which it should be delivered. For example, input information with an unambiguous intention, such as an emergency stop, is determined to be delivered preferentially to a specific one of the data processing means (104 to 106).
  • In this way, the input information is delivered to the appropriate data processing means for each input means. As a result, malfunction of the data processing means can be avoided, and the load of error checking and error handling can be reduced.
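The routing performed by the input information management means can be sketched as a small dispatch table. The table contents, means names, and the `EMERGENCY_STOP` sentinel are all hypothetical; the point is the two classification sources: a fixed per-input-means mapping, and classification derived from the input itself, which takes precedence.

```python
# Classification information fixed in advance for each input means.
ROUTES = {
    "microphone": ["speech_recognizer", "speaker_recognizer"],
    "pen":        ["character_recognizer"],
}

def route_input(source, data):
    """Return the data processing means that should receive this input."""
    # Classification derived from the input information itself overrides
    # the per-input-means table (e.g., an unambiguous emergency stop).
    if data == "EMERGENCY_STOP":
        return ["emergency_handler"]
    return ROUTES.get(source, [])

targets = route_input("microphone", "send mail")
```

Input means with no entry get an empty target list, which is where error handling would flag an unroutable input.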
  • the output information management means (202) in the present embodiment receives all the output information from the plurality of data processing means (104 to 106).
  • The output information is passed to the appropriate output means based on the classification information determined in advance for each output means and the classification information derived from the output information.
  • The classification information determined in advance for each output means specifies which data processing means should deliver output to that output means. For example, in the case of an output means such as a speaker, data processing means that include voice synthesis or sound-effect synthesis are associated with it in advance from among the data processing means (104 to 106), and it is determined in advance that there is no correspondence with data processing means that do not include these (for example, a data processing means consisting only of moving-image generation means).
  • The classification information that can be derived from the output information associates specific output information in advance with the output means to which it should be delivered. For example, output information with a unique control content, such as a volume reduction, is determined to be delivered preferentially to a specific one of the output means (107 to 109).
  • In this way, the output information is delivered to the appropriate output means. As a result, malfunction of the output means can be avoided, and the load of error checking and error handling can be reduced.
  • (Example 2 of data processing means) A process that recognizes the voice and responds with the result of converting it into the character string "mail".
  • (Example 3 of data processing means) A process that interprets the content of the voice as a command to the application system and responds by, for example, activating the mail system.
  • (Example 4 of data processing means) A process that interprets the intention of the voice and responds with a help suggestion such as "Who will you send it to?".
  • With such a plurality of data processing means, the output information management means (202) can sequentially transfer the data processing results to the appropriate output means.
  • While the output information management means (202) is sequentially transferring data processing results to the output means, it may interrupt a transfer based on priority information determined in advance for each data processing means. For example, suppose the priorities are determined in ascending order from (Example 1 of data processing means) to (Example 4 of data processing means). When the data processing result of (Example 4 of data processing means) arrives, delivery of a lower-priority result already in progress is interrupted, and delivery of the (Example 4) result to the output means is started. Conversely, a result whose priority is lower than that of the (Example 4) result does not cause an interruption. By determining a priority for each data processing means in this way, a more appropriate data processing result obtained later can still be output, even at the cost of interrupting data already being output. In the above example the priority is fixed in advance for each data processing means for the sake of simplicity, but the priority information may also be changed according to the work situation, a designation from the user, the content of the input data, and so on.
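The priority-based interruption rule can be captured in a small state machine. This is a hedged sketch with invented priority values: a later, higher-priority result preempts the result currently being delivered, while a lower-priority late arrival is dropped.

```python
# Illustrative priorities for the four example data processing means.
PRIORITY = {"echo": 1, "recognition": 2, "command": 3, "intention": 4}

class OutputManager:
    """Stand-in for the output information management means (202)."""

    def __init__(self):
        self.current = None      # (means, payload) now being delivered
        self.delivered = []      # history of results that reached output

    def deliver(self, means, payload):
        # A result only interrupts the current delivery if its data
        # processing means has strictly higher priority.
        if self.current and PRIORITY[means] <= PRIORITY[self.current[0]]:
            return False         # lower or equal priority: do not interrupt
        self.current = (means, payload)
        self.delivered.append((means, payload))
        return True

mgr = OutputManager()
mgr.deliver("echo", "send mail")                          # starts delivering
interrupted = mgr.deliver("intention", "start mailer?")   # preempts the echo
ignored = mgr.deliver("recognition", "send mail")         # too late, too low
```

Making `PRIORITY` mutable would model the variant where priorities change with the work situation or the user's designation.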
  • If predetermined confirmation instruction information is input (for example, the "OK" button is pressed) while the output information management means (202) is sequentially delivering data processing results to the output means, the output information management means (202) may stop delivering further data processing results. At this time, information accompanying the confirmation instruction, such as the time of occurrence, the place of occurrence, and the person who issued it, may be used to decide whether to actually stop. In this way, the output information management means (202) can be prevented from inadvertently continuing to deliver data processing results to the output means forever.
  • The output information management means (202) includes storage means (such as a memory) for storing the output information from the plurality of data processing means (104 to 106).
  • Option information corresponding to the stored output information is passed to the output means (107 to 109), and predetermined selection instruction information corresponding to the option information is then input again through the input means.
  • For example, the output information management means (202) presents, as option information:
  • Option 1 Voice playback
  • Option 2 Voice recognition result output
  • Option 3 Interpretation as command
  • Option 4 Interpretation of intention
  • When the user selects one, the output information management means (202) outputs the corresponding data processing result. For example, if option 3 is selected, the process of outputting a start command to the mail system, which is the result of interpreting the voice content "mail" as a command, is executed.
  • By outputting the option information together and letting the user select from it, the user can choose the data processing result that was originally intended.
  • The user can thereby have the appropriate data processing carried out according to the task at hand.
  • The options output in this way also let the user clearly understand the results and behavior of the dialogue processing along the way (for example, that speech recognition succeeded but the meaning was not understood). As a side effect, the user comes to understand which data input methods suit the system.
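The option mechanism above amounts to buffering one result per data processing level and executing only the one the user picks. A minimal sketch, with the option labels taken from the list above and the action strings invented for illustration:

```python
# Buffered data processing results, keyed by option number.
options = {
    1: ("Voice playback", "play stored audio"),
    2: ("Voice recognition result output", "display 'mail'"),
    3: ("Interpretation as command", "issue start command to mail system"),
    4: ("Interpretation of intention", "ask who to send it to"),
}

def select_option(choice):
    """Return the action for the option the user selected."""
    label, action = options[choice]
    return action

# The user presses "3" (selection instruction information via an input means).
action = select_option(3)
```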
  • FIG. 3 shows another embodiment of the present invention.
  • This embodiment is an example in which sub-means that can be used in common among the data processing means (104 to 106) in FIGS. 1 and 2 are combined into one.
  • The input means (101 to 103), input information management means (201), output information management means (202), and output means (107 to 109) are almost the same as those described so far, so their detailed description is omitted.
  • the data processing means 1 (301) is composed of input information management means (201) and output information management means (202) as sub-means.
  • the input information management means (201) receives input information from the plurality of input means (101 to 103) and passes the input information to the input information recognition means (303) and the output information management means (202).
  • the output information management means (202) transfers the data processing result transferred from the input information management means (201) or the output information synthesis means (304) to the plurality of output means (107-109).
  • The data processing means 2 (302) has, in addition to the sub-means of the data processing means 1 (301) (that is, the input information management means (201) and output information management means (202)), input information recognition means (303) and output information synthesizing means (304) as sub-means.
  • the input information recognizing means (303) receives the input information from the input information managing means (201), performs a recognition process, and delivers the recognition result of the input information to the semantic analyzing means (306) and the output information synthesizing means (304). .
  • the output information synthesizing means (304) receives the data processing result from the input information recognition means (303) or the command generating means (307), performs a synthesizing process, and delivers the data processing result to the output information managing means (202). .
  • The data processing means 3 (305) has, in addition to the sub-means of the data processing means 2 (302) (i.e., the input information management means (201), output information management means (202), input information recognition means (303), and output information synthesizing means (304)), semantic analysis means (306) and command generation means (307) as sub-means.
  • the semantic analysis means (306) receives the recognition result of the input information from the input information recognition means (303), performs a semantic analysis process, and delivers the semantic analysis result to the intention analysis means (309) and the command generation means (307).
  • the command generating means (307) receives the data processing result from the semantic analyzing means (306) or the response planning means (310), performs a command generating process, and delivers the data processing result to the output information synthesizing means (304).
  • The data processing means 4 (308) has, in addition to the sub-means of the data processing means 3 (305) (i.e., the input information management means (201), output information management means (202), input information recognition means (303), output information synthesizing means (304), semantic analysis means (306), and command generation means (307)), intention analysis means (309) and response planning means (310) as sub-means.
  • the intention analysis means (309) receives the result of the semantic analysis of the input information from the semantic analysis means (306), analyzes the intention, and delivers the result to the response planning means (310).
  • The response planning means (310) receives the data processing result from the intention analysis means (309), drafts a response, and delivers the data processing result to the command generation means (307).
  • Each of these means and sub-means may be provided with a processor, storage means (memory, external storage device, etc.) for temporarily storing the data processed by the processor, and communication means for passing data to the other means.
  • The processing routes of the data processing means in this embodiment are substantially equivalent to the following four paths:
  • (Processing path 1 of data processing means) From the input information management means (201) directly to the output information management means (202).
  • (Processing path 2 of data processing means) From the input information management means (201), through the input information recognition means (303) and the output information synthesizing means (304), to the output information management means (202).
  • (Processing path 3 of data processing means) From the input information management means (201), through the input information recognition means (303), the semantic analysis means (306), the command generation means (307), and the output information synthesizing means (304), to the output information management means (202).
  • (Processing path 4 of data processing means) From the input information management means (201), through the input information recognition means (303), the semantic analysis means (306), the intention analysis means (309), the response planning means (310), the command generation means (307), and the output information synthesizing means (304), to the output information management means (202).
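The four processing paths share sub-means and differ only in how many stages sit between input management and output management. That can be sketched by composing shared stage functions into chains; the stage functions here are trivial string-transforming stand-ins, not real recognizers.

```python
def make_path(*stages):
    """Compose shared sub-means (stage functions) into one processing path."""
    def run(data):
        for stage in stages:
            data = stage(data)
        return data
    return run

# Stand-ins for the shared sub-means; each just tags the data it handled.
recognize   = lambda d: d + " >recognized"    # input information recognition (303)
analyze     = lambda d: d + " >analyzed"      # semantic analysis (306)
intend      = lambda d: d + " >intended"      # intention analysis (309)
plan        = lambda d: d + " >planned"       # response planning (310)
gen_command = lambda d: d + " >command"       # command generation (307)
synthesize  = lambda d: d + " >synthesized"   # output information synthesis (304)

path1 = make_path()                                             # straight through
path2 = make_path(recognize, synthesize)
path3 = make_path(recognize, analyze, gen_command, synthesize)
path4 = make_path(recognize, analyze, intend, plan, gen_command, synthesize)

out4 = path4("hi")
```

Because the stage objects are shared, the four paths reuse one implementation of each sub-means, which is exactly the point of this embodiment.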
  • FIG. 4 shows another embodiment of the present invention.
  • This embodiment is an example of an implementation, called multimodal dialogue processing, that handles a plurality of data processing means simultaneously while sharing common sub-means as shown in FIG. 3.
  • the voice input device (101) is one of input means, and is a device for inputting voice from a user via a microphone, a transmitter, or the like.
  • The stroke input device (402) is one of the input means, and is a device through which the user inputs strokes by hand via a pen, tablet, touch panel, or the like.
  • The image input device (103) is one of the input means, and is a device for inputting data such as printed matter viewed by the user, via an image scanner, a CCD camera, or the like.
  • The text input device (104) is one of the input means, and is a device through which the user inputs text via a keyboard or the like.
  • Even if input means such as a mouse, a data glove, or a line-of-sight recognition device are connected in the same manner as the input devices described above, those skilled in the art who read this specification will easily understand that the present invention can be implemented.
  • The data processing means 1 (405) includes input media control means (406) and output media control means (407) as sub-means.
  • The input media control means (406) controls the input devices (that is, the voice input device (101), the stroke input device (102), the image input device (103), and the text input device (104)), receives the raw input information from each input device, such as voice data, stroke data (usually a sequence of coordinates), image data, and character codes, formats the data, and hands it over to the individual modality recognition means (409) and the output media control means (407).
  • it may be executed as a separate processing program for each input device.
  • The output media control means (407) controls the output devices (that is, the voice synthesizer (418), the icon control device (419), the anthropomorphic agent control device (420), and the application control device (421)), and delivers to each output device the output information it receives: the raw data passed from the input media control means (406) and, from the individual modality generation means (410), the control sequences for the speech synthesizer, the event sequences for the window system, the commands for the anthropomorphic agent, the application commands, and so on.
  • The data processing means 2 (408) has, in addition to the sub-means of the data processing means 1 (405) (that is, the input media control means (406) and the output media control means (407)), individual modality recognition means (409) and individual modality generation means (410) as sub-means.
  • The individual modality recognition means (409) receives the voice data, stroke data, image data, and character codes as input information from the input media control means (406), recognizes them as spoken language, handwritten characters, printed characters, and character codes, respectively, and hands the results to the semantic analysis means (415) and the inter-modality recognition adjustment means (412).
  • Since the preferred recognition algorithm and recognition unit differ for each modality, different implementation forms may be used as programs. These recognition processes may also use existing technologies.
  • The individual modality generation means (410) converts the recognition results passed from the individual modality recognition means (409) and the output information passed from the inter-modality response adjustment means (413) into control sequences for the speech synthesizer, event sequences for the window system, commands for the anthropomorphic agent, commands for the application, and so on, and transfers them as output information to the output media control means (407).
  • The data processing means 3 (411) has, in addition to the sub-means of the data processing means 2 (408) (i.e., the input media control means (406), individual modality recognition means (409), individual modality generation means (410), and output media control means (407)), inter-modality recognition adjustment means (412) and inter-modality response adjustment means (413) as sub-means.
  • The inter-modality recognition adjustment means (412) determines an appropriate combination based on the plurality of recognition results from the individual modality recognition means (409), and hands this processing result to the semantic analysis means (415) and the inter-modality response adjustment means (413).
  • The process for determining an appropriate combination is performed, for example, by converting the recognition results into a data structure independent of each modality (for example, a recognition lattice structure), evaluating them against a collection of components for each modality called a "dictionary" and a set of rules on the temporal and positional arrangement of the components called a "grammar", and determining a candidate combination of the multiple recognition results.
  • a grammar a set of rules for the temporal and positional arrangement of the components
  • other processing methods may be used.
  • The inter-modality response adjustment means (413) prepares for output the appropriately combined recognition result passed from the inter-modality recognition adjustment means (412).
  • Data processing means 4 (414) comprises the sub-means of data processing means 3 (411) (i.e., the input media control means (406), individual modality recognition means (409), inter-modality recognition adjustment means (412), inter-modality response adjustment means (413), individual modality generation means (410), and output media control means (407)), together with the semantic analysis means (415), the objective estimation means (416), and the strategy determination means (417).
  • The semantic analysis means (415) receives the processing result from the individual modality recognition means (409), analyzes its semantics, and transfers the result to the objective estimation means (416).
  • The objective estimation means (416) receives the result of the semantic analysis from the semantic analysis means (415), estimates the user's objective, and transfers it to the strategy determination means (417).
  • The strategy determination means (417) receives the objective estimation result from the objective estimation means (416), determines a response strategy, derives the information to be returned to the user, and delivers it to the inter-modality response adjustment means (413).
  • The voice synthesizer (418) is one of the output means; it performs voice synthesis based on the control sequence received from the output media control means (407).
  • The icon control device (419) is one of the output means; it controls the display of icons and the like in the window system based on the event sequence received from the output media control means (407).
  • The anthropomorphic agent control device (420) is one of the output means; it controls the anthropomorphic agent based on instructions received from the output media control means (407).
  • The application control device (421) is one of the output means; it controls an application based on application commands and the like received from the output media control means (407).
  • Even if other output means, such as projectors, transport vehicles, and robot arms, are connected in the same manner as the output devices above, the present invention can still be practiced, as those skilled in the art will readily understand from this specification.
  • Single or plural input data are processed by plural data processing means. That is, when one or more input data pass through a data processing means composed only of sub-means requiring little effort (a low-level processing path, such as data processing means 1 (405)), a response is returned quickly; when they pass through a data processing means containing sub-means that take time to respond because they perform complex calculations (a high-level processing path, such as data processing means 4 (414)), a response is returned after a corresponding delay.
  • As a result, the non-response time of the application becomes very short, yielding the remarkable effect of reducing the user's irritation and inconvenience.
  • An output result requiring simple calculation processing can be obtained in a short time, and an output result requiring complicated calculation processing can be obtained after a correspondingly longer time.
  • An output result commensurate with the processing time can thus be obtained from a single data/command input by the user, making it possible to provide a dialogue processing method and a dialogue processing apparatus that increase the efficiency of interaction and allow the user to easily select, in real time, the output result that matches his or her needs.
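The fast and slow processing paths summarized above can be sketched as two concurrent workers feeding a shared response queue, with a result shown to the user each time one arrives. This is an illustrative sketch only, not part of the original specification; the function names and the 0.2-second stand-in for complex calculation are assumptions chosen to mirror reference numerals 405 and 414.

```python
import queue
import threading
import time

def data_processing_means_1(input_data, responses):
    # Low-level path (cf. 405): no heavy computation, so output
    # information is available almost immediately.
    responses.put(("echo", input_data))

def data_processing_means_4(input_data, responses):
    # High-level path (cf. 414): semantic analysis, objective estimation,
    # and strategy determination take longer before a response is ready.
    time.sleep(0.2)  # stands in for the complex calculation
    responses.put(("interpreted", f"intent of {input_data!r}"))

responses = queue.Queue()
for means in (data_processing_means_1, data_processing_means_4):
    threading.Thread(target=means, args=("send mail", responses)).start()

# Output side: respond to the user every time output information arrives,
# so the quick result appears first and the richer result is added later.
results = [responses.get() for _ in range(2)]
```

Because the low-level worker never sleeps, the echo response reaches the queue before the interpreted one, which is exactly the staged-response behavior the bullets describe.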

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A plurality of data processing means, each capable of processing a single piece of data, are operated in parallel, and a response is sent to the user each time output information is obtained from one of them. The processing results from the various paths are output to the user, either all at once or sequentially, as options, so that the user can select the intended data processing result.

Description

Dialogue processing method and dialogue processing apparatus

Technical Field
The present invention relates to techniques for using the user interface of personal computers, portable information devices, and the like. More specifically, it relates to a dialogue processing method and a dialogue processing apparatus that, based on instructions given through a plurality of input modalities, such as voice, handwritten characters, images of printed characters, gestures, and pen gestures, using input devices such as a microphone, camera, pen, mouse, and keyboard, control a computer, application software, and the like on behalf of the user, and that respond through a plurality of output modalities, such as voice, text, still images, moving images, and sound effects, using output devices such as a display, speaker, robot arm, and transport vehicle.

Background Art
One form of dialogue processing uses a multimodal user interface. As an example of the prior art, the literature "Neal, J. G. and Shapiro, S. C.: Intelligent Multi-media Interface Technology, in Sullivan, J. W. and Tyler, S. W., editors, Intelligent User Interfaces, pp. 11-43, ACM Press, Addison-Wesley, New York (1991)" describes integrating data from a plurality of input devices, performing analysis processing, output planning processing, and the like in a single processing device, and then responding using a plurality of output devices.
However, dialogue processing methods based on the above technology have the following problems.

Problem (1): The delay of an element process leads to a delay of the entire dialogue process.
That is, processes that handle data in an integrated manner, such as semantic analysis and output planning (excluding the input/output processing of each modality), operate sequentially on the input data, so the sum of the times required by the individual processes strongly affects the response performance of the dialogue process as a whole. In particular, processes that perform highly complex calculations, such as semantic analysis, usually take a long time, and the non-response time in the meantime makes the user feel irritated and inconvenienced.
Problem (2): The user cannot choose between an output result produced by low-level processing alone and a high-level output result.
That is, since there is only one processing flow, the input of a given piece of data always yields only an output result processed to the same level. For example, when voice is input through a microphone, the user has no way to choose between mere recorded data and a response based on interpreting the voice instruction from a speech recognition result. Likewise, when pen input is performed (by handwriting) on a pen computer or the like, the user has no way to choose among a mere ink display (bitmap display) of the handwritten figure, a handwritten character recognition result based on stroke recognition, and a response based on interpreting an instruction from that handwritten character recognition result.

Disclosure of the Invention
An object of the present invention is to provide a dialogue processing method and a dialogue processing apparatus in which, by providing a plurality of data processing means, output results requiring only simple calculation are delivered to the user in a short time and output results requiring complex calculation are delivered after a commensurate time, so that the user can obtain, from a single data/instruction input, output results matching the processing time, the efficiency of interaction can be increased, and the user can easily select, in real time, the output result that matches his or her needs.
Another object of the present invention is to provide a dialogue processing method and a dialogue processing apparatus that, by reporting intermediate results to the user successively, shorten the non-response time to reduce the user's anxiety, make the behavior of the dialogue process itself understandable, and let the user unconsciously learn efficient data input methods.
To solve problem (1), a plurality of data processing means, each capable of responding to a single piece of data, are prepared and operated in parallel. Each time output information based on one of these data processes is obtained, a response is output to the user. A data processing means composed only of sub-means requiring little effort therefore returns a response quickly, while sub-means that take time to respond because they perform complex calculations can respond after a corresponding delay. Since the response content the user needs is returned in a commensurate time, the non-response time is minimized; the user's irritation and sense of inconvenience are thereby reduced, and problem (1) is solved.
Next, to solve problem (2), a plurality of data processes, each capable of responding to a single piece of data, are prepared, and the data processing results from the individual paths are output to the user as options, either all at once or added sequentially, so that the user can choose the intended data processing result. For example, even when the user simply says "send mail" to the computer, various data processes are conceivable, from low-level to high-level processing results: (1) a process that stores (records) the voice as waveform data and responds by playing that data back; (2) a process that recognizes the voice and responds with the character string "send mail"; (3) a process that interprets the content of the voice as a command to an application system and responds, for example, by starting the mail system; and (4) a process that interprets the intent of the voice and responds with help or a suggestion such as "To whom shall I send it?". Because the user can choose among the results of such processes, the data processing appropriate to the task at hand can be performed. In this way, problem (2) is solved.
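The idea of offering several levels of interpretation of one voice input as selectable options can be sketched as a set of independent interpreters whose results are collected together. This is a hypothetical sketch; the function names and returned strings are illustrative assumptions, not the patent's API.

```python
def record(audio):
    # Level 1: store the voice as waveform data; offer playback as a response.
    return {"level": 1, "option": "play back recording", "data": audio}

def recognize(audio):
    # Level 2: speech recognition to a character string.
    return {"level": 2, "option": "text: 'send mail'"}

def interpret_command(audio):
    # Level 3: interpret the content as an application command.
    return {"level": 3, "option": "start the mail system"}

def interpret_intent(audio):
    # Level 4: interpret the intent and propose help.
    return {"level": 4, "option": "ask: 'To whom shall I send it?'"}

def offer_options(audio):
    # Each processing path contributes one candidate; the user picks one.
    return [f(audio) for f in (record, recognize, interpret_command,
                               interpret_intent)]

options = offer_options(b"...waveform...")
```

In an actual system the four interpreters would run in parallel and the list of options would grow as each path finishes, rather than being computed in one pass as here.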
Incidentally, the options output in this way let the user see the intermediate results and behavior of the dialogue process concretely and clearly (for example, that character recognition failed, or that speech recognition succeeded but the meaning was not understood), so the user comes to know which data input method suits him or her.

Brief Description of the Drawings
FIG. 1 shows Embodiment 1 of the present invention, FIG. 2 shows Embodiment 2, FIG. 3 shows Embodiment 3, and FIG. 4 shows Embodiment 4.

Best Mode for Carrying Out the Invention
Embodiments of the present invention are described below with reference to the drawings.
(Embodiment 1) FIG. 1 shows an embodiment of the present invention. As shown in FIG. 1, this embodiment comprises a plurality of input means (101-103), a plurality of data processing means (104-106) that process the input information from these input means (101-103), and a plurality of output means (107-109) that output the output information from these data processing means (104-106). In a computer system, each of these means (101-109) is realized by a combination of hardware and software. Several means may be realized by sharing the resources of a single computer system. Each of these means (101-109) may also include a processor, storage means (memory, an external storage device, and the like) for temporarily holding the data to be processed by the processor, and communication means for passing data to other means. Since these processors, storage means, and communication means may use existing technologies employed in ordinary computer systems, their details are not described here. To use the present invention effectively, it is desirable that each of these means (101-109) be processed independently, in parallel or concurrently. Techniques for processing a plurality of means in parallel or concurrently may likewise use existing technologies known from parallel processing, distributed processing, and the like, so their details are not described here.
The input means (101-103) in this embodiment include, in addition to the input means of an ordinary computer system (keyboard, mouse, and the like), means for inputting the user's voice, such as a microphone or telephone handset; means for inputting the user's movements, such as a touch panel, pen computer, or data glove; and means for inputting information the user is looking at (printed matter and the like) or the user himself (facial expressions, gestures, and the like), such as an image scanner or video camera. Of course, the effects of the present invention are not limited to the input means enumerated here. For example, those skilled in the art will readily understand from this specification that the present invention can be practiced even when sensors that detect the user's body temperature, pulse, brain waves, or the like are used as input means.
The data processing means (104-106) in this embodiment are handed the input information accepted by such input means (101-103). What should be noted here is that the input information accepted by, for example, a certain input means 101 is normally handed to at least two data processing means. The present invention can be practiced even when the information is handed to only one data processing means, but this is undesirable with respect to the effects the invention intends. Each data processing means operates independently, in parallel or concurrently. Examples of the data processing means (104-106) are described below. Between the sub-means constituting each data processing means described below, the processing of the computer system may be carried out via storage means or communication means.
(Example 1 of the data processing means) Input information from input means having input devices such as a microphone, camera, pen, or keyboard is stored in a receive buffer, and that input information is sent as output information from a send buffer to output means having output devices such as a speaker or display. For example, voice input from a microphone is output from a speaker, image data input from a camera is displayed on the display, or the trajectory (handwriting) of stroke data input from a pen is displayed in ink form.

(Example 2 of the data processing means) Recognition processing is applied to input information input as in Example 1, and the recognition result is output as in Example 1. For example, a speech recognition result is output for voice; an image recognition result is output for image data, in particular an OCR character recognition result for images of characters captured in the image data; or a handwritten character recognition result is output for stroke data. Normally these recognition results are first converted to codes (character codes and the like) and then output in another modality, for example displaying voice input as text or producing synthesized-speech output from image input. Hereinafter, "modality" in this application means "the kind of exchange channel that people and computers use for communication". Concretely, for input it covers forms such as voice, handwritten strokes, images, keyboard typing, and gestures; for output, forms such as synthesized speech, text display, graphic display, and application program operation responses.
(Example 3 of the data processing means) Semantic analysis processing is applied to a recognition result obtained as in Example 2, a command is generated based on the semantic analysis result, and output devices and application programs are operated in response, as in Example 2, based on the generated command. For example, when the user says "send mail" by voice, the message "Starting the mail system" is displayed in a window, and the mail system, an application system, is started.
(Example 4 of the data processing means) Intent analysis is performed on a semantic analysis result obtained as in Example 3, a response strategy is determined based on the intent analysis result, and output is produced as in Example 3 based on the determined response strategy. For example, when the user handwrites the characters "return" while simultaneously saying "How do I do this?" by voice, the response strategy "explain how to return it" is determined, and a concrete icon animation and a synchronized synthesized-speech help message are output. Besides Examples 1-4 above, there are various data processing means depending on the combination of sub-means constituting the data processing means.
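Examples 1-4 form a nested family of pipelines: each higher-level data processing means reuses the stages of the one below it and appends one further stage. The following sketch makes that nesting explicit; the stage names and string outputs are assumptions for illustration, not the specification's terminology.

```python
# Illustrative sketch of how data processing means Examples 1-4 nest:
# each higher-level means reuses the lower one's stages and adds one more.

def echo(data):            # Example 1: receive buffer in, send buffer out
    return data

def recognize(data):       # Example 2: recognition to a code
    return f"recognized({data})"

def make_command(result):  # Example 3: semantic analysis -> command
    return f"command({result})"

def make_strategy(cmd):    # Example 4: intent analysis -> response strategy
    return f"strategy({cmd})"

PIPELINES = {
    1: [echo],
    2: [echo, recognize],
    3: [echo, recognize, make_command],
    4: [echo, recognize, make_command, make_strategy],
}

def run(level, data):
    # Run one data processing means' stages in order over the input.
    for stage in PIPELINES[level]:
        data = stage(data)
    return data
```

Running all four levels on the same input yields progressively higher-level output information, which is what lets the shallow pipelines respond sooner than the deep ones.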
The output means (107-109) in this embodiment include, in addition to the output means of an ordinary computer system (display, speaker, and the like), a speech synthesizer, a robot arm, and the like. When display output is used, there are forms such as text, still images, moving images, icons, and combinations thereof; when a speaker is used, there are forms such as background music, sound effects, voice, and combinations thereof. In the present invention, these may each be implemented as separate output means, or several forms may be combined into a single output means. Furthermore, the effects of the present invention are not limited to the output means enumerated here. For example, the response may be output indirectly by controlling an application system or issuing commands to an OS or database system. In this sense, those skilled in the art will readily understand from this specification that the present invention can be practiced even when a television or an air conditioner is regarded and used as output means.
As described above, because a plurality of data processing means perform data processing independently and in parallel, a response is returned quickly when the data passes through a data processing means composed only of sub-means requiring little effort, and a response arrives after a corresponding time when it passes through data processing containing sub-means that take time to respond because of complex calculations. Accordingly, the response content the user needs is always output in a commensurate time, and because the earliest response content is presented to the user right away, the non-response time of the computer system as a whole (or the application system as a whole) is minimized, giving the pronounced effect of reducing the user's irritation and sense of inconvenience. As a simple variation of the present invention, the same effect is obtained when the data processing means is made to output the intermediate results of its sub-means. Moreover, by informing the user successively of the computer system's various interpretations and intermediate processing results in this way, the non-response time can be shortened to reduce the user's anxiety, and the behavior of the dialogue process itself can be made understandable. Furthermore, by receiving such responses, the user can learn which input means will yield the desired response from the computer system.
Thus, the present invention is particularly effective for multimodal interface systems and multimodal dialogue systems in which a plurality of pieces of information to be input are input simultaneously and a plurality of pieces of information to be output are output simultaneously. Of course, it can also be practiced with interfaces in which a plurality of pieces of information are input (or output) sequentially while switching among the corresponding input devices (or output devices), with interfaces that input and output previously prepared information (for example, multimedia interfaces, graphical user interfaces, command line interfaces, and the like), and with dialogue systems. As can be seen from FIG. 1, in this embodiment each means can be implemented as an independent program (or apparatus). As a computer system, the means can each be distributed individually. Such an implementation can normally make direct use of techniques such as distributed processing and agent-oriented programming. Since these techniques are well known to those skilled in the art, they are not described further.
(Embodiment 2) FIG. 2 shows another embodiment of the present invention. As shown in FIG. 2, this embodiment comprises a plurality of input means (101-103); input information management means (201) that receives the input information from these input means (101-103); a plurality of data processing means (104-106) that process the input information from the input information management means (201); output information management means (202) that receives the output information from these data processing means (104-106); and a plurality of output means (107-109) that output the output information from the output information management means. The main difference from Embodiment 1 is that the input information management means (201) and the output information management means (202) are provided. The other means are the same as those described in Embodiment 1 and are not described here. Each of these means may also include a processor, storage means (memory, an external storage device, and the like) for temporarily holding the data to be processed by the processor, and communication means for passing data to other means.
The input information management means (201) in this embodiment first receives all input information from the plurality of input means (101-103). Based on classification information predetermined for each input means, or on classification information that can be derived from the input information itself, it passes the input information to the appropriate data processing means (one or more). The classification information predetermined for each input means is information about which data processing means should receive data from that input means. For example, for a microphone input means, a correspondence is established in advance with those data processing means (104-106) that include speech recognition, speaker recognition, and the like, while it is determined in advance that there is no correspondence with data processing means that do not include them (for example, a data processing means consisting only of character recognition means and the like). The classification information that can be derived from the input information is information, determined in advance for specific input information, about which data processing means should receive that input information. For example, input information carrying an unambiguous intent, such as an emergency stop, is predetermined to be delivered preferentially to a specific data processing means among the data processing means (104-106). By using such classification information, input information from each input means is delivered to the appropriate data processing means; that is, malfunction of the data processing means can be avoided, and the load of error-check processing and error handling can be reduced.
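The two kinds of classification information described above can be sketched as a small dispatcher. This is an illustrative sketch only, not the patent's implementation; all names (`ROUTES`, `derive_route`, the processor labels) are invented for the example.

```python
# (a) Predetermined classification info: which data processing means may
# receive data from each input means (microphone -> speech processors, etc.).
ROUTES = {
    "microphone": ["speech_recognizer", "speaker_recognizer"],
    "pen_tablet": ["character_recognizer"],
}

# (b) Classification info derived from the input content: input carrying an
# unambiguous intent (e.g. an emergency stop) is routed preferentially.
def derive_route(payload):
    if payload == "EMERGENCY_STOP":
        return ["emergency_handler"]   # preferential, exclusive delivery
    return None

def dispatch(input_means, payload):
    """Return the data processing means that should receive `payload`."""
    derived = derive_route(payload)
    if derived is not None:
        return derived
    return ROUTES.get(input_means, [])

print(dispatch("microphone", "send mail"))       # routed by input means
print(dispatch("pen_tablet", "EMERGENCY_STOP"))  # routed by content
```

Input from an input means with no registered correspondence is simply routed to no processor, which corresponds to the "no correspondence" case in the text.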
The output information management means (202) in this embodiment first receives all output information from the plurality of data processing means (104-106). Based on classification information predetermined for each output means, or on classification information that can be derived from the output information itself, it passes the output information to the appropriate output means (one or more). The classification information predetermined for each output means is information about which data processing means should deliver to that output means. For example, for a speaker output means, a correspondence is established in advance with those data processing means (104-106) that include speech synthesis, sound-effect synthesis, and the like, while it is determined in advance that there is no correspondence with data processing means that do not include them (for example, a data processing means consisting only of moving-image generation means and the like). The classification information that can be derived from the output information is information, determined in advance for specific output information, about which output means should receive that output information. For example, output information carrying an unambiguous control content, such as a volume reduction, is predetermined to be delivered preferentially to a specific output means among the output means (107-109). By using such classification information, output information from each data processing means is delivered to the appropriate output means; that is, malfunction of the output means can be avoided, and the load of error-check processing and error handling can be reduced.
As described above, a plurality of data processing operations can be prepared that respond to a single piece of data, and the data processing results from the respective paths can be output sequentially, one after another. For example, when the user utters "send mail" into a microphone serving as an input means (for example, 101), the input information management means (201) receives the utterance as speech waveform data and distributes it to the plurality of data processing means (104-106). These data processing means perform, for example, the following (data processing means example 1) through (data processing means example 4): (data processing means example 1) store (record) the speech as speech waveform data and respond by replaying the data; (data processing means example 2) recognize the speech and respond with the result of converting it into the character string "send mail"; (data processing means example 3) interpret the content of the speech as a command to an application system and respond, for example, by launching a mail system; (data processing means example 4) interpret the intent of the speech and respond with help or a suggestion such as "To whom shall I send it?". The data processing results from such a plurality of data processing means can be passed by the output information management means (202) sequentially, one after another, to the appropriate output means.
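The fan-out just described can be sketched as follows: one input is distributed to several processors of different cost, and each result is handed onward as soon as it is ready, so the cheap processing answers early and the deeper processing answers later. This is a minimal sketch with invented function names; the sleeps stand in for real computation time.

```python
import queue
import threading
import time

def record_and_replay(utterance):      # example 1: cheapest, answers first
    return "replay: " + utterance

def recognize(utterance):              # example 2: speech -> text
    time.sleep(0.05)
    return "recognized: " + utterance

def interpret_command(utterance):      # example 3: text -> application command
    time.sleep(0.10)
    return "command: launch mail system"

results = queue.Queue()                # stands in for the output manager (202)

def run(processor, utterance):
    results.put(processor(utterance))

for p in (interpret_command, recognize, record_and_replay):
    threading.Thread(target=run, args=(p, "send mail")).start()

# Results are delivered in completion order, not submission order.
arrived = [results.get() for _ in range(3)]
print(arrived)
```

The queue plays the role of the output information management means: it simply forwards whatever result arrives next, so the user sees a fast, shallow response first and richer responses afterwards.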
As a modification of the above embodiment, while the output information management means (202) is sequentially passing data processing results to the output means, it may interrupt that delivery on the basis of priority information predetermined for each data processing means. For example, suppose the priorities are set to increase in the order of (data processing means example 1) through (data processing means example 4). If, while the output information management means (202) is passing the data processing result of (data processing means example 1) to the output means, the result of (data processing means example 2) is delivered to the output information management means (202), the priority information of the two is compared, delivery of the lower-priority result of (data processing means example 1) to the output means is interrupted, and delivery of the result of (data processing means example 2) begins. If, during this, the result of (data processing means example 4) is further delivered to the output information management means (202), delivery of the lower-priority result of (data processing means example 2) is likewise interrupted, and delivery of the result of (data processing means example 4) begins. If, during this, the result of (data processing means example 3) is delivered to the output information management means (202), no interruption occurs, because its priority is lower than that of (data processing means example 4). Thus, by setting a priority for each data processing means, a more appropriate data processing result obtained later can be output even at the cost of interrupting data already being output. In the above example the priorities were fixed per data processing means for simplicity, but the priority information may be varied according to the work situation, designation by the user, the content of the input data, and so on. As a further modification of the above embodiment, while the output information management means (202) is sequentially passing data processing results to the output means, it may interrupt that delivery when predetermined confirmation instruction information is input (for example, pressing an "OK" button). At this time, information accompanying the confirmation instruction information, such as the time of occurrence, the place of occurrence, and the originator, may be used to judge whether the delivery should really be interrupted. In this way, the output information management means (202) can be prevented from carelessly continuing to pass data processing results to the output means indefinitely.
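The priority-based preemption walked through above can be sketched directly. This is an illustrative sketch, not the patent's implementation; the priority table and class names are invented, and "interrupting" is modeled simply as replacing the result currently on air.

```python
# Predetermined priority per data processing means (higher number wins).
PRIORITY = {"replay": 1, "recognition": 2, "command": 3, "intent": 4}

class OutputManager:
    def __init__(self):
        self.on_air = None   # (processor, result) currently being output
        self.log = []        # what actually reached the output means

    def deliver(self, processor, result):
        # A newly arrived result preempts only if its priority is higher.
        if self.on_air and PRIORITY[processor] <= PRIORITY[self.on_air[0]]:
            return           # lower or equal priority: do not interrupt
        if self.on_air:
            self.log.append("interrupted " + self.on_air[0])
        self.on_air = (processor, result)
        self.log.append("output " + result)

m = OutputManager()
m.deliver("replay", "replaying audio")      # starts output
m.deliver("recognition", "'send mail'")     # higher priority: preempts
m.deliver("intent", "to whom?")             # higher still: preempts
m.deliver("command", "launch mail system")  # lower than intent: ignored
print(m.log)
```

The final call reproduces the case in the text where (example 3) arrives while (example 4) is being output and causes no interruption.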
As a further modification of the above embodiment, when a plurality of output information items have accumulated in the storage means (such as a memory or an external storage device) that stores the output information from the plurality of data processing means (104-106), the output information management means (202) may, instead of outputting that output information, pass option information corresponding to the output information to the output means (107-109), and, when predetermined selection instruction information corresponding to that option information is in turn delivered from the input means (101-103), operate so as to output all or part of the output information corresponding to the selection instruction information and the option information. In the concrete example above, the output information management means (202) outputs four items of option information, "Option 1: replay the speech", "Option 2: output the speech recognition result", "Option 3: interpret as a command", and "Option 4: interpret the intent", and waits for the user's selection. Suppose here that the user selects "Option 3: interpret as a command" from the option information (the selection itself is made via an input means). In response, the output information management means (202) outputs the corresponding data processing result obtained earlier; that is, it executes, for example, the process of outputting a launch command to the mail system, which is the result of interpreting the spoken content "send mail" as a command. In this way, by outputting the option information in a batch and letting the user choose before the data the user entered is output as-is, the user can be made to select the data processing result originally intended, and appropriate data processing can be performed for the task at hand. As a side effect, the options output in this way let the user see concretely and clearly the intermediate results and behavior of the dialogue processing (for example, that speech recognition succeeded but the meaning could not be understood), so that the user also comes to understand which data input methods suit him or her.
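The option-selection variant can be sketched as a pending-results table keyed by option labels. This is a hedged illustration only; the labels and stored results below are taken from the "send mail" example, but the data structure and function names are invented.

```python
# Accumulated data processing results, keyed by the option information that
# will be shown to the user instead of the results themselves.
pending = {
    "Option 1: replay the speech": "replaying audio",
    "Option 2: output the speech recognition result": "'send mail'",
    "Option 3: interpret as a command": "launch mail system",
    "Option 4: interpret the intent": "To whom shall I send it?",
}

def present_options():
    # Option information sent to the output means in a batch.
    return list(pending)

def select(option):
    # Selection instruction arrives via an input means; output only the
    # data processing result the user actually intended.
    return pending[option]

options = present_options()
print(options[2])            # the user picks "Option 3: ..."
print(select(options[2]))    # -> launch mail system
```

Only the selected result reaches the output means; the other accumulated results stay in storage and are never output as-is.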
(Embodiment 3) Fig. 3 shows another embodiment of the present invention. This embodiment is an example in which sub-means that can be used in common by the data processing means (104-106) of Fig. 1 and Fig. 2, where such sub-means exist, are consolidated into one. The input means (101-103), the input information management means (201), the output information management means (202), and the output means (107-109) are substantially as described above, so a detailed description is omitted.
Data processing means 1 (301) consists of input information management means (201) and output information management means (202) as sub-means. The input information management means (201) receives input information from the plurality of input means (101-103) and passes it to the input information recognition means (303) and the output information management means (202). The output information management means (202) passes the data processing results delivered from the input information management means (201) or the output information synthesis means (304) to the plurality of output means (107-109).
Data processing means 2 (302) consists of the sub-means of data processing means 1 (301) (namely, the input information management means (201) and the output information management means (202)) plus, as further sub-means, input information recognition means (303) and output information synthesis means (304). The input information recognition means (303) receives input information from the input information management means (201), performs recognition processing, and passes the recognition result of the input information to the semantic analysis means (306) and the output information synthesis means (304). The output information synthesis means (304) receives data processing results from the input information recognition means (303) or the command generation means (307), performs synthesis processing, and passes the data processing results to the output information management means (202). Data processing means 3 (305) consists of the sub-means of data processing means 2 (302) (namely, the input information management means (201), the output information management means (202), the input information recognition means (303), and the output information synthesis means (304)) plus, as further sub-means, semantic analysis means (306) and command generation means (307). The semantic analysis means (306) receives the recognition result of the input information from the input information recognition means (303), performs semantic analysis processing, and passes the semantic analysis result to the intent analysis means (309) and the command generation means (307). The command generation means (307) receives data processing results from the semantic analysis means (306) or the response planning means (310), performs command generation processing, and passes the data processing results to the output information synthesis means (304).
Data processing means 4 (308) consists of the sub-means of data processing means 3 (305) (namely, the input information management means (201), the output information management means (202), the input information recognition means (303), the output information synthesis means (304), the semantic analysis means (306), and the command generation means (307)) plus, as further sub-means, intent analysis means (309) and response planning means (310). The intent analysis means (309) receives the semantic analysis result of the input information from the semantic analysis means (306), performs intent analysis, and passes the result to the response planning means (310). The response planning means (310) receives the data processing result from the intent analysis means (309), plans a response, and passes the data processing result to the command generation means (307).
The purpose of this description is to show, by example, that the effects of the present invention can be obtained even when sub-means are shared in this way; accordingly, each sub-means may use well-known techniques, sub-means other than those listed here may be used, and data processing means of a configuration different from that listed here may be used. Each of these means and sub-means may also be provided with a processor, storage means (such as a memory or an external storage device) for temporarily holding data to be processed by the processor, and communication means for passing data to other means.
Accordingly, the processing paths of the data processing means in this embodiment effectively amount to the following four.
(Processing path 1) From the input information management means (201) directly to the output information management means (202).
(Processing path 2) From the input information management means (201), via the input information recognition means (303) and the output information synthesis means (304), to the output information management means (202).
(Processing path 3) From the input information management means (201), via the input information recognition means (303), the semantic analysis means (306), the command generation means (307), and the output information synthesis means (304), to the output information management means (202).
(Processing path 4) From the input information management means (201), via the input information recognition means (303), the semantic analysis means (306), the intent analysis means (309), the response planning means (310), the command generation means (307), and the output information synthesis means (304), to the output information management means (202).
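The four nested processing paths can be sketched as function pipelines that share their stages. This is a minimal sketch with invented stage names standing in for the sub-means; each stage just tags the data so the route taken is visible.

```python
# Shared sub-means, one function per stage (names invented for illustration).
def recognize(x):      return "recognized(" + x + ")"       # (303)
def analyze(x):        return "meaning(" + x + ")"          # (306)
def analyze_intent(x): return "intent(" + x + ")"           # (309)
def plan_response(x):  return "plan(" + x + ")"             # (310)
def make_command(x):   return "command(" + x + ")"          # (307)
def synthesize(x):     return "synthesized(" + x + ")"      # (304)

# The four paths reuse the same stage functions rather than duplicating them.
PATHS = {
    1: [],  # straight from input management to output management
    2: [recognize, synthesize],
    3: [recognize, analyze, make_command, synthesize],
    4: [recognize, analyze, analyze_intent, plan_response,
        make_command, synthesize],
}

def run_path(n, data):
    for step in PATHS[n]:
        data = step(data)
    return data   # handed to the output information management means

print(run_path(1, "ping"))
print(run_path(3, "send mail"))
```

Because every path draws on the same stage functions, each sub-means exists exactly once, which mirrors the computation-saving effect the embodiment claims.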
The effect obtained by these four processing paths, in addition to the effects of (Embodiment 1) and (Embodiment 2), is the notable one that consolidating into one the sub-means usable in common across a plurality of data processing means saves the total amount of computation.
(Embodiment 4) Fig. 4 shows another embodiment of the present invention. This embodiment is an implementation, called multimodal interface processing or multimodal interaction processing, that handles a plurality of data processing means simultaneously while providing common sub-means as in Fig. 3. The speech input device (401) is one of the input means, a device for inputting the user's speech via a microphone, a telephone transmitter, or the like. The stroke input device (402) is one of the input means, a device for inputting strokes made by the user's hand via a pen and tablet, a touch panel, or the like. The image input device (403) is one of the input means, a device for inputting data such as printed matter the user is looking at via an image scanner, a CCD camera, or the like. The text input device (404) is one of the input means, a device by which the user inputs text via a keyboard or the like. Of course, as will be readily understood by those skilled in the art from reading this specification, the present invention can also be practiced with other input means, such as a mouse, a data glove, or a gaze recognition device, connected in the same manner as the above input devices.
Data processing means 1 (405) consists of input media control means (406) and output media control means (407) as sub-means. The input media control means (406) controls the input devices (namely, the speech input device (401), the stroke input device (402), the image input device (403), and the text input device (404)), receives the raw input information from each input device, such as speech data, stroke data (usually a sequence of coordinates), image data, and character codes, arranges its format, and passes it to the individual-modality recognition means (409) and the output media control means (407). Of course, to control the input devices, a separate processing program may well be run for each input device. The output media control means (407) controls the output devices (namely, the speech synthesizer (418), the icon control device (419), the anthropomorphic-agent control device (420), and the application control device (421)), and passes to each output device output information such as the raw data handed over from the input media control means (406), control sequences for the speech synthesizer handed over from the individual-modality generation means (410), event sequences for the window system, commands to the anthropomorphic agent, and application commands. Of course, to control the output devices, a separate processing program may well be run for each output device.
Data processing means 2 (408) consists of the sub-means of data processing means 1 (405) (namely, the input media control means (406) and the output media control means (407)) plus, as further sub-means, individual-modality recognition means (409) and individual-modality generation means (410). The individual-modality recognition means (409) receives the speech data, stroke data, image data, and character codes that are the input information from the input media control means (406), recognizes them as spoken language, handwritten characters, printed characters, and character codes respectively, and passes the results to the semantic analysis means (415) and the inter-modality recognition adjustment means (412). Since the suitable recognition algorithm and recognition unit differ for each modality, the programs may take separate implementation forms, and these recognition processes may use existing techniques. The individual-modality generation means (410) converts the recognition results handed over from the individual-modality recognition means (409) and the output information handed over from the inter-modality response adjustment means (413) into output information such as control sequences for the speech synthesizer, event sequences for the window system, commands for the anthropomorphic agent, and commands for the application, and passes it to the output media control means (407). Data processing means 3 (411) consists of the sub-means of data processing means 2 (408) (namely, the input media control means (406), the individual-modality recognition means (409), the individual-modality generation means (410), and the output media control means (407)) plus, as further sub-means, inter-modality recognition adjustment means (412) and inter-modality response adjustment means (413). The inter-modality recognition adjustment means (412) determines an appropriate combination on the basis of the plurality of recognition results from the individual-modality recognition means (409) and passes this processing result to the semantic analysis means (415) and the inter-modality response adjustment means (413). The processing for determining an appropriate combination is performed, for example, by converting the results into a data structure independent of each modality (for example, a recognition lattice structure), evaluating the recognition results against a collection of per-modality constituent elements called a "dictionary" and a collection of rules, called a "grammar", on the temporal and positional arrangement of constituent elements, and determining candidate combinations of the plurality of recognition results. Of course, other processing methods may be used in practicing the present invention. The inter-modality response adjustment means (413) takes the appropriately combined recognition results handed over from the inter-modality recognition adjustment means (412) and the information to be presented in response handed over from the strategy determination means (417), adjusts the plurality of output information items with attention to output modality, output order, and output timing, and passes the adjusted result as output information to the individual-modality generation means (410).
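The dictionary-and-grammar combination step can be illustrated with a toy joint interpretation of a speech candidate and a pen-pointing candidate. The patent names only the dictionary and grammar concepts; the concrete candidate lists, the product scoring, and all identifiers below are invented for this sketch.

```python
# Per-modality recognition candidates as (hypothesis, confidence) pairs.
speech_candidates = [("delete this", 0.6), ("beat this", 0.3)]
pen_candidates = [("file_A", 0.8), ("file_B", 0.2)]

# "Dictionary": the constituent elements the system knows about.
DICTIONARY = {"delete this", "file_A", "file_B"}

def combine(speech, pen):
    """Pick the best-scoring combination whose parts are all in the dictionary."""
    best, best_score = None, -1.0
    for s_text, s_score in speech:
        for p_obj, p_score in pen:
            if s_text not in DICTIONARY or p_obj not in DICTIONARY:
                continue                  # reject unknown constituents
            score = s_score * p_score     # simple joint-plausibility score
            if score > best_score:
                best, best_score = (s_text, p_obj), score
    return best

print(combine(speech_candidates, pen_candidates))  # -> ('delete this', 'file_A')
```

A real implementation would also apply temporal and positional grammar rules over a recognition lattice; the dictionary filter above stands in for that evaluation.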
Data processing means 4 (414) consists of the sub-means of data processing means 3 (411) (namely, the input media control means (406), the individual-modality recognition means (409), the inter-modality recognition adjustment means (412), the inter-modality response adjustment means (413), the individual-modality generation means (410), and the output media control means (407)) plus, as further sub-means, semantic analysis means (415), purpose estimation means (416), and strategy determination means (417). The semantic analysis means (415) receives the processing result from the individual-modality recognition means (409), performs semantic analysis on it, and passes the result to the purpose estimation means (416). The purpose estimation means (416) receives the semantic analysis result from the semantic analysis means (415), estimates its purpose, and passes the result to the strategy determination means (417). The strategy determination means (417) receives the purpose estimation result from the purpose estimation means (416), determines a response strategy, derives the information to be presented in response to the user, and passes it to the inter-modality response adjustment means (413).
The speech synthesizer (418) is one of the output means, a device that performs speech synthesis on the basis of the control sequences received from the output media control means (407). The icon control device (419) is one of the output means, a device that performs display control of icons and the like in the window system on the basis of the event sequences received from the output media control means (407). The anthropomorphic-agent control device (420) is one of the output means, a device that controls an anthropomorphic agent on the basis of the commands received from the output media control means (407). The application control device (421) is one of the output means, a device that controls an application on the basis of the application commands and the like received from the output media control means (407). Of course, as will be readily understood by those skilled in the art from reading this specification, the present invention can also be practiced with other output means, such as a projector, a carrier vehicle, or a robot arm, connected in the same manner as the above output devices.
As in (Embodiment 1), (Embodiment 2), and (Embodiment 3), the multimodal interface processing or multimodal interaction processing that simultaneously handles the plurality of data processing means connected as described above has single or multiple input data processed by the plurality of data processing means. That is, when single or multiple input data pass through a data processing means composed only of lightweight sub-means (for example, a low-level processing path such as the data processing means 1 (405)), a response is returned quickly, whereas when they pass through a data processing means that includes sub-means whose responses take time because of complex computation (for example, a high-level processing path such as the data processing means 4 (414)), the response arrives after a correspondingly longer delay. This yields the effect that data processing results at the level the user needs are output within a commensurate time, and, since the earliest response (in this case, via the processing path of the data processing means 1 (405)) reaches the user with a comparatively short latency, the further notable effect that the no-response time of the entire computer system (or the entire application system) using the present invention becomes very short, reducing the user's irritation and sense of poor usability.
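The behavior described above — the same input fanned out to several processing paths at once, with the cheap low-level path answering first and the expensive high-level path following later — can be sketched with threads and a shared result queue. The path names, the simulated delay, and the tuple result format are assumptions of this sketch, not the disclosed implementation:

```python
# Sketch: one input is handed to several data processing means at once;
# responses surface in the order the paths finish, so the cheap path's
# echo reaches the user long before the expensive path's analysis.
import queue
import threading
import time

def low_level_path(data, out):
    # e.g. data processing means 1 (405): pass the input through unchanged.
    out.put(("echo", data))

def high_level_path(data, out):
    # e.g. data processing means 4 (414): simulate costly recognition,
    # semantic analysis, purpose estimation, and strategy determination.
    time.sleep(0.2)
    out.put(("strategy", f"respond to '{data}'"))

def interact(data):
    out = queue.Queue()
    workers = [threading.Thread(target=path, args=(data, out))
               for path in (low_level_path, high_level_path)]
    for w in workers:
        w.start()
    # Deliver each result as soon as it becomes available.
    results = [out.get() for _ in workers]
    for w in workers:
        w.join()
    return results

results = interact("show schedule")
print(results[0])  # the low-level echo arrives first
```

The point of the sketch is that no path blocks any other: the user sees an immediate acknowledgement from the fast path while the slow path keeps working, which is exactly the short no-response time the text claims.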
Industrial Applicability
As described above, according to the embodiments of the present invention, by providing a plurality of data processing means and outputting to the user output results that require only simple computation in a short time and output results that require complex computation after a correspondingly longer time, it is possible to provide an interaction processing method and interaction processing apparatus with which the user can obtain, from a single data input or instruction, output results commensurate with the processing time, thereby improving the efficiency of interaction, and with which the user can easily select, in real time, the output results that match his or her needs.

Claims

1. An interaction processing method for a computer system comprising at least one input means for receiving input of information from the outside, at least two data processing means, and at least one output means for outputting information to the outside, wherein
at least one input means receives input information at least once and passes the input information to at least two data processing means, each of the data processing means performs predetermined data processing in parallel and passes output information resulting from the data processing to at least one of the output means, and the output means outputs the output information.
2. The interaction processing method according to claim 1, wherein one of the data processing means is a means that outputs the input information received via the input means as output information without modification.
3. The interaction processing method according to claim 1, wherein one of the data processing means recognizes the input information received via the input means and synthesizes output information based on the recognition result.
4. The interaction processing method according to claim 1, wherein one of the data processing means:
recognizes the input information received via the input means,
analyzes the meaning based on the recognition result,
generates a command based on the semantic analysis result, and
synthesizes output information based on the command generation result.
5. The interaction processing method according to claim 1, wherein one of the data processing means:
recognizes the input information received via the input means,
analyzes the meaning based on the recognition result,
analyzes the intention based on the semantic analysis result,
plans a response based on the intention analysis result,
generates a command based on the response planning result, and
synthesizes output information based on the command generation result.
6. The interaction processing method according to claim 1, wherein input information management means is provided between the input means and the data processing means, receives the input information from the input means, and passes the input information to a data processing means determined based on classification information given in advance for each input means or on predetermined classification information contained in the input information.
7. The interaction processing method according to claim 1, wherein output information management means is provided between the data processing means and the output means, receives the output information from the data processing means, and passes the output information to an output processing means determined based on classification information given in advance for each output means or on predetermined classification information contained in the output information.
8. The interaction processing method according to claim 7, wherein the output information management means stores and holds the output information passed from the data processing means, passes option information based on a plurality of pieces of stored output information to the output means when such pieces have accumulated, and, when predetermined selection instruction information corresponding to the option information is passed from the data processing means, determines, based on the selection instruction information and the option information, whether to output all or part of the output information.
9. The interaction processing method according to claim 7, wherein, when the next output information is passed from the data processing means before the delivery of the preceding output information from the data processing means to the output means has been completed, the output information management means determines, based on priority information given in advance for each data processing means, whether to interrupt the preceding output processing and output the next output information.
10. The interaction processing method according to claim 7, wherein, when predetermined confirmation instruction information is passed from the data processing means before the delivery of the output information passed from the data processing means to the output means has been completed, the output information management means determines, based on at least one of the occurrence time, the occurrence place, and the originator accompanying the confirmation instruction information, whether to interrupt the output processing of the output means and output.
11. An interaction processing apparatus comprising at least one input means for receiving input of information from the outside, at least two data processing means, and at least one output means for outputting information to the outside, wherein
at least one input means receives input information at least once and passes the input information to at least two data processing means, each of the data processing means performs predetermined data processing in parallel and passes output information resulting from the data processing to at least one of the output means, and the output means outputs the output information.
12. The interaction processing apparatus according to claim 11, wherein one of the data processing means has means for outputting the input information received via the input means as output information without modification.
13. The interaction processing apparatus according to claim 11, wherein one of the data processing means comprises:
means for recognizing the input information received via the input means; and
means for synthesizing output information based on the recognition result.
14. The interaction processing apparatus according to claim 11, wherein one of the data processing means comprises:
means for recognizing the input information received via the input means;
means for analyzing the meaning based on the recognition result;
means for generating a command based on the semantic analysis result; and
means for synthesizing output information based on the command generation result.
15. The interaction processing apparatus according to claim 11, wherein one of the data processing means comprises:
means for recognizing the input information received via the input means;
means for analyzing the meaning based on the recognition result;
means for analyzing the intention based on the semantic analysis result;
means for planning a response based on the intention analysis result;
means for generating a command based on the response planning result; and
means for synthesizing output information based on the command generation result.
16. The interaction processing apparatus according to claim 11, wherein input information management means is provided between the input means and the data processing means, receives the input information from the input means, and passes the input information to a data processing means determined based on classification information given in advance for each input means or on predetermined classification information contained in the input information.
17. The interaction processing apparatus according to claim 11, wherein output information management means is provided between the data processing means and the output means, receives the output information from the data processing means, and passes the output information to an output processing means determined based on classification information given in advance for each output means or on predetermined classification information contained in the output information.
18. The interaction processing apparatus according to claim 17, wherein the output information management means stores and holds the output information passed from the data processing means, passes option information based on a plurality of pieces of stored output information to the output means when such pieces have accumulated, and, when predetermined selection instruction information corresponding to the option information is passed from the data processing means, determines, based on the selection instruction information and the option information, whether to output all or part of the output information.
19. The interaction processing apparatus according to claim 17, wherein, when the next output information is passed from the data processing means before the delivery of the preceding output information from the data processing means to the output means has been completed, the output information management means determines, based on priority information given in advance for each data processing means, whether to interrupt the preceding output processing and output the next output information.
20. The interaction processing apparatus according to claim 17, wherein, when predetermined confirmation instruction information is passed from the data processing means before the delivery of the output information passed from the data processing means to the output means has been completed, the output information management means determines, based on at least one of the occurrence time, the occurrence place, and the originator accompanying the confirmation instruction information, whether to interrupt the output processing of the output means and output.
PCT/JP1998/004295 1998-09-25 1998-09-25 Method and apparatus for processing interaction WO2000019307A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP1998/004295 WO2000019307A1 (en) 1998-09-25 1998-09-25 Method and apparatus for processing interaction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP1998/004295 WO2000019307A1 (en) 1998-09-25 1998-09-25 Method and apparatus for processing interaction

Publications (1)

Publication Number Publication Date
WO2000019307A1 true WO2000019307A1 (en) 2000-04-06

Family

ID=14209063

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP1998/004295 WO2000019307A1 (en) 1998-09-25 1998-09-25 Method and apparatus for processing interaction

Country Status (1)

Country Link
WO (1) WO2000019307A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005321730A (en) * 2004-05-11 2005-11-17 Fujitsu Ltd Dialog system, dialog system implementation method, and computer program
JP2016512364A (en) * 2013-03-15 2016-04-25 クアルコム,インコーポレイテッド System and method for switching processing modes using gestures
JP2020507165A (en) * 2017-11-21 2020-03-05 ジョンアン インフォメーション テクノロジー サービシズ カンパニー リミテッド Information processing method and apparatus for data visualization

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS58219643A (en) * 1982-06-16 1983-12-21 Hitachi Ltd Simulation control system
JPS62213949A (en) * 1986-03-14 1987-09-19 Hitachi Ltd High speed processing device for monitoring process
JPH08263258A (en) * 1995-03-23 1996-10-11 Hitachi Ltd Input device, input method, information processing system and management method for input information


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YASUHARU NANBA, SHUNICHI TANO, HIROYUKI KINUKAWA: "Semantic Analysis Utilizing Fusibility of Specific Attribute of Multimodal Data (in Japanese)", THE TRANSACTIONS OF INFORMATION PROCESSING SOCIETY OF JAPAN, vol. 38, no. 7, 15 July 1997 (1997-07-15), pages 1441 - 1453, XP002927046 *


Similar Documents

Publication Publication Date Title
US6513011B1 (en) Multi modal interactive system, method, and medium
JP2001229392A (en) Rational architecture for executing conversational character with communication of small number of messages
US6253176B1 (en) Product including a speech recognition device and method of generating a command lexicon for a speech recognition device
JP2003076389A (en) Information terminal having operation controlled through touch screen or voice recognition and instruction performance method for this information terminal
Marsic et al. Natural communication with information systems
Hasegawa et al. Active agent oriented multimodal interface system
EP1126436A2 (en) Speech recognition from multimodal inputs
US20080104512A1 (en) Method and apparatus for providing realtime feedback in a voice dialog system
JPH11249773A (en) Device and method for multimodal interface
JP3753882B2 (en) Multimodal interface device and multimodal interface method
Corradini et al. A map-based system using speech and 3D gestures for pervasive computing
JP2001100878A (en) Multi-modal input/output device
US20100223548A1 (en) Method for introducing interaction pattern and application functionalities
JP3822357B2 (en) Interface device and method for multimodal input / output device
CN102473047A (en) Auxiliary touch monitor system which enables independent touch input, and independent touch input method for an auxiliary touch monitor
JP2004192653A (en) Multi-modal interface device and multi-modal interface method
JPH06131108A (en) Information input device
WO2000019307A1 (en) Method and apparatus for processing interaction
JP6950708B2 (en) Information processing equipment, information processing methods, and information processing systems
CN117971154A (en) Multimodal response
US20090125640A1 (en) Ultrasmall portable computer apparatus and computing system using the same
JP2985785B2 (en) Human motion dialogue system
TW200844769A (en) Manipulation device using natural language entry
JP2002108388A (en) Interaction device and recording medium recorded with interactive processing program
JP3894767B2 (en) Dialogue device

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): JP US

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
ENP Entry into the national phase

Ref country code: JP

Ref document number: 2000 572747

Kind code of ref document: A

Format of ref document f/p: F

122 Ep: pct application non-entry in european phase