WO2000019307A1 - Method and apparatus for processing interaction - Google Patents

Method and apparatus for processing interaction

Info

Publication number
WO2000019307A1
Authority
WO
WIPO (PCT)
Prior art keywords
output
information
data processing
input
output information
Prior art date
Application number
PCT/JP1998/004295
Other languages
French (fr)
Japanese (ja)
Inventor
Yasuharu Nanba
Tomohiro Murata
Hirokazu Aoshima
Original Assignee
Hitachi, Ltd.
Priority date
Filing date
Publication date
Application filed by Hitachi, Ltd. filed Critical Hitachi, Ltd.
Priority to PCT/JP1998/004295 priority Critical patent/WO2000019307A1/en
Publication of WO2000019307A1 publication Critical patent/WO2000019307A1/en

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 — Input arrangements for transferring data to be processed into a form capable of being handled by the computer; output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/14 — Digital output to display device; cooperation and interconnection of the display device with other functional units
    • G06F3/1423 — Digital output to display device; cooperation and interconnection of the display device with other functional units, controlling a plurality of local displays, e.g. CRT and flat panel display

Definitions

  • The present invention relates to technology for the user interface of devices such as personal computers and portable information devices. More specifically, it relates to a dialogue processing method and a dialogue processing apparatus that control a computer and its application software on behalf of the user, based on instructions given through multiple input modalities such as voice, handwritten and printed characters, images, gestures, and pen gestures, captured with input devices such as a microphone, camera, pen, mouse, and keyboard, and that respond through multiple output modalities such as synthesized voice, text, still images, moving images, and sound effects, using output devices such as a display, speaker, robot arm, and carrier. Background art
  • An object of the present invention is to provide a plurality of data processing means so that an output result requiring only simple calculation is returned to the user quickly, while an output result requiring complicated calculation is returned after an appropriate time. In this way, output results commensurate with their processing time are obtained from a single piece of data or a single instruction input by the user, increasing the efficiency of the interaction. It is thus an object of the present invention to provide a dialogue processing method and a dialogue processing apparatus with which the user can easily select the desired output result in real time.
  • Another object of the present invention is to provide a dialogue processing method and a dialogue processing apparatus that report intermediate results to the user one after another, thereby shortening the no-response time, reducing the anxiety felt by the user, making the behavior of the dialogue processing itself understandable, and allowing the user to learn an efficient data input method without conscious effort.
  • A plurality of data processing means capable of responding to a single piece of input data are prepared and operated in parallel. Each time output information is obtained from one of these data processing means, a response is output to the user. A quick response is therefore returned by a data processing means consisting only of sub-means that require little computation, while a means whose sub-means perform complicated calculations may respond after some time. Since each response is returned to the user in a time commensurate with its content, the no-response time is minimized, the user's irritation and inconvenience are reduced, and problem (1) is solved.
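The parallel-means idea above can be sketched in a few lines. This is a hypothetical illustration, not the patent's implementation: three stand-in data processing means of different cost receive the same input, run concurrently, and each response is collected as soon as it is ready, so the cheap echo-style means answers first.

```python
import concurrent.futures

def echo_means(data):          # stand-in for Example 1: just reflect the input
    return ("echo", data)

def recognition_means(data):   # stand-in for Example 2: recognition
    return ("recognized", data.upper())

def intention_means(data):     # stand-in for Example 4: intention analysis
    return ("intention", f"user wants: {data}")

def run_parallel(data, means):
    """Run every data processing means on the same input; collect each
    output in completion order, so fast means respond first."""
    results = []
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = [pool.submit(m, data) for m in means]
        for f in concurrent.futures.as_completed(futures):
            results.append(f.result())
    return results

responses = run_parallel("send mail",
                         [echo_means, recognition_means, intention_means])
```

In a real system each means would also stream its result to the output side immediately rather than waiting for the others.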
  • FIG. 1 is a diagram showing Embodiment 1 of the present invention
  • FIG. 2 is a diagram showing Embodiment 2 of the present invention
  • FIG. 3 is a diagram showing Embodiment 3 of the present invention
  • FIG. 4 is a diagram showing Embodiment 4 of the present invention.
  • FIG. 1 shows an embodiment of the present invention.
  • This embodiment comprises a plurality of input means (101 to 103), a plurality of data processing means (104 to 106) that process the input information from the plurality of input means (101 to 103), and a plurality of output means (107 to 109) for outputting the output information from the plurality of data processing means (104 to 106).
  • Each of these means (101 to 109) is realized by combining hardware and software in a computer system. Some means may be realized by sharing the resources of one computer system.
  • Each of these means (101 to 109) may be provided with a processor, storage means for temporarily storing the data to be processed by the processor (such as a memory or an external storage device), and communication means for transferring data to and from the other means.
  • Since the processors, storage means, and communication means may use existing technologies found in ordinary computer systems, their details are not described here. To use the present invention effectively, it is desirable that each of these means (101 to 109) operate independently, in parallel or concurrently. Techniques for running multiple means in parallel or concurrently are known as parallel processing and distributed processing; since existing techniques may be used, their details are likewise not described here.
  • The input means (101 to 103) in the present embodiment include, in addition to the input means of a general computer system (keyboard, mouse, etc.), means for inputting the user's voice, such as a microphone or handset; means for inputting the user's movements, such as a touch panel, pen, or data glove; and means for inputting information viewed by the user (e.g., printed matter) or information about the user (e.g., facial expressions and gestures), such as an image scanner or video camera.
  • The effects of the present invention are not limited to the input means enumerated here. For example, even if a sensor for detecting the user's body temperature, pulse, or brain waves is used as an input means, those skilled in the art will easily understand from this specification that the present invention can still be implemented.
  • The data processing means (104 to 106) in this embodiment receive the input information accepted by the input means (101 to 103). What is desired here is that the input information received by a given input means 101 is usually delivered to at least two data processing means. The present invention can be implemented even when it is delivered to only one data processing means, but this is undesirable with respect to the effects intended by the present invention. Each data processing means then processes independently, in parallel or concurrently. Examples of the data processing means (104 to 106) are described below. Between the sub-means that are the components of each data processing means described below, data may be passed via the storage means or communication means of the computer system.
  • (Example 1 of data processing means) Input information from input means having input devices such as a microphone, camera, pen, or keyboard is stored in a reception buffer, and that input information is then sent as output information from an output buffer to output means having output devices such as a speaker or display. For example, audio input from a microphone is output from a speaker; image data input from a camera is displayed on the display; stroke data entered with a pen is displayed in ink format as a trajectory (handwriting).
  • (Example 2 of data processing means) A recognition process is performed on input information obtained in the same manner as in Example 1 of the data processing means, and the recognition result is output in the same manner as in Example 1.
  • a speech recognition result is output for speech.
  • For image data, an image recognition result is output; in particular, an OCR character recognition result for character images captured in the image data.
  • a handwritten character recognition result is output from the stroke data.
  • These recognition results are first converted to codes (character codes, etc.) and then output in another modality.
  • For example, voice input is displayed as characters, or speech is synthesized and output from image input.
  • The term "modality" in the present application means "the type of exchange channel used by a person and a computer for communication". Specifically, for input it covers voice, handwritten strokes, images, keyboard typing, gestures, and so on; for output it covers synthesized voice, character display, graphic display, application program operation, and so on.
  • (Example 3 of data processing means) A semantic analysis process is performed on the recognition result obtained in the same manner as in Example 2 of the data processing means, and a command is generated based on the semantic analysis result.
  • Based on that command, the output devices and application programs are operated and respond in the same manner as in Example 2 of the data processing means. For example, if the user says "Send mail", the message "Start mail system" appears in a window, and the mail system, which is the application system, starts.
  • (Example 4 of data processing means) An intention analysis is performed on the semantic analysis result obtained in the same manner as in Example 3 of the data processing means, and a response strategy is determined based on the intention analysis result. The output is then produced, based on the determined response strategy, in the same manner as in Example 3. For example, if the character "Return" is input by hand and the voice input "How is this done?" arrives at the same time, the response strategy becomes "explain the method of returning": an animated icon is displayed, and a help message is output with synthesized speech synchronized to it.
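The four example data processing means form a natural nesting, where each deeper means reuses the shallower stages' results. A minimal sketch, with all function names and the command table invented for illustration:

```python
def recognize(raw):                 # Example 2: recognition process
    return raw.strip().lower()

def semantic_analysis(text):        # Example 3: map recognition result to a command
    commands = {"send mail": "START_MAIL_SYSTEM"}   # illustrative table
    return commands.get(text, "UNKNOWN")

def intention_analysis(command):    # Example 4: choose a response strategy
    if command == "START_MAIL_SYSTEM":
        return "show 'Start mail system' message, then launch the mailer"
    return "ask the user to rephrase"

def process(raw):
    """Run all four levels on one input, keeping each level's output."""
    out = {"echo": raw}                                    # Example 1
    out["recognition"] = recognize(raw)                    # Example 2
    out["command"] = semantic_analysis(out["recognition"]) # Example 3
    out["strategy"] = intention_analysis(out["command"])   # Example 4
    return out

result = process("  Send Mail ")
```

When run in parallel, each level's entry in `out` would be reported to the user as soon as it is computed.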
  • the output means (107 to 109) in the present embodiment includes, for example, a speech synthesizer, a robot arm, and the like in addition to the output means (display, speaker, etc.) of a normal computer system.
  • When displaying on a display, there are various formats such as text, still images, moving images, icons, and combinations thereof.
  • When using a speaker for output, there are likewise modes such as BGM, sound effects, voice, and combinations thereof. In the present invention, these may be divided among different output means, or several of them may be combined and implemented as one output means. Further, the effects of the present invention are not limited to the output means enumerated here.
  • A response may also be output indirectly, by controlling an application system or by issuing a command to a database or database system. In this sense, those skilled in the art will easily understand from this specification that the present invention can be implemented even if a television or an air conditioner is used as an output means.
  • The same effect as described above can be obtained when a data processing means performs processing that outputs the intermediate results of its sub-means.
  • Various interpretation processes are performed on the computer system side. By notifying the user of the results of these processes one after another, the no-response time can be shortened to reduce the anxiety felt by the user, and the behavior of the dialogue processing itself becomes understandable. Further, by receiving such responses, the user can learn which input means to use to obtain the desired response from the computer system.
  • each means can be implemented by an independent program (or device).
  • Each means can be individually distributed as a computer system.
  • Such implementations can usually make direct use of techniques such as distributed processing and agent-oriented programming. These techniques are well known to those skilled in the art and are not described further.
  • FIG. 2 shows another embodiment of the present invention.
  • This embodiment comprises a plurality of input means (101 to 103); input information management means (201) for receiving the input information from the plurality of input means (101 to 103); a plurality of data processing means (104 to 106) for processing the input information from the input information management means (201); output information management means (202); and a plurality of output means (107 to 109).
  • The main difference from Embodiment 1 is that input information management means (201) and output information management means (202) are provided. The other means are the same as those described in the first embodiment and are not described again here.
  • Each of these means may be provided with a processor, storage means for temporarily storing the data to be processed by the processor (such as a memory or an external storage device), and communication means for transferring data to the other means.
  • the input information management means (201) in this embodiment receives all input information from the plurality of input means (101 to 103).
  • The input information is passed to the appropriate (one or more) data processing means based on classification information determined in advance for each input means, or on classification information derived from the input information itself.
  • The classification information determined in advance for each input means specifies which data processing means should receive input from that input means.
  • For example, in the case of an input means such as a microphone, data processing means that include speech recognition or speaker recognition are associated with it in advance from among the data processing means (104 to 106), and it is determined in advance that there is no correspondence with data processing means that do not include these (for example, a data processing means comprising only character recognition means).
  • The classification information that can be derived from the input information associates specific input information in advance with the data processing means to which it should be delivered. For example, input information with an unambiguous intention, such as an emergency stop, is determined to be delivered preferentially to a specific one of the data processing means (104 to 106).
  • In this way, the input information is delivered to the appropriate data processing means for each input means. As a result, malfunction of the data processing means can be avoided, and the load of error checking and error handling can be reduced.
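The routing performed by the input information management means can be sketched as a small dispatch table. The table contents, means names, and the `EMERGENCY_STOP` sentinel are all hypothetical; the point is the two classification sources: a fixed per-input-means mapping, and classification derived from the input itself, which takes precedence.

```python
# Classification information fixed in advance for each input means.
ROUTES = {
    "microphone": ["speech_recognizer", "speaker_recognizer"],
    "pen":        ["character_recognizer"],
}

def route_input(source, data):
    """Return the data processing means that should receive this input."""
    # Classification derived from the input information itself overrides
    # the per-input-means table (e.g., an unambiguous emergency stop).
    if data == "EMERGENCY_STOP":
        return ["emergency_handler"]
    return ROUTES.get(source, [])

targets = route_input("microphone", "send mail")
```

Input means with no entry get an empty target list, which is where error handling would flag an unroutable input.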
  • the output information management means (202) in the present embodiment receives all the output information from the plurality of data processing means (104 to 106).
  • The output information is passed to the appropriate output means based on the classification information determined in advance for each output means and the classification information derived from the output information.
  • The classification information determined in advance for each output means specifies which data processing means should deliver output to that output means. For example, in the case of an output means such as a speaker, data processing means that include voice synthesis or sound-effect synthesis are associated with it in advance from among the data processing means (104 to 106), and it is determined in advance that there is no correspondence with data processing means that do not include these (for example, a data processing means consisting only of moving-image generation means).
  • The classification information that can be derived from the output information associates specific output information in advance with the output means to which it should be delivered. For example, output information with a unique control content, such as a volume reduction, is determined to be delivered preferentially to a specific one of the output means (107 to 109).
  • In this way, the output information is delivered to the appropriate output means. As a result, malfunction of the output means can be avoided, and the load of error checking and error handling can be reduced.
  • (Example 2 of data processing means) A process that recognizes the voice and responds with the result of converting it into the character string "mail".
  • (Example 3 of data processing means) A process that interprets the content of the voice as a command to the application system and responds by, for example, activating the mail system.
  • (Example 4 of data processing means) A process that interprets the intention of the voice and responds with a help suggestion such as "Who will you send it to?".
  • With such a plurality of data processing means, the output information management means (202) can sequentially transfer the data processing results to the appropriate output means.
  • While the output information management means (202) is sequentially transferring data processing results to the output means, it may interrupt a transfer based on priority information determined in advance for each data processing means. For example, suppose the priorities are determined in ascending order from (Example 1 of data processing means) to (Example 4 of data processing means). When the data processing result of (Example 4 of data processing means) arrives, delivery of a lower-priority result already in progress is interrupted, and delivery of the (Example 4) result to the output means is started. Conversely, a result whose priority is lower than that of the (Example 4) result does not cause an interruption. By determining a priority for each data processing means in this way, a more appropriate data processing result obtained later can still be output, even at the cost of interrupting data already being output. In the above example the priority is fixed in advance for each data processing means for the sake of simplicity, but the priority information may also be changed according to the work situation, a designation from the user, the content of the input data, and so on.
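The priority-based interruption rule can be captured in a small state machine. This is a hedged sketch with invented priority values: a later, higher-priority result preempts the result currently being delivered, while a lower-priority late arrival is dropped.

```python
# Illustrative priorities for the four example data processing means.
PRIORITY = {"echo": 1, "recognition": 2, "command": 3, "intention": 4}

class OutputManager:
    """Stand-in for the output information management means (202)."""

    def __init__(self):
        self.current = None      # (means, payload) now being delivered
        self.delivered = []      # history of results that reached output

    def deliver(self, means, payload):
        # A result only interrupts the current delivery if its data
        # processing means has strictly higher priority.
        if self.current and PRIORITY[means] <= PRIORITY[self.current[0]]:
            return False         # lower or equal priority: do not interrupt
        self.current = (means, payload)
        self.delivered.append((means, payload))
        return True

mgr = OutputManager()
mgr.deliver("echo", "send mail")                          # starts delivering
interrupted = mgr.deliver("intention", "start mailer?")   # preempts the echo
ignored = mgr.deliver("recognition", "send mail")         # too late, too low
```

Making `PRIORITY` mutable would model the variant where priorities change with the work situation or the user's designation.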
  • If predetermined confirmation instruction information is input (for example, the "OK" button is pressed) while the output information management means (202) is sequentially delivering data processing results to the output means, the output information management means (202) may stop delivering further data processing results. At this time, information accompanying the confirmation instruction, such as the time of occurrence, the place of occurrence, and the person who issued it, may be used to decide whether to actually stop. In this way, the output information management means (202) can be prevented from inadvertently continuing to deliver data processing results to the output means forever.
  • The output information management means (202) includes storage means (such as a memory) for storing the output information from the plurality of data processing means (104 to 106).
  • Option information corresponding to the stored output information is passed to the output means (107 to 109), and predetermined selection instruction information corresponding to the option information is then input again through the input means.
  • For example, the output information management means (202) presents, as option information:
  • Option 1 Voice playback
  • Option 2 Voice recognition result output
  • Option 3 Interpretation as command
  • Option 4 Interpretation of intention
  • When the user selects one, the output information management means (202) outputs the corresponding data processing result. For example, if option 3 is selected, the process of outputting a start command to the mail system, which is the result of interpreting the voice content "mail" as a command, is executed.
  • By outputting the option information together and letting the user select from it, the user can choose the data processing result that was originally intended.
  • The user can thereby have the appropriate data processing carried out according to the task at hand.
  • The options output in this way also let the user clearly understand the results and behavior of the dialogue processing along the way (for example, that speech recognition succeeded but the meaning was not understood). As a side effect, the user comes to understand which data input methods suit the system.
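The option mechanism above amounts to buffering one result per data processing level and executing only the one the user picks. A minimal sketch, with the option labels taken from the list above and the action strings invented for illustration:

```python
# Buffered data processing results, keyed by option number.
options = {
    1: ("Voice playback", "play stored audio"),
    2: ("Voice recognition result output", "display 'mail'"),
    3: ("Interpretation as command", "issue start command to mail system"),
    4: ("Interpretation of intention", "ask who to send it to"),
}

def select_option(choice):
    """Return the action for the option the user selected."""
    label, action = options[choice]
    return action

# The user presses "3" (selection instruction information via an input means).
action = select_option(3)
```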
  • FIG. 3 shows another embodiment of the present invention.
  • This embodiment is an example in which sub-means that can be used in common among the data processing means (104 to 106) in FIGS. 1 and 2 are combined into one.
  • The input means (101 to 103), input information management means (201), output information management means (202), and output means (107 to 109) are almost the same as those described so far, so their detailed description is omitted.
  • the data processing means 1 (301) is composed of input information management means (201) and output information management means (202) as sub-means.
  • the input information management means (201) receives input information from the plurality of input means (101 to 103) and passes the input information to the input information recognition means (303) and the output information management means (202).
  • the output information management means (202) transfers the data processing result transferred from the input information management means (201) or the output information synthesis means (304) to the plurality of output means (107-109).
  • The data processing means 2 (302) has, in addition to the sub-means of the data processing means 1 (301) (that is, the input information management means (201) and output information management means (202)), input information recognition means (303) and output information synthesizing means (304) as sub-means.
  • the input information recognizing means (303) receives the input information from the input information managing means (201), performs a recognition process, and delivers the recognition result of the input information to the semantic analyzing means (306) and the output information synthesizing means (304). .
  • the output information synthesizing means (304) receives the data processing result from the input information recognition means (303) or the command generating means (307), performs a synthesizing process, and delivers the data processing result to the output information managing means (202). .
  • The data processing means 3 (305) has, in addition to the sub-means of the data processing means 2 (302) (i.e., the input information management means (201), output information management means (202), input information recognition means (303), and output information synthesizing means (304)), semantic analysis means (306) and command generation means (307) as sub-means.
  • the semantic analysis means (306) receives the recognition result of the input information from the input information recognition means (303), performs a semantic analysis process, and delivers the semantic analysis result to the intention analysis means (309) and the command generation means (307).
  • the command generating means (307) receives the data processing result from the semantic analyzing means (306) or the response planning means (310), performs a command generating process, and delivers the data processing result to the output information synthesizing means (304).
  • The data processing means 4 (308) has, in addition to the sub-means of the data processing means 3 (305) (i.e., the input information management means (201), output information management means (202), input information recognition means (303), output information synthesizing means (304), semantic analysis means (306), and command generation means (307)), intention analysis means (309) and response planning means (310) as sub-means.
  • the intention analysis means (309) receives the result of the semantic analysis of the input information from the semantic analysis means (306), analyzes the intention, and delivers the result to the response planning means (310).
  • The response planning means (310) receives the data processing result from the intention analysis means (309), drafts a response, and delivers the data processing result to the command generation means (307).
  • Each of these means and sub-means may be provided with a processor, storage means (memory, external storage device, etc.) for temporarily storing the data processed by the processor, and communication means for passing data to the other means.
  • The processing routes of the data processing means in this embodiment are substantially equivalent to the following four paths:
  • (Processing path 1 of data processing means) From the input information management means (201) directly to the output information management means (202).
  • (Processing path 2 of data processing means) From the input information management means (201), through the input information recognition means (303) and the output information synthesizing means (304), to the output information management means (202).
  • (Processing path 3 of data processing means) From the input information management means (201), through the input information recognition means (303), the semantic analysis means (306), the command generation means (307), and the output information synthesizing means (304), to the output information management means (202).
  • (Processing path 4 of data processing means) From the input information management means (201), through the input information recognition means (303), the semantic analysis means (306), the intention analysis means (309), the response planning means (310), the command generation means (307), and the output information synthesizing means (304), to the output information management means (202).
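The four processing paths share sub-means and differ only in how many stages sit between input management and output management. That can be sketched by composing shared stage functions into chains; the stage functions here are trivial string-transforming stand-ins, not real recognizers.

```python
def make_path(*stages):
    """Compose shared sub-means (stage functions) into one processing path."""
    def run(data):
        for stage in stages:
            data = stage(data)
        return data
    return run

# Stand-ins for the shared sub-means; each just tags the data it handled.
recognize   = lambda d: d + " >recognized"    # input information recognition (303)
analyze     = lambda d: d + " >analyzed"      # semantic analysis (306)
intend      = lambda d: d + " >intended"      # intention analysis (309)
plan        = lambda d: d + " >planned"       # response planning (310)
gen_command = lambda d: d + " >command"       # command generation (307)
synthesize  = lambda d: d + " >synthesized"   # output information synthesis (304)

path1 = make_path()                                             # straight through
path2 = make_path(recognize, synthesize)
path3 = make_path(recognize, analyze, gen_command, synthesize)
path4 = make_path(recognize, analyze, intend, plan, gen_command, synthesize)

out4 = path4("hi")
```

Because the stage objects are shared, the four paths reuse one implementation of each sub-means, which is exactly the point of this embodiment.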
  • FIG. 4 shows another embodiment of the present invention.
  • This embodiment is an example of an implementation, called multimodal dialogue processing, that handles a plurality of data processing means simultaneously while sharing common sub-means as shown in FIG. 3.
  • the voice input device (101) is one of input means, and is a device for inputting voice from a user via a microphone, a transmitter, or the like.
  • The stroke input device (402) is one of the input means, and is a device through which the user inputs strokes by hand via a pen, tablet, touch panel, or the like.
  • The image input device (103) is one of the input means, and is a device for inputting data such as printed matter viewed by the user, via an image scanner, a CCD camera, or the like.
  • The text input device (104) is one of the input means, and is a device through which the user inputs text via a keyboard or the like.
  • Even if input means such as a mouse, a data glove, or a line-of-sight recognition device are connected in the same manner as the input devices described above, those skilled in the art who read this specification will easily understand that the present invention can be implemented.
  • The data processing means 1 (405) includes input media control means (406) and output media control means (407) as sub-means.
  • The input media control means (406) controls the input devices (that is, the voice input device (101), the stroke input device (102), the image input device (103), and the text input device (104)), receives the raw input information from each input device, such as voice data, stroke data (usually a sequence of coordinates), image data, and character codes, formats the data, and hands it over to the individual modality recognition means (409) and the output media control means (407).
  • it may be executed as a separate processing program for each input device.
  • The output media control means (407) controls the output devices (that is, the voice synthesizer (418), the icon control device (419), the anthropomorphic agent control device (420), and the application control device (421)), and delivers to each output device the output information it receives: the raw data passed from the input media control means (406) and, from the individual modality generation means (410), the control sequences for the speech synthesizer, the event sequences for the window system, the commands for the anthropomorphic agent, the application commands, and so on.
  • The data processing means 2 (408) has, in addition to the sub-means of the data processing means 1 (405) (that is, the input media control means (406) and the output media control means (407)), individual modality recognition means (409) and individual modality generation means (410) as sub-means.
  • The individual modality recognition means (409) receives the voice data, stroke data, image data, and character codes as input information from the input media control means (406), recognizes them as spoken language, handwritten characters, printed characters, and character codes, respectively, and hands the results to the semantic analysis means (415) and the inter-modality recognition adjustment means (412).
  • Since the preferred recognition algorithm and recognition unit differ for each modality, different implementation forms may be used as programs. These recognition processes may also use existing technologies.
  • The individual modality generation means (410) converts the recognition results passed from the individual modality recognition means (409) and the output information passed from the inter-modality response adjustment means (413) into control sequences for the speech synthesizer, event sequences for the window system, commands for the anthropomorphic agent, commands for the application, and so on, and transfers them as output information to the output media control means (407).
  • The data processing means 3 (411) has, in addition to the sub-means of the data processing means 2 (408) (i.e., the input media control means (406), individual modality recognition means (409), individual modality generation means (410), and output media control means (407)), inter-modality recognition adjustment means (412) and inter-modality response adjustment means (413) as sub-means.
  • The inter-modality recognition adjustment means (412) determines an appropriate combination based on the plurality of recognition results from the individual modality recognition means (409), and hands this processing result to the semantic analysis means (415) and the inter-modality response adjustment means (413).
  • The process for determining an appropriate combination is performed, for example, by converting the recognition results into a data structure independent of each modality (for example, a recognition lattice structure), evaluating them against a collection of components for each modality called a "dictionary" and a set of rules on the temporal and positional arrangement of the components called a "grammar", and determining a candidate combination of the multiple recognition results.
  • a grammar a set of rules for the temporal and positional arrangement of the components
  • other processing methods may be used.
  • The inter-modality response adjustment means (413) prepares for output the appropriately combined recognition result passed from the inter-modality recognition adjustment means (412).
  • Data processing means 4 (414) comprises the sub-means of data processing means 3 (411) (i.e., the input media control means (406), individual modality recognition means (409), inter-modality recognition adjustment means (412), inter-modality response adjustment means (413), individual modality generation means (410), and output media control means (407)), together with the semantic analysis means (415), the objective estimation means (416), and the strategy determination means (417).
  • The semantic analysis means (415) receives the processing result from the individual modality recognition means (409), analyzes its semantics, and transfers the result to the objective estimation means (416).
  • The objective estimation means (416) receives the result of the semantic analysis from the semantic analysis means (415), estimates the user's objective, and transfers it to the strategy determination means (417).
  • The strategy determination means (417) receives the objective estimation result from the objective estimation means (416), determines a response strategy, derives the information to be returned to the user, and delivers it to the inter-modality response adjustment means (413).
  • The voice synthesizer (418) is one of the output means; it performs voice synthesis based on the control sequence received from the output media control means (407).
  • The icon control device (419) is one of the output means; it controls the display of icons and the like in the window system based on the event sequence received from the output media control means (407).
  • The anthropomorphic agent control device (420) is one of the output means; it controls the anthropomorphic agent based on instructions received from the output media control means (407).
  • The application control device (421) is one of the output means; it controls an application based on application commands and the like received from the output media control means (407).
  • Even if other output means, such as projectors, transport vehicles, and robot arms, are connected in the same manner as the output devices above, the present invention can still be practiced, as those skilled in the art will readily understand from this specification.
  • Single or plural input data are processed by plural data processing means. That is, when one or more input data pass through a data processing means composed only of sub-means requiring little effort (a low-level processing path, such as data processing means 1 (405)), a response is returned quickly; when they pass through a data processing means containing sub-means that take time to respond because they perform complex calculations (a high-level processing path, such as data processing means 4 (414)), a response is returned after a corresponding delay.
  • As a result, the non-response time of the application becomes very short, yielding the remarkable effect of reducing the user's irritation and inconvenience.
  • An output result requiring simple calculation processing can be obtained in a short time, and an output result requiring complicated calculation processing can be obtained after a correspondingly longer time.
  • An output result commensurate with the processing time can thus be obtained from a single data/command input by the user, making it possible to provide a dialogue processing method and a dialogue processing apparatus that increase the efficiency of interaction and allow the user to easily select, in real time, the output result that matches his or her needs.
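The fast and slow processing paths summarized above can be sketched as two concurrent workers feeding a shared response queue, with a result shown to the user each time one arrives. This is an illustrative sketch only, not part of the original specification; the function names and the 0.2-second stand-in for complex calculation are assumptions chosen to mirror reference numerals 405 and 414.

```python
import queue
import threading
import time

def data_processing_means_1(input_data, responses):
    # Low-level path (cf. 405): no heavy computation, so output
    # information is available almost immediately.
    responses.put(("echo", input_data))

def data_processing_means_4(input_data, responses):
    # High-level path (cf. 414): semantic analysis, objective estimation,
    # and strategy determination take longer before a response is ready.
    time.sleep(0.2)  # stands in for the complex calculation
    responses.put(("interpreted", f"intent of {input_data!r}"))

responses = queue.Queue()
for means in (data_processing_means_1, data_processing_means_4):
    threading.Thread(target=means, args=("send mail", responses)).start()

# Output side: respond to the user every time output information arrives,
# so the quick result appears first and the richer result is added later.
results = [responses.get() for _ in range(2)]
```

Because the low-level worker never sleeps, the echo response reaches the queue before the interpreted one, which is exactly the staged-response behavior the bullets describe.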

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A plurality of data processing means, each capable of processing a single piece of data, are operated in parallel, and a response is sent to the user each time output information is obtained from one of them. The processing results from the various paths are output to the user, either all at once or sequentially, as options, so that the user can select the intended data processing result.

Description

Dialogue processing method and dialogue processing apparatus

Technical Field
The present invention relates to techniques for using the user interface of personal computers, portable information devices, and the like. More specifically, it relates to a dialogue processing method and a dialogue processing apparatus that, based on instructions given through a plurality of input modalities, such as voice, handwritten characters, images of printed characters, gestures, and pen gestures, using input devices such as a microphone, camera, pen, mouse, and keyboard, control a computer, application software, and the like on behalf of the user, and that respond through a plurality of output modalities, such as voice, text, still images, moving images, and sound effects, using output devices such as a display, speaker, robot arm, and transport vehicle.

Background Art
One form of dialogue processing uses a multimodal user interface. As an example of the prior art, the literature "Neal, J. G. and Shapiro, S. C.: Intelligent Multi-media Interface Technology, in Sullivan, J. W. and Tyler, S. W., editors, Intelligent User Interfaces, pp. 11-43, ACM Press, Addison-Wesley, New York (1991)" describes integrating data from a plurality of input devices, performing analysis processing, output planning processing, and the like in a single processing device, and then responding using a plurality of output devices.
However, dialogue processing methods based on the above technology have the following problems.

Problem (1): The delay of an element process leads to a delay of the entire dialogue process.
That is, processes that handle data in an integrated manner, such as semantic analysis and output planning (excluding the input/output processing of each modality), operate sequentially on the input data, so the sum of the times required by the individual processes strongly affects the response performance of the dialogue process as a whole. In particular, processes that perform highly complex calculations, such as semantic analysis, usually take a long time, and the non-response time in the meantime makes the user feel irritated and inconvenienced.
Problem (2): The user cannot choose between an output result produced by low-level processing alone and a high-level output result.
That is, since there is only one processing flow, the input of a given piece of data always yields only an output result processed to the same level. For example, when voice is input through a microphone, the user has no way to choose between mere recorded data and a response based on interpreting the voice instruction from a speech recognition result. Likewise, when pen input is performed (by handwriting) on a pen computer or the like, the user has no way to choose among a mere ink display (bitmap display) of the handwritten figure, a handwritten character recognition result based on stroke recognition, and a response based on interpreting an instruction from that handwritten character recognition result.

Disclosure of the Invention
An object of the present invention is to provide a dialogue processing method and a dialogue processing apparatus in which, by providing a plurality of data processing means, output results requiring only simple calculation are delivered to the user in a short time and output results requiring complex calculation are delivered after a commensurate time, so that the user can obtain, from a single data/instruction input, output results matching the processing time, the efficiency of interaction can be increased, and the user can easily select, in real time, the output result that matches his or her needs.
Another object of the present invention is to provide a dialogue processing method and a dialogue processing apparatus that, by reporting intermediate results to the user successively, shorten the non-response time to reduce the user's anxiety, make the behavior of the dialogue process itself understandable, and let the user unconsciously learn efficient data input methods.
To solve problem (1), a plurality of data processing means, each capable of responding to a single piece of data, are prepared and operated in parallel. Each time output information based on one of these data processes is obtained, a response is output to the user. A data processing means composed only of sub-means requiring little effort therefore returns a response quickly, while sub-means that take time to respond because they perform complex calculations can respond after a corresponding delay. Since the response content the user needs is returned in a commensurate time, the non-response time is minimized; the user's irritation and sense of inconvenience are thereby reduced, and problem (1) is solved.
Next, to solve problem (2), a plurality of data processes, each capable of responding to a single piece of data, are prepared, and the data processing results from the individual paths are output to the user as options, either all at once or added sequentially, so that the user can choose the intended data processing result. For example, even when the user simply says "send mail" to the computer, various data processes are conceivable, from low-level to high-level processing results: (1) a process that stores (records) the voice as waveform data and responds by playing that data back; (2) a process that recognizes the voice and responds with the character string "send mail"; (3) a process that interprets the content of the voice as a command to an application system and responds, for example, by starting the mail system; and (4) a process that interprets the intent of the voice and responds with help or a suggestion such as "To whom shall I send it?". Because the user can choose among the results of such processes, the data processing appropriate to the task at hand can be performed. In this way, problem (2) is solved.
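The idea of offering several levels of interpretation of one voice input as selectable options can be sketched as a set of independent interpreters whose results are collected together. This is a hypothetical sketch; the function names and returned strings are illustrative assumptions, not the patent's API.

```python
def record(audio):
    # Level 1: store the voice as waveform data; offer playback as a response.
    return {"level": 1, "option": "play back recording", "data": audio}

def recognize(audio):
    # Level 2: speech recognition to a character string.
    return {"level": 2, "option": "text: 'send mail'"}

def interpret_command(audio):
    # Level 3: interpret the content as an application command.
    return {"level": 3, "option": "start the mail system"}

def interpret_intent(audio):
    # Level 4: interpret the intent and propose help.
    return {"level": 4, "option": "ask: 'To whom shall I send it?'"}

def offer_options(audio):
    # Each processing path contributes one candidate; the user picks one.
    return [f(audio) for f in (record, recognize, interpret_command,
                               interpret_intent)]

options = offer_options(b"...waveform...")
```

In an actual system the four interpreters would run in parallel and the list of options would grow as each path finishes, rather than being computed in one pass as here.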
Incidentally, the options output in this way let the user see the intermediate results and behavior of the dialogue process concretely and clearly (for example, that character recognition failed, or that speech recognition succeeded but the meaning was not understood), so the user comes to know which data input method suits him or her.

Brief Description of the Drawings
FIG. 1 shows Embodiment 1 of the present invention, FIG. 2 shows Embodiment 2, FIG. 3 shows Embodiment 3, and FIG. 4 shows Embodiment 4.

Best Mode for Carrying Out the Invention
Embodiments of the present invention are described below with reference to the drawings.
(Embodiment 1) FIG. 1 shows an embodiment of the present invention. As shown in FIG. 1, this embodiment comprises a plurality of input means (101-103), a plurality of data processing means (104-106) that process the input information from these input means (101-103), and a plurality of output means (107-109) that output the output information from these data processing means (104-106). In a computer system, each of these means (101-109) is realized by a combination of hardware and software. Several means may be realized by sharing the resources of a single computer system. Each of these means (101-109) may also include a processor, storage means (memory, an external storage device, and the like) for temporarily holding the data to be processed by the processor, and communication means for passing data to other means. Since these processors, storage means, and communication means may use existing technologies employed in ordinary computer systems, their details are not described here. To use the present invention effectively, it is desirable that each of these means (101-109) be processed independently, in parallel or concurrently. Techniques for processing a plurality of means in parallel or concurrently may likewise use existing technologies known from parallel processing, distributed processing, and the like, so their details are not described here.
The input means (101-103) in this embodiment include, in addition to the input means of an ordinary computer system (keyboard, mouse, and the like), means for inputting the user's voice, such as a microphone or telephone handset; means for inputting the user's movements, such as a touch panel, pen computer, or data glove; and means for inputting information the user is looking at (printed matter and the like) or the user himself (facial expressions, gestures, and the like), such as an image scanner or video camera. Of course, the effects of the present invention are not limited to the input means enumerated here. For example, those skilled in the art will readily understand from this specification that the present invention can be practiced even when sensors that detect the user's body temperature, pulse, brain waves, or the like are used as input means.
The data processing means (104-106) in this embodiment are handed the input information accepted by such input means (101-103). What should be noted here is that the input information accepted by, for example, a certain input means 101 is normally handed to at least two data processing means. The present invention can be practiced even when the information is handed to only one data processing means, but this is undesirable with respect to the effects the invention intends. Each data processing means operates independently, in parallel or concurrently. Examples of the data processing means (104-106) are described below. Between the sub-means constituting each data processing means described below, the processing of the computer system may be carried out via storage means or communication means.
(Example 1 of the data processing means) Input information from input means having input devices such as a microphone, camera, pen, or keyboard is stored in a receive buffer, and that input information is sent as output information from a send buffer to output means having output devices such as a speaker or display. For example, voice input from a microphone is output from a speaker, image data input from a camera is displayed on the display, or the trajectory (handwriting) of stroke data input from a pen is displayed in ink form.

(Example 2 of the data processing means) Recognition processing is applied to input information input as in Example 1, and the recognition result is output as in Example 1. For example, a speech recognition result is output for voice; an image recognition result is output for image data, in particular an OCR character recognition result for images of characters captured in the image data; or a handwritten character recognition result is output for stroke data. Normally these recognition results are first converted to codes (character codes and the like) and then output in another modality, for example displaying voice input as text or producing synthesized-speech output from image input. Hereinafter, "modality" in this application means "the kind of exchange channel that people and computers use for communication". Concretely, for input it covers forms such as voice, handwritten strokes, images, keyboard typing, and gestures; for output, forms such as synthesized speech, text display, graphic display, and application program operation responses.
(Example 3 of the data processing means) Semantic analysis processing is applied to a recognition result obtained as in Example 2, a command is generated based on the semantic analysis result, and output devices and application programs are operated in response, as in Example 2, based on the generated command. For example, when the user says "send mail" by voice, the message "Starting the mail system" is displayed in a window, and the mail system, an application system, is started.
(Example 4 of the data processing means) Intent analysis is performed on a semantic analysis result obtained as in Example 3, a response strategy is determined based on the intent analysis result, and output is produced as in Example 3 based on the determined response strategy. For example, when the user handwrites the characters "return" while simultaneously saying "How do I do this?" by voice, the response strategy "explain how to return it" is determined, and a concrete icon animation and a synchronized synthesized-speech help message are output. Besides Examples 1-4 above, there are various data processing means depending on the combination of sub-means constituting the data processing means.
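Examples 1-4 form a nested family of pipelines: each higher-level data processing means reuses the stages of the one below it and appends one further stage. The following sketch makes that nesting explicit; the stage names and string outputs are assumptions for illustration, not the specification's terminology.

```python
# Illustrative sketch of how data processing means Examples 1-4 nest:
# each higher-level means reuses the lower one's stages and adds one more.

def echo(data):            # Example 1: receive buffer in, send buffer out
    return data

def recognize(data):       # Example 2: recognition to a code
    return f"recognized({data})"

def make_command(result):  # Example 3: semantic analysis -> command
    return f"command({result})"

def make_strategy(cmd):    # Example 4: intent analysis -> response strategy
    return f"strategy({cmd})"

PIPELINES = {
    1: [echo],
    2: [echo, recognize],
    3: [echo, recognize, make_command],
    4: [echo, recognize, make_command, make_strategy],
}

def run(level, data):
    # Run one data processing means' stages in order over the input.
    for stage in PIPELINES[level]:
        data = stage(data)
    return data
```

Running all four levels on the same input yields progressively higher-level output information, which is what lets the shallow pipelines respond sooner than the deep ones.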
The output means (107-109) in this embodiment include, in addition to the output means of an ordinary computer system (display, speaker, and the like), a speech synthesizer, a robot arm, and the like. When display output is used, there are forms such as text, still images, moving images, icons, and combinations thereof; when a speaker is used, there are forms such as background music, sound effects, voice, and combinations thereof. In the present invention, these may each be implemented as separate output means, or several forms may be combined into a single output means. Furthermore, the effects of the present invention are not limited to the output means enumerated here. For example, the response may be output indirectly by controlling an application system or issuing commands to an OS or database system. In this sense, those skilled in the art will readily understand from this specification that the present invention can be practiced even when a television or an air conditioner is regarded and used as output means.
As described above, because a plurality of data processing means perform data processing independently and in parallel, a response is returned quickly when the data passes through a data processing means composed only of sub-means requiring little effort, and a response arrives after a corresponding time when it passes through data processing containing sub-means that take time to respond because of complex calculations. Accordingly, the response content the user needs is always output in a commensurate time, and because the earliest response content is presented to the user right away, the non-response time of the computer system as a whole (or the application system as a whole) is minimized, giving the pronounced effect of reducing the user's irritation and sense of inconvenience. As a simple variation of the present invention, the same effect is obtained when the data processing means is made to output the intermediate results of its sub-means. Moreover, by informing the user successively of the computer system's various interpretations and intermediate processing results in this way, the non-response time can be shortened to reduce the user's anxiety, and the behavior of the dialogue process itself can be made understandable. Furthermore, by receiving such responses, the user can learn which input means will yield the desired response from the computer system.
Thus, the present invention is particularly effective for multimodal interface systems and multimodal dialogue systems in which a plurality of pieces of information to be input are input simultaneously and a plurality of pieces of information to be output are output simultaneously. Of course, it can also be practiced with interfaces in which a plurality of pieces of information are input (or output) sequentially while switching among the corresponding input devices (or output devices), with interfaces that input and output previously prepared information (for example, multimedia interfaces, graphical user interfaces, command line interfaces, and the like), and with dialogue systems. As can be seen from FIG. 1, in this embodiment each means can be implemented as an independent program (or apparatus). As a computer system, the means can each be distributed individually. Such an implementation can normally make direct use of techniques such as distributed processing and agent-oriented programming. Since these techniques are well known to those skilled in the art, they are not described further.
(Embodiment 2) FIG. 2 shows another embodiment of the present invention. As shown in FIG. 2, this embodiment comprises a plurality of input means (101-103); input information management means (201) that receives the input information from these input means (101-103); a plurality of data processing means (104-106) that process the input information from the input information management means (201); output information management means (202) that receives the output information from these data processing means (104-106); and a plurality of output means (107-109) that output the output information from the output information management means. The main difference from Embodiment 1 is that the input information management means (201) and the output information management means (202) are provided. The other means are the same as those described in Embodiment 1 and are not described here. Each of these means may also include a processor, storage means (memory, an external storage device, and the like) for temporarily holding the data to be processed by the processor, and communication means for passing data to other means.
The input information management means (201) in this embodiment first receives all input information from the plurality of input means (101-103). Based on classification information predetermined for each input means, or on classification information that can be derived from the input information itself, it passes the input information to the appropriate data processing means (one or more). The classification information predetermined for each input means is information about which data processing means should receive data from that input means. For example, for a microphone input means, a correspondence is established in advance with those data processing means (104-106) that include speech recognition, speaker recognition, and the like, while it is determined in advance that there is no correspondence with data processing means that do not include them (for example, a data processing means consisting only of character recognition means and the like). The classification information that can be derived from the input information is information, determined in advance for specific input information, about which data processing means should receive that input information. For example, input information carrying an unambiguous intent, such as an emergency stop, is predetermined to be delivered preferentially to a specific data processing means among the data processing means (104-106). By using such classification information, input information from each input means is delivered to the appropriate data processing means; that is, malfunction of the data processing means can be avoided, and the load of error-check processing and error handling can be reduced.
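The two kinds of classification information described above can be sketched as a small dispatcher. This is an illustrative sketch only, not the patent's implementation; all names (`ROUTES`, `derive_route`, the processor labels) are invented for the example.

```python
# (a) Predetermined classification info: which data processing means may
# receive data from each input means (microphone -> speech processors, etc.).
ROUTES = {
    "microphone": ["speech_recognizer", "speaker_recognizer"],
    "pen_tablet": ["character_recognizer"],
}

# (b) Classification info derived from the input content: input carrying an
# unambiguous intent (e.g. an emergency stop) is routed preferentially.
def derive_route(payload):
    if payload == "EMERGENCY_STOP":
        return ["emergency_handler"]   # preferential, exclusive delivery
    return None

def dispatch(input_means, payload):
    """Return the data processing means that should receive `payload`."""
    derived = derive_route(payload)
    if derived is not None:
        return derived
    return ROUTES.get(input_means, [])

print(dispatch("microphone", "send mail"))       # routed by input means
print(dispatch("pen_tablet", "EMERGENCY_STOP"))  # routed by content
```

Input from an input means with no registered correspondence is simply routed to no processor, which corresponds to the "no correspondence" case in the text.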
The output information management means (202) in this embodiment first receives all output information from the plurality of data processing means (104-106). Based on classification information predetermined for each output means, or on classification information that can be derived from the output information itself, it passes the output information to the appropriate output means (one or more). The classification information predetermined for each output means is information about which data processing means should deliver to that output means. For example, for a speaker output means, a correspondence is established in advance with those data processing means (104-106) that include speech synthesis, sound-effect synthesis, and the like, while it is determined in advance that there is no correspondence with data processing means that do not include them (for example, a data processing means consisting only of moving-image generation means and the like). The classification information that can be derived from the output information is information, determined in advance for specific output information, about which output means should receive that output information. For example, output information carrying an unambiguous control content, such as a volume reduction, is predetermined to be delivered preferentially to a specific output means among the output means (107-109). By using such classification information, output information from each data processing means is delivered to the appropriate output means; that is, malfunction of the output means can be avoided, and the load of error-check processing and error handling can be reduced.
As described above, a plurality of data processing operations can be prepared that respond to a single piece of data, and the data processing results from the respective paths can be output sequentially, one after another. For example, when the user utters "send mail" into a microphone serving as an input means (for example, 101), the input information management means (201) receives the utterance as speech waveform data and distributes it to the plurality of data processing means (104-106). These data processing means perform, for example, the following (data processing means example 1) through (data processing means example 4): (data processing means example 1) store (record) the speech as speech waveform data and respond by replaying the data; (data processing means example 2) recognize the speech and respond with the result of converting it into the character string "send mail"; (data processing means example 3) interpret the content of the speech as a command to an application system and respond, for example, by launching a mail system; (data processing means example 4) interpret the intent of the speech and respond with help or a suggestion such as "To whom shall I send it?". The data processing results from such a plurality of data processing means can be passed by the output information management means (202) sequentially, one after another, to the appropriate output means.
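The fan-out just described can be sketched as follows: one input is distributed to several processors of different cost, and each result is handed onward as soon as it is ready, so the cheap processing answers early and the deeper processing answers later. This is a minimal sketch with invented function names; the sleeps stand in for real computation time.

```python
import queue
import threading
import time

def record_and_replay(utterance):      # example 1: cheapest, answers first
    return "replay: " + utterance

def recognize(utterance):              # example 2: speech -> text
    time.sleep(0.05)
    return "recognized: " + utterance

def interpret_command(utterance):      # example 3: text -> application command
    time.sleep(0.10)
    return "command: launch mail system"

results = queue.Queue()                # stands in for the output manager (202)

def run(processor, utterance):
    results.put(processor(utterance))

for p in (interpret_command, recognize, record_and_replay):
    threading.Thread(target=run, args=(p, "send mail")).start()

# Results are delivered in completion order, not submission order.
arrived = [results.get() for _ in range(3)]
print(arrived)
```

The queue plays the role of the output information management means: it simply forwards whatever result arrives next, so the user sees a fast, shallow response first and richer responses afterwards.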
As a modification of the above embodiment, while the output information management means (202) is sequentially passing data processing results to the output means, it may interrupt that delivery on the basis of priority information predetermined for each data processing means. For example, suppose the priorities are set to increase in the order of (data processing means example 1) through (data processing means example 4). If, while the output information management means (202) is passing the data processing result of (data processing means example 1) to the output means, the result of (data processing means example 2) is delivered to the output information management means (202), the priority information of the two is compared, delivery of the lower-priority result of (data processing means example 1) to the output means is interrupted, and delivery of the result of (data processing means example 2) begins. If, during this, the result of (data processing means example 4) is further delivered to the output information management means (202), delivery of the lower-priority result of (data processing means example 2) is likewise interrupted, and delivery of the result of (data processing means example 4) begins. If, during this, the result of (data processing means example 3) is delivered to the output information management means (202), no interruption occurs, because its priority is lower than that of (data processing means example 4). Thus, by setting a priority for each data processing means, a more appropriate data processing result obtained later can be output even at the cost of interrupting data already being output. In the above example the priorities were fixed per data processing means for simplicity, but the priority information may be varied according to the work situation, designation by the user, the content of the input data, and so on. As a further modification of the above embodiment, while the output information management means (202) is sequentially passing data processing results to the output means, it may interrupt that delivery when predetermined confirmation instruction information is input (for example, pressing an "OK" button). At this time, information accompanying the confirmation instruction information, such as the time of occurrence, the place of occurrence, and the originator, may be used to judge whether the delivery should really be interrupted. In this way, the output information management means (202) can be prevented from carelessly continuing to pass data processing results to the output means indefinitely.
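The priority-based preemption walked through above can be sketched directly. This is an illustrative sketch, not the patent's implementation; the priority table and class names are invented, and "interrupting" is modeled simply as replacing the result currently on air.

```python
# Predetermined priority per data processing means (higher number wins).
PRIORITY = {"replay": 1, "recognition": 2, "command": 3, "intent": 4}

class OutputManager:
    def __init__(self):
        self.on_air = None   # (processor, result) currently being output
        self.log = []        # what actually reached the output means

    def deliver(self, processor, result):
        # A newly arrived result preempts only if its priority is higher.
        if self.on_air and PRIORITY[processor] <= PRIORITY[self.on_air[0]]:
            return           # lower or equal priority: do not interrupt
        if self.on_air:
            self.log.append("interrupted " + self.on_air[0])
        self.on_air = (processor, result)
        self.log.append("output " + result)

m = OutputManager()
m.deliver("replay", "replaying audio")      # starts output
m.deliver("recognition", "'send mail'")     # higher priority: preempts
m.deliver("intent", "to whom?")             # higher still: preempts
m.deliver("command", "launch mail system")  # lower than intent: ignored
print(m.log)
```

The final call reproduces the case in the text where (example 3) arrives while (example 4) is being output and causes no interruption.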
As a further modification of the above embodiment, when a plurality of output information items have accumulated in the storage means (such as a memory or an external storage device) that stores the output information from the plurality of data processing means (104-106), the output information management means (202) may, instead of outputting that output information, pass option information corresponding to the output information to the output means (107-109), and, when predetermined selection instruction information corresponding to that option information is in turn delivered from the input means (101-103), operate so as to output all or part of the output information corresponding to the selection instruction information and the option information. In the concrete example above, the output information management means (202) outputs four items of option information, "Option 1: replay the speech", "Option 2: output the speech recognition result", "Option 3: interpret as a command", and "Option 4: interpret the intent", and waits for the user's selection. Suppose here that the user selects "Option 3: interpret as a command" from the option information (the selection itself is made via an input means). In response, the output information management means (202) outputs the corresponding data processing result obtained earlier; that is, it executes, for example, the process of outputting a launch command to the mail system, which is the result of interpreting the spoken content "send mail" as a command. In this way, by outputting the option information in a batch and letting the user choose before the data the user entered is output as-is, the user can be made to select the data processing result originally intended, and appropriate data processing can be performed for the task at hand. As a side effect, the options output in this way let the user see concretely and clearly the intermediate results and behavior of the dialogue processing (for example, that speech recognition succeeded but the meaning could not be understood), so that the user also comes to understand which data input methods suit him or her.
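The option-selection variant can be sketched as a pending-results table keyed by option labels. This is a hedged illustration only; the labels and stored results below are taken from the "send mail" example, but the data structure and function names are invented.

```python
# Accumulated data processing results, keyed by the option information that
# will be shown to the user instead of the results themselves.
pending = {
    "Option 1: replay the speech": "replaying audio",
    "Option 2: output the speech recognition result": "'send mail'",
    "Option 3: interpret as a command": "launch mail system",
    "Option 4: interpret the intent": "To whom shall I send it?",
}

def present_options():
    # Option information sent to the output means in a batch.
    return list(pending)

def select(option):
    # Selection instruction arrives via an input means; output only the
    # data processing result the user actually intended.
    return pending[option]

options = present_options()
print(options[2])            # the user picks "Option 3: ..."
print(select(options[2]))    # -> launch mail system
```

Only the selected result reaches the output means; the other accumulated results stay in storage and are never output as-is.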
(Embodiment 3) Fig. 3 shows another embodiment of the present invention. This embodiment is an example in which sub-means that can be used in common by the data processing means (104-106) of Fig. 1 and Fig. 2, where such sub-means exist, are consolidated into one. The input means (101-103), the input information management means (201), the output information management means (202), and the output means (107-109) are substantially as described above, so a detailed description is omitted.
Data processing means 1 (301) consists of input information management means (201) and output information management means (202) as sub-means. The input information management means (201) receives input information from the plurality of input means (101-103) and passes it to the input information recognition means (303) and the output information management means (202). The output information management means (202) passes the data processing results delivered from the input information management means (201) or the output information synthesis means (304) to the plurality of output means (107-109).
Data processing means 2 (302) consists of the sub-means of data processing means 1 (301) (namely, the input information management means (201) and the output information management means (202)) plus, as further sub-means, input information recognition means (303) and output information synthesis means (304). The input information recognition means (303) receives input information from the input information management means (201), performs recognition processing, and passes the recognition result of the input information to the semantic analysis means (306) and the output information synthesis means (304). The output information synthesis means (304) receives data processing results from the input information recognition means (303) or the command generation means (307), performs synthesis processing, and passes the data processing results to the output information management means (202). Data processing means 3 (305) consists of the sub-means of data processing means 2 (302) (namely, the input information management means (201), the output information management means (202), the input information recognition means (303), and the output information synthesis means (304)) plus, as further sub-means, semantic analysis means (306) and command generation means (307). The semantic analysis means (306) receives the recognition result of the input information from the input information recognition means (303), performs semantic analysis processing, and passes the semantic analysis result to the intent analysis means (309) and the command generation means (307). The command generation means (307) receives data processing results from the semantic analysis means (306) or the response planning means (310), performs command generation processing, and passes the data processing results to the output information synthesis means (304).
Data processing means 4 (308) consists of the sub-means of data processing means 3 (305) (namely, the input information management means (201), the output information management means (202), the input information recognition means (303), the output information synthesis means (304), the semantic analysis means (306), and the command generation means (307)) plus, as further sub-means, intent analysis means (309) and response planning means (310). The intent analysis means (309) receives the semantic analysis result of the input information from the semantic analysis means (306), performs intent analysis, and passes the result to the response planning means (310). The response planning means (310) receives the data processing result from the intent analysis means (309), plans a response, and passes the data processing result to the command generation means (307).
The purpose of this description is to show, by example, that the effects of the present invention can be obtained even when sub-means are shared in this way; accordingly, each sub-means may use well-known techniques, sub-means other than those listed here may be used, and data processing means of a configuration different from that listed here may be used. Each of these means and sub-means may also be provided with a processor, storage means (such as a memory or an external storage device) for temporarily holding data to be processed by the processor, and communication means for passing data to other means.
Accordingly, the processing paths of the data processing means in this embodiment effectively amount to the following four.
(Processing path 1) From the input information management means (201) directly to the output information management means (202).
(Processing path 2) From the input information management means (201), via the input information recognition means (303) and the output information synthesis means (304), to the output information management means (202).
(Processing path 3) From the input information management means (201), via the input information recognition means (303), the semantic analysis means (306), the command generation means (307), and the output information synthesis means (304), to the output information management means (202).
(Processing path 4) From the input information management means (201), via the input information recognition means (303), the semantic analysis means (306), the intent analysis means (309), the response planning means (310), the command generation means (307), and the output information synthesis means (304), to the output information management means (202).
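The four nested processing paths can be sketched as function pipelines that share their stages. This is a minimal sketch with invented stage names standing in for the sub-means; each stage just tags the data so the route taken is visible.

```python
# Shared sub-means, one function per stage (names invented for illustration).
def recognize(x):      return "recognized(" + x + ")"       # (303)
def analyze(x):        return "meaning(" + x + ")"          # (306)
def analyze_intent(x): return "intent(" + x + ")"           # (309)
def plan_response(x):  return "plan(" + x + ")"             # (310)
def make_command(x):   return "command(" + x + ")"          # (307)
def synthesize(x):     return "synthesized(" + x + ")"      # (304)

# The four paths reuse the same stage functions rather than duplicating them.
PATHS = {
    1: [],  # straight from input management to output management
    2: [recognize, synthesize],
    3: [recognize, analyze, make_command, synthesize],
    4: [recognize, analyze, analyze_intent, plan_response,
        make_command, synthesize],
}

def run_path(n, data):
    for step in PATHS[n]:
        data = step(data)
    return data   # handed to the output information management means

print(run_path(1, "ping"))
print(run_path(3, "send mail"))
```

Because every path draws on the same stage functions, each sub-means exists exactly once, which mirrors the computation-saving effect the embodiment claims.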
The effect obtained by these four processing paths, in addition to the effects of (Embodiment 1) and (Embodiment 2), is the notable one that consolidating into one the sub-means usable in common across a plurality of data processing means saves the total amount of computation.
(Embodiment 4) Fig. 4 shows another embodiment of the present invention. This embodiment is an implementation, called multimodal interface processing or multimodal interaction processing, that handles a plurality of data processing means simultaneously while providing common sub-means as in Fig. 3. The speech input device (401) is one of the input means, a device for inputting the user's speech via a microphone, a telephone transmitter, or the like. The stroke input device (402) is one of the input means, a device for inputting strokes made by the user's hand via a pen and tablet, a touch panel, or the like. The image input device (403) is one of the input means, a device for inputting data such as printed matter the user is looking at via an image scanner, a CCD camera, or the like. The text input device (404) is one of the input means, a device by which the user inputs text via a keyboard or the like. Of course, as will be readily understood by those skilled in the art from reading this specification, the present invention can also be practiced with other input means, such as a mouse, a data glove, or a gaze recognition device, connected in the same manner as the above input devices.
Data processing means 1 (405) consists of input media control means (406) and output media control means (407) as sub-means. The input media control means (406) controls the input devices (namely, the speech input device (401), the stroke input device (402), the image input device (403), and the text input device (404)), receives the raw input information from each input device, such as speech data, stroke data (usually a sequence of coordinates), image data, and character codes, arranges its format, and passes it to the individual-modality recognition means (409) and the output media control means (407). Of course, to control the input devices, a separate processing program may well be run for each input device. The output media control means (407) controls the output devices (namely, the speech synthesizer (418), the icon control device (419), the anthropomorphic-agent control device (420), and the application control device (421)), and passes to each output device output information such as the raw data handed over from the input media control means (406), control sequences for the speech synthesizer handed over from the individual-modality generation means (410), event sequences for the window system, commands to the anthropomorphic agent, and application commands. Of course, to control the output devices, a separate processing program may well be run for each output device.
Data processing means 2 (408) consists of the sub-means of data processing means 1 (405) (namely, the input media control means (406) and the output media control means (407)) plus, as further sub-means, individual-modality recognition means (409) and individual-modality generation means (410). The individual-modality recognition means (409) receives the speech data, stroke data, image data, and character codes that are the input information from the input media control means (406), recognizes them as spoken language, handwritten characters, printed characters, and character codes respectively, and passes the results to the semantic analysis means (415) and the inter-modality recognition adjustment means (412). Since the suitable recognition algorithm and recognition unit differ for each modality, the programs may take separate implementation forms, and these recognition processes may use existing techniques. The individual-modality generation means (410) converts the recognition results handed over from the individual-modality recognition means (409) and the output information handed over from the inter-modality response adjustment means (413) into output information such as control sequences for the speech synthesizer, event sequences for the window system, commands for the anthropomorphic agent, and commands for the application, and passes it to the output media control means (407). Data processing means 3 (411) consists of the sub-means of data processing means 2 (408) (namely, the input media control means (406), the individual-modality recognition means (409), the individual-modality generation means (410), and the output media control means (407)) plus, as further sub-means, inter-modality recognition adjustment means (412) and inter-modality response adjustment means (413). The inter-modality recognition adjustment means (412) determines an appropriate combination on the basis of the plurality of recognition results from the individual-modality recognition means (409) and passes this processing result to the semantic analysis means (415) and the inter-modality response adjustment means (413). The processing for determining an appropriate combination is performed, for example, by converting the results into a data structure independent of each modality (for example, a recognition lattice structure), evaluating the recognition results against a collection of per-modality constituent elements called a "dictionary" and a collection of rules, called a "grammar", on the temporal and positional arrangement of constituent elements, and determining candidate combinations of the plurality of recognition results. Of course, other processing methods may be used in practicing the present invention. The inter-modality response adjustment means (413) takes the appropriately combined recognition results handed over from the inter-modality recognition adjustment means (412) and the information to be presented in response handed over from the strategy determination means (417), adjusts the plurality of output information items with attention to output modality, output order, and output timing, and passes the adjusted result as output information to the individual-modality generation means (410).
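The dictionary-and-grammar combination step can be illustrated with a toy joint interpretation of a speech candidate and a pen-pointing candidate. The patent names only the dictionary and grammar concepts; the concrete candidate lists, the product scoring, and all identifiers below are invented for this sketch.

```python
# Per-modality recognition candidates as (hypothesis, confidence) pairs.
speech_candidates = [("delete this", 0.6), ("beat this", 0.3)]
pen_candidates = [("file_A", 0.8), ("file_B", 0.2)]

# "Dictionary": the constituent elements the system knows about.
DICTIONARY = {"delete this", "file_A", "file_B"}

def combine(speech, pen):
    """Pick the best-scoring combination whose parts are all in the dictionary."""
    best, best_score = None, -1.0
    for s_text, s_score in speech:
        for p_obj, p_score in pen:
            if s_text not in DICTIONARY or p_obj not in DICTIONARY:
                continue                  # reject unknown constituents
            score = s_score * p_score     # simple joint-plausibility score
            if score > best_score:
                best, best_score = (s_text, p_obj), score
    return best

print(combine(speech_candidates, pen_candidates))  # -> ('delete this', 'file_A')
```

A real implementation would also apply temporal and positional grammar rules over a recognition lattice; the dictionary filter above stands in for that evaluation.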
Data processing means 4 (414) consists of the sub-means of data processing means 3 (411) (namely, the input media control means (406), the individual-modality recognition means (409), the inter-modality recognition adjustment means (412), the inter-modality response adjustment means (413), the individual-modality generation means (410), and the output media control means (407)) plus, as further sub-means, semantic analysis means (415), purpose estimation means (416), and strategy determination means (417). The semantic analysis means (415) receives the processing result from the individual-modality recognition means (409), performs semantic analysis on it, and passes the result to the purpose estimation means (416). The purpose estimation means (416) receives the semantic analysis result from the semantic analysis means (415), estimates its purpose, and passes the result to the strategy determination means (417). The strategy determination means (417) receives the purpose estimation result from the purpose estimation means (416), determines a response strategy, derives the information to be presented in response to the user, and passes it to the inter-modality response adjustment means (413).
The speech synthesizer (418) is one of the output means, a device that performs speech synthesis on the basis of the control sequences received from the output media control means (407). The icon control device (419) is one of the output means, a device that performs display control of icons and the like in the window system on the basis of the event sequences received from the output media control means (407). The anthropomorphic-agent control device (420) is one of the output means, a device that controls an anthropomorphic agent on the basis of the commands received from the output media control means (407). The application control device (421) is one of the output means, a device that controls an application on the basis of the application commands and the like received from the output media control means (407). Of course, as will be readily understood by those skilled in the art from reading this specification, the present invention can also be practiced with other output means, such as a projector, a carrier vehicle, or a robot arm, connected in the same manner as the above output devices.
As in (Embodiment 1), (Embodiment 2), and (Embodiment 3), the multimodal interface processing or multimodal interaction processing that simultaneously handles the plurality of data processing means connected as described above has single or multiple input data processed by the plurality of data processing means. That is, when single or multiple input data pass through a data processing means composed only of lightweight sub-means (for example, a low-level processing path such as the data processing means 1 (405)), a response is returned quickly, whereas when they pass through a data processing means that includes sub-means whose responses take time because of complex computation (for example, a high-level processing path such as the data processing means 4 (414)), the response arrives after a correspondingly longer delay. This yields the effect that data processing results at the level the user needs are output within a commensurate time, and, since the earliest response (in this case, via the processing path of the data processing means 1 (405)) reaches the user with a comparatively short latency, the further notable effect that the no-response time of the entire computer system (or the entire application system) using the present invention becomes very short, reducing the user's irritation and sense of poor usability.
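The behavior described above — the same input fanned out to several processing paths at once, with the cheap low-level path answering first and the expensive high-level path following later — can be sketched with threads and a shared result queue. The path names, the simulated delay, and the tuple result format are assumptions of this sketch, not the disclosed implementation:

```python
# Sketch: one input is handed to several data processing means at once;
# responses surface in the order the paths finish, so the cheap path's
# echo reaches the user long before the expensive path's analysis.
import queue
import threading
import time

def low_level_path(data, out):
    # e.g. data processing means 1 (405): pass the input through unchanged.
    out.put(("echo", data))

def high_level_path(data, out):
    # e.g. data processing means 4 (414): simulate costly recognition,
    # semantic analysis, purpose estimation, and strategy determination.
    time.sleep(0.2)
    out.put(("strategy", f"respond to '{data}'"))

def interact(data):
    out = queue.Queue()
    workers = [threading.Thread(target=path, args=(data, out))
               for path in (low_level_path, high_level_path)]
    for w in workers:
        w.start()
    # Deliver each result as soon as it becomes available.
    results = [out.get() for _ in workers]
    for w in workers:
        w.join()
    return results

results = interact("show schedule")
print(results[0])  # the low-level echo arrives first
```

The point of the sketch is that no path blocks any other: the user sees an immediate acknowledgement from the fast path while the slow path keeps working, which is exactly the short no-response time the text claims.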
Industrial Applicability
As described above, according to the embodiments of the present invention, by providing a plurality of data processing means and outputting to the user output results that require only simple computation in a short time and output results that require complex computation after a correspondingly longer time, it is possible to provide an interaction processing method and interaction processing apparatus with which the user can obtain, from a single data input or instruction, output results commensurate with the processing time, thereby improving the efficiency of interaction, and with which the user can easily select, in real time, the output results that match his or her needs.

Claims

1. An interaction processing method for a computer system comprising at least one input means for receiving input of information from the outside, at least two data processing means, and at least one output means for outputting information to the outside, wherein
at least one input means receives input information at least once and passes the input information to at least two data processing means, each of the data processing means performs predetermined data processing in parallel and passes output information resulting from the data processing to at least one of the output means, and the output means outputs the output information.
2. The interaction processing method according to claim 1, wherein one of the data processing means is a means that outputs the input information received via the input means as output information without modification.
3. The interaction processing method according to claim 1, wherein one of the data processing means recognizes the input information received via the input means and synthesizes output information based on the recognition result.
4. The interaction processing method according to claim 1, wherein one of the data processing means:
recognizes the input information received via the input means,
analyzes the meaning based on the recognition result,
generates a command based on the semantic analysis result, and
synthesizes output information based on the command generation result.
5. The interaction processing method according to claim 1, wherein one of the data processing means:
recognizes the input information received via the input means,
analyzes the meaning based on the recognition result,
analyzes the intention based on the semantic analysis result,
plans a response based on the intention analysis result,
generates a command based on the response planning result, and
synthesizes output information based on the command generation result.
6. The interaction processing method according to claim 1, wherein input information management means is provided between the input means and the data processing means, receives the input information from the input means, and passes the input information to a data processing means determined based on classification information given in advance for each input means or on predetermined classification information contained in the input information.
7. The interaction processing method according to claim 1, wherein output information management means is provided between the data processing means and the output means, receives the output information from the data processing means, and passes the output information to an output processing means determined based on classification information given in advance for each output means or on predetermined classification information contained in the output information.
8. The interaction processing method according to claim 7, wherein the output information management means stores and holds the output information passed from the data processing means, passes option information based on a plurality of pieces of stored output information to the output means when such pieces have accumulated, and, when predetermined selection instruction information corresponding to the option information is passed from the data processing means, determines, based on the selection instruction information and the option information, whether to output all or part of the output information.
9. The interaction processing method according to claim 7, wherein, when the next output information is passed from the data processing means before the delivery of the preceding output information from the data processing means to the output means has been completed, the output information management means determines, based on priority information given in advance for each data processing means, whether to interrupt the preceding output processing and output the next output information.
10. The interaction processing method according to claim 7, wherein, when predetermined confirmation instruction information is passed from the data processing means before the delivery of the output information passed from the data processing means to the output means has been completed, the output information management means determines, based on at least one of the occurrence time, the occurrence place, and the originator accompanying the confirmation instruction information, whether to interrupt the output processing of the output means and output.
11. An interaction processing apparatus comprising at least one input means for receiving input of information from the outside, at least two data processing means, and at least one output means for outputting information to the outside, wherein
at least one input means receives input information at least once and passes the input information to at least two data processing means, each of the data processing means performs predetermined data processing in parallel and passes output information resulting from the data processing to at least one of the output means, and the output means outputs the output information.
12. The interaction processing apparatus according to claim 11, wherein one of the data processing means has means for outputting the input information received via the input means as output information without modification.
13. The interaction processing apparatus according to claim 11, wherein one of the data processing means comprises:
means for recognizing the input information received via the input means; and
means for synthesizing output information based on the recognition result.
14. The interaction processing apparatus according to claim 11, wherein one of the data processing means comprises:
means for recognizing the input information received via the input means;
means for analyzing the meaning based on the recognition result;
means for generating a command based on the semantic analysis result; and
means for synthesizing output information based on the command generation result.
15. The interaction processing apparatus according to claim 11, wherein one of the data processing means comprises:
means for recognizing the input information received via the input means;
means for analyzing the meaning based on the recognition result;
means for analyzing the intention based on the semantic analysis result;
means for planning a response based on the intention analysis result;
means for generating a command based on the response planning result; and
means for synthesizing output information based on the command generation result.
16. The interaction processing apparatus according to claim 11, wherein input information management means is provided between the input means and the data processing means, receives the input information from the input means, and passes the input information to a data processing means determined based on classification information given in advance for each input means or on predetermined classification information contained in the input information.
17. The interaction processing apparatus according to claim 11, wherein output information management means is provided between the data processing means and the output means, receives the output information from the data processing means, and passes the output information to an output processing means determined based on classification information given in advance for each output means or on predetermined classification information contained in the output information.
18. The interaction processing apparatus according to claim 17, wherein the output information management means stores and holds the output information passed from the data processing means, passes option information based on a plurality of pieces of stored output information to the output means when such pieces have accumulated, and, when predetermined selection instruction information corresponding to the option information is passed from the data processing means, determines, based on the selection instruction information and the option information, whether to output all or part of the output information.
19. The interaction processing apparatus according to claim 17, wherein, when the next output information is passed from the data processing means before the delivery of the preceding output information from the data processing means to the output means has been completed, the output information management means determines, based on priority information given in advance for each data processing means, whether to interrupt the preceding output processing and output the next output information.
20. The interaction processing apparatus according to claim 17, wherein, when predetermined confirmation instruction information is passed from the data processing means before the delivery of the output information passed from the data processing means to the output means has been completed, the output information management means determines, based on at least one of the occurrence time, the occurrence place, and the originator accompanying the confirmation instruction information, whether to interrupt the output processing of the output means and output.
PCT/JP1998/004295 1998-09-25 1998-09-25 Method and apparatus for processing interaction WO2000019307A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP1998/004295 WO2000019307A1 (en) 1998-09-25 1998-09-25 Method and apparatus for processing interaction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP1998/004295 WO2000019307A1 (en) 1998-09-25 1998-09-25 Method and apparatus for processing interaction

Publications (1)

Publication Number Publication Date
WO2000019307A1 true WO2000019307A1 (en) 2000-04-06

Family

ID=14209063

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP1998/004295 WO2000019307A1 (en) 1998-09-25 1998-09-25 Method and apparatus for processing interaction

Country Status (1)

Country Link
WO (1) WO2000019307A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005321730A (en) * 2004-05-11 2005-11-17 Fujitsu Ltd Dialog system, dialog system implementation method, and computer program
JP2016512364A (en) * 2013-03-15 2016-04-25 クアルコム,インコーポレイテッド System and method for switching processing modes using gestures
JP2020507165A (en) * 2017-11-21 2020-03-05 ジョンアン インフォメーション テクノロジー サービシズ カンパニー リミテッド Information processing method and apparatus for data visualization

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS58219643A (en) * 1982-06-16 1983-12-21 Hitachi Ltd Simulation control system
JPS62213949A (en) * 1986-03-14 1987-09-19 Hitachi Ltd High speed processing device for monitoring process
JPH08263258A (en) * 1995-03-23 1996-10-11 Hitachi Ltd Input device, input method, information processing system and management method for input information


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YASUHARU NANBA, SHUNICHI TANO, HIROYUKI KINUKAWA: "Semantic Analysis Utilizing Fusibility of Specific Attribute of Multimodal Data (in Japanese)", THE TRANSACTIONS OF INFORMATION PROCESSING SOCIETY OF JAPAN, vol. 38, no. 7, 15 July 1997 (1997-07-15), pages 1441 - 1453, XP002927046 *


Similar Documents

Publication Publication Date Title
US6513011B1 (en) Multi modal interactive system, method, and medium
JP2001229392A (en) Rational architecture for executing conversational character with communication of small number of messages
US6253176B1 (en) Product including a speech recognition device and method of generating a command lexicon for a speech recognition device
JP2003076389A (en) Information terminal having operation controlled through touch screen or voice recognition and instruction performance method for this information terminal
Marsic et al. Natural communication with information systems
Hasegawa et al. Active agent oriented multimodal interface system
EP1126436A2 (en) Speech recognition from multimodal inputs
US20080104512A1 (en) Method and apparatus for providing realtime feedback in a voice dialog system
JPH11249773A (en) Device and method for multimodal interface
JP3753882B2 (en) Multimodal interface device and multimodal interface method
Corradini et al. A map-based system using speech and 3D gestures for pervasive computing
JP2001100878A (en) Multi-modal input/output device
US20100223548A1 (en) Method for introducing interaction pattern and application functionalities
JP3822357B2 (en) Interface device and method for multimodal input / output device
CN102473047A (en) Auxiliary touch monitor system which enables independent touch input, and independent touch input method for an auxiliary touch monitor
JP2004192653A (en) Multi-modal interface device and multi-modal interface method
JPH06131108A (en) Information input device
WO2000019307A1 (en) Method and apparatus for processing interaction
JP6950708B2 (en) Information processing equipment, information processing methods, and information processing systems
CN117971154A (en) Multimodal response
US20090125640A1 (en) Ultrasmall portable computer apparatus and computing system using the same
JP2985785B2 (en) Human motion dialogue system
TW200844769A (en) Manipulation device using natural language entry
JP2002108388A (en) Interaction device and recording medium recorded with interactive processing program
JP3894767B2 (en) Dialogue device

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): JP US

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
121 Ep: the epo has been informed by wipo that ep was designated in this application
ENP Entry into the national phase

Ref country code: JP

Ref document number: 2000 572747

Kind code of ref document: A

Format of ref document f/p: F

122 Ep: pct application non-entry in european phase