JP4728905B2 - Spoken dialogue apparatus and spoken dialogue program

Spoken dialogue apparatus and spoken dialogue program

Info

Publication number
JP4728905B2
Authority
JP
Japan
Prior art keywords
response
paraphrase
type
keyword
voice
Prior art date
Legal status
Active
Application number
JP2006211166A
Other languages
Japanese (ja)
Other versions
JP2008039928A (en)
Inventor
浩彦 佐川
実 冨樫
健 大野
浩明 小窪
大介 斎藤
健 本間
景子 桂川
信夫 畑岡
久 高橋
Original Assignee
クラリオン株式会社
日産自動車株式会社
Priority date
Filing date
Publication date
Application filed by クラリオン株式会社 and 日産自動車株式会社
Priority to JP2006211166A
Publication of JP2008039928A
Application granted
Publication of JP4728905B2
Application status: Active
Anticipated expiration


Description

  The present invention relates to a spoken dialogue apparatus, and a program therefor, for enabling smooth voice-based interaction between a user and various devices.

Many technologies have been proposed for voice dialogue systems that converse with a user and provide the information and services the user requires. For a voice dialogue system to communicate smoothly with the user, it is important not only to interpret the user's voice input correctly, but also to present an appropriate response sentence to each input, so that the user can easily continue speaking in reply to it.

Patent Document 1 discloses a technique that prepares a keyword knowledge base storing first keywords to be recognized together with second keywords, each associated with a first keyword and a slot item (keyword type). A second keyword corresponding to the first keyword extracted from the input speech is selected from the keyword knowledge base and recorded for each slot item, and a response sentence is generated based on how the second keywords are stored across the slot items.
Patent Document 2 discloses a technique in which words recognized from the input speech are classified into categories and classes, and a probability indicating which class of word was recognized is computed from the reliability of the recognized words. An utterance type such as refinement, answer correction, or re-input is determined from this probability, and a response sentence is generated by inserting a category word or class into a response sentence pattern.

The prior art thus discloses techniques for changing a keyword recognized from the user's voice input based on a prepared correspondence between keywords and paraphrases, and for substituting the keyword into a response sentence.
It also shows techniques for deciding whether to change a keyword, and for selecting a response sentence pattern, based on the reliability of the recognized keyword and the conversation history.
Patent Document 1: JP 2005-301138 A; Patent Document 2: JP 2004-251998 A

In the prior art, however, the word substituted for a given keyword is uniquely determined, and a single common rule is set and used both for changing keywords based on reliability and the like and for determining the response sentence.
In practice, the appropriate substitution method and response sentence format for a given keyword can depend not only on the keyword itself but also on the situation, for example on whether several input keywords are present. The conventional approach of applying one common rule therefore cannot flexibly adapt the response sentence, or the associated processing, to the phonological characteristics of each keyword, and as a result cannot always generate an appropriate response sentence.

  Accordingly, an object of the present invention is to provide a spoken dialogue apparatus and a spoken dialogue program that can generate appropriate response sentences.

To solve the above problems, the present invention records, for each keyword to be recognized, a paraphrase to be used when the keyword is inserted into a response sentence, a response type indicating the kind of response sentence, and the conditions under which that paraphrase and response type are selected. In addition, a response sentence template representing the response sentence format is prepared for each response type.
The paraphrase and response type for a recognized keyword are determined from these selection conditions, the response sentence template is retrieved based on the response type, and a response sentence is generated by inserting the paraphrase into the retrieved template.
The conditions for selecting a paraphrase and a response type include one or more of: a condition on the confidence value of the recognized keyword, the number of recognized keywords, the recognized keyword's type, the history of past response types, the history of past response sentences, and past user speech recognition results.

  According to the present invention, it is possible to generate an appropriate response sentence.

(Embodiment 1)
The first embodiment of the present invention (a spoken dialogue apparatus and a spoken dialogue program) will be described below with reference to the figures.

  FIG. 1 is a diagram showing a configuration example of Embodiment 1 of the present invention. FIG. 1 assumes a spoken dialogue apparatus in which the user inputs the location and name of a target facility by voice, and the apparatus searches for information on that facility and outputs the result.

  In FIG. 1, a microphone 101 is a means for converting the user's voice into an electric signal, and a voice input unit 102 is a means for converting the electric signal from the microphone 101 into voice data that the information processing unit 105 can process. The voice output unit 103 is a means for converting voice data generated from a response sentence to the user's input voice into an electric signal, and the speaker 104 is a means for outputting that electric signal as sound. The information processing unit 105 is a means for executing the processing that interacts with the user, based on various programs stored in the storage unit 106.

The spoken dialogue apparatus 1 is configured as a computer (not shown in detail) having a CPU (Central Processing Unit), a main storage device made of semiconductor memory such as RAM (Random Access Memory), an auxiliary storage device such as a hard disk drive, and an input/output interface. Here, the CPU corresponds to the information processing unit 105, the main storage device to the storage unit 106, and the input/output interface to the voice input unit 102 and the voice output unit 103.
The storage unit 106 stores a speech recognition program 107, a dialogue control program 108, a speech synthesis program 109, and a search program 110.
The auxiliary storage device stores a dialogue scenario 111, a keyword type dictionary 112, a paraphrase dictionary 113, a response sentence template dictionary 114, and a database 115. The function of each is described later.

  The speech recognition program 107 is executed by the information processing unit 105 to recognize the keywords expressed in the user's input voice data and output the result. For the user utterance "XX Museum of Kanagawa", for example, the result can be obtained in the form (Kanagawa Prefecture, 0.8) (XX Museum, 0.9). Here each parenthesized word is a recognized keyword, and the accompanying numerical value is a reliability indicating the certainty of that keyword. The reliability value produced for each keyword by any commonly used speech recognition technique can be used as-is. In the above example only the prefecture name and museum name are output as keywords, but words other than keywords, such as the connecting particle "no" ("of"), could also be output. Furthermore, while the example outputs only the candidate with the highest reliability, multiple candidates for each keyword in the voice data can also be output.
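As a minimal sketch of how such a recognition result might be represented, the following Python fragment mirrors the (keyword, reliability) pairs described above; the type and field names are assumptions for illustration, not part of the patent.

```python
from typing import List, NamedTuple

class RecognizedKeyword(NamedTuple):
    """One keyword recognized from the user's utterance (hypothetical type)."""
    word: str          # e.g. "Kanagawa Prefecture"
    confidence: float  # reliability in [0, 1], as reported by the recognizer

# The example result for "XX Museum of Kanagawa" given in the text.
result: List[RecognizedKeyword] = [
    RecognizedKeyword("Kanagawa Prefecture", 0.8),
    RecognizedKeyword("XX Museum", 0.9),
]
```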

  The dialogue control program 108 is executed by the information processing unit 105 to determine a response type and a paraphrase from the paraphrase dictionary 113, using the recognized keyword and its reliability as the condition. Based on the determined response type, it selects the corresponding response sentence from the response sentence template dictionary 114, inserts the determined paraphrase into it, and thereby generates a response sentence that prompts the user's next utterance. Details of the processing of the dialogue control program 108 are described later.

  The voice synthesis program 109 is executed by the information processing unit 105 to convert the response sentence generated by the dialogue control program 108 into voice data and output the voice data.

  The search program 110 is a program executed by the information processing unit 105 to search the database 115 for information on the target facility, using the facility location and name input by the user as search conditions. A known relational database or the like is used as the database 115, and the search program 110 can easily be realized using the search means normally provided with such a database. Alternatively, a commonly used means of searching information on the Internet can serve as the database 115.

  FIG. 2 is a block diagram showing the format of each dialogue scenario stored in the dialogue scenario 111. A dialogue scenario holds the type and number of keywords to be input by the user, information for recognizing the user's voice (the grammar name for user speech recognition), and information on the processing (command) to be performed using the keywords the user inputs.

The dialogue name 201 is a character string naming the dialogue, used to distinguish dialogue scenarios; the name 202 of slot 1 through the name 204 of slot n are character strings naming the slots to be filled by the user. Here a slot is a memory area that stores a keyword input by the user, and the slot name distinguishes these memory areas. The keyword the user inputs for a slot is stored in the corresponding memory area.
The slot 1 type 203 through the slot n type 205 are character strings representing the types of the keywords stored in the slots; the same character strings as the keyword type names in the keyword type dictionary 112 (described later) are used, for example "prefecture name" or "museum". These types determine which slot a keyword input by the user is stored in.

The user speech recognition grammar name 206 is a character string naming the speech recognition grammar in which the keywords used to recognize the user's input voice data, and the rules on their arrangement, are registered. A grammar format from any commonly used speech recognition technology can be used. Because the wording and keywords of the user's utterances differ from dialogue to dialogue, Embodiment 1 sets a speech recognition grammar for each dialogue; however, a single speech recognition grammar covering all target dialogues could also be prepared and used.
The command 207 is a character string representing a command for searching the database using the keywords input into the name 202 of slot 1 through the name 204 of slot n. For example, when searching the database with slot 1 and slot 2 as conditions, if the command format is "SEARCH condition1 condition2", then the command 207 contains "SEARCH [name of slot 1] [name of slot 2]". Here SEARCH is the name of the search command, and the notation [name of slot n] indicates that this position is replaced with the keyword stored in slot n.
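As a sketch of the FIG. 2 record and the [name of slot n] substitution just described, the following fragment shows one possible encoding; all identifiers and the usage example are assumptions, since the patent prescribes no concrete data structure.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class DialogueScenario:
    """Rough model of one dialogue scenario record (FIG. 2)."""
    dialogue_name: str                 # dialogue name 201
    slot_types: Dict[str, str]         # slot name -> slot type (202/203 ... 204/205)
    grammar_name: str                  # user speech recognition grammar name 206
    command: str                       # command 207, e.g. "SEARCH [slot 1] [slot 2]"
    slots: Dict[str, Optional[str]] = field(default_factory=dict)  # slot contents

def build_command(scenario: DialogueScenario) -> str:
    """Replace each [slot name] placeholder with the keyword stored in that slot."""
    cmd = scenario.command
    for name in scenario.slot_types:
        cmd = cmd.replace(f"[{name}]", scenario.slots.get(name) or "")
    return cmd

s = DialogueScenario("facility search",
                     {"slot 1": "prefecture name", "slot 2": "museum"},
                     "facility_grammar", "SEARCH [slot 1] [slot 2]",
                     {"slot 1": "Kanagawa", "slot 2": "XX Museum"})
print(build_command(s))  # SEARCH Kanagawa XX Museum
```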

  FIG. 3 is a configuration diagram showing the format of the information stored in the keyword type dictionary 112. The keyword type dictionary 112 associates each keyword that can appear in the user's input voice with the name of its type.

  The column indicated by type 301 holds the name of a keyword type, and the column indicated by keyword 302 lists the keywords of that type. In FIG. 3, for example, "XX Museum" 304, "ΔΔ Museum" 305, and "XX Museum" 306 are keywords of the type "museum" 303, while "Tokyo" 308, "Kanagawa" 309, and "Chiba" 310 are keywords of the type "prefecture name" 307.
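A dictionary like FIG. 3 reduces to a simple type-to-keywords mapping. The sketch below, with hypothetical names and the figure's placeholder entries, also shows the reverse lookup used later in step S610.

```python
from typing import Dict, List, Optional

# Keyword type dictionary in the style of FIG. 3 (placeholder entries).
KEYWORD_TYPES: Dict[str, List[str]] = {
    "museum": ["XX Museum", "ΔΔ Museum"],
    "prefecture name": ["Tokyo", "Kanagawa", "Chiba"],
}

def type_of(keyword: str) -> Optional[str]:
    """Return the type name of a recognized keyword, or None if unknown."""
    for type_name, keywords in KEYWORD_TYPES.items():
        if keyword in keywords:
            return type_name
    return None
```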

  FIG. 4 is a configuration diagram showing the format of the information stored in the paraphrase dictionary 113. The paraphrase dictionary 113 holds paraphrase rules for generating response sentences, conditioned on a keyword in the user's input voice and its reliability.

  The column indicated by keyword 401 holds the keyword to be paraphrased, the column indicated by condition (reliability) 402 the condition for applying the paraphrase, the column indicated by paraphrase 403 the paraphrase word, and the column indicated by response type 404 the response type. In the condition column 402, "x" denotes the reliability of the keyword recognized from the user's voice data; "x > 0.8", for example, means "when the reliability is greater than 0.8". The four rows of block 405 are paraphrase rules for the keyword "XX Museum", and the four rows of block 406 for the keyword "Kanagawa". The row indicated by reference numeral 407 is one of the rules for "XX Museum": when the reliability is greater than 0.8, the keyword "XX Museum" is used as-is and the response type "keyword confirmation" is selected. The row indicated by reference numeral 408, in contrast, states that when the reliability is greater than 0.5 and at most 0.8, "XX Museum" is replaced with "art museum name" and the response type "narrow down" is selected.

  FIG. 5 is a configuration diagram showing the format of the information stored in the response sentence template dictionary 114, which records the response sentence format associated with each response type 404 determined via the paraphrase dictionary 113 (see FIG. 4).

  The response type is listed in the column indicated by response type 501, and the corresponding response sentence template in the column indicated by response sentence template 502. The row indicated by reference numeral 503 holds the template for the response type "keyword confirmation", row 504 the template for "narrow down", row 505 the template for "type confirmation", and row 506 the template for "re-input". The "X" in each template marks the position where the paraphrase 403 determined via the paraphrase dictionary 113 is inserted. That is, a response sentence is generated by inserting the paraphrase determined from the paraphrase dictionary 113 into the response sentence template 502 for the response type 501.

For example, when the paraphrase rule of row 407 in the paraphrase dictionary 113 (see FIG. 4) is applied, the template of row 503 is selected from the response sentence template dictionary 114 and the paraphrase "XX Museum" is inserted, so the response sentence becomes "Are you sure you want to use XX Museum?".
When the rule of row 408 is applied instead, the template of row 504 is selected and the paraphrase "art museum name" is inserted, so the response sentence becomes "Please tell me the name of the art museum again."
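The two examples above follow mechanically from the dictionaries of FIGS. 4 and 5. A minimal sketch of that lookup-and-insert path, encoding rules 407 and 408 as confidence ranges; the names, the range encoding, and the template strings are assumptions for illustration.

```python
from typing import List, NamedTuple, Optional

class ParaphraseRule(NamedTuple):
    keyword: str        # keyword 401
    conf_low: float     # condition 402, lower bound (exclusive)
    conf_high: float    # condition 402, upper bound (inclusive)
    paraphrase: str     # paraphrase 403
    response_type: str  # response type 404

# Rows 407 and 408 of FIG. 4.
PARAPHRASE_DICT: List[ParaphraseRule] = [
    ParaphraseRule("XX Museum", 0.8, 1.0, "XX Museum", "keyword confirmation"),
    ParaphraseRule("XX Museum", 0.5, 0.8, "art museum name", "narrow down"),
]

# Rows 503 and 504 of FIG. 5; "X" marks the paraphrase insertion point.
TEMPLATES = {
    "keyword confirmation": "Are you sure you want to use X?",
    "narrow down": "Please tell me X again.",
}

def generate_response(keyword: str, confidence: float) -> Optional[str]:
    """Select a paraphrase rule, fetch its template, insert the paraphrase."""
    for rule in PARAPHRASE_DICT:
        if rule.keyword == keyword and rule.conf_low < confidence <= rule.conf_high:
            return TEMPLATES[rule.response_type].replace("X", rule.paraphrase)
    return None

print(generate_response("XX Museum", 0.9))  # Are you sure you want to use XX Museum?
print(generate_response("XX Museum", 0.7))  # Please tell me art museum name again.
```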

  FIG. 6 is a flowchart showing the processing procedure of the spoken dialogue apparatus 1 to which Embodiment 1 of the present invention is applied. This processing is performed mainly by executing the dialogue control program 108, which uses the dialogue scenario 111, the keyword type dictionary 112, the paraphrase dictionary 113, and the response sentence template dictionary 114 to have the user input, by voice, the keywords serving as search conditions for the information the user wants; when the necessary keywords are assembled, it searches the database 115 and outputs the result to the user. The procedure of Embodiment 1 is described below with reference to FIG. 6.

In FIG. 6, it is assumed that a specific dialogue scenario has already been selected by the user, for example by another voice command or an on-screen menu (not shown), that the speech recognition program 107 has been notified of the user speech recognition grammar name for that scenario, and that the user's input voice data can therefore be recognized. It is also assumed that every slot of the dialogue scenario 111 is empty when processing starts.
Steps S600 to S603 below are performed by the dialogue control program 108, executed by the information processing unit 105.

  When processing starts, step S600 initializes the slots. That is, when the user selects a specific dialogue scenario 111 (see FIG. 2), that scenario, holding the user speech recognition grammar name 206, the corresponding command 207, and the slot 1 type 203 through slot n type 205, is loaded into memory. A memory area for the contents of each slot N (N = 1, ..., n) is reserved and left empty (an empty slot).

  In step S601, for an empty slot N (N = 1, ..., n) in the dialogue scenario 111, the response type of the response sentence that prompts the user to enter a keyword for that slot is determined from the correspondence table shown in FIG. 7. Step S601 is executed when a dialogue with the user is newly started, or when the user must be prompted to enter a keyword for a new slot.

Here, FIG. 7 is a correspondence table storing the relationship between empty slots and response types in Embodiment 1 of the present invention. It consists of an empty slot list 701 and a response type 702: the column indicated by 701 holds the slot name, and the column indicated by 702 the response type for that slot. The row indicated by reference numeral 703, for example, shows that when slot 1 is an empty slot the response type is "request 1"; similarly, row 704 shows that when slot 2 is an empty slot the response type is "request 2".
The table of FIG. 7 is prepared in advance, associating empty slots and response types with the user speech recognition grammar name 206. The response type for an empty slot could instead be stored in the paraphrase dictionary 113 (see FIG. 4) as a kind of paraphrase rule, or a separate storage means could be provided for it.

  In step S602, the response sentence template dictionary 114 (see FIG. 5) is searched with the response type selected in step S601, and the corresponding response sentence template is determined. The dictionary of FIG. 5 does not cover the response types 702 of FIG. 7; templates for empty slots, such as the rows indicated by reference numerals 801 and 802 in FIG. 8, can be added to the response sentence template dictionary 114.

  Here, FIG. 8 is a configuration diagram showing the information stored in a response sentence template dictionary 114a that extends the dictionary 114 of FIG. 5. In FIG. 8, the row indicated by reference numeral 801 holds the response sentence for when slot 1 is an empty slot, and row 802 the response sentence for when slot 2 is an empty slot.

  In step S603, the determined paraphrase is inserted into the response sentence template to generate the response sentence. The paraphrase to insert is determined in step S607, described later; therefore, when step S603 is reached via the response type selected in step S601, the template is used without inserting any paraphrase.

  In step S604, the response sentence generated in step S603 is converted into voice data by the voice synthesis program 109 and output from the speaker 104 via the voice output unit 103.

In step S605, the speech recognition program 107 recognizes the voice data the user inputs in reply to the response sentence output in step S604, extracts the keywords, and notifies the dialogue control program of the extracted keywords together with the reliabilities indicating their certainty.
Steps S606 to S611 below are performed by the dialogue control program 108, executed by the information processing unit 105.

  In step S606, it is determined whether the recognition result of the voice data is a response to a confirmation. One method is to check whether the recognition result contains a pre-registered word indicating such a response, for example "yes" or "no": if it does, the input is judged to be a response to a confirmation; otherwise it is not. Alternatively, information on whether each response sentence requests confirmation may be stored, and the user's voice judged a confirmation response whenever the preceding response sentence requested confirmation; this is easy to realize by attaching, to the response type 501 or the response sentence template 502 in the response sentence template dictionary 114 (see FIG. 5), information indicating whether confirmation is required. Both kinds of information, the confirmation flag of the response sentence and specific words in the recognition result, can also be used together.
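A minimal sketch of the word-list variant of this check follows; the registered words and function names are illustrative assumptions.

```python
from typing import List, Set

AFFIRMATIVE: Set[str] = {"yes", "yeah"}   # pre-registered affirmative words
NEGATIVE: Set[str] = {"no", "nope"}       # pre-registered negative words

def is_confirmation_response(recognized: List[str]) -> bool:
    """Step S606: does the recognition result contain a confirmation word?"""
    words = {w.lower() for w in recognized}
    return bool(words & (AFFIRMATIVE | NEGATIVE))

def is_affirmative(recognized: List[str]) -> bool:
    """Step S608: distinguish an affirmative from a negative confirmation."""
    return bool({w.lower() for w in recognized} & AFFIRMATIVE)
```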

If it is determined in step S606 that the user's voice is not a response to a confirmation (No), the process proceeds to step S607.
In step S607, the paraphrase dictionary 113 (see FIG. 4) is searched using the keyword recognized in step S605 and its reliability, the paraphrase 403 and the response type 404 are determined, and the process returns to step S602.

For example, if the recognition result of the user's voice in step S605 is (XX Museum, 0.4), the paraphrase dictionary 113 (see FIG. 4) is searched with "XX Museum" as the key, and step S607 selects the paraphrase 403 "museum" and the response type 404 "type confirmation".
In this example, step S602 selects the corresponding response sentence template 505 from the response sentence template dictionary 114 (see FIG. 5) based on that response type, so the response sentence generated in step S603 becomes "Are you sure it is a museum?".

  On the other hand, if it is determined in step S606 that the user's voice is a response to the confirmation (Yes), the process proceeds to step S608.

  In step S608, it is further determined whether the user's response to the confirmation is affirmative or negative. Here too, if the recognition result contains a specific affirmative word, such as "yes", the response is judged affirmative; if it contains a specific negative word, such as "no", it is judged negative.

  If it is determined in step S608 that the user's response to the confirmation is negative (No), the process proceeds to step S609, the keyword that was the subject of confirmation is deleted, and the process returns to step S601. Which keyword was the subject of confirmation can easily be determined by retaining the recognition result of the user's voice input that preceded the voice judged to be the confirmation response.

  If it is determined in step S608 that the user's response to the confirmation is affirmative (Yes), the process proceeds to step S610, and the confirmed keyword is stored in the memory area for the contents of the corresponding slot N (N = 1, ..., n). To do this, the keyword's type 301 is obtained from the keyword type dictionary 112 (see FIG. 3) using the keyword as the key. The dialogue scenario 111 of FIG. 2 is then searched for a slot whose slot N type (N = 1, ..., n) matches the obtained keyword type; that slot is taken as the slot corresponding to the keyword, and the keyword is stored in its memory area.

After the keyword is stored in the corresponding slot in step S610, the process proceeds to step S611 and checks whether every slot now holds a keyword.
If not all slots are filled (No), the process returns to step S601.

  When all slots hold keywords in step S611 (Yes), the process proceeds to step S612: using the command indicated by reference numeral 207 in the dialogue scenario 111 (see FIG. 2) and the keywords stored in the slots, the search program 110 searches the database 115, and the result is output via the speech synthesis program 109.
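Putting the flowchart together, the following condensed sketch mirrors steps S600 to S612. The `listen`/`speak` callbacks, the rule table, and the simplified slot assignment (a real implementation matches slots by keyword type, per step S610) are all assumptions, not the patent's implementation.

```python
from typing import Dict, List, Optional

PARAPHRASE_RULES = [  # (keyword, conf low, conf high, paraphrase, response type)
    ("XX Museum", 0.8, 1.0, "XX Museum", "keyword confirmation"),
    ("XX Museum", 0.5, 0.8, "art museum name", "narrow down"),
    ("Kanagawa", 0.3, 1.0, "Kanagawa", "keyword confirmation"),
]
TEMPLATES = {
    "request": "Please tell me the X.",              # empty-slot prompt (FIG. 8)
    "keyword confirmation": "Are you sure you want to use X?",
    "narrow down": "Please tell me X again.",
}

def dialogue_loop(slot_types: List[str], listen, speak) -> None:
    """listen() -> (word, confidence); speak(sentence) plays synthesized voice."""
    slots: Dict[str, Optional[str]] = {t: None for t in slot_types}     # S600
    pending: Optional[str] = None          # keyword awaiting confirmation
    while not all(slots.values()):                                      # S611
        empty = next(t for t, v in slots.items() if v is None)          # S601
        if pending is None:
            speak(TEMPLATES["request"].replace("X", empty))             # S602-S604
        word, conf = listen()                                           # S605
        if word in ("yes", "no"):                                       # S606
            if word == "yes" and pending is not None:                   # S608
                slots[empty] = pending                                  # S610 (simplified)
            pending = None                                              # S609 on "no"
            continue
        for kw, lo, hi, para, rtype in PARAPHRASE_RULES:                # S607
            if kw == word and lo < conf <= hi:
                speak(TEMPLATES[rtype].replace("X", para))              # back to S602
                pending = kw if rtype == "keyword confirmation" else None
                break
    speak("SEARCH " + " ".join(v for v in slots.values() if v))         # S612
```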

  Furthermore, although the paraphrase dictionary 113 (see FIG. 4) stores paraphrases for each individual keyword, paraphrases can also be stored per keyword type. In that case a format such as that of FIG. 9 may be used for the paraphrase dictionary.

  Here, FIG. 9 is a configuration diagram representing the information stored in a paraphrase dictionary keyed by keyword type in Embodiment 1 of the present invention.

FIG. 9 differs from the paraphrase dictionary 113 of FIG. 4 in the contents of the columns indicated by type 901 and paraphrase 902. The type column 901 holds a character string representing the type of the keyword to be paraphrased.
In FIG. 9, "museum" 903 and "prefecture name" 904 are such type names. The paraphrase column 902 is essentially the same as the paraphrase column 403 of FIG. 4, except for the rules in the rows indicated by reference numerals 905 and 906. The rule of row 905 states that "if the recognized keyword's type is museum and its reliability is greater than 0.8, the recognized keyword itself is selected as the paraphrase". The notation [keyword] in rows 905 and 906 indicates that the recognized keyword itself is used as the paraphrase.

  When the paraphrase dictionary 113a of FIG. 9 is used, step S607 of the flowchart in FIG. 6 changes as follows: first, the keyword recognized from the user's voice is looked up in the keyword type dictionary 112 to determine its type 301; then the paraphrase dictionary 113a is searched with the determined type 301 and the keyword's reliability to determine the paraphrase and the response type.

  Further, although the paraphrase dictionaries of FIGS. 4 and 9 register only one paraphrase per combination of keyword and condition, several paraphrases can also be registered. In that case one of them may be chosen, for example, at random.
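A sketch combining both extensions follows: rules keyed by keyword type as in FIG. 9, the [keyword] placeholder standing for the recognized keyword itself, and a random choice when several paraphrases are registered. The rule contents and names are illustrative assumptions.

```python
import random
from typing import Optional, Tuple

TYPE_RULES = [  # (keyword type, conf low, conf high, paraphrases, response type)
    ("museum", 0.8, 1.0, ["[keyword]"], "keyword confirmation"),
    ("museum", 0.0, 0.8, ["art museum name", "name of the museum"], "narrow down"),
]

def paraphrase_for(keyword: str, keyword_type: str,
                   confidence: float) -> Optional[Tuple[str, str]]:
    """Return (paraphrase, response type); "[keyword]" resolves to the keyword
    itself, and one of several registered paraphrases is picked at random."""
    for ktype, lo, hi, candidates, rtype in TYPE_RULES:
        if ktype == keyword_type and lo < confidence <= hi:
            chosen = random.choice(candidates)
            return (keyword if chosen == "[keyword]" else chosen, rtype)
    return None
```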

  According to Embodiment 1 of the present invention, a response sentence can be generated flexibly even when the reliability of recognition results varies greatly with the phonological characteristics and length of the keyword. Even when a keyword is recognized correctly, the obtained reliability may be consistently high or consistently low depending on the keyword. In the paraphrase dictionary 113 (see FIG. 4), the reliability conditions (see reference numeral 402) for selecting paraphrases differ between "XX Museum" and "Kanagawa Prefecture": the former assumes that the reliability is high when the recognition result is correct, while the latter assumes that the reliability is often low even when the recognition result is correct. By setting the selection conditions per keyword in this way, an appropriate response sentence can be generated according to the characteristics of each keyword to be recognized.

(Embodiment 2)
The second embodiment of the present invention (a spoken dialogue apparatus and a spoken dialogue program) will be described below with reference to the figures.

  Embodiment 1 of the present invention, described above, assumes that the user's voice contains only one keyword. Usually, however, operability in a dialogue improves if the user can include two or more keywords in a single utterance.

  In Embodiment 2 of the present invention, the response sentence template dictionary 114 takes the format shown in FIG. 10 so that flexible response sentences can be generated even when the user's voice contains several keywords. In the dictionary of FIG. 5 there is a single target keyword, so a single response type suffices; but when there are two or more target keywords at once, applying the paraphrase dictionary 113 of FIG. 4 to each keyword yields a response type per keyword. The response sentence template dictionary 114b of FIG. 10 therefore allows a template to be determined for a combination of response types.

The dictionary of FIG. 10 covers the case of two target slots; the combination of response types selected for the keywords of the respective slots is recorded in the columns indicated by response type 1001 of slot 1 and response type 1002 of slot 2. In the row indicated by reference numeral 1003, for example, when the response type of both slot 1 and slot 2 is "keyword confirmation", the template "Are you sure you want to use [name of slot 2] in [name of slot 1]?" is selected.
During the processing of the flowchart of FIG. 6, step S603 replaces [name of slot 1] and [name of slot 2] with the keyword paraphrases corresponding to the slot types N (N = 1, ..., n) stored in the dialogue scenario 111. That is, the slot into which a keyword's paraphrase is inserted can be determined by looking up the keyword's type in the keyword type dictionary 112 and then searching the dialogue scenario 111 for the slot N (N = 1, ..., n) whose type matches.

  In the rows indicated by reference numerals 1003 to 1006 of FIG. 10, the response types selected for both slots are specified concretely, whereas in rows 1007 to 1009 the response type for slot 2 is left unrestricted: the symbol "*" in the column indicated by reference numeral 1002 means that there is no restriction on the response type. Thus, if the response type for slot 1 is "narrow down", the row indicated by reference numeral 1007, "Please tell me [name of slot 1] again.", is selected regardless of the response type for slot 2.

  When the user's voice contains several keywords and a response sentence of the form shown in rows 1007 to 1009 of FIG. 10 is generated, the processing of the flowchart in FIG. 6 confirms only one keyword, and the other keyword must be input again. For this reason, information indicating whether the recognition result requires confirmation can be added per slot to the dialogue scenario 111; by generating response sentences with the paraphrase dictionary 113 (see FIG. 4) and the response sentence template dictionary 114 (see FIG. 5) accordingly, an efficient dialogue covering all keywords becomes possible.
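A sketch of the FIG. 10 lookup, including the "*" wildcard of rows 1007 to 1009, follows; the template strings and the first-match ordering are illustrative assumptions.

```python
from typing import List, Optional, Tuple

# ((response type of slot 1, response type of slot 2), template); "*" = any.
COMBINATION_TEMPLATES: List[Tuple[Tuple[str, str], str]] = [
    (("keyword confirmation", "keyword confirmation"),
     "Are you sure you want to use [name of slot 2] in [name of slot 1]?"),
    (("narrow down", "*"), "Please tell me [name of slot 1] again."),
]

def find_combined_template(rtype1: str, rtype2: str) -> Optional[str]:
    """Return the first template whose response-type pattern matches."""
    for (p1, p2), template in COMBINATION_TEMPLATES:
        if p1 in ("*", rtype1) and p2 in ("*", rtype2):
            return template
    return None

print(find_combined_template("narrow down", "keyword confirmation"))
# Please tell me [name of slot 1] again.
```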

  In addition, when the user's voice contains several keywords, a paraphrase dictionary such as that of FIG. 11 can be used, so that the response type is uniquely determined by the combination of recognized keywords and their reliabilities.

Here, FIG. 11 is a block diagram showing the format of a paraphrase dictionary for combinations of two keywords.
A first keyword 1101, a condition 1102 on the reliability of the first keyword, a second keyword 1103, a condition 1104 on the reliability of the second keyword, a paraphrase 1105 for the first keyword, a paraphrase 1106 for the second keyword, and a response type 1107 are registered as one set. For each combination of the keywords and reliabilities indicated by reference numerals 1101 to 1104, the paraphrases for the respective keywords and the response type are thereby determined.

  FIG. 12 shows the format of the response sentence template dictionary used when paraphrases and a response type have been determined for several keywords, for example with the paraphrase dictionary 113b of FIG. 11. The format of the response sentence template dictionary 114c of FIG. 12 is basically the same as that of the dictionary 114 of FIG. 5, except that the templates in the column indicated by response sentence template 1201 can have the keywords of several slots inserted.

For example, when the condition indicated by reference numeral 1108 in FIG. 11 applies, the template of the row indicated by reference numeral 1202 in FIG. 12 is selected: if the type of slot 1 is "prefecture name" and the type of slot 2 is "museum", the response sentence becomes "Are you sure you want to visit the XX Museum in Kanagawa?".
When the condition indicated by 1109 in FIG. 11 applies, the template indicated by reference numeral 1204 in FIG. 12 is selected, and the response sentence becomes "Which museum in Kanagawa?".
Furthermore, when the condition indicated by reference numeral 1110 in FIG. 11 applies, the template indicated by reference numeral 1203 in FIG. 12 is selected, and the response sentence becomes "Are you sure you want to use the XX Museum?".

Furthermore, the paraphrase dictionary 113 and response sentence template dictionary 114 for single keywords can be used together with the paraphrase dictionary and response sentence template dictionary for combinations of two or more keywords.
For example, priorities can be assigned to the target slots so that the dictionaries relating to higher-priority slots are searched first, or the dictionaries for combinations of a larger number of slots can be searched preferentially.
Slot priority and the number of target slots can also be combined, and priorities for particular slot combinations can be defined in advance.

According to Embodiment 2 of the present invention, an appropriate response sentence can be generated flexibly even in cases where the paraphrasing of a keyword should change with the type and number of keywords input by the user.
For example, suppose a response sentence is to prompt re-input by naming the type of a keyword recognized from the user's voice. If the user's voice contains a single keyword, as in "It is the XX Museum", the response may be "Please tell me the name of the art museum again."
If, on the other hand, the user's voice contains two keywords (Kanagawa Prefecture, XX Museum), as in "It is the XX Museum in Kanagawa Prefecture", then "Which museum in Kanagawa Prefecture?" is the appropriate expression.
In the first example, "XX Museum" is replaced with "art museum name"; in the second, it is replaced with "which museum".
Thus, even for the same keyword, the appropriate paraphrase can differ with the number of keywords in the response sentence. Depending on the preceding response sentences, an expression such as "what kind of museum" may also be more appropriate than "which museum".
In such cases, an appropriate response sentence can be generated by using the single-keyword paraphrase dictionary 113 and response sentence template dictionary 114 together with the dictionaries for combinations of two or more keywords.

(Embodiment 3)
A third embodiment of the present invention (a spoken dialogue apparatus and a spoken dialogue program) will be described with reference to FIG. 13.

  The paraphrase dictionary 113 of Embodiments 1 and 2, described above, uses only the recognized keyword and its reliability as the conditions for selecting a paraphrase and a response type. In a dialogue with a user, however, the exchange often goes more smoothly if the response sentence is varied according to the amount of information being exchanged or the content of the immediately preceding dialogue. To realize this, items other than the recognized keyword and its reliability are added as selection conditions in the paraphrase dictionary 113.

  FIG. 13 is a configuration diagram showing information stored in the paraphrase dictionary in which items are added to the conditions in the third embodiment of the present invention.

  In the paraphrase dictionary 113c of FIG. 13, "number of other slots", shown in the column indicated by 1301, is added as a condition. In this column, "*" means there is no restriction on the number of slots other than the target slot, "0" means there is no slot other than the target slot, and "y ≧ 1" means there is at least one other slot; y is a variable name used for convenience.

  Here, suppose for example that the paraphrase dictionary 113c of FIG. 13 is used together with the response sentence template dictionaries of FIGS. 5 and 10. Suppose recognition of the user's voice yields only the keyword "XX Museum", with reliability 0.7, corresponding to slot 2. Applying the paraphrase dictionary 113c of FIG. 13 selects the row indicated by reference numeral 1302, giving the paraphrase "art museum name" and the response type "narrow down". Since there is only one target slot, the template "Please tell me X again." is selected from the response sentence template dictionary 114 of FIG. 5, the paraphrase is inserted, and the response sentence becomes "Please tell me the name of the art museum again."

  Now suppose the user's voice also contains "Kanagawa Prefecture", corresponding to slot 1, with reliability 1.0. In this case, applying the paraphrase dictionary 113c of FIG. 13 gives the paraphrase "Kanagawa Prefecture" and the response type "keyword confirmation" for "Kanagawa Prefecture". Since there are now two target slots, slot 1 and slot 2, a template of the form "Is it [name of slot 2] of [name of slot 1]?" is selected from the response sentence template dictionary 114b of FIG. 10.

  Here, if the number of other slots is not considered as a condition (for example, when the paraphrase dictionary 113 of FIG. 4 is used), the paraphrase for "XX Museum" is "art museum name", so inserting the paraphrases into the selected template yields "Is it the name of an art museum in Kanagawa?". Since the response type for "XX Museum" is "narrow down", this is inappropriate as a response sentence.

  If, on the other hand, the number of other slots is considered as a condition per the paraphrase dictionary 113c of FIG. 13, then because the number of other slots is 1, "what museum" is selected as the paraphrase for "XX Museum". Inserting this paraphrase into the template generates "What museum in Kanagawa Prefecture?", an appropriate response sentence for the response type "narrow down".
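A sketch of a FIG. 13-style rule table, in which the confidence range is joined by a predicate on the number of other slots, follows; the rule contents track the worked example above, but the encoding and names are assumptions.

```python
from typing import Callable, List, Optional, Tuple

# (keyword, conf low, conf high, other-slot condition, paraphrase, response type)
Rule = Tuple[str, float, float, Callable[[int], bool], str, str]
RULES: List[Rule] = [
    ("XX Museum", 0.5, 0.8, lambda y: y == 0, "art museum name", "narrow down"),
    ("XX Museum", 0.5, 0.8, lambda y: y >= 1, "what museum", "narrow down"),
]

def select(keyword: str, conf: float, other_slots: int) -> Optional[Tuple[str, str]]:
    """Return (paraphrase, response type) for the first matching rule."""
    for kw, lo, hi, cond, para, rtype in RULES:
        if kw == keyword and lo < conf <= hi and cond(other_slots):
            return para, rtype
    return None

print(select("XX Museum", 0.7, 0))  # ('art museum name', 'narrow down')
print(select("XX Museum", 0.7, 1))  # ('what museum', 'narrow down')
```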

  While the paraphrase dictionary 113c of FIG. 13 uses the reliability and the number of other slots (the number of keywords in the user's voice) as conditions for selecting the paraphrase and response type, the types of the other slots, the user's name, and the dialogue history (histories of past response types, response sentences, user speech recognition results, and so on) can also serve as conditions. To use such information, columns storing the respective contents may be added to the paraphrase dictionary 113c of FIG. 13.

  When a user name is used, the name may be entered by voice or keyboard at the start of the dialogue. Alternatively, the user's face may be recognized in an image captured by a camera, using a known face image recognition technique. The form of the response sentence can thereby be varied per user.

Furthermore, when the dialogue history is used, columns for the response type, the response sentence, and the user speech recognition result may be stored as conditions in the paraphrase dictionary 113. For example, assume the following dialogue:
(1) Response: Please tell me the name of the facility.
(2) User voice: XX Museum.
(3) Response: Please tell me the name of the art museum again.
(4) User voice: XX Museum.
(5) Response: Are you sure you want to use XX Museum?
(6) User voice: Yes.
When the paraphrase dictionary 113 of FIG. 4 and the response sentence template dictionary 114a of FIG. 8 are used, the response types of the response sentences are: (1) request 1, (3) narrow down, and (5) keyword confirmation. If the dialogue history is represented as a sequence of response types and user speech recognition results, it can be expressed in a format such as (response: request 1) (user: XX Museum) (response: narrow down) (user: XX Museum) (response: keyword confirmation) (user: yes), where "response" abbreviates the response type and "user" the user speech recognition result. If information in this format is stored in the paraphrase dictionary 113, and the actual course of the dialogue is separately recorded in the same format, the dialogue history can be used as a condition of the paraphrase dictionary 113.
In the above example only the recognized keyword is registered as the user voice recognition result, but the reliability obtained at recognition may be recorded with it. The actually output response sentence can also be stored instead of the response type; alternatively, only the response sentence, or only the user voice recognition result, may be stored. The number of items kept in the dialogue history stored as a condition in the paraphrase dictionary 113 may also be limited.

  Using the dialogue history as a condition of the paraphrase dictionary 113 makes it easy to implement controls such as changing the response sentence when the response type "narrow down" or "type confirmation" is repeated.
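As a sketch of such history-conditioned control, the dialogue history can be recorded as the (response, user) sequence just described and a rule fired only when the recent turns match a stored pattern; this encoding is an assumption for illustration.

```python
from typing import List, Tuple

History = List[Tuple[str, str]]  # ("response", response type) / ("user", recognition)

def history_matches(history: History, pattern: History) -> bool:
    """True if the most recent turns equal the stored pattern, so a rule can,
    for example, change the response once "narrow down" has already been tried."""
    return len(history) >= len(pattern) and history[-len(pattern):] == pattern

h: History = [("response", "request 1"), ("user", "XX Museum"),
              ("response", "narrow down"), ("user", "XX Museum")]
# Fire a special rule if the previous response already narrowed down:
print(history_matches(h, [("response", "narrow down"), ("user", "XX Museum")]))  # True
```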

  According to Embodiment 3 of the present invention, the content of the response sentence can be controlled more finely, under more complex conditions, than in Embodiment 2.

  The spoken dialogue apparatus of the present invention can finely tailor the content of the response sentence to each keyword, or keyword type, expressed in the user's input voice data, and improved operability can therefore be expected. Accordingly, the present invention is suitable as an automatic response system in a call center, or as an operation interface for devices such as vending machines and ATMs.

FIG. 1 is a block diagram showing a configuration example of the spoken dialogue apparatus according to Embodiment 1 of the present invention.
FIG. 2 is a block diagram showing the format of the dialogue scenario in Embodiment 1 of the present invention.
FIG. 3 is a block diagram showing the format of the information stored in the keyword type dictionary in Embodiment 1 of the present invention.
FIG. 4 is a block diagram showing the format of the information stored in the paraphrase dictionary in Embodiment 1 of the present invention.
FIG. 5 is a block diagram showing the format of the information stored in the response sentence template dictionary in Embodiment 1 of the present invention.
FIG. 6 is a flowchart showing the processing procedure of the dialogue control program in Embodiment 1 of the present invention.
FIG. 7 is a correspondence table showing the information storing the relationship between empty slots and response types in Embodiment 1 of the present invention.
FIG. 8 is a block diagram showing the format of the extended response sentence template dictionary in Embodiment 1 of the present invention.
FIG. 9 is a block diagram showing the format of the paraphrase dictionary keyed by keyword type in Embodiment 1 of the present invention.
FIG. 10 is a block diagram showing the format of the response sentence template dictionary in Embodiment 2 of the present invention.
FIG. 11 is a block diagram showing the format of the paraphrase dictionary for combinations of several keywords in Embodiment 2 of the present invention.
FIG. 12 is a block diagram showing the format of the response sentence template dictionary for several keywords in Embodiment 2 of the present invention.
FIG. 13 is a block diagram showing the format of the paraphrase dictionary in Embodiment 3 of the present invention.

Explanation of symbols

107 Speech recognition program
108 Dialogue control program
109 Speech synthesis program
112 Keyword type dictionary
113 Paraphrase dictionary
114 Response sentence template dictionary

Claims (8)

  1. A spoken dialogue apparatus comprising:
    voice recognition means for recognizing one or more keywords, and their reliabilities, from the input voice of a user;
    a paraphrase dictionary recording, for each keyword, a response type indicating the kind of response sentence to be conveyed to the user by voice, a paraphrase used when the recognized keyword is included in the response sentence, and a condition under which they are selected;
    a response sentence template dictionary recording the response sentence associated with each response type;
    dialogue control means for determining, from the paraphrase dictionary, the response type and the paraphrase for the recognized keyword with its reliability as the condition, determining from the response sentence template dictionary the response sentence associated with the determined response type, and generating a response sentence by inserting the determined paraphrase into the determined response sentence; and
    voice synthesis means for converting the generated response sentence into voice data and outputting the voice data.
  2. The spoken dialogue apparatus according to claim 1, wherein the conditions in the paraphrase dictionary include, in addition to the reliability, one or more of: the number of keywords included in the user's voice, the type of the keyword, the history of past response types, the history of past response sentences, and past user speech recognition results.
  3. The spoken dialogue apparatus according to claim 1 or 2, wherein the response sentence template dictionary records the response sentences corresponding to combinations of the response types corresponding to each of two or more keywords included in the input voice of the user.
  4. The spoken dialogue apparatus according to claim 3, wherein, for two or more keywords included in the input voice of the user, the dialogue control means preferentially determines the response sentence associated with the combination, among the combinations of the determined response types, that has the largest number of constituent response types.
  5. The spoken dialogue apparatus according to claim 1 or 2, wherein the paraphrase dictionary records the paraphrase for each keyword in association with a combination of two or more keywords included in the input voice of the user.
  6. The spoken dialogue apparatus according to any one of claims 1 to 5, further comprising a keyword type dictionary recording each keyword in association with the name of its type, wherein:
    the paraphrase dictionary records the response type, the paraphrase, and the condition for each type name; and
    the dialogue control means determines the keyword's type name from the keyword type dictionary based on the recognized keyword, and determines the response type and the paraphrase from the paraphrase dictionary based on the determined type name and the condition.
  7. The spoken dialogue apparatus according to claim 1, wherein the paraphrase dictionary records a plurality of the paraphrases, and the dialogue control means randomly determines any one of the plurality of paraphrases determined from the paraphrase dictionary.
  8. A spoken dialogue program for a computer having a voice input unit that inputs voice uttered by a user via a voice input device, a voice output unit that outputs voice to be heard by the user via a voice output device, and a storage device storing a paraphrase dictionary, recording for each keyword a response type indicating the kind of response sentence to be conveyed to the user by voice, a paraphrase used when the recognized keyword is included in the response sentence, and a condition under which they are selected, together with a response sentence template dictionary recording the response sentence associated with each response type, the program causing the computer to execute, in this order:
    a process of recognizing one or more keywords, and their reliabilities, from the user's voice input via the voice input unit;
    a process of determining the response type and the paraphrase from the paraphrase dictionary based on the recognized keyword and its reliability, with the reliability as the condition;
    a process of determining, from the response sentence template dictionary, the response sentence associated with the determined response type;
    a process of generating a response sentence by inserting the determined paraphrase into the determined response sentence; and
    a process of synthesizing the response sentence into voice data and outputting the voice data via the voice output unit.
JP2006211166A 2006-08-02 2006-08-02 Spoken dialogue apparatus and spoken dialogue program Active JP4728905B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2006211166A JP4728905B2 (en) 2006-08-02 2006-08-02 Spoken dialogue apparatus and spoken dialogue program


Publications (2)

Publication Number Publication Date
JP2008039928A (en) 2008-02-21
JP4728905B2 (en) 2011-07-20

Family

ID=39175040

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2006211166A Active JP4728905B2 (en) 2006-08-02 2006-08-02 Spoken dialogue apparatus and spoken dialogue program

Country Status (1)

Country Link
JP (1) JP4728905B2 (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3454897B2 (en) * 1994-01-31 2003-10-06 株式会社日立製作所 Voice dialogue system
JP4293340B2 (en) * 2003-02-18 2009-07-08 幸宏 伊東 Dialogue understanding device

Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20090617

A711 Notification of change in applicant

Free format text: JAPANESE INTERMEDIATE CODE: A712

Effective date: 20100212

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20110309

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20110412

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20110415

R150 Certificate of patent or registration of utility model

Free format text: JAPANESE INTERMEDIATE CODE: R150

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20140422

Year of fee payment: 3

R250 Receipt of annual fees

Free format text: JAPANESE INTERMEDIATE CODE: R250
