WO2015132829A1 - Voice dialogue apparatus, voice dialogue system, and voice dialogue method - Google Patents
- Publication number
- WO2015132829A1 (application PCT/JP2014/005689)
- Authority
- WO
- WIPO (PCT)
Classifications
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/027—Concept to speech synthesisers; Generation of natural phrases from machine-based concepts
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/10—Speech classification or search using distance or distortion measures between unknown speech and reference templates
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/1822—Parsing for meaning understanding
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L17/00—Speaker identification or verification techniques
- G10L17/22—Interactive procedures; Man-machine interfaces
- H04M3/487—Arrangements for providing information services, e.g. recorded voice services or time announcements
- H04M3/493—Interactive information services, e.g. interactive voice response [IVR] systems or voice portals
- H04M3/4936—Speech interaction details
- G10L2015/088—Word spotting
- G10L2015/225—Feedback of the input speech
- H04M2203/1058—Shopping and product ordering
Definitions
- the present disclosure relates to a voice dialogue apparatus, a voice dialogue system, and a voice dialogue method.
- an automatic reservation system automatically makes reservations for accommodation facilities, air tickets, and the like.
- a voice dialogue system that accepts orders based on user utterances (see, for example, Patent Document 1).
- a speech analysis technique disclosed in Patent Document 2 is used to analyze a user's utterance sentence.
- unnecessary sounds such as “e-” are removed from an utterance and word candidates are extracted.
- An automatic reservation system such as a voice dialogue system is required to improve the recognition rate of utterances.
- This disclosure provides a voice dialogue apparatus, a voice dialogue system, and a voice dialogue method that can improve the recognition rate of utterances.
- the voice dialogue apparatus includes an acquisition unit that acquires utterance data indicating a user's utterance, a storage unit that stores a plurality of keywords, a word determination unit that determines whether each of a plurality of words extracted from the utterance data matches any of the plurality of keywords, a response sentence creation unit that, when the plurality of words includes a first word determined not to match any of the plurality of keywords, creates a response sentence using a second word determined to match one of the plurality of keywords, and a voice generation unit that generates voice data of the response sentence.
- the speech dialogue apparatus, the voice dialogue system, and the voice dialogue method according to the present disclosure can improve the speech recognition rate.
- FIG. 1 is a diagram illustrating an example of a configuration of a voice interaction system according to an embodiment.
- FIG. 2 is a block diagram showing an example of the configuration of the automatic order post and the voice interaction server in the embodiment.
- FIG. 3 is a diagram illustrating an example of the menu DB according to the embodiment.
- FIG. 4A is a diagram illustrating an example of order data according to the embodiment.
- FIG. 4B is a diagram illustrating an example of order data according to the embodiment.
- FIG. 4C is a diagram illustrating an example of order data according to the embodiment.
- FIG. 4D is a diagram illustrating an example of order data according to the embodiment.
- FIG. 5 is a diagram illustrating an example of a display screen that displays order data according to the embodiment.
- FIG. 6 is a flowchart illustrating an example of a processing procedure of order processing executed by the voice interaction server according to the embodiment.
- FIG. 7 is a diagram illustrating an example of a question and answer between the voice output from the speaker of the automatic order post and the user in the embodiment.
- FIG. 8 is a flowchart illustrating an example of a processing procedure of an utterance sentence analysis process executed by the speech dialogue server according to the embodiment.
- FIG. 9 is a diagram illustrating an example of a question and answer between the voice output from the speaker of the automatic order post and the user in the embodiment.
- the voice interaction system creates a response sentence that prompts re-input of the first word that could not be analyzed, using the second word that could be analyzed in the user's utterance sentence.
- FIG. 1 is a diagram illustrating an example of a configuration of a voice interaction system according to the present embodiment.
- the voice interaction system 100 includes an automatic order post 10 installed outside the store 200 and a voice interaction server (voice dialogue apparatus) 20 installed inside the store 200. Details of the voice interaction system 100 will be described later.
- an order post 10c for placing an order while talking directly with the store clerk is also provided outside the store 200. The store 200 is further provided with an interactive device 30 that enables dialogue between the store clerk and the user in cooperation with the order post 10c, and a product delivery counter 40 at which the ordered product is handed over to the user.
- the user in the vehicle 300 enters the site from the road outside, parks the vehicle beside the order post 10c or beside the automatic order post 10a or 10b installed in the site, and uses that post to place an order.
- the user receives the product at the product delivery counter 40.
- FIG. 2 is a block diagram showing an example of the configuration of the automatic order post 10 and the voice interaction server 20 in the present embodiment.
- the automatic order post 10 includes a microphone 11, a speaker 12, a display panel 13, and a vehicle detection sensor 14.
- the microphone 11 is an example of a voice input unit that acquires user utterance data and outputs the utterance data to the voice dialogue server 20, and outputs a signal corresponding to a voice (sound wave) uttered by the user to the voice dialogue server 20.
- the speaker 12 is an example of an audio output unit that outputs audio using audio data output from the voice dialogue server 20.
- the display panel 13 displays the contents of the order received by the voice dialogue server 20.
- FIG. 5 is a diagram showing an example of the screen of the display panel 13. As shown in FIG. 5, the display panel 13 displays the contents of the order acquired by the voice dialogue server 20.
- the contents of the order include an order number, a product name, a size, a number, and the like.
- the vehicle detection sensor 14 is composed of an optical sensor, for example.
- the optical sensor, for example, emits light from a light source and, when the vehicle 300 moves to the side of the order post, detects whether or not the vehicle 300 is present at a predetermined position by detecting the reflected light returned from the vehicle 300.
- when the vehicle 300 is detected, the voice dialogue server 20 starts order processing.
- the vehicle detection sensor 14 is not an essential component of the present disclosure. Other sensors may be used, and an order start button may be provided on the automatic order post 10 so that the start of the order is detected by a user operation.
- the voice dialogue server 20 includes a dialogue unit 21, a memory 22, and a display control unit 23.
- the dialogue unit 21 is an example of a control unit that performs dialogue processing with the user.
- the dialogue unit 21 accepts an order based on the user's utterance and creates order data.
- the dialogue unit 21 includes a word determination unit 21a, a response sentence creation unit 21b, a speech synthesis unit 21c, and an order data creation unit 21d.
- the dialogue unit 21 is configured by an integrated circuit such as an ASIC (Application Specific Integrated Circuit).
- the word determination unit 21a acquires utterance data indicating the user's utterance from the signal output from the microphone 11 of the automatic order post 10 (also functions as an acquisition unit), and analyzes the utterance sentence.
- the utterance sentence is analyzed by keyword spotting.
- in keyword spotting, keywords stored in advance in the keyword DB are extracted from the user's utterance sentence, and the other sounds are discarded as redundant words. For example, if a word instructing a change (such as "make it ...") is registered as a keyword and the user utters "keyword A", "make it", "keyword B", the utterance is analyzed as changing keyword A to keyword B. Further, for example, by using the technique described in Patent Document 2, unnecessary sounds such as "e-" are removed from the utterance sentence to extract word candidates.
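The keyword-spotting step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the keyword set and the pre-tokenized English input are assumptions introduced for the example.

```python
# Minimal keyword-spotting sketch: keep tokens found in the keyword DB and
# discard everything else as redundant words. The keyword list below is an
# illustrative assumption, not data from the patent.
KEYWORD_DB = {"hamburger", "potato", "cola", "S", "M", "L",
              "one", "two", "three", "change", "that's all"}

def spot_keywords(tokens):
    """Return the tokens that match a stored keyword, in utterance order."""
    return [t for t in tokens if t in KEYWORD_DB]

# "Um, a hamburger and S potatoes, two each" after (assumed) tokenization:
tokens = ["um", "hamburger", "and", "potato", "S", "two", "each"]
print(spot_keywords(tokens))  # fillers such as "um" and "and" are dropped
```

In practice the tokens would come from a speech recognizer rather than a hand-written list; the point is only that non-keyword sounds never reach the later analysis steps.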
- the response sentence creation unit 21b creates a dialog sentence to be output to the automatic order post 10. Details will be described later.
- the voice synthesis unit 21c is an example of a voice generation unit that generates voice data for outputting the dialogue sentence created by the response sentence creation unit 21b from the speaker 12 of the automatic order post 10.
- the voice synthesizer 21c creates a synthesized voice of the response sentence by voice synthesis.
- the order data creation unit 21d is an example of a data processing unit that performs a predetermined process using the analysis result of the utterance data in the word determination unit 21a.
- the order data creation unit 21d creates the order data using the words extracted by the word determination unit 21a. Details will be described later.
- the memory 22 is configured by a storage medium such as a RAM (Random Access Memory), a ROM (Read Only Memory), or a hard disk.
- the memory 22 stores data required for order processing executed by the voice interaction server 20. Specifically, the memory 22 stores a keyword DB 22a, a menu DB 22b, order data 22c, and the like.
- the keyword DB 22a is an example of a storage unit that stores a plurality of keywords.
- the plurality of keywords are keywords used for analyzing an utterance sentence.
- the keyword DB 22a stores a plurality of keywords considered to be used in placing an order, such as words indicating product names, numerical values (words indicating quantities), words indicating sizes, words instructing a change to an existing order such as "make it ...", and words indicating the end of an order.
- the keyword DB 22a may store keywords that are not directly related to order processing.
- the menu DB 22b is a database in which information on products handled at the store 200 is stored.
- FIG. 3 is a diagram illustrating an example of the menu DB 22b.
- the menu DB 22b stores a menu ID and a product name. Further, for each menu ID, the selectable sizes and the orderable number are stored. Other arbitrary information, such as hot/cold designation for drinks, may be further added.
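A menu DB of this shape can be sketched as a small in-memory structure. The field names, products, sizes, and limits below are illustrative assumptions modeled on the description of FIG. 3, not the actual figure contents.

```python
# Illustrative in-memory menu DB: each menu ID maps to a product name, the
# sizes that can be specified, and the orderable number. All values here are
# assumptions for the sketch.
MENU_DB = {
    1: {"name": "hamburger", "sizes": [],              "max_count": 10},
    2: {"name": "potato",    "sizes": ["S", "M", "L"], "max_count": 10},
    3: {"name": "cola",      "sizes": ["S", "M", "L"], "max_count": 10},
}

def find_menu(product_name):
    """Look up a menu entry by product name; None if the product is not handled."""
    for entry in MENU_DB.values():
        if entry["name"] == product_name:
            return entry
    return None

print(find_menu("potato")["sizes"])     # ['S', 'M', 'L']
print(find_menu("hamburger")["sizes"])  # [] — the size cannot be specified
```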
- the order data 22c is data indicating the contents of the order, and is sequentially created every time the user speaks.
- FIGS. 4A to 4D are diagrams illustrating examples of the order data 22c.
- the order data 22c includes an order number, product name, size, and number.
- FIG. 5 is a diagram illustrating an example of a display screen that displays the order data 22c.
- the display screen of FIG. 5 corresponds to FIG. 4A.
- the order number, product name, size, and number are displayed.
- FIG. 6 is a flowchart showing an example of a processing procedure of order processing (voice dialogue method) executed by the voice dialogue server 20.
- FIG. 7 and FIG. 9 are diagrams showing examples of questions and answers between the voice output from the speaker 12 of the automatic order post 10 and the user.
- the numbers in the left column of the tables in FIGS. 7 and 9 indicate the order of the exchanges. FIGS. 7 and 9 are identical up to the fourth exchange.
- the dialogue unit 21 of the voice dialogue server 20 starts an order process (S1).
- the speech synthesizer 21c generates speech data for outputting the speech “please order” from the speaker 12 by speech synthesis and outputs the speech data to the speaker 12, as shown in FIG.
- the word determination unit 21a acquires an utterance sentence indicating the user's utterance from the microphone 11 (S2), and performs an utterance sentence analysis process for analyzing the utterance sentence (S3). Note that the utterance sentence analysis process is executed one sentence at a time. When the user utters a plurality of sentences continuously, the utterance is decomposed into sentences and processed one sentence at a time.
- FIG. 8 is a flowchart showing an example of a processing procedure of an utterance sentence analysis process executed by the speech dialogue server 20.
- the word determination unit 21a analyzes the utterance sentence acquired in step S2 of FIG. 6 (S11).
- the voice analysis technique of Patent Document 2 may be used.
- the word determination unit 21a removes redundant words from the utterance sentence.
- the redundant word indicates a word that is not necessary for order processing.
- the redundant words in the present embodiment include, for example, words not directly related to the order, such as "um" or "good morning", as well as particles and adjectives. Words necessary for order processing include nouns such as product names, words instructing the addition of a new order, and words instructing a change to an existing order.
- the word determination unit 21a decomposes the utterance data into "um", "hamburger", "and", "potato", "of", "S", "two", and "each", and removes "um", "and", "of", and the particles as redundant words.
- the word determination unit 21a extracts one or more words from the utterance data from which the redundant words have been removed, and determines whether each of the extracted words matches a keyword stored in the keyword DB 22a.
- the word determination unit 21a extracts the five words "hamburger", "potato", "S", "two", and "each", and determines whether each of them matches any of the plurality of keywords stored in the keyword DB 22a.
- hereinafter, a word that does not match any of the plurality of keywords stored in the keyword DB 22a is referred to as a first word, and a word that matches one of the plurality of keywords is referred to as a second word.
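The split into first and second words can be sketched directly. The keyword set is again an illustrative assumption; "**" stands in for an unclear sound, as in the patent's examples.

```python
# Sketch of the word determination step: partition extracted words into
# "second words" (matching a stored keyword) and "first words" (no match,
# e.g. an unclear sound transcribed as "**"). The keyword set is assumed.
KEYWORDS = {"hamburger", "potato", "cola", "S", "M", "L", "one", "two"}

def determine_words(words):
    """Return (first_words, second_words) preserving utterance order."""
    second = [w for w in words if w in KEYWORDS]
    first = [w for w in words if w not in KEYWORDS]
    return first, second

first, second = determine_words(["hamburger", "**", "two"])
print(first)   # ['**'] — a misrecognized part requiring confirmation
print(second)  # ['hamburger', 'two']
```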
- next, the word determination unit 21a determines whether or not there is a confirmation-required part in the utterance sentence (S12). In the present embodiment, when the utterance data includes a misrecognized part or a condition-nonconforming part, it is determined that there is a confirmation-required part.
- the misrecognized part is a part determined to be a first word. More specifically, first words include words that were heard clearly but are not in the keyword DB 22a, and unclear sounds such as "**".
- a condition-nonconforming part is an order that does not satisfy the delivery conditions for the product, that is, the conditions stored in the menu DB 22b of FIG. 3.
- for example, when the utterance "two hamburgers S" is input, the word determination unit 21a extracts the three words "hamburger", "S", and "two".
- "hamburger" (an example of the first keyword) is associated with numerical values (corresponding to second keywords) from 1 up to the orderable number, but is not associated with "S" indicating a size.
- the word determination unit 21a therefore determines that there is a second word, "S", that is not associated with "hamburger (an example of the first keyword)".
- similarly, when a quantity exceeding the orderable number is uttered, the word determination unit 21a determines that there is a second word, "100", that is not associated with "hamburger (the first keyword)".
- in this way, the word determination unit 21a determines that the condition is not met when a second word that is not associated with the first keyword is extracted. The word determination unit 21a also determines that the condition is not met when there is a word indicating a quantity considered abnormal for a single order.
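The condition-conformity check can be sketched against the menu data. The menu contents, the reason strings, and the limit of 10 are illustrative assumptions.

```python
# Sketch of the condition-conformity check: an order does not conform when a
# size is given for a product whose size cannot be specified, or when the
# quantity falls outside the orderable number. Menu data is assumed.
MENU = {"hamburger": {"sizes": [], "max_count": 10},
        "potato":    {"sizes": ["S", "M", "L"], "max_count": 10}}

def check_order(product, size, count):
    """Return None if the order conforms, else a short reason string."""
    entry = MENU.get(product)
    if entry is None:
        return "unknown product"
    if size is not None and size not in entry["sizes"]:
        return "size cannot be specified"
    if not 1 <= count <= entry["max_count"]:
        return "count exceeds orderable number"
    return None

print(check_order("hamburger", "S", 2))     # size cannot be specified
print(check_order("hamburger", None, 100))  # count exceeds orderable number
print(check_order("potato", "S", 2))        # None — conforming
```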
- the word determining unit 21a determines that there is a confirmation required part.
- the word determination unit 21a checks whether or not the utterance is composed of the second word indicating the end of the order (S13). In the case of the utterance sentence 2 in the table of FIG. 7, it is determined that the order has not ended.
- next, the order data creation unit 21d determines whether or not the utterance sentence indicates a change to an existing order (S14). In the case of utterance sentence 2 in the table of FIG. 7, it is determined that the order is not a change.
- the order data creation unit 21d creates new order data (S15).
- for example, the order data shown in FIG. 4A is generated. Since there are two second words indicating product names in the utterance sentence, two records are created, storing the product names "hamburger" and "potato" respectively. In the size column of the "hamburger" record, "-" is entered, indicating, as shown in FIG. 3, that the size cannot be specified. In the number column of the "hamburger" record, "2" is entered. For the "potato" record, "S" is stored in the size column and "2" in the number column.
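Order-data creation of this kind can be sketched as building one record per product. The record layout mirrors the fields described for FIGS. 4A to 4D; the exact keys are assumptions.

```python
# Sketch of order-data creation: one record per product name extracted from
# the utterance; "-" marks a size that cannot be specified, as in FIG. 4A.
# The record layout is an assumption based on the figure descriptions.
def create_records(items, start_number=1):
    """items: list of (product, size, count) tuples; size None becomes '-'."""
    records = []
    for i, (product, size, count) in enumerate(items, start=start_number):
        records.append({"order_no": i,
                        "product": product,
                        "size": size if size is not None else "-",
                        "count": count})
    return records

records = create_records([("hamburger", None, 2), ("potato", "S", 2)])
print(records[0])  # {'order_no': 1, 'product': 'hamburger', 'size': '-', 'count': 2}
```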
- the order data creation unit 21d changes the existing order (S16).
- after the order data is updated, it is confirmed whether or not the order has ended, as shown in FIG. 6 (S4).
- in this case, since it was determined in step S13 of FIG. 8 that there is no second word indicating the end of the order (No in S4), the process returns to step S2 to acquire the next utterance sentence (S2).
- the word determination unit 21a acquires an utterance sentence indicating the user's utterance from the microphone 11 (S2), and performs an utterance sentence analysis process for analyzing the utterance sentence (S3).
- the word determination unit 21a analyzes the utterance sentence acquired in step S2 of FIG. 6 (S11).
- the voice dialogue server 20 determines whether or not there is a confirmation-required part in the utterance sentence (S12). In the case of utterance sentence 3 in the table of FIG. 7, since "**" is present, it is determined that a first word is included.
- the voice dialogue server 20 confirms whether the necessary confirmation part is erroneous recognition (S17).
- when the confirmation-required part is a misrecognition (Yes in S17), the response sentence creation unit 21b creates a response sentence that prompts re-utterance of the misrecognized part (S18).
- the response sentence creation unit 21b creates a response sentence using the second word extracted from the utterance sentence that is determined to be misrecognized.
- in utterance sentence 3 in the table of FIG. 7, "number 2" and "ni" are extracted as second words, so the response sentence is created by applying "number 2", the second word uttered immediately before "**", to a template.
- alternatively, a response sentence may be created using the second word extracted immediately after "**".
- a response sentence may also be created using a plurality of second words, for example "we could not hear the part after [second word] and before [second word]".
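The template-filling just described can be sketched as follows. The template wording is an illustrative assumption; the patent does not give the exact response sentences.

```python
# Sketch of response-sentence creation for a misrecognized part: the second
# word heard just before (and/or after) the unclear "**" is slotted into a
# template, so the user only repeats the part that was not caught.
# The template texts are assumptions.
def response_for_misrecognition(before=None, after=None):
    """Build a re-utterance prompt from the audible second words."""
    if before and after:
        return f'We could not hear the part between "{before}" and "{after}".'
    if before:
        return f'We could not hear the part after "{before}". Please say it again.'
    if after:
        return f'We could not hear the part before "{after}". Please say it again.'
    return "Could you please repeat that?"

print(response_for_misrecognition(before="number 2"))
```

Using only the neighboring second words keeps the prompt short, which in turn keeps the user's answer short and easier to recognize.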
- the voice synthesizer 21c creates voice data of the response sentence created in step S18 and outputs it to the speaker 12 (S19).
- when the word determination unit 21a determines in step S12 that the confirmation-required part is a condition-nonconforming part (No in S17), the response sentence creation unit 21b creates a response sentence including the conforming condition (S20).
- for example, when the above utterance sentence "two hamburgers S" is input, the word determination unit 21a determines in step S12 that a size "S" that cannot be specified has been designated. The response sentence creation unit 21b therefore creates a response sentence including the conforming condition, such as "The size of hamburger cannot be specified."
- the word determination unit 21a determines that a larger number than the orderable number is designated.
- in this case, the response sentence creation unit 21b creates a response sentence including the number that can be ordered at one time (an example of the conforming condition and an example of the second keyword), for example "10".
- specifically, the response sentence creation unit 21b creates a response sentence such as "Please specify the number of hamburgers within 10."
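A condition-including response can be sketched in the same template style. The wording, the reason codes, and the default limit are illustrative assumptions.

```python
# Sketch of a response sentence that embeds the conforming condition (e.g.
# the orderable number) so the user can restate an order that satisfies it.
# Reason codes and wording are assumptions for the sketch.
def response_for_nonconformity(product, reason, max_count=None):
    """Build a prompt that states the condition the order must satisfy."""
    if reason == "count" and max_count is not None:
        return f"Please specify the number of {product}s within {max_count}."
    if reason == "size":
        return f"The size of {product} cannot be specified."
    return f"Your order of {product} cannot be accepted as stated."

print(response_for_nonconformity("hamburger", "count", max_count=10))
# Please specify the number of hamburgers within 10.
```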
- the voice synthesizer 21c creates voice data of the response sentence created in step S20 and outputs it to the speaker 12 (S21).
- after step S19 or step S21, the word determination unit 21a acquires an answer sentence indicating the user's utterance from the microphone 11 and analyzes the answer sentence (S22).
- the voice dialogue server 20 determines whether or not the answer sentence is an answer to the response sentence (S23).
- in the case of answer sentence 5 in the table of FIG. 7, the voice dialogue server 20 determines that the answer sentence is an answer to the response sentence.
- in the case of 5 in the table of FIG. 9, the voice dialogue server 20 extracts two second words, "cola" and "one". In this case, since the product name "cola" is extracted, it is determined that the answer sentence is not an answer to the response sentence.
- the voice dialog server 20 determines whether the response text indicates a change of the existing order (S24). In the case of the answer sentence 5 in the table of FIG. 7, it is determined that the order has been changed.
- the order data creation unit 21d changes the order data (S26). In the case of the answer sentence 5 in the table of FIG. 7, the data of the second size is changed from S to L as shown in FIG. 4B.
- the order data creation unit 21d creates data for a new order (S25).
- when the answer sentence is not an answer to the response sentence, the voice dialogue server 20 discards the utterance sentence currently being analyzed, sets the answer sentence acquired in S22 as the utterance sentence, and continues processing (S27). In the case of 5 in the table of FIG. 9, the answer sentence "one more cola" is set as the utterance sentence.
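The branch in S23/S27 can be sketched as follows. Both branches are simplified assumptions: detecting "not an answer" via a new product name follows the FIG. 9 example, while the conforming branch simply fills the unclear part with the answer words.

```python
# Sketch of the answer-handling branch: if the answer to a response sentence
# names a new product, it is not an answer to the response sentence, so the
# current utterance is discarded and the answer is re-processed as a fresh
# utterance (S27). Otherwise the answer fills the unclear "**" part.
# The product set and fill strategy are assumptions.
PRODUCTS = {"hamburger", "potato", "cola"}

def handle_answer(current_words, answer_words, products=PRODUCTS):
    """Return (words_to_process, discarded_flag)."""
    if any(w in products for w in answer_words):
        return answer_words, True          # discard, treat answer as new utterance
    filled = [w for w in current_words if w != "**"] + answer_words
    return filled, False                   # answer completes the unclear part

utt, discarded = handle_answer(["**", "two"], ["cola", "one"])
print(utt, discarded)  # ['cola', 'one'] True
```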
- the voice dialogue server 20 determines whether or not there is a confirmation necessary part using the analysis result of the answer sentence in step S22 (S12). In the case of 5 in the table of FIG. 9, it is determined that there is no confirmation required point, and the process proceeds to step S13.
- the voice dialogue server 20 confirms whether or not the utterance sentence is composed of the second word indicating the end of the order (S13). In the case of the utterance sentence 5 in the table of FIG. 9, it is determined that the order is not finished. Further, in the case of the utterance sentence 5 in the table of FIG. 9, since it is not a change of the existing order (No in S14), the order data is updated as a new order (S15).
- when the utterance sentence analysis process in step S3 determines that the utterance sentence does not contain a keyword indicating the end of the order (No in S4), the process returns to step S2 and the word determination unit 21a acquires the next utterance sentence.
- the response sentence creation unit 21b creates voice data for inquiring whether there is a change, and causes the speaker 12 to output voice.
- if there is a change (Yes in S6), the voice dialogue server 20 returns to step S2 and accepts the changed content.
- the voice dialogue server 20 determines the order data (S7).
- the store 200 prepares a product.
- the user moves the vehicle 300 to the product delivery counter 40, pays the price, and receives the products.
- when it is determined that there is a misrecognized part, the voice dialogue server (voice dialogue apparatus) 20 according to the present embodiment creates a response sentence using the audible part of the utterance data. As a result, only the confirmation-required part needs to be asked again, and the recognition rate of utterances can be improved.
- moreover, since the voice dialogue server 20 of this embodiment asks the user to repeat only the confirmation-required part, the user can recognize more clearly which part the server could not hear, and repetition of the entire utterance can be effectively prevented.
- the answer sentence becomes only a word or a very short sentence, and the recognition rate of the utterance can be improved.
- the voice interaction server 20 can shorten the time required for the entire order processing.
- furthermore, the voice dialogue server 20 discards the utterance data when an utterance different from the answer candidates is made in response to the response sentence. This is because, when the utterance in response to the response sentence differs from the answer candidates, the user often intends to cancel the previous utterance. As a result, steps such as the user explicitly canceling the immediately preceding utterance can be shortened.
- in addition, when an order that does not conform to the menu DB 22b is made, for example when the quantity exceeds 100, the voice dialogue server 20 of the above embodiment creates a response sentence including the number that can be ordered at one time. This makes it easy for the user to make an utterance that satisfies the condition.
- the voice dialogue server of the above embodiment may also be applied to an airline ticket reservation system installed in a facility such as an airport or a convenience store, or to an accommodation facility reservation system.
- the case where the dialogue unit 21 of the voice dialogue server 20 is configured using an integrated circuit such as an ASIC is illustrated, but the present invention is not limited to this.
- the dialogue unit 21 may instead be realized by a system LSI (Large Scale Integration), or by a CPU (Central Processing Unit) executing a computer program (software) that defines the functions of the word determination unit 21a, the response sentence creation unit 21b, the speech synthesis unit 21c, and the order data creation unit 21d.
- the computer program may be transmitted via an electric communication line, a wireless or wired communication line, a network represented by the Internet, a data broadcast, or the like.
- the case where the voice dialogue server 20 is provided in the store 200 has been described as an example, but it may instead be provided in the automatic order post 10, or provided outside the store 200 and connected to the devices in the store 200 and the automatic order post 10 via a network. Further, the components of the voice dialogue server 20 need not be provided in a single server, and may be distributed across computers on a cloud, computers provided in the store 200, and the like.
- the word determination unit 21a includes the speech recognition process, that is, the process of converting the speech signal collected by the microphone 11 into text data.
- the voice recognition process may be configured to be executed by another processing module separated from the dialogue unit 21 or the voice dialogue server 20.
- in the above embodiment, the dialogue unit 21 includes the speech synthesis unit 21c, but the speech synthesis unit 21c may instead be configured as a separate processing module outside the dialogue unit 21 or the voice dialogue server 20. Likewise, all of the word determination unit 21a, the response sentence creation unit 21b, the speech synthesis unit 21c, and the order data creation unit 21d that constitute the dialogue unit 21 may be configured as separate processing modules outside the dialogue unit 21 or the voice dialogue server 20.
- the present disclosure can be applied to a voice dialogue apparatus and a voice dialogue system that analyze a user's utterance and automatically receive a product order or make a reservation.
- the present disclosure can be applied to a system installed at a drive-through, or to a ticket reservation system installed in a facility such as a convenience store.
Landscapes
- Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Artificial Intelligence (AREA)
- Machine Translation (AREA)
Abstract
Description
For example, in a voice dialogue system used for ordering products, at least the "product name" and the "quantity" must be extracted. Depending on the product, additional items such as "size" may be required.
Hereinafter, an embodiment will be described with reference to FIGS. 1 to 9. The voice dialogue system of the present embodiment uses the second words in the user's utterance that could be analyzed to create a response sentence prompting re-input of the first word that could not be analyzed.
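As a rough illustration of this re-prompting idea, the following sketch separates the words that match a keyword DB (second words) from those that do not (first words) and builds a response from the recognized part. The keyword set, function name, and response wording are all assumptions for illustration and are not taken from the embodiment.

```python
from typing import List, Optional

# Illustrative keyword set; in the embodiment these would come from the
# keyword DB 22a (product names, quantity words, and so on).
KEYWORD_DB = {"hamburger", "potato", "cola", "one", "two", "three"}

def make_reprompt(utterance_words: List[str]) -> Optional[str]:
    """Separate recognized words (second words) from unrecognized ones
    (first words) and, when something was not recognized, build a
    response that repeats the recognized part and asks for the rest."""
    matched = [w for w in utterance_words if w in KEYWORD_DB]
    unmatched = [w for w in utterance_words if w not in KEYWORD_DB]
    if not unmatched:
        return None  # everything matched a keyword; no re-prompt needed
    # Echo the analyzable second words and prompt re-input of only
    # the portion corresponding to the unanalyzable first word.
    return f"{' '.join(matched)}, and what was the other item?"
```

For an utterance in which "three" and "hamburger" were recognized but a third word was not, the sketch answers "three hamburger, and what was the other item?", so the user only needs to repeat the missing part.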
FIG. 1 is a diagram showing an example of the configuration of the voice dialogue system in the present embodiment.
FIG. 2 is a block diagram showing an example of the configurations of the automatic order post 10 and the voice dialogue server 20 in the present embodiment.
As shown in FIG. 2, the voice dialogue server 20 includes a dialogue unit 21, a memory 22, and a display control unit 23.
FIG. 6 is a flowchart showing an example of the procedure of the order processing (voice dialogue method) executed by the voice dialogue server 20. FIGS. 7 and 9 are diagrams showing examples of question-and-answer exchanges between the user and the voice output from the speaker 12 of the automatic order post 10. The numbers in the column to the left of the text in FIGS. 7 and 9 indicate the order of the exchange. FIGS. 7 and 9 are identical up to number 4.
When it determines that the utterance contains a misrecognized portion, the voice dialogue server (voice dialogue device) 20 of the present embodiment creates a response sentence using the portion of the utterance data that it could hear. This makes it possible to ask again about only the portion requiring confirmation, which improves the recognition rate of utterances.
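The behavior of re-asking only the portion requiring confirmation could be sketched as follows, assuming the recognizer attaches a confidence score to each word. The threshold, function name, and wording are illustrative assumptions, not part of the embodiment.

```python
from typing import List, Optional, Tuple

CONFIDENCE_THRESHOLD = 0.6  # assumed threshold; the embodiment does not specify one

def confirm_low_confidence(segments: List[Tuple[str, float]]) -> Optional[str]:
    """Given (word, confidence) pairs from a recognizer, keep the words
    that were heard clearly and ask again only about the unclear part."""
    heard = [w for w, c in segments if c >= CONFIDENCE_THRESHOLD]
    unclear = [w for w, c in segments if c < CONFIDENCE_THRESHOLD]
    if not unclear:
        return None  # nothing needs confirmation
    # Repeat the heard portion so the user only re-states the rest.
    return f"I heard {' '.join(heard)}. Could you repeat the rest?"
```

Because the heard portion is echoed back, the user does not have to repeat the whole order, which is the recognition-rate benefit described above.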
As described above, the embodiment has been presented as an example of the technology disclosed in the present application. However, the technology of the present disclosure is not limited to this, and is also applicable to embodiments in which changes, replacements, additions, omissions, and the like are made as appropriate. It is also possible to combine the components described in the above embodiment to form a new embodiment.
10c Order post
11 Microphone
12 Speaker
13 Display panel
20 Voice dialogue server
21 Dialogue unit
21a Word determination unit
21b Response sentence creation unit
21c Speech synthesis unit
21d Order data creation unit
22 Memory
22a Keyword DB
22b Menu DB
22c Order data
23 Display control unit
30 Dialogue device
40 Product delivery counter
100 Voice dialogue system
200 Store
300 Vehicle
Claims (6)
- A voice dialogue device comprising:
an acquisition unit that acquires utterance data representing a user's utterance;
a storage unit in which a plurality of keywords are stored;
a word determination unit that extracts a plurality of words from the utterance data and determines, for each of the plurality of words, whether the word matches any of the plurality of keywords;
a response sentence creation unit that, when the plurality of words include a first word determined not to match any of the plurality of keywords, creates a response sentence that includes a second word among the plurality of words determined to match one of the plurality of keywords and that prompts re-input of the portion corresponding to the first word; and
a voice generation unit that generates voice data of the response sentence. - The voice dialogue device according to claim 1, wherein
the acquisition unit further acquires answer data representing the user's utterance after the voice data of the response sentence is output, and
the voice dialogue device further comprises a data processing unit that acquires one or more answer candidates for the response sentence and discards the utterance data when the answer data does not match any of the one or more answer candidates. - The voice dialogue device according to claim 1 or 2, wherein
the storage unit stores a first keyword included in the plurality of keywords and a second keyword included in the plurality of keywords in association with each other, and
the response sentence creation unit creates a response sentence including the second keyword when the word determination unit extracts, from the utterance data, a second word that matches the first keyword and a second word that does not match the second keyword. - The voice dialogue device according to any one of claims 1 to 3, wherein
the word determination unit extracts the plurality of words from the utterance data after removing filler words from the utterance data. - A voice dialogue system comprising:
the voice dialogue device according to any one of claims 1 to 4; and
an automatic order post comprising a voice input unit that acquires the user's utterance data and outputs it to the voice dialogue device, and a voice output unit that outputs voice using the voice data. - A voice dialogue method executed in a voice dialogue device comprising a database in which a plurality of second words are stored and a control unit that performs dialogue processing with a user, the method comprising:
a step in which the control unit acquires the user's utterance data;
a step in which the control unit extracts a plurality of words from the utterance data and determines, for each of the plurality of words, whether the word matches any of the plurality of keywords;
a step in which the control unit, when the plurality of words include a first word determined not to match any of the plurality of keywords, creates a response sentence that includes a second word among the plurality of words determined to match one of the plurality of keywords and that prompts re-input of the portion corresponding to the first word; and
a step in which the control unit creates voice data of the response sentence by voice synthesis.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2016505943A JP6384681B2 (ja) | 2014-03-07 | 2014-11-12 | 音声対話装置、音声対話システムおよび音声対話方法 |
US14/914,383 US20160210961A1 (en) | 2014-03-07 | 2014-11-12 | Speech interaction device, speech interaction system, and speech interaction method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2014045724 | 2014-03-07 | ||
JP2014-045724 | 2014-03-07 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2015132829A1 true WO2015132829A1 (ja) | 2015-09-11 |
Family
ID=54054674
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2014/005689 WO2015132829A1 (ja) | 2014-03-07 | 2014-11-12 | 音声対話装置、音声対話システムおよび音声対話方法 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20160210961A1 (ja) |
JP (1) | JP6384681B2 (ja) |
WO (1) | WO2015132829A1 (ja) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH05197389A (ja) * | 1991-08-13 | 1993-08-06 | Toshiba Corp | 音声認識装置 |
JP2001142484A (ja) * | 1991-11-18 | 2001-05-25 | Toshiba Corp | 音声対話方法及びそのシステム |
JP2007017990A (ja) * | 2006-07-20 | 2007-01-25 | Denso Corp | 単語列認識装置 |
JP2007187799A (ja) * | 2006-01-12 | 2007-07-26 | Nissan Motor Co Ltd | 音声対話装置および音声対話方法 |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH07282081A (ja) * | 1994-04-12 | 1995-10-27 | Matsushita Electric Ind Co Ltd | 音声対話型情報検索装置 |
US7331036B1 (en) * | 2003-05-02 | 2008-02-12 | Intervoice Limited Partnership | System and method to graphically facilitate speech enabled user interfaces |
CN101111885A (zh) * | 2005-02-04 | 2008-01-23 | 株式会社查纳位资讯情报 | 使用抽出的声音数据生成应答声音的声音识别系统 |
US7949529B2 (en) * | 2005-08-29 | 2011-05-24 | Voicebox Technologies, Inc. | Mobile systems and methods of supporting natural language human-machine interactions |
- US8457973B2 (en) * | 2006-03-04 | 2013-06-04 | AT&T Intellectual Property II, L.P. | Menu hierarchy skipping dialog for directed dialog speech recognition |
US8600760B2 (en) * | 2006-11-28 | 2013-12-03 | General Motors Llc | Correcting substitution errors during automatic speech recognition by accepting a second best when first best is confusable |
US20130132079A1 (en) * | 2011-11-17 | 2013-05-23 | Microsoft Corporation | Interactive speech recognition |
WO2014197336A1 (en) * | 2013-06-07 | 2014-12-11 | Apple Inc. | System and method for detecting errors in interactions with a voice-based digital assistant |
WO2014197334A2 (en) * | 2013-06-07 | 2014-12-11 | Apple Inc. | System and method for user-specified pronunciation of words for speech synthesis and recognition |
US20140372892A1 (en) * | 2013-06-18 | 2014-12-18 | Microsoft Corporation | On-demand interface registration with a voice control system |
US9214156B2 (en) * | 2013-08-06 | 2015-12-15 | Nuance Communications, Inc. | Method and apparatus for a multi I/O modality language independent user-interaction platform |
- 2014
- 2014-11-12 JP JP2016505943A patent/JP6384681B2/ja active Active
- 2014-11-12 US US14/914,383 patent/US20160210961A1/en not_active Abandoned
- 2014-11-12 WO PCT/JP2014/005689 patent/WO2015132829A1/ja active Application Filing
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP6141483B1 (ja) * | 2016-03-29 | 2017-06-07 | 株式会社リクルートライフスタイル | 音声翻訳装置、音声翻訳方法、及び音声翻訳プログラム |
JP2017182310A (ja) * | 2016-03-29 | 2017-10-05 | 株式会社リクルートライフスタイル | 音声翻訳装置、音声翻訳方法、及び音声翻訳プログラム |
JP2022051777A (ja) * | 2018-06-12 | 2022-04-01 | トヨタ自動車株式会社 | 車両用コクピット |
JP7327536B2 (ja) | 2018-06-12 | 2023-08-16 | トヨタ自動車株式会社 | 車両用コクピット |
CN114678012A (zh) * | 2022-02-18 | 2022-06-28 | 青岛海尔科技有限公司 | 语音交互数据的处理方法和装置、存储介质及电子装置 |
Also Published As
Publication number | Publication date |
---|---|
US20160210961A1 (en) | 2016-07-21 |
JP6384681B2 (ja) | 2018-09-05 |
JPWO2015132829A1 (ja) | 2017-03-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11823659B2 (en) | Speech recognition through disambiguation feedback | |
US11887590B2 (en) | Voice enablement and disablement of speech processing functionality | |
US11081107B2 (en) | Contextual entity resolution | |
US20210249013A1 (en) | Method and Apparatus to Provide Comprehensive Smart Assistant Services | |
US11004444B2 (en) | Systems and methods for enhancing user experience by communicating transient errors | |
US8756064B2 (en) | Method and system for creating frugal speech corpus using internet resources and conventional speech corpus | |
KR101309042B1 (ko) | 다중 도메인 음성 대화 장치 및 이를 이용한 다중 도메인 음성 대화 방법 | |
JP7230806B2 (ja) | 情報処理装置、及び情報処理方法 | |
WO2016136207A1 (ja) | 音声対話装置、音声対話システム、音声対話装置の制御方法、および、プログラム | |
TW201337911A (zh) | 電子裝置以及語音識別方法 | |
KR20160081244A (ko) | 자동 통역 시스템 및 이의 동작 방법 | |
KR101949427B1 (ko) | 상담내용 자동평가 시스템 및 그 방법 | |
JP6384681B2 (ja) | 音声対話装置、音声対話システムおよび音声対話方法 | |
JP2013088552A (ja) | 発音トレーニング装置 | |
Rudzionis et al. | Web services based hybrid recognizer of Lithuanian voice commands | |
CN112562734B (zh) | 一种基于语音检测的语音交互方法及其装置 | |
JP3340163B2 (ja) | 音声認識装置 | |
KR102011595B1 (ko) | 청각 장애인을 위한 소통 지원 장치 및 방법 | |
JP2022018724A (ja) | 情報処理装置、情報処理方法、及び情報処理プログラム | |
CN113593523A (zh) | 基于人工智能的语音检测方法、装置及电子设备 | |
Garg et al. | Automation and Presentation of Word Document Using Speech Recognition | |
Radzikowski et al. | Non-native speech recognition using audio style transfer | |
Engell | TaleTUC: Text-to-Speech and Other Enhancements to Existing Bus Route Information Systems | |
Tsiakoulis et al. | Dialogue context sensitive speech synthesis using factorized decision trees. | |
CN117995172A (zh) | 语音识别方法及装置、电子设备和计算机可读存储介质 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 14885006 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2016505943 Country of ref document: JP Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: 14914383 Country of ref document: US |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 14885006 Country of ref document: EP Kind code of ref document: A1 |