WO2019234487A1 - Voice recognition system - Google Patents

Voice recognition system

Info

Publication number
WO2019234487A1
Authority
WO
WIPO (PCT)
Prior art keywords
speaker
voice
voice recognition
control unit
interaction control
Application number
PCT/IB2019/000425
Other languages
French (fr)
Other versions
WO2019234487A8 (en)
Inventor
Hidenobu Suzuki
Makoto Manabe
Original Assignee
Toyota Jidosha Kabushiki Kaisha
Denso Corporation
Application filed by Toyota Jidosha Kabushiki Kaisha and Denso Corporation
Publication of WO2019234487A1
Publication of WO2019234487A8

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification

Definitions

  • FIG. 6 shows the configuration of a voice recognition system 1-2 that is the first modification of the voice recognition system 1.
  • The voice recognition system 1-2 includes an on-board device 10' that acquires a voice signal, and the server 20 that analyzes the voice signal and determines a response to the voice signal.
  • A vehicle 2-2 includes the on-board device 10', a front seat input-output device 30', and a backseat input-output device 40'.
  • The front seat input-output device 30' is different from the front seat input-output device 30 of the voice recognition system 1 in that a voice recognition start button 33 is provided in addition to the microphone 31, the speaker 32, and the indicator 34.
  • The front seat of the vehicle 2-2 is constituted of a driver's seat (D seat) and a passenger's seat (P seat), with each of the seats being equipped with the microphone 31, the speaker 32, and the voice recognition start button 33.
  • The indicator 34 is provided on the front face of the front seat.
  • The backseat input-output device 40' is different from the backseat input-output device 40 of the voice recognition system 1 in that a voice recognition start button 43 is provided in addition to the microphone 41 and the speaker 42.
  • The backseat of the vehicle 2-2 is constituted of two rear seats, with each of the seats being equipped with the microphone 41, the speaker 42, and the voice recognition start button 43.
  • The voice recognition start buttons 33, 43 output a voice recognition start signal to a speaker identification unit 12' when pressed.
  • The on-board device 10' includes the input-output control unit 11, the speaker identification unit 12', the voice input control unit 13, the display output control unit 14, and the voice output control unit 15.
  • The server 20 includes the interaction control unit 21, the voice recognition unit 22, the response generation unit 23, and the interaction control rule storage unit 24.
  • The voice recognition system 1-2 is different from the voice recognition system 1 in that the speaker identification unit 12 is replaced with the speaker identification unit 12'.
  • The speaker identification unit 12' can identify a speaker by identifying which voice recognition start button, the voice recognition start button 33 or the voice recognition start button 43, is used to input a voice recognition start signal. For example, when the voice recognition start signal is input through the voice recognition start button 33 of the driver's seat, the driver is identified as the speaker.
  • Since the front seat input-output device 30' includes the voice recognition start button 33 and the backseat input-output device 40' includes the voice recognition start button 43, the speaker identification unit 12' can easily identify the speaker.
  • When no voice recognition start signal is input, the input-output control unit 11 may be configured to cancel the voice signal input from the microphone 31 or 41. As a consequence, even when the microphones 31, 41 collect voice irrelevant to execution of a task, malfunction can be prevented (see the sketch after this list).
  • FIG. 7 shows the configuration of a voice recognition system 1-3 that is the second modification of the voice recognition system 1.
  • The voice recognition system 1-3, which does not include the server 20, is mounted on a vehicle 2-3.
  • The vehicle 2-3 includes the front seat input-output device 30, the backseat input-output device 40, and the voice recognition system 1-3.
  • The voice recognition system 1-3 includes the speaker identification unit 12, the voice input control unit 13, the display output control unit 14, the voice output control unit 15, an interaction control unit 21', the voice recognition unit 22, the response generation unit 23, and the interaction control rule storage unit 24.
  • The configuration of the voice recognition system 1 is divided into a vehicle side and a server side, whereas the configuration of the voice recognition system 1-3 is integrated into the vehicle side. Hence, the voice recognition system 1-3 does not include the input-output control unit 11 provided in the voice recognition system 1.
  • The interaction control unit 21' is different from the interaction control unit 21 of the voice recognition system 1 in that the voice signal and the speaker information are acquired directly from the voice input control unit 13 and the speaker identification unit 12 without going through the input-output control unit 11, and in that the response data is output directly to the display output control unit 14 and the voice output control unit 15 without going through the input-output control unit 11. Other processing aspects of the interaction control unit 21', and the details of the processes in the other component units, are similar to those of the voice recognition system 1, so their description is omitted.
  • Since the configuration of the voice recognition system 1-3 is integrated into the vehicle side without being divided into a vehicle side and a server side, the calculation load on the vehicle side becomes larger than in the voice recognition system 1, but communication with the server 20 becomes unnecessary. Accordingly, it becomes possible to reliably accept speakers' requests regardless of the communication environment.
  • A computer may be used to function as all or part of the voice recognition system. Such a computer can implement the functions of the voice recognition system by storing in advance programs describing the contents of the processes that implement each function of the voice recognition system, and by reading and executing these programs with its CPU.
  • The programs may be recorded on a computer-readable medium, which makes it possible to install them on the computer. The computer-readable medium on which the programs are recorded may be a non-transitory recording medium, such as a CD-ROM or a DVD-ROM.
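As a rough illustration of the first modification, pressing a voice recognition start button can be modeled as arming the corresponding seat, with voice from unarmed seats cancelled by the input-output control unit 11. The Python class below is an assumed structure for illustration, not the patent's implementation:

    # Hypothetical sketch of button-gated input in the voice recognition
    # system 1-2: a voice signal is forwarded only when the seat's start
    # button (33 or 43) was pressed, which also identifies the speaker.
    class ButtonGatedInput:
        def __init__(self) -> None:
            self.armed_seats: set = set()

        def on_button_pressed(self, seat: str) -> None:
            # The voice recognition start signal arms the seat.
            self.armed_seats.add(seat)

        def on_voice(self, seat: str, signal: bytes):
            # Cancel voice from unarmed seats so that speech irrelevant to
            # execution of a task does not cause malfunction.
            if seat not in self.armed_seats:
                return None
            self.armed_seats.discard(seat)  # push-to-talk style consumption
            return signal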

Abstract

A voice recognition system includes: a speaker identification unit configured to identify a speaker from a voice signal; a voice recognition unit configured to perform a voice recognition process on the voice signal; an interaction control unit configured to analyze a result of voice recognition by the voice recognition unit, and generate a response instruction based on an analysis content; and a response generation unit configured to generate response data based on the response instruction. When a first speaker who starts voice operation is different from a second speaker who speaks after the start of the voice operation, the interaction control unit is configured to determine whether or not to accept the voice operation by the second speaker.

Description

VOICE RECOGNITION SYSTEM
BACKGROUND OF THE INVENTION
1. Field of the Invention
[0001] The present invention relates to a voice recognition system.
2. Description of Related Art
[0002] Japanese Patent Application Publication No. 2017-083600 (JP
2017-083600 A) discloses that even when a plurality of occupants in a vehicle speaks at the same time, mixing of voice is restrained by removing a voice acquired with a second microphone that is disposed in a passenger's seat from a voice acquired with a first microphone that is disposed in a driver's seat.
[0003] Japanese Patent Application Publication No. 2003-345389 (JP
2003-345389 A) discloses a voice recognition system including a driver's seat-side speech switch, a passenger's seat-side speech switch, and a rear seat-side speech switch, which recognize the voice of an occupant and permit voice input operation such that a driver and an occupant other than the driver can perform the voice input operation.
SUMMARY OF THE INVENTION
[0004] However, with the technique disclosed in JP 2017-083600 A, it is not possible to perform voice operation based on the voice of occupants other than the driver, which may deteriorate the convenience of the occupants.
[0005] The technique disclosed in JP 2003-345389 A does not consider system behavior in the case where one operator is executing voice operation, such as for executing a desired task through a plurality of interactive steps, and during the period when the system accepts the speech of the operator, another speaker speaks. It is a general standard that only after an occupant, who acquires the right to perform the voice operation first, completes his or her task, the speech of another occupant is accepted. Hence, the occupant, other than the occupant who first speaks and starts voice operation, might be prevented from participating in the voice operation, and it may not be possible for the occupants to jointly advance the voice operation.
[0006] The present invention provides a voice recognition system that enables a second speaker, who is different from a first speaker who speaks at the start of voice recognition, to perform voice operation depending on the situation.
[0007] The voice recognition system according to an aspect of the present invention is a voice recognition system configured to perform voice recognition of a voice of an occupant in a vehicle, and respond to a content of the recognized voice, the vehicle being configured to permit voice operation by speech. The voice recognition system includes: a speaker identification unit configured to identify a speaker from a voice signal; a voice recognition unit configured to perform a voice recognition process on the voice signal; an interaction control unit configured to analyze a result of voice recognition by the voice recognition unit and generate a response instruction based on an analysis content; and a response generation unit configured to generate response data based on the response instruction. The interaction control unit is configured to determine, when a second speaker who speaks after start of voice operation by a first speaker is different from the first speaker, whether or not to accept voice operation by the second speaker.
[0008] According to the above aspect of the present invention, it becomes possible to determine whether or not to accept the voice operation by the second speaker, different from the first speaker who speaks at the start of voice recognition, and to permit the voice operation by the second speaker depending on the situation. This makes it possible to improve the convenience of the occupants.
[0009] In the above aspect, the interaction control unit may be configured to determine, when the second speaker is different from the first speaker, whether or not to accept the voice operation by the second speaker, based on whether a speaker different from the first speaker is permitted to participate in operation of a task that is executed based on the voice operation started by the first speaker.
[0010] In the above aspect, the interaction control unit may be configured to determine, when the second speaker is different from the first speaker, whether or not to accept the voice operation by the second speaker, based on whether a task that is executed based on the voice operation started by the first speaker is completed.
[0011] In the above aspect, the interaction control unit may be configured to determine, when the second speaker is different from the first speaker, whether or not to accept the voice operation by the second speaker depending on a content of a speech by the second speaker.
[0012] In the above aspect, the interaction control unit may be configured to accept voice operation by a speaker different from the first speaker, based on an instruction of the first speaker.
[0013] In the above aspect, when a first task that is executed based on the voice operation started by the first speaker is in execution, and a speaker different from the first speaker performs voice operation that requests execution of a second task different from the first task, the interaction control unit may be configured to accept voice operation relevant to the second task in parallel to the voice operation relevant to the first task.
BRIEF DESCRIPTION OF THE DRAWINGS
[0014] Features, advantages, and technical and industrial significance of exemplary embodiments of the invention will be described below with reference to the accompanying drawings, in which like numerals denote like elements, and wherein:
FIG. 1 is a block diagram showing a configuration example of a voice recognition system according to one embodiment of the present invention;
FIG. 2 is a flowchart showing one example of the procedure of a voice recognition method using the voice recognition system according to one embodiment of the present invention;
FIG. 3 shows a first process example of the voice recognition system according to one embodiment of the present invention;
FIG. 4 shows the positional relationship between occupants of a vehicle;
FIG. 5 shows a second process example of the voice recognition system according to one embodiment of the present invention;
FIG. 6 is a block diagram showing a first modification of the voice recognition system according to one embodiment of the present invention; and
FIG. 7 is a block diagram showing a second modification of the voice recognition system according to one embodiment of the present invention.
DETAILED DESCRIPTION OF EMBODIMENTS
[0015] Hereinafter, one embodiment of the present invention will be described in detail with reference to the drawings.
[0016] First, the configuration of a voice recognition system according to one embodiment of the present invention will be described with reference to FIG. 1. FIG. 1 shows a configuration example of the voice recognition system according to one embodiment of the present invention. The voice recognition system 1 shown in FIG. 1 includes an on-board device 10 that acquires a voice signal, and a server 20 that analyzes the voice signal and determines a response to the voice signal. The voice recognition system 1 performs voice recognition of a voice of an occupant in a vehicle 2 that permits voice operation by speech, and responds to the content of the recognized voice. The vehicle 2 includes the on-board device 10, a front seat input-output device 30, and a backseat input-output device 40.
[0017] The front seat input-output device 30, which is an input-output device in the front seat of the vehicle 2, includes a microphone 31, a speaker 32, and an indicator 34. In the present embodiment, the front seat of the vehicle 2 is constituted of a driver's seat (D seat) and a passenger's seat (P seat), with each of the seats being equipped with the microphone 31 and the speaker 32. The indicator 34 is provided on the front face of the front seat.
[0018] The backseat input-output device 40, which is an input-output device in the backseat of the vehicle 2, includes a microphone 41 and a speaker 42. In the present embodiment, the backseat of the vehicle 2 is constituted of two rear seats, with each of the seats being equipped with the microphone 41 and the speaker 42.
[0019] The on-board device 10 includes an input-output control unit 11, a speaker identification unit 12, a voice input control unit 13, a display output control unit 14, and a voice output control unit 15. The server 20 includes an interaction control unit 21, a voice recognition unit 22, a response generation unit 23, and an interaction control rule storage unit 24.
[0020] The voice input control unit 13 acquires a voice signal input into the microphone 31 or the microphone 41 from a speaker, performs processes such as noise removal and A/D conversion, and outputs the processed voice signal to the input-output control unit 11.
[0021] The speaker identification unit 12 identifies the speaker of a voice signal input into the voice input control unit 13, and outputs speaker information indicating the identified speaker to the input-output control unit 11. For example, when the microphone is disposed in each of the seats as in the present embodiment, a speaker can be identified by identifying which microphone receives input of a speech signal. Accordingly, the voice input control unit 13 may output to the input-output control unit 11 a voice signal in association with a microphone ID indicating which microphone receives input of the voice signal. In this case, the speaker identification unit 12 acquires the voice signal and the microphone ID from the input-output control unit 11, and identifies the speaker of the voice signal. Alternatively, the speaker identification unit 12 may identify the speaker of the voice signal with a method of acquiring voice signal patterns indicating the characteristics of the occupants of the vehicle in advance.
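A minimal illustration of this seat-based identification is a lookup from microphone ID to seat. The Python sketch below is an assumption for illustration only; the patent specifies the behavior, not an implementation, and the microphone IDs 31-2 and 41-2 and the seat labels are hypothetical:

    # Hypothetical sketch: identify the speaker from the microphone that
    # captured the voice signal. IDs and seat labels are illustrative.
    MIC_TO_SEAT = {
        "31-1": "D seat",      # driver's seat microphone
        "31-2": "P seat",      # passenger's seat microphone (assumed ID)
        "41-1": "rear left",
        "41-2": "rear right",  # assumed ID
    }

    def identify_speaker(mic_id: str) -> str:
        """Return the seat (used as speaker information) for a microphone ID."""
        return MIC_TO_SEAT.get(mic_id, "unknown")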
[0022] The input-output control unit 11 transmits the voice signal that is input from the voice input control unit 13 and the speaker information that is input from the speaker identification unit 12 to the interaction control unit 21 through an antenna.
[0023] The interaction control unit 21 receives the voice signal and the speaker information from the input-output control unit 11 through the antenna. The interaction control unit 21 then outputs the voice signal to the voice recognition unit 22.
[0024] The voice recognition unit 22 performs a voice recognition process on the voice signal input from the interaction control unit 21 to convert the voice signal into a character string (a text), and outputs the voice recognition result to the interaction control unit 21.
[0025] The interaction control unit 21 analyzes the voice recognition result with any known methods, such as a morphological analysis, and estimates a speaking intention of the speaker of the voice signal. The interaction control unit 21 then generates a response instruction based on the analysis content (i.e., in accordance with the speaking intention), and outputs the response instruction to the response generation unit 23.
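The patent leaves the analysis method open ("any known methods, such as a morphological analysis"). Purely as a stand-in, a keyword-based intention estimate could look like the following; a real system would use a proper language-analysis pipeline, and the task names here are assumptions:

    # Toy stand-in for intention estimation; not the patented method.
    def estimate_intention(text: str) -> str:
        text = text.lower()
        if "search" in text or "restaurant" in text:
            return "search_store"  # e.g. "search for popular eel restaurants"
        if "call" in text:
            return "make_call"     # e.g. "I want to make a call to **"
        if "mail" in text:
            return "send_mail"
        return "unknown"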
[0026] Based on the response instruction input from the interaction control unit 21, the response generation unit 23 generates data to be displayed on the indicator 34 and voice data to be output from the speakers 32, 42 (these data are hereinafter called "response data"), and outputs the response data to the interaction control unit 21. The interaction control unit 21 transmits the response data to the input-output control unit 11 through the antenna.
[0027] Based on the response data input from the input-output control unit 11, the display output control unit 14 generates display data to be displayed on the indicator 34, and outputs the display data to the indicator 34.
[0028] Based on the response data input from the input-output control unit 11, the voice output control unit 15 generates voice data to be output from the speakers 32, 42, and outputs the voice data to the speakers 32, 42.
[0029] In voice operation which requires a plurality of interactive steps until execution of a task, when a first speaker who starts the voice operation is not identical to a second speaker who speaks after the start of the voice operation (a speaker who speaks in the middle of an ongoing interactive step), the interaction control rule storage unit 24 stores an interaction control rule which defines whether or not to accept the speech of the second speaker. For example, the interaction control rule stipulates that the speech of the second speaker is accepted when the task is to search for and determine a store, and that the speech of the second speaker is rejected when the task is to transmit a mail or make a call.
[0030] In the case where the second speaker speaks after the first speaker starts voice operation, the interaction control unit 21 determines whether the first speaker is identical to the second speaker. When the two speakers are not identical, the interaction control unit 21 refers to the interaction control rules stored in the interaction control rule storage unit 24. Then, the interaction control unit 21 determines whether the current task is a task that permits the second speaker to perform voice operation, and thereby determines whether to accept the speech by the second speaker, i.e., whether or not to accept the voice operation by the second speaker.
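Read this way, an interaction control rule reduces to a per-task flag consulted whenever the second speaker differs from the first. A minimal sketch under that reading; the task names and function signature are assumptions, not from the patent:

    # Per-task interaction control rules: True means occupants other than
    # the operation initiator may continue the ongoing interactive steps.
    INTERACTION_CONTROL_RULES = {
        "search_store": True,   # searching for and determining a store
        "make_call": False,     # only the initiator chooses the callee
        "send_mail": False,
    }

    def accept_speech(task: str, speaker: str, initiator: str) -> bool:
        """Accept the speech if the speaker is the initiator, or if the
        current task permits participation by other occupants."""
        if speaker == initiator:
            return True
        return INTERACTION_CONTROL_RULES.get(task, False)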
[0031] Even in the case where the interaction control rule storage unit 24 is not provided, the interaction control unit 21 can determine whether to accept the speech by the second speaker depending on the content of the speech of the second speaker. For example, when the first speaker is a driver, the first speaker may leave the interactive steps, after the start of the voice operation, to other occupants in order to concentrate on driving operation. Hence, in the case where the first speaker who starts voice operation speaks "Other occupants answer the subsequent questions", or the second speaker speaks "On behalf of the first speaker ...", the interaction control unit 21 may accept the speech by the second speaker. The interaction control unit 21 may also estimate the speaking intention, and when determining that the content of the speech is irrelevant to the task, the interaction control unit 21 may reject the speech by the second speaker.
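Without the rule storage unit, the same decision can be driven by the speech content itself, as described above. A rough sketch follows, with the trigger phrases being illustrative assumptions rather than phrases defined by the patent:

    # Content-based acceptance: accept a second speaker when the initiator
    # delegated the interaction, or when the second speaker claims to answer
    # on the initiator's behalf. Phrases are illustrative only.
    def accept_by_content(initiator_speech: str, second_speech: str) -> bool:
        delegated = "other occupants answer" in initiator_speech.lower()
        on_behalf = "on behalf of" in second_speech.lower()
        return delegated or on_behalf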
[0032] Alternatively, the first speaker who starts voice operation may be allowed to select whether to accept the speech of another occupant in the middle of an interactive step for advancing the interactive step, or to reject advancing the interactive step, and the selected result may be reported to the interaction control unit 21. In this case, the interaction control unit 21 generates a response instruction based on the selected result and outputs the response instruction to the response generation unit 23.
[0033] The interaction control unit 21 may present the result of determining whether to accept or reject the speech of the second speaker. For example, the interaction control unit 21 may display the determination result on the indicator 34, or may output the voice from the speakers 32, 42. When the voice recognition system 1 determines to reject the speech of the second speaker, the voice recognition system 1 may seek the determination of the first speaker.
[0034] When an occupant, other than the first speaker who starts voice operation, speaks while the interactive step is ongoing, and the content of the speech is to request a task other than the task in operation, the interaction control unit 21 may send a response instruction to the response generation unit 23 such that another voice operation is started in parallel to the ongoing voice operation.
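One plausible realization keeps an independent dialogue session per task, so that a request for a different task opens a new session in parallel instead of being rejected. The structure below is an assumption for illustration, not taken from the patent:

    # Hypothetical per-task sessions enabling parallel voice operations.
    sessions: dict = {}  # task name -> session state

    def handle_request(task: str, speaker: str, text: str) -> None:
        if task not in sessions:
            # A new task starts its own interactive steps in parallel
            # to any ongoing voice operation.
            sessions[task] = {"initiator": speaker, "history": []}
        sessions[task]["history"].append((speaker, text))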
[0035] The processes allocated to the respective processing units are not limited to the aforementioned example. For example, instead of the interaction control unit 21, the voice recognition unit 22 may estimate the speaking intention.
[0036] Next, a voice recognition method with the voice recognition system 1 will be described with reference to FIG. 2. FIG. 2 is a flowchart showing one example of the procedure of the voice recognition method using the voice recognition system 1.
[0037] In the voice recognition system 1, first, the voice input control unit 13 acquires a speaker's voice signal (step S11). Then, the speaker identification unit 12 identifies the speaker of the acquired voice signal (step S12).
[0038] Next, the voice recognition unit 22 performs a voice recognition process to convert the acquired voice signal into a character string (step S13). The interaction control unit 21 then analyzes the voice recognition result to estimate a speaking intention and generate a response instruction in accordance with the speaking intention (step S14).
[0039] Next, the response generation unit 23 generates response data based on the response instruction (step S15). The interaction control unit 21 then determines whether the task is completed based on the content of the voice signal (step S16). When determining that the task is not completed (step S16-No), the interaction control unit 21 advances the process to step S17, and continues the interactive step.
[0040] In the case of continuing the interactive step, the voice input control unit 13 acquires a voice again (step S17). Then, the speaker identification unit 12 identifies the speaker of the acquired voice signal (step S18), and the interaction control unit 21 determines whether the speaker is the initiator of the operation (step S19).
[0041] In step S19, when determining that the speaker is not the initiator of the operation (step S19-No), the interaction control unit 21 refers to the interaction control rules stored in the interaction control rule storage unit 24 and determines whether the current task is the task that permits participation of the speaker (step S20). When determining that the current task is not the task that permits participation of the speaker (step S20-No), the interaction control unit 21 returns the process to step S17.
[0042] Meanwhile, when the interaction control unit 21 determines in step S19 that the speaker is the initiator of the operation (step S19-Yes), or when the interaction control unit 21 determines in step S20 that the current task is the task that permits participation of the speaker (step S20-Yes), the voice recognition unit 22 performs the voice recognition process to convert the acquired voice signal to a character string (step S21). Then, the interaction control unit 21 analyzes the voice recognition result to estimate the speaking intention, and generates a response instruction in accordance with the speaking intention (step S22).
[0043] Next, the response generation unit 23 generates and outputs response data based on the response instruction (step S23). The interaction control unit 21 then determines whether the task is completed (step S24), and when determining that the task is not completed (step S24-No), the interaction control unit 21 returns the process to step S17.
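Read end to end, steps S11 to S24 form a single loop. The Python sketch below mirrors the flowchart of FIG. 2; every callable is a placeholder standing in for one of the processing units described above, not an API defined by the patent:

    # Sketch of the FIG. 2 control flow. `rules` is the per-task
    # participation table held by the interaction control rule storage unit.
    def voice_operation(acquire_voice, identify_speaker, recognize,
                        estimate_task, respond, rules):
        signal = acquire_voice()                     # S11
        initiator = identify_speaker(signal)         # S12
        text = recognize(signal)                     # S13
        task = estimate_task(text)                   # S14
        done = respond(task, text)                   # S15-S16
        while not done:                              # S16-No / S24-No
            signal = acquire_voice()                 # S17
            speaker = identify_speaker(signal)       # S18
            if speaker != initiator and not rules.get(task, False):
                continue                             # S19-No and S20-No
            done = respond(task, recognize(signal))  # S21-S24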
[0044] Next, a first specific example of the above process of the voice recognition system 1 will be described with reference to FIG. 3. FIG. 3 shows an example in which a speaker, different from the operation initiator, speaks in the middle of an interactive step, and the speaker is permitted to perform voice operation, as a first process example of the voice recognition system 1. Here, as shown in FIG. 4, four persons, A, B, C, D, are riding in the vehicle 2.
[0045] When A speaks "search for popular eel restaurants around here", a microphone 31-1 collects the voice, and the voice input control unit 13 acquires a voice signal "search for popular eel restaurants around here" (step S11). The speaker identification unit 12 identifies A as the speaker of the voice signal "search for popular eel restaurants around here" (step S12). The voice recognition unit 22 performs voice recognition of the voice signal "search for popular eel restaurants around here" (step S13). The interaction control unit 21 analyzes the voice recognition result to estimate a speaking intention, and generates a response instruction in accordance with the speaking intention (step S14).
[0046] Upon reception of the response instruction from the interaction control unit 21, the response generation unit 23 searches for eel restaurants within a prescribed range of the location of the vehicle 2 along with their degrees of popularity (for example, an average score of evaluations by visitors to the restaurants), generates data indicating the search result, and displays search result X on the display screen of the indicator 34 (step S15). The response generation unit 23 also generates voice data "Four restaurants are found around here. Which restaurant do you choose?", and outputs the voice data from the speakers 32, 42 (step S15).
[0047] The interaction control unit 21 determines that the task "searching for stores and determining a destination" needs a response to the presented search result X, and determines that the task is not yet completed (step S16-No).
[0048] Next, when B speaks "I want to go to UNAFUJI", a microphone 41-1 collects the voice, and the voice input control unit 13 acquires a voice signal "I want to go to UNAFUJI" (step S17). The speaker identification unit 12 identifies B as the speaker of the voice signal "I want to go to UNAFUJI" (step S18).
[0049] The interaction control unit 21 determines that the speaker B is different from the operation initiator A (step S19-No). Then, based on the interaction control rules stored in the interaction control rule storage unit 24, the interaction control unit 21 determines whether the task "searching for stores and determining a destination" is a task that permits participation of the speaker B in the middle of its execution. Since the task "searching for stores and determining a destination" allows all the occupants A to D to make a determination, the interaction control unit 21 determines that the current task is the task that permits participation of the speaker (step S20-Yes).
[0050] The voice recognition unit 22 performs voice recognition of the voice signal "I want to go to UNAFUJI" (step S21). The interaction control unit 21 estimates a speaking intention of the voice, and generates a response instruction in accordance with the speaking intention (step S22).
[0051] Upon reception of the response instruction from the interaction control unit 21, the response generation unit 23 generates voice data "OK. Starting route guidance", and outputs the voice data from the speakers 32, 42 (step S23). The response generation unit 23 may further generate data indicating the route to "UNAFUJI", and display the data on the indicator 34.
[0052] Next, a second specific example of the process of the voice recognition system 1 will be described with reference to FIG. 5. FIG. 5 shows an example, in which a speaker different from the operation initiator speaks in the middle of an interactive step, and voice operation by the speaker is rejected, as a second process example of the voice recognition system 1. Here again, as shown in FIG. 4, four persons, A, B, C, D are riding in the vehicle 2.
[0053] When A speaks "I want to make a call to **", a microphone 31-1 collects the voice, and the voice input control unit 13 acquires the voice signal "I want to make a call to **" (step Sl 1). The speaker identification unit 12 identifies A as the speaker of the voice signal "I want to make a call to **" (step S12). The voice recognition unit 22 performs voice recognition of the voice signal "I want to make a call to **" (step S13). The interaction control unit 21 estimates a speaking intention of the voice, and generates a response instruction in accordance with the speaking intention (step S14).
[0054] Upon reception of the response instruction from the interaction control unit 21, the response generation unit 23 searches for "**" in a telephone directory registered in advance, generates data indicating a search result, and displays search result Y on the display screen of the indicator 34 (step S15). The response generation unit 23 also generates voice data "Which **?", and outputs the voice data from the speakers 32, 42 (step S15).
[0055] The interaction control unit 21 determines that the task "making a call" needs a response to the presented result Y, and determines that the task is not yet completed (step S16-No).
[0056] Next, when B speaks "JIRO", a microphone 41-1 collects the voice, and the voice input control unit 13 acquires a voice signal "JIRO" (step S17). The speaker identification unit 12 identifies B as the speaker of the voice signal "JIRO" (step S18).
[0057] As a consequence, the interaction control unit 21 determines that the speaker B is different from the operation initiator A (step S19-No). Then, based on the interaction control rules stored in the interaction control rule storage unit 24, the interaction control unit 21 determines whether the task "making a call" permits participation of the speaker B in the middle of execution of the task. In the task "making a call", the operation initiator is assumed to determine to whom a call is made. Accordingly, the interaction control unit 21 determines that the current task is not the task that permits participation of the speaker B (step S20-No).
[0058] Next, when A speaks "HANAKO", the microphone 31-1 collects the voice, and the voice input control unit 13 acquires a voice signal "HANAKO" (step S17). The speaker identification unit 12 identifies A as the speaker of the voice signal "HANAKO" (step S18).
[0059] As a consequence, the interaction control unit 21 determines that the speaker A is the initiator of the operation (step S19-Yes). The voice recognition unit 22 performs voice recognition of the voice signal "HANAKO" (step S21). The interaction control unit 21 estimates a speaking intention of the voice, and generates a response instruction in accordance with the speaking intention (step S22).
[0060] Upon reception of the response instruction from the interaction control unit 21, the response generation unit 23 generates voice data "OK. Calling HANAKO", and outputs the voice data from the speakers 32, 42 (step S23). At the same time, the response generation unit 23 acquires the telephone number of "HANAKO **", and displays telephone number Z of "HANAKO **" on the indicator 34 (step S23).
[0061] As described in the foregoing, the voice recognition system 1 identifies a speaker from a voice signal. When the first speaker who starts voice operation is not identical to the second speaker who speaks after the start of the voice operation, the voice recognition system 1 determines whether or not to accept the voice operation by the second speaker. Hence, with the embodiment, the second speaker, who is different from the first speaker who speaks at the start of voice recognition, can perform voice operation depending on the situation.
[0062] Whether a second speaker, different from the first speaker who speaks at the start of voice recognition, can perform voice operation is stored in advance, for each task, as an interaction control rule. As a result, the voice recognition system 1 can determine whether or not to accept the voice operation by the second speaker based on the interaction control rules. Hence, with the embodiment, when the second speaker speaks, it is possible to automatically determine whether the current task is one in which it is appropriate to permit the second speaker to perform voice operation, and in such a task the second speaker can perform voice operation.
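As one way to picture these per-task rules, the sketch below models the interaction control rule storage unit 24 as a simple lookup table. The task names, the permit_other_speakers flag, and the accept_voice_operation helper are illustrative assumptions; the patent does not prescribe any particular data format.

```python
# Illustrative sketch (not from the patent) of the interaction control
# rules held in the interaction control rule storage unit 24, modeled as
# a per-task lookup table.

INTERACTION_CONTROL_RULES = {
    "search_stores_and_set_destination": {"permit_other_speakers": True},
    "make_call": {"permit_other_speakers": False},
}

def accept_voice_operation(task_name, speaker, initiator):
    """Return True if this speaker's voice operation should be accepted
    (steps S19 and S20)."""
    if speaker == initiator:
        return True  # the operation initiator is always accepted (S19-Yes)
    # An unknown task conservatively rejects non-initiators (an assumption).
    rule = INTERACTION_CONTROL_RULES.get(task_name, {"permit_other_speakers": False})
    return rule["permit_other_speakers"]

# These checks mirror the two specific examples above:
assert accept_voice_operation("search_stores_and_set_destination", "B", "A")
assert not accept_voice_operation("make_call", "B", "A")
assert accept_voice_operation("make_call", "A", "A")
```

The three assertions correspond to B joining the store search, B being rejected in the call task, and the initiator A always being accepted.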
First Modification
[0063] Description is now given of a first modification of the voice recognition system 1. FIG. 6 shows the configuration of a voice recognition system 1-2 that is the first modification of the voice recognition system 1. The voice recognition system 1-2 includes an on-board device 10' that acquires a voice signal, and the server 20 that analyzes the voice signal and determines a response to the voice signal. A vehicle 2-2 includes the on-board device 10', a front seat input-output device 30', and a backseat input-output device 40'.
[0064] The front seat input-output device 30' is different from the front seat input-output device 30 of the voice recognition system 1 in that a voice recognition start button 33 is provided in addition to the microphone 31, the speaker 32, and the indicator 34. In the present embodiment, the front seat of the vehicle 2 is constituted of a driver's seat (D seat) and a passenger's seat (P seat), with each of the seats being equipped with the microphone 31, the speaker 32, and the voice recognition start button 33. The indicator 34 is provided on the front face of the front seat.
[0065] The backseat input-output device 40' is different from the backseat input-output device 40 of the voice recognition system 1 in that a voice recognition start button 43 is provided in addition to the microphone 41 and the speaker 42. In the present embodiment, the backseat of the vehicle 2 is constituted of two rear seats, with each of the seats being equipped with the microphone 41, the speaker 42, and the voice recognition start button 43.
[0066] When an occupant speaks to the voice recognition system 1-2, the occupant speaks within a specified time after pressing the voice recognition start button 33 or the voice recognition start button 43. The voice recognition start buttons 33, 43 output a voice recognition start signal to a speaker identification unit 12' when the buttons are pressed.
[0067] The on-board device 10' includes the input-output control unit 11, the speaker identification unit 12', the voice input control unit 13, the display output control unit 14, and the voice output control unit 15. The server 20 includes the interaction control unit 21, the voice recognition unit 22, the response generation unit 23, and the interaction control rule storage unit 24. The voice recognition system 1-2 is different from the voice recognition system 1 in that the speaker identification unit 12 is replaced with the speaker identification unit 12'.
[0068] The speaker identification unit 12' can identify a speaker by identifying which voice recognition start button, the voice recognition start button 33 or the voice recognition start button 43, is used to input a voice recognition start signal. For example, when the voice recognition start signal is input through the voice recognition start button 33 included in the driver's seat, the driver is identified as the speaker.
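A sketch of this button-based identification follows, assuming each start button can be associated with the seat where it is installed. The mapping table, the five-second window standing in for the "specified time" of [0066], and the identify_by_button helper are all assumptions made for illustration.

```python
# Illustrative sketch of the speaker identification unit 12' identifying
# a speaker from the pressed voice recognition start button. Values and
# names are assumptions, not taken from the patent.

BUTTON_TO_SEAT = {
    "button_33_driver": "driver_seat",
    "button_33_passenger": "passenger_seat",
    "button_43_rear_left": "rear_left_seat",
    "button_43_rear_right": "rear_right_seat",
}

SPEECH_WINDOW_SEC = 5.0  # assumed value; the patent only says "a specified time"

def identify_by_button(button_id, press_time, speech_time):
    """Identify the speaker's seat from the pressed start button.
    Returns None when the speech falls outside the allowed window."""
    if speech_time - press_time > SPEECH_WINDOW_SEC:
        return None  # treat late speech as irrelevant to voice operation
    return BUTTON_TO_SEAT.get(button_id)
```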
[0069] As described in the foregoing, according to the first modification, the front seat input-output device 30' includes the voice recognition start button 33, and the backseat input-output device 40' includes the voice recognition start button 43. Hence, the speaker identification unit 12' can easily identify the speaker. In the case where the microphone 31 or 41 collects a voice before the voice recognition start button 33 or 43 is pressed, the input-output control unit 11 may be configured to cancel the voice signal input from the microphone 31 or 41. As a consequence, even when the microphones 31, 41 collect a voice irrelevant to execution of a task, it becomes possible to prevent malfunction.
Second Modification
[0070] Description is now given of a second modification of the voice recognition system 1. FIG. 7 shows the configuration of a voice recognition system 1-3 that is the second modification of the voice recognition system 1. The voice recognition system 1-3, which does not include the server 20, is mounted on a vehicle 2-3. The vehicle 2-3 includes the front seat input-output device 30, the backseat input-output device 40, and the voice recognition system 1-3.
[0071] The voice recognition system 1-3 includes the speaker identification unit 12, the voice input control unit 13, the display output control unit 14, the voice output control unit 15, an interaction control unit 21', the voice recognition unit 22, the response generation unit 23, and the interaction control rule storage unit 24. The configuration of the voice recognition system 1 is divided into a vehicle side and a server side. However, the configuration of the voice recognition system 1-3 is integrated into the vehicle side. Hence, the voice recognition system 1-3 does not include the input-output control unit 11 provided in the voice recognition system 1.
[0072] The interaction control unit 21' is different from the interaction control unit 21 of the voice recognition system 1 in that the voice signal and the speaker information are acquired directly from the voice input control unit 13 and the speaker identification unit 12 without passing through the input-output control unit 11, and in that the response data is output directly to the display output control unit 14 and the voice output control unit 15 without passing through the input-output control unit 11. Since other processing aspects of the interaction control unit 21' are similar to those of the voice recognition system 1, the description thereof is omitted. Since the details of the process in each of the other component units are also similar to those of the voice recognition system 1, the description thereof is omitted.
[0073] Thus, the configuration of the voice recognition system 1-3 is integrated into the vehicle side without being divided into the vehicle side and the server side. Therefore, while the calculation load on the vehicle side becomes larger than the calculation load in the voice recognition system 1, communication with the server 20 becomes unnecessary. Accordingly, it becomes possible to reliably accept speakers' requests regardless of the communication environment.
[0074] Although the voice recognition system has been described in the foregoing, a computer may be used such that the computer functions as all or some of the voice recognition system. Such a computer can implement the functions of the voice recognition system by storing in advance programs describing the contents of the processes that implement each of the functions of the voice recognition system, and by having the CPU of the computer read and execute the programs.
[0075] The programs may be recorded on a computer-readable medium. Using the computer-readable medium, the programs can be installed on the computer. Here, the computer-readable medium on which the programs are recorded may be a non-transitory recording medium. The non-transitory recording medium may be any recording medium, such as a CD-ROM or a DVD-ROM, for example.
[0076] Although the embodiment mentioned above has been described as a typical example, it is clear to those skilled in the art that many changes and replacements are possible within the scope and the range of the present invention. Therefore, it should be understood that the present invention is not limited to the above-mentioned embodiment, and various modifications and changes are possible without departing from the claims. For example, a plurality of configuration blocks shown in the block diagram of the embodiment may be combined, or one configuration block may be divided.

Claims

1. A voice recognition system configured to perform voice recognition of a voice of an occupant in a vehicle, and respond to a content of the recognized voice, the vehicle being configured to permit voice operation by speech, the voice recognition system comprising:
a speaker identification unit configured to identify a speaker from a voice signal;
a voice recognition unit configured to perform a voice recognition process on the voice signal;
an interaction control unit configured to analyze a result of voice recognition by the voice recognition unit and generate a response instruction based on an analysis content; and
a response generation unit configured to generate response data based on the response instruction, wherein
the interaction control unit is configured to determine, when a second speaker who speaks after start of voice operation by a first speaker is different from the first speaker, whether or not to accept voice operation by the second speaker.
2. The voice recognition system according to claim 1, wherein the interaction control unit is configured to determine, when the second speaker is different from the first speaker, whether or not to accept the voice operation by the second speaker, based on whether a speaker different from the first speaker is permitted to participate in operation of a task that is executed based on the voice operation started by the first speaker.
3. The voice recognition system according to claim 1, wherein the interaction control unit is configured to determine, when the second speaker is different from the first speaker, whether or not to accept the voice operation by the second speaker, based on whether a task that is executed based on the voice operation started by the first speaker is completed.
4. The voice recognition system according to claim 1, wherein the interaction control unit is configured to determine, when the second speaker is different from the first speaker, whether or not to accept the voice operation by the second speaker depending on a content of a speech by the second speaker.
5. The voice recognition system according to claim 1, wherein the interaction control unit is configured to accept voice operation by a speaker different from the first speaker, based on an instruction of the first speaker.
6. The voice recognition system according to claim 1, wherein when a first task executed based on the voice operation started by the first speaker is in execution, and a speaker different from the first speaker performs voice operation that requests execution of a second task different from the first task, the interaction control unit is configured to accept the voice operation relevant to the second task in parallel to the voice operation relevant to the first task.
PCT/IB2019/000425 2018-06-05 2019-05-28 Voice recognition system WO2019234487A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2018-107851 2018-06-05
JP2018107851A JP7000257B2 (en) 2018-06-05 2018-06-05 Speech recognition system

Publications (2)

Publication Number Publication Date
WO2019234487A1 true WO2019234487A1 (en) 2019-12-12
WO2019234487A8 WO2019234487A8 (en) 2020-02-13

Family

ID=66951980

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2019/000425 WO2019234487A1 (en) 2018-06-05 2019-05-28 Voice recognition system

Country Status (2)

Country Link
JP (1) JP7000257B2 (en)
WO (1) WO2019234487A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120209454A1 (en) * 2011-02-10 2012-08-16 Ford Global Technologies, Llc System and method for controlling a restricted mode in a vehicle
CN107767875A (en) * 2017-10-17 2018-03-06 深圳市沃特沃德股份有限公司 Sound control method, device and terminal device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3261087A1 (en) 2013-09-03 2017-12-27 Panasonic Intellectual Property Corporation of America Voice interaction control method

Also Published As

Publication number Publication date
WO2019234487A8 (en) 2020-02-13
JP2019211635A (en) 2019-12-12
JP7000257B2 (en) 2022-01-19

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19731781

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19731781

Country of ref document: EP

Kind code of ref document: A1