WO2019234487A1 - Voice recognition system - Google Patents

Voice recognition system

Info

Publication number
WO2019234487A1
Authority
WO
WIPO (PCT)
Prior art keywords
speaker
voice
voice recognition
control unit
interaction control
Application number
PCT/IB2019/000425
Other languages
French (fr)
Other versions
WO2019234487A8 (en)
Inventor
Hidenobu Suzuki
Makoto Manabe
Original Assignee
Toyota Jidosha Kabushiki Kaisha
Denso Corporation
Application filed by Toyota Jidosha Kabushiki Kaisha and Denso Corporation
Publication of WO2019234487A1
Publication of WO2019234487A8

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification

Definitions

  • FIG. 6 shows the configuration of a voice recognition system 1-2 that is the first modification of the voice recognition system 1.
  • The voice recognition system 1-2 includes an on-board device 10' that acquires a voice signal, and the server 20 that analyzes the voice signal and determines a response to the voice signal.
  • A vehicle 2-2 includes the on-board device 10', a front seat input-output device 30', and a backseat input-output device 40'.
  • The front seat input-output device 30' is different from the front seat input-output device 30 of the voice recognition system 1 in that a voice recognition start button 33 is provided in addition to the microphone 31, the speaker 32, and the indicator 34.
  • The front seat of the vehicle 2-2 is constituted of a driver's seat (D seat) and a passenger's seat (P seat), with each of the seats being equipped with the microphone 31, the speaker 32, and the voice recognition start button 33.
  • The indicator 34 is provided on the front face of the front seat.
  • The backseat input-output device 40' is different from the backseat input-output device 40 of the voice recognition system 1 in that a voice recognition start button 43 is provided in addition to the microphone 41 and the speaker 42.
  • The backseat of the vehicle 2-2 is constituted of two rear seats, with each of the seats being equipped with the microphone 41, the speaker 42, and the voice recognition start button 43.
  • The voice recognition start buttons 33, 43 output a voice recognition start signal to a speaker identification unit 12' when pressed.
  • The on-board device 10' includes the input-output control unit 11, the speaker identification unit 12', the voice input control unit 13, the display output control unit 14, and the voice output control unit 15.
  • The server 20 includes the interaction control unit 21, the voice recognition unit 22, the response generation unit 23, and the interaction control rule storage unit 24.
  • The voice recognition system 1-2 is different from the voice recognition system 1 in that the speaker identification unit 12 is replaced with the speaker identification unit 12'.
  • The speaker identification unit 12' can identify a speaker by identifying which voice recognition start button, the voice recognition start button 33 or the voice recognition start button 43, is used to input a voice recognition start signal. For example, when the voice recognition start signal is input through the voice recognition start button 33 of the driver's seat, the driver is identified as the speaker.
  • Since the front seat input-output device 30' includes the voice recognition start button 33 and the backseat input-output device 40' includes the voice recognition start button 43, the speaker identification unit 12' can easily identify the speaker.
  • When no voice recognition start signal is input, the input-output control unit 11 may be configured to cancel the voice signal input from the microphone 31 or 41. As a consequence, even when the microphones 31, 41 collect voice irrelevant to execution of a task, malfunction can be prevented (see the sketch after this list).
  • FIG. 7 shows the configuration of a voice recognition system 1-3 that is the second modification of the voice recognition system 1.
  • The voice recognition system 1-3, which does not include the server 20, is mounted on a vehicle 2-3.
  • The vehicle 2-3 includes the front seat input-output device 30, the backseat input-output device 40, and the voice recognition system 1-3.
  • The voice recognition system 1-3 includes the speaker identification unit 12, the voice input control unit 13, the display output control unit 14, the voice output control unit 15, an interaction control unit 21', the voice recognition unit 22, the response generation unit 23, and the interaction control rule storage unit 24.
  • The configuration of the voice recognition system 1 is divided into a vehicle side and a server side, whereas the configuration of the voice recognition system 1-3 is integrated into the vehicle side. Hence, the voice recognition system 1-3 does not include the input-output control unit 11 provided in the voice recognition system 1.
  • The interaction control unit 21' is different from the interaction control unit 21 of the voice recognition system 1 in that the voice signal and the speaker information are acquired directly from the voice input control unit 13 and the speaker identification unit 12 without going through the input-output control unit 11, and in that the response data is output directly to the display output control unit 14 and the voice output control unit 15 without going through the input-output control unit 11. Other processing aspects of the interaction control unit 21', and the details of the processes in the other component units, are similar to those of the voice recognition system 1, so their description is omitted.
  • Since the configuration of the voice recognition system 1-3 is integrated into the vehicle side without being divided into a vehicle side and a server side, the calculation load on the vehicle side becomes larger than in the voice recognition system 1, but communication with the server 20 becomes unnecessary. Accordingly, it becomes possible to reliably accept speakers' requests regardless of the communication environment.
  • A computer may be used to function as all or part of the voice recognition system. Such a computer can implement the functions of the voice recognition system by storing in advance programs describing the contents of the processes that implement each function of the voice recognition system, and by reading and executing these programs with its CPU.
  • The programs may be recorded on a computer-readable medium, which makes it possible to install them on the computer. The computer-readable medium on which the programs are recorded may be a non-transitory recording medium, such as a CD-ROM or a DVD-ROM.
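As a rough illustration of the first modification, pressing a voice recognition start button can be modeled as arming the corresponding seat, with voice from unarmed seats cancelled by the input-output control unit 11. The Python class below is an assumed structure for illustration, not the patent's implementation:

    # Hypothetical sketch of button-gated input in the voice recognition
    # system 1-2: a voice signal is forwarded only when the seat's start
    # button (33 or 43) was pressed, which also identifies the speaker.
    class ButtonGatedInput:
        def __init__(self) -> None:
            self.armed_seats: set = set()

        def on_button_pressed(self, seat: str) -> None:
            # The voice recognition start signal arms the seat.
            self.armed_seats.add(seat)

        def on_voice(self, seat: str, signal: bytes):
            # Cancel voice from unarmed seats so that speech irrelevant to
            # execution of a task does not cause malfunction.
            if seat not in self.armed_seats:
                return None
            self.armed_seats.discard(seat)  # push-to-talk style consumption
            return signal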

Abstract

A voice recognition system includes: a speaker identification unit configured to identify a speaker from a voice signal; a voice recognition unit configured to perform a voice recognition process on the voice signal; an interaction control unit configured to analyze a result of voice recognition by the voice recognition unit, and generate a response instruction based on an analysis content; and a response generation unit configured to generate response data based on the response instruction. When a first speaker who starts voice operation is different from a second speaker who speaks after the start of the voice operation, the interaction control unit is configured to determine whether or not to accept the voice operation by the second speaker.

Description

VOICE RECOGNITION SYSTEM
BACKGROUND OF THE INVENTION
1. Field of the Invention
[0001] The present invention relates to a voice recognition system.
2. Description of Related Art
[0002] Japanese Patent Application Publication No. 2017-083600 (JP
2017-083600 A) discloses that even when a plurality of occupants in a vehicle speaks at the same time, mixing of voice is restrained by removing a voice acquired with a second microphone that is disposed in a passenger's seat from a voice acquired with a first microphone that is disposed in a driver's seat.
[0003] Japanese Patent Application Publication No. 2003-345389 (JP
2003-345389 A) discloses a voice recognition system including a driver's seat-side speech switch, a passenger's seat-side speech switch, and a rear seat-side speech switch, which recognize the voice of an occupant and permit voice input operation such that a driver and an occupant other than the driver can perform the voice input operation.
SUMMARY OF THE INVENTION
[0004] However, with the technique disclosed in JP 2017-083600 A, it is not possible to perform voice operation based on the voice of occupants other than the driver, which may deteriorate the convenience of the occupants.
[0005] The technique disclosed in JP 2003-345389 A does not consider system behavior in the case where one operator is executing voice operation, such as for executing a desired task through a plurality of interactive steps, and during the period when the system accepts the speech of the operator, another speaker speaks. It is a general standard that only after an occupant, who acquires the right to perform the voice operation first, completes his or her task, the speech of another occupant is accepted. Hence, the occupant, other than the occupant who first speaks and starts voice operation, might be prevented from participating in the voice operation, and it may not be possible for the occupants to jointly advance the voice operation.
[0006] The present invention provides a voice recognition system that enables a second speaker, who is different from a first speaker who speaks at the start of voice recognition, to perform voice operation depending on the situation.
[0007] The voice recognition system according to an aspect of the present invention is a voice recognition system configured to perform voice recognition of a voice of an occupant in a vehicle, and respond to a content of the recognized voice, the vehicle being configured to permit voice operation by speech. The voice recognition system includes: a speaker identification unit configured to identify a speaker from a voice signal; a voice recognition unit configured to perform a voice recognition process on the voice signal; an interaction control unit configured to analyze a result of voice recognition by the voice recognition unit and generate a response instruction based on an analysis content; and a response generation unit configured to generate response data based on the response instruction. The interaction control unit is configured to determine, when a second speaker who speaks after start of voice operation by a first speaker is different from the first speaker, whether or not to accept voice operation by the second speaker.
[0008] According to the above aspect of the present invention, it becomes possible to determine whether or not to accept the voice operation by the second speaker, different from the first speaker who speaks at the start of voice recognition, and to permit the voice operation by the second speaker depending on the situation. This makes it possible to improve the convenience of the occupants.
[0009] In the above aspect, the interaction control unit may be configured to determine, when the second speaker is different from the first speaker, whether or not to accept the voice operation by the second speaker, based on whether a speaker different from the first speaker is permitted to participate in operation of a task that is executed based on the voice operation started by the first speaker.
[0010] In the above aspect, the interaction control unit may be configured to determine, when the second speaker is different from the first speaker, whether or not to accept the voice operation by the second speaker, based on whether a task that is executed based on the voice operation started by the first speaker is completed.
[0011] In the above aspect, the interaction control unit may be configured to determine, when the second speaker is different from the first speaker, whether or not to accept the voice operation by the second speaker depending on a content of a speech by the second speaker.
[0012] In the above aspect, the interaction control unit may be configured to accept voice operation by a speaker different from the first speaker, based on an instruction of the first speaker.
[0013] In the above aspect, when a first task that is executed based on the voice operation started by the first speaker is in execution, and a speaker different from the first speaker performs voice operation that requests execution of a second task different from the first task, the interaction control unit may be configured to accept voice operation relevant to the second task in parallel to the voice operation relevant to the first task.
BRIEF DESCRIPTION OF THE DRAWINGS
[0014] Features, advantages, and technical and industrial significance of exemplary embodiments of the invention will be described below with reference to the accompanying drawings, in which like numerals denote like elements, and wherein:
FIG. 1 is a block diagram showing a configuration example of a voice recognition system according to one embodiment of the present invention;
FIG. 2 is a flowchart showing one example of the procedure of a voice recognition method using the voice recognition system according to one embodiment of the present invention;
FIG. 3 shows a first process example of the voice recognition system according to one embodiment of the present invention;
FIG. 4 shows the positional relationship between occupants of a vehicle;
FIG. 5 shows a second process example of the voice recognition system according to one embodiment of the present invention;
FIG. 6 is a block diagram showing a first modification of the voice recognition system according to one embodiment of the present invention; and
FIG. 7 is a block diagram showing a second modification of the voice recognition system according to one embodiment of the present invention.
DETAILED DESCRIPTION OF EMBODIMENTS
[0015] Hereinafter, one embodiment of the present invention will be described in detail with reference to the drawings.
[0016] First, the configuration of a voice recognition system according to one embodiment of the present invention will be described with reference to FIG. 1. FIG. 1 shows a configuration example of the voice recognition system according to one embodiment of the present invention. The voice recognition system 1 shown in FIG. 1 includes an on-board device 10 that acquires a voice signal, and a server 20 that analyzes the voice signal and determines a response to the voice signal. The voice recognition system 1 performs voice recognition of a voice of an occupant in a vehicle 2 that permits voice operation by speech, and responds to the content of the recognized voice. The vehicle 2 includes the on-board device 10, a front seat input-output device 30, and a backseat input-output device 40.
[0017] The front seat input-output device 30, which is an input-output device in the front seat of the vehicle 2, includes a microphone 31, a speaker 32, and an indicator 34. In the present embodiment, the front seat of the vehicle 2 is constituted of a driver's seat (D seat) and a passenger's seat (P seat), with each of the seats being equipped with the microphone 31 and the speaker 32. The indicator 34 is provided on the front face of the front seat.
[0018] The backseat input-output device 40, which is an input-output device in the backseat of the vehicle 2, includes a microphone 41 and a speaker 42. In the present embodiment, the backseat of the vehicle 2 is constituted of two rear seats, with each of the seats being equipped with the microphone 41 and the speaker 42.
[0019] The on-board device 10 includes an input-output control unit 11, a speaker identification unit 12, a voice input control unit 13, a display output control unit 14, and a voice output control unit 15. The server 20 includes an interaction control unit 21, a voice recognition unit 22, a response generation unit 23, and an interaction control rule storage unit 24.
[0020] The voice input control unit 13 acquires a voice signal input into the microphone 31 or the microphone 41 from a speaker, performs processes such as noise removal and A/D conversion, and outputs the processed voice signal to the input-output control unit 11.
[0021] The speaker identification unit 12 identifies the speaker of a voice signal input into the voice input control unit 13, and outputs speaker information indicating the identified speaker to the input-output control unit 11. For example, when the microphone is disposed in each of the seats as in the present embodiment, a speaker can be identified by identifying which microphone receives input of a speech signal. Accordingly, the voice input control unit 13 may output to the input-output control unit 11 a voice signal in association with a microphone ID indicating which microphone receives input of the voice signal. In this case, the speaker identification unit 12 acquires the voice signal and the microphone ID from the input-output control unit 11, and identifies the speaker of the voice signal. Alternatively, the speaker identification unit 12 may identify the speaker of the voice signal with a method of acquiring voice signal patterns indicating the characteristics of the occupants of the vehicle in advance.
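A minimal illustration of this seat-based identification is a lookup from microphone ID to seat. The Python sketch below is an assumption for illustration only; the patent specifies the behavior, not an implementation, and the microphone IDs 31-2 and 41-2 and the seat labels are hypothetical:

    # Hypothetical sketch: identify the speaker from the microphone that
    # captured the voice signal. IDs and seat labels are illustrative.
    MIC_TO_SEAT = {
        "31-1": "D seat",      # driver's seat microphone
        "31-2": "P seat",      # passenger's seat microphone (assumed ID)
        "41-1": "rear left",
        "41-2": "rear right",  # assumed ID
    }

    def identify_speaker(mic_id: str) -> str:
        """Return the seat (used as speaker information) for a microphone ID."""
        return MIC_TO_SEAT.get(mic_id, "unknown")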
[0022] The input-output control unit 11 transmits the voice signal that is input from the voice input control unit 13 and the speaker information that is input from the speaker identification unit 12 to the interaction control unit 21 through an antenna.
[0023] The interaction control unit 21 receives the voice signal and the speaker information from the input-output control unit 11 through the antenna. The interaction control unit 21 then outputs the voice signal to the voice recognition unit 22.
[0024] The voice recognition unit 22 performs a voice recognition process on the voice signal input from the interaction control unit 21 to convert the voice signal into a character string (a text), and outputs the voice recognition result to the interaction control unit 21.
[0025] The interaction control unit 21 analyzes the voice recognition result with any known methods, such as a morphological analysis, and estimates a speaking intention of the speaker of the voice signal. The interaction control unit 21 then generates a response instruction based on the analysis content (i.e., in accordance with the speaking intention), and outputs the response instruction to the response generation unit 23.
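The patent leaves the analysis method open ("any known methods, such as a morphological analysis"). Purely as a stand-in, a keyword-based intention estimate could look like the following; a real system would use a proper language-analysis pipeline, and the task names here are assumptions:

    # Toy stand-in for intention estimation; not the patented method.
    def estimate_intention(text: str) -> str:
        text = text.lower()
        if "search" in text or "restaurant" in text:
            return "search_store"  # e.g. "search for popular eel restaurants"
        if "call" in text:
            return "make_call"     # e.g. "I want to make a call to **"
        if "mail" in text:
            return "send_mail"
        return "unknown"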
[0026] Based on the response instruction input from the interaction control unit 21, the response generation unit 23 generates data to be displayed on the indicator 34 and voice data to be output from the speakers 32, 42 (these data are hereinafter called "response data"), and outputs the response data to the interaction control unit 21. The interaction control unit 21 transmits the response data to the input-output control unit 11 through the antenna.
[0027] Based on the response data input from the input-output control unit 11, the display output control unit 14 generates display data to be displayed on the indicator 34, and outputs the display data to the indicator 34.
[0028] Based on the response data input from the input-output control unit 11, the voice output control unit 15 generates voice data to be output from the speakers 32, 42, and outputs the voice data to the speakers 32, 42.
[0029] In voice operation which requires a plurality of interactive steps until execution of a task, when a first speaker who starts the voice operation is not identical to a second speaker who speaks after the start of the voice operation (a speaker who speaks in the middle of an ongoing interactive step), the interaction control rule storage unit 24 stores an interaction control rule which defines whether or not to accept the speech of the second speaker. For example, the interaction control rule stipulates that the speech of the second speaker is accepted when the task is to search for and determine a store, and that the speech of the second speaker is rejected when the task is to transmit a mail or make a call.
[0030] In the case where the second speaker speaks after the first speaker starts voice operation, the interaction control unit 21 determines whether the first speaker is identical to the second speaker. When the two speakers are not identical, the interaction control unit 21 refers to the interaction control rules stored in the interaction control rule storage unit 24. Then, the interaction control unit 21 determines whether the current task is a task that permits the second speaker to perform voice operation, and thereby determines whether to accept the speech by the second speaker, i.e., whether or not to accept the voice operation by the second speaker.
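Read this way, an interaction control rule reduces to a per-task flag consulted whenever the second speaker differs from the first. A minimal sketch under that reading; the task names and function signature are assumptions, not from the patent:

    # Per-task interaction control rules: True means occupants other than
    # the operation initiator may continue the ongoing interactive steps.
    INTERACTION_CONTROL_RULES = {
        "search_store": True,   # searching for and determining a store
        "make_call": False,     # only the initiator chooses the callee
        "send_mail": False,
    }

    def accept_speech(task: str, speaker: str, initiator: str) -> bool:
        """Accept the speech if the speaker is the initiator, or if the
        current task permits participation by other occupants."""
        if speaker == initiator:
            return True
        return INTERACTION_CONTROL_RULES.get(task, False)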
[0031] Even in the case where the interaction control rule storage unit 24 is not provided, the interaction control unit 21 can determine whether to accept the speech by the second speaker depending on the content of the speech of the second speaker. For example, when the first speaker is a driver, the first speaker may leave the interactive steps, after the start of the voice operation, to other occupants in order to concentrate on driving operation. Hence, in the case where the first speaker who starts voice operation speaks "Other occupants answer the subsequent questions", or the second speaker speaks "On behalf of the first speaker ...", the interaction control unit 21 may accept the speech by the second speaker. The interaction control unit 21 may also estimate the speaking intention, and when determining that the content of the speech is irrelevant to the task, the interaction control unit 21 may reject the speech by the second speaker.
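Without the rule storage unit, the same decision can be driven by the speech content itself, as described above. A rough sketch follows, with the trigger phrases being illustrative assumptions rather than phrases defined by the patent:

    # Content-based acceptance: accept a second speaker when the initiator
    # delegated the interaction, or when the second speaker claims to answer
    # on the initiator's behalf. Phrases are illustrative only.
    def accept_by_content(initiator_speech: str, second_speech: str) -> bool:
        delegated = "other occupants answer" in initiator_speech.lower()
        on_behalf = "on behalf of" in second_speech.lower()
        return delegated or on_behalf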
[0032] Alternatively, the first speaker who starts voice operation may be allowed to select whether to accept the speech of another occupant in the middle of an interactive step for advancing the interactive step, or to reject advancing the interactive step, and the selected result may be reported to the interaction control unit 21. In this case, the interaction control unit 21 generates a response instruction based on the selected result and outputs the response instruction to the response generation unit 23.
[0033] The interaction control unit 21 may present the result of determining whether to accept or reject the speech of the second speaker. For example, the interaction control unit 21 may display the determination result on the indicator 34, or may output the voice from the speakers 32, 42. When the voice recognition system 1 determines to reject the speech of the second speaker, the voice recognition system 1 may seek the determination of the first speaker.
[0034] When an occupant, other than the first speaker who starts voice operation, speaks while the interactive step is ongoing, and the content of the speech is to request a task other than the task in operation, the interaction control unit 21 may send a response instruction to the response generation unit 23 such that another voice operation is started in parallel to the ongoing voice operation.
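One plausible realization keeps an independent dialogue session per task, so that a request for a different task opens a new session in parallel instead of being rejected. The structure below is an assumption for illustration, not taken from the patent:

    # Hypothetical per-task sessions enabling parallel voice operations.
    sessions: dict = {}  # task name -> session state

    def handle_request(task: str, speaker: str, text: str) -> None:
        if task not in sessions:
            # A new task starts its own interactive steps in parallel
            # to any ongoing voice operation.
            sessions[task] = {"initiator": speaker, "history": []}
        sessions[task]["history"].append((speaker, text))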
[0035] The processes allocated to the respective processing units are not limited to the aforementioned example. For example, instead of the interaction control unit 21, the voice recognition unit 22 may estimate the speaking intention.
[0036] Next, a voice recognition method with the voice recognition system 1 will be described with reference to FIG. 2. FIG. 2 is a flowchart showing one example of the procedure of the voice recognition method using the voice recognition system 1.
[0037] In the voice recognition system 1, first, the voice input control unit 13 acquires a speaker's voice signal (step S11). Then, the speaker identification unit 12 identifies the speaker of the acquired voice signal (step S12).
[0038] Next, the voice recognition unit 22 performs a voice recognition process to convert the acquired voice signal into a character string (step S13). The interaction control unit 21 then analyzes the voice recognition result to estimate a speaking intention and generate a response instruction in accordance with the speaking intention (step S14).
[0039] Next, the response generation unit 23 generates response data based on the response instruction (step S15). The interaction control unit 21 then determines whether the task is completed based on the content of the voice signal (step S16). When determining that the task is not completed (step S16-No), the interaction control unit 21 advances the process to step S17, and continues the interactive step.
[0040] In the case of continuing the interactive step, the voice input control unit 13 acquires a voice again (step S17). Then, the speaker identification unit 12 identifies the speaker of the acquired voice signal (step S18), and the interaction control unit 21 determines whether the speaker is the initiator of the operation (step S19).
[0041] In step S19, when determining that the speaker is not the initiator of the operation (step S19-No), the interaction control unit 21 refers to the interaction control rules stored in the interaction control rule storage unit 24 and determines whether the current task is the task that permits participation of the speaker (step S20). When determining that the current task is not the task that permits participation of the speaker (step S20-No), the interaction control unit 21 returns the process to step S17.
[0042] Meanwhile, when the interaction control unit 21 determines in step S19 that the speaker is the initiator of the operation (step S19-Yes), or when the interaction control unit 21 determines in step S20 that the current task is the task that permits participation of the speaker (step S20-Yes), the voice recognition unit 22 performs the voice recognition process to convert the acquired voice signal to a character string (step S21). Then, the interaction control unit 21 analyzes the voice recognition result to estimate the speaking intention, and generates a response instruction in accordance with the speaking intention (step S22).
[0043] Next, the response generation unit 23 generates and outputs response data based on the response instruction (step S23). The interaction control unit 21 then determines whether the task is completed (step S24), and when determining that the task is not completed (step S24-No), the interaction control unit 21 returns the process to step S17.
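Read end to end, steps S11 to S24 form a single loop. The Python sketch below mirrors the flowchart of FIG. 2; every callable is a placeholder standing in for one of the processing units described above, not an API defined by the patent:

    # Sketch of the FIG. 2 control flow. `rules` is the per-task
    # participation table held by the interaction control rule storage unit.
    def voice_operation(acquire_voice, identify_speaker, recognize,
                        estimate_task, respond, rules):
        signal = acquire_voice()                     # S11
        initiator = identify_speaker(signal)         # S12
        text = recognize(signal)                     # S13
        task = estimate_task(text)                   # S14
        done = respond(task, text)                   # S15-S16
        while not done:                              # S16-No / S24-No
            signal = acquire_voice()                 # S17
            speaker = identify_speaker(signal)       # S18
            if speaker != initiator and not rules.get(task, False):
                continue                             # S19-No and S20-No
            done = respond(task, recognize(signal))  # S21-S24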
[0044] Next, a first specific example of the above process of the voice recognition system 1 will be described with reference to FIG. 3. FIG. 3 shows an example in which a speaker, different from the operation initiator, speaks in the middle of an interactive step, and the speaker is permitted to perform voice operation, as a first process example of the voice recognition system 1. Here, as shown in FIG. 4, four persons, A, B, C, D, are riding in the vehicle 2.
[0045] When A speaks "search for popular eel restaurants around here", a microphone 31-1 collects the voice, and the voice input control unit 13 acquires a voice signal "search for popular eel restaurants around here" (step S11). The speaker identification unit 12 identifies A as the speaker of the voice signal "search for popular eel restaurants around here" (step S12). The voice recognition unit 22 performs voice recognition of the voice signal "search for popular eel restaurants around here" (step S13). The interaction control unit 21 analyzes the voice recognition result to estimate a speaking intention, and generates a response instruction in accordance with the speaking intention (step S14).
[0046] Upon reception of the response instruction from the interaction control unit 21, the response generation unit 23 searches for eel restaurants within a prescribed range of the location of the vehicle 2 along with their degrees of popularity (for example, an average score of evaluations by visitors to the restaurants), generates data indicating the search result, and displays search result X on the display screen of the indicator 34 (step S15). The response generation unit 23 also generates voice data "Four restaurants are found around here. Which restaurant do you choose?", and outputs the voice data from the speakers 32, 42 (step S15).
[0047] The interaction control unit 21 determines that the task "searching for stores and determining a destination" needs a response to the presented search result X, and determines that the task is not yet completed (step S16-No).
[0048] Next, when B speaks "I want to go to UNAFUJI", a microphone 41-1 collects the voice, and the voice input control unit 13 acquires a voice signal "I want to go to UNAFUJI" (step S17). The speaker identification unit 12 identifies B as the speaker of the voice signal "I want to go to UNAFUJI" (step S18).
[0049] The interaction control unit 21 determines that the speaker B is different from the operation initiator A (step S19-No). Then, based on the interaction control rules stored in the interaction control rule storage unit 24, the interaction control unit 21 determines whether the task "searching for stores and determining a destination" is a task that permits participation of the speaker B in the middle of its execution. Since the task "searching for stores and determining a destination" allows all the occupants A to D to make a determination, the interaction control unit 21 determines that the current task is the task that permits participation of the speaker (step S20-Yes).
[0050] The voice recognition unit 22 performs voice recognition of the voice signal "I want to go to UNAFUJI" (step S21). The interaction control unit 21 estimates a speaking intention of the voice, and generates a response instruction in accordance with the speaking intention (step S22).
[0051] Upon reception of the response instruction from the interaction control unit 21, the response generation unit 23 generates voice data "OK. Starting route guidance", and outputs the voice data from the speakers 32, 42 (step S23). The response generation unit 23 may further generate data indicating the route to "UNAFUJI", and display the data on the indicator 34.
[0052] Next, a second specific example of the process of the voice recognition system 1 will be described with reference to FIG. 5. FIG. 5 shows an example, in which a speaker different from the operation initiator speaks in the middle of an interactive step, and voice operation by the speaker is rejected, as a second process example of the voice recognition system 1. Here again, as shown in FIG. 4, four persons, A, B, C, D are riding in the vehicle 2.
[0053] When A speaks "I want to make a call to **", a microphone 31-1 collects the voice, and the voice input control unit 13 acquires the voice signal "I want to make a call to **" (step Sl 1). The speaker identification unit 12 identifies A as the speaker of the voice signal "I want to make a call to **" (step S12). The voice recognition unit 22 performs voice recognition of the voice signal "I want to make a call to **" (step S13). The interaction control unit 21 estimates a speaking intention of the voice, and generates a response instruction in accordance with the speaking intention (step S14).
[0054] Upon reception of the response instruction from the interaction control unit 21, the response generation unit 23 searches for "**" in a telephone directory registered in advance, generates data indicating a search result, and displays search result Y on the display screen of the indicator 34 (step S15). The response generation unit 23 also generates voice data "Which **?", and outputs the voice data from the speakers 32, 42 (step S15).
[0055] The interaction control unit 21 determines that the task "making a call" needs a response to the presented result Y, and determines that the task is not yet completed (step S16-No).
[0056] Next, when B speaks "JIRO", a microphone 41-1 collects the voice, and the voice input control unit 13 acquires a voice signal "JIRO" (step S17). The speaker identification unit 12 identifies B as the speaker of the voice signal "JIRO" (step S18).
[0057] As a consequence, the interaction control unit 21 determines that the speaker B is different from the operation initiator A (step S19-No). Then, based on the interaction control rules stored in the interaction control rule storage unit 24, the interaction control unit 21 determines whether the task "making a call" permits participation of the speaker B in the middle of execution of the task. In the task "making a call", the operation initiator is assumed to determine to whom a call is made. Accordingly, the interaction control unit 21 determines that the current task is not the task that permits participation of the speaker B (step S20-No).
[0058] Next, when A speaks "HANAKO", the microphone 31-1 collects the voice, and the voice input control unit 13 acquires a voice signal "HANAKO" (step S17). The speaker identification unit 12 identifies A as the speaker of the voice signal "HANAKO" (step S18).
[0059] As a consequence, the interaction control unit 21 determines that the speaker A is the initiator of the operation (step S19-Yes). The voice recognition unit 22 performs voice recognition of the voice signal "HANAKO" (step S21). The interaction control unit 21 estimates a speaking intention of the voice, and generates a response instruction in accordance with the speaking intention (step S22).
[0060] Upon reception of the response instruction from the interaction control unit 21, the response generation unit 23 generates voice data "OK. Calling HANAKO", and outputs the voice data from the speakers 32, 42 (step S23). At the same time, the response generation unit 23 acquires the telephone number of "HANAKO **", and displays telephone number Z of "HANAKO **" on the indicator 34 (step S23).
[0061] As described in the foregoing, the voice recognition system 1 identifies a speaker from a voice signal. When the first speaker who starts voice operation is not identical to the second speaker who speaks after the start of the voice operation, the voice recognition system 1 determines whether or not to accept the voice operation by the second speaker. Hence, with the embodiment, the second speaker, who is different from the first speaker who speaks at the start of voice recognition, can perform voice operation depending on the situation.
[0062] Whether a second speaker, different from the first speaker who speaks at the start of voice recognition, can perform voice operation is stored in advance, for each task, as an interaction control rule. As a result, the voice recognition system 1 can determine whether or not to accept the voice operation by the second speaker based on the interaction control rules. Hence, with the embodiment, when the second speaker speaks, it is possible to automatically determine whether the current task is one in which it is appropriate to permit the second speaker to perform voice operation, and in such a task the second speaker can perform voice operation.
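As one way to picture these per-task rules, the sketch below models the interaction control rule storage unit 24 as a simple lookup table. The task names, the permit_other_speakers flag, and the accept_voice_operation helper are illustrative assumptions; the patent does not prescribe any particular data format.

```python
# Illustrative sketch (not from the patent) of the interaction control
# rules held in the interaction control rule storage unit 24, modeled as
# a per-task lookup table.

INTERACTION_CONTROL_RULES = {
    "search_stores_and_set_destination": {"permit_other_speakers": True},
    "make_call": {"permit_other_speakers": False},
}

def accept_voice_operation(task_name, speaker, initiator):
    """Return True if this speaker's voice operation should be accepted
    (steps S19 and S20)."""
    if speaker == initiator:
        return True  # the operation initiator is always accepted (S19-Yes)
    # An unknown task conservatively rejects non-initiators (an assumption).
    rule = INTERACTION_CONTROL_RULES.get(task_name, {"permit_other_speakers": False})
    return rule["permit_other_speakers"]

# These checks mirror the two specific examples above:
assert accept_voice_operation("search_stores_and_set_destination", "B", "A")
assert not accept_voice_operation("make_call", "B", "A")
assert accept_voice_operation("make_call", "A", "A")
```

The three assertions correspond to B joining the store search, B being rejected in the call task, and the initiator A always being accepted.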
First Modification
[0063] Description is now given of a first modification of the voice recognition system 1. FIG. 6 shows the configuration of a voice recognition system 1-2 that is the first modification of the voice recognition system 1. The voice recognition system 1-2 includes an on-board device 10' that acquires a voice signal, and the server 20 that analyzes the voice signal and determines a response to the voice signal. A vehicle 2-2 includes the on-board device 10', a front seat input-output device 30', and a backseat input-output device 40'.
[0064] The front seat input-output device 30' is different from the front seat input-output device 30 of the voice recognition system 1 in that a voice recognition start button 33 is provided in addition to the microphone 31, the speaker 32, and the indicator 34. In the present embodiment, the front seat of the vehicle 2 is constituted of a driver's seat (D seat) and a passenger's seat (P seat), with each of the seats being equipped with the microphone 31, the speaker 32, and the voice recognition start button 33. The indicator 34 is provided on the front face of the front seat.
[0065] The backseat input-output device 40' is different from the backseat input-output device 40 of the voice recognition system 1 in that a voice recognition start button 43 is provided in addition to the microphone 41 and the speaker 42. In the present embodiment, the backseat of the vehicle 2 is constituted of two rear seats, with each of the seats being equipped with the microphone 41, the speaker 42, and the voice recognition start button 43.
[0066] When an occupant speaks to the voice recognition system 1-2, the occupant speaks within a specified time after pressing the voice recognition start button 33 or the voice recognition start button 43. The voice recognition start buttons 33, 43 output a voice recognition start signal to a speaker identification unit 12' when the buttons are pressed.
[0067] The on-board device 10' includes the input-output control unit 11, the speaker identification unit 12', the voice input control unit 13, the display output control unit 14, and the voice output control unit 15. The server 20 includes the interaction control unit 21, the voice recognition unit 22, the response generation unit 23, and the interaction control rule storage unit 24. The voice recognition system 1-2 is different from the voice recognition system 1 in that the speaker identification unit 12 is replaced with the speaker identification unit 12'.
[0068] The speaker identification unit 12' can identify a speaker by identifying which voice recognition start button, the voice recognition start button 33 or the voice recognition start button 43, is used to input a voice recognition start signal. For example, when the voice recognition start signal is input through the voice recognition start button 33 included in the driver's seat, the driver is identified as the speaker.
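A sketch of this button-based identification follows, assuming each start button can be associated with the seat where it is installed. The mapping table, the five-second window standing in for the "specified time" of [0066], and the identify_by_button helper are all assumptions made for illustration.

```python
# Illustrative sketch of the speaker identification unit 12' identifying
# a speaker from the pressed voice recognition start button. Values and
# names are assumptions, not taken from the patent.

BUTTON_TO_SEAT = {
    "button_33_driver": "driver_seat",
    "button_33_passenger": "passenger_seat",
    "button_43_rear_left": "rear_left_seat",
    "button_43_rear_right": "rear_right_seat",
}

SPEECH_WINDOW_SEC = 5.0  # assumed value; the patent only says "a specified time"

def identify_by_button(button_id, press_time, speech_time):
    """Identify the speaker's seat from the pressed start button.
    Returns None when the speech falls outside the allowed window."""
    if speech_time - press_time > SPEECH_WINDOW_SEC:
        return None  # treat late speech as irrelevant to voice operation
    return BUTTON_TO_SEAT.get(button_id)
```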
[0069] As described in the foregoing, according to the first modification, the front seat input-output device 30' includes the voice recognition start button 33, and the backseat input-output device 40' includes the voice recognition start button 43. Hence, the speaker identification unit 12' can easily identify the speaker. In the case where the microphone 31 or 41 collects a voice before the voice recognition start button 33 or 43 is pressed, the input-output control unit 11 may be configured to cancel the voice signal input from the microphone 31 or 41. As a consequence, even when the microphones 31, 41 collect a voice irrelevant to execution of a task, it becomes possible to prevent malfunction.
Second Modification
[0070] Description is now given of a second modification of the voice recognition system 1. FIG. 7 shows the configuration of a voice recognition system 1-3 that is the second modification of the voice recognition system 1. The voice recognition system 1-3, which does not include the server 20, is mounted on a vehicle 2-3. The vehicle 2-3 includes the front seat input-output device 30, the backseat input-output device 40, and the voice recognition system 1-3.
[0071] The voice recognition system 1-3 includes the speaker identification unit 12, the voice input control unit 13, the display output control unit 14, the voice output control unit 15, an interaction control unit 21', the voice recognition unit 22, the response generation unit 23, and the interaction control rule storage unit 24. The configuration of the voice recognition system 1 is divided into a vehicle side and a server side. However, the configuration of the voice recognition system 1-3 is integrated into the vehicle side. Hence, the voice recognition system 1-3 does not include the input-output control unit 11 provided in the voice recognition system 1.
[0072] The interaction control unit 21' is different from the interaction control unit 21 of the voice recognition system 1 in that the voice signal and the speaker information are acquired directly from the voice input control unit 13 and the speaker identification unit 12 without passing through the input-output control unit 11, and in that the response data is output directly to the display output control unit 14 and the voice output control unit 15 without passing through the input-output control unit 11. Since other processing aspects of the interaction control unit 21' are similar to those of the voice recognition system 1, the description thereof is omitted. Since the details of the process in each of the other component units are also similar to those of the voice recognition system 1, the description thereof is omitted.
[0073] Thus, the configuration of the voice recognition system 1-3 is integrated into the vehicle side without being divided into the vehicle side and the server side. Therefore, while the calculation load on the vehicle side becomes larger than the calculation load in the voice recognition system 1, communication with the server 20 becomes unnecessary. Accordingly, it becomes possible to reliably accept speakers' requests regardless of the communication environment.
[0074] Although the voice recognition system has been described in the foregoing, a computer may be used such that the computer functions as all or some of the voice recognition system. Such a computer can implement the functions of the voice recognition system by storing in advance programs describing the contents of the processes that implement each of the functions of the voice recognition system, and by having the CPU of the computer read and execute the programs.
[0075] The programs may be recorded on a computer-readable medium. Using the computer-readable medium, the programs can be installed on the computer. Here, the computer-readable medium on which the programs are recorded may be a non-transitory recording medium. The non-transitory recording medium may be any recording medium, such as a CD-ROM or a DVD-ROM, for example.
[0076] Although the embodiment mentioned above has been described as a typical example, it is clear to those skilled in the art that many changes and replacements are possible within the scope and the range of the present invention. Therefore, it should be understood that the present invention is not limited to the above-mentioned embodiment, and various modifications and changes are possible without departing from the claims. For example, a plurality of configuration blocks shown in the block diagram of the embodiment may be combined, or one configuration block may be divided.

Claims

1. A voice recognition system configured to perform voice recognition of a voice of an occupant in a vehicle, and respond to a content of the recognized voice, the vehicle being configured to permit voice operation by speech, the voice recognition system comprising:
a speaker identification unit configured to identify a speaker from a voice signal;
a voice recognition unit configured to perform a voice recognition process on the voice signal;
an interaction control unit configured to analyze a result of voice recognition by the voice recognition unit and generate a response instruction based on an analysis content; and
a response generation unit configured to generate response data based on the response instruction, wherein
the interaction control unit is configured to determine, when a second speaker who speaks after start of voice operation by a first speaker is different from the first speaker, whether or not to accept voice operation by the second speaker.
2. The voice recognition system according to claim 1, wherein the interaction control unit is configured to determine, when the second speaker is different from the first speaker, whether or not to accept the voice operation by the second speaker, based on whether a speaker different from the first speaker is permitted to participate in operation of a task that is executed based on the voice operation started by the first speaker.
3. The voice recognition system according to claim 1, wherein the interaction control unit is configured to determine, when the second speaker is different from the first speaker, whether or not to accept the voice operation by the second speaker, based on whether a task that is executed based on the voice operation started by the first speaker is completed.
4. The voice recognition system according to claim 1, wherein the interaction control unit is configured to determine, when the second speaker is different from the first speaker, whether or not to accept the voice operation by the second speaker depending on a content of a speech by the second speaker.
5. The voice recognition system according to claim 1, wherein the interaction control unit is configured to accept voice operation by a speaker different from the first speaker, based on an instruction of the first speaker.
6. The voice recognition system according to claim 1, wherein when a first task executed based on the voice operation started by the first speaker is in execution, and a speaker different from the first speaker performs voice operation that requests execution of a second task different from the first task, the interaction control unit is configured to accept the voice operation relevant to the second task in parallel to the voice operation relevant to the first task.
PCT/IB2019/000425 2018-06-05 2019-05-28 Voice recognition system WO2019234487A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2018-107851 2018-06-05
JP2018107851A JP7000257B2 (en) 2018-06-05 2018-06-05 Speech recognition system

Publications (2)

Publication Number Publication Date
WO2019234487A1 true WO2019234487A1 (en) 2019-12-12
WO2019234487A8 WO2019234487A8 (en) 2020-02-13

Family

ID=66951980

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2019/000425 WO2019234487A1 (en) 2018-06-05 2019-05-28 Voice recognition system

Country Status (2)

Country Link
JP (1) JP7000257B2 (en)
WO (1) WO2019234487A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120209454A1 (en) * 2011-02-10 2012-08-16 Ford Global Technologies, Llc System and method for controlling a restricted mode in a vehicle
CN107767875A (en) * 2017-10-17 2018-03-06 深圳市沃特沃德股份有限公司 Sound control method, device and terminal device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3261087A1 (en) 2013-09-03 2017-12-27 Panasonic Intellectual Property Corporation of America Voice interaction control method

Also Published As

Publication number Publication date
WO2019234487A8 (en) 2020-02-13
JP2019211635A (en) 2019-12-12
JP7000257B2 (en) 2022-01-19

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19731781

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19731781

Country of ref document: EP

Kind code of ref document: A1