WO2022239142A1 - Speech recognition device and speech recognition method

Speech recognition device and speech recognition method

Info

Publication number
WO2022239142A1
Authority
WO
WIPO (PCT)
Prior art keywords
conversation
speaker
degree
data
unit
Prior art date
Application number
PCT/JP2021/018019
Other languages
English (en)
Japanese (ja)
Inventor
Narumi Hosokawa
Original Assignee
Mitsubishi Electric Corporation
Priority date
Filing date
Publication date
Application filed by Mitsubishi Electric Corporation
Priority to PCT/JP2021/018019
Publication of WO2022239142A1

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/28 - Constructional details of speech recognition systems

Definitions

  • The present disclosure relates to a speech recognition device and a speech recognition method.
  • Some speech recognition devices that recognize a user's utterances start speech recognition only when the user utters a wake-up word (hereinafter referred to as a "conventional speech recognition device").
  • A wake-up word is a word that instructs the device to start speech recognition.
  • A conventional speech recognition device starts speech recognition only when a user utters a wake-up word, thereby avoiding a situation in which speech recognition is erroneously started when the user utters a word other than the wake-up word.
  • However, since the user has to utter a wake-up word every time the conventional speech recognition device starts speech recognition, the user may find this annoying.
  • Patent Document 1 discloses a speech recognition device that starts speech recognition of a user's utterance even if the user does not utter a wake-up word.
  • In the device of Patent Document 1, a response determination unit calculates a likelihood score, which is an index for determining whether or not the user's utterance is directed to the speech recognition device, based on the context of the user's series of utterances. The device also lowers the likelihood score if the ID of the user who spoke last differs from the ID of the user who is speaking now. The response determination unit determines that the user's utterance is directed to the speech recognition device if the likelihood score is greater than or equal to a threshold.
  • The speech recognition device disclosed in Patent Document 1 lowers the likelihood score whenever the ID of the user who spoke last differs from the ID of the user who is speaking now. However, in some cases another user speaks immediately after one user speaks, while in other cases a long time passes before another user speaks, so uniformly lowering the likelihood score is not necessarily an appropriate process. As a result, this speech recognition device has difficulty distinguishing whether a user's utterance is directed to the device or is part of a conversation between two users, and an utterance directed to the device may be misidentified as a conversation between users.
  • The present disclosure has been made to solve the above problems. It is an object of the present disclosure to obtain a speech recognition device and a speech recognition method that can reduce, compared with the speech recognition device disclosed in Patent Document 1, the probability that an utterance directed to the speech recognition device is misidentified as a conversation between users.
  • A speech recognition device according to the present disclosure includes: a speaker identification unit that identifies, based on an image of a space captured by a camera or sound in the space collected by a microphone, a speaker who is the user who is speaking among a plurality of users present in the space; and a response unit that acquires conversation degree data indicating the degree of past conversation between the speaker identified by the speaker identification unit and users other than the speaker among the plurality of users, determines, based on the conversation degree data, whether or not the speaker's utterance is directed to the speech recognition device, and generates response data for the speaker's utterance only when it determines that the utterance is directed to the speech recognition device.
  • FIG. 1 is a configuration diagram showing a speech recognition device 5 according to Embodiment 1.
  • FIG. 2 is a hardware configuration diagram showing hardware of the speech recognition device 5 according to Embodiment 1.
  • FIG. 3 is a hardware configuration diagram of a computer in which the speech recognition device 5 is implemented by software, firmware, or the like.
  • FIG. 4 is a flowchart showing a speech recognition method, which is the processing procedure of the speech recognition device 5 shown in FIG. 1.
  • FIG. 5A is a flowchart showing a processing procedure of a conversation degree update unit 15 in the speech recognition device 5 shown in FIG. 1.
  • FIG. 5B is a flowchart showing a processing procedure of the conversation degree update unit 15 in the speech recognition device 5 shown in FIG. 1.
  • FIG. 6 is a configuration diagram showing a speech recognition device 5 according to Embodiment 2.
  • FIG. 7 is a hardware configuration diagram showing hardware of the speech recognition device 5 according to Embodiment 2.
  • FIG. 8 is a configuration diagram showing a speech recognition device 5 according to Embodiment 3.
  • FIG. 9 is a hardware configuration diagram showing hardware of the speech recognition device 5 according to Embodiment 3.
  • FIG. 1 is a configuration diagram showing a speech recognition device 5 according to Embodiment 1.
  • FIG. 2 is a hardware configuration diagram showing hardware of the speech recognition device 5 according to the first embodiment.
  • Camera 1 is implemented by, for example, an infrared camera, a visible light camera, or an ultraviolet camera. The camera 1 captures an image of the space, and outputs video data representing an image of the space to the speech recognition device 5 .
  • the microphone 2 collects sounds in the space and outputs sound data representing the sounds in the space to the speech recognition device 5 .
  • Embodiment 1 will be described assuming that the space is the passenger compartment of a vehicle; the multiple users present in the space are therefore vehicle occupants. However, this is only an example, and the space may be, for example, a room within a building, in which case the users present in the space are residents of the room, guests, or the like.
  • The in-vehicle sensor 3 is realized by, for example, pressure sensors installed in each of a plurality of seats, an infrared sensor installed in the vehicle, a GPS (Global Positioning System) sensor installed in the vehicle, or a gyro sensor installed in the vehicle. When the in-vehicle sensor 3 is implemented by a plurality of pressure sensors, each pressure sensor that senses the weight of an occupant outputs a sensing signal to the speech recognition device 5. When the in-vehicle sensor 3 is realized by, for example, a GPS sensor, it outputs traveling position data indicating the position where the vehicle is traveling to the speech recognition device 5.
  • the navigation device 4 is an in-vehicle device installed in the vehicle, or a device such as a smart phone brought into the vehicle by the user.
  • the navigation device 4 has, for example, a navigation function of guiding a route to a destination.
  • the navigation device 4 outputs, for example, route data indicating a route to a destination to the voice recognition device 5 .
  • the speech recognition device 5 includes a speaker identification unit 11 , a speaker presence/absence determination unit 14 , a conversation degree update unit 15 and a response unit 18 .
  • the voice recognition device 5 recognizes the voice of an occupant who is a user, and determines whether or not the voice of the occupant is directed to the voice recognition device 5 .
  • the speech recognition device 5 generates response data to the utterance of the speaker only when it determines that the speech is directed to the speech recognition device 5 .
  • the voice recognition device 5 outputs the response data to the vehicle-mounted device 6 and the output device 7, respectively.
  • the in-vehicle device 6 is, for example, a navigation device, an air conditioner device, or an audio device.
  • the vehicle-mounted device 6 operates according to the response data output from the speech recognition device 5 .
  • the output device 7 is, for example, a display, a lighting device, or a speaker.
  • the output device 7 operates according to the response data output from the speech recognition device 5 .
  • The speaker identification unit 11 is implemented by, for example, a speaker identification circuit 31 shown in FIG. 2.
  • the speaker identification unit 11 includes an occupant identification unit 12 and a speaker identification processing unit 13 .
  • The speaker identification unit 11 identifies, from among the plurality of occupants present in the vehicle, the speaker who is the occupant who is speaking, based on the video of the vehicle interior, which is the space captured by the camera 1, or the sound of the vehicle interior collected by the microphone 2.
  • the speaker identification unit 11 outputs speaker data indicating the identified speaker to the speaker presence/absence determination unit 14 .
  • The occupant identification unit 12 acquires a sensing signal from each pressure sensor that senses the weight of an occupant, among the plurality of pressure sensors included in the in-vehicle sensor 3. By acquiring the sensing signal, the occupant identification unit 12 identifies which of the plurality of pressure sensors is outputting the sensing signal, and determines that an occupant is sitting in the seat where the identified pressure sensor is installed. The occupant identification unit 12 also acquires the video data output from the camera 1 and cuts out an image showing the face of each occupant (hereinafter referred to as a "face image") from the region of the vehicle-interior video, indicated by the video data, that includes the seat on which each occupant is seated.
  • the occupant identification unit 12 analyzes each face image to perform personal authentication of each occupant, and outputs identification information of each occupant to the speaker identification processing unit 13 .
  • the occupant identification unit 12 also outputs the face image of each occupant to the speaker identification processing unit 13 .
  • the occupant identification unit 12 may identify the position of the seat on which each occupant sits, based on the video data output from the camera 1, as will be described later.
  • the occupant identification unit 12 performs personal authentication of each occupant based on the video inside the vehicle.
  • However, this is only an example. If all the occupants present in the vehicle are speaking, the occupant identification unit 12 may perform personal authentication of each occupant based on the sound in the vehicle indicated by the sound data output from the microphone 2. That is, the occupant identification unit 12 may extract the voice of each occupant from the sound in the vehicle interior and perform voiceprint authentication on each voice.
  • the sounds in the cabin include the running sound of the vehicle, the sound of cold or warm air discharged from the air conditioner, the noise outside the vehicle, or the sound of music being played by audio equipment.
  • the speaker identification processing unit 13 acquires the identification information of each passenger and the face image of each passenger from the passenger identification unit 12 .
  • the speaker identification processing unit 13 identifies the occupant whose mouth is moving by analyzing each face image.
  • The speaker identification processing unit 13 regards the occupant whose mouth is moving as the speaker, and outputs the speaker's identification information to each of the speaker presence/absence determination unit 14, the conversation degree update processing unit 17, and the response propriety determination unit 19.
  • the speaker identification processing unit 13 identifies the speaker based on the image inside the vehicle.
  • However, this is only an example, and the speaker identification processing unit 13 may identify the speaker based on the sound inside the vehicle collected by the microphone 2. That is, when a microphone 2 is installed at each seat, the speaker identification processing unit 13 may identify as the speaker the occupant sitting in the seat whose microphone 2 collects the loudest voice.
  • the speaker identification processing unit 13 may identify the speaker from the incoming direction of the voice to the microphone 2 . In these cases, the speaker identification processing unit 13 outputs sound data representing the voice of the identified speaker to the speaker presence/absence determination unit 14 .
  • The speaker presence/absence determination unit 14 is realized by, for example, a speaker presence/absence determination circuit 32 shown in FIG. 2. Based on the number of speakers identified by the speaker identification unit 11, the speaker presence/absence determination unit 14 determines whether, among the plurality of passengers, there is a conversation partner who is conversing with the speaker identified by the speaker identification unit 11. That is, when the speaker identification processing unit 13 outputs identification information for a plurality of speakers, the speaker presence/absence determination unit 14 determines that a conversation partner exists. The speaker presence/absence determination unit 14 outputs to the conversation degree update unit 15 a determination result indicating whether or not a conversation partner exists.
  • The conversation degree update unit 15 is realized by, for example, a conversation degree update circuit 33 shown in FIG. 2.
  • the conversation level update unit 15 includes a conversation level data storage unit 16 and a conversation level update processing unit 17 .
  • the conversation degree update unit 15 acquires conversation degree data indicating the past conversation degree of the speaker.
  • In Embodiment 1, the conversation degree update unit 15 acquires the conversation degree data from the internal conversation degree data storage unit 16. However, this is only an example, and the conversation degree update unit 15 may acquire the conversation degree data from outside the speech recognition device 5.
  • The conversation degree update unit 15 updates the acquired conversation degree data so as to increase the degree of conversation when the speaker presence/absence determination unit 14 determines that a conversation partner exists, and updates it so as to lower the degree of conversation when the speaker presence/absence determination unit 14 determines that no conversation partner exists.
  • The degree of conversation is, for example, the frequency of conversations between the speaker and a fellow passenger, the number of conversations, or the conversation time.
  • The conversation degree data storage unit 16 is a storage medium that stores conversation degree data indicating the degree of past conversation between a speaker and passengers other than the speaker among the plurality of passengers present in the vehicle.
  • Each passenger can be a speaker, so conversation degree data is stored for each passenger. That is, if there are, for example, two passengers in the vehicle, the conversation degree data storage unit 16 stores two pieces of conversation degree data, and if there are three passengers, it stores three pieces of conversation degree data.
  • The conversation degree update processing unit 17 acquires, from among the conversation degree data stored in the conversation degree data storage unit 16, the conversation degree data for the speaker indicated by the identification information output from the speaker identification processing unit 13. If the determination result output from the speaker presence/absence determination unit 14 indicates that a conversation partner exists, the conversation degree update processing unit 17 updates the acquired conversation degree data so as to increase the degree of conversation. If the determination result indicates that no conversation partner exists, it updates the acquired conversation degree data so as to lower the degree of conversation. The conversation degree update processing unit 17 then stores the updated conversation degree data in the conversation degree data storage unit 16.
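  • For illustration only, the following is a minimal Python sketch of the bookkeeping just described, i.e., one degree of conversation K kept per passenger and raised or lowered depending on whether a conversation partner exists. All names are hypothetical; the initial value K = 1 and degree change value CH = 0.1 are taken from the examples given later in the text, while the upper and lower limits are assumed values.

```python
# Hypothetical sketch of the conversation degree data storage unit 16 and
# the conversation degree update processing unit 17; not the patent's code.
CH = 0.1                  # degree change value (example value from the text)
K_INIT = 1.0              # initial degree of conversation (example from the text)
K_MIN, K_MAX = 0.1, 10.0  # assumed lower/upper limit values

conversation_degrees: dict[str, float] = {}  # passenger ID -> degree of conversation K

def update_conversation_degree(speaker_id: str, partner_present: bool) -> float:
    """Raise K when a conversation partner exists, lower it otherwise."""
    k = conversation_degrees.get(speaker_id, K_INIT)
    if partner_present:
        k = min(k + CH, K_MAX)  # increase K, capped at the upper limit
    else:
        k = max(k - CH, K_MIN)  # decrease K, floored at the lower limit
    conversation_degrees[speaker_id] = k
    return k
```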
  • The response unit 18 is implemented by, for example, a response circuit 34 shown in FIG. 2.
  • The response unit 18 includes a response propriety determination unit 19, a speech recognition unit 20, and a response data generation unit 21.
  • The response unit 18 acquires conversation degree data indicating the degree of past conversation between the speaker identified by the speaker identification unit 11 and passengers other than the speaker among the plurality of passengers present in the vehicle.
  • The response unit 18 determines, based on the acquired conversation degree data, whether or not the speaker's utterance is directed to the speech recognition device 5.
  • The response unit 18 generates response data for the speaker's utterance only when it determines that the utterance is directed to the speech recognition device 5.
  • the response propriety determination unit 19 acquires conversation degree data for the speaker indicated by the identification information output from the speaker identification processing unit 13 from among the conversation degree data stored in the conversation degree data storage unit 16.
  • The response propriety determination unit 19 calculates a degree of response, which is an index for determining whether or not the speaker's utterance is directed to the speech recognition device 5, based on the video of the vehicle interior, which is the space captured by the camera 1, the sound of the vehicle interior collected by the microphone 2, the traveling position data output from the in-vehicle sensor 3, and the route data output from the navigation device 4.
  • the response propriety determination unit 19 corrects the response degree based on the degree of conversation indicated by the acquired conversation degree data.
  • The corrected degree of response decreases as the degree of conversation indicated by the conversation degree data increases. If the corrected degree of response is equal to or greater than a first threshold, the response propriety determination unit 19 determines that the speaker's utterance is directed to the speech recognition device 5. If the corrected degree of response is less than the first threshold, it determines that the speaker's utterance is not directed to the speech recognition device 5.
  • The first threshold may be stored in the internal memory of the response propriety determination unit 19 or may be given from outside the speech recognition device 5.
  • In the above, the response propriety determination unit 19 determines that the speaker's utterance is directed to the speech recognition device 5 if the corrected degree of response is equal to or greater than the first threshold. However, this is only an example. The response propriety determination unit 19 may instead determine that the speaker's utterance is directed to the speech recognition device 5 if the degree of conversation indicated by the acquired conversation degree data is equal to or less than a second threshold, and that it is not directed to the speech recognition device 5 if the degree of conversation is greater than the second threshold.
  • The second threshold may be stored in the internal memory of the response propriety determination unit 19 or may be given from outside the speech recognition device 5.
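  • To make the two decision rules above concrete, here is a minimal sketch, under the same assumptions as before, of how the corrected degree of response might be compared with the thresholds. The division P′ = P / K follows equation (3) given later in the text; the threshold values themselves are invented for illustration.

```python
def is_directed_at_device(p: float, k: float, first_threshold: float = 0.5) -> bool:
    """Correct the degree of response P by the degree of conversation K
    (equation (3): P' = P / K) and compare it with the first threshold."""
    return (p / k) >= first_threshold

def is_directed_at_device_alt(k: float, second_threshold: float = 2.0) -> bool:
    """Alternative rule: the utterance is treated as directed at the device
    whenever the degree of conversation K is at or below the second threshold."""
    return k <= second_threshold
```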
  • the voice recognition unit 20 performs voice recognition on the sound in the vehicle compartment collected by the microphone 2 and outputs voice recognition result data indicating the voice recognition result to the response data generation unit 21 .
  • The speech recognition result data may be text data or voice data. If the response propriety determination unit 19 determines that the speaker's utterance is directed to the speech recognition device 5, the response data generation unit 21 generates response data for the speech recognition result data output from the speech recognition unit 20. The response data generation unit 21 outputs the response data to the in-vehicle device 6 and the output device 7, respectively.
  • The speech recognition device 5 shown in FIG. 1 includes the speaker identification unit 11, the speaker presence/absence determination unit 14, the conversation degree update unit 15, and the response unit 18 as its components. Any component of the speech recognition device 5 may be distributed to a server device connected to a network, a mobile terminal, or the like. If any component is distributed to a server device or the like, the speech recognition device 5 must be equipped with a transmitter/receiver that transmits the data given to that component to the server device or the like and receives the data output from that component.
  • Here, it is assumed that each of the speaker identification unit 11, the speaker presence/absence determination unit 14, the conversation degree update unit 15, and the response unit 18, which are the components of the speech recognition device 5, is implemented by dedicated hardware as shown in FIG. 2. That is, it is assumed that the speech recognition device 5 is realized by a speaker identification circuit 31, a speaker presence/absence determination circuit 32, a conversation degree update circuit 33, and a response circuit 34.
  • Each of the speaker identification circuit 31, the speaker presence/absence determination circuit 32, the conversation degree update circuit 33, and the response circuit 34 is, for example, a single circuit, a composite circuit, a programmed processor, a parallel-programmed processor, an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or a combination thereof.
  • The components of the speech recognition device 5 are not limited to those realized by dedicated hardware; the speech recognition device 5 may also be realized by software, firmware, or a combination of software and firmware.
  • Software or firmware is stored as a program in a computer's memory.
  • A computer here means hardware that executes the program, and corresponds to, for example, a CPU (Central Processing Unit), a central processing unit, a processing unit, an arithmetic unit, a microprocessor, a microcomputer, a processor, or a DSP (Digital Signal Processor).
  • FIG. 3 is a hardware configuration diagram of a computer when the speech recognition device 5 is implemented by software, firmware, or the like.
  • When the speech recognition device 5 is realized by software, firmware, or the like, a program for causing a computer to execute the processing procedures of the speaker identification unit 11, the speaker presence/absence determination unit 14, the conversation degree update unit 15, and the response unit 18 is stored in the memory 41.
  • the processor 42 of the computer executes the program stored in the memory 41 .
  • FIG. 2 shows an example in which each component of the speech recognition device 5 is realized by dedicated hardware
  • FIG. 3 shows an example in which the speech recognition device 5 is realized by software, firmware, or the like.
  • this is only an example, and some components of the speech recognition device 5 may be implemented by dedicated hardware, and the remaining components may be implemented by software, firmware, or the like.
  • FIG. 4 is a flowchart showing a speech recognition method, which is the processing procedure of the speech recognition device 5 shown in FIG. 1. FIGS. 5A and 5B are flowcharts showing the processing procedure of the conversation degree update unit 15 in the speech recognition device 5 shown in FIG. 1.
  • the camera 1 captures the interior of the vehicle, and outputs image data showing the image of the interior of the vehicle to the occupant identification unit 12, the speaker presence/absence determination unit 14, and the response propriety determination unit 19, respectively.
  • the microphone 2 collects sounds in the vehicle interior and outputs sound data representing the sounds in the vehicle interior to the occupant identification unit 12, the speaker presence/absence determination unit 14, the response appropriateness determination unit 19, and the voice recognition unit 20, respectively.
  • The in-vehicle sensor 3 includes a pressure sensor, an infrared sensor, a GPS sensor, a gyro sensor, or the like, and outputs sensor information indicating its sensing results to the occupant identification unit 12 and the response propriety determination unit 19, respectively.
  • the navigation device 4 outputs setting data indicating the destination, route data indicating the route to the destination, voice guidance information, and the like to the response propriety determination unit 19 .
  • The occupant identification unit 12 acquires a sensing signal from each pressure sensor that senses the weight of an occupant, among the plurality of pressure sensors included in the in-vehicle sensor 3. By acquiring the sensing signal, the occupant identification unit 12 identifies which pressure sensor is outputting the sensing signal and determines that an occupant is sitting in the seat where the identified pressure sensor is installed. If the identified pressure sensor is installed in, for example, the driver's seat, the occupant identification unit 12 determines that an occupant is sitting in the driver's seat; if it is installed in, for example, the front passenger seat, it determines that an occupant is sitting in the front passenger seat.
  • the occupant identification unit 12 acquires video data output from the camera 1 .
  • The occupant identification unit 12 cuts out a face image, which is an image showing the face of each occupant, from the region of the vehicle-interior video, indicated by the video data, that includes the seat on which each occupant is seated.
  • the occupant identification unit 12 analyzes each face image to perform personal authentication of each occupant, and outputs identification information of each occupant to the speaker identification processing unit 13 (step ST1 in FIG. 4).
  • the occupant identification unit 12 also outputs the face image of each occupant to the speaker identification processing unit 13 .
  • In the above, the occupant identification unit 12 performs personal authentication of each occupant based on the video inside the vehicle. However, this is only an example. If all the passengers present in the vehicle are speaking, the occupant identification unit 12 may perform personal authentication of each occupant based on the sound in the vehicle indicated by the sound data output from the microphone 2, that is, by extracting the voice of each occupant from the sound in the vehicle interior and performing voiceprint authentication on each voice.
  • the speaker identification processing unit 13 acquires the identification information of each passenger and the face image of each passenger from the passenger identification unit 12 .
  • the speaker identification processing unit 13 searches for the passenger whose mouth is moving by analyzing each face image, and identifies the passenger whose mouth is moving as the speaker (step ST2 in FIG. 4).
  • In the above, the speaker identification processing unit 13 identifies the speaker based on the video inside the vehicle. However, this is only an example, and the speaker identification processing unit 13 may identify the speaker based on the sound inside the vehicle indicated by the sound data output from the microphone 2. That is, when a microphone 2 is installed at each seat, the speaker identification processing unit 13 may identify as the speaker the occupant sitting in the seat whose microphone 2 collects the loudest voice.
  • After identifying the speaker, the speaker identification processing unit 13 outputs the speaker's identification information to the speaker presence/absence determination unit 14, the conversation degree update processing unit 17, and the response propriety determination unit 19, respectively. If the speaker identification processing unit 13 identifies a plurality of speakers, it outputs the identification information of each speaker to those units.
  • If a plurality of pieces of identification information are output from the speaker identification processing unit 13 as speaker identification information (step ST3 in FIG. 4: YES), the speaker presence/absence determination unit 14 determines that a conversation partner who is conversing with the speaker exists (step ST4 in FIG. 4). If only one piece of identification information is output from the speaker identification processing unit 13 (step ST3 in FIG. 4: NO), the speaker presence/absence determination unit 14 determines that no conversation partner exists (step ST5 in FIG. 4). The speaker presence/absence determination unit 14 outputs to the conversation degree update processing unit 17 a determination result indicating whether or not a conversation partner exists.
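  • Steps ST3 to ST5 reduce to checking how many speakers were identified; a one-line sketch (hypothetical names):

```python
def conversation_partner_exists(identified_speaker_ids: list[str]) -> bool:
    """Steps ST3-ST5: a conversation partner exists exactly when the speaker
    identification processing unit reports two or more speakers."""
    return len(identified_speaker_ids) >= 2
```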
  • the conversation degree update processing unit 17 acquires speaker identification information from the speaker identification processing unit 13 and acquires a determination result indicating whether or not a speaker exists from the speaker presence/absence determination unit 14 .
  • The conversation degree update processing unit 17 acquires the past conversation degree data for the speaker indicated by the acquired identification information from the past conversation degree data stored in the conversation degree data storage unit 16. If no past conversation degree data for that speaker is stored in the conversation degree data storage unit 16, the conversation degree update processing unit 17 initializes the degree of conversation K indicated by the speaker's conversation degree data.
  • the initial value of the degree of conversation K is 1, for example.
  • The conversation degree update processing unit 17 updates the acquired conversation degree data so as to increase the degree of conversation K if the acquired determination result indicates that a conversation partner exists (step ST6 in FIG. 4), and updates it so as to lower the degree of conversation K if the determination result indicates that no conversation partner exists (step ST7 in FIG. 4).
  • the conversation level update processing unit 17 stores the updated conversation level data in the conversation level data storage unit 16 .
  • When the conversation degree update processing unit 17 acquires the speaker's identification information from the speaker identification processing unit 13, it causes two counters (1) and (2) (not shown) to start counting (step ST21 in FIG. 5A, step ST31 in FIG. 5B). Counters (1) and (2) may be included in the conversation degree update processing unit 17 or may be provided outside it.
  • The internal memory of the conversation degree update processing unit 17 stores a first set value and a second set value, where the first set value is smaller than the second set value.
  • If the conversation degree update processing unit 17 receives a determination result indicating that a conversation partner exists before the count value of counter (1) reaches the first set value (step ST22 in FIG. 5A: YES), it updates the conversation degree data so as to increase the degree of conversation K (step ST23 in FIG. 5A). That is, the conversation degree update processing unit 17 increases the degree of conversation K by adding a degree change value CH to the current degree of conversation K, as shown in the following equation (1).
  • Degree of conversation K after update = current degree of conversation K + degree change value CH (1)
  • The degree change value CH is a preset value, such as 0.1. The degree change value CH may be stored in the internal memory of the conversation degree update processing unit 17 or may be given from outside the speech recognition device 5.
  • If the conversation degree update processing unit 17 does not receive a determination result indicating that a conversation partner exists before the count value of counter (1) reaches the first set value (step ST22 in FIG. 5A: NO), it does not update the conversation degree data.
  • After the above processing, the conversation degree update processing unit 17 resets the count value of counter (1) to zero (step ST24 in FIG. 5A).
  • If the conversation degree update processing unit 17 receives a determination result indicating that a conversation partner exists before the count value of counter (2) reaches the second set value (step ST32 in FIG. 5B: NO), it does not update the conversation degree data.
  • The conversation degree update processing unit 17 then resets the count value of counter (2) to zero (step ST34 in FIG. 5B).
  • If the count value of counter (2) reaches the second set value without such a determination result being received (step ST32 in FIG. 5B: YES), the conversation degree update processing unit 17 updates the conversation degree data so as to lower the degree of conversation K (step ST33 in FIG. 5B).
  • Even when a determination result indicating that a conversation partner exists is received in time, if the current degree of conversation K has already reached an upper limit value, the conversation degree update processing unit 17 may leave the conversation degree data unchanged. This prevents the degree of conversation K from growing so large, when a conversation partner exists for a long period of time, that the post-correction degree of response P′ remains below the first threshold.
  • Likewise, even when no such determination result is received in time, if the current degree of conversation K has already reached a lower limit value, the conversation degree update processing unit 17 may leave the conversation degree data unchanged.
  • Each of the upper limit value and the lower limit value may be stored in the internal memory of the conversation degree update processing unit 17 or may be given from outside the speech recognition device 5.
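  • The counter logic of FIGS. 5A and 5B can be pictured as two elapsed-time windows: K is raised only when a conversation partner is observed within the first window, and lowered only when no partner is observed for the whole second window. The sketch below is an assumed rendering, not the patent's implementation; the set values are invented, subject only to the stated constraint that the first set value is smaller than the second.

```python
import time

CH = 0.1                  # degree change value (example value from the text)
K_MIN, K_MAX = 0.1, 10.0  # assumed lower/upper limit values

class ConversationDegreeUpdater:
    """Hypothetical rendering of counters (1) and (2) from FIGS. 5A and 5B."""

    def __init__(self, first_set_value: float = 5.0, second_set_value: float = 30.0):
        self.first_set_value = first_set_value    # window for raising K (seconds)
        self.second_set_value = second_set_value  # window for lowering K (seconds)
        self.start = time.monotonic()             # counters (1) and (2) start together

    def on_determination(self, k: float, partner_present: bool) -> float:
        """Apply one determination result and return the (possibly updated) K."""
        elapsed = time.monotonic() - self.start
        if partner_present and elapsed < self.first_set_value:
            k = min(k + CH, K_MAX)         # raise K, respecting the upper limit
            self.start = time.monotonic()  # reset counter (1)
        elif not partner_present and elapsed >= self.second_set_value:
            k = max(k - CH, K_MIN)         # lower K, respecting the lower limit
            self.start = time.monotonic()  # reset counter (2)
        return k
```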
  • the response propriety determination unit 19 acquires speaker identification information from the speaker identification processing unit 13 .
  • The response propriety determination unit 19 acquires the conversation degree data of the speaker indicated by the acquired identification information from the conversation degree data stored in the conversation degree data storage unit 16.
  • In Embodiment 1, the response propriety determination unit 19 acquires the conversation degree data after it has been updated by the conversation degree update processing unit 17.
  • However, this is only an example, and the response propriety determination unit 19 may instead acquire the speaker's conversation degree data from before the update by the conversation degree update processing unit 17.
  • The response propriety determination unit 19 calculates a degree of response P, which is an index for determining whether or not the speaker's utterance is directed to the speech recognition device 5, based on the video of the vehicle interior, which is the space captured by the camera 1, the sound of the vehicle interior collected by the microphone 2, the traveling position data output from the in-vehicle sensor 3, the voice guidance information output from the navigation device 4, and the like (step ST8 in FIG. 4). The response propriety determination unit 19 then corrects the degree of response P by dividing it by the degree of conversation K indicated by the conversation degree data, as shown in the following equation (3) (step ST9 in FIG. 4).
  • P′ = P / K (3)
  • P′ is the degree of response after correction, and it decreases as the degree of conversation K increases.
  • the calculation process itself of the degree of response P may be any calculation process, for example, the calculation process disclosed in the above-mentioned Patent Document 1.
  • In Patent Document 1, a likelihood score corresponding to the degree of response P is calculated based on the context of a series of the user's utterances.
  • For example, the response propriety determination unit 19 calculates a large degree of response P when sound is collected by the microphone 2 within a certain period of time after the voice guidance information is output from the navigation device 4, and calculates a smaller degree of response P otherwise.
  • Also, the response propriety determination unit 19 calculates a small degree of response P when sounds are continuously collected by the microphone 2, and calculates a larger degree of response P otherwise.
  • The response propriety determination unit 19 compares the corrected degree of response P′ with the first threshold. If the corrected degree of response P′ is equal to or greater than the first threshold (step ST10 in FIG. 4: YES), the response propriety determination unit 19 determines that the speaker's utterance is directed to the speech recognition device 5 (step ST11 in FIG. 4). If the corrected degree of response P′ is less than the first threshold (step ST10 in FIG. 4: NO), it determines that the speaker's utterance is not directed to the speech recognition device 5 (step ST12 in FIG. 4).
  • In the above, the response propriety determination unit 19 determines that the speaker's utterance is directed to the speech recognition device 5 if the corrected degree of response P′ is equal to or greater than the first threshold. However, this is only an example. The response propriety determination unit 19 may instead determine that the speaker's utterance is directed to the speech recognition device 5 if the degree of conversation K indicated by the acquired conversation degree data is equal to or less than the second threshold, and that it is not directed to the speech recognition device 5 if the degree of conversation K is greater than the second threshold.
  • the voice recognition unit 20 acquires sound data output from the microphone 2 .
  • The speech recognition unit 20 performs speech recognition on the sound indicated by the acquired sound data and outputs speech recognition result data indicating the recognition result to the response data generation unit 21. If the response propriety determination unit 19 determines that the speaker's utterance is directed to the speech recognition device 5, the response data generation unit 21 generates response data for the speech recognition result data output from the speech recognition unit 20 (step ST13 in FIG. 4). If the response propriety determination unit 19 determines that the speaker's utterance is not directed to the speech recognition device 5, the response data generation unit 21 does not generate response data.
  • the processing itself for generating response data for speech recognition result data is a known technique, and detailed description thereof will be omitted.
  • If the speech recognition result data is, for example, data indicating "cold", the response data generation unit 21 generates response data indicating a response such as "raise the set temperature by 1 degree" for the air conditioner, which is the in-vehicle device 6.
  • If the speech recognition result data is, for example, data indicating "the volume is low", the response data generation unit 21 generates response data indicating a response such as "raise the playback volume" for the audio device, which is the in-vehicle device 6.
  • If the speech recognition result data is, for example, data indicating a destination, the response data generation unit 21 generates response data indicating a response such as "set the destination to XX coast" for the navigation device, which is the in-vehicle device 6.
  • the response data generator 21 outputs the response data to the vehicle-mounted device 6 and the output device 7, respectively.
  • the vehicle-mounted device 6 acquires the response data output from the response data generator 21 .
  • The in-vehicle device 6 operates according to the response data. If the in-vehicle device 6 is an air conditioner and the response data indicates, for example, a response of "raise the set temperature by 1 degree", the air conditioner operates to raise the set temperature by 1 degree. If the in-vehicle device 6 is an audio device and the response data indicates, for example, a response of "raise the playback volume", the audio device raises the playback volume.
  • If the in-vehicle device 6 is a navigation device and the response data indicates, for example, a response of "set the destination to XX coast", the navigation device operates to set the destination to XX coast.
  • the output device 7 outputs the content of the response indicated by the response data output from the response data generator 21 . If the output device 7 is, for example, a display, the display, which is the output device 7, displays the content of the response indicated by the response data. If the output device 7 is, for example, a lighting device, the lighting device that is the output device 7 changes the color of the lighting so that it can be seen that the response data has been output from the speech recognition device 5 . If the output device 7 is, for example, a speaker, the speaker, which is the output device 7, outputs the content of the response indicated by the response data.
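  • As a toy illustration of the examples above, the response data generation unit 21's behavior might be sketched as a mapping from recognized text to response data. The mapping and data format are hypothetical; the patent treats this generation step as a known technique.

```python
# Hypothetical mapping from speech recognition results to response data,
# mirroring the examples in the text.
def generate_response_data(recognized_text: str) -> dict | None:
    if "cold" in recognized_text:
        return {"device": "air_conditioner", "action": "raise_set_temperature", "amount": 1}
    if "volume is low" in recognized_text:
        return {"device": "audio", "action": "raise_playback_volume"}
    return None  # no response data when no rule matches
```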
  • The speaker presence/absence determination unit 14 may analyze the motion of users other than the speaker based on the video of the vehicle interior indicated by the video data output from the camera 1, and may determine, based on the motion analysis result, whether or not a user different from the speaker is conversing with the speaker.
  • the determination processing of the speaker presence/absence determining unit 14 based on the motion analysis result will be specifically described below.
  • the speaker presence/absence determination unit 14 acquires speaker identification information from the speaker identification processing unit 13 and acquires video data output from the camera 1 .
  • the video data is video data temporally later than the video data output from the camera 1 to the occupant identification unit 12 .
  • The speaker presence/absence determination unit 14 cuts out the face images of users other than the speaker indicated by the identification information from the vehicle-interior video indicated by the video data, and analyzes each face image to identify users whose mouths are moving. If there is a user whose mouth is moving, the speaker presence/absence determination unit 14 determines that a conversation partner conversing with the speaker exists; if there is no such user, it determines that no conversation partner exists.
  • In the above, the speaker presence/absence determination unit 14 identifies users whose mouths are moving by analyzing the face images of users other than the speaker. However, this is only an example, and the speaker presence/absence determination unit 14 may analyze each face image to identify a user who is nodding, shaking his or her head, looking at the speaker, or the like. If such a user exists, it determines that a conversation partner conversing with the speaker exists; if not, it determines that no conversation partner exists.
  • In the above, the response propriety determination unit 19 corrects the degree of response P based on the degree of conversation K indicated by the conversation degree data, as shown in equation (3). However, this is only an example, and the response propriety determination unit 19 may correct the degree of response P by subtracting the degree of conversation K indicated by the conversation degree data from the degree of response P, as shown in the following equation (4).
  • P′ = P − K (4)
  • When equation (4) is used, the conversation degree update processing unit 17 sets, for example, 0 as the initial value of the degree of conversation K indicated by the speaker's conversation degree data.
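  • A short sketch contrasting the two corrections, equation (3) (division, K initialized to 1) and equation (4) (subtraction, K initialized to 0):

```python
def corrected_response_by_division(p: float, k: float) -> float:
    """Equation (3): P' = P / K, with K initialized to 1."""
    return p / k

def corrected_response_by_subtraction(p: float, k: float) -> float:
    """Equation (4): P' = P - K, with K initialized to 0."""
    return p - k
```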
  • In the above, each time an occupant speaks, the occupant identification unit 12 performs personal authentication of each occupant and then the speaker identification processing unit 13 identifies the speaker.
  • However, this is only an example. If there is no change in the seat of any occupant, the speaker identification processing unit 13 may identify the speaker without the occupant identification unit 12 repeating the personal authentication of each occupant every time an occupant speaks.
  • That is, the occupant identification unit 12 may identify the position of the seat on which each occupant is seated based on the video data output from the camera 1, and perform personal authentication of each occupant again only when there is a change in the seat of any occupant.
  • An example of a mode in which there is a change in the seat of one of the occupants is when the occupant gets in and out of the vehicle.
  • As described above, according to Embodiment 1, the speech recognition device 5 is configured to include the speaker identification unit 11 that identifies the speaker who is the user speaking among a plurality of users present in the space, and the response unit 18 that acquires conversation degree data indicating the degree of past conversation between the speaker identified by the speaker identification unit 11 and users other than the speaker among the plurality of users, determines, based on the conversation degree data, whether or not the speaker's utterance is directed to the speech recognition device 5, and generates response data for the speaker's utterance only when it determines that the utterance is directed to the speech recognition device 5. Therefore, compared with the speech recognition device disclosed in Patent Document 1, the speech recognition device 5 can reduce the probability that an utterance directed to the speech recognition device 5 is misrecognized as a conversation between users.
  • Embodiment 2. In Embodiment 2, a speech recognition device 5 having a travel purpose prediction unit 22 that predicts the travel purpose of the vehicle from the destination set in the navigation device 4 or from the travel route of the vehicle will be described.
  • FIG. 6 is a configuration diagram showing a speech recognition device 5 according to Embodiment 2.
  • FIG. 7 is a hardware configuration diagram showing hardware of the speech recognition device 5 according to the second embodiment.
  • In FIGS. 6 and 7, the same reference numerals as those in FIGS. 1 and 2 denote the same or corresponding parts, so description thereof is omitted.
  • The travel purpose prediction unit 22 is realized by, for example, a travel purpose prediction circuit 35 shown in FIG. 7.
  • the travel purpose prediction unit 22 acquires destination setting data indicating the destination set in the navigation device 4 or travel route data indicating the travel route of the vehicle recorded in the navigation device 4 .
  • the travel purpose prediction unit 22 predicts the travel purpose of the vehicle from the destination indicated by the destination setting data or the travel route indicated by the travel route data.
  • the travel purpose prediction unit 22 outputs travel purpose data indicating the predicted travel purpose of the vehicle to the conversation degree update unit 23 .
  • In the above, the travel purpose prediction unit 22 acquires the travel route data from the navigation device 4. However, this is only an example. If the in-vehicle sensor 3 is realized by, for example, a GPS sensor, the travel purpose prediction unit 22 may acquire GPS data output from the GPS sensor and identify the travel route of the vehicle from the GPS data.
  • If the in-vehicle sensor 3 is realized by, for example, a gyro sensor, the travel purpose prediction unit 22 may acquire angular velocity data output from the gyro sensor and identify the travel route of the vehicle from the angular velocity data.
  • The conversation degree update unit 23 is implemented by, for example, a conversation degree update circuit 36 shown in FIG. 7.
  • the conversation level update unit 23 includes a conversation level data storage unit 24 and a conversation level update processing unit 25 .
  • the conversation level update unit 23 acquires speaker identification information from the speaker identification unit 11 and acquires travel purpose data from the travel purpose prediction unit 22 .
  • the conversation degree update unit 23 acquires the conversation degree data for the driving purpose indicated by the driving purpose data from among a plurality of conversation degree data for each driving purpose indicating the past conversation degree of the speaker indicated by the identification information.
  • In Embodiment 2, the conversation degree update unit 23 acquires the conversation degree data from the internal conversation degree data storage unit 24. However, this is only an example, and the conversation degree update unit 23 may acquire the conversation degree data from outside the speech recognition device 5.
  • The conversation degree update unit 23 updates the acquired conversation degree data so as to increase the degree of conversation when the speaker presence/absence determination unit 14 determines that a conversation partner exists, and updates it so as to lower the degree of conversation when the speaker presence/absence determination unit 14 determines that no conversation partner exists.
  • the conversation degree update processing unit 25 acquires speaker identification information from the speaker identification processing unit 13 and acquires travel purpose data from the travel purpose prediction unit 22 .
  • The conversation degree update processing unit 25 acquires, from among the plurality of per-purpose conversation degree data stored in the conversation degree data storage unit 24, the conversation degree data of the speaker indicated by the identification information output from the speaker identification processing unit 13 for the travel purpose indicated by the travel purpose data. If the determination result output from the speaker presence/absence determination unit 14 indicates that a conversation partner exists, the conversation degree update processing unit 25 updates the acquired conversation degree data so as to increase the degree of conversation.
  • If the determination result output from the speaker presence/absence determination unit 14 indicates that no conversation partner exists, the conversation degree update processing unit 25 updates the acquired conversation degree data so as to lower the degree of conversation.
  • the conversation level update processing unit 25 stores the updated conversation level data in the conversation level data storage unit 24 .
  • Here, it is assumed that each of the speaker identification unit 11, the speaker presence/absence determination unit 14, the travel purpose prediction unit 22, the conversation degree update unit 23, and the response unit 18, which are the components of the speech recognition device 5, is implemented by dedicated hardware as shown in FIG. 7. That is, it is assumed that the speech recognition device 5 is implemented by a speaker identification circuit 31, a speaker presence/absence determination circuit 32, a travel purpose prediction circuit 35, a conversation degree update circuit 36, and a response circuit 34.
  • Each of the speaker identification circuit 31, the speaker presence/absence determination circuit 32, the driving purpose prediction circuit 35, the conversation degree update circuit 36, and the response circuit 34 is, for example, a single circuit, a composite circuit, a programmed processor, or a parallel programmed circuit. Processors, ASICs, FPGAs, or combinations thereof are applicable.
  • The components of the speech recognition device 5 are not limited to those realized by dedicated hardware; the speech recognition device 5 may also be realized by software, firmware, or a combination of software and firmware.
  • When the speech recognition device 5 is realized by software or firmware, a program for causing a computer to execute the processing procedures of the speaker identification unit 11, the speaker presence/absence determination unit 14, the travel purpose prediction unit 22, the conversation degree update unit 23, and the response unit 18 is stored in the memory 41 shown in FIG. 3.
  • the processor 42 shown in FIG. 3 executes the program stored in the memory 41 .
  • FIG. 7 shows an example in which each component of the speech recognition device 5 is realized by dedicated hardware
  • FIG. 3 shows an example in which the speech recognition device 5 is realized by software, firmware, or the like.
  • this is only an example, and some components of the speech recognition device 5 may be implemented by dedicated hardware, and the remaining components may be implemented by software, firmware, or the like.
  • the navigation device 4 outputs destination setting data indicating the destination to the travel purpose prediction unit 22 .
  • the travel purpose prediction unit 22 acquires destination setting data from the navigation device 4 .
  • the travel purpose prediction unit 22 predicts the travel purpose of the vehicle from the destination indicated by the destination setting data. If the destination is an amusement park or a leisure facility such as a ball game ground, the travel purpose prediction unit 22 predicts that the travel purpose of the vehicle is leisure. If the destination is an office building or a business facility such as a factory, the travel purpose prediction unit 22 predicts that the travel purpose of the vehicle is business. If the destination is a shopping facility such as a department store or a supermarket, the travel purpose prediction unit 22 predicts that the travel purpose of the vehicle is shopping.
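  • A minimal rule-based sketch of the destination-to-purpose mapping just described (the category names are assumptions):

```python
# Hypothetical rule base for the travel purpose prediction unit 22:
# leisure facility -> leisure, business facility -> business,
# shopping facility -> shopping, as in the examples above.
DESTINATION_CATEGORY_TO_PURPOSE = {
    "amusement_park": "leisure",
    "ball_game_ground": "leisure",
    "office_building": "business",
    "factory": "business",
    "department_store": "shopping",
    "supermarket": "shopping",
}

def predict_travel_purpose(destination_category: str) -> str:
    return DESTINATION_CATEGORY_TO_PURPOSE.get(destination_category, "unknown")
```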
  • the travel purpose prediction unit 22 outputs travel purpose data indicating the travel purpose of the vehicle to the conversation level update unit 23 .
  • the travel purpose prediction unit 22 acquires travel route data indicating the travel route of the vehicle from the navigation device 4 .
  • the travel purpose prediction unit 22 may acquire GPS data from a GPS sensor instead of acquiring the travel route data from the navigation device 4, and may identify the travel route of the vehicle from the GPS data.
  • the travel purpose prediction unit 22 may acquire angular velocity data from the gyro sensor and identify the travel route of the vehicle from the angular velocity data.
  • the travel purpose prediction unit 22 supplies the travel route data to a learning model (not shown), and acquires travel purpose data indicating the travel purpose of the vehicle from the learning model.
  • the travel purpose prediction unit 22 outputs the travel purpose data to the conversation degree update unit 23 .
  • the learning model is trained by machine learning using travel route data indicating travel routes of the vehicle and teacher data indicating the corresponding travel purposes of the vehicle.
  • the learned learning model outputs travel purpose data indicating the travel purpose of the vehicle when given travel route data.
  • the travel purpose prediction unit 22 predicts the travel purpose of the vehicle using a learning model. However, this is only an example, and the travel purpose prediction unit 22 may predict the travel purpose of the vehicle using, for example, a rule base.
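  • As one hedged sketch of the learning-model variant described above, a small classifier could map route-derived features to a travel purpose. The feature choice (route length, weekday, departure hour), the toy teacher data, and the use of scikit-learn are all illustrative assumptions; the disclosure only states that a learned model outputs travel purpose data when given travel route data:

    # Minimal sketch of learning-model-based travel purpose prediction.
    # Requires scikit-learn; the features and labels below are toy placeholders.
    from sklearn.tree import DecisionTreeClassifier

    # Teacher data: [route_length_km, weekday (0 = Monday), departure_hour].
    X_train = [[5.0, 0, 8], [7.2, 2, 9], [30.5, 5, 10], [3.1, 6, 14]]
    y_train = ["business", "business", "leisure", "shopping"]

    model = DecisionTreeClassifier().fit(X_train, y_train)

    def predict_purpose_from_route(route_features):
        # route_features would be derived from travel route data supplied by
        # the navigation device 4, a GPS sensor, or a gyro sensor.
        return model.predict([route_features])[0]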
  • the conversation degree update processing unit 25 acquires speaker identification information from the speaker identification processing unit 13 and acquires travel purpose data from the travel purpose prediction unit 22 .
  • the conversation degree update processing unit 25 acquires, from among the plurality of conversation degree data for each travel purpose stored in the conversation degree data storage unit 24, the conversation degree data of the speaker indicated by the identification information and for the travel purpose indicated by the travel purpose data.
  • the conversation degree update processing unit 25 acquires a determination result indicating whether or not there is a speaker who is having a conversation with the speaker from the speaker presence/absence determination unit 14 .
  • the conversation degree update processing unit 25 updates the acquired conversation degree data so as to increase the degree K of conversation if the determination result indicates that there is a speaker.
  • the conversation level update processing unit 25 updates the acquired conversation level data so as to lower the conversation level K if the determination result indicates that there is no speaker.
  • the update processing of the conversation degree data by the conversation degree update processing unit 25 is the same as the update processing of the conversation degree data by the conversation degree update processing unit 17 shown in FIG. 1, so a specific description of the update processing is omitted.
  • the conversation level update processing unit 25 stores the updated conversation level data in the conversation level data storage unit 24 .
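  • The per-travel-purpose update can be pictured with the following minimal sketch; the step sizes, the [0.0, 1.0] clamping range, and the neutral initial value are assumptions, since the disclosure only states that the degree of conversation K is raised when a talker is present and lowered when no talker is present:

    # Hypothetical store of conversation degree data keyed by speaker and
    # travel purpose, with a simple additive update.
    conversation_degree = {}  # (speaker_id, travel_purpose) -> degree K

    def update_degree(speaker_id, travel_purpose, talker_present,
                      step_up=0.1, step_down=0.1):
        key = (speaker_id, travel_purpose)
        k = conversation_degree.get(key, 0.5)  # assumed neutral start value
        k = k + step_up if talker_present else k - step_down
        conversation_degree[key] = min(1.0, max(0.0, k))  # clamp to [0, 1]
        return conversation_degree[key]

  • For example, update_degree("C1", "leisure", talker_present=True) raises the stored degree for speaker C1 under the leisure purpose.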
  • as described above, in the second embodiment, the space is the passenger compartment of the vehicle, and the plurality of users present in the space are the plurality of passengers in the vehicle. The destination of the vehicle is set in the navigation device 4.
  • the speech recognition device 5 shown in FIG. 6 is configured to include the travel purpose prediction unit 22, which predicts the travel purpose of the vehicle from the destination or from the travel route of the vehicle. The conversation degree update unit 23 of the speech recognition device 5 acquires, from among a plurality of conversation degree data for each travel purpose indicating the past degree of conversation of the speaker, the conversation degree data for the travel purpose predicted by the travel purpose prediction unit 22.
  • if the speaker presence/absence determination unit 14 determines that a talker exists, the conversation degree update unit 23 updates the acquired conversation degree data so as to increase the degree of conversation, and if the speaker presence/absence determination unit 14 determines that no talker exists, it updates the acquired conversation degree data so as to lower the degree of conversation. Therefore, compared with the speech recognition device 5 shown in FIG. 1, the speech recognition device 5 shown in FIG. 6 can further reduce the probability that an utterance directed to the speech recognition device 5 is misidentified as a conversation between users.
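  • Although the response unit 18 is described elsewhere, the way the degree of conversation K could gate the response can be sketched as follows; the threshold value and the comparison direction are assumptions for illustration (a high past degree of conversation suggests the speaker is talking to another passenger rather than to the device):

    # Hypothetical gating of the response on the degree of conversation K.
    THRESHOLD = 0.7  # assumed value; the disclosure does not fix a number

    def utterance_directed_to_device(degree_k: float) -> bool:
        # Respond only when the utterance is judged to be directed to the
        # speech recognition device rather than to another passenger.
        return degree_k < THRESHOLD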
  • Embodiment 3. In the third embodiment, a speech recognition device 5 will be described that includes a conversation degree update unit 26 which acquires, from among a plurality of conversation degree data for each seat position indicating the past degree of conversation of the speaker, the conversation degree data for the seat position identified by the speaker identification unit 11.
  • FIG. 8 is a configuration diagram showing a speech recognition device 5 according to Embodiment 3.
  • FIG. 9 is a hardware configuration diagram showing hardware of the speech recognition device 5 according to the third embodiment.
  • in FIG. 8 and FIG. 9, the same reference numerals as those in FIG. 1 and FIG. 2 denote the same or corresponding parts, and therefore description thereof is omitted.
  • the conversation degree update unit 26 is implemented by, for example, the conversation degree update circuit 37 shown in FIG. 9.
  • the conversation degree update unit 26 includes a conversation degree data storage unit 27 and a conversation degree update processing unit 28.
  • the conversation degree update unit 26 acquires, from the speaker identification unit 11, seat position data indicating the positions of the seats on which the respective passengers are seated, and speaker identification information.
  • the conversation degree update unit 26 acquires, from among a plurality of conversation degree data for each seat position indicating the past degree of conversation of the speaker indicated by the identification information, the conversation degree data for the seat position indicated by the seat position data.
  • the conversation degree update unit 26 acquires the conversation degree data from the internal conversation degree data storage unit 27.
  • however, this is only an example, and the conversation degree update unit 26 may acquire the conversation degree data from outside the speech recognition device 5.
  • if the speaker presence/absence determination unit 14 determines that a talker exists, the conversation degree update unit 26 updates the acquired conversation degree data so as to increase the degree of conversation. If the speaker presence/absence determination unit 14 determines that no talker exists, the conversation degree update unit 26 updates the acquired conversation degree data so as to lower the degree of conversation.
  • the conversation degree data storage unit 27 is a storage medium that stores a plurality of conversation degree data for each seat position. When there are a plurality of passengers in the vehicle, each passenger can become a speaker; therefore, a plurality of conversation degree data for each seat position are stored for each passenger. Assume that the number of occupants present in the passenger compartment is, for example, three, and that the three occupants are C1, C2, and C3.
  • the conversation degree data storage unit 27 stores, as the conversation degree data for the passenger C1, the conversation degree data relating to the pattern P1 and the conversation degree data relating to the pattern P2.
  • the conversation degree data storage unit 27 stores, as the conversation degree data for the passenger C2, the conversation degree data relating to the pattern P1, the conversation degree data relating to the pattern P2, the conversation degree data relating to the pattern P3, and the conversation degree data relating to the pattern P4.
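  • The storage layout just described can be pictured as the following sketch, in which each passenger's conversation degree data is held per seat-position pattern; the numeric values are placeholders, not values from this disclosure:

    # Hypothetical contents of the conversation degree data storage unit 27.
    conversation_degree_store = {
        "C1": {"P1": 0.4, "P2": 0.6},
        "C2": {"P1": 0.3, "P2": 0.5, "P3": 0.7, "P4": 0.2},
    }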
  • the conversation degree update processing unit 28 acquires seat position data indicating the position of the seat where each passenger is seated from the occupant identification unit 12 and acquires speaker identification information from the speaker identification processing unit 13 .
  • the conversation degree update processing unit 28 acquires, from among the plurality of conversation degree data for each seat position stored in the conversation degree data storage unit 27, the conversation degree data of the speaker indicated by the identification information and for the seat position indicated by the seat position data. If the determination result output from the speaker presence/absence determination unit 14 indicates that a talker is present, the conversation degree update processing unit 28 updates the acquired conversation degree data so as to increase the degree of conversation.
  • if the determination result indicates that no talker is present, the conversation degree update processing unit 28 updates the acquired conversation degree data so as to lower the degree of conversation.
  • the conversation level update processing unit 28 causes the conversation level data storage unit 27 to store the updated conversation level data.
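  • The lookup-and-update cycle performed by the conversation degree update processing unit 28 can be sketched as follows; the step size and the clamping range are assumptions, as before:

    # Minimal sketch: look up the speaker's degree data for the seat-position
    # pattern indicated by the seat position data, then raise or lower it.
    def update_seat_pattern_degree(store, speaker_id, seat_pattern,
                                   talker_present, step=0.1):
        k = store[speaker_id][seat_pattern]
        k = k + step if talker_present else k - step
        store[speaker_id][seat_pattern] = min(1.0, max(0.0, k))
        return store[speaker_id][seat_pattern]

    store = {"C1": {"P1": 0.4, "P2": 0.6}}  # placeholder degree data
    update_seat_pattern_degree(store, "C1", "P1", talker_present=True)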
  • the conversation degree update unit 26 may also be applied to the speech recognition device 5 shown in FIG. 6.
  • in this case, the conversation degree data storage unit 27 stores a plurality of conversation degree data for each combination of travel purpose and seat position.
  • the conversation degree update processing unit 28 then acquires, from among the plurality of conversation degree data stored in the conversation degree data storage unit 27, the conversation degree data of the speaker indicated by the identification information output from the speaker identification processing unit 13, for the travel purpose indicated by the travel purpose data and for the seat position indicated by the seat position data.
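  • When the travel purpose and the seat position are combined as just described, one natural (assumed) layout keys the degree data by a (speaker, travel purpose, seat-position pattern) triple:

    # Hypothetical composite-key store; the entries are placeholders.
    degree_by_purpose_and_seat = {
        ("C1", "leisure", "P1"): 0.5,
        ("C1", "business", "P1"): 0.2,
        ("C2", "shopping", "P3"): 0.7,
    }

    def lookup_degree(speaker_id, purpose, seat_pattern, default=0.5):
        # The default for unseen combinations is an assumption.
        return degree_by_purpose_and_seat.get(
            (speaker_id, purpose, seat_pattern), default)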
  • each of the speaker identification unit 11, the speaker presence/absence determination unit 14, the conversation degree update unit 26, and the response unit 18, which are the components of the speech recognition device 5, is assumed to be realized by dedicated hardware as shown in FIG. 9. That is, it is assumed that the speech recognition device 5 is realized by the speaker identification circuit 31, the speaker presence/absence determination circuit 32, the conversation degree update circuit 37, and the response circuit 34.
  • Each of the speaker identification circuit 31, the speaker presence/absence determination circuit 32, the conversation degree update circuit 37, and the response circuit 34 is, for example, a single circuit, a composite circuit, a programmed processor, a parallel-programmed processor, an ASIC, an FPGA, or a combination thereof.
  • the components of the speech recognition device 5 are not limited to those realized by dedicated hardware; the speech recognition device 5 may also be realized by software, firmware, or a combination of software and firmware.
  • when the speech recognition device 5 is realized by software, firmware, or the like, a program for causing a computer to execute the respective processing procedures in the speaker identification unit 11, the speaker presence/absence determination unit 14, the conversation degree update unit 26, and the response unit 18 is stored in the memory 41 shown in FIG. 3.
  • the processor 42 shown in FIG. 3 executes the program stored in the memory 41.
  • FIG. 9 shows an example in which each component of the speech recognition device 5 is realized by dedicated hardware, and FIG. 3 shows an example in which the speech recognition device 5 is realized by software, firmware, or the like.
  • however, this is only an example; some components of the speech recognition device 5 may be realized by dedicated hardware, and the remaining components may be realized by software, firmware, or the like.
  • the occupant identification unit 12 identifies the position where each occupant is seated, as in the first embodiment.
  • the occupant identification unit 12 outputs seat position data indicating the position of the seat on which each occupant is seated to the conversation degree update processing unit 28 .
  • the speaker identification processing unit 13 outputs speaker identification information to the conversation level update processing unit 28 .
  • the conversation degree update processing unit 28 acquires seat position data indicating the position of the seat where each passenger is seated from the occupant identification unit 12 and acquires speaker identification information from the speaker identification processing unit 13 .
  • the conversation degree update processing unit 28 acquires, from among the plurality of conversation degree data for each seat position stored in the conversation degree data storage unit 27, the conversation degree data of the speaker indicated by the identification information and for the seat position indicated by the seat position data. Assume, for example, that the number of occupants present in the passenger compartment is three and that the three occupants are C1, C2, and C3.
  • the conversation degree data storage unit 27 stores the conversation degree data according to the pattern P1 and the conversation degree data according to the pattern P2 as the conversation degree data for the passenger C1.
  • if the seat position data corresponds to the pattern P1, the conversation degree update processing unit 28 acquires the conversation degree data relating to the pattern P1 as the conversation degree data for the passenger C1.
  • the degree-of-conversation update processing unit 28 acquires from the speaker presence/absence determination unit 14 a determination result indicating whether or not there is a speaker who is having a conversation with the speaker.
  • the conversation degree update processing unit 28 updates the acquired conversation degree data so as to increase the degree K of conversation if the determination result indicates that there is a speaker.
  • the conversation level update processing unit 28 updates the acquired conversation level data so as to lower the conversation level K if the determination result indicates that there is no speaker.
  • the update processing of the conversation degree data by the conversation degree update processing unit 28 is the same as the update processing of the conversation degree data by the conversation degree update processing unit 17 shown in FIG. 1, so a specific description of the update processing is omitted.
  • the conversation level update processing unit 28 causes the conversation level data storage unit 27 to store the updated conversation level data.
  • as described above, in the third embodiment, the space is the passenger compartment of the vehicle, the plurality of users present in the space are the plurality of passengers in the vehicle, and the respective seat positions of the speaker and the talker are identified based on the video of the vehicle interior or the sound inside the vehicle.
  • the conversation degree update unit 26 acquires, from among a plurality of conversation degree data for each seat position indicating the past degree of conversation of the speaker, the conversation degree data for the seat position identified by the speaker identification unit 11. If the speaker presence/absence determination unit 14 determines that a talker exists, the conversation degree update unit 26 updates the acquired conversation degree data so as to increase the degree of conversation, and if it determines that no talker exists, the conversation degree update unit 26 updates the acquired conversation degree data so as to lower the degree of conversation.
  • the speech recognition device 5 is configured in this manner. Therefore, compared with the speech recognition device 5 shown in FIG. 1, the speech recognition device 5 shown in FIG. 8 can further reduce the probability that an utterance directed to the speech recognition device 5 is misidentified as a conversation between users.
  • the present disclosure is suitable for speech recognition devices and speech recognition methods.

Abstract

This speech recognition device (5) is configured to include: a speaker identification unit (11) that identifies a speaker, who is the user who is speaking among a plurality of users present in a space, on the basis of video of the space captured by a camera (1) or sound in the space collected by a microphone (2); and a response unit (18) that acquires conversation degree data indicating the degree of past conversation between the speaker identified by the speaker identification unit (11) and the users other than the speaker among the plurality of users, determines on the basis of the conversation degree data whether or not the speaker's utterance is an utterance directed to the speech recognition device (5), and generates response data for the speaker's utterance only when it is determined that the utterance is directed to the speech recognition device (5).
PCT/JP2021/018019 2021-05-12 2021-05-12 Dispositif de reconnaissance vocale et procédé de reconnaissance vocale WO2022239142A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/018019 WO2022239142A1 (fr) 2021-05-12 2021-05-12 Dispositif de reconnaissance vocale et procédé de reconnaissance vocale

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/018019 WO2022239142A1 (fr) 2021-05-12 2021-05-12 Dispositif de reconnaissance vocale et procédé de reconnaissance vocale

Publications (1)

Publication Number Publication Date
WO2022239142A1 (fr)

Family

ID=84028541

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2021/018019 WO2022239142A1 (fr) 2021-05-12 2021-05-12 Dispositif de reconnaissance vocale et procédé de reconnaissance vocale

Country Status (1)

Country Link
WO (1) WO2022239142A1 (fr)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018216180A1 (fr) * 2017-05-25 2018-11-29 Mitsubishi Electric Corporation Speech recognition device and speech recognition method
WO2019171732A1 (fr) * 2018-03-08 2019-09-12 Sony Corporation Information processing device and method, program, and information processing system
WO2020044543A1 (fr) * 2018-08-31 2020-03-05 Mitsubishi Electric Corporation Information processing device, information processing method, and program

Similar Documents

Publication Publication Date Title
ES2806204T3 (es) Voice recognition techniques for activation, and related systems and methods
JP6466385B2 (ja) Service providing device, service providing method, and service providing program
JP3910898B2 (ja) Directivity setting device, directivity setting method, and directivity setting program
JP3702978B2 (ja) Recognition device and recognition method, and learning device and learning method
CN108146360A (zh) Vehicle control method and apparatus, in-vehicle device, and readable storage medium
US11176948B2 (en) Agent device, agent presentation method, and storage medium
JP7192222B2 (ja) Speech system
JP2002091466A (ja) Speech recognition device
JP2017090612A (ja) Speech recognition control system
JP2020060696A (ja) Communication support system, communication support method, and program
CN110696756A (zh) Vehicle volume control method and device, automobile, and storage medium
JP2020126166A (ja) Agent system, information processing device, information processing method, and program
WO2019130399A1 (fr) Speech recognition device, speech recognition system, and speech recognition method
CN111007968A (zh) Agent device, agent presentation method, and storage medium
JP2004354930A (ja) Speech recognition system
WO2020079733A1 (fr) Speech recognition device, speech recognition system, and speech recognition method
US11709065B2 (en) Information providing device, information providing method, and storage medium
JP3838159B2 (ja) Speech recognition dialogue device and program
WO2022239142A1 (fr) Speech recognition device and speech recognition method
JP6785889B2 (ja) Service providing device
JP7065964B2 (ja) Sound field control device and sound field control method
JP4561222B2 (ja) Voice input device
JP6833147B2 (ja) Information processing device, program, and information processing method
JP2020060623A (ja) Agent system, agent method, and program
JP2024067341A (ja) Vehicle information presentation method and information presentation device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21941876

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE