CN111724798B

CN111724798B - Vehicle-mounted device control system, vehicle-mounted device control apparatus, vehicle-mounted device control method, and storage medium

Info

Publication number: CN111724798B
Application number: CN202010189106.6A
Authority: CN
Inventors: 荒川桂辅; 尾中润一郎
Original assignee: Honda Motor Co Ltd
Current assignee: Honda Motor Co Ltd
Priority date: 2019-03-19
Filing date: 2020-03-17
Publication date: 2024-05-07
Anticipated expiration: 2040-03-17
Also published as: CN111724798A; JP7261626B2; JP2020154098A

Abstract

The invention provides a vehicle-mounted device control system, a vehicle-mounted device control apparatus, a vehicle-mounted device control method, and a storage medium. The in-vehicle device control system includes: an acquisition unit that acquires sound including the speech of an occupant riding in a vehicle; an in-vehicle device control unit; a voice recognition unit that recognizes the sound; a specifying unit that specifies the in-vehicle device whose operation is instructed; a determination unit that determines whether or not the specified in-vehicle device belongs to a predetermined group; and a general-purpose switch. When the in-vehicle device that received the instruction belongs to the predetermined group, the in-vehicle device control unit outputs, through a speaker or a display unit, at least one of a sound asking whether the operation may be performed and an approval-prompting image that prompts approval of the operation, and controls the operation of the in-vehicle device that received the instruction when an input indicating approval is received through the general-purpose switch.

Description

Vehicle-mounted device control system, vehicle-mounted device control apparatus, vehicle-mounted device control method, and storage medium

Technical Field

The invention relates to a vehicle-mounted device control system, a vehicle-mounted device control apparatus, a vehicle-mounted device control method, and a storage medium.

Background

Research is ongoing into human-machine interfaces that provide information through spoken dialogue with people. Related known techniques include a technique in which a robot determines whether to speak to a person with whom it communicates, as well as the volume and tone of its speech, and a technique that recognizes a voice uttered by an occupant using a dictionary in which vocabulary is registered and controls a plurality of control target devices in the vehicle cabin based on the content of the recognized voice (see, for example, Japanese Patent No. 4976903 and Japanese Unexamined Patent Application Publication No. 2007-286136).

Summary of the invention

Problems to be solved by the invention

However, in the related art, when a plurality of occupants are present in the vehicle cabin, for example, it is sometimes difficult to reliably instruct, by an occupant's spoken voice, the operation of an in-vehicle device that only a specific occupant (for example, the driver of the vehicle) is permitted to operate.

Disclosure of Invention

An object of an aspect of the present invention is to provide an in-vehicle device control system, an in-vehicle device control apparatus, an in-vehicle device control method, and a storage medium that allow only a specific occupant to reliably instruct an operation and that reduce the burden on the person responsible for driving when giving instructions to the vehicle.

Means for solving the problems

The vehicle-mounted device control system, the vehicle-mounted device control apparatus, the vehicle-mounted device control method, and the storage medium of the present invention adopt the following configurations.

(1): An in-vehicle device control system according to an aspect of the present invention includes: an acquisition unit that acquires sound including the speech of an occupant riding in a vehicle; an in-vehicle device control unit that is mounted on the vehicle and controls the operation of in-vehicle devices including a speaker and a display unit; a voice recognition unit that recognizes the sound, acquired by the acquisition unit, that includes the speech of the occupant of the vehicle; a specifying unit that specifies the in-vehicle device whose operation is instructed by the voice of the occupant recognized by the voice recognition unit; a determination unit that determines whether or not the specified in-vehicle device belongs to a predetermined group; and a general-purpose switch, wherein, when the determination unit determines that the in-vehicle device that received the instruction belongs to the predetermined group, the in-vehicle device control unit outputs, through the speaker or the display unit, at least one of a sound asking whether the in-vehicle device belonging to the predetermined group may execute the operation corresponding to the instruction and an approval-prompting image that prompts approval of execution of the operation by that in-vehicle device, and, when an input indicating the occupant's approval of execution of the instruction is received through the general-purpose switch, controls the operation of the in-vehicle device that received the instruction.

(2): In the aspect of (1) above, the in-vehicle devices belonging to the predetermined group are in-vehicle devices that affect the behavior of the vehicle.

(3): In the aspect of (1) above, the in-vehicle devices belonging to the predetermined group are in-vehicle devices corresponding to operations that only the driver of the vehicle is permitted to perform.

(4): In the aspect of (1) above, the general-purpose switch is also used for purposes other than receiving predetermined inputs related to voice instructions, which include the input indicating the approval.

(5): In the aspect of (4) above, when an input indicating the start of sound reception is received through the general-purpose switch, the voice recognition unit starts recognizing the sound, including the speech of the occupant, collected by a microphone serving as the acquisition unit.

(6): In any one of the aspects (1) to (5) above, the general-purpose switch is provided on a steering wheel.

(7): In any one of the aspects (1) to (6) above, the in-vehicle device control system further includes another switch that, when it is determined that the in-vehicle device that received the instruction belongs to a group other than the predetermined group, causes the in-vehicle device control unit to control the operation of that in-vehicle device.

(8): An in-vehicle device control apparatus according to an aspect of the present invention includes: an acquisition unit that acquires sound including the speech of an occupant riding in a vehicle; an in-vehicle device control unit that is mounted on the vehicle and controls the operation of in-vehicle devices including a speaker and a display unit; a voice recognition unit that recognizes the sound, acquired by the acquisition unit, that includes the speech of the occupant; a specifying unit that specifies the in-vehicle device whose operation is instructed by the voice of the occupant recognized by the voice recognition unit; a determination unit that determines whether or not the specified in-vehicle device belongs to a predetermined group; and a general-purpose switch, wherein, when the determination unit determines that the in-vehicle device that received the instruction belongs to the predetermined group, the in-vehicle device control unit outputs, through the speaker or the display unit, at least one of a sound asking whether the in-vehicle device belonging to the predetermined group may execute the operation corresponding to the instruction and an approval-prompting image that prompts approval of execution of the operation by that in-vehicle device, and, when an input indicating the occupant's approval of execution of the instruction is received through the general-purpose switch, controls the operation of the in-vehicle device that received the instruction.

(9): An in-vehicle device control method according to an aspect of the present invention causes a single computer or a plurality of computers in an in-vehicle device control system, which includes an acquisition unit that acquires sound including the speech of an occupant of a vehicle and a general-purpose switch, to execute: recognizing the sound including the speech of the occupant; specifying the in-vehicle device whose operation is instructed by the recognized voice of the occupant; determining whether the specified in-vehicle device belongs to a predetermined group; when it is determined that the in-vehicle device that received the instruction belongs to the predetermined group, outputting, through a speaker or a display unit, at least one of a sound asking whether the in-vehicle device belonging to the predetermined group may execute the operation corresponding to the instruction and an approval-prompting image that prompts approval of execution of the operation by that in-vehicle device; and, when an input indicating the occupant's approval of execution of the instruction is then received through the general-purpose switch, controlling the operation of the in-vehicle device that received the instruction.

(10): A storage medium according to an aspect of the present invention stores a program that is installed in a single computer or a plurality of computers in an in-vehicle device control system, which includes an acquisition unit that acquires sound including the speech of an occupant of a vehicle and a general-purpose switch, and that causes the computer or computers to execute: recognizing the sound including the speech of the occupant; specifying the in-vehicle device whose operation is instructed by the recognized voice of the occupant; determining whether the specified in-vehicle device belongs to a predetermined group; when it is determined that the in-vehicle device that received the instruction belongs to the predetermined group, outputting, through a speaker or a display unit, at least one of a sound asking whether the in-vehicle device belonging to the predetermined group may execute the operation corresponding to the instruction and an approval-prompting image that prompts approval of execution of the operation by that in-vehicle device; and, when an input indicating the occupant's approval of execution of the instruction is then received through the general-purpose switch, controlling the operation of the in-vehicle device that received the instruction.

Effects of the invention

According to the aspects of (1) to (10) above, an occupant can easily instruct the operation of an in-vehicle device while the safety of the vehicle is maintained.

Drawings

Fig. 1 is a diagram showing an example of the structure of an agent system according to the first embodiment.

Fig. 2 is a diagram showing an example of the structure of the agent device according to the first embodiment.

Fig. 3 is a view showing an example of the vehicle interior viewed from the driver's seat.

Fig. 4 is a diagram showing an example of the vehicle interior of the vehicle M viewed from above.

Fig. 5 is a view showing an example of an approval-prompting image for the reclining mechanism of the driver's seat.

Fig. 6 is a diagram showing an example of the structure of the server device according to the first embodiment.

Fig. 7 is a diagram showing an example of the content of the answer information.

Fig. 8 is a diagram showing an example of a timing chart for a scenario in which information indicating in-vehicle device control of an influencing in-vehicle device is received.

Fig. 9 is a flowchart showing the flow of a series of processes of the agent device according to the first embodiment.

Fig. 10 is a flowchart showing an example of the flow of processing of the server device according to the first embodiment.

Fig. 11 is a diagram showing an example of the agent device according to the second embodiment.

Fig. 12 is a flowchart showing the flow of a series of processes of the agent device according to the second embodiment.

Symbol description:

1 … agent system, 100A … agent device, 102 … agent side communication section, 106A, 106B, 106C, 106D, 106E … microphone, 108A, 108B, 108C, 108D, 108E … speaker, 110A, 110B, 110C … display section, 112 … general switch, 120A … agent side control section, 122 … acquisition section, 124 … sound synthesis section, 126 … output control section, 128 … communication control section, 130 … determination section, 132 … determination section, 134 … in-vehicle device control section, 150, 150A … agent side storage section, 152 … in-vehicle device information, 200 … server device, 202 … server side communication section, 210 … server side control section, 212 … acquisition section, 214a … speech section extraction section, 216A … voice recognition section, 222a … agent data generation section, 224 … communication control section, 230 … server side storage section, 234a … answer information, VE … in-vehicle device, NVE … non-influencing in-vehicle device, EVE … influencing in-vehicle device, M … vehicle.

Detailed Description

Embodiments of an in-vehicle device control system, an in-vehicle device control apparatus, an in-vehicle device control method, and a storage medium according to the present invention are described below with reference to the drawings.

< First embodiment >

[ System Structure ]

Fig. 1 is a diagram showing an example of the structure of an agent system 1 according to the first embodiment. The agent system 1 of the first embodiment includes, for example, an agent device 100 mounted on a vehicle (hereinafter referred to as a vehicle M) and a server device 200. The vehicle M is, for example, a two-wheeled, three-wheeled, or four-wheeled vehicle. The driving source of such a vehicle may be an internal combustion engine such as a diesel engine or a gasoline engine, an electric motor, or a combination thereof. The electric motor operates using power generated by a generator connected to the internal combustion engine, or power discharged from a secondary battery or a fuel cell.

The agent device 100 and the server device 200 are communicably connected via a network NW. The network NW includes a LAN (Local Area Network), a WAN (Wide Area Network), and the like. The network NW may include, for example, a network using wireless communication such as Wi-Fi or Bluetooth (registered trademark; this notation is omitted below). The agent system 1 may be constituted by a plurality of agent devices 100 and a plurality of server devices 200.

The agent device 100 uses the agent function to acquire sound from an occupant of the vehicle M and transmits the acquired sound to the server device 200. Based on data obtained from the server device 200 (for example, agent data), the agent device 100 holds a dialogue with the occupant, provides information such as images or video, or controls the in-vehicle devices VE or other devices. For example, the vehicle M is equipped with in-vehicle devices VE whose operation affects the behavior of the vehicle M (hereinafter referred to as influencing in-vehicle devices EVE) and in-vehicle devices VE whose operation does not affect the behavior of the vehicle M (hereinafter referred to as non-influencing in-vehicle devices NVE). An influencing in-vehicle device EVE is, for example, a device that affects the posture of the driver (such as the reclining mechanism or the seat position control mechanism of the driver's seat) or a device involved in automated driving or advanced driving assistance (for example, ACC (Adaptive Cruise Control) or VSA (Vehicle Stability Assist)), and is a device that only the driver is permitted to operate. In contrast, a non-influencing in-vehicle device NVE is, for example, an air conditioner, a power window, an audio system, or a car navigation system, and is a device that occupants other than the driver are also permitted to operate. The in-vehicle devices VE may alternatively be classified, for example, into in-vehicle devices VE corresponding to operations that only the driver of the vehicle is permitted to perform and all other in-vehicle devices VE. In addition to the influencing in-vehicle devices EVE, the in-vehicle devices corresponding to operations that only the driver is permitted to perform include, for example, the power window on the driver's seat side.

The server device 200 communicates with the agent device 100 mounted on the vehicle M and acquires various data from the agent device 100. Based on the acquired data, the server device 200 generates agent data in response to an inquiry made by voice or the like, and supplies the generated agent data to the agent device 100. The functions of the server device 200 of the first embodiment are included in the agent function. The functions of the server device 200 also update the agent function of the agent device 100 to a more accurate one.

[ Structure of agent device ]

Fig. 2 is a diagram showing an example of the structure of the agent device 100 according to the first embodiment. The agent device 100 according to the first embodiment includes, for example, an agent-side communication unit 102, a microphone 106, a speaker 108, a display unit 110, a first general switch 112, a second general switch 113 (including 113b, 113c, and 113d), an agent-side control unit 120, and an agent-side storage unit 150. These devices may be connected to each other via a multiplex communication line such as a CAN (Controller Area Network) communication line, a serial communication line, a wireless communication network, or the like. The configuration of the agent device 100 shown in Fig. 2 is merely an example; part of the configuration may be omitted, and other configurations may be added.

The agent-side communication unit 102 includes a communication interface such as a NIC (Network Interface Controller). The agent-side communication unit 102 communicates with the server device 200 and the like via the network NW.

The microphone 106 is an audio input device that receives sound in the vehicle interior by converting it into an electrical signal. The microphone 106 outputs the data of the received sound (hereinafter referred to as sound data) to the agent-side control unit 120. For example, the microphone 106 is provided near the area in front of an occupant seated on a seat in the vehicle cabin, such as near a floor mat lamp (mat lamp), the steering wheel, the instrument panel, or a seat. A plurality of microphones 106 may be provided in the vehicle interior.

The speaker 108 is provided near a seat in the vehicle interior or near the display portion 110, for example. The speaker 108 outputs a sound based on the information output by the agent side control section 120.

The display unit 110 includes a display device such as an LCD (Liquid Crystal Display) or an organic EL (Electroluminescence) display. The display unit 110 displays an image based on the information output by the agent-side control unit 120.

The first general switch 112 is a user interface such as a button. The first general switch 112 receives an operation by an occupant and outputs a signal corresponding to the received operation to the agent-side control unit 120. The first general switch 112 is provided on the steering wheel, for example. The first general switch 112 has, for example, no dedicated function assigned to it; when it is used for some purpose, the agent device 100 determines the purpose and indicates it through the sound output from the speaker 108 and the image displayed on the display unit 110. Specifically, the purpose of the first general switch 112 is indicated by a sound output from the speaker 108 such as "Open the power window on the driver's seat side? If you agree, please press the first general switch 112."

The first general switch 112 may be used for purposes other than receiving an input indicating the occupant's approval. For example, the first general switch 112 may be used as a switch that accepts the start of an utterance. The first general switch 112 may also be used for purposes other than receiving predetermined inputs related to voice instructions, which include the input indicating the occupant's approval. Such other purposes are, for example, starting a call on a mobile phone paired with the audio device of the vehicle M, adjusting the volume of the audio device, starting or stopping the audio device, and turning the interior lighting on or off. The first general switch 112 may be configured to emit light, and may light up or flash at the timing at which it is used to receive the input indicating the occupant's approval or at the timing at which it is used for another purpose, thereby indicating to the occupant when an input is accepted. When the first general switch 112 accepts an input, its purpose may be indicated to the occupant by using a different emission color for each purpose.
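The purpose-dependent lighting of the first general switch 112 can be illustrated with a minimal Python sketch. The purpose names, the color values, and the function `indicator_color` are all hypothetical and chosen only for illustration; the description states only that the emission color may differ according to the purpose.

```python
from enum import Enum
from typing import Optional


class SwitchPurpose(Enum):
    """Purposes for which the general switch may currently be used."""
    APPROVAL = "approval of an instructed operation"
    START_SPEECH = "start of speech reception"
    PHONE_CALL = "start of a phone call"
    VOLUME = "audio volume adjustment"


# Hypothetical color assignment (not taken from the description).
EMISSION_COLOR = {
    SwitchPurpose.APPROVAL: "green",
    SwitchPurpose.START_SPEECH: "blue",
    SwitchPurpose.PHONE_CALL: "white",
    SwitchPurpose.VOLUME: "amber",
}


def indicator_color(purpose: SwitchPurpose, accepting_input: bool) -> Optional[str]:
    """Return the color to light the switch with, or None when it stays dark."""
    if not accepting_input:
        return None
    return EMISSION_COLOR[purpose]
```

Under these assumptions, the switch is lit only while an input is being accepted, and each purpose maps to a distinct color so that the occupant can tell the purposes apart.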

The second general switch 113 is a user interface such as a button. The second general switch 113 receives an operation by an occupant and outputs a signal corresponding to the received operation to the agent-side control unit 120. The second general switch 113 has, for example, no dedicated function assigned to it; when it is used for some purpose, the agent device 100 determines the purpose and indicates it through the sound output from the speaker 108 and the image displayed on the display unit 110. Specifically, the purpose of the second general switch 113 is indicated by a sound output from the speaker 108 such as "Start the air conditioner? If you agree, please press the second general switch 113."

Fig. 3 is a view showing an example of the vehicle interior viewed from the driver's seat. In the illustrated example, microphones 106A to 106C, speakers 108A to 108C, and display units 110A to 110C are provided in the vehicle interior. The microphone 106A is provided on the steering wheel, for example, and mainly receives sounds uttered by the driver. The microphone 106B is provided in the instrument panel (dash panel or garnish) IP in front of the passenger seat, for example, and mainly receives sounds uttered by the occupant in the passenger seat. The microphone 106C is provided near the center of the instrument panel (between the driver's seat and the passenger seat), for example.

The speaker 108A is provided, for example, at a lower portion of the door on the driver's seat side, the speaker 108B is provided, for example, at a lower portion of the door on the passenger seat side, and the speaker 108C is provided, for example, near the display unit 110C, that is, near the center of the instrument panel IP.

The display unit 110A is, for example, a head-up display (HUD) device that displays a virtual image ahead of the driver's line of sight when the driver looks outside the vehicle. The HUD device projects light onto the front windshield of the vehicle M, or onto a transparent, light-transmitting member called a combiner, so that an occupant can see a virtual image. The occupant is mainly the driver, but may be an occupant other than the driver.

The display unit 110B is provided in the instrument panel IP near the front of the driver's seat (the seat closest to the steering wheel), at a position where the occupant can see it through the gaps in the steering wheel or over the steering wheel. The display unit 110B is, for example, an LCD, an organic EL display device, or the like. The display unit 110B displays images of, for example, the speed of the vehicle M, the engine speed, the remaining fuel, the radiator water temperature, the travel distance, and other information.

The display unit 110C is provided near the center of the instrument panel IP. Like the display unit 110B, the display unit 110C is, for example, an LCD, an organic EL display device, or the like. The display unit 110C displays content such as television programs and movies.

The first general switch 112 is provided, for example, in a position in the steering wheel that does not interfere with the driving operation (for example, a position other than the outer periphery of the steering wheel).

In the vehicle M, a microphone and a speaker may be provided near the rear seat. Fig. 4 is a diagram showing an example of the vehicle interior of the vehicle M viewed from above. In the vehicle interior, microphones 106D and 106E and speakers 108D and 108E may be provided in addition to the microphones and speakers illustrated in fig. 3.

The microphone 106D is provided, for example, in the vicinity of a rear seat ST3 located behind the passenger seat ST2 (for example, on the back of the passenger seat ST2), and mainly receives sounds uttered by an occupant seated in the rear seat ST3. The microphone 106E is provided, for example, in the vicinity of a rear seat ST4 located behind the driver's seat ST1 (for example, on the back of the driver's seat ST1), and mainly receives sounds uttered by an occupant seated in the rear seat ST4.

The speaker 108D is provided at a lower portion of the door on the rear seat ST3 side, for example, and the speaker 108E is provided at a lower portion of the door on the rear seat ST4 side, for example.

The second general switch 113 is provided near the microphones 106A to 106D, for example.

The vehicle M illustrated in Fig. 1 has been described, as illustrated in Fig. 3 or Fig. 4, as a vehicle provided with a steering wheel operable by a driver who is an occupant, but the present invention is not limited thereto. For example, the vehicle M may be a roofless vehicle, that is, a vehicle without a vehicle cabin (or without a clearly delimited one). In the example of Fig. 3 or Fig. 4, the case has been described where the driver's seat, on which the driver who operates the vehicle M sits, and the passenger seat and rear seats, on which other occupants who do not operate the vehicle M sit, are located in a single cabin, but the present invention is not limited to this. For example, the vehicle M may be a saddle-ride type motorized two-wheeled vehicle having a steering handlebar instead of a steering wheel. In the example of Fig. 3 or Fig. 4, the case where the vehicle M is provided with a steering wheel has been described, but the present invention is not limited thereto. For example, the vehicle M may be an autonomous vehicle that is not provided with a steering operation device such as a steering wheel. An autonomous vehicle is, for example, a vehicle that performs driving control by controlling one or both of steering and acceleration/deceleration of the vehicle independently of any operation by an occupant.

Returning to the description of Fig. 2, the agent-side control unit 120 includes, for example, an acquisition unit 122, a sound synthesis unit 124, an output control unit 126, a communication control unit 128, a determination unit 130, a determination unit 132, and an in-vehicle device control unit 134. These components are realized by a processor such as a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit) executing a program (software). Some or all of these components may be realized by hardware (including circuitry) such as an LSI (Large Scale Integration), an ASIC (Application Specific Integrated Circuit), or an FPGA (Field-Programmable Gate Array), or by cooperation of software and hardware. The program may be stored in advance in the agent-side storage unit 150 (a storage device including a non-transitory storage medium), or may be stored in a removable storage medium (non-transitory storage medium) such as a DVD or a CD-ROM and installed in the agent-side storage unit 150 by mounting the storage medium in a drive device.

The agent-side storage unit 150 is implemented by an HDD, a flash memory, an EEPROM (Electrically Erasable Programmable Read Only Memory), a ROM (Read Only Memory), a RAM (Random Access Memory), or the like. The agent-side storage unit 150 stores, for example, programs referred to by the processor and in-vehicle device information 152. The in-vehicle device information 152 is information indicating (a list of) the in-vehicle devices VE mounted on the vehicle M, and indicates whether each in-vehicle device VE is an influencing in-vehicle device EVE or a non-influencing in-vehicle device NVE.
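As an illustration of how the in-vehicle device information 152 might be organized, the following Python sketch stores each device together with its group. The class names, the dictionary `IN_VEHICLE_DEVICE_INFO`, and the function `is_influencing` are hypothetical; the listed devices follow the examples given in this description, and the actual data format is not specified here.

```python
from dataclasses import dataclass
from enum import Enum


class DeviceGroup(Enum):
    EVE = "influencing"      # operation affects the behavior of the vehicle M
    NVE = "non-influencing"  # operation does not affect the behavior of the vehicle M


@dataclass(frozen=True)
class DeviceEntry:
    name: str
    group: DeviceGroup


# Hypothetical contents of the in-vehicle device information 152.
IN_VEHICLE_DEVICE_INFO = {
    entry.name: entry
    for entry in (
        DeviceEntry("driver seat reclining mechanism", DeviceGroup.EVE),
        DeviceEntry("adaptive cruise control", DeviceGroup.EVE),
        DeviceEntry("air conditioner", DeviceGroup.NVE),
        DeviceEntry("power window", DeviceGroup.NVE),
        DeviceEntry("car navigation system", DeviceGroup.NVE),
    )
}


def is_influencing(name: str) -> bool:
    """Return True when the named device is listed as an influencing device EVE."""
    return IN_VEHICLE_DEVICE_INFO[name].group is DeviceGroup.EVE
```

In this sketch a lookup by device name yields the group directly, which is all that the later group determination needs.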

The acquisition unit 122 acquires sound data from the microphone 106, or acquires other information.

When the data received from the server device 200 by the agent-side communication unit 102 (the agent data described later) includes voice control content, the sound synthesis unit 124 generates, as the voice control, an artificially synthesized voice (hereinafter referred to as an agent voice) based on the voice data that the content instructs to be spoken.

When the sound synthesis unit 124 generates the agent voice, the output control unit 126 causes the speaker 108 to output the agent voice. When the agent data includes image control content, the output control unit 126 causes the display unit 110 to display the image data instructed as the image control. The output control unit 126 may also cause the display unit 110 to display an image of the recognition result of the sound data (text data such as a sentence).

The communication control unit 128 transmits the audio data acquired by the acquisition unit 122 to the server apparatus 200 via the agent-side communication unit 102.

When the agent data includes information indicating in-vehicle device control, the determination unit 130 identifies, based on the in-vehicle device information 152, the in-vehicle device VE on which the in-vehicle device control is to be performed. For example, the determination unit 130 searches the in-vehicle device information 152 using the name of the in-vehicle device VE included in the meaning information as a search keyword, and thereby identifies the in-vehicle device VE.
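The keyword search performed by the determination unit 130 can be sketched as follows. This is only an illustrative Python sketch: the tuple `KNOWN_DEVICES` stands in for the in-vehicle device information 152, the function name `identify_device` is hypothetical, and simple substring matching is an assumption, since the description does not specify how the device name is matched.

```python
from typing import Optional, Sequence

# Hypothetical device names standing in for the in-vehicle device information 152.
KNOWN_DEVICES = ("driver seat reclining mechanism", "air conditioner", "power window")


def identify_device(
    meaning_info: str, known_devices: Sequence[str] = KNOWN_DEVICES
) -> Optional[str]:
    """Return the first known device whose name appears in the meaning information."""
    text = meaning_info.lower()
    for name in known_devices:
        if name in text:
            return name
    return None  # no listed in-vehicle device was mentioned
```

For example, under these assumptions `identify_device("Start the air conditioner")` yields `"air conditioner"`, while an utterance that names no listed device yields `None`.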

The determination unit 132 determines, based on the in-vehicle device information 152, whether the in-vehicle device VE identified by the determination unit 130 is an influencing in-vehicle device EVE.

When the determination unit 132 determines that the in-vehicle device VE whose operation is instructed by the in-vehicle device control content is not an influencing in-vehicle device EVE (that is, it is a non-influencing in-vehicle device NVE), the in-vehicle device control unit 134 controls the operation of the non-influencing in-vehicle device NVE based on the in-vehicle device control content. When it is determined that the in-vehicle device VE is an influencing in-vehicle device EVE, the in-vehicle device control unit 134 determines whether the first general switch 112 has received an input indicating the occupant's approval to execute the control indicated by the in-vehicle device control content. When the input indicating the occupant's approval has been received by the first general switch 112, the in-vehicle device control unit 134 controls the operation of the influencing in-vehicle device EVE based on the in-vehicle device control content.
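The branching described above can be summarized in a short Python sketch. The function `handle_control_content` and its callback parameters are hypothetical names introduced only for illustration; the approval callback stands in for an input received through the first general switch 112.

```python
from typing import Callable


def handle_control_content(
    device: str,
    is_influencing: bool,
    approval_received: Callable[[], bool],
    execute: Callable[[str], None],
) -> bool:
    """Execute the instructed control, gating influencing devices on approval.

    Returns True when the control was executed.
    """
    if not is_influencing:
        # Non-influencing in-vehicle device NVE: control it directly.
        execute(device)
        return True
    # Influencing in-vehicle device EVE: require approval via the general switch.
    if approval_received():
        execute(device)
        return True
    return False
```

In this sketch an influencing device is left untouched unless the approval callback reports that the switch input was received, which mirrors the gating role of the first general switch 112.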

The vehicle M further includes a second general switch 113, separate from the first general switch 112, at the center of the dashboard IP, in the vicinity of the rear-seat microphone 106D, and in the vicinity of the microphone 106E. When the determination unit 132 determines that the in-vehicle device VE instructed to operate by the in-vehicle device control content is not an in-vehicle device belonging to the predetermined group (that is, it is a non-influencing in-vehicle device NVE, or an in-vehicle device corresponding to an operation that occupants other than the driver are also permitted to perform), the in-vehicle device control unit 134 may execute the control of the non-influencing in-vehicle device NVE directly based on the in-vehicle device control content, or may first accept the occupant's approval via the second general switch 113. In the latter case, when the input indicating the occupant's approval is received by the second general switch 113, the operation of the non-influencing in-vehicle device NVE (or of the in-vehicle device corresponding to an operation that occupants other than the driver are also permitted to perform) is controlled based on the in-vehicle device control content.

Here, when the determination unit 132 determines that the in-vehicle device VE instructed to operate by the in-vehicle device control content is an influencing in-vehicle device EVE, the sound synthesis unit 124 generates a sound that asks whether the control indicated by the in-vehicle device control content may be executed and that prompts the occupant to operate (for example, press) the first general switch 112 if the occupant agrees. The output control unit 126 outputs the sound generated by the sound synthesis unit 124 through the speaker 108. Likewise, when the determination unit 132 determines that the in-vehicle device VE is an influencing in-vehicle device EVE, the output control unit 126 causes the display unit 110 to display an image (hereinafter referred to as a prompt approval image) that asks whether the instruction indicated by the in-vehicle device control content may be executed for the influencing in-vehicle device EVE and that prompts the occupant to operate (for example, press) the first general switch 112 if the occupant agrees.

Fig. 5 is a diagram showing an example of the prompt approval image IM1 for the reclining mechanism of the driver seat (that is, an influencing in-vehicle device EVE). The prompt approval image IM1 includes, for example, a message MS asking whether the instruction indicated by the in-vehicle device control content (in this case, reclining) may be executed for the reclining mechanism of the driver seat, and an image (illustrated image IM2) showing how to perform the approval operation on the first general switch 112. The message MS is, for example, "May the driver seat be reclined? If so, please press the general switch."

[ Structure of server device ]

Fig. 6 is a diagram showing an example of the structure of the server apparatus 200 according to the first embodiment. The server device 200 of the first embodiment includes, for example, a server-side communication unit 202, a server-side control unit 210, and a server-side storage unit 230.

The server-side communication unit 202 includes a communication interface such as NIC. The server-side communication unit 202 communicates with the agent devices 100 and the like mounted on the respective vehicles M via the network NW.

The server-side control unit 210 includes, for example, an acquisition unit 212, a speech section extraction unit 214, a voice recognition unit 216, an agent data generation unit 222, and a communication control unit 224. These components are realized by a processor such as a CPU or GPU executing a program (software). Some or all of these components may be realized by hardware (including circuitry) such as an LSI, ASIC, or FPGA, or by cooperation of software and hardware. The program may be stored in advance in the server-side storage unit 230 (a storage device including a non-transitory storage medium), or may be stored in a removable storage medium (non-transitory storage medium) such as a DVD or a CD-ROM and installed in the server-side storage unit 230 when the storage medium is mounted on a drive device.

The server-side storage unit 230 is implemented by an HDD, a flash memory, an EEPROM, a ROM, a RAM, or the like. The server-side storage unit 230 stores, for example, answer information 234 in addition to a program referred to by a processor.

Fig. 7 is a diagram showing an example of the content of the answer information 234. In the answer information 234, for example, the control content to be executed by the agent-side control unit 120 is associated with meaning information. The meaning information is, for example, the meaning recognized by the voice recognition unit 216 from the entire speech content. The control content includes, for example, in-vehicle device control related to an instruction (control) of an operation of the in-vehicle device VE, sound control for outputting an agent voice, image control for displaying an image on the display unit 110, and the like. For example, in the answer information 234, the in-vehicle device control for "activating the air conditioner", the sound control for "activating the air conditioner", the display control for displaying the temperature in the vehicle interior and the set temperature, and the meaning information indicating "activating the air conditioner" are associated with one another. When the in-vehicle device control content relates to an influencing in-vehicle device EVE, the control cannot be executed without the occupant's approval via the first general switch 112; therefore, no sound control or display control is associated with the meaning information for the influencing in-vehicle device EVE.
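As a rough illustration of how such answer information might be laid out, the sketch below keys control content by meaning information and leaves out sound and display control for an influencing device. The dictionary layout, the key `recline_driver_seat`, the field names, and the helper `lookup_control_content` are hypothetical; only `turn_ac_on` appears elsewhere in this description.

```python
from typing import Dict

# Hypothetical stand-in for the answer information 234:
# meaning information -> associated control content.
ANSWER_INFO: Dict[str, Dict[str, str]] = {
    "turn_ac_on": {
        "device_control": "start the air conditioner",
        "sound_control": "announce that the air conditioner is starting",
        "display_control": "show cabin temperature and set temperature",
    },
    "recline_driver_seat": {
        # Influencing device (EVE): only the device control is registered,
        # because nothing may be executed before the occupant approves.
        "device_control": "recline the driver seat",
    },
}


def lookup_control_content(meaning: str) -> Dict[str, str]:
    """Return the control content associated with the meaning, if any."""
    return ANSWER_INFO.get(meaning, {})
```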

Returning to fig. 6, the acquisition unit 212 acquires audio data from the agent apparatus 100 via the server-side communication unit 202.

The speech section extraction unit 214 extracts, from the voice data acquired by the acquisition unit 122, a period during which the occupant is speaking (hereinafter referred to as a speech section). For example, the speech section extraction unit 214 may extract the speech section using the zero-crossing method, based on the amplitude of the sound signal included in the voice data. The speech section extraction unit 214 may also extract the speech section from the voice data based on a Gaussian mixture model (GMM), or by template matching against a database of templates of sound signals characteristic of speech sections.
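A minimal, frame-based sketch of amplitude and zero-crossing based speech section extraction is shown below. It is not the extraction method of this description, whose details are not given; the frame length, both thresholds, and the function names are assumptions chosen only to make the idea concrete.

```python
from typing import List, Sequence, Tuple


def zero_crossings(frame: Sequence[float]) -> int:
    """Count sign changes between consecutive samples in a frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def extract_speech_sections(
    samples: Sequence[float],
    frame_len: int = 4,           # assumed frame length in samples
    amp_threshold: float = 0.1,   # assumed mean-absolute-amplitude threshold
    min_crossings: int = 1,       # assumed minimum zero crossings per frame
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of contiguous active frames."""
    sections: List[Tuple[int, int]] = []
    start = None
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        mean_amp = sum(abs(x) for x in frame) / len(frame)
        active = (mean_amp >= amp_threshold
                  and zero_crossings(frame) >= min_crossings)
        if active and start is None:
            start = i                      # a speech section begins
        elif not active and start is not None:
            sections.append((start, i))    # the speech section ends
            start = None
    if start is not None:                  # speech continues to the end
        sections.append((start, len(samples)))
    return sections
```

For example, a signal made of eight silent samples, eight alternating samples of amplitude 0.5, and eight more silent samples yields the single section `(8, 16)`.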

The voice recognition unit 216 recognizes the voice data for each speech section extracted by the speech section extraction unit 214 and converts the recognized voice data into text, thereby generating text data including the speech content. For example, the voice recognition unit 216 separates the sound signal in the speech section into a plurality of frequency bands, such as a low frequency band and a high frequency band, and applies a Fourier transform to each separated signal to generate a spectrogram. The voice recognition unit 216 inputs the generated spectrogram to a recurrent neural network and thereby obtains a character string from the spectrogram. The recurrent neural network can be trained in advance using, for example, training data in which the known character string corresponding to a training sound is attached, as a teacher label, to the spectrogram generated from that training sound. The voice recognition unit 216 then outputs the character string obtained from the recurrent neural network as text data.

The voice recognition unit 216 also performs natural-language syntactic analysis on the text data, divides the text data into morphemes, and recognizes, from the morphemes, the sentence included in the text data.
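To make the spectrogram step concrete, the following sketch computes a magnitude spectrogram with a plain short-time discrete Fourier transform. It is a simplification under stated assumptions: the separation into frequency bands and the recurrent neural network are omitted, no window function is applied, and the frame length, hop size, and function names are hypothetical.

```python
import cmath
from typing import List, Sequence


def dft_magnitudes(frame: Sequence[float]) -> List[float]:
    """Magnitudes of DFT bins 0..N/2 of one frame (naive O(N^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectrogram(
    samples: Sequence[float],
    frame_len: int = 8,   # assumed frame length in samples
    hop: int = 4,         # assumed hop between frame starts
) -> List[List[float]]:
    """One row of bin magnitudes per full frame, in time order."""
    return [
        dft_magnitudes(samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, hop)
    ]
```

A production system would normally use an FFT library and a window function; the naive transform is used here only so that the sketch has no dependencies.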

The agent data generation unit 222 refers to the meaning information in the answer information 234 based on the meaning of the speech content recognized by the voice recognition unit 216, and acquires the control content associated with the matching meaning information. When a meaning such as "turn on the air conditioner" or "please turn on the power supply of the air conditioner" is recognized, the agent data generation unit 222 replaces that meaning with standard character information such as "start the air conditioner" or standard instruction information such as "turn_ac_on". As a result, even when the wording of the request in the speech content varies, the control content matching the request can be acquired easily.
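The replacement of differently worded requests by one standard instruction could look like the sketch below. The variant table, the normalization rule (lowercasing and stripping punctuation), and the function name `normalize_request` are assumptions for illustration; a real system would rely on the recognized meaning rather than on literal string matching.

```python
import re
from typing import Optional

# Hypothetical table mapping request wordings to standard instruction
# information. Only "turn_ac_on" is taken from the description.
VARIANTS = {
    "turn on the air conditioner": "turn_ac_on",
    "please turn on the power supply of the air conditioner": "turn_ac_on",
}

# Standard character information for each standard instruction.
STANDARD_TEXT = {"turn_ac_on": "start the air conditioner"}


def normalize_request(utterance: str) -> Optional[str]:
    """Map an utterance to standard instruction information, if known."""
    key = re.sub(r"[^a-z0-9 ]", "", utterance.lower())  # drop punctuation
    key = " ".join(key.split())                          # collapse spaces
    return VARIANTS.get(key)
```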

The agent data generation unit 222 generates agent data for causing processing corresponding to the acquired control content (for example, at least one of in-vehicle device control, sound control, and display control) to be executed.

The communication control unit 224 transmits the agent data generated by the agent data generation unit 222 to the vehicle M via the server-side communication unit 202. Thereby, the vehicle M performs control corresponding to the agent data by the agent-side control unit 120.

[ Timing chart when information indicating in-vehicle device control of an influencing in-vehicle device EVE is received ]

Fig. 8 is a diagram showing an example of a timing chart for a scenario in which information indicating in-vehicle device control of an influencing in-vehicle device EVE is received. In Fig. 8, each axis (axes AX1 to AX4 in the drawing) represents the passage of time: the axis AX1 shows the actions of the occupant of the vehicle M, the axis AX2 shows the operation of the speaker 108, the axis AX3 shows the operation of the display unit 110, and the axis AX4 shows the state of the first general switch 112.

First, from time t1 to time t2, the occupant says "recline the driver seat" (event EV1 in the drawing). In response to the event EV1, the acquisition unit 122 acquires the speech received by the microphone 106 as sound data, and the communication control unit 128 transmits the sound data to the server apparatus 200 via the agent-side communication unit 102. The voice recognition unit 216 recognizes the speech content of the sound data, and recognizes its meaning information and the fact that the in-vehicle device control is "recline the driver seat". The server apparatus 200 transmits, to the agent apparatus 100, agent data including information indicating the in-vehicle device control "recline the driver seat".

The determination unit 132 receives the agent data from the server apparatus 200 and determines whether the information indicating the in-vehicle device control included in the agent data relates to an influencing in-vehicle device EVE. When the determination unit 132 determines that the in-vehicle device VE indicated by the in-vehicle device control is an influencing in-vehicle device EVE (in this example, the reclining mechanism of the driver seat), the sound synthesis unit 124 generates a sound such as "May the driver seat be reclined? If so, please press the general switch." At time t3, the output control unit 126 outputs the sound generated by the sound synthesis unit 124 through the speaker 108 (event EV2 in the drawing). Also at time t3, the output control unit 126 causes the display unit 110 to display a prompt approval image that asks whether the driver seat may be reclined and prompts the occupant to operate the first general switch 112 if so (event EV3 in the drawing). The occupant confirms one or both of the sound output in the event EV2 and the prompt approval image displayed in the event EV3, and operates the first general switch 112 if the occupant approves reclining the driver seat.

The first general switch 112 is in an accepting state (event EV4 in the drawing) for a predetermined time (for example, several tens of seconds to several minutes) from the time t3 at which the occupant is asked whether the driver seat may be reclined. When the in-vehicle device control unit 134 receives an input indicating approval through the first general switch 112 within the predetermined time, at time t4, it instructs the reclining mechanism of the driver seat to recline the driver seat.
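The accepting state of the switch can be modeled as a time window that opens when the occupant is asked and closes after the predetermined time. The sketch below is one possible model; the class name `ApprovalWindow`, the explicit `now` arguments, and the choice to consume the window on the first accepted press are assumptions.

```python
from typing import Optional


class ApprovalWindow:
    """Accept a switch press only within a fixed time after the inquiry."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.opened_at: Optional[float] = None

    def open(self, now: float) -> None:
        """Call at the time of the inquiry (time t3 in the timing chart)."""
        self.opened_at = now

    def is_accepting(self, now: float) -> bool:
        return (self.opened_at is not None
                and 0 <= now - self.opened_at <= self.timeout_s)

    def press(self, now: float) -> bool:
        """Return True if this press counts as approval."""
        if self.is_accepting(now):
            self.opened_at = None  # one inquiry yields at most one approval
            return True
        return False
```

Passing the current time in explicitly, instead of reading a clock inside the class, keeps the behavior deterministic and easy to check; a caller could supply `time.monotonic()` in practice.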

[ Process flow ]

Next, the flow of the process of the agent system 1 according to the first embodiment will be described with reference to a flowchart. Hereinafter, the process of the agent device 100 and the process of the server device 200 will be described separately. The flow of the processing shown below may be repeatedly executed at a predetermined timing. The predetermined timing is, for example, a timing at which a specific word (for example, a wake-up word) for activating the agent device is extracted from the audio data, a timing at which a selection of a switch for activating the agent device 100 from among various switches mounted on the vehicle M is received, or the like.

Fig. 9 is a flowchart showing a flow of a series of processes of the agent device 100 according to the first embodiment. First, the acquisition unit 122 of the agent-side control unit 120 determines whether the sound data of the occupant has been collected by the microphone 106 after the wake-up word is recognized or after the switch for activating the agent device is pressed (step S300). The acquisition unit 122 waits until the sound data of the occupant is collected. Next, the communication control unit 128 transmits the sound data to the server apparatus 200 via the agent-side communication unit 102 (step S302). Next, the communication control unit 128 receives the agent data from the server apparatus 200 (step S304).

When the received agent data includes control content, the determination unit 130 identifies the in-vehicle device VE to be controlled based on the in-vehicle device information 152 (step S306). The determination unit 132 determines whether the in-vehicle device VE identified by the determination unit 130 is an influencing in-vehicle device EVE (step S308). When the determination unit 132 determines that the in-vehicle device VE subject to the in-vehicle device control is not an influencing in-vehicle device EVE (i.e., is a non-influencing in-vehicle device NVE), the in-vehicle device control unit 134 causes the non-influencing in-vehicle device NVE (the speaker 108, the display unit 110) to execute the control (e.g., sound control, display control) indicated by the agent data (step S310).

When the determination unit 132 determines that the in-vehicle device VE is an influencing in-vehicle device EVE, the output control unit 126 requests the occupant's approval to execute the control by causing the speaker 108 to output the sound data generated by the sound synthesis unit 124 for requesting the occupant's approval, or by causing the display unit 110 to display the prompt approval image (step S312). The in-vehicle device control unit 134 determines whether an input indicating approval has been received by the first general switch 112 (step S314). When the approval is received, the in-vehicle device control unit 134 causes the influencing in-vehicle device EVE to execute the in-vehicle device control indicated by the agent data (step S310). When no input indicating approval is received by the first general switch 112 within a predetermined period of time, the in-vehicle device control unit 134 ends the process without executing the in-vehicle device control indicated by the agent data (step S316).
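The branching of this flowchart, from the group determination onward, can be summarized in a short sketch. The function name, the callback parameters, and the action strings are hypothetical; the speaker, display, and switch are replaced by injected callables so that only the decision logic is shown.

```python
from typing import Callable, List, Set


def run_agent_side_control(
    device: str,
    control: str,
    influencing_devices: Set[str],
    request_approval: Callable[[str], None],
    approval_received: Callable[[], bool],
) -> List[str]:
    """Return the actions taken, in order, for one piece of agent data."""
    actions: List[str] = []
    if device not in influencing_devices:
        # Non-influencing device (NVE): execute the control immediately.
        actions.append(f"execute:{control}")
        return actions
    # Influencing device (EVE): ask the occupant by sound and/or image.
    request_approval(control)
    actions.append("request_approval")
    if approval_received():
        # Approval arrived through the general switch in time.
        actions.append(f"execute:{control}")
    else:
        # No approval within the predetermined time: do not execute.
        actions.append("end_without_executing")
    return actions
```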

Fig. 10 is a flowchart showing an example of the flow of processing of the server apparatus 200 according to the first embodiment. First, the server-side communication unit 202 acquires audio data from the agent device 100 (step S200). Next, the speech section extraction unit 214 extracts a speech section included in the audio data (step S202). Next, the voice recognition unit 216 recognizes the speech content from the audio data in the extracted speech section. Specifically, the voice recognition unit 216 converts the audio data into text data and finally recognizes the sentence included in the text data (step S204). The agent data generation unit 222 generates agent data based on the meaning of the entire speech content (step S206). Next, the communication control unit 224 of the server-side control unit 210 transmits the agent data to the agent apparatus 100 via the server-side communication unit 202 (step S208). The processing of this flowchart then ends.
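Viewed as software, steps S202 to S206 form a three-stage pipeline. The sketch below shows only that composition; each stage is passed in as a callable, and the function name, the type aliases, and the way recognized texts are joined are assumptions rather than details from this description.

```python
from typing import Any, Callable, Dict, List


def run_server_pipeline(
    audio: Any,
    extract_sections: Callable[[Any], List[Any]],
    recognize: Callable[[Any], str],
    generate_agent_data: Callable[[str], Dict[str, Any]],
) -> Dict[str, Any]:
    """Chain section extraction, recognition, and agent data generation."""
    sections = extract_sections(audio)                 # step S202
    texts = [recognize(s) for s in sections]           # step S204
    utterance = " ".join(t for t in texts if t)        # whole speech content
    return generate_agent_data(utterance)              # step S206
```

Injecting the stages makes it straightforward to run the same chain on the server or, as in a configuration without a server, on the in-vehicle side.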

[ Another example of the sound control and display control that prompt approval ]

In the above, the case has been described where, when the in-vehicle device control is control of an influencing in-vehicle device EVE, the sound synthesis unit 124 generates a sound that asks whether the in-vehicle device control may be executed and prompts the occupant's approval, but the present invention is not limited to this. For example, the answer information 234 may be information in which sound control that asks whether the in-vehicle device control may be executed and prompts the occupant's approval is associated in advance with the control content for the influencing in-vehicle device EVE. Similarly, the answer information 234 may be information in which display control for displaying the prompt approval image is associated in advance with the control content for the influencing in-vehicle device EVE. In this case, the sound synthesis unit 124 and the output control unit 126 execute the sound control and the display control indicated by the agent data.

According to the agent system 1 of the first embodiment described above, even when the speech content of the user (occupant) relating to the control of the in-vehicle device VE is erroneously recognized, or when the user's speech relating to the control of the in-vehicle device VE is itself mistaken, operation of the in-vehicle device VE based on that error can be suppressed. This makes it possible to maintain the safety of the vehicle M while still allowing the occupant to instruct the operation of the in-vehicle device VE easily.

< Second embodiment >

In the first embodiment described above, the case where the agent apparatus 100 and the server apparatus 200 mounted on the vehicle M are different apparatuses from each other has been described, but the present invention is not limited to this. For example, the components of the server apparatus 200 related to the agent function may be included in the components of the agent apparatus 100. In this case, the server apparatus 200 may function as a virtual machine virtually implemented by the agent side control unit 120 of the agent apparatus 100. Hereinafter, the agent device 100A including the constituent elements of the server device 200 will be described as a second embodiment. In this case, the agent device 100A is an example of "agent system". In the second embodiment, the same components as those in the first embodiment are denoted by the same reference numerals, and detailed description thereof is omitted.

Fig. 11 is a diagram showing an example of the agent apparatus 100A according to the second embodiment. The agent device 100A includes, for example, an agent-side communication unit 102, a microphone 106, a speaker 108, a display unit 110, a first general switch 112, a second general switch 113, an agent-side control unit 120A, and an agent-side storage unit 150A. The agent-side control unit 120A includes, for example, an acquisition unit 122, a sound synthesis unit 124, an output control unit 126, a communication control unit 128, a determination unit 132, an in-vehicle device control unit 134, a speech section extraction unit 214A, a voice recognition unit 216A, and an agent data generation unit 222A.

The agent-side storage unit 150A stores, for example, the in-vehicle device information 152, the answer information 234A, and the like in addition to the program referred to by the processor. The answer information 234A may be updated with the latest information acquired from the server apparatus 200.

[ Process flow ]

Fig. 12 is a flowchart showing a flow of a series of processes of the agent device 100A according to the second embodiment. The flow of the processing described below may be repeatedly executed at a predetermined timing, as in the flow of the processing of the first embodiment. First, the acquisition unit 122 of the agent-side control unit 120A determines whether the sound data of the occupant has been collected by the microphone 106 (step S400). The acquisition unit 122 waits until the sound data of the occupant is collected. Next, the speech section extraction unit 214A extracts a speech section included in the sound data (step S402). Next, the voice recognition unit 216A recognizes the speech content from the sound data in the extracted speech section. Specifically, the sound data is converted into text data, and the sentence included in the text data is finally recognized (step S404). The agent data generation unit 222A generates agent data based on the meaning of the entire speech content (step S406).

When the generated agent data includes control content, the determination unit 130 identifies the in-vehicle device VE to be controlled based on the in-vehicle device information 152 (step S408). The determination unit 132 determines whether the in-vehicle device VE identified by the determination unit 130 is an influencing in-vehicle device EVE (step S410). When the determination unit 132 determines that the in-vehicle device VE subject to the in-vehicle device control is not an influencing in-vehicle device EVE (i.e., is a non-influencing in-vehicle device NVE), the in-vehicle device control unit 134 causes the non-influencing in-vehicle device NVE (the speaker 108, the display unit 110, etc.) to execute the control (e.g., sound control, display control) indicated by the agent data (step S412).

When the determination unit 132 determines that the in-vehicle device VE subject to the in-vehicle device control is an influencing in-vehicle device EVE, the output control unit 126 requests the occupant's approval to execute the control by causing the speaker 108 to output the sound data generated by the sound synthesis unit 124 for requesting approval of the control of the influencing in-vehicle device EVE, and by causing the display unit 110 to display the prompt approval image (step S414). The in-vehicle device control unit 134 determines whether an input indicating approval has been received by the first general switch 112 (step S416). When the approval is received, the in-vehicle device control unit 134 causes the influencing in-vehicle device EVE to execute the in-vehicle device control indicated by the agent data (step S412). When no input indicating approval is received by the first general switch 112 within a predetermined period of time, the in-vehicle device control unit 134 ends the process without executing the in-vehicle device control indicated by the agent data (step S418).

According to the agent device 100A of the second embodiment described above, in addition to the same effects as those of the first embodiment, it is not necessary to perform communication with the server device 200 via the network NW every time a sound from an occupant is acquired, and thus the content of a speech can be recognized more quickly. In addition, even in a state where the vehicle M cannot communicate with the server apparatus 200, it is possible to generate the agent data and provide information to the occupant.

While modes for carrying out the present invention have been described above using embodiments, the present invention is not limited to these embodiments, and various modifications and substitutions can be made without departing from the scope of the present invention.

For example, in the above-described embodiments, the case where the vehicle is a four-wheeled motor vehicle has been described as an example, but the present invention is not limited thereto. For example, the vehicle may be another vehicle such as a two-wheeled motor vehicle or a transportation truck. The vehicle may also be a rental car, a shared car, or the like. In this case, for example, the agent device 100 may be installed in a plurality of rental cars, rental two-wheeled vehicles, shared cars, or the like. By conversing with the occupant, the agent device 100 then allows operations to be performed easily by voice even when the occupant is riding in the vehicle equipped with the agent device 100 for the first time or is unfamiliar with its operation. In addition, since occupants other than the driver are permitted to perform operations, the agent device 100 can request those occupants to perform an operation, and the load on the driver can thus be reduced.

Claims

1. A vehicle-mounted device control system, wherein,

The in-vehicle device control system includes:

an acquisition unit that acquires a sound including a speech content of an occupant who is riding in the vehicle;

an in-vehicle device control unit that is mounted on the vehicle and controls the operation of in-vehicle devices including a speaker and a display unit;

A voice recognition unit that recognizes a voice including a speech content of an occupant of the vehicle acquired by the acquisition unit;

a determination portion that determines the in-vehicle apparatus that instructs an action by the sound of the occupant recognized by the sound recognition portion;

A determination unit that determines whether or not the specified in-vehicle device is an in-vehicle device belonging to a predetermined group; and

a general switch,

The in-vehicle apparatus control section outputs at least one of a sound asking whether or not an in-vehicle apparatus belonging to the predetermined group can execute an action corresponding to the instruction and a prompt approval image prompting approval of execution of the action in the in-vehicle apparatus belonging to the predetermined group through the speaker or the display section when the determination section determines that the in-vehicle apparatus receiving the instruction is an in-vehicle apparatus belonging to the predetermined group, and controls the action of the in-vehicle apparatus receiving the instruction when an input indicating approval of execution of the instruction by the occupant is received through the general switch,

The in-vehicle device control system further includes a switch that causes the in-vehicle device control unit to control an operation of the in-vehicle devices belonging to the group other than the predetermined group when it is determined that the in-vehicle device that received the instruction is the in-vehicle device belonging to the group other than the predetermined group.

2. The in-vehicle apparatus control system according to claim 1, wherein,

The in-vehicle devices belonging to the predetermined group are in-vehicle devices that affect the behavior of the vehicle.

3. The in-vehicle apparatus control system according to claim 1, wherein,

The in-vehicle devices belonging to the prescribed group are in-vehicle devices corresponding to operations that are permitted only by the driver in the vehicle.

4. The in-vehicle apparatus control system according to claim 1, wherein,

The general switch is a switch that can be used for other purposes in addition to a scenario in which a predetermined input related to an audio instruction including an input indicating the agreement is accepted.

5. The in-vehicle apparatus control system according to claim 4, wherein,

When an input indicating the start of receiving sound is received through the general-purpose switch, the sound recognition unit starts to recognize sound including the content of the speech of the occupant collected by a microphone as the acquisition unit.

6. The in-vehicle apparatus control system according to claim 1 or 5, wherein,

The general switch is arranged on the steering wheel.

7. A vehicle-mounted device control apparatus, wherein,

The in-vehicle device control device includes:

A voice recognition unit that recognizes a voice including a speech content of the occupant acquired by the acquisition unit;

a general switch,

The in-vehicle apparatus control section outputs at least one of a sound asking whether or not an in-vehicle apparatus belonging to a predetermined group can execute an action corresponding to the instruction and a prompt approval image prompting approval of execution of the action in the in-vehicle apparatus belonging to the predetermined group through the speaker or the display section when the determination section determines that the in-vehicle apparatus receiving the instruction is an in-vehicle apparatus belonging to the predetermined group, and controls the action of the in-vehicle apparatus receiving the instruction when an input indicating approval of execution of the instruction by the occupant is received through the general switch,

The in-vehicle device control apparatus further includes a switch that causes the in-vehicle device control unit to control an operation of the in-vehicle devices belonging to the group other than the predetermined group when it is determined that the in-vehicle device that received the instruction is the in-vehicle device belonging to the group other than the predetermined group.

8. A vehicle-mounted device control method, wherein,

The in-vehicle device control method causes a single or a plurality of computers in an in-vehicle device control system provided with an acquisition unit that acquires sound including speech content of an occupant of a vehicle and a general-purpose switch to execute the steps of:

Identifying a sound that includes a speaking content of the occupant;

determining an in-vehicle apparatus that instructs an action by the sound of the identified occupant;

determining whether the determined vehicle-mounted device is a vehicle-mounted device belonging to a prescribed group;

when it is determined that the vehicle-mounted device that receives the instruction is a vehicle-mounted device belonging to the predetermined group, outputting, via a speaker or a display unit, at least one of a sound asking whether or not the vehicle-mounted device belonging to the predetermined group can execute an action corresponding to the instruction and a prompt approval image prompting approval of executing the action in the vehicle-mounted device belonging to the predetermined group; and

At this time, when an input indicating that the occupant agrees to execute the instruction is received through the general-purpose switch, an operation of the in-vehicle apparatus that receives the instruction is controlled,

The in-vehicle device control system further includes a switch that controls an operation of the in-vehicle devices belonging to the group other than the predetermined group when it is determined that the in-vehicle device that received the instruction is the in-vehicle device belonging to the group other than the predetermined group.

9. A storage medium, wherein,

The storage medium stores a program that is installed in a single or a plurality of computers in an in-vehicle device control system provided with an acquisition unit that acquires sound including a speech content of an occupant of a vehicle and a general-purpose switch, and that causes the computer to execute:

Identifying a sound that includes a speaking content of the occupant;

When it is determined that the in-vehicle device that accepted the instruction is an in-vehicle device belonging to the predetermined group, outputting, via a speaker or a display unit, at least one of a sound asking whether or not an in-vehicle device belonging to the predetermined group can execute an action corresponding to the instruction and a prompt approval image prompting approval of execution of the action in-vehicle devices belonging to the predetermined group; and