WO2023227129A1 - Voice interaction method, vehicle-machine terminal, vehicle and storage medium

Voice interaction method, vehicle-machine terminal, vehicle and storage medium

Info

Publication number
WO2023227129A1
Authority
WO
WIPO (PCT)
Prior art keywords
connection channel
vehicle
voice
server
channel
Prior art date
Application number
PCT/CN2023/096679
Inventor
郭华鹏
张岩
Original Assignee
广州小鹏汽车科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州小鹏汽车科技有限公司
Publication of WO2023227129A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W40/00 Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models
    • B60W40/08 Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models related to drivers or passengers
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00 Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W50/08 Interaction between the driver and the control system
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00 Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W50/08 Interaction between the driver and the control system
    • B60W50/14 Means for informing the driver, warning the driver or prompting a driver intervention
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/12 Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/14 Session management
    • H04L67/141 Setup of application sessions
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/30 Services specially adapted for particular environments, situations or purposes
    • H04W4/40 Services specially adapted for particular environments, situations or purposes for vehicles, e.g. vehicle-to-pedestrians [V2P]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W76/00 Connection management
    • H04W76/10 Connection setup
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W76/00 Connection management
    • H04W76/10 Connection setup
    • H04W76/15 Setup of multiple wireless link connections

Definitions

  • This application relates to the field of voice interaction technology, and in particular to a voice interaction method, a vehicle-machine terminal, a vehicle and a storage medium.
  • Regardless of whether a sound zone is engaged in voice interaction, each sound zone pre-occupies a connection channel to communicate with the server, so system resources are excessively occupied.
  • This application provides a voice interaction method, a vehicle-machine terminal, a vehicle and a storage medium.
  • A voice interaction method of this application is used for vehicles. The voice interaction method includes:
  • when the vehicle and the server perform voice interaction, determining the maximum number of connection channels between the vehicle and the server, the connection channels including at least one core connection channel;
  • creating the core connection channel between the vehicle and the server;
  • according to the voice instructions collected by the vehicle, using the created core connection channel to communicate with the server to process the voice broadcast requirements corresponding to the voice instructions;
  • when the core connection channel cannot meet the current multi-channel voice broadcast requirements, creating a new connection channel between the vehicle and the server, until the number of created connection channels reaches the maximum number.
  • The above voice interaction method first creates one core connection channel to handle the voice broadcast requirements corresponding to voice instructions.
  • Only when the core connection channel cannot meet multi-channel voice broadcast requirements is a new connection channel created. This reduces the interaction between the vehicle and the server, handles multi-channel broadcast scenarios with fewer connection resources and avoids excessive occupation of system resources.
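  • Determining the maximum number of connection channels between the vehicle and the server, the connection channels including at least one core connection channel, includes:
  • determining the interaction mode of the vehicle according to a selection instruction, different interaction modes corresponding to different maximum numbers of connection channels;
  • determining the maximum number of connection channels based on the determined interaction mode. In this way, the maximum number of connection channels can be determined based on requirements.
  • The interaction modes include three modes: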
  • Mode 1 establishes 1 core connection channel between the vehicle and the server, and the maximum number of connection channels is 3;
  • Mode 2 establishes 1 core connection channel between the vehicle and the server, and the maximum number of connection channels is 2;
  • Mode 3 establishes 1 core connection channel between the vehicle and the server, and the maximum number of connection channels is 1.
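  • The voice interaction method includes:
  • when the current connection channel needs to perform a voice broadcast, marking the label of the current connection channel as busy;
  • after the current connection channel finishes the voice broadcast, resetting the label of the current connection channel to idle.
  • The voice interaction method includes:
  • upon receiving the audio file returned by the server in response to the voice instruction, obtaining the label of the connection channel;
  • when the label of the connection channel is idle, using the connection channel to perform the voice broadcast;
  • when the label of the connection channel is busy, creating a new connection channel and using the new connection channel for the voice broadcast.
  • The voice interaction method includes:
  • when creating a new connection channel, setting an expiration time for the new connection channel;
  • within the expiration time, when the new connection channel performs a voice broadcast, marking its label as busy and resetting the expiration time;
  • when the label of the new connection channel is idle after the expiration time, removing the new connection channel.
  • The voice interaction method includes:
  • pre-dividing the vehicle cabin into several sound zones;
  • determining the correspondence between the connection channels and the vehicle sound zones.
  • In this way, the vehicle sound zones and the connection channels can be mapped to each other.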
  • A vehicle-machine terminal of the present application includes a memory, a processor, and a computer program stored in the memory. When the computer program is executed by the processor, the steps of any of the above voice interaction methods are implemented.
  • A vehicle of the present application includes the above-mentioned vehicle-machine terminal.
  • A computer-readable storage medium of the present application has a computer program stored thereon, and when the computer program is executed by a processor, the steps of any of the above voice interaction methods are implemented.
  • The above-mentioned vehicle-machine terminal, vehicle and computer-readable storage medium first create one core connection channel to handle the voice broadcast requirements corresponding to voice instructions.
  • Only when the core connection channel cannot meet multi-channel voice broadcast requirements is a new connection channel created. This reduces the interaction between the vehicle and the server, handles multi-channel broadcast scenarios with fewer connection resources and avoids excessive occupation of system resources.
  • Figure 1 is a schematic flow chart of the voice interaction method of the present application;
  • Figure 2 is a schematic diagram of the modes of the voice interaction method of the present application;
  • Figure 3 is a schematic diagram of the interaction between the vehicle audio and the server of the present application;
  • Figure 4 is a schematic structural diagram of the vehicle of the present application.
  • Referring to Figure 1, a voice interaction method according to an embodiment of this application is used for vehicles. The voice interaction method includes:
  • Step 11: when the vehicle and the server perform voice interaction, determine the maximum number of connection channels between the vehicle and the server, the connection channels including at least one core connection channel;
  • Step 13: create a core connection channel between the vehicle and the server;
  • Step 15: according to the voice instructions collected by the vehicle, use the created core connection channel to communicate with the server to process the voice broadcast requirements corresponding to the voice instructions;
  • Step 17: when the core connection channel cannot meet the current multi-channel voice broadcast requirements, create a new connection channel between the vehicle and the server, until the number of created connection channels reaches the maximum number.
  • The above voice interaction method first creates one core connection channel to handle the voice broadcast requirements corresponding to voice instructions.
  • Only when the core connection channel cannot meet multi-channel voice broadcast requirements is a new connection channel created. This reduces the interaction between the vehicle and the server, handles multi-channel broadcast scenarios with fewer connection resources and avoids excessive occupation of system resources.
  • Specifically, the connection channel between the vehicle and the server is used for interaction between the two.
  • For example, the vehicle can collect the voice instructions issued by the user and send them to the server through an established connection channel. The server can process the voice instruction, for example with natural language understanding, to obtain the operation it requests, and generate a reply audio file with a TTS engine.
  • The server sends the audio file to the vehicle through the established connection channel, and the vehicle controls its audio system to perform the voice broadcast.
  • In one embodiment, the created connection channel may be a WebSocket (WS) connection channel. In other implementations, the created connection channel can be of another type and is not limited to a WebSocket connection channel.
  • The core connection channel can be understood as the connection channel that guarantees interaction between the vehicle and the server.
  • One core connection channel is basically enough for broadcasting in most driving scenarios. Scenarios in which multiple connection channels for different vehicle sound zones broadcast simultaneously account for a small proportion; more often, a single connection channel serves different sound zones in turn.
  • The vehicle sound zone can be determined according to the user's position in the vehicle.
  • For example, the vehicle sound zones can include the main driver sound zone, the passenger sound zone, the rear sound zone and the whole car sound zone.
  • The main driver sound zone can correspond to the main driver, the passenger sound zone to the front passenger, the rear sound zone to the rear passengers, and the whole car sound zone to all drivers and passengers in the car.
  • The rear sound zone may further include a second row sound zone and a third row sound zone.
  • The second row sound zone may correspond to the second row of passengers, and the third row sound zone may correspond to the third row of passengers.
  • In one embodiment, the whole car audio includes, but is not limited to, the speaker at the headrest of the main driver's seat, the headphone jack provided in front of the passenger seat, the speaker at the headrest of the passenger seat, the front row audio and the rear row audio.
  • The front row audio includes the speakers on the center console and the front door speakers.
  • The rear row audio includes the rear door speakers and the trunk speakers.
  • For the main driver sound zone, the voice broadcast can be performed by the main driver's audio, which can be a speaker installed at the headrest of the main driver's seat.
  • For the passenger sound zone, the voice broadcast can be performed by the passenger audio.
  • The passenger audio can be a headphone jack set in front of the passenger seat, and/or a speaker installed at the headrest of the passenger seat.
  • For the rear sound zone, the broadcast can be performed by the rear row audio.
  • Voice instructions in the car can come from any vehicle sound zone. A sound collection device (such as a microphone) installed in each sound zone can be used to identify which sound zone's user issued the voice instruction.
  • In one embodiment, the broadcast audio corresponding to the core connection channel is the whole car audio. That is, no matter which sound zone the voice instruction comes from, the vehicle uses the core connection channel to communicate with the server, receives the audio file returned by the server, and performs the voice broadcast through the whole car audio. The broadcast audio corresponding to the core connection channel can also be other audio, such as the rear row audio or the front row audio, and is not limited to the whole car audio.
  • When the core connection channel cannot meet the current multi-channel voice broadcast requirements, the vehicle creates new connection channels with the server to meet them, until the number of created connection channels reaches the maximum number.
  • In some embodiments, step 11 includes:
  • determining the interaction mode of the vehicle according to a selection instruction, different interaction modes corresponding to different maximum numbers of connection channels;
  • determining the maximum number of connection channels according to the determined interaction mode.
  • In this way, the maximum number of connection channels can be determined based on requirements.
  • Specifically, the selection instruction can be triggered by the user.
  • For example, the vehicle can include a central control screen and a resource management module.
  • After the vehicle is powered on, the central control screen can display a setting interface through which the user selects the interaction mode.
  • When the user touches the corresponding button on the central control screen, a selection instruction is generated.
  • The resource management module determines the interaction mode of the vehicle based on the selection instruction, and determines the control logic for the connection channels created with the server based on the interaction mode.
  • In some embodiments, the interaction modes include three modes:
  • Mode 1 establishes 1 core connection channel between the vehicle and the server, and the maximum number of connection channels is 3;
  • Mode 2 establishes 1 core connection channel between the vehicle and the server, and the maximum number of connection channels is 2;
  • Mode 3 establishes 1 core connection channel between the vehicle and the server, and the maximum number of connection channels is 1.
  • Specifically, different maximum numbers of connection channels let the user decide how system resources are allocated.
  • Users can choose different interaction modes according to the driving scenario. For example, a user with a greater demand for voice interaction can choose an interaction mode with a larger maximum number of connection channels. A user with little demand for voice interaction can choose an interaction mode with a smaller maximum number of connection channels, freeing more system resources for other processes.
  • The resource management module manages the connection channels created in the above three modes.
  • By default there is one core connection channel in every mode, and the maximum number of connections depends on the specific interaction mode.
  • Non-core connection channels are the connection channels other than core connection channels. The number of non-core connection channels is the maximum number of connections minus the number of core connection channels.
  • This control can follow the management strategy of the Java thread pool: in the current interaction mode, if one core connection channel cannot handle the multi-channel voice broadcast in this mode, new connection channels are created until the maximum number of connections is reached.
  • For example, in Mode 1, the created core connection channel is used to communicate with the server to process the voice broadcast requirements corresponding to the voice instructions.
  • When the core connection channel cannot meet the current multi-channel voice broadcast requirements, new connection channels are created between the vehicle and the server until the number of created connection channels reaches 3.
  • The interaction mode is not limited to the above three modes, and may also include other interaction modes.
  • The number of core connection channels is not limited to 1 and may be another number; no specific limit is imposed here. The number of core connection channels corresponding to each mode can be the same or different.
  • In some embodiments, the voice interaction method includes:
  • when the current connection channel needs to perform a voice broadcast, marking the label of the current connection channel as busy;
  • after the current connection channel finishes the voice broadcast, resetting the label of the current connection channel to idle.
  • Specifically, the resource management module opens one core connection channel by default, that is, it creates a connection channel between the vehicle and the server. In Mode 1, up to 3 connection channels can exist at the same time, so up to 3 voice broadcasts may run simultaneously. By default this mode uses one core connection for voice broadcast. Whenever the current connection channel needs to perform a voice broadcast, the resource management module marks its label as busy, and resets the label to idle after the voice broadcast is completed.
  • In some embodiments, the voice interaction method includes:
  • upon receiving the audio file returned by the server in response to the voice instruction, obtaining the label of the connection channel;
  • when the label of the connection channel is idle, using the connection channel for the voice broadcast;
  • when the label of the connection channel is busy, creating a new connection channel and using the new connection channel for the voice broadcast.
  • Specifically, the resource management module checks the label of the connection channel. If the connection channel is idle, it is used for the voice broadcast. If the connection channel is busy, a new connection channel is created to ensure that the new audio file is played. This channel-creation strategy ensures that content is broadcast via TTS through as few connection channels as possible.
  • In some embodiments, the voice interaction method includes:
  • when creating a new connection channel, setting an expiration time for the new connection channel;
  • within the expiration time, when the new connection channel performs a voice broadcast, marking its label as busy and resetting the expiration time;
  • when the label of the new connection channel is idle after the expiration time, removing the new connection channel.
  • Specifically, the resource management module sets an expiration time for each non-core connection channel.
  • The expiration time (for example, 1 minute) is specified when the non-core connection channel is created.
  • The expiration time needs to be dynamically updated.
  • Each time the connection channel is labeled busy, the expiration time is reset, for example to 1 minute. If the label of the connection channel is still idle after 1 minute, the non-core connection channel is removed. Note that only non-core connection channels are maintained dynamically. The core connection channel never expires, which guarantees interaction between the vehicle and the server. The expiration time can also be set to other values and is not limited to 1 minute.
  • After the resource management module selects the corresponding strategy, it can start creating the corresponding number of connection channels with the server (such as a TTS cloud). After multiple connection channels are created, the interaction logic of each connection channel is started, and the channels do not interfere with one another.
  • In some embodiments, the voice interaction method includes:
  • pre-dividing the vehicle cabin into several sound zones;
  • determining the correspondence between the connection channels and the vehicle sound zones.
  • In this way, the vehicle sound zones and the connection channels can be mapped to each other.
  • Specifically, the vehicle cabin can be pre-divided into several sound zones.
  • For example, the vehicle cabin can be pre-divided into the main driver sound zone, the passenger sound zone, the rear sound zone and the whole car sound zone according to the user's position in the car.
  • The main driver sound zone can correspond to the main driver, the passenger sound zone to the front passenger, the rear sound zone to the rear passengers, and the whole car sound zone to all drivers and passengers in the car.
  • The rear sound zone may further include a second row sound zone and a third row sound zone.
  • The second row sound zone may correspond to the second row of passengers, and the third row sound zone may correspond to the third row of passengers.
  • In each interaction mode, the correspondence between the connection channels and the vehicle sound zones can be determined in advance.
  • For example, in Mode 1 the maximum number of connection channels is 3. The core connection channel can correspond to the whole car sound zone, one of the non-core connection channels can correspond to the main driver sound zone, and the other non-core connection channel can correspond to the passenger sound zone.
  • When the interaction mode selected by the user is Mode 1, the vehicle first creates a core connection channel to communicate with the server. After the first voice instruction is obtained, for example from the main driver sound zone, the vehicle communicates with the server (cloud) through the core connection channel. The server receives the first voice instruction, processes it to obtain the corresponding reply audio file, and returns the file to the vehicle through the core connection channel. The vehicle receives the audio file and broadcasts it through the whole car audio.
  • When the core connection channel cannot meet multi-channel broadcast requirements, for example when the vehicle receives a second voice instruction from the passenger sound zone while it is broadcasting over the core connection channel, the vehicle determines that the created core connection channel is busy.
  • A second connection channel, that is, a non-core connection channel, is then created and used to interact with the server to obtain the corresponding reply audio file, which is broadcast through the passenger sound zone audio. The front passenger then hears the voice reply from the passenger sound zone.
  • The resource management module interacts with the server according to the interaction mode selected by the user. If the interaction mode selected by the user is mode three, all three speakers will perform voice broadcasts. After the vehicle is powered on, the vehicle first creates a core connection channel with the server to handle the voice broadcast requirements of the three speakers.
  • The resource management module creates a second connection channel (that is, a non-core connection channel, with its expiration time set to 1 minute) so that multiple sound zones can broadcast simultaneously. After the non-core connection channel completes the broadcast, its label is reset to idle.
  • The vehicle checks the label of the non-core connection channel every 1 minute. If the current label of the non-core connection channel is idle, and the difference between the current time and the last busy time exceeds 1 minute, the vehicle disconnects the non-core connection channel from the server. Only one core connection is then kept to interact with the server for TTS synthesis.
  • The voice interaction method in the embodiment of this application can achieve at least the following advantages:
  • The user experience is good and responses are better targeted: after issuing a voice instruction, the user clearly receives feedback on the operation while disturbing other users as little as possible;
  • Overall efficiency is high: in-car audio channel resources are fully used, and resources are allocated reasonably when multiple sound zones interact at the same time, so that the tasks of each sound zone run smoothly as far as possible and do not fail for lack of resources. The connection channels to the server are thereby controlled more precisely.
  • A vehicle-machine terminal 100 according to an embodiment of this application includes a memory 12, a processor 14 and a computer program stored in the memory 12. When the computer program is executed by the processor 14, the steps of the voice interaction method of any of the above embodiments are implemented.
  • A vehicle 200 includes the vehicle-machine terminal 100 of the above embodiment.
  • The vehicle 200 also includes a body 16, and the vehicle-machine terminal 100 is installed on the body 16.
  • Embodiments of the present application provide a computer-readable storage medium on which a computer program is stored. When the computer program is executed by the processor 14, the steps of the voice interaction method of any of the above embodiments are implemented.
  • For example, the voice interaction method implemented when the computer program is executed by the processor 14 includes:
  • Step 11: when the vehicle and the server perform voice interaction, determine the maximum number of connection channels between the vehicle 200 and the server, the connection channels including at least one core connection channel;
  • Step 13: create a core connection channel between the vehicle 200 and the server;
  • Step 15: according to the voice instructions collected by the vehicle, use the created core connection channel to communicate with the server to process the voice broadcast requirements corresponding to the voice instructions;
  • Step 17: when the core connection channel cannot meet the current multi-channel voice broadcast requirements, create a new connection channel between the vehicle 200 and the server, until the number of created connection channels reaches the maximum number.
  • The above-mentioned vehicle-machine terminal 100, vehicle 200 and computer-readable storage medium first create one core connection channel to handle the voice broadcast requirements corresponding to voice instructions.
  • Only when the core connection channel cannot meet multi-channel voice broadcast requirements is a new connection channel created. This reduces the interaction between the vehicle 200 and the server, handles multi-channel broadcast scenarios with fewer connection resources and avoids excessive occupation of system resources.
  • The methods of the above embodiments can be implemented by software together with the necessary general-purpose hardware platform. They can also be implemented in hardware, but in many cases the former is the better implementation.
  • The technical solution of the present application, in essence or in the part that contributes over the existing technology, can be embodied in the form of a software product.
  • The computer software product is stored in one of the above storage media (such as ROM/RAM, magnetic disk or optical disk) and includes several instructions that cause a terminal device (which can be a mobile phone, a computer, a server, a home appliance, a network device, etc.) to execute the methods of the various embodiments of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Automation & Control Theory (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Transportation (AREA)
  • Human Computer Interaction (AREA)
  • Mechanical Engineering (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

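A voice interaction method, a vehicle-machine terminal, a vehicle and a storage medium. The voice interaction method includes: when the vehicle and a server perform voice interaction, determining the maximum number of connection channels between the vehicle and the server, the connection channels including at least one core connection channel (11); creating the core connection channel between the vehicle and the server (13); according to voice instructions collected by the vehicle, using the created core connection channel to communicate with the server to process the voice broadcast requirements corresponding to the voice instructions (15); and when the core connection channel cannot meet the current multi-channel voice broadcast requirements, creating new connection channels between the vehicle and the server until the number of created connection channels reaches the maximum number (17).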

Description

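Voice interaction method, vehicle-machine terminal, vehicle and storage medium
This application claims priority to Chinese patent application No. 202210586081.2, filed on May 27, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of voice interaction technology, and in particular to a voice interaction method, a vehicle-machine terminal, a vehicle and a storage medium.
Background
As vehicle technology develops, new models have begun to support multitasking: one vehicle can have several sound zones interacting with users at the same time, so the interaction requests of each sound zone can be answered through different TTS channels and playback zones. However, regardless of whether a sound zone is engaged in voice interaction, each sound zone pre-occupies a connection channel to communicate with the server, so system resources are excessively occupied.
Technical Problem
This application provides a voice interaction method, a vehicle-machine terminal, a vehicle and a storage medium.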
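Technical Solution
A voice interaction method of this application is used for a vehicle, and the voice interaction method includes:
when the vehicle and a server perform voice interaction, determining the maximum number of connection channels between the vehicle and the server, the connection channels including at least one core connection channel;
creating the core connection channel between the vehicle and the server;
according to voice instructions collected by the vehicle, using the created core connection channel to communicate with the server to process the voice broadcast requirements corresponding to the voice instructions;
when the core connection channel cannot meet the current multi-channel voice broadcast requirements, creating a new connection channel between the vehicle and the server, until the number of created connection channels reaches the maximum number.
The above voice interaction method first creates one core connection channel to handle the voice broadcast requirements corresponding to voice instructions, and creates a new connection channel only when the core connection channel cannot meet multi-channel voice broadcast requirements. This reduces the interaction between the vehicle and the server, handles multi-channel broadcast scenarios with fewer connection resources and avoids excessive occupation of system resources.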
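The four steps can be sketched as a small on-demand channel pool. This is a minimal illustration under stated assumptions, not the applicant's implementation: the class `ChannelPool`, the method `ensure_capacity` and the string channel identifiers are hypothetical, and a real system would open network connections rather than append strings to a list.

```python
class ChannelPool:
    """Hypothetical sketch: one core channel, further channels only on demand."""

    def __init__(self, max_channels: int) -> None:
        # Determine the maximum first; it must leave room for a core channel.
        if max_channels < 1:
            raise ValueError("at least one core connection channel is required")
        self.max_channels = max_channels
        # Create the core channel immediately.
        self.channels = ["core"]

    def ensure_capacity(self, concurrent_broadcasts: int) -> int:
        """Grow the pool only when the open channels are not enough.

        Returns the number of channels open afterwards, capped at the maximum.
        """
        wanted = min(max(concurrent_broadcasts, 1), self.max_channels)
        while len(self.channels) < wanted:
            self.channels.append(f"non-core-{len(self.channels)}")
        return len(self.channels)


pool = ChannelPool(max_channels=3)
print(pool.ensure_capacity(1))  # a single broadcast is served by the core channel
print(pool.ensure_capacity(2))  # a second simultaneous broadcast adds one channel
print(pool.ensure_capacity(5))  # demand above the limit is capped at the maximum
```

The point of the design is that a channel exists only once a broadcast actually needs it, instead of one channel being reserved per sound zone in advance.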
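Determining the maximum number of connection channels between the vehicle and the server, the connection channels including at least one core connection channel, includes:
determining the interaction mode of the vehicle according to a selection instruction, different interaction modes corresponding to different maximum numbers of connection channels;
determining the maximum number of connection channels according to the determined interaction mode. In this way, the maximum number of connection channels can be determined based on requirements.
The interaction modes include three modes:
Mode 1 establishes 1 core connection channel between the vehicle and the server, and the maximum number of connection channels is 3;
Mode 2 establishes 1 core connection channel between the vehicle and the server, and the maximum number of connection channels is 2;
Mode 3 establishes 1 core connection channel between the vehicle and the server, and the maximum number of connection channels is 1.
In this way, the user is given a choice, which improves the user experience.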
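The three modes amount to a small lookup table. The sketch below uses the channel counts listed above; the names `ChannelLimits`, `MODE_LIMITS` and `limits_for` are hypothetical and chosen only for illustration. The number of non-core channels is derived as the maximum minus the number of core channels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelLimits:
    core: int
    maximum: int

    @property
    def non_core(self) -> int:
        # Non-core channels are whatever the mode allows beyond the core ones.
        return self.maximum - self.core


# Channel counts come from the three modes in the text; the dict is illustrative.
MODE_LIMITS = {
    1: ChannelLimits(core=1, maximum=3),
    2: ChannelLimits(core=1, maximum=2),
    3: ChannelLimits(core=1, maximum=1),
}


def limits_for(mode: int) -> ChannelLimits:
    """Resolve a user-selected mode number to its channel limits."""
    try:
        return MODE_LIMITS[mode]
    except KeyError:
        raise ValueError(f"unknown interaction mode: {mode}") from None


for mode in sorted(MODE_LIMITS):
    limits = limits_for(mode)
    print(mode, limits.maximum, limits.non_core)
```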
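The voice interaction method includes:
when the current connection channel needs to perform a voice broadcast, marking the label of the current connection channel as busy;
after the current connection channel finishes the voice broadcast, resetting the label of the current connection channel to idle.
In this way, a state machine strategy can be implemented.
The voice interaction method includes:
upon receiving the audio file returned by the server in response to the voice instruction, obtaining the label of the connection channel;
when the label of the connection channel is idle, using the connection channel for the voice broadcast;
when the label of the connection channel is busy, creating a new connection channel and using the new connection channel for the voice broadcast.
In this way, a strategy for adding connection channels can be implemented.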
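The busy/idle label and the channel-adding rule can be sketched together as follows. The names `Channel`, `ResourceManager`, `acquire` and `release` are hypothetical. Two points are assumptions rather than statements from the text: every open channel is scanned for an idle label before a new one is created, and `None` is returned when all channels are busy and the maximum has been reached.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Channel:
    name: str
    busy: bool = False  # label: False means idle, True means busy


@dataclass
class ResourceManager:
    max_channels: int
    channels: List[Channel] = field(default_factory=lambda: [Channel("core")])

    def acquire(self) -> Optional[Channel]:
        """Pick a channel for an audio file that has just arrived."""
        # Reuse an idle channel if there is one, and mark it busy.
        for channel in self.channels:
            if not channel.busy:
                channel.busy = True
                return channel
        # All channels are busy: add a new one while the limit allows it.
        if len(self.channels) < self.max_channels:
            channel = Channel(f"non-core-{len(self.channels)}", busy=True)
            self.channels.append(channel)
            return channel
        # Assumption: behaviour at the limit is not specified in the text.
        return None

    def release(self, channel: Channel) -> None:
        """Reset the label to idle once the broadcast has finished."""
        channel.busy = False


manager = ResourceManager(max_channels=3)
first = manager.acquire()   # the core channel, now busy
second = manager.acquire()  # core is busy, so a non-core channel is created
manager.release(first)      # the core channel becomes idle again
third = manager.acquire()   # the idle core channel is reused
print(first.name, second.name, third.name, len(manager.channels))
```

Reusing idle channels before creating new ones keeps the number of open channels as small as the current load allows.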
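The voice interaction method includes:
when creating a new connection channel, setting an expiration time for the new connection channel;
within the expiration time, when the new connection channel performs a voice broadcast, marking the label of the new connection channel as busy and resetting the expiration time;
when the label of the new connection channel is idle after the expiration time, removing the new connection channel.
In this way, an expiration-based removal strategy can be implemented.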
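A minimal sketch of the expiration rule follows, with the clock passed in explicitly so the behaviour is easy to check. The names `NonCoreChannel`, `EXPIRY_SECONDS` and `sweep` are hypothetical. The 60-second value follows the 1-minute example given elsewhere in this document, and core channels are simply never placed in the swept list because they never expire.

```python
from dataclasses import dataclass, field
from typing import List

EXPIRY_SECONDS = 60.0  # example value; the expiration time may be set differently


@dataclass
class NonCoreChannel:
    created_at: float
    busy: bool = False
    expires_at: float = field(init=False)

    def __post_init__(self) -> None:
        # The expiration time is set when the channel is created.
        self.expires_at = self.created_at + EXPIRY_SECONDS

    def start_broadcast(self, now: float) -> None:
        # Marking the channel busy also resets its expiration time.
        self.busy = True
        self.expires_at = now + EXPIRY_SECONDS

    def finish_broadcast(self) -> None:
        self.busy = False

    def is_expired(self, now: float) -> bool:
        # Only an idle channel whose expiration time has passed is removable.
        return not self.busy and now >= self.expires_at


def sweep(channels: List[NonCoreChannel], now: float) -> List[NonCoreChannel]:
    """Return the non-core channels that should stay open at time `now`."""
    return [channel for channel in channels if not channel.is_expired(now)]


channel = NonCoreChannel(created_at=0.0)
channel.start_broadcast(now=30.0)    # expiration moves to t = 90
channel.finish_broadcast()
print(len(sweep([channel], now=80.0)))  # still within the expiration time
print(len(sweep([channel], now=90.0)))  # idle and expired, so it is dropped
```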
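The voice interaction method includes:
pre-dividing the vehicle cabin into several sound zones;
determining the correspondence between the connection channels and the vehicle sound zones.
In this way, the vehicle sound zones and the connection channels can be mapped to each other.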
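The correspondence can be held in a plain mapping. The sketch below encodes the Mode 1 example given elsewhere in this document (core channel for the whole car sound zone, one non-core channel each for the main driver and passenger sound zones). The names `SoundZone`, `MODE_1_ZONE_BY_CHANNEL` and `playback_zone` are hypothetical, as is the choice of which non-core channel is paired with which zone.

```python
from enum import Enum


class SoundZone(Enum):
    WHOLE_CAR = "whole car"
    MAIN_DRIVER = "main driver"
    PASSENGER = "passenger"


# Illustrative Mode 1 mapping; the pairing of non-core channels is an assumption.
MODE_1_ZONE_BY_CHANNEL = {
    "core": SoundZone.WHOLE_CAR,
    "non-core-1": SoundZone.MAIN_DRIVER,
    "non-core-2": SoundZone.PASSENGER,
}


def playback_zone(channel_name: str) -> SoundZone:
    """Look up the sound zone whose audio plays replies from this channel."""
    return MODE_1_ZONE_BY_CHANNEL[channel_name]


print(playback_zone("core").value)
```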
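A vehicle-machine terminal of this application includes a memory, a processor and a computer program stored in the memory. When the computer program is executed by the processor, the steps of any of the above voice interaction methods are implemented.
A vehicle of this application includes the above vehicle-machine terminal.
A computer-readable storage medium of this application has a computer program stored thereon. When the computer program is executed by a processor, the steps of any of the above voice interaction methods are implemented.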
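Beneficial Effects
The above vehicle-machine terminal, vehicle and computer-readable storage medium first create one core connection channel to handle the voice broadcast requirements corresponding to voice instructions, and create a new connection channel only when the core connection channel cannot meet multi-channel voice broadcast requirements. This reduces the interaction between the vehicle and the server, handles multi-channel broadcast scenarios with fewer connection resources and avoids excessive occupation of system resources.
Additional aspects and advantages of this application will be given in part in the following description, and in part will become apparent from the following description or be learned through practice of this application.
Brief Description of the Drawings
The above and/or additional aspects and advantages of this application will become apparent and readily understood from the following description of the embodiments in conjunction with the accompanying drawings, in which:
Figure 1 is a schematic flow chart of the voice interaction method of this application;
Figure 2 is a schematic diagram of the modes of the voice interaction method of this application;
Figure 3 is a schematic diagram of the interaction between the vehicle audio and the server of this application;
Figure 4 is a schematic structural diagram of the vehicle of this application.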
本发明的实施方式
下面详细描述本申请的实施方式,所述实施方式的示例在附图中示出,其中自始至终相同或类似的标号表示相同或类似的元件或具有相同或类似功能的元件。下面通过参考附图描述的实施方式是示例性的,仅用于解释本申请,而不能理解为对本申请的限制。在本申请的描述中,“多个”的含义是两个或两个以上,除非另有明确具体的限定。
本文的公开提供了许多不同的实施方式或例子用来实现本申请的不同结构。为了简化本申请的公开,本文中对特定例子的部件和设置进行描述。当然,它们仅仅为示例,并且目的不在于限制本申请。
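Referring to FIG. 1, a voice interaction method according to an embodiment of this application is used for a vehicle, and the voice interaction method includes:
Step 11: when the vehicle and a server perform voice interaction, determining the maximum number of connection channels between the vehicle and the server, the connection channels including at least one core connection channel;
Step 13: creating the core connection channel between the vehicle and the server;
Step 15: according to a voice command collected by the vehicle, using the created core connection channel to communicate with the server so as to handle the voice broadcast demand corresponding to the voice command;
Step 17: when the core connection channel cannot meet the current multi-channel voice broadcast demand, creating a new connection channel between the vehicle and the server, until the number of created connection channels reaches the maximum number.
In the above voice interaction method, a core connection channel is created first to handle the voice broadcast demand corresponding to the voice command, and a new connection channel is created only when the core connection channel cannot meet a multi-channel voice broadcast demand. This reduces interaction between the vehicle and the server, handles multi-channel broadcast scenarios with fewer connection resources, and avoids over-occupying system resources.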
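Specifically, a connection channel between the vehicle and the server allows the two to interact. For example, the vehicle may collect a voice command issued by a user and send it to the server through an already created connection channel. The server may apply processing such as natural language understanding to the voice command to obtain the operation it requests, and generate a reply audio file with a TTS engine. The server sends the audio file to the vehicle through the already created connection channel, and the vehicle controls its speakers to perform the voice broadcast. In one embodiment, the created connection channel may be a websocket (WS) connection channel. It can be understood that in other embodiments the created connection channel may be another type of connection channel and is not limited to a websocket connection channel.
Determining the maximum number of connection channels between the vehicle and the server ensures that system resources are used reasonably. The core connection channel can be understood as the connection channel that guarantees that the vehicle and the server can interact. One core connection channel is generally enough for the broadcasts in most driving scenarios. Scenarios in which multiple connection channels for different vehicle sound zones broadcast at the same time account for a small share; more often, a single connection channel is used to broadcast to different sound zones in turn.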
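The vehicle sound zones may be determined according to the positions of users in the vehicle. For example, the vehicle sound zones may include a driver sound zone, a front passenger sound zone, a rear sound zone, and a whole-vehicle sound zone. The driver sound zone may correspond to the driver, the front passenger sound zone to the front passenger, the rear sound zone to the rear passengers, and the whole-vehicle sound zone to all occupants of the vehicle. Further, the rear sound zone may include a second-row sound zone and a third-row sound zone, corresponding to the second-row passengers and the third-row passengers respectively.
In one embodiment, the whole-vehicle speakers include, but are not limited to, the speaker at the headrest of the driver's seat, the earphone jack in front of the front passenger seat, the speaker at the headrest of the front passenger seat, the front speakers, and the rear speakers. The front speakers include the speakers on the center console and the front door speakers, and the rear speakers include the rear door speakers and the trunk speakers.
For the driver sound zone, the voice broadcast may be performed by the driver speaker, which may be the speaker at the headrest of the driver's seat. For the front passenger sound zone, the voice broadcast may be performed by the front passenger speaker, which may be the earphone jack in front of the front passenger seat and/or the speaker at the headrest of the front passenger seat. For the rear sound zone, the broadcast may be performed by the rear speakers.
A voice command in the vehicle may come from any vehicle sound zone. A sound collection device (such as a microphone) arranged in each sound zone can be used to identify the sound zone of the user who issued the voice command.
In one embodiment, the broadcast speakers corresponding to the core connection channel are the whole-vehicle speakers. That is, regardless of which sound zone a voice command comes from, the vehicle uses the core connection channel to communicate with the server, receives the audio file returned by the server, and performs the voice broadcast through the whole-vehicle speakers. It should be noted that the broadcast speakers corresponding to the core connection channel may also be other speakers, such as the rear speakers or the front speakers, and are not limited to the whole-vehicle speakers.
When the core connection channel cannot meet the current multi-channel voice broadcast demand, the vehicle creates a new connection channel with the server to meet the current voice broadcast demand, until the number of created connection channels reaches the maximum number.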
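In some embodiments, step 11 includes:
determining an interaction mode of the vehicle according to a selection instruction, where different interaction modes correspond to different maximum numbers of connection channels;
determining the maximum number of connection channels according to the determined interaction mode.
In this way, the maximum number of connection channels can be determined as needed.
Specifically, the selection instruction may be triggered by the user. For example, the vehicle may include a central control screen and a resource management module. After the vehicle is powered on, the central control screen may display a settings interface through which the user selects the interaction mode. When the user touches the corresponding button on the central control screen, a selection instruction is generated. The resource management module determines the interaction mode of the vehicle according to the selection instruction, and determines the control logic for the connection channels created with the server according to the interaction mode.
In some embodiments, the interaction modes include three interaction modes:
in mode one, the vehicle establishes one core connection channel with the server, and the maximum number of connection channels is three;
in mode two, the vehicle establishes one core connection channel with the server, and the maximum number of connection channels is two;
in mode three, the vehicle establishes one core connection channel with the server, and the maximum number of connection channels is one.
In this way, the user can choose among the modes, which improves the user experience.
Specifically, different maximum numbers of connection channels let the user decide how system resources are allocated. The user can select an interaction mode according to the driving scenario. For example, when the user's demand for voice interaction is high, a mode with a larger maximum number of connection channels can be selected. When the demand is low, a mode with a smaller maximum number of connection channels can be selected, which frees more system resources for other processes.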
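The mode-to-limit relationship above can be pictured as a small lookup table. The following is only a sketch: the names `InteractionMode`, `MAX_CHANNELS`, `max_channels_for`, and `extra_channels_for` are hypothetical and do not come from the application; only the numbers (one core channel, maxima of 3, 2, and 1) are taken from the text.

```python
from enum import Enum


class InteractionMode(Enum):
    """The three interaction modes listed above (names are illustrative)."""
    MODE_ONE = 1
    MODE_TWO = 2
    MODE_THREE = 3


# Every mode in this embodiment starts with one core connection channel.
CORE_CHANNELS = 1

# Maximum number of connection channels per mode, as stated in the text.
MAX_CHANNELS = {
    InteractionMode.MODE_ONE: 3,
    InteractionMode.MODE_TWO: 2,
    InteractionMode.MODE_THREE: 1,
}


def max_channels_for(mode: InteractionMode) -> int:
    """Return the maximum number of connection channels for a mode."""
    return MAX_CHANNELS[mode]


def extra_channels_for(mode: InteractionMode) -> int:
    """Return how many channels may be created beyond the core channel."""
    return MAX_CHANNELS[mode] - CORE_CHANNELS
```

Under these assumptions, mode three leaves no room for additional channels, while mode one allows up to two beyond the core channel.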
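The resource management module may be used to manage the connection channels created in the above three modes. By default there is always one core connection channel, while the maximum number of connections depends on the specific interaction mode.
As for when to create non-core connection channels (a non-core connection channel is any connection channel other than the core connection channel, and the number of non-core connection channels equals the maximum number of connections minus the number of core connection channels): in one embodiment, this control logic may follow the management strategy of a Java thread pool. That is, in the current interaction mode, a new connection channel is created only when the single core connection channel cannot handle the multiple voice broadcasts in that mode, and creation continues until the maximum number of connections is reached.
For example, when the user's selection instruction is obtained and the interaction mode is determined to be mode one, the created core connection channel is used, based on the collected voice command, to communicate with the server so as to handle the voice broadcast demand corresponding to the voice command. When the core connection channel cannot meet the current multi-channel voice broadcast demand, new connection channels are created between the vehicle and the server until the number of created connection channels reaches three.
It can be understood that in other embodiments the interaction modes are not limited to the above three modes and may include other interaction modes. The number of core connection channels is likewise not limited to one and may be another number, which is not specifically limited here. The number of core connection channels may be the same for every mode or may differ between modes.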
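In some embodiments, the voice interaction method includes:
when the current connection channel needs to perform a voice broadcast, marking the tag of the current connection channel as busy;
after the current connection channel finishes the voice broadcast, resetting the tag of the current connection channel to idle.
In this way, a state machine strategy can be implemented.
Specifically, in one embodiment, the resource management module may open one core connection channel by default, that is, create one connection channel between the vehicle and the server. In mode one, at most three connection channels exist at the same time, so up to three voice broadcasts can run simultaneously. In mode one, voice broadcasts go through the single core connection channel by default. Whenever the current connection channel needs to perform a voice broadcast, the resource management module marks the tag of the current connection channel as busy, and resets the tag to idle once the voice broadcast is finished.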
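The busy/idle tagging described above amounts to a two-state machine per channel. A minimal sketch follows, assuming a hypothetical `TaggedChannel` class and a caller-supplied `play` callable standing in for the actual speaker output; neither name appears in the application.

```python
from enum import Enum


class Tag(Enum):
    IDLE = "idle"
    BUSY = "busy"


class TaggedChannel:
    """A connection channel carrying a busy/idle tag (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.tag = Tag.IDLE

    def broadcast(self, audio, play):
        """Mark the channel busy, play the audio, then reset to idle.

        `play` is a hypothetical callable that performs the broadcast.
        """
        self.tag = Tag.BUSY
        try:
            play(audio)
        finally:
            # Reset the tag even if playback raises, so the channel
            # does not stay busy forever (a design choice of this sketch).
            self.tag = Tag.IDLE
```

Resetting the tag in a `finally` clause is an addition of this sketch; the text only states that the tag is reset once the broadcast is finished.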
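In some embodiments, the voice interaction method includes:
when an audio file returned by the server in response to the voice command is received, obtaining the tag of the connection channel;
when the tag of the connection channel is idle, using the connection channel to perform the voice broadcast;
when the tag of the connection channel is busy, creating a new connection channel and using the new connection channel to perform the voice broadcast.
In this way, a strategy for adding connection channels can be implemented.
Specifically, in one embodiment, each time an audio file to be broadcast arrives, the resource management module checks the tag of the connection channel. If the connection channel is idle, it is used for the voice broadcast. If the connection channel is busy, a new connection channel is created so that the new audio file can still be played. This addition strategy ensures that all playback content is broadcast via TTS over as few connection channels as possible.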
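The addition strategy can be sketched as a small pool that reuses an idle channel when one exists and grows only when every channel is busy. The class and method names below (`Channel`, `ChannelPool`, `channel_for_audio`, `broadcast_finished`) are hypothetical. Returning `None` once the maximum is reached is also an assumption, since the text does not say what happens to a broadcast that arrives when all channels are busy at the limit.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Channel:
    channel_id: int
    is_core: bool = False
    busy: bool = False


@dataclass
class ChannelPool:
    """One core channel plus on-demand channels up to max_channels."""
    max_channels: int
    channels: List[Channel] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.max_channels < 1:
            raise ValueError("max_channels must allow the core channel")
        # The core channel is created up front.
        self.channels.append(Channel(channel_id=0, is_core=True))

    def channel_for_audio(self) -> Optional[Channel]:
        """Pick a channel for a newly arrived audio file."""
        # Prefer an idle channel so that as few channels as possible exist.
        for channel in self.channels:
            if not channel.busy:
                channel.busy = True
                return channel
        # All channels are busy: create a new one if the limit allows.
        if len(self.channels) < self.max_channels:
            channel = Channel(channel_id=len(self.channels), busy=True)
            self.channels.append(channel)
            return channel
        return None  # assumption: the caller decides how to handle this

    def broadcast_finished(self, channel: Channel) -> None:
        """Reset the channel tag to idle after the broadcast ends."""
        channel.busy = False
```

With `ChannelPool(max_channels=3)`, three overlapping broadcasts obtain the core channel and two new channels in turn, and a later broadcast reuses whichever channel has become idle.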
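In some embodiments, the voice interaction method includes:
when a new connection channel is created, setting an expiry time for the new connection channel;
when the new connection channel performs a voice broadcast within the expiry time, marking the tag of the new connection channel as busy and resetting the expiry time;
when the tag of the new connection channel is idle after the expiry time has elapsed, removing the new connection channel.
In this way, an expiry-based removal strategy can be implemented.
Specifically, in one embodiment, the resource management module sets an expiry time for each non-core connection channel. The expiry time (for example, 1 minute) is specified when the non-core connection channel is created and is updated dynamically: each time the connection channel is tagged as busy, the expiry time is reset, for example to 1 minute. If the tag of the connection channel is still idle after that minute, the non-core connection channel is removed. It should be noted that only non-core connection channels are maintained dynamically. The created core connection channel never expires, which guarantees interaction between the vehicle and the server. It can be understood that the expiry time may be set to another specific value and is not limited to 1 minute.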
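The expiry rule can be illustrated as follows. This is a sketch under stated assumptions: `ExpiringChannel`, `create_non_core`, `mark_busy`, `mark_idle`, and `remove_expired` are hypothetical names, and the current time is passed in explicitly as a number of seconds so that the behaviour is easy to follow. The 60-second value is the one-minute example from the text.

```python
from dataclasses import dataclass

EXPIRY_SECONDS = 60.0  # the one-minute example value from the text


@dataclass
class ExpiringChannel:
    name: str
    is_core: bool
    busy: bool = False
    # The core channel never expires, so its deadline is infinite.
    expires_at: float = float("inf")

    @classmethod
    def create_non_core(cls, name, now):
        """Create a non-core channel and set its expiry time at creation."""
        return cls(name=name, is_core=False, expires_at=now + EXPIRY_SECONDS)

    def mark_busy(self, now):
        """Tag the channel busy; for non-core channels, reset the expiry."""
        self.busy = True
        if not self.is_core:
            self.expires_at = now + EXPIRY_SECONDS

    def mark_idle(self):
        self.busy = False


def remove_expired(channels, now):
    """Keep the core channel, busy channels, and unexpired channels."""
    return [
        ch for ch in channels
        if ch.is_core or ch.busy or now < ch.expires_at
    ]
```

For example, a non-core channel created at t = 0 and marked busy at t = 30 survives a check at t = 80 but is removed by a check at t = 90 if it has stayed idle since.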
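After the resource management module selects the corresponding strategy, it can begin creating the corresponding number of connection channels with the server (such as a TTS cloud). Once multiple connection channels have been created, the interaction logic of each connection channel starts, and the interactions on different connection channels do not interfere with one another.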
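In some embodiments, the voice interaction method includes:
dividing the vehicle cabin into several sound zones in advance;
determining the correspondence between the connection channels and the vehicle sound zones.
In this way, vehicle sound zones and connection channels can be mapped to each other.
Specifically, the vehicle cabin may be divided into several sound zones in advance. For example, according to the positions of users in the vehicle, the cabin may be divided into a driver sound zone, a front passenger sound zone, a rear sound zone, and a whole-vehicle sound zone. The driver sound zone may correspond to the driver, the front passenger sound zone to the front passenger, the rear sound zone to the rear passengers, and the whole-vehicle sound zone to all occupants of the vehicle. Further, the rear sound zone may include a second-row sound zone and a third-row sound zone, corresponding to the second-row passengers and the third-row passengers respectively, and so on.
In each interaction mode, the correspondence between connection channels and vehicle sound zones can be determined in advance. For example, in mode one the maximum number of connection channels is three: the core connection channel may correspond to the whole-vehicle sound zone, one non-core connection channel may correspond to the driver sound zone, and the other non-core connection channel may correspond to the front passenger sound zone.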
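The mode-one correspondence given above can be written down as a plain mapping. The channel and zone identifiers below (`"core"`, `"non_core_1"`, `"whole_vehicle"`, and so on) and the helper names are hypothetical labels chosen for this sketch; the pairing itself follows the example in the text.

```python
from typing import Optional

# Channel-to-sound-zone correspondence for mode one, as in the example above.
MODE_ONE_ZONES = {
    "core": "whole_vehicle",
    "non_core_1": "driver",
    "non_core_2": "front_passenger",
}

# Reverse lookup: which channel is assigned to a given sound zone.
MODE_ONE_CHANNELS = {zone: channel for channel, zone in MODE_ONE_ZONES.items()}


def playback_zone(channel: str) -> str:
    """Return the sound zone on which a channel's audio is broadcast."""
    return MODE_ONE_ZONES[channel]


def channel_for_zone(zone: str) -> Optional[str]:
    """Return the channel assigned to a sound zone, or None if unmapped."""
    return MODE_ONE_CHANNELS.get(zone)
```

In this sketch a zone without its own channel, such as a rear zone in mode one, simply has no entry; how such a zone is served is left open here.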
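When the interaction mode selected by the user is mode one, the vehicle first creates the core connection channel and connects to the server. When a first voice command is obtained, for example from the driver sound zone, the vehicle communicates with the server (the cloud) through the core connection channel. The server receives the first voice command, processes it to obtain the corresponding reply audio file, and returns the file to the vehicle through the core connection channel. The vehicle receives the audio file and performs the voice broadcast through the whole-vehicle speakers.
When the core connection channel cannot meet a multi-channel broadcast demand, for example when the vehicle receives a second voice command from the front passenger sound zone while a broadcast over the core connection channel is in progress, the vehicle determines that the created core connection channel is busy and creates a second connection channel, that is, a non-core connection channel. The vehicle uses this non-core connection channel to interact with the server and obtain the corresponding reply audio file, and performs the voice broadcast in the front passenger sound zone through this non-core connection channel. The front passenger then hears the voice reply from the front passenger sound zone.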
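With reference to FIG. 3, the following describes an example of the voice interaction method according to an embodiment of this application.
As shown in FIG. 3, suppose a vehicle is equipped with three speaker zones, that is, three sound zones. The resource management module interacts with the server according to the interaction mode selected by the user. Suppose the interaction mode selected by the user is mode three and all three speakers perform voice broadcasts. After the vehicle is powered on, it first creates a core connection channel with the server and uses it to handle the voice broadcast demands of the three speakers.
Suppose several users are in the vehicle and talk to the vehicle's voice assistant at the same time: the driver asks what the weather is like today, the front passenger says to open the window, and a rear passenger says to turn on the air conditioning. In this scenario, three sound zones need to respond to multiple users at once (that is, multiple connection channels need to broadcast simultaneously). If the already created core connection channel is busy, the resource management module creates a second connection channel (a non-core connection channel, with its expiry time set to 1 minute) so that multiple sound zones can broadcast at the same time. After the non-core connection channel finishes its broadcast, its tag is reset to idle.
Every minute the vehicle checks the tag of the non-core connection channel. If the tag is idle and more than 1 minute has elapsed since the channel was last busy, the vehicle disconnects the non-core connection channel from the server and keeps only the one core connection channel for interacting with the server for TTS synthesis.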
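In summary, the voice interaction method according to the embodiments of this application can achieve at least the following advantages:
1. Good user experience: replies are better directed, so a user who issues a voice command clearly hears the feedback for that operation while other users are disturbed as little as possible;
2. High overall efficiency: the audio channel resources in the vehicle are fully used, and resources are allocated sensibly when multiple sound zones interact at the same time, so that the task of each sound zone runs smoothly as far as possible and does not fail for lack of resources. This also allows more precise control over the connection channels to the server.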
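Referring to FIG. 4, an in-vehicle terminal 100 according to an embodiment of this application includes a memory 12, a processor 14, and a computer program stored in the memory 12, where the computer program, when executed by the processor 14, implements the steps of the voice interaction method of any of the above embodiments.
Referring to FIG. 4, a vehicle 200 according to an embodiment of this application includes the in-vehicle terminal 100 of the above embodiment.
Specifically, the vehicle 200 further includes a vehicle body 16, and the in-vehicle terminal 100 is mounted on the vehicle body 16.
An embodiment of this application provides a computer-readable storage medium storing a computer program, where the computer program, when executed by the processor 14, implements the steps of the voice interaction method of any of the above embodiments.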
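In one embodiment, the voice interaction method implemented when the computer program is executed by the processor 14 includes:
Step 11: when the vehicle and a server perform voice interaction, determining the maximum number of connection channels between the vehicle 200 and the server, the connection channels including at least one core connection channel;
Step 13: creating the core connection channel between the vehicle 200 and the server;
Step 15: according to a voice command collected by the vehicle, using the created core connection channel to communicate with the server so as to handle the voice broadcast demand corresponding to the voice command;
Step 17: when the core connection channel cannot meet the current multi-channel voice broadcast demand, creating a new connection channel between the vehicle 200 and the server, until the number of created connection channels reaches the maximum number.
With the above in-vehicle terminal 100, vehicle 200, and computer-readable storage medium, a core connection channel is created first to handle the voice broadcast demand corresponding to the voice command, and a new connection channel is created only when the core connection channel cannot meet a multi-channel voice broadcast demand. This reduces interaction between the vehicle 200 and the server, handles multi-channel broadcast scenarios with fewer connection resources, and avoids over-occupying system resources.
It should be noted that the above explanation of the embodiments and beneficial effects of the voice interaction method also applies to the in-vehicle terminal 100, the vehicle 200, and the computer-readable storage medium of this embodiment, and is not repeated here in detail.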
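In the description of this specification, references to the terms "one embodiment", "some embodiments", "illustrative embodiment", "example", "specific example", or "some examples" mean that a specific feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of this application. In this specification, illustrative uses of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.
From the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a household appliance, a network device, or the like) to execute the methods of the embodiments of this application.
Although embodiments of this application have been shown and described above, it can be understood that the above embodiments are exemplary and should not be construed as limiting this application. Those of ordinary skill in the art may make changes, modifications, substitutions, and variations to the above embodiments within the scope of this application.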

Claims (10)

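  1. A voice interaction method for a vehicle, wherein the voice interaction method comprises:
    when the vehicle and a server perform voice interaction, determining the maximum number of connection channels between the vehicle and the server, the connection channels comprising at least one core connection channel;
    creating the core connection channel between the vehicle and the server;
    according to a voice command collected by the vehicle, using the created core connection channel to communicate with the server so as to handle the voice broadcast demand corresponding to the voice command;
    when the core connection channel cannot meet the current multi-channel voice broadcast demand, creating a new connection channel between the vehicle and the server, until the number of created connection channels reaches the maximum number.
  2. The voice interaction method according to claim 1, wherein determining the maximum number of connection channels between the vehicle and the server, the connection channels comprising at least one core connection channel, comprises:
    determining an interaction mode of the vehicle according to a selection instruction, wherein different interaction modes correspond to different maximum numbers of connection channels;
    determining the maximum number of connection channels according to the determined interaction mode.
  3. The voice interaction method according to claim 2, wherein the interaction modes comprise three interaction modes:
    in mode one, the vehicle establishes one core connection channel with the server, and the maximum number of connection channels is three;
    in mode two, the vehicle establishes one core connection channel with the server, and the maximum number of connection channels is two;
    in mode three, the vehicle establishes one core connection channel with the server, and the maximum number of connection channels is one.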
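  4. The voice interaction method according to claim 1, wherein the voice interaction method comprises:
    when the current connection channel needs to perform a voice broadcast, marking the tag of the current connection channel as busy;
    after the current connection channel finishes the voice broadcast, resetting the tag of the current connection channel to idle.
  5. The voice interaction method according to claim 4, wherein the voice interaction method comprises:
    when an audio file returned by the server in response to the voice command is received, obtaining the tag of the connection channel;
    when the tag of the connection channel is idle, using the connection channel to perform the voice broadcast;
    when the tag of the connection channel is busy, creating a new connection channel and using the new connection channel to perform the voice broadcast.
  6. The voice interaction method according to claim 1, wherein the voice interaction method comprises:
    when a new connection channel is created, setting an expiry time for the new connection channel;
    when the new connection channel performs a voice broadcast within the expiry time, marking the tag of the new connection channel as busy and resetting the expiry time;
    when the tag of the new connection channel is idle after the expiry time has elapsed, removing the new connection channel.
  7. The voice interaction method according to claim 1, wherein the voice interaction method comprises:
    dividing the vehicle cabin into several sound zones in advance;
    determining the correspondence between the connection channels and the vehicle sound zones.
  8. An in-vehicle terminal, comprising: a memory, a processor, and a computer program stored in the memory, wherein the computer program, when executed by the processor, implements the steps of the voice interaction method according to any one of claims 1 to 7.
  9. A vehicle, comprising the in-vehicle terminal according to claim 8.
  10. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and the computer program, when executed by a processor, implements the steps of the voice interaction method according to any one of claims 1 to 7.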
PCT/CN2023/096679 2022-05-27 2023-05-26 Voice interaction method, in-vehicle terminal, vehicle, and storage medium WO2023227129A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210586081.2A CN114678026B (zh) 2022-05-27 2022-05-27 Voice interaction method, in-vehicle terminal, vehicle, and storage medium
CN202210586081.2 2022-05-27

Publications (1)

Publication Number Publication Date
WO2023227129A1 true WO2023227129A1 (zh) 2023-11-30

Family

ID=82079198

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/096679 WO2023227129A1 (zh) 2022-05-27 2023-05-26 Voice interaction method, in-vehicle terminal, vehicle, and storage medium

Country Status (2)

Country Link
CN (1) CN114678026B (zh)
WO (1) WO2023227129A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114678026B (zh) * 2022-05-27 2022-10-14 广州小鹏汽车科技有限公司 Voice interaction method, in-vehicle terminal, vehicle, and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101677329A (zh) * 2008-09-18 2010-03-24 中兴通讯股份有限公司 Integrated voice resource platform proxy server and data processing method thereof
US20110171950A1 (en) * 2008-09-26 2011-07-14 Aleksey Anatolyevich Ivanchikov Method of exchanging voice messages between the driver and user of the vehicle
WO2012174515A1 (en) * 2011-06-16 2012-12-20 Agero Connected Services, Inc. Hybrid dialog speech recognition for in-vehicle automated interaction and in-vehicle user interfaces requiring minimal cognitive driver processing for same
CN111816189A (zh) * 2020-07-03 2020-10-23 斑马网络技术有限公司 Multi-sound-zone voice interaction method for a vehicle, and electronic device
CN113380247A (zh) * 2021-06-08 2021-09-10 阿波罗智联(北京)科技有限公司 Multi-sound-zone voice wake-up and recognition method and apparatus, device, and storage medium
CN114678026A (zh) * 2022-05-27 2022-06-28 广州小鹏汽车科技有限公司 Voice interaction method, in-vehicle terminal, vehicle, and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
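CN110310633B (zh) * 2019-05-23 2022-05-20 阿波罗智联(北京)科技有限公司 Multi-sound-zone speech recognition method, terminal device, and storage medium
CN110475180A (zh) * 2019-08-23 2019-11-19 科大讯飞(苏州)科技有限公司 In-vehicle multi-sound-zone audio processing system and method
CN112599133A (zh) * 2020-12-15 2021-04-02 北京百度网讯科技有限公司 Vehicle-based voice processing method, voice processor, and in-vehicle processor
CN113053402B (zh) * 2021-03-04 2024-03-12 广州小鹏汽车科技有限公司 Voice processing method, apparatus, and vehicle
CN114220430A (zh) * 2021-12-13 2022-03-22 北京百度网讯科技有限公司 Multi-sound-zone voice interaction method, apparatus, device, and storage medium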


Also Published As

Publication number Publication date
CN114678026A (zh) 2022-06-28
CN114678026B (zh) 2022-10-14

Similar Documents

Publication Publication Date Title
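WO2023227129A1 (zh) Voice interaction method, in-vehicle terminal, vehicle, and storage medium
CN110764724B (zh) Display device control method, apparatus, device, and storage medium
CN103187077B (zh) Audio control method and apparatus applied to an in-vehicle device, and in-vehicle device
CN112130802A (zh) In-vehicle audio playback method and apparatus, vehicle, and storage medium
CN112489661B (zh) In-vehicle multi-screen call method and apparatus
CN115278462B (zh) In-vehicle audio processing method, system, electronic device, and storage medium
CN111372159A (zh) Earphone playback control method and earphone
CN110730406A (zh) Method for outputting two independent audio sources based on the Android system
CN113794968A (zh) Arbitration method and apparatus for in-vehicle audio focus
CN115472186A (zh) In-vehicle media playback control method, apparatus, and electronic device
US20040039505A1 (en) Method for controlling access to devices in a vehicle communication network
KR20210142435A (ko) Apparatus for providing a video call service for a vehicle and method for providing the video call service
US11711650B2 (en) Troubleshooting of audio system
CN114125655A (zh) Speaker control method, apparatus, electronic device, and storage medium
US20210357179A1 (en) Agent coordination device, agent coordination method and recording medium
CN116229934A (zh) In-vehicle voice broadcast method and related device
CN113778371B (zh) System, method, apparatus, processor, and computer storage medium for multi-module sound management control in an in-vehicle head unit system
CN116153305A (zh) Voice interaction method, voice interaction apparatus, server, and readable storage medium
CN115134714A (zh) Method for controlling audio playback in a vehicle cabin, vehicle, and storage medium
CN114760434A (zh) Automobile smart cabin supporting multi-party online video conferencing, and method
CN115268819A (zh) In-vehicle multimedia sound zone switching method, apparatus, automobile, and storage medium
CN115440207A (zh) Multi-screen voice interaction method, apparatus, device, and computer-readable storage medium
WO2019114427A1 (zh) Vehicle function broadcast method, apparatus, and in-vehicle intelligent controller
CN115278484A (zh) Audio stream control method, apparatus, device, and medium
CN117591060A (zh) Audio playback method, in-vehicle device, vehicle, and computer program product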

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23811199

Country of ref document: EP

Kind code of ref document: A1