WO2022252351A1 - Control method and control system for a vehicle-machine system - Google Patents

Control method and control system for a vehicle-machine system

Info

Publication number
WO2022252351A1
Authority
WO
WIPO (PCT)
Prior art keywords
slot
information
slot information
combination
attribute
Prior art date
Application number
PCT/CN2021/106071
Other languages
English (en)
French (fr)
Inventor
吕大伟
Original Assignee
上海擎感智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海擎感智能科技有限公司
Publication of WO2022252351A1

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W50/00 Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W50/08 Interaction between the driver and the control system
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/1822 Parsing for meaning understanding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Definitions

  • The invention relates to the field of vehicle-machine system control, and in particular to a control method and a control system for a vehicle-machine system.
  • One dimension for evaluating the artificial intelligence voice interaction function of a vehicle-machine system is its intent understanding module. In other words, whether the system can understand or recognize the intention a user expresses is the core measure of its intelligence.
  • The artificial intelligence voice interaction module in an existing vehicle-machine system can recognize only a single intention contained in one sentence, and generates one control instruction from that single intention to control the vehicle-machine or the equipment in it.
  • In practice, a user often issues a series of several instructions in the same voice data, all of which the vehicle-machine needs to execute.
  • Single-intent voice interaction methods and systems often cannot comprehensively and accurately judge the user's real intention from the multiple operation instructions and multiple operation objects in the same voice data, so they commonly miss some operation instructions or even execute incorrect operations.
  • The control method of voice interaction in existing vehicle-machine systems is relatively basic: it can execute simple control instructions that carry a single intention, but not complex control instructions that carry multiple intentions of the user during driving.
  • A first aspect of the present invention provides a method for controlling a vehicle-machine system.
  • The control method includes the following steps: collecting voice data of a user; performing speech recognition on the collected voice data to obtain corresponding speech information; performing semantic analysis on the speech information to obtain multiple pieces of slot information; combining the multiple pieces of slot information into multiple control instructions according to preset combination configuration information; and executing the multiple control instructions one by one.
  • Through semantic analysis and combination configuration, the control method can combine multiple pieces of slot information into multiple control instructions and have the vehicle-machine execute them one by one.
  • The present invention can therefore comprehensively and accurately judge the user's real intention from multiple operation instructions and multiple operation objects in the same voice data, which makes interaction between the vehicle-machine system and the user more intelligent, improves the efficiency of voice interaction, and improves the user experience.
  • A second aspect of the present invention provides a control system for a vehicle-machine system.
  • The control system includes: a vehicle-machine terminal, configured to collect the user's voice data and to execute, one by one, the multiple control instructions obtained by analyzing the voice data; and a data processing terminal, configured to perform speech recognition on the collected voice data to obtain the corresponding speech information, perform semantic analysis on the speech information to obtain multiple pieces of slot information, and combine the multiple pieces of slot information into the multiple control instructions according to the preset combination configuration information.
  • Through semantic analysis and combination configuration, the control system can combine multiple pieces of slot information into multiple control instructions and have the vehicle-machine execute them one by one.
  • The control system can therefore comprehensively and accurately judge the user's real intention from multiple operation instructions and multiple operation objects in the same voice data, which makes interaction between the vehicle-machine system and the user more intelligent, improves the efficiency of voice interaction, and improves the user experience.
  • A third aspect of the present invention provides a computer-readable storage medium.
  • The computer-readable storage medium has computer instructions stored thereon.
  • When the computer instructions are executed by a processor, the control method provided by the first aspect of the present invention is implemented.
  • The computer-readable storage medium can therefore comprehensively and accurately judge the user's real intention from multiple operation instructions and multiple operation objects in the same voice data, which makes interaction between the vehicle-machine system and the user more intelligent, improves the efficiency of voice interaction, and improves the user experience.
  • The present invention provides a control method and a control system for a vehicle-machine system, and a computer-readable storage medium storing the control method, which realize multi-intent command control of human-machine voice interaction in the vehicle-machine system through speech recognition, semantic processing, and intent segmentation. This makes interaction between the vehicle-machine system and the user more intelligent, improves the efficiency of voice interaction, and improves the user experience.
  • FIG. 1 shows an overall architecture diagram of a control method for a vehicle-machine system provided according to some embodiments of the present invention.
  • FIG. 2 shows a system diagram of the intent segmentation system of the control method of the vehicle-machine system provided according to some embodiments of the present invention.
  • The term "connection" should be understood in a broad sense: it can be a fixed connection, a detachable connection, or an integral connection; it can be a mechanical connection or an electrical connection; it can be a direct connection or an indirect connection through an intermediary; and it can be internal communication between two components.
  • The specific meaning of "connection" should be understood according to the specific situation.
  • Although the terms "first", "second", "third", etc. may be used herein to describe various components, regions, layers and/or sections, these components, regions, layers and/or sections should not be limited by these terms; the terms are only used to distinguish different components, regions, layers and/or sections. Thus, a first component, region, layer and/or section discussed below could be termed a second component, region, layer and/or section without departing from some embodiments of the present invention.
  • The present invention provides a control method for a vehicle-machine system.
  • FIG. 1 shows an overall architecture diagram of a control method for a vehicle-machine system according to some embodiments of the present invention.
  • The control system of the vehicle-machine system mainly includes a vehicle-machine terminal and a data processing terminal.
  • The vehicle-machine terminal mainly collects the user's voice data and sends it to the data processing terminal for analysis, then obtains multiple single-intent control instructions from the data processing terminal and executes them one by one.
  • The data processing terminal can be configured in the cloud control system. It mainly performs semantic analysis and intent combination on the voice data sent by the vehicle-machine terminal, generating multiple single-intent control instructions that the vehicle-machine terminal can correctly recognize and execute.
  • The control method applied to the control system includes the following steps. First, the vehicle-machine terminal uses the vehicle's microphone module to collect the user's voice data and sends it to the data processing terminal in the cloud for semantic analysis and intent combination. Next, the data processing terminal performs speech recognition on the received voice data to obtain the corresponding speech information, and performs semantic analysis on that speech information to obtain multiple pieces of slot information. Finally, the data processing terminal combines the slot information into multiple single-intent control instructions according to the preset combination configuration information, and sends these instructions to the vehicle-machine terminal, which executes them one by one.
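  • The flow described above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the function name `handle_utterance` and the four callables passed into it are hypothetical stand-ins for the speech recognition, semantic analysis, combination, and execution stages, and none of them are names taken from the patent.

```python
from typing import Callable, List


def handle_utterance(
    voice_data: bytes,
    recognize_speech: Callable[[bytes], str],
    parse_slots: Callable[[str], List[str]],
    combine_slots: Callable[[List[str]], List[str]],
    execute: Callable[[str], None],
) -> List[str]:
    """Run one utterance through the stages described in the text.

    All four callables are hypothetical placeholders supplied by the caller.
    """
    speech_info = recognize_speech(voice_data)  # data processing terminal: speech recognition
    slots = parse_slots(speech_info)            # semantic analysis -> slot information
    instructions = combine_slots(slots)         # combination -> single-intent instructions
    for instruction in instructions:            # vehicle-machine terminal: one by one
        execute(instruction)
    return instructions
```

  • Passing the stages in as callables keeps the sketch neutral about where each stage runs (cloud or vehicle), which the description leaves open.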
  • Collecting the user's voice data at the vehicle-machine terminal mainly includes: using a microphone module to collect multiple recorded analog signals from the user; converting the collected recorded analog signals into corresponding digital voice signals; and synthesizing the converted digital voice signals into voice stream data in time order.
  • A digital signal is formed from an analog signal through sampling, quantization, and encoding. Specifically, sampling takes the value of the input analog signal at appropriate time intervals; quantization expresses each sampled value as a binary code; and encoding arranges the resulting binary numbers into a sequential pulse train.
  • Analog signals are generally quantized into digital signals by pulse code modulation (PCM), in which different amplitudes of the analog signal correspond to different binary values.
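  • As a rough illustration of sampling, quantization, and encoding, the sketch below samples a sine wave, maps each sample to an unsigned integer code, and lays the codes out as a bit string. It assumes uniform (linear) quantization of samples in the range [-1.0, 1.0]; the function names, the 8-bit depth, and the 8 kHz rate are illustrative choices, not values from the patent.

```python
import math
from typing import List


def sample_sine(freq_hz: float, rate_hz: int, n: int) -> List[float]:
    """Sampling: read the 'analog' waveform at fixed time intervals."""
    return [math.sin(2 * math.pi * freq_hz * t / rate_hz) for t in range(n)]


def pcm_encode(samples: List[float], bits: int = 8) -> List[int]:
    """Quantization: map each sample in [-1.0, 1.0] to an unsigned integer code."""
    levels = (1 << bits) - 1
    codes = []
    for s in samples:
        s = max(-1.0, min(1.0, s))  # clip out-of-range amplitudes
        codes.append(round((s + 1.0) / 2.0 * levels))
    return codes


codes = pcm_encode(sample_sine(1000.0, 8000, 8), bits=8)
# Encoding: arrange the binary codes one after another as a pulse train.
bitstream = "".join(format(c, "08b") for c in codes)
```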
  • Converting the recorded analog signal into a digital voice signal strengthens the confidentiality of communication: after A/D conversion the voice signal can be encrypted before transmission, then decrypted at the receiving end and restored to an analog signal by D/A conversion. It also improves the signal's resistance to interference; in relaying in particular, the digital signal can be regenerated, which eliminates the accumulation of noise. Transmission errors can be controlled, which improves transmission quality. Finally, analog-to-digital conversion makes it easy to process the information with modern digital signal processing techniques, and allows an integrated digital communication network to be built that carries various kinds of messages and extends the functions of the communication system.
  • The vehicle-machine terminal can be configured with a human-computer interaction interface such as a voice collection button.
  • The user can click the voice collection button to start the vehicle's microphone module, which collects the audio stream of the voice the user utters, for example "turn on the air conditioner, close the sunroof".
  • The microphone module sends this audio stream to the processor of the vehicle-machine terminal, where the audio stream is converted into a voice stream.
  • Audio streaming refers to the practice of delivering real-time audio over a network connection. This type of data transfer requires a protocol that handles the time ordering of data packets in order to provide on-demand content to end users. Audio streaming uses a buffering system and a secure streaming platform so that end users can listen to full audio files without interruption. This type of data flow requires a lot of bandwidth.
  • For example, the audio stream of "turn on the air conditioner, close the sunroof" (打开空调,关闭天窗) comprises eight recorded analog signals, one for each spoken character: 打, 开, 空, 调, 关, 闭, 天, 窗.
  • The microphone module of the vehicle-machine terminal collects the eight recorded analog signals, and the processor of the vehicle-machine terminal converts them into corresponding digital voice signals. These digital voice signals are then synthesized into voice stream data in chronological order, and the vehicle-machine terminal sends the resulting voice stream data to the data processing terminal.
  • Voice stream data is arranged and synthesized according to the time order in which the digital voice signals are received. For example, after analog-to-digital conversion the processor obtains the eight digital voice signals 打, 开, 空, 调, 关, 闭, 天, 窗 in sequence, and synthesizes them in that order into the voice stream data of "turn on the air conditioner, close the sunroof".
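  • A minimal sketch of synthesizing digital voice signals into a voice stream in time order. It assumes each digital voice signal arrives as a `(timestamp, payload)` pair; that representation and the function name are hypothetical, since the description does not specify a data format.

```python
from typing import List, Tuple


def synthesize_voice_stream(chunks: List[Tuple[float, bytes]]) -> bytes:
    """Concatenate digital voice signals in the order of their timestamps."""
    ordered = sorted(chunks, key=lambda chunk: chunk[0])
    return b"".join(payload for _, payload in ordered)
```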
  • The vehicle-machine terminal sends the obtained voice stream data to the data processing terminal.
  • The data processing terminal is configured in the cloud control system and includes a voice processing system, a semantic processing system, and an intent segmentation system.
  • The voice processing system performs speech recognition on the received voice stream data to parse it into corresponding speech information.
  • Speech information refers to text extracted by the speech recognition system that conforms to a specific structure and contains the key information.
  • Ordinary text information usually refers to the colloquial text the user actually speaks, such as "Please turn on the air conditioner for me and close the sunroof by the way".
  • The speech information corresponding to this example may be "turn on the air conditioner, close the sunroof".
  • Speech information is better suited to the semantic analysis step in the subsequent semantic processing system, which can then analyze the control instructions contained in the voice stream data more quickly and accurately.
  • After the voice processing system parses the voice stream data into speech information, it sends the speech information to the semantic processing system, which is also configured at the data processing terminal, for further semantic analysis.
  • Semantic analysis of the speech information includes the following steps. First, the semantic processing system extracts keywords from the received speech information; for example, four keywords can be extracted from "turn on the air conditioner, close the sunroof": "open", "air conditioner", "close", and "sunroof". It then classifies the keywords according to preset slot attributes and treats each keyword as slot information with the corresponding slot attribute.
  • A slot identifies a piece of key information that is needed to express the intention accurately in the sentence where the user states it.
  • An intent can have one or more slots, depending on how many pieces of key information it requires. Take the intent "query the weather": the weather differs by place and by day, so a person asking about the weather normally has to say which day and which place to check. "Query date" and "query city" are therefore the key information of the weather intent, and both are created as slots.
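  • The weather example can be written down as a small data structure. This is a sketch only: the class `Intent`, its field names, and the slot identifiers `query_date` and `query_city` are illustrative names chosen here, and the city value is a made-up example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Intent:
    name: str
    required_slots: List[str]                             # key information the intent needs
    filled: Dict[str, str] = field(default_factory=dict)  # slot name -> extracted value

    def missing_slots(self) -> List[str]:
        """Slots that still have no value, in their declared order."""
        return [slot for slot in self.required_slots if slot not in self.filled]


weather = Intent("query_weather", ["query_date", "query_city"], {"query_city": "Shanghai"})
```

  • Here `weather.missing_slots()` reports that the date is still needed before the intent is fully specified.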
  • The slot attributes in this embodiment mainly include verb attributes and noun attributes.
  • The verb attributes further include category attributes for actions such as opening, closing, raising, lowering, increasing, decreasing, connecting, disconnecting, and rotating.
  • The noun attributes further include category attributes for objects such as air-conditioning equipment, audio equipment, video equipment, and communication equipment.
  • A slot attribute of a given noun type can be combined only with the slot attributes of certain action types.
  • The pieces of slot information are arranged in the first order, that is, the order in which the keywords were extracted from the speech information, to form a slot information list.
  • The first order is the sequence in which the extracted keywords appear in the speech text. For example, in the phrase "turn on the air conditioner, close the sunroof", the first order of the extracted keywords is "open", "air conditioner", "close", "sunroof".
  • The slot information list is the list of all slot information contained in the speech information. In the example above, the slot information list contains "open", "air conditioner", "close", and "sunroof".
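  • One simple way to build such a slot information list is a lexicon lookup followed by a sort on text position. The sketch below assumes this approach; the patent does not state how keywords are extracted. The lexicon, the attribute names (including `window_equipment`), and the English stand-in sentence are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Slot:
    text: str       # keyword as it appears in the speech information
    attribute: str  # preset slot attribute (action or object category)


# Hypothetical lexicon mapping keywords to slot attributes.
LEXICON: Dict[str, str] = {
    "open": "open_or_close",
    "close": "open_or_close",
    "air conditioner": "air_conditioning_equipment",
    "sunroof": "window_equipment",
}


def extract_slots(speech_info: str, lexicon: Dict[str, str] = LEXICON) -> List[Slot]:
    """Find every lexicon keyword and return the slots in order of appearance."""
    found = []
    for keyword, attribute in lexicon.items():
        start = speech_info.find(keyword)
        while start != -1:
            found.append((start, Slot(keyword, attribute)))
            start = speech_info.find(keyword, start + len(keyword))
    found.sort(key=lambda item: item[0])  # the "first order": position in the text
    return [slot for _, slot in found]


slots = extract_slots("open the air conditioner, close the sunroof")
```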
  • The semantic processing system sends the slot information list, which contains the multiple pieces of slot information, to the intent segmentation system configured at the data processing terminal.
  • The data processing terminal also includes an intent segmentation system.
  • The intent segmentation system combines the multiple pieces of slot information into multiple control instructions according to the preset combination configuration information.
  • After the intent segmentation system receives the slot information list from the semantic processing system, its intent segmentation strategizer divides the slot information in the list into multiple independent intents, which can form one or more intent lists.
  • The intent segmentation system then sends the resulting intent list or lists to the vehicle-machine terminal.
  • FIG. 2 shows a system diagram of an intent segmentation system of a control method for a vehicle-machine system according to some embodiments of the present invention.
  • After the intent segmentation system receives the slot information list, it passes the list to the intent segmentation strategizer, which segments and combines the slot information in the list according to the configuration information list configured by the policy interface layer, forming multiple independent intents.
  • Each combination strategy takes the form (first slot attribute, combination direction, second slot attribute).
  • The combination strategies in the configuration information list are arranged in a preset second order.
  • The second order is a strategy arrangement order defined by the designer; it indicates the order in which the policy interface implementation layer selects combination strategies to try.
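  • The configuration information list can be pictured as an ordered list of triples. In the sketch below, list position stands in for the second order. The second and third entries mirror strategies two and three as the description gives them; the first entry and all identifier names are assumptions made for illustration.

```python
from typing import List, NamedTuple


class CombinationStrategy(NamedTuple):
    first_attribute: str   # slot attribute of the slot that starts the combination
    direction: str         # "backward" or "forward" along the slot information list
    second_attribute: str  # slot attribute of the slot to pair with


# Position in the list plays the role of the designer-defined second order.
CONFIG: List[CombinationStrategy] = [
    CombinationStrategy("open_or_close", "backward", "window_equipment"),  # hypothetical
    CombinationStrategy("raise_or_lower", "backward", "air_conditioning_equipment"),
    CombinationStrategy("raise_or_lower", "backward", "audio_equipment"),
]


def candidates(
    first_attribute: str, config: List[CombinationStrategy] = CONFIG
) -> List[CombinationStrategy]:
    """Strategies whose first slot attribute matches, kept in the second order."""
    return [s for s in config if s.first_attribute == first_attribute]
```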
  • The intent segmentation system can first determine the first slot information in the slot information list according to the first order, that is, the slot information represented by the first keyword extracted from the speech information. For example, in the speech information "increase the air conditioner temperature, close the window", the first slot information is "increase". The intent segmentation system can then determine, following the second order, the first combination strategy whose first slot attribute matches "increase".
  • If the first slot attribute of a strategy does not match, the intent segmentation system moves on to judge whether the first slot attribute of the next strategy matches the slot information "increase".
  • For example, strategy two in the configuration information list is (raise or lower, backward, air-conditioning equipment); its first slot attribute indicates a raising or lowering operation, so it matches the first slot information "increase" in the speech information. The intent segmentation system can therefore take strategy two as the first combination strategy whose first slot attribute matches the slot attribute of the first slot information.
  • The intent segmentation system can then judge, one by one along the combination direction indicated by strategy two (here, backward), whether the slot attribute of each remaining piece of slot information in the slot information list matches the second slot attribute.
  • "Backward" here means the backward direction in the first order, that is, from the first slot information toward the second, the third, and so on in the slot information list.
  • This combination direction generally matches users' habit of saying the verb first and the noun second, as in "turn on the sound" or "turn down the volume". The preferred combination direction in this embodiment is therefore backward, and the first slot attribute in a combination strategy is preferably a verb slot attribute.
  • A combination strategy may also use the reverse of the first order and a forward combination direction, to suit individual users who say the noun first and the verb second, as in "sound on" or "volume down".
  • The first slot attribute in each combination strategy in this embodiment is still preferably a verb slot attribute, for example (turn up or turn down, forward, audio equipment).
  • In this example, the first slot information is "increase", strategy two is the first combination strategy, and the combination direction it indicates is backward. The remaining slot information in the slot information list is "air conditioner temperature", "close", and "window".
  • At the policy interface implementation layer, the intent segmentation strategizer judges in turn how well the slot attribute of each remaining piece of slot information matches the second slot attribute of strategy two.
  • The intent segmentation system can determine "air conditioner temperature" as the first remaining slot information that matches the second slot attribute of strategy two, and combine "air conditioner temperature" with "increase" into a single-intent control instruction, namely "increase air conditioner temperature".
  • The intent segmentation system can also go on to determine, following the second order, the next combination strategy whose first slot attribute matches the slot attribute of the first slot information (that is, "increase").
  • For example, strategy three in the configuration information list is (raise or lower, backward, audio equipment); its first slot attribute indicates a raising or lowering operation, so it also matches the first slot information "increase" in the speech information.
  • The intent segmentation system can take strategy three as the next combination strategy whose first slot attribute matches the slot attribute of "increase", and judge one by one, backward along the combination direction indicated by strategy three, whether the slot attribute of each remaining piece of slot information matches the second slot attribute "audio equipment" of strategy three. If the remaining slot information includes "audio volume", whose slot attribute matches "audio equipment", the intent segmentation system can determine "audio volume" as the first remaining slot information that matches the second slot attribute of strategy three, and combine "audio volume" with "increase" into a single-intent control instruction, namely "increase audio volume".
  • While the intent segmentation strategizer divides the slot information list into multiple independent intents and combines them into multiple control instructions, each time a control instruction is obtained the intent segmentation system deletes the slot information involved in that instruction from the original slot information list and re-determines the first slot information of the list according to the first order.
  • For example, after obtaining the first control instruction "increase air conditioner temperature", the intent segmentation system can delete from the original slot information list the two pieces of slot information it involves, "increase" and "air conditioner temperature". Only "close" and "window" then remain in the new slot information list.
  • The intent segmentation strategizer can then determine "close" as the first slot information in the new list, according to the order in which the keywords were extracted from the speech text, and combine a new control instruction according to the combination strategies in the configuration information list. The process is the same as in the embodiment above and is not repeated here.
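  • The match, combine, delete, and repeat loop described above can be sketched as follows. This is an illustrative reading of the description, not the patented implementation. It handles only the backward combination direction, and it assumes that a first slot which no strategy can combine is simply dropped, which the text does not specify. All class, function, and attribute names are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Slot(NamedTuple):
    text: str
    attribute: str


class Strategy(NamedTuple):
    first_attribute: str
    direction: str
    second_attribute: str


def segment_intents(slots: List[Slot], config: List[Strategy]) -> List[Tuple[str, str]]:
    """Combine slots into (action, object) instructions, one intent at a time."""
    remaining = list(slots)  # working copy, kept in the first order
    instructions = []
    while remaining:
        first = remaining[0]
        partner_index = None
        for strategy in config:  # list order stands in for the second order
            if strategy.first_attribute != first.attribute:
                continue
            if strategy.direction != "backward":
                continue  # only the backward direction is sketched here
            for i in range(1, len(remaining)):
                if remaining[i].attribute == strategy.second_attribute:
                    partner_index = i
                    break
            if partner_index is not None:
                break
        if partner_index is None:
            remaining.pop(0)  # assumption: drop a slot that nothing combines with
            continue
        instructions.append((first.text, remaining[partner_index].text))
        del remaining[partner_index]  # delete the partner first so index 0 stays valid
        del remaining[0]              # the next loop pass re-determines the first slot
    return instructions


config = [
    Strategy("open_or_close", "backward", "window_equipment"),  # hypothetical strategy one
    Strategy("raise_or_lower", "backward", "air_conditioning_equipment"),
    Strategy("raise_or_lower", "backward", "audio_equipment"),
]
slots = [
    Slot("increase", "raise_or_lower"),
    Slot("air conditioner temperature", "air_conditioning_equipment"),
    Slot("close", "open_or_close"),
    Slot("window", "window_equipment"),
]
intents = segment_intents(slots, config)
# -> [("increase", "air conditioner temperature"), ("close", "window")]
```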
  • The data processing terminal also arranges the multiple control instructions in the order in which they were combined to build an intent list, and sends the intent list to the vehicle-machine terminal.
  • The vehicle-machine terminal receives the intent list from the data processing terminal and executes the control instructions in it sequentially and in batches.
  • The vehicle-machine terminal may execute the first control instruction in the received intent list first, and time how long the first control instruction has been executing.
  • The vehicle-machine terminal can then determine that the first control instruction has been executed, and execute the next control instruction in the intent list.
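  • A minimal sketch of timed, sequential execution. The description does not say how the measured time is used to decide that an instruction has finished; the sketch assumes an instruction counts as executed once a fixed interval `settle_seconds` has elapsed since it started. That rule, the default value, and all names here are assumptions.

```python
import time
from typing import Callable, List


def run_intent_list(
    intent_list: List[str],
    execute: Callable[[str], None],
    settle_seconds: float = 0.5,                  # assumed completion interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Execute instructions in list order, timing each one before moving on."""
    done = []
    for instruction in intent_list:
        started = clock()
        execute(instruction)
        remaining = settle_seconds - (clock() - started)
        if remaining > 0:
            sleep(remaining)  # wait out the rest of the assumed interval
        done.append(instruction)
    return done
```

  • The `clock` and `sleep` parameters exist only so the timing can be replaced in tests; calling `run_intent_list(intents, handler)` with the defaults uses the real monotonic clock.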
  • The vehicle-machine terminal can feed back the results of executing the control instructions to the user through human-computer interaction interfaces such as the vehicle's central control display and the voice broadcast (text-to-speech, TTS) module, completing the voice interaction control process of the whole vehicle-machine system.
  • Based on the idea of the present invention described above, those skilled in the art can also configure the data processing terminal of the control system inside the vehicle-machine system itself, so that the vehicle-machine system achieves the same intent segmentation effect.
  • The present invention also provides a control system for a vehicle-machine system.
  • The control system realizes artificial intelligence voice interaction control in the vehicle-machine system by using the control method described above.
  • The specific operation is as described above and is not repeated here.
  • The control system can comprehensively and accurately judge the user's real intention from multiple operation instructions and multiple operation objects in the same voice data, which makes interaction between the vehicle-machine system and the user more intelligent, improves the efficiency of voice interaction, and improves the user experience.
  • The present invention also provides a computer-readable storage medium on which computer instructions are stored.
  • When the computer instructions are executed by a processor, the above-mentioned method configured for the user terminal and the data processing terminal in the control system of the vehicle-machine system is implemented.
  • The computer-readable storage medium can comprehensively and accurately judge the user's real intention from multiple operation instructions and multiple operation objects in the same voice data, which makes interaction between the vehicle-machine system and the user more intelligent, improves the efficiency of voice interaction, and improves the user experience.
  • The vehicle-machine terminal and the data processing terminal described in the above embodiments can be realized by a combination of software and hardware.
  • The vehicle-machine terminal and the data processing terminal can also be implemented in software alone or in hardware alone.
  • In hardware, the vehicle-machine terminal and the data processing terminal can be implemented in one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, other electronic devices for performing the functions described above, or a selected combination of the above devices.
  • In software, the vehicle-machine terminal and the data processing terminal can be implemented by independent software modules, such as program modules (procedures) and function modules (functions), running on a general-purpose chip, where each module performs one or more of the functions and operations described herein.
  • A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine.
  • A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, multiple microprocessors, one or more microprocessors in cooperation with a DSP core, or any other such configuration.

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Automation & Control Theory (AREA)
  • Artificial Intelligence (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Fittings On The Vehicle Exterior For Carrying Loads, And Devices For Holding Or Mounting Articles (AREA)

Abstract

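A control method and control system for a vehicle-machine system. The control method includes the following steps: collecting a user's voice data; performing speech recognition on the collected voice data to obtain corresponding utterance information; performing semantic parsing on the utterance information to obtain a plurality of pieces of slot information; combining the plurality of pieces of slot information into a plurality of control instructions according to preset combination configuration information; and executing the control instructions one by one.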

Description

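Control method and control system for a vehicle-machine system
Technical Field
The present invention relates to the field of vehicle-machine system control, and in particular to a control method and control system for a vehicle-machine system.
Background
As is well known, when a person's visual channel is occupied, the auditory channel is better suited to receiving urgent and important notifications. This is especially true when driving: both hands must hold the steering wheel, and the eyes must stay on the road ahead, with a high level of concentration to ensure driving safety. However, when a driver encounters an emergency or suddenly wants to adjust a setting in the vehicle, the visual channel is already in use and it is difficult to divert attention to anything else. It is for this reason that voice interaction has been introduced into automobiles.
One of the dimensions for evaluating the artificial-intelligence (AI) voice interaction function of a vehicle-machine system is its intent understanding module. In other words, whether the system can understand or recognize the intent expressed by the user is a core measure of whether it is artificially intelligent.
In the prior art, the AI voice interaction module of a vehicle-machine system can recognize only a single intent contained in a one-sentence utterance, and generates one control instruction from that single intent to control the vehicle-machine or a device within it. In practical use of the voice interaction module, however, users often issue a series of several instructions in the same piece of voice data for the vehicle-machine to execute. In that case, single-intent AI voice interaction methods and systems often cannot comprehensively and accurately determine the user's real intent from the multiple operation instructions and multiple operation objects in the same voice data, so omitting some of the operation instructions, or even performing wrong operations, is a common problem.
To overcome the above problems in the prior art, there is an urgent need in the art for a voice interaction technique that can comprehensively and accurately determine the user's real intent from the multiple operation instructions and multiple operation objects in the same voice data, so as to further realize intelligent interaction between the vehicle-machine system and the user, improve the efficiency of voice interaction, and improve the user experience.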
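Summary of the Invention
A brief summary of one or more aspects is given below to provide a basic understanding of these aspects. This summary is not an exhaustive overview of all contemplated aspects, and is intended neither to identify key or critical elements of all aspects nor to delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in a simplified form as a prelude to the more detailed description given later.
As described above, existing control methods for voice interaction in vehicle-machine systems are fairly basic: they can execute only simple control instructions expressing a single user intent, and cannot execute complex control instructions, issued by the user while driving, that contain several intents at once.
To solve the above problems, a first aspect of the present invention provides a control method for a vehicle-machine system. The control method includes the following steps: collecting a user's voice data; performing speech recognition on the collected voice data to obtain corresponding utterance information; performing semantic parsing on the utterance information to obtain multiple pieces of slot information; combining the slot information into multiple control instructions according to preset combination configuration information; and executing the control instructions one by one. Through semantic parsing and combination configuration, the control method combines multiple pieces of slot information into multiple control instructions and controls the vehicle-machine to execute them one by one. By implementing this control method, the invention can comprehensively and accurately determine the user's real intent from the multiple operation instructions and multiple operation objects in the same voice data, thereby further realizing intelligent interaction between the vehicle-machine system and the user, improving the efficiency of voice interaction, and improving the user experience.
To solve the above problems, a second aspect of the present invention provides a control system for a vehicle-machine system. The control system includes: a vehicle-machine terminal, configured to collect a user's voice data and to execute, one by one, multiple control instructions parsed from that voice data; and a data processing terminal, configured to perform speech recognition on the collected voice data to obtain corresponding utterance information, to perform semantic parsing on the utterance information to obtain multiple pieces of slot information, and to combine the slot information into the multiple control instructions according to preset combination configuration information. Through semantic parsing and combination configuration, the control system combines multiple pieces of slot information into multiple control instructions and controls the vehicle-machine to execute them one by one. With this configuration, the control system can comprehensively and accurately determine the user's real intent from the multiple operation instructions and multiple operation objects in the same voice data, thereby further realizing intelligent interaction between the vehicle-machine system and the user, improving the efficiency of voice interaction, and improving the user experience.
To solve the above problems, a third aspect of the present invention provides a computer-readable storage medium on which computer instructions are stored. When executed by a processor, the computer instructions implement the control method for a vehicle-machine system provided by the first aspect of the invention. By implementing this control method, the computer-readable storage medium can comprehensively and accurately determine the user's real intent from the multiple operation instructions and multiple operation objects in the same voice data, thereby further realizing intelligent interaction between the vehicle-machine system and the user, improving the efficiency of voice interaction, and improving the user experience.
In summary, the present invention provides a control method for a vehicle-machine system, a control system, and a computer-readable storage medium storing the control method, which realize multi-intent instruction control of human-machine voice interaction in the vehicle-machine system through speech recognition, semantic processing, and intent segmentation, thereby further realizing intelligent interaction between the vehicle-machine system and the user, improving the efficiency of voice interaction, and improving the user experience.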
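Brief Description of the Drawings
The above features and advantages of the present invention can be better understood after reading the detailed description of the embodiments of the present disclosure in conjunction with the following drawings. In the drawings, components are not necessarily drawn to scale, and components with similar properties or features may have the same or similar reference numerals.
FIG. 1 shows an overall architecture diagram of the control method for a vehicle-machine system according to some embodiments of the present invention; and
FIG. 2 shows a diagram of the intent segmentation system of the control method for a vehicle-machine system according to some embodiments of the present invention.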
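Detailed Description
The embodiments of the present invention are described below by way of specific examples, and those skilled in the art can readily understand other advantages and effects of the invention from the content disclosed in this specification. Although the invention is described in conjunction with preferred embodiments, this does not mean that its features are limited to those embodiments. On the contrary, the purpose of describing the invention in conjunction with embodiments is to cover other alternatives or modifications that may be derived from the claims. The following description contains many specific details in order to provide a thorough understanding of the invention; the invention may also be practiced without these details. In addition, some specific details are omitted from the description to avoid confusing or obscuring the focus of the invention.
In the description of the present invention, it should be noted that, unless otherwise expressly specified and limited, the terms "mounted", "coupled", and "connected" are to be understood broadly. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct, or indirect through an intermediate medium; or it may be internal communication between two elements. Those of ordinary skill in the art can understand the specific meaning of these terms in the present invention according to the specific circumstances.
In addition, the terms "upper", "lower", "left", "right", "top", "bottom", "horizontal", and "vertical" used in the following description are to be understood as the orientations shown in the relevant paragraph and the associated drawings. These relative terms are used only for convenience of description and do not mean that the device described must be manufactured or operated in a particular orientation; they should therefore not be construed as limiting the invention.
It will be understood that although the terms "first", "second", "third", and so on may be used herein to describe various components, regions, layers, and/or sections, these components, regions, layers, and/or sections should not be limited by these terms, which are used only to distinguish different components, regions, layers, and/or sections. Thus, a first component, region, layer, and/or section discussed below could be termed a second component, region, layer, and/or section without departing from some embodiments of the invention.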
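According to a first aspect of the present invention, a control method for a vehicle-machine system is provided.
Referring to FIG. 1, FIG. 1 shows an overall architecture diagram of the control method for a vehicle-machine system according to some embodiments of the present invention.
In the embodiment shown in FIG. 1, the control system of the vehicle-machine system mainly includes a vehicle-machine terminal and a data processing terminal. The vehicle-machine terminal mainly collects the user's voice data and sends it to the data processing terminal for parsing, and then obtains from the data processing terminal the resulting single-intent control instructions and executes them one by one. The data processing terminal may be deployed in a cloud control system, and mainly performs semantic parsing and intent combination on the voice data sent by the vehicle-machine terminal, so as to generate multiple single-intent control instructions that the vehicle-machine terminal can correctly recognize and execute.
The control method applied to this control system includes the following steps. First, the vehicle-machine terminal may use the vehicle's microphone module to collect the user's voice data and send it to the data processing terminal in the cloud for semantic parsing and intent combination. Next, the data processing terminal may perform speech recognition on the received voice data to obtain the corresponding utterance information, and then perform semantic parsing on that utterance information to obtain multiple pieces of slot information. After that, the data processing terminal may combine the slot information into multiple single-intent control instructions according to preset combination configuration information, and send these instructions to the vehicle-machine terminal for execution one by one.
Specifically, the step of collecting the user's voice data at the vehicle-machine terminal mainly includes: collecting multiple recorded analog signals from the user with the microphone module; converting each recorded analog signal into a corresponding digital voice signal; and combining the resulting digital voice signals in chronological order into voice stream data.
A digital signal is formed from an analog signal through sampling, quantization, and coding. Specifically, sampling takes the value of the input analog signal at suitable time intervals to obtain a sample for each instant; quantization represents each sampled value in binary code; and coding arranges the binary numbers produced by quantization into a sequential pulse train. Analog signals are generally quantized into digital signals by pulse code modulation (PCM), in which different amplitudes of the analog signal correspond to different binary values.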
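The sampling, quantization, and coding steps can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name `pcm_encode`, the uniform 8-bit quantizer, the 8 kHz sampling rate, and the 440 Hz test tone are all assumptions chosen for illustration.

```python
import math


def pcm_encode(signal, sample_rate_hz, num_samples, bits=8):
    """Hypothetical uniform PCM encoder for a signal with values in [-1, 1]."""
    levels = 2 ** bits
    codes = []
    for n in range(num_samples):
        t = n / sample_rate_hz                      # sampling: pick instant t
        x = max(-1.0, min(1.0, signal(t)))          # clip to the quantizer range
        level = min(levels - 1, int((x + 1.0) / 2.0 * levels))  # quantization
        codes.append(format(level, f"0{bits}b"))    # coding: fixed-width binary
    return codes


def tone(t):
    """Assumed test input: a 440 Hz sine wave."""
    return math.sin(2 * math.pi * 440.0 * t)


codes = pcm_encode(tone, sample_rate_hz=8000, num_samples=8, bits=8)
print(codes)
```

Each amplitude is mapped to one of 256 levels, so different amplitudes correspond to different binary values, which is the property the paragraph above describes.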
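Converting the recorded analog signal into a digital voice signal improves the confidentiality of communication: after A/D conversion, the voice signal can be encrypted before transmission, then decrypted at the receiving end and restored to an analog signal by D/A conversion. The conversion also improves the signal's resistance to interference; in particular, a digital signal can be regenerated at each relay, which eliminates the accumulation of noise. Transmission errors in the analog-to-digital conversion process can be controlled, which improves transmission quality. Moreover, analog-to-digital conversion makes it convenient to process the digital information with modern digital signal processing techniques and to build integrated digital communication networks that carry all kinds of messages, enhancing the functionality of the communication system.
As shown in FIG. 1, in this embodiment the vehicle-machine terminal may be provided with a human-machine interface such as a voice capture button. The user can tap this button to start the vehicle's microphone module and capture the speech the user utters, for example "turn on the air conditioner, close the sunroof". This audio stream is sent by the microphone module to the processor of the vehicle-machine terminal, where the audio stream is converted into a voice stream.
Audio streaming refers to the practice of delivering real-time audio over a network connection. This type of data transmission requires certain protocols to handle the time ordering of data packets and other transport concerns, so as to provide on-demand content to the end user. Audio streaming uses a buffering system and a secure streaming platform to let the end user listen to a complete audio file without interruption. This type of streaming requires a large amount of bandwidth.
In this embodiment, the audio stream "打开空调,关闭天窗" ("turn on the air conditioner, close the sunroof") comprises eight recorded analog signals, one for each of the characters "打", "开", "空", "调", "关", "闭", "天", and "窗". The microphone module of the vehicle-machine terminal collects these eight recorded analog signals, and the processor of the vehicle-machine terminal converts them into the corresponding digital voice signals. The digital voice signals are then combined in chronological order into voice stream data, which the vehicle-machine terminal sends to the data processing terminal.
The voice stream data is assembled by arranging the received digital voice signals in chronological order. For example, through analog-to-digital conversion the processor obtains, in turn, the eight digital voice signals for "打", "开", "空", "调", "关", "闭", "天", and "窗", and then combines them, in the order in which they were obtained, into the voice stream data for "打开空调,关闭天窗".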
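As a rough illustration of assembling voice stream data in chronological order, the following Python sketch sorts digitized segments by capture time and concatenates their payloads. The `DigitalSegment` type, its field names, and the byte payloads are hypothetical; the patent does not specify a data format.

```python
from dataclasses import dataclass


@dataclass
class DigitalSegment:
    """Hypothetical container for one digitized voice signal."""
    timestamp_ms: int   # assumed capture time of the segment
    samples: bytes      # assumed PCM payload


def merge_segments(segments):
    """Concatenate the segment payloads in chronological order."""
    ordered = sorted(segments, key=lambda s: s.timestamp_ms)
    return b"".join(s.samples for s in ordered)


# Segments may arrive out of order; the merged stream follows capture time.
stream = merge_segments([
    DigitalSegment(20, b"\x03\x04"),
    DigitalSegment(0, b"\x01\x02"),
    DigitalSegment(40, b"\x05\x06"),
])
print(stream)
```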
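The vehicle-machine terminal sends the resulting voice stream data to the data processing terminal. In this embodiment, the data processing terminal is deployed in a cloud control system and includes a speech processing system, a semantic processing system, and an intent segmentation system.
The speech processing system performs speech recognition on the received voice stream data, thereby parsing it into the corresponding utterance information.
Utterance information refers to text extracted by the speech recognition system that conforms to a specific structure and contains the key information. Ordinary text usually means the colloquial wording the user actually speaks, for example "please turn on the air conditioner for me, and close the sunroof while you're at it". The utterance information corresponding to this example could be "turn on the air conditioner, close the sunroof". Compared with colloquial text, utterance information is better suited to the semantic parsing step performed later in the semantic processing system, allowing the control instructions contained in the voice stream data to be parsed out more quickly and accurately.
After parsing the voice stream data into utterance information, the speech processing system sends the utterance information to the semantic processing system, which is likewise deployed at the data processing terminal, for further semantic parsing of the utterance text.
The step of performing semantic parsing on the utterance information includes the following. First, the semantic processing system extracts keywords from the received utterance information; for example, four keywords can be extracted from "turn on the air conditioner, close the sunroof", namely "turn on", "air conditioner", "close", and "sunroof". It then classifies the keywords according to preset slot attributes, treating each keyword as a piece of slot information with the corresponding slot attribute.
A slot is an identifier for a piece of key information needed to express an intent precisely in the sentence in which the user states that intent. An intent may have one or more slots, depending on how many pieces of key information it requires. Take the "query weather" intent: the weather differs by date and place, so a person asking about the weather normally has to say which day and which place they mean. "Query date" and "query city" are therefore the key information of the weather intent, and both are created as slots.
The slot attributes in this embodiment fall into two broad classes: verb attributes and noun attributes. Verb attributes further include category attributes for actions such as open, close, raise, lower, increase, decrease, connect, disconnect, and rotate. Noun attributes further include category attributes for objects such as air-conditioning devices, audio devices, video devices, and communication devices. Each noun-type slot attribute can be combined with only some of the action-type slot attributes.
In the above example, the keywords in "turn on the air conditioner, close the sunroof" are "turn on", "air conditioner", "close", and "sunroof". "Turn on" and "close" are slot information with verb attributes, while "air conditioner" and "sunroof" are slot information with noun attributes.
In this embodiment, the pieces of slot information are arranged according to the first order in which the keywords were extracted from the utterance information, forming a slot information list. The first order is the order in which the extracted keywords appear in the utterance text. For example, in the utterance text "turn on the air conditioner, close the sunroof", the first order of the extracted keywords is "turn on", "air conditioner", "close", "sunroof". The slot information list is the list of all slot information contained in the utterance; in the above example its contents are "turn on", "air conditioner", "close", "sunroof".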
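The keyword extraction, slot classification, and first-order listing described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the example utterance is rendered in English, and the `Slot` type, the `LEXICON` table, and simple substring matching are hypothetical stand-ins for whatever keyword extractor a real semantic processing system would use.

```python
from typing import NamedTuple


class Slot(NamedTuple):
    text: str        # keyword as it appears in the utterance
    kind: str        # "verb" or "noun" (the two attribute classes)
    attribute: str   # assumed category label for the slot attribute


# Hypothetical preset slot attributes, keyed by keyword.
LEXICON = {
    "turn on": ("verb", "open"),
    "close": ("verb", "close"),
    "air conditioner": ("noun", "air_conditioning_device"),
    "sunroof": ("noun", "sunroof"),
}


def extract_slots(utterance):
    """Return the slot information list, ordered by position in the text."""
    hits = []
    for keyword, (kind, attribute) in LEXICON.items():
        start = utterance.find(keyword)
        while start != -1:
            hits.append((start, Slot(keyword, kind, attribute)))
            start = utterance.find(keyword, start + len(keyword))
    hits.sort(key=lambda hit: hit[0])   # the "first order"
    return [slot for _, slot in hits]


slots = extract_slots("turn on the air conditioner, close the sunroof")
print([s.text for s in slots])
```

Sorting by the position at which each keyword was found keeps the list in the first order regardless of how the lexicon itself is ordered.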
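The semantic processing system sends the slot information list, containing the multiple pieces of slot information, to the intent segmentation system, which is likewise deployed at the data processing terminal.
Referring again to FIG. 1, the data processing terminal further includes an intent segmentation system, which combines the obtained slot information into multiple control instructions according to the preset combination configuration information. After receiving the slot information list from the semantic processing system, the intent segmentation system uses an intent segmentation strategizer to split the slot information in the list into multiple independent intents, which may form one or more intent lists. The intent segmentation system then sends the intent list or lists to the vehicle-machine terminal.
Referring to FIG. 2, FIG. 2 shows a diagram of the intent segmentation system of the control method for a vehicle-machine system according to some embodiments of the present invention.
In the embodiment of FIG. 2, after receiving the slot information list, the intent segmentation system passes it to the intent segmentation strategizer, which splits and combines the slot information in the list according to the configuration information list configured at the strategy interface layer, forming multiple independent intents.
Specifically, the configuration information list records multiple combination strategies, each in the form (first slot attribute, combination direction, second slot attribute). The combination strategies in the list are arranged in a preset second order. The second order is a strategy ordering defined by the designer, and indicates the order in which the strategy interface implementation layer selects combination strategies to try.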
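One possible in-memory representation of such a configuration information list is sketched below in Python. The class names, attribute labels, and the specific strategies are assumptions for illustration; the patent only fixes the triple form (first slot attribute, combination direction, second slot attribute) and the designer-defined second order, which is modeled here as list position.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    BACKWARD = "backward"   # along the first order
    FORWARD = "forward"     # against the first order


@dataclass(frozen=True)
class Strategy:
    first_attributes: frozenset   # verb attributes accepted as the first slot
    direction: Direction
    second_attribute: str         # noun attribute required for the second slot


# Hypothetical configuration list; list position is the "second order".
STRATEGIES = [
    Strategy(frozenset({"open", "close"}), Direction.BACKWARD, "air_conditioning_device"),
    Strategy(frozenset({"raise", "lower"}), Direction.BACKWARD, "air_conditioning_device"),
    Strategy(frozenset({"raise", "lower"}), Direction.BACKWARD, "audio_device"),
]


def candidates(first_attribute):
    """Strategies whose first slot attribute matches, in the second order."""
    return [s for s in STRATEGIES if first_attribute in s.first_attributes]


print([s.second_attribute for s in candidates("raise")])
```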
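When performing intent segmentation, the intent segmentation system may first determine, according to the first order, the first piece of slot information in the slot information list, that is, the slot information represented by the first keyword extracted from the utterance information. For example, in the utterance information "raise the air-conditioner temperature, close the window", the first piece of slot information is "raise". The intent segmentation system may then determine, according to the second order, the first combination strategy whose first slot attribute matches "raise".
Continuing the example, for the utterance information "raise the air-conditioner temperature, close the window", suppose strategy 1 in the configuration information list is (open or close, backward, air-conditioning device). Its first slot attribute denotes an open or close operation, which does not match the first piece of slot information, "raise", so the intent segmentation system moves on to check whether the first slot attribute of the next strategy matches "raise". Suppose strategy 2 in the list is (raise or lower, backward, air-conditioning device). Its first slot attribute denotes a raise or lower operation, which does match "raise". The intent segmentation system can therefore identify strategy 2 as the first combination strategy whose first slot attribute matches the slot attribute of the first piece of slot information.
Then, following the combination direction indicated by strategy 2 (for example, backward), the intent segmentation system may check one by one whether the slot attribute of each remaining piece of slot information in the list matches the second slot attribute. Here "backward" means proceeding along the first order, that is, from the first piece of slot information to the second, the third, and so on. This direction generally suits the common habit of saying the verb before the noun, as in "turn on the stereo" or "turn down the volume". In this embodiment, therefore, backward combination is preferred, and the first slot attribute of a combination strategy is preferably a verb slot attribute.
Optionally, in other embodiments, a combination strategy may instead specify a forward combination direction, against the first order, to suit users who say the noun before the verb, as in "把音响打开" (literally "the stereo, turn on") or "把音量调低" (literally "the volume, turn down"). Correspondingly, the first slot attribute of each combination strategy in such an embodiment is still preferably a verb slot attribute, for example (turn up or turn down, forward, audio device).
In the above "raise the air-conditioner temperature, close the window" embodiment, the first piece of slot information is "raise" and strategy 2 is its first combination strategy, which indicates backward combination. At this point the remaining slot information in the list is "air-conditioner temperature", "close", and "window". At the strategy interface implementation layer, the intent segmentation strategizer checks in turn how well the slot attribute of each remaining piece of slot information matches the second slot attribute of strategy 2. If the second slot attribute of strategy 2 is "air-conditioning device", it matches the slot attribute of "air-conditioner temperature", so the intent segmentation system identifies "air-conditioner temperature" as the first remaining piece of slot information matching the second slot attribute of strategy 2, and combines it with "raise" into one single-intent control instruction, "raise the air-conditioner temperature".
Conversely, in an embodiment with the utterance "raise the stereo volume, close the window", the remaining slot information in the list is "stereo volume", "close", and "window", none of which matches the second slot attribute of strategy 2 (air-conditioning device). The intent segmentation system may then determine, according to the second order, the next combination strategy whose first slot attribute matches the slot attribute of the first piece of slot information ("raise"). Suppose strategy 3 in the configuration information list is (raise or lower, backward, audio device). Its first slot attribute denotes a raise or lower operation, which matches "raise". The intent segmentation system can therefore identify strategy 3 as the next combination strategy whose first slot attribute matches that of "raise" and, following the direction indicated by strategy 3, check backward one by one whether the slot attribute of each remaining piece of slot information matches the second slot attribute of strategy 3, "audio device". The slot attribute of "stereo volume" matches, so the intent segmentation system identifies "stereo volume" as the first remaining piece of slot information matching the second slot attribute of strategy 3, and combines it with "raise" into one single-intent control instruction, "raise the stereo volume".
While the intent segmentation strategizer is splitting the slot information list into independent intents and combining them into control instructions, each time a control instruction is obtained the intent segmentation system deletes the pieces of slot information used by that instruction from the original slot information list, and then again determines, according to the first order, the first piece of slot information in the updated list.
Continuing with the intent segmentation of "raise the air-conditioner temperature, close the window": after obtaining the first control instruction, "raise the air-conditioner temperature", the intent segmentation system deletes from the original slot information list the two pieces of slot information used by that instruction, "raise" and "air-conditioner temperature". The new slot information list now contains only "close" and "window". The intent segmentation strategizer can then, according to the order in which the keywords were extracted from the utterance text, determine that "close" is the first piece of slot information in the new list, and again use the combination strategies in the configuration information list to compose a new control instruction. The process of composing the new control instruction is the same as in the above embodiment and is not repeated here.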
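The segmentation loop walked through above can be sketched in Python as follows. This is a simplified illustration, not the patented implementation: slots are plain (text, attribute) tuples, strategies are (first attributes, direction, second attribute) tuples, only the backward direction is handled, strategy 4 for windows is invented for the example, and dropping a head slot that no strategy can pair is an assumption the source does not address.

```python
def segment_intents(slots, strategies):
    """Pair slots into single-intent commands using ordered strategies.

    slots:      list of (text, attribute) in the first order.
    strategies: list of (first_attrs, direction, second_attr) in the second order.
    """
    remaining = list(slots)
    commands = []
    while remaining:
        head_text, head_attr = remaining[0]        # first slot by first order
        partner = None
        for first_attrs, direction, second_attr in strategies:
            if head_attr not in first_attrs or direction != "backward":
                continue                           # sketch: backward only
            for i in range(1, len(remaining)):     # scan along the first order
                if remaining[i][1] == second_attr:
                    partner = i
                    break
            if partner is not None:
                break                              # first matching strategy wins
        if partner is None:
            remaining.pop(0)                       # assumption: skip unmatched head
            continue
        commands.append((head_text, remaining[partner][0]))
        del remaining[partner]                     # delete both used slots,
        del remaining[0]                           # then restart from new head
    return commands


STRATEGIES = [
    ({"open", "close"}, "backward", "air_conditioning_device"),   # strategy 1
    ({"raise", "lower"}, "backward", "air_conditioning_device"),  # strategy 2
    ({"raise", "lower"}, "backward", "audio_device"),             # strategy 3
    ({"open", "close"}, "backward", "window"),                    # hypothetical
]

commands = segment_intents(
    [("raise", "raise"), ("stereo volume", "audio_device"),
     ("close", "close"), ("window", "window")],
    STRATEGIES,
)
print(commands)
```

For the stereo example, strategy 2 matches "raise" but finds no air-conditioning slot, so the loop falls through to strategy 3, mirroring the fallback described in the text.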
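Referring again to FIG. 1, the data processing terminal also arranges the resulting control instructions in the order in which they were composed to build an intent list, and sends the intent list to the vehicle-machine terminal.
In the embodiment of FIG. 1, the vehicle-machine terminal receives the intent list sent from the data processing terminal and executes the control instructions in the list in order, in batches. Specifically, the vehicle-machine terminal may first execute the first control instruction in the received intent list and measure how long it has been executing. When that time reaches a preset time threshold (for example, 3 to 5 seconds), the vehicle-machine terminal may judge that the first control instruction has been executed and proceed to execute the next control instruction in the list. Afterwards, the vehicle-machine terminal may report the results of executing the control instructions to the user through a human-machine interface such as the vehicle's central display or a text-to-speech (TTS) voice broadcast module, completing the voice interaction control process of the whole vehicle-machine system.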
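The timed, one-by-one execution can be sketched in Python as below. The function name, the 3-second default, and the injectable `sleep` and `clock` parameters are assumptions made so the sketch can be exercised without real delays; the source only states that the next instruction starts once the previous one has run for a preset threshold.

```python
import time


def execute_in_order(commands, execute, threshold_s=3.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Run commands one by one, waiting out the threshold between them."""
    for index, command in enumerate(commands):
        started = clock()
        execute(command)                       # hand the command to the vehicle
        if index < len(commands) - 1:
            # Treat the command as done once threshold_s has elapsed.
            remaining = threshold_s - (clock() - started)
            if remaining > 0:
                sleep(remaining)


# Usage with a fake clock so the example finishes instantly.
log = []
fake_now = [0.0]


def fake_clock():
    return fake_now[0]


def fake_sleep(seconds):
    log.append(("sleep", seconds))
    fake_now[0] += seconds


execute_in_order(
    ["raise the air-conditioner temperature", "close the window"],
    lambda command: log.append(("run", command)),
    threshold_s=3.0, sleep=fake_sleep, clock=fake_clock,
)
print(log)
```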
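Those skilled in the art will understand that deploying the data processing terminal in a cloud control system is only one non-limiting embodiment of the present invention. It is intended to move the semantic parsing and intent segmentation steps to the cloud so as to reduce the data processing load on the vehicle-machine terminal and to let vehicle-machine systems with weaker data processing capability also perform multi-intent segmentation, thereby promoting wider adoption of the technique. It should be noted, however, that this embodiment does not limit the scope of protection of the invention. Optionally, in other embodiments, those skilled in the art may, based on the above concept, also deploy the data processing terminal of the control system in the vehicle-machine system, so that the vehicle-machine system achieves the same intent segmentation effect in a stand-alone state.
Although the above methods are illustrated and described as a series of actions for simplicity of explanation, it should be understood and appreciated that these methods are not limited by the order of the actions, because according to one or more embodiments some actions may occur in a different order and/or concurrently with other actions that are illustrated and described herein, or that are not illustrated and described herein but can be understood by those skilled in the art.
According to a second aspect of the present invention, a control system for a vehicle-machine system is also provided. The control system uses the above control method for a vehicle-machine system to realize AI voice interaction control within the vehicle-machine. The specific operations are as described above and are not repeated here. By implementing the above control method, the control system can comprehensively and accurately determine the user's real intent from the multiple operation instructions and multiple operation objects in the same voice data, thereby further realizing intelligent interaction between the vehicle-machine system and the user, improving the efficiency of voice interaction, and improving the user experience.
According to a third aspect of the present invention, a computer-readable storage medium on which computer instructions are stored is also provided. When executed by a processor, the computer instructions implement the method configured for the user terminal and the data processing terminal in the above control system of the vehicle-machine system. By implementing this control method, the computer-readable storage medium can comprehensively and accurately determine the user's real intent from the multiple operation instructions and multiple operation objects in the same voice data, thereby further realizing intelligent interaction between the vehicle-machine system and the user, improving the efficiency of voice interaction, and improving the user experience.
The vehicle-machine terminal and the data processing terminal described in the above embodiments can be implemented by a combination of software and hardware. It will be understood, however, that they can also be implemented in software or in hardware. For a hardware implementation, the vehicle-machine terminal and the data processing terminal can be implemented in one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), processors, controllers, microcontrollers, microprocessors, other electronic devices for performing the above functions, or a selected combination of these devices. For a software implementation, the vehicle-machine terminal and the data processing terminal can be implemented by independent software modules, such as program modules (procedures) and function modules (functions), running on a general-purpose chip, wherein each module performs one or more of the functions and operations described herein.
Those skilled in the art will understand that information, signals, and data may be represented using any of a variety of different technologies and techniques. For example, the data, instructions, commands, information, signals, bits, symbols, and chips referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
Those skilled in the art will further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or a combination of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the overall system. Skilled persons may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The various illustrative logical modules and circuits described in connection with the embodiments disclosed herein may be implemented or performed with a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, for example a combination of a DSP and a microprocessor, multiple microprocessors, one or more microprocessors in cooperation with a DSP core, or any other such configuration.
The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the spirit or scope of the disclosure. Thus, the disclosure is not intended to be limited to the examples and designs described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.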

Claims (17)

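  1. A control method for a vehicle-machine system, characterized by comprising the following steps:
    collecting a user's voice data;
    performing speech recognition on the collected voice data to obtain corresponding utterance information;
    performing semantic parsing on the utterance information to obtain a plurality of pieces of slot information;
    combining the plurality of pieces of slot information into a plurality of control instructions according to preset combination configuration information; and
    executing the plurality of control instructions one by one.
  2. The control method of claim 1, wherein the step of collecting the user's voice data comprises:
    collecting a plurality of recorded analog signals of the user with a microphone module;
    converting each of the plurality of recorded analog signals into a corresponding digital voice signal; and
    combining the digital voice signals in chronological order into voice stream data.
  3. The control method of claim 2, wherein the step of performing speech recognition on the collected voice data comprises:
    performing speech recognition processing on the voice stream data to parse it into the corresponding utterance information.
  4. The control method of claim 1, wherein the step of performing semantic parsing on the utterance information comprises:
    extracting keywords from the utterance information;
    classifying the plurality of keywords according to preset slot attributes, and taking each keyword as a piece of slot information having the corresponding slot attribute; and
    arranging the pieces of slot information according to a first order in which the keywords are extracted from the utterance information, to form a slot information list.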
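  5. The control method of claim 4, wherein the combination configuration information includes a plurality of combination strategies arranged in a preset second order, each combination strategy including a first slot attribute, a combination direction, and a second slot attribute, and the step of combining the plurality of pieces of slot information into a plurality of control instructions according to the preset combination configuration information comprises:
    determining, according to the first order, the first piece of slot information in the slot information list;
    determining, according to the second order, the first combination strategy whose first slot attribute matches the slot attribute of the first piece of slot information;
    judging one by one, along the combination direction indicated by the first combination strategy, whether the slot attribute of each remaining piece of slot information in the slot information list matches the second slot attribute of the first combination strategy; and
    combining the first remaining piece of slot information whose slot attribute matches the second slot attribute of the first combination strategy with the first piece of slot information into one control instruction.
  6. The control method of claim 5, wherein the step of combining the plurality of pieces of slot information into a plurality of control instructions according to the preset combination configuration information further comprises:
    if none of the slot attributes of the remaining pieces of slot information in the slot information list matches the second slot attribute of the first combination strategy, determining, according to the second order, the next combination strategy whose first slot attribute matches the slot attribute of the first piece of slot information;
    judging one by one, along the combination direction indicated by the next combination strategy, whether the slot attribute of each remaining piece of slot information in the slot information list matches the second slot attribute of the next combination strategy; and
    combining the first remaining piece of slot information whose slot attribute matches the second slot attribute of the next combination strategy with the first piece of slot information into one control instruction.
  7. The control method of claim 5, wherein the step of combining the plurality of pieces of slot information into a plurality of control instructions according to the preset combination configuration information further comprises:
    in response to obtaining one control instruction by combination, deleting from the slot information list the pieces of slot information involved in the control instruction, and returning to the step of determining, according to the first order, the first piece of slot information in the slot information list.
  8. The control method of claim 5, wherein the step of executing the plurality of control instructions one by one comprises:
    in response to obtaining one of the control instructions by combination, measuring the length of time for which the vehicle-machine system has been executing the previous control instruction;
    in response to the length of time for which the vehicle-machine system has been executing the previous control instruction reaching a preset time threshold, controlling the vehicle-machine system to execute the control instruction.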
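  9. A control system for a vehicle-machine system, characterized by comprising:
    a vehicle-machine terminal, configured to collect a user's voice data and to execute, one by one, a plurality of control instructions parsed from the voice data; and
    a data processing terminal, configured to perform speech recognition on the collected voice data to obtain corresponding utterance information, to perform semantic parsing on the utterance information to obtain a plurality of pieces of slot information, and to combine the plurality of pieces of slot information into the plurality of control instructions according to preset combination configuration information.
  10. The control system of claim 9, wherein the vehicle-machine terminal is configured to:
    collect a plurality of recorded analog signals of the user with a microphone module;
    convert each of the plurality of recorded analog signals into a corresponding digital voice signal;
    combine the digital voice signals in chronological order into voice stream data; and
    send the voice stream data to the data processing terminal.
  11. The control system of claim 10, wherein the data processing terminal includes a speech processing system, the speech processing system being configured to:
    perform speech recognition processing on the voice stream data with the speech processing system to parse it into the corresponding utterance information.
  12. The control system of claim 9, wherein the data processing terminal includes a semantic processing system, the semantic processing system being configured to:
    extract keywords from the utterance information with the semantic processing system;
    classify the plurality of keywords according to preset slot attributes, and take each keyword as a piece of slot information having the corresponding slot attribute; and
    arrange the pieces of slot information according to a first order in which the keywords are extracted from the utterance information, to form a slot information list.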
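  13. The control system of claim 12, wherein the data processing terminal further includes an intent segmentation system, the combination configuration information includes a plurality of combination strategies arranged in a preset second order, each combination strategy including a first slot attribute, a combination direction, and a second slot attribute, and the intent segmentation system is configured to:
    determine, according to the first order, the first piece of slot information in the slot information list;
    determine, according to the second order, the first combination strategy whose first slot attribute matches the slot attribute of the first piece of slot information;
    judge one by one, along the combination direction indicated by the first combination strategy, whether the slot attribute of each remaining piece of slot information in the slot information list matches the second slot attribute of the first combination strategy; and
    combine the first remaining piece of slot information whose slot attribute matches the second slot attribute of the first combination strategy with the first piece of slot information into one control instruction.
  14. The control system of claim 13, wherein the intent segmentation system is further configured to:
    if none of the slot attributes of the remaining pieces of slot information in the slot information list matches the second slot attribute of the first combination strategy, determine, according to the second order, the next combination strategy whose first slot attribute matches the slot attribute of the first piece of slot information;
    judge one by one, along the combination direction indicated by the next combination strategy, whether the slot attribute of each remaining piece of slot information in the slot information list matches the second slot attribute of the next combination strategy; and
    combine the first remaining piece of slot information whose slot attribute matches the second slot attribute of the next combination strategy with the first piece of slot information into one control instruction.
  15. The control system of claim 13, wherein the intent segmentation system is further configured to:
    in response to obtaining one control instruction by combination, delete from the slot information list the pieces of slot information involved in the control instruction, and return to the step of determining, according to the first order, the first piece of slot information in the slot information list.
  16. The control system of claim 13, wherein the data processing terminal is further configured to: arrange the plurality of control instructions in the order in which they were composed to build an intent list; and send the intent list to the vehicle-machine terminal,
    and the vehicle-machine terminal is further configured to: execute the first control instruction in the intent list and measure the length of time for which the first control instruction has been executing; and, in response to that length of time reaching a preset time threshold, execute the next control instruction in the intent list.
  17. A computer-readable storage medium on which computer instructions are stored, characterized in that, when executed by a processor, the computer instructions implement the control method for a vehicle-machine system of any one of claims 1 to 8.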
PCT/CN2021/106071 2021-06-02 2021-07-13 Control method and control system for a vehicle-machine system WO2022252351A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110613144.4A CN115440200B (zh) 2021-06-02 2021-06-02 Control method and control system for a vehicle-machine system
CN202110613144.4 2021-06-02

Publications (1)

Publication Number Publication Date
WO2022252351A1 true WO2022252351A1 (zh) 2022-12-08

Family

ID=84271607

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/106071 WO2022252351A1 (zh) 2021-06-02 2021-07-13 Control method and control system for a vehicle-machine system

Country Status (2)

Country Link
CN (1) CN115440200B (zh)
WO (1) WO2022252351A1 (zh)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180182380A1 (en) * 2016-12-28 2018-06-28 Amazon Technologies, Inc. Audio message extraction
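CN109841212A (zh) * 2017-11-28 2019-06-04 现代自动车株式会社 Speech recognition system and speech recognition method for analyzing commands having multiple intents
CN110019687A (zh) * 2019-04-11 2019-07-16 宁波深擎信息科技有限公司 Knowledge-graph-based multi-intent recognition system, method, device, and medium
CN110853645A (zh) * 2019-12-02 2020-02-28 三星电子(中国)研发中心 Method and apparatus for recognizing voice commands
CN111538817A (zh) * 2019-01-18 2020-08-14 北京京东尚科信息技术有限公司 Human-computer interaction method and apparatus
CN111722825A (zh) * 2020-06-28 2020-09-29 广州小鹏车联网科技有限公司 Interaction method, information processing method, vehicle, and server
CN112298080A (zh) * 2019-07-26 2021-02-02 上海博泰悦臻电子设备制造有限公司 Vehicle control method and system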

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9390716B2 (en) * 2013-04-19 2016-07-12 Panasonic Intellectual Property Corporation Of America Control method for household electrical appliance, household electrical appliance control system, and gateway
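CN109086282A (zh) * 2017-06-14 2018-12-25 杭州方得智能科技有限公司 Multi-turn dialogue method and system with multi-task driving capability
KR102348124B1 (ko) * 2017-11-07 2022-01-07 현대자동차주식회사 Apparatus and method for recommending vehicle functions
CN108563790B (zh) * 2018-04-28 2021-10-08 科大讯飞股份有限公司 Semantic understanding method and apparatus, device, and computer-readable medium
CN109101545A (zh) * 2018-06-29 2018-12-28 北京百度网讯科技有限公司 Natural language processing method, apparatus, device, and medium based on human-computer interaction
CN109241524B (zh) * 2018-08-13 2022-12-20 腾讯科技(深圳)有限公司 Semantic parsing method and apparatus, computer-readable storage medium, and electronic device
CN109739965B (zh) * 2018-12-29 2022-07-15 深圳前海微众银行股份有限公司 Method and apparatus, device, and readable storage medium for transferring cross-domain dialogue policies
CN110413250B (zh) * 2019-06-14 2021-06-01 华为技术有限公司 Voice interaction method, apparatus, and system
CN110704641B (zh) * 2019-10-11 2023-04-07 零犀(北京)科技有限公司 Ten-thousand-scale intent classification method, apparatus, storage medium, and electronic device
CN111368538B (zh) * 2020-02-29 2023-10-24 平安科技(深圳)有限公司 Voice interaction method, system, terminal, and computer-readable storage medium
CN111738016B (zh) * 2020-06-28 2023-09-05 中国平安财产保险股份有限公司 Multi-intent recognition method and related device
CN114186563A (zh) * 2020-09-15 2022-03-15 华为技术有限公司 Electronic device and semantic parsing method thereof, medium, and human-machine dialogue system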

Also Published As

Publication number Publication date
CN115440200A (zh) 2022-12-06
CN115440200B (zh) 2024-03-12

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21943708

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21943708

Country of ref document: EP

Kind code of ref document: A1