WO2022001013A1 - Voice interaction method, vehicle, server, system and storage medium - Google Patents
Voice interaction method, vehicle, server, system and storage medium
- Publication number
- WO2022001013A1 (PCT/CN2020/135150)
- Authority
- WO
- WIPO (PCT)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/183—Speech classification or search using natural language modelling using context dependencies, e.g. language models
Definitions
- The present invention relates to the field of voice technology, and in particular to a voice interaction method, vehicle, server, system and storage medium.
- With the development of vehicle intelligence and voice technology, voice is used more and more widely in cars. Enabling the user to control the vehicle, or the on-board system running on it, without physical contact while driving can enhance the user experience while ensuring driving safety.
- Increasing vehicle intelligence has brought more powerful on-board chips and graphics chips. The computing power of the new generation of on-board chips and the performance of the graphics chips make it possible to realize richer, phone-like interfaces and more engaging animations on the in-vehicle system.
- The current way of using voice in vehicles is often to set up an independent voice assistant that, after receiving the user's voice request, gives feedback through a server. This usage is completely independent of the in-vehicle system's interface. Since it relies on voice signals alone and lacks information from other dimensions, the interaction quality of the human-computer interaction system is unsatisfactory.
- the embodiments of the present invention are proposed to provide a voice interaction method, vehicle, server, system and storage medium that overcome the above problems or at least partially solve the above problems.
- an embodiment of the present invention discloses a voice interaction method, which is applied to a voice interaction system comprising a vehicle and a server that can communicate with the vehicle, and is characterized in that it includes:
- the vehicle receives the user's voice request, and sends the voice request and the context information of the current in-vehicle system GUI to the server;
- the server completes the natural language understanding processing of the voice request according to the context information
- the server uses the natural language understanding result to generate an instruction executable by the vehicle and sends it to the vehicle;
- the vehicle receives and executes the instruction, and at the same time feeds back the execution result to the user through voice.
- the context information includes the name and type of the operable controls in the current in-vehicle system GUI, the actions supported by the operable controls, the value range of the actions, and the current state of the operable controls.
- the server completes the natural language understanding processing of the voice request according to the context information, including:
- creating a scene semantic space according to the context information;
- performing semantic understanding on the voice request and outputting a semantic understanding result;
- in the scene semantic space, using the semantic understanding result to retrieve, recall, sort and match the operable controls;
- outputting the operation of the operable control in response to the voice request as the natural language understanding result.
- the scene semantic space is created according to the context information, including: receiving the context information sent by the vehicle; loading and parsing the scene elements included in the context information; and generating a scene semantic document from the scene elements.
- a matching result is output according to the sorting result; wherein the matching result includes the operation intention of the operable control, the name of the operable control, and the execution action of the operable control.
- the text in the voice request may be all or part of the text; extracting the text in the voice request and retrieving it in the scene semantic document includes any of the following:
- extracting the entity words in the voice request and retrieving them in the scene semantic document;
- extracting the text including entity words and action words in the voice request and retrieving it in the scene semantic document; or,
- extracting the entire text of the voice request and retrieving it in the scene semantic document.
- the retrieval results are recalled using one or more preset recall strategies, including text ignoring based on a preset list of ignorable words, requiring core words to be hit, setting a threshold for recall, and verifying action words or negative intentions in the text.
- An embodiment of the present invention further discloses a vehicle, including a processor, a memory, and a computer program stored on the memory and runnable on the processor, the computer program, when executed by the processor, implementing the steps of the above voice interaction method.
- An embodiment of the present invention further discloses a server, including a processor, a memory, and a computer program stored on the memory and runnable on the processor, the computer program, when executed by the processor, implementing the steps of the above voice interaction method.
- An embodiment of the present invention also discloses a voice interaction system, which includes a vehicle and a server that can communicate with the vehicle, wherein the vehicle is provided with a request receiving module, an information sending module, an instruction receiving module and an execution feedback module, and the server is provided with a natural language understanding module and an instruction sending module;
- a request receiving module for receiving a user's voice request
- the information sending module is used to send the voice request and the context information of the current vehicle system GUI to the server;
- the natural language understanding module is used to complete the natural language understanding processing of the voice request according to the context information
- the instruction sending module is used to send the instruction to the vehicle after the server uses the natural language understanding result to generate an instruction executable by the vehicle;
- the instruction receiving module is used for receiving and executing the instruction, and at the same time, the execution feedback module feeds back the execution result to the user through voice.
- An embodiment of the present invention further discloses a computer-readable storage medium, characterized in that, a computer program is stored on the computer-readable storage medium, and the computer program implements the above-mentioned voice interaction method when executed by a processor.
- The embodiments of the present invention include the following advantages: by sending the context information of the current on-board system graphical user interface (GUI) to the server, the server can make full use of the context information to complete natural language understanding processing during voice interaction. With the added dimensions of information, any content the user sees on the interface can be operated by voice, thereby improving the interaction quality of the human-computer interaction system.
- Fig. 1 is a flow chart of the steps of an embodiment of a voice interaction method of the present invention;
- Fig. 2 is a schematic diagram of a navigation broadcast graphical user interface of the on-board system of the present invention;
- Fig. 3 is a flow chart of the steps of natural language understanding in a voice interaction method of the present invention;
- Fig. 4 is a code schematic diagram of context information in an embodiment of a voice interaction method of the present invention;
- Fig. 5 is a structural block diagram of an embodiment of a voice interaction system of the present invention.
- Referring to FIG. 1, a flow chart of the steps of an embodiment of a voice interaction method of the present invention is shown, which may specifically include the following steps:
- S1: the vehicle receives the user's voice request, and sends the voice request and the context information of the current on-board system GUI to the server.
- S2: the server completes the natural language understanding processing of the voice request according to the context information.
- S3: the server uses the natural language understanding result to generate an instruction executable by the vehicle and sends it to the vehicle.
- S4: the vehicle receives and executes the instruction, and at the same time feeds back the execution result to the user through voice.
- the above voice interaction method is applied to a voice interaction system comprising a vehicle and a server capable of communicating with the vehicle.
- the vehicle is provided with a communication module, which can communicate with the server over an operator network (including 3G, 4G or 5G) or another communication connection to complete the data interaction.
- the display area of the vehicle may include an instrument panel, a center control screen, and a HUD (Head Up Display) that can be implemented on the vehicle windshield.
- the on-board system running on the vehicle uses a graphical user interface (Graphical User Interface, abbreviation: GUI); the display area includes many UI elements, and different display areas can display different UI elements or the same UI elements.
- the UI elements may include card objects, application icons or interfaces, folder icons, multimedia file icons, and controls for interacting and operating, and so on.
- the context information includes the name and type of the operable controls in the current on-board system GUI, the actions supported by the operable controls, the value range of the actions and the current state of the operable controls.
- Taking Fig. 2 as an example, when the user sees Fig. 2, he can directly issue voice requests such as "set the navigation broadcast volume to 18" or "turn off the system prompt tone".
- The operable controls in Figure 2 include three: the first is a control of type Slider named "Navigation Broadcast Volume"; the second is a control of type SelectTab named "Vehicle Prompt Tone"; the third is a control of type Switch named "System Prompt Tone". Each control has supported actions, a value range for each action, and a current state.
- For example, the control named "Navigation Broadcast Volume" can be dragged to adjust the volume value; that is, the supported action is Set, the value range of this action is 0 to 30, and the current state is that the volume is set to 16.
- Taking the control named "Vehicle Prompt Tone" as an example, this control can be set to "Small", "Medium" or "Large"; that is, the supported action is Set, the value range of this action is "Small", "Medium" and "Large", and the current state is that the vehicle prompt tone is set to Small.
- Taking the control named "System Prompt Tone" as an example, this control can be turned on and off; that is, the supported actions include Turn On and Turn Off, and the current state is that the system prompt tone is turned on.
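As an illustration, the context information for the three controls above might be represented as follows. This is a hedged sketch: the patent's Fig. 4 Json example only names the `label` and `type` fields, so the remaining keys (`actions`, `range`, `state`) are assumed names, not the patent's actual schema.

```python
# Hypothetical context information for the GUI in Fig. 2.
# Only "label" and "type" are named in the patent (Fig. 4);
# the remaining keys are illustrative assumptions.
context_info = {
    "controls": [
        {
            "label": "Navigation Broadcast Volume",
            "type": "Slider",
            "actions": {"Set": {"range": [0, 30]}},  # supported action + value range
            "state": 16,                             # current state
        },
        {
            "label": "Vehicle Prompt Tone",
            "type": "SelectTab",
            "actions": {"Set": {"range": ["Small", "Medium", "Large"]}},
            "state": "Small",
        },
        {
            "label": "System Prompt Tone",
            "type": "Switch",
            "actions": {"Turn On": {}, "Turn Off": {}},
            "state": "On",
        },
    ]
}
```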
- the steps of S2 include:
- The scene semantic space is an understandable semantic space created based on the context information of the GUI.
- an example of the server creating the scene semantic space according to the context information is as follows in Table 1:
- the steps of S20 include:
- the vehicle sends the context information to the server in the form of a Json file through a communication network including but not limited to an operator network.
- FIG. 4 is an example of a Json file. In this solution, other file formats can also be used to send context information, which is not limited here.
- label represents the name of the operable control, and type represents the type of the operable control.
- the server loads and parses the Json file to obtain scene elements recorded in the file, where the scene elements include several operable controls and other UI elements.
- In step S203, the server generates a scene semantic document recording the scene semantic space according to the scene elements.
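Steps S201 to S203 (receive the Json file, load and parse the scene elements, generate scene semantic documents) can be sketched as follows. The function name and document fields are illustrative assumptions; only the `label`/`type` fields and the Json transport are stated in the source.

```python
import json

def create_scene_semantic_space(json_text: str) -> list:
    """Parse the context-information Json file (S201-S202) and emit one
    scene semantic document per operable control (S203)."""
    scene = json.loads(json_text)
    documents = []
    for element in scene.get("controls", []):   # top-level key is an assumption
        documents.append({
            "name": element["label"],           # operable control name
            "type": element["type"],            # control type
            "actions": element.get("actions", {}),
            "state": element.get("state"),
        })
    return documents

docs = create_scene_semantic_space(
    '{"controls": [{"label": "Navigation Broadcast Volume", "type": "Slider"}]}'
)
```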
- The step of S21 includes:
- In step S211, text preprocessing is performed on the text in the voice request, including Chinese word segmentation and removal of modal words ("um", "ba") and the like.
- Text normalization includes the normalization of numbers and entities. For example, "one point five seconds" becomes "1.5 seconds" after normalization, and "large screen brightness" becomes "central control brightness" after normalization. Extracting the sentence stem means extracting the entity words, action words and numerical values in the sentence; the extracted stem is mainly used for subsequent retrieval.
- The user's intention can be understood using the action words in the extracted sentence stem, which facilitates subsequent verification of the operable controls.
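A minimal sketch of step S211's preprocessing and stem extraction. The word lists, the English tokenization, and the regular expression are assumptions; the patent operates on Chinese text with Chinese word segmentation.

```python
import re

# Illustrative word lists -- assumptions, not from the patent.
MODAL_WORDS = {"um", "uh", "ba"}
ACTION_WORDS = {"set", "turn on", "turn off", "open", "close"}

def extract_stem(text: str) -> dict:
    """Preprocess the request text and extract the sentence stem:
    entity/action words and numeric values used for later retrieval."""
    tokens = [t for t in re.findall(r"[a-z.]+|\d+(?:\.\d+)?", text.lower())
              if t not in MODAL_WORDS]          # drop modal words
    normalized = " ".join(tokens)
    numbers = [t for t in tokens if re.fullmatch(r"\d+(?:\.\d+)?", t)]
    actions = [a for a in ACTION_WORDS if a in normalized]
    return {"text": normalized, "numbers": numbers, "actions": actions}

stem = extract_stem("um set the navigation broadcast volume to 18")
```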
- The step of S212 includes: determining a preliminary result of understanding the intent of the user's voice request according to the sentence stem, then using the negative words in the sentence stem to correct the preliminary result, and outputting the corrected semantic understanding result.
- For example, the text corresponding to the user's voice request is "Do not turn on the system prompt tone".
- The preliminary result includes the action word "turn on" and the entity word "system prompt tone". However, taking "turn on the system prompt tone" as the semantic understanding result would be contrary to the user's real meaning. Therefore, after obtaining the preliminary result, it is determined whether there is a negative word in the sentence stem; here the text includes "do not turn on", which is extracted and used to correct the preliminary result, that is, "do not turn on" is understood as "turn off".
- The corrected semantic understanding result is "turn off the system prompt tone".
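The negation correction described above can be sketched as follows; the negation list, the antonym map, and the naive substring check are illustrative assumptions.

```python
# Illustrative negation handling -- the word lists and antonym map
# are assumptions; the patent only describes the idea.
NEGATION_WORDS = ("do not", "don't")
ANTONYMS = {"turn on": "turn off", "open": "close"}

def correct_intent(text: str, action: str) -> str:
    """If the sentence stem contains a negative word, flip the
    preliminary action to its opposite (step S212)."""
    if any(neg in text.lower() for neg in NEGATION_WORDS):
        return ANTONYMS.get(action, action)
    return action

final_action = correct_intent("do not turn on the system prompt tone", "turn on")
```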
- The step of S22 specifically includes:
- A word list for word segmentation is created in advance according to scenarios such as navigation and music, and retrieval is then performed based on the word list.
- A variety of retrieval strategies can be used depending on how much of the text is utilized. That is, the text in the voice request may be all or part of the text, and the step of S221 includes any of the following:
- the entire text in the extracted speech request is retrieved in the scene semantic document.
- the three retrieval strategies listed above cover part of the text including entity words, combinations of entity words and action words, as well as all texts in the voice request, and which retrieval strategy to use can be determined according to specific needs.
- an inverted index and a retrieval based on words and pinyin can be used to implement the retrieval, and the specific implementation is not limited herein.
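One possible realization of the word-based inverted-index retrieval mentioned above. This is a simplified sketch: the pinyin-based retrieval the patent also mentions is omitted, and English whitespace tokenization stands in for Chinese word segmentation.

```python
from collections import defaultdict

def build_inverted_index(documents: list) -> dict:
    """Map each word to the ids of the scene semantic documents
    that contain it."""
    index = defaultdict(set)
    for doc_id, doc in enumerate(documents):
        for word in doc.lower().split():
            index[word].add(doc_id)
    return index

def retrieve(index: dict, query: str) -> set:
    """Return ids of documents hit by any query word."""
    hits = set()
    for word in query.lower().split():
        hits |= index.get(word, set())
    return hits

docs = ["navigation broadcast volume", "vehicle prompt tone", "system prompt tone"]
index = build_inverted_index(docs)
hit_ids = retrieve(index, "set the navigation broadcast volume to 18")
```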
- the retrieval result is recalled by using a preset recall strategy, wherein the preset recall strategy includes multiple types, as follows:
- Example 3 set the threshold to X%, and recall when the threshold is reached.
- the scoring may adopt various scoring methods such as query matching degree or document matching degree.
- The document matching degree is: matching length / document length (in words). A specific matching strategy can be used for a specific control, such as the POI (Point Of Interest) list controls that often appear in navigation; for POI list controls, for example, document matching can be used.
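The document matching degree (matching length / document length, counted in words) can be sketched as follows; counting matches by word overlap is an assumption about the details.

```python
def document_match_score(query: str, document: str) -> float:
    """Document matching degree: number of document words that also
    appear in the query, divided by the document length in words."""
    doc_words = document.lower().split()
    query_words = set(query.lower().split())
    matched = sum(1 for w in doc_words if w in query_words)
    return matched / len(doc_words)

score = document_match_score("set the navigation broadcast volume to 18",
                             "navigation broadcast volume")
```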
- The preset sorting strategy may include:
- Strategy 1: the scene semantic documents are sorted according to the highest score of each retrieval strategy;
- Strategy 2: the scene semantic documents are sorted according to the sum of the scores of each retrieval strategy;
- Strategy 3: the scene semantic documents are sorted according to the sum of the weighted scores of each retrieval strategy, where each weight is a preset score weight parameter.
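The three sorting strategies can be sketched as follows; the strategy names, score values, and weight values are illustrative assumptions.

```python
# scores[doc_id][strategy] -> retrieval score for that strategy;
# weights are the preset score weight parameters for Strategy 3.
# All values here are made up for illustration.
scores = {
    "doc_a": {"entity": 0.9, "entity+action": 0.4, "full_text": 0.2},
    "doc_b": {"entity": 0.5, "entity+action": 0.8, "full_text": 0.6},
}
weights = {"entity": 0.6, "entity+action": 0.3, "full_text": 0.1}

# Strategy 1: sort by the highest single-strategy score.
by_max = sorted(scores, key=lambda d: max(scores[d].values()), reverse=True)
# Strategy 2: sort by the sum of the scores.
by_sum = sorted(scores, key=lambda d: sum(scores[d].values()), reverse=True)
# Strategy 3: sort by the weighted sum of the scores.
by_weighted = sorted(
    scores,
    key=lambda d: sum(weights[s] * v for s, v in scores[d].items()),
    reverse=True,
)
```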
- The matching includes exact matching and fuzzy matching. Exact matching refers to a completely matched scene semantic document: if there is an action word in the voice request (Query) and it conforms to the control's operations, it is regarded as a complete match. Fuzzy matching refers to selecting the document with the highest score (if the sorting results contain multiple results with the same score, all of them are selected); when action words are present, they are combined to judge whether the selected control is correct.
- the matching result includes the operation intention of the operable control, the name of the operable control, and the execution action of the operable control.
- An example of a matching result: the operation intent of the operable control is "set the gesture direction to inward", the name of the operable control is "Gesture Touch Rotation Direction", and the execution action of the operable control is "set to inward".
- For example, when the displayed GUI contains an operable control named "gesture touch rotation direction", the user sends the voice request (Query) "set the gesture direction to inward", and then step S22 is performed.
- The operation of the operable control in response to the voice request is to perform the action "set to inward" on the operable control named "gesture touch rotation direction"; this operation is output as the natural language understanding result.
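A hedged sketch of the matching step: prefer an exact match (a query action word conforms to the control's supported operations), otherwise fall back to the top-scoring document as a fuzzy match. All field names are assumptions.

```python
def match(sorted_docs, query_actions):
    """Exact match: a query action word conforms to a control's
    supported operations. Fuzzy match: fall back to the document
    with the highest score (the first in the sorted list)."""
    for doc in sorted_docs:
        if any(action in doc["actions"] for action in query_actions):
            return {  # exact match: intent, control name, execution action
                "intent": "operate",
                "control": doc["name"],
                "action": query_actions[0],
            }
    return sorted_docs[0] if sorted_docs else None  # fuzzy match

result = match(
    [{"name": "gesture touch rotation direction", "actions": ["set"]}],
    ["set"],
)
```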
- In step S3, the server uses the natural language understanding result output in step S23 to generate an instruction executable by the vehicle and sends it to the vehicle.
- In step S4, the vehicle receives and executes the instruction.
- After execution, the current state of the control named "Gesture Touch Rotation Direction" is "inward", and the execution result can be fed back to the user through speech via TTS (Text-To-Speech).
- In this way the user achieves "what is visible can be spoken" for the graphical user interface of the in-vehicle system: no physical operations such as touching the screen or pressing buttons are required, and full voice operation while driving keeps the user's line of sight and attention entirely on driving, fully ensuring vehicle safety. Moreover, by sending the context information of the current on-board system graphical user interface to the server, the server can make full use of the context information to complete natural language understanding processing during voice interaction; any content on the interface can be operated by voice, which improves the interaction quality of the human-computer interaction system.
- Referring to FIG. 5, a structural block diagram of an embodiment of a voice interaction system of the present invention is shown, which may specifically include a vehicle and a server that can communicate with the vehicle, wherein the vehicle is provided with a request receiving module, an information sending module, an instruction receiving module and an execution feedback module, and the server is provided with a natural language understanding module and an instruction sending module.
- a request receiving module for receiving a user's voice request
- the information sending module is used to send the voice request and the context information of the current vehicle system GUI to the server;
- the natural language understanding module is used to complete the natural language understanding processing of the voice request according to the context information
- the instruction sending module is used to send the instruction to the vehicle after the server uses the natural language understanding result to generate an instruction executable by the vehicle;
- the instruction receiving module is used for receiving and executing the instruction, and at the same time, the execution feedback module feeds back the execution result to the user through voice.
- the context information includes the name and type of the operable controls in the GUI of the current vehicle system, the actions supported by the operable controls, the value range of the actions, and the current state of the operable controls.
- the natural language understanding module includes a creation sub-module, an understanding sub-module, a processing sub-module and an output sub-module:
- the understanding sub-module is used to semantically understand the voice request and output the semantic understanding result
- the processing sub-module is used to retrieve, recall, sort and match the operable controls in the scene semantic space using the semantic understanding results;
- the output sub-module is used to output the operation of the operable control in response to the voice request as the result of natural language understanding processing.
- the creation sub-module includes:
- a receiving unit for receiving context information sent by the vehicle
- the loading unit is used to load and parse the scene elements included in the context information
- a generation unit for generating scene semantic documents from the scene elements.
- the understanding sub-module includes:
- the processing unit is used to perform text preprocessing and text normalization on the text in the voice request, and then extract the sentence stem;
- the output unit is used to understand the intent of the user's voice request according to the sentence stem and output the semantic understanding result.
- the output unit is further configured to determine the preliminary result of the intention of understanding the user's voice request according to the sentence stem, and then use the negative words in the sentence stem to correct the preliminary result, and output the corrected semantic understanding result.
- the processing sub-module includes:
- the retrieval unit is used to extract the text in the voice request and retrieve it in the scene semantic document;
- the recall unit is used to recall the retrieval results by using the preset recall strategy, and then score the matching degree;
- the sorting unit is used to sort the search results after scoring according to the preset sorting strategy
- the matching unit is configured to output a matching result according to the sorting result; wherein the matching result includes the operation intention of the operable control, the name of the operable control, and the execution action of the operable control.
- the text in the voice request includes all or part of the text in the voice request, and the retrieval unit is specifically used for any of the following:
- the entire text in the extracted speech request is retrieved in the scene semantic document.
- the recall unit is specifically configured to recall the retrieval results using one or more preset recall strategies, including text ignoring based on a preset list of ignorable words, requiring core words to be hit, setting thresholds for recall, and verifying action words or negative intentions in the text.
- An embodiment of the present invention also provides a vehicle, including a processor, a memory, and a computer program stored on the memory and runnable on the processor. When the computer program is executed by the processor, each process of the above voice interaction method embodiment can be achieved with the same technical effect; to avoid repetition, details are not repeated here.
- An embodiment of the present invention also provides a server, including a processor, a memory, and a computer program stored on the memory and runnable on the processor. When the computer program is executed by the processor, each process of the above voice interaction method embodiment can be achieved with the same technical effect; to avoid repetition, details are not repeated here.
- Embodiments of the present invention also provide a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, each process of the foregoing voice interaction method embodiment can be achieved with the same technical effect. To avoid repetition, details are not repeated here.
- The embodiments of the present invention may be provided as a method, an apparatus, or a computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, and optical storage) having computer-usable program code embodied therein.
- Embodiments of the present invention are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the present invention. It will be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable data processing terminal equipment to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing terminal equipment create means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
- These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing terminal equipment to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Abstract
A voice interaction method, vehicle, server and storage medium. The method includes: the vehicle receives the user's voice request and sends the voice request and the context information of the current on-board system graphical user interface to the server (S1); the server completes the natural language understanding processing of the voice request according to the context information (S2); the server uses the natural language understanding result to generate an instruction executable by the vehicle and sends it to the vehicle (S3); the vehicle receives and executes the instruction, and at the same time feeds back the execution result to the user through voice (S4). During voice interaction, the server can make full use of the context information to complete natural language understanding processing. Since more dimensions of information are added, any content the user sees on the graphical user interface in the vehicle can be operated by voice, which improves the interaction quality of the human-computer interaction system.
Description
Cross-reference to related applications
This application claims priority to Chinese patent application No. CN202010596817.5, filed with the Chinese Patent Office on June 28, 2020 and entitled "Voice interaction method, vehicle, server, system and storage medium", the entire contents of which are incorporated herein by reference.
本发明涉及语音技术领域,特别是涉及一种语音交互方法、车辆、服务器、系统和存储介质。
随着汽车智能化和语音技术的发展,语音在汽车上的运用越来越广泛。在用户驾驶车辆的过程中,能够无接触地实现用户对车辆或者车辆上车载系统的控制,可以在保障行车安全的情况下增强用户的使用体验。
汽车智能化带来了更强的车机芯片和图形芯片,新一代车机芯片的算力以及图形芯片的性能,使得在车载系统上实现像手机一样更丰富的界面和更有趣味的动画成为了一种可能。现在车辆上使用语音的方式经常是设置一个独立的语音助理,在接收完用户的语音请求后,通过服务器给予反馈。这种使用方式和车载系统的界面是完全独立的,由于只利用语音信号,缺少更多维度的信息,使人机交互系统的交互质量难以令人满意。
发明内容
鉴于上述问题,提出了本发明实施例以便提供一种克服上述问题或者至少部分地解决上述问题的一种语音交互方法、车辆、服务器、系统和存储介质。
为了解决上述问题,本发明实施例公开了一种语音交互方法,应用于 包括车辆和可与车辆进行通信的服务器组成的语音交互系统,其特征在于,包括:
车辆接收用户的语音请求,并将语音请求和当前车载系统图形用户界面的上下文信息发送至服务器;
服务器根据上下文信息完成语音请求的自然语言理解处理;
服务器利用自然语言理解处理结果,生成车辆可执行的指令并发送给车辆;
车辆接收并执行该指令,同时将执行结果通过语音反馈给用户。
进一步地,上下文信息包括当前车载系统图形用户界面中可操作的控件的名称和类型、可操作的控件支持的动作、动作的取值范围和可操作的控件当前的状态。
进一步地,服务器根据上下文信息完成语音请求的自然语言理解处理,包括:
根据上下文信息创建场景语义空间;
对语音请求进行语义理解并输出语义理解结果;
在场景语义空间,利用语义理解结果对可操作的控件进行检索、召回、排序和匹配;
输出可操作的控件响应该语音请求的操作作为自然语言理解处理结果。
Further, creating a scene semantic space based on the context information includes:
receiving the context information sent by the vehicle;
loading and parsing the scene elements included in the context information;
generating a scene semantic document from the scene elements.
Further, performing semantic understanding on the voice request and outputting a semantic understanding result includes:
performing text preprocessing and text normalization on the text of the voice request, and then extracting the sentence backbone;
understanding the intent of the user's voice request from the sentence backbone and outputting the semantic understanding result.
Further, understanding the intent of the user's voice request from the sentence backbone and outputting the semantic understanding result includes:
determining a preliminary result of the intent of the user's voice request from the sentence backbone, then correcting the preliminary result using the negation words in the sentence backbone, and outputting the corrected semantic understanding result.
Further, in the scene semantic space, retrieving, recalling, ranking, and matching the operable controls using the semantic understanding result includes:
extracting the text of the voice request and searching the scene semantic document with it;
recalling the search results with preset recall strategies, then scoring them by match degree;
ranking the scored search results by a preset ranking strategy;
outputting a match result according to the ranking, wherein the match result includes the operation intent toward an operable control, the name of the operable control, and the action to execute on the operable control.
Further, the text of the voice request includes all or part of the text of the voice request, and extracting the text of the voice request and searching the scene semantic document includes any one of the following:
extracting entity words from the voice request and searching the scene semantic document;
extracting text including entity words and action words from the voice request and searching the scene semantic document;
or,
extracting all the text of the voice request and searching the scene semantic document.
Further, recalling the search results with preset recall strategies includes:
recalling the search results with one or more preset recall strategies, including text ignoring based on a preset ignorable-word list, required hits on core words, recall by threshold, and verification of action words or negative intent in the text.
An embodiment of the present invention also discloses a vehicle, comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the voice interaction method described above.
An embodiment of the present invention also discloses a server, comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the voice interaction method described above.
An embodiment of the present invention also discloses a voice interaction system, comprising a vehicle and a server capable of communicating with the vehicle, wherein the vehicle is provided with a request receiving module, an information sending module, an instruction receiving module, and an execution feedback module, and the server is provided with a natural language understanding module and an instruction sending module;
the request receiving module is configured to receive a user's voice request;
the information sending module is configured to send the voice request and the context information of the current in-vehicle system graphical user interface to the server;
the natural language understanding module is configured to complete natural language understanding processing of the voice request based on the context information;
the instruction sending module is configured to send the instruction to the vehicle after the server uses the natural language understanding result to generate an instruction executable by the vehicle;
the instruction receiving module is configured to receive and execute the instruction and, through the execution feedback module, feed the execution result back to the user by voice.
An embodiment of the present invention also discloses a computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program that, when executed by a processor, implements the voice interaction method described above.
Embodiments of the present invention have the following advantages:
By sending the context information of the current in-vehicle system graphical user interface to the server, the server can make full use of that context information to complete natural language understanding during voice interaction. Because information of additional dimensions is added, the user can operate anything seen on the graphical user interface (GUI) in the vehicle by voice, improving the interaction quality of the human-machine interaction system.
Fig. 1 is a flowchart of the steps of an embodiment of a voice interaction method of the present invention;
Fig. 2 is a schematic diagram of a navigation-broadcast graphical user interface of the in-vehicle system of the present invention;
Fig. 3 is a flowchart of the natural language understanding steps in a voice interaction method of the present invention;
Fig. 4 is a schematic diagram of the code of the context information in an embodiment of a voice interaction method of the present invention;
Fig. 5 is a structural block diagram of an embodiment of a voice interaction system of the present invention.
To make the above objects, features, and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Referring to Fig. 1, a flowchart of the steps of an embodiment of a voice interaction method of the present invention is shown, which may specifically include the following steps:
S1: the vehicle receives a user's voice request and sends the voice request and context information of the current in-vehicle system graphical user interface to the server.
S2: the server completes natural language understanding processing of the voice request based on the context information.
S3: the server uses the natural language understanding result to generate an instruction executable by the vehicle and sends it to the vehicle.
S4: the vehicle receives and executes the instruction and feeds the execution result back to the user by voice.
The above voice interaction method is applied to a voice interaction system composed of a vehicle and a server capable of communicating with the vehicle. Specifically, the vehicle is provided with a communication module and can exchange data with the server over a carrier network, including 3G, 4G, or 5G, or over other connections.
In a vehicle, the display areas may include the instrument cluster, the central control screen, and a HUD (Head-Up Display) realizable on the windshield. The graphical user interface (GUI) of the in-vehicle system contains many UI elements across these display areas; different display areas may show different UI elements or the same ones. UI elements may include card objects, application icons or interfaces, folder icons, multimedia file icons, and operable controls for interaction.
In step S1, the context information includes the names and types of the operable controls in the current in-vehicle system GUI, the actions supported by the operable controls, the value ranges of those actions, and the current states of the operable controls.
Taking Fig. 2 as an example, on seeing it the user may directly issue voice requests such as "Set the navigation broadcast volume to 18" or "Turn off the system alert sound". Fig. 2 involves three operable controls: the first is of type Slider and named "Navigation Broadcast Volume"; the second is of type SelectTab and named "Vehicle Alert Sound"; the third is of type Switch and named "System Alert Sound". Each control has supported actions, value ranges for those actions, and a current state.
For example, the control named "Navigation Broadcast Volume" can be dragged to adjust the volume value: the supported action is Set, the value range of this action is 0-30, and the current state is a volume of 16.
The control named "Vehicle Alert Sound" can be set to "Low", "Medium", or "High": the supported action is Set, the value range of this action is "Low", "Medium", "High", and the current state is Low.
The control named "System Alert Sound" can be turned on or off: the supported actions are Turn On and Turn Off, and the current state is on.
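The three controls above can be sketched as a context-information payload. This is only an illustrative assumption: the patent's Fig. 4 names only the `label` and `type` fields, so the `scene`, `elements`, `actions`, `range`, and `state` keys below are hypothetical names chosen for the sketch.

```python
import json

# Hedged sketch of the GUI context information for the screen of Fig. 2.
# Only "label" and "type" are named in the source; other keys are assumptions.
context = {
    "scene": "sound_settings",
    "elements": [
        {"label": "Navigation Broadcast Volume", "type": "Slider",
         "actions": ["Set"], "range": [0, 30], "state": 16},
        {"label": "Vehicle Alert Sound", "type": "SelectTab",
         "actions": ["Set"], "range": ["Low", "Medium", "High"], "state": "Low"},
        {"label": "System Alert Sound", "type": "Switch",
         "actions": ["TurnOn", "TurnOff"], "state": "on"},
    ],
}

# The vehicle would serialize this and send it alongside the voice request.
payload = json.dumps(context, ensure_ascii=False)
print(len(json.loads(payload)["elements"]))  # 3 operable controls
```

In this sketch, each entry carries exactly the four kinds of information the method requires: name and type, supported actions, value range, and current state.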
Specifically, as shown in Fig. 3, step S2 includes:
S20: creating a scene semantic space based on the context information;
S21: performing semantic understanding on the voice request and outputting a semantic understanding result;
S22: in the scene semantic space, retrieving, recalling, ranking, and matching the operable controls using the semantic understanding result;
S23: outputting, as the natural language understanding result, the operation by which an operable control responds to the voice request.
The scene semantic space is an understandable semantic space created from the GUI context information. In step S20, based on Fig. 2, an example of the scene semantic space created by the server from the context information is shown in Table 1:
Table 1
Specifically, step S20 includes:
S201: receiving the context information sent by the vehicle;
S202: loading and parsing the scene elements included in the context information;
S203: generating a scene semantic document from the scene elements.
In step S201, the vehicle sends the context information to the server as a Json file over a communication network, including but not limited to a carrier network. Fig. 4 shows an example of such a Json file; other file formats may also be used to send the context information, and no limitation is imposed here. In Fig. 4, label denotes the name of an operable control and type denotes its type.
In step S202, the server loads and parses the Json file to obtain the scene elements recorded in it; the scene elements include several operable controls and other UI elements.
In step S203, the server generates, from the scene elements, a scene semantic document recording the scene semantic space.
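Steps S201-S203 can be sketched as a small parsing routine. This is a minimal sketch under assumptions: the source only specifies the `label` and `type` fields and a Json payload, so the surrounding structure (`elements` array, `actions`, `range`, `state` keys) and the flat document layout are illustrative choices, not the patent's actual schema.

```python
import json

def build_scene_document(payload: str) -> list:
    """Parse the context Json sent by the vehicle (S201/S202) and flatten
    each operable control into one searchable scene-document entry (S203).
    Field names other than 'label' and 'type' are assumptions."""
    context = json.loads(payload)
    docs = []
    for element in context.get("elements", []):
        docs.append({
            "name": element["label"],
            "type": element["type"],
            "actions": element.get("actions", []),
            "range": element.get("range"),
            "state": element.get("state"),
            # Text used by the later retrieval step; here simply the name.
            "text": element["label"],
        })
    return docs

payload = json.dumps({"elements": [
    {"label": "System Alert Sound", "type": "Switch",
     "actions": ["TurnOn", "TurnOff"], "state": "on"},
]})
docs = build_scene_document(payload)
print(docs[0]["name"])  # System Alert Sound
```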
Further, step S21 includes:
S211: performing text preprocessing and text normalization on the text of the voice request, and then extracting the sentence backbone;
S212: understanding the intent of the user's voice request from the sentence backbone and outputting the semantic understanding result.
In step S211, text preprocessing of the voice request's text includes Chinese word segmentation and removal of filler words ("嗯", "吧"). Text normalization includes normalizing numbers and entities: for example, "one point five seconds" becomes "1.5 seconds" after normalization, and "big-screen brightness" becomes "central-control brightness". Extracting the sentence backbone means extracting the entity words, action words, and numeric values of the sentence; the extracted backbone is mainly used for subsequent retrieval.
In step S212, the action words in the extracted sentence backbone can be used to understand the user's intent, facilitating subsequent verification against the operable controls.
Further, step S212 includes: determining a preliminary result of the intent of the user's voice request from the sentence backbone, then correcting the preliminary result using the negation words in the sentence backbone, and outputting the corrected semantic understanding result. For example, if the text of the user's voice request is "Do not turn on the system alert sound", the preliminary result includes the action word "turn on" and the entity word "system alert sound"; but taking "turn on the system alert sound" as the semantic understanding result would be the opposite of the user's real meaning. So after obtaining the preliminary result, the system checks whether the sentence backbone contains a negation word; here the text contains "do not turn on", which can be extracted to correct the preliminary result, i.e. "do not turn on" is understood as "turn off". The corrected semantic understanding result is therefore "turn off the system alert sound".
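The negation-correction step can be sketched as follows. This is a simplified illustration operating on English whitespace tokens rather than segmented Chinese, and the negation/action word lists are assumptions; the real system would work on the normalized sentence backbone described above.

```python
# Hedged sketch of S212: preliminary intent plus negation correction.
# Word lists are illustrative assumptions, not the patent's vocabulary.
NEGATIONS = ("do not ", "don't ", "not ")
OPPOSITE = {"turn on": "turn off", "turn off": "turn on"}
ACTIONS = ("turn on", "turn off", "set")

def understand(text: str) -> dict:
    t = text.lower().strip()
    negated = False
    for neg in NEGATIONS:                 # detect a leading negation word
        if t.startswith(neg):
            t = t[len(neg):]
            negated = True
            break
    # Preliminary result: first matching action word plus remaining entity.
    action = next((a for a in ACTIONS if t.startswith(a)), None)
    entity = t[len(action):].strip() if action else t
    if negated and action in OPPOSITE:    # correct "do not turn on" -> "turn off"
        action = OPPOSITE[action]
    return {"action": action, "entity": entity}

print(understand("Do not turn on the system alert sound"))
# {'action': 'turn off', 'entity': 'the system alert sound'}
```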
Step S22 specifically includes:
S221: extracting the text of the voice request and searching the scene semantic document with it;
S222: recalling the search results with preset recall strategies, then scoring them by match degree;
S223: ranking the scored search results by a preset ranking strategy;
S224: outputting a match result according to the ranking, wherein the match result includes the operation intent toward an operable control, the name of the operable control, and the action to execute on the operable control.
In step S221, segmented word lists are prepared in advance for scenes such as navigation and music, and retrieval is performed on the basis of these lists. Multiple retrieval strategies can be used depending on how the text is used. That is, the text of the voice request includes all or part of the text of the voice request, and step S221 includes any one of the following:
extracting entity words from the voice request and searching the scene semantic document;
extracting text including entity words and action words from the voice request and searching the scene semantic document;
or,
extracting all the text of the voice request and searching the scene semantic document.
The three retrieval strategies listed above cover partial text (entity words, or the combination of entity and action words) as well as the full text of the voice request; which strategy to use can be decided according to specific needs. Retrieval can be implemented with, for example, an inverted index and word- and pinyin-based search; the specific implementation is not limited here.
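The inverted-index retrieval mentioned above can be sketched minimally. This toy version indexes only control names by whitespace token and ranks hits by how many query tokens matched; the pinyin-based retrieval the text also mentions is omitted, and the data layout is an assumption.

```python
from collections import defaultdict

def build_index(docs):
    """Toy inverted index over scene-document names (a sketch of S221)."""
    index = defaultdict(set)
    for i, doc in enumerate(docs):
        for token in doc["name"].lower().split():
            index[token].add(i)
    return index

def search(index, docs, query_tokens):
    """Return document names ordered by number of matched query tokens."""
    hits = defaultdict(int)
    for token in query_tokens:
        for i in index.get(token.lower(), ()):
            hits[i] += 1
    return [docs[i]["name"] for i, _ in sorted(hits.items(), key=lambda kv: -kv[1])]

docs = [{"name": "Navigation Broadcast Volume"}, {"name": "System Alert Sound"}]
index = build_index(docs)
print(search(index, docs, ["alert", "sound"]))  # ['System Alert Sound']
```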
In step S222, the search results are recalled with preset recall strategies, of which there are several:
Recall strategy 1: text ignoring based on a preset ignorable-word list.
Example 1: Label = <Rock>, query text Query = "Switch to rock mode"; "mode" can be ignored in the current scene.
Recall strategy 2: core words must be hit.
Example 2: Label = <Open Map Settings>, query text Query = "Open system settings"; "map" must be hit in the current scene, otherwise a false recall results.
Recall strategy 3: recall by threshold.
Example 3: a threshold of X% is set, and results reaching the threshold are recalled.
Recall strategy 4: verify the action words or negative intent in the text.
Example 4: Label = <Connect the first Bluetooth device>, query text Query = "Disconnect the first Bluetooth device | Do not connect the first Bluetooth device"; without verifying action words or negative intent, a false recall occurs.
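Strategies 1-3 above can be sketched as a single recall filter. This is only an illustration under assumptions: the ignorable-word list, core-word list, and threshold value are invented here, and strategy 4 (action-word/negation verification) is left to the intent-understanding step rather than implemented in this filter.

```python
IGNORABLE = {"mode"}      # strategy 1: preset ignorable-word list (assumed)
CORE_WORDS = {"map"}      # strategy 2: core words that must be hit (assumed)
THRESHOLD = 0.5           # strategy 3: minimum match ratio (assumed value)

def recall(query_tokens, doc_tokens):
    """Decide whether a retrieved document survives recall (sketch of S222).
    Strategy 4 (action/negation verification) is intentionally omitted here."""
    q = [t for t in query_tokens if t not in IGNORABLE]   # strategy 1
    matched = [t for t in q if t in doc_tokens]
    # Strategy 2: every core word present in the document must be matched.
    for core in CORE_WORDS & set(doc_tokens):
        if core not in matched:
            return False
    # Strategy 3: threshold on the match ratio.
    return len(matched) / max(len(q), 1) >= THRESHOLD

# "Open system settings" must NOT recall the <Open Map Settings> control,
# because the core word "map" is missed (example 2 above).
print(recall(["open", "system", "settings"], ["open", "map", "settings"]))  # False
print(recall(["open", "map", "settings"], ["open", "map", "settings"]))     # True
```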
Still in step S222, scoring may use query match degree, document match degree, or other scoring methods.
Query match degree = matched length / query backbone length (in characters), where matched length is the length (in characters) of the words matched between the Query and the document.
Document match degree = matched length / document length (in characters). Specific matching strategies can be used for particular controls: for example, the POI (Point of Interest) list control that frequently appears in navigation can be handled with a specific matching strategy such as document matching.
In step S223, the preset ranking strategies may include:
Strategy 1: rank the scene semantic documents by the highest score across the retrieval strategies;
Strategy 2: rank the scene semantic documents by the sum of the scores across the retrieval strategies;
Strategy 3: rank the scene semantic documents by the weighted sum of the scores across the retrieval strategies.
The score is computed as: score = α × document match degree + (1 − α) × query match degree, where α is a preset score weight parameter.
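The score formula can be sketched directly. Whitespace tokenization and the α value are English-friendly simplifications of the character-based Chinese matching described above; the formula itself is the one from the text.

```python
def _matched_len(query_words, doc_words):
    # Total character length of document words that also appear in the query.
    return sum(len(w) for w in doc_words if w in query_words)

def score(query: str, doc: str, alpha: float = 0.5) -> float:
    """score = alpha * document match degree + (1 - alpha) * query match degree,
    each defined as matched length over the respective total length."""
    qw, dw = query.split(), doc.split()
    matched = _matched_len(qw, dw)
    doc_match = matched / sum(len(w) for w in dw)
    query_match = matched / sum(len(w) for w in qw)
    return alpha * doc_match + (1 - alpha) * query_match

# Ranking candidates by score (an analogue of the preset ranking strategies).
docs = ["navigation broadcast volume", "system alert sound"]
best = max(docs, key=lambda d: score("set navigation volume to 18", d))
print(best)  # navigation broadcast volume
```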
That is, a ranking strategy is chosen as needed, and the corresponding ranking result is obtained.
In step S224, matching covers both exact matching and fuzzy matching. Exact matching applies to a fully matched scene semantic document: if the voice request Query contains an action word and that word conforms to the control's operations, the match is considered exact. Fuzzy matching selects the highest-scoring document (if several results tie on the top score, all of them are selected) and, when an action word is present, uses it to check whether the selected control is correct. The match result includes the operation intent toward an operable control, the control's name, and the action to execute on it. One example match result: the operation intent is "set the gesture direction to inward", the control name is "Gesture Touch Rotation Direction", and the action to execute is "set to inward". When the voice request Query = "Set the gesture direction to inward" is issued against the control named "Gesture Touch Rotation Direction" in the displayed GUI, step S22 is executed.
In step S23, the operation by which the operable control responds to the voice request is: the control named "Gesture Touch Rotation Direction" executes the action "set to inward"; this operation is output as the natural language understanding result.
In step S3, the server uses the natural language understanding result output in step S23 to generate an instruction executable by the vehicle and sends it to the vehicle.
In step S4, the vehicle receives and executes the instruction; after execution, the current state of the control named "Gesture Touch Rotation Direction" is "inward", and the execution result can be fed back to the user by voice via TTS (Text-To-Speech).
From the above, the user achieves "what you see is what you can say" for the in-vehicle system's graphical user interface: no physical operations such as touching the screen or pressing buttons are needed throughout, and fully voice-based operation while the vehicle is moving keeps the user's eyes and attention entirely on driving, fully ensuring driving safety. Moreover, by sending the context information of the current in-vehicle system GUI to the server, the server can make full use of that context information to complete natural language understanding during voice interaction; because information of additional dimensions is added, the user can operate anything seen on the GUI in the vehicle by voice, improving the interaction quality of the human-machine interaction system.
It should be noted that, for simplicity of description, the method embodiments are presented as a series of action combinations, but those skilled in the art should know that embodiments of the present invention are not limited by the described order of actions, since some steps may be performed in other orders or simultaneously according to embodiments of the present invention. Furthermore, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by embodiments of the present invention.
Referring to Fig. 5, a structural block diagram of an embodiment of a voice interaction system of the present invention is shown, which may specifically include a vehicle and a server capable of communicating with the vehicle, wherein the vehicle is provided with a request receiving module, an information sending module, an instruction receiving module, and an execution feedback module, and the server is provided with a natural language understanding module and an instruction sending module.
The request receiving module is configured to receive a user's voice request;
the information sending module is configured to send the voice request and the context information of the current in-vehicle system graphical user interface to the server;
the natural language understanding module is configured to complete natural language understanding processing of the voice request based on the context information;
the instruction sending module is configured to send the instruction to the vehicle after the server uses the natural language understanding result to generate an instruction executable by the vehicle;
the instruction receiving module is configured to receive and execute the instruction and, through the execution feedback module, feed the execution result back to the user by voice.
In the voice interaction system, the context information includes the names and types of the operable controls in the current in-vehicle system graphical user interface, the actions supported by the operable controls, the value ranges of those actions, and the current states of the operable controls.
Specifically, the natural language understanding module includes:
a creation submodule for creating the scene semantic space based on the context information;
an understanding submodule for performing semantic understanding on the voice request and outputting the semantic understanding result;
a processing submodule for retrieving, recalling, ranking, and matching the operable controls in the scene semantic space using the semantic understanding result;
an output submodule for outputting, as the natural language understanding result, the operation by which an operable control responds to the voice request.
The creation submodule includes:
a receiving unit for receiving the context information sent by the vehicle;
a loading unit for loading and parsing the scene elements included in the context information;
a generating unit for generating the scene semantic document from the scene elements.
The understanding submodule includes:
a processing unit for performing text preprocessing and text normalization on the text of the voice request and then extracting the sentence backbone;
an output unit for understanding the intent of the user's voice request from the sentence backbone and outputting the semantic understanding result.
Further, the output unit is also configured to determine a preliminary result of the intent of the user's voice request from the sentence backbone, correct the preliminary result using the negation words in the sentence backbone, and output the corrected semantic understanding result.
The processing submodule includes:
a retrieval unit for extracting the text of the voice request and searching the scene semantic document;
a recall unit for recalling the search results with preset recall strategies and then scoring them by match degree;
a ranking unit for ranking the scored search results by a preset ranking strategy;
a matching unit for outputting a match result according to the ranking, wherein the match result includes the operation intent toward an operable control, the name of the operable control, and the action to execute on the operable control.
Further, the text of the voice request includes all or part of the text of the voice request, and the retrieval unit is specifically configured for any one of the following:
extracting entity words from the voice request and searching the scene semantic document;
extracting text including entity words and action words from the voice request and searching the scene semantic document;
or,
extracting all the text of the voice request and searching the scene semantic document.
Further, the recall unit is specifically configured to recall the search results with one or more preset recall strategies, including text ignoring based on a preset ignorable-word list, required hits on core words, recall by threshold, and verification of action words or negative intent in the text.
As the apparatus embodiments are basically similar to the method embodiments, their description is relatively brief; for relevant details, refer to the description of the method embodiments.
An embodiment of the present invention also provides a vehicle, including:
a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements each process of the above voice interaction method embodiments with the same technical effects, which are not repeated here to avoid duplication.
An embodiment of the present invention also provides a server, including:
a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements each process of the above voice interaction method embodiments with the same technical effects, which are not repeated here to avoid duplication.
An embodiment of the present invention also provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements each process of the above voice interaction method embodiments with the same technical effects, which are not repeated here to avoid duplication.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the others, and the same or similar parts among the embodiments can be cross-referenced.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, an apparatus, or a computer program product. Therefore, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
Embodiments of the present invention are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data-processing terminal device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data-processing terminal device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data-processing terminal device to work in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or another programmable data-processing terminal device, so that a series of operational steps is performed on the computer or other programmable terminal device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable terminal device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn the basic inventive concept. Therefore, the appended claims are intended to be construed as including the preferred embodiments and all changes and modifications falling within the scope of the embodiments of the present invention.
Finally, it should also be noted that, herein, relational terms such as first and second are used only to distinguish one entity or operation from another and do not necessarily require or imply any such actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or terminal device comprising a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or terminal device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or terminal device comprising that element.
The voice interaction method, vehicle, server, and storage medium provided by the present invention have been introduced in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is only meant to help understand the method of the present invention and its core idea. Meanwhile, for those of ordinary skill in the art, there will be changes in the specific implementation and scope of application according to the idea of the present invention. In summary, the contents of this specification should not be construed as limiting the present invention.
Claims (13)
- A voice interaction method applied to a voice interaction system composed of a vehicle and a server capable of communicating with the vehicle, characterized by comprising: the vehicle receives a user's voice request and sends the voice request and context information of the current in-vehicle system graphical user interface to the server; the server completes natural language understanding processing of the voice request based on the context information; the server uses the natural language understanding result to generate an instruction executable by the vehicle and sends it to the vehicle; the vehicle receives and executes the instruction and feeds the execution result back to the user by voice.
- The voice interaction method of claim 1, characterized in that the context information includes the names and types of the operable controls in the current in-vehicle system graphical user interface, the actions supported by the operable controls, the value ranges of those actions, and the current states of the operable controls.
- The voice interaction method of claim 2, characterized in that the server completing natural language understanding processing of the voice request based on the context information comprises: creating a scene semantic space based on the context information; performing semantic understanding on the voice request and outputting a semantic understanding result; in the scene semantic space, retrieving, recalling, ranking, and matching the operable controls using the semantic understanding result; and outputting, as the natural language understanding result, the operation by which an operable control responds to the voice request.
- The voice interaction method of claim 3, characterized in that creating a scene semantic space based on the context information comprises: receiving the context information sent by the vehicle; loading and parsing the scene elements included in the context information; and generating a scene semantic document from the scene elements.
- The voice interaction method of claim 4, characterized in that performing semantic understanding on the voice request and outputting a semantic understanding result comprises: performing text preprocessing and text normalization on the text of the voice request and then extracting the sentence backbone; and understanding the intent of the user's voice request from the sentence backbone and outputting the semantic understanding result.
- The voice interaction method of claim 5, characterized in that understanding the intent of the user's voice request from the sentence backbone and outputting the semantic understanding result comprises: determining a preliminary result of the intent of the user's voice request from the sentence backbone, correcting the preliminary result using the negation words in the sentence backbone, and outputting the corrected semantic understanding result.
- The voice interaction method of claim 6, characterized in that, in the scene semantic space, retrieving, recalling, ranking, and matching the operable controls using the semantic understanding result comprises: extracting the text of the voice request and searching the scene semantic document with it; recalling the search results with preset recall strategies and then scoring them by match degree; ranking the scored search results by a preset ranking strategy; and outputting a match result according to the ranking, wherein the match result includes the operation intent toward an operable control, the name of the operable control, and the action to execute on the operable control.
- The voice interaction method of claim 7, characterized in that the text of the voice request includes all or part of the text of the voice request, and extracting the text of the voice request and searching the scene semantic document comprises any one of the following: extracting entity words from the voice request and searching the scene semantic document; extracting text including entity words and action words from the voice request and searching the scene semantic document; or extracting all the text of the voice request and searching the scene semantic document.
- The voice interaction method of claim 7, characterized in that recalling the search results with preset recall strategies comprises: recalling the search results with one or more preset recall strategies, including text ignoring based on a preset ignorable-word list, required hits on core words, recall by threshold, and verification of action words or negative intent in the text.
- A vehicle, characterized by comprising: a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the voice interaction method of any one of claims 1-9.
- A server, characterized by comprising: a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the voice interaction method of any one of claims 1-9.
- A voice interaction system, characterized in that the system includes a vehicle and a server capable of communicating with the vehicle, wherein the vehicle is provided with a request receiving module, an information sending module, an instruction receiving module, and an execution feedback module, and the server is provided with a natural language understanding module and an instruction sending module; the request receiving module is configured to receive a user's voice request; the information sending module is configured to send the voice request and context information of the current in-vehicle system graphical user interface to the server; the natural language understanding module is configured to complete natural language understanding processing of the voice request based on the context information; the instruction sending module is configured to send the instruction to the vehicle after the server uses the natural language understanding result to generate an instruction executable by the vehicle; and the instruction receiving module is configured to receive and execute the instruction and, through the execution feedback module, feed the execution result back to the user by voice.
- A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program that, when executed by a processor, implements the steps of the voice interaction method of any one of claims 1 to 9.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202010596817.5A | 2020-06-28 | 2020-06-28 | Voice interaction method, vehicle, server, system and storage medium
CN202010596817.5 | 2020-06-28 | |
Publications (1)
Publication Number | Publication Date
---|---
WO2022001013A1 (zh) | 2022-01-06
Family
ID=72722481
Family Applications (1)
Application Number | Title | Priority Date | Filing Date
---|---|---|---
PCT/CN2020/135150 | Voice interaction method, vehicle, server, system and storage medium | 2020-06-28 | 2020-12-10
Country Status (2)
Country | Link
---|---
CN (2) | CN111767021A (zh)
WO (1) | WO2022001013A1 (zh)
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN114842847A | 2022-04-27 | 2022-08-02 | 中国第一汽车股份有限公司 | In-vehicle voice control method and apparatus
CN114842839A | 2022-04-08 | 2022-08-02 | 北京百度网讯科技有限公司 | In-vehicle human-machine interaction method, apparatus, device, storage medium and program product
Families Citing this family (20)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN111767021A (zh) | 2020-06-28 | 2020-10-13 | 广州小鹏车联网科技有限公司 | Voice interaction method, vehicle, server, system and storage medium
CN112164400A (zh) | 2020-09-18 | 2021-01-01 | 广州小鹏汽车科技有限公司 | Voice interaction method, server and computer-readable storage medium
CN112242141B (zh) | 2020-10-15 | 2022-03-15 | 广州小鹏汽车科技有限公司 | Voice control method, smart cockpit, server, vehicle and medium
CN114442989A (zh) | 2020-11-02 | 2022-05-06 | 海信视像科技股份有限公司 | Natural language parsing method and apparatus
CN112637264B (zh) | 2020-11-23 | 2023-04-21 | 阿波罗智联(北京)科技有限公司 | Information interaction method and apparatus, electronic device and storage medium
CN112614491B (zh) | 2020-12-11 | 2024-03-08 | 广州橙行智动汽车科技有限公司 | In-vehicle voice interaction method and apparatus, vehicle and readable medium
CN112685535A (zh) | 2020-12-25 | 2021-04-20 | 广州橙行智动汽车科技有限公司 | Voice interaction method, server, voice interaction system and storage medium
CN113076079A (zh) | 2021-04-20 | 2021-07-06 | 广州小鹏汽车科技有限公司 | Voice control method, server, voice control system and storage medium
CN113053394B (zh) | 2021-04-27 | 2024-01-09 | 广州小鹏汽车科技有限公司 | Voice processing method, server, voice processing system and storage medium
CN113421561B (zh) | 2021-06-03 | 2024-01-09 | 广州小鹏汽车科技有限公司 | Voice control method, voice control apparatus, server and storage medium
CN113253970B (zh) | 2021-07-09 | 2021-10-12 | 广州小鹏汽车科技有限公司 | Voice interaction method and apparatus, voice interaction system, vehicle and medium
CN113472806B (zh) | 2021-07-14 | 2022-11-22 | 斑马网络技术有限公司 | Privacy-preserving voice interaction method, apparatus, system, device and storage medium
CN113450801A (zh) | 2021-08-27 | 2021-09-28 | 广州小鹏汽车科技有限公司 | Voice interaction method, apparatus, system, vehicle and medium
CN113990322B (zh) | 2021-11-04 | 2023-10-31 | 广州小鹏汽车科技有限公司 | Voice interaction method, server, voice interaction system and medium
CN113971954B (zh) | 2021-12-23 | 2022-07-12 | 广州小鹏汽车科技有限公司 | Voice interaction method and apparatus, vehicle and storage medium
CN113990299B (zh) | 2021-12-24 | 2022-05-13 | 广州小鹏汽车科技有限公司 | Voice interaction method and apparatus, server and readable storage medium
CN115457951A (zh) | 2022-05-10 | 2022-12-09 | 北京罗克维尔斯科技有限公司 | Voice control method, apparatus, electronic device and storage medium
CN114913854A (zh) | 2022-07-11 | 2022-08-16 | 广州小鹏汽车科技有限公司 | Voice interaction method, server and storage medium
CN115910063A (zh) | 2022-12-01 | 2023-04-04 | 浙江极氪智能科技有限公司 | Voice interaction method, apparatus, computer device and computer-readable storage medium
CN116149596A (zh) | 2023-03-13 | 2023-05-23 | 合众新能源汽车股份有限公司 | In-vehicle voice interaction method, system and computer-readable medium
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
US20130080178A1 | 2011-09-26 | 2013-03-28 | Donghyun KANG | User interface method and device
CN105070288A | 2015-07-02 | 2015-11-18 | 百度在线网络技术(北京)有限公司 | In-vehicle voice command recognition method and apparatus
CN108279839A | 2017-01-05 | 2018-07-13 | 阿里巴巴集团控股有限公司 | Voice-based interaction method and apparatus, electronic device and operating system
CN110211586A | 2019-06-19 | 2019-09-06 | 广州小鹏汽车科技有限公司 | Voice interaction method, apparatus, vehicle and machine-readable medium
CN110795175A | 2018-08-02 | 2020-02-14 | Tcl集团股份有限公司 | Method and apparatus for simulated control of a smart terminal, and smart terminal
CN111768777A | 2020-06-28 | 2020-10-13 | 广州小鹏车联网科技有限公司 | Voice control method, information processing method, vehicle and server
CN111768780A | 2020-06-28 | 2020-10-13 | 广州小鹏车联网科技有限公司 | Voice control method, information processing method, vehicle and server
CN111767021A | 2020-06-28 | 2020-10-13 | 广州小鹏车联网科技有限公司 | Voice interaction method, vehicle, server, system and storage medium
Family Cites Families (16)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN100375006C (zh) | 2006-01-19 | 2008-03-12 | 吉林大学 | Voice control system for a vehicle navigation apparatus
CN101217584B (zh) | 2008-01-18 | 2011-04-13 | 同济大学 | Voice command control system usable in automobiles
CN102566961A (zh) | 2010-12-31 | 2012-07-11 | 上海博泰悦臻电子设备制造有限公司 | Voice execution method and apparatus for applications based on an in-vehicle device
CN103187055B (zh) | 2011-12-28 | 2018-07-27 | 上海博泰悦臻电子设备制造有限公司 | Data processing system based on in-vehicle applications
CN104536647B (zh) | 2014-12-16 | 2018-03-13 | 广东欧珀移动通信有限公司 | Method and apparatus for adjusting the positions of application icons
CN106601232A (zh) | 2017-01-04 | 2017-04-26 | 江西沃可视发展有限公司 | In-vehicle terminal human-machine interaction system based on speech recognition
US11150922B2 (en) | 2017-04-25 | 2021-10-19 | Google Llc | Initializing a conversation with an automated agent via selectable graphical element
CN107204185B (zh) | 2017-05-03 | 2021-05-25 | 深圳车盒子科技有限公司 | In-vehicle voice interaction method, system and computer-readable storage medium
CN107608652B (zh) | 2017-08-28 | 2020-05-22 | 三星电子(中国)研发中心 | Method and apparatus for voice control of a graphical interface
CN108877791B (zh) | 2018-05-23 | 2021-10-08 | 百度在线网络技术(北京)有限公司 | View-based voice interaction method, apparatus, server, terminal and medium
US11037556B2 (en) | 2018-07-17 | 2021-06-15 | Ford Global Technologies, Llc | Speech recognition for vehicle voice commands
CN111312233A (zh) | 2018-12-11 | 2020-06-19 | 阿里巴巴集团控股有限公司 | Voice data recognition method, apparatus and system
CN110211584A (zh) | 2019-06-04 | 2019-09-06 | 广州小鹏汽车科技有限公司 | Vehicle control method, apparatus, storage medium and control terminal
CN110728982A (zh) | 2019-10-11 | 2020-01-24 | 上海博泰悦臻电子设备制造有限公司 | Information interaction method and system based on a voice touch screen, storage medium, and in-vehicle terminal
CN111002996B (zh) | 2019-12-10 | 2023-08-25 | 广州小鹏汽车科技有限公司 | In-vehicle voice interaction method, server, vehicle and storage medium
CN111477224A (zh) | 2020-03-23 | 2020-07-31 | 一汽奔腾轿车有限公司 | Human-vehicle virtual interaction system
- 2020-06-28: CN application CN202010596817.5A filed (published as CN111767021A, status: pending)
- 2020-12-10: PCT application PCT/CN2020/135150 filed (published as WO2022001013A1, status: application filing)
- 2021-04-21: CN application CN202110432528.6A filed (published as CN113031905A, status: pending)
Also Published As
Publication number | Publication date |
---|---|
CN111767021A (zh) | 2020-10-13 |
CN113031905A (zh) | 2021-06-25 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20942992; Country of ref document: EP; Kind code of ref document: A1
| NENP | Non-entry into the national phase | Ref country code: DE
| 32PN | Ep: public notification in the EP bulletin as the address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 05.06.2023)
| 122 | Ep: PCT application non-entry into the European phase | Ref document number: 20942992; Country of ref document: EP; Kind code of ref document: A1