CN118038865A

CN118038865A - Voice interaction method, device, vehicle and storage medium

Info

Publication number: CN118038865A
Application number: CN202311872708.1A
Authority: CN
Inventors: 赛影辉; 叶德英; 吴正飞; 阴山慧; 孙亚红; 吴倩倩
Original assignee: Chery Automobile Co Ltd
Current assignee: Chery Automobile Co Ltd
Priority date: 2023-12-29
Filing date: 2023-12-29
Publication date: 2024-05-14

Abstract

The application discloses a voice interaction method, a voice interaction device, a vehicle and a storage medium, and belongs to the field of intelligent interaction. The method comprises the following steps: receiving a first voice instruction; identifying at least one first node keyword in the first voice instruction; determining a keyword function tree corresponding to each first node keyword from at least one keyword function tree included in a keyword function database based on the at least one first node keyword; determining a first target bifurcation node and at least one vehicle-mounted function corresponding to the first target bifurcation node from the first keyword function tree based on the at least one first node keyword under the condition that the keyword function tree corresponding to the at least one first node keyword is the same; and executing at least one vehicle-mounted function corresponding to the first target bifurcation node. The application can realize the execution of the vehicle-mounted function through single voice so as to improve the voice interaction efficiency.

Description

Voice interaction method, device, vehicle and storage medium

Technical Field

The present application relates to the field of intelligent interaction, and in particular, to a method and apparatus for voice interaction, a vehicle, and a storage medium.

Background

With the rapid development of vehicle technology, the kinds of functions (i.e., vehicle-mounted functions) carried by a vehicle body are becoming more and more intelligent, diversified, and complex. In order to improve the execution efficiency of the vehicle-mounted function, the vehicle generally supports a voice interaction function, and a user can control the on-off of the vehicle-mounted function in a voice interaction mode, so that a great amount of control time can be saved in the voice interaction mode compared with the conventional key control mode.

With more and more vehicle-mounted functions, in order to meet various use requirements in part of application scenes, a user may need to perform multi-round voice interaction to control the on/off of a certain vehicle-mounted function. However, frequent voice interaction not only reduces the user experience, but also disperses the user's attention during driving, creating a safety hazard. Thus, current voice interactions are less efficient.

Disclosure of Invention

The application provides a voice interaction method, a voice interaction device, a vehicle and a storage medium, which can improve voice interaction efficiency. The technical scheme is as follows:

In one aspect, a voice interaction method is provided, the method comprising:

Receiving a first voice instruction;

identifying at least one first node keyword in the first voice instruction;

determining a keyword function tree corresponding to each first node keyword from at least one keyword function tree included in a keyword function database based on the at least one first node keyword;

Each keyword function tree comprises a plurality of bifurcation nodes, each bifurcation node corresponds to a node keyword and at least one vehicle-mounted function, and the node keywords corresponding to the bifurcation nodes are different;

determining a first target bifurcation node and at least one vehicle-mounted function corresponding to the first target bifurcation node from a first keyword function tree based on the at least one first node keyword under the condition that keyword function trees corresponding to the at least one first node keyword are the same, wherein the first keyword function tree is a keyword function tree corresponding to the at least one first node keyword;

and executing at least one vehicle-mounted function corresponding to the first target bifurcation node.

Optionally, the determining, based on the at least one first node keyword, a first target bifurcation node from a first keyword function tree includes:

Determining a first bifurcation node corresponding to each first node keyword from the first keyword function tree based on the at least one first node keyword to obtain at least one first bifurcation node;

and determining a first bifurcation node with the largest depth in the at least one first bifurcation node as the first target bifurcation node in the case that the at least one first bifurcation node is located on the same path in the first keyword function tree.

Optionally, the method further comprises:

And sending alarm information and ending voice interaction under the condition that the keyword function trees corresponding to the at least one first node keyword are different or the at least one first bifurcation node is located in different paths in the first keyword function tree.

Optionally, after the executing the at least one vehicle-mounted function corresponding to the first target bifurcation node, the method further comprises:

Receiving a second voice instruction;

Identifying at least one second node keyword in the second voice instruction;

Determining a keyword function tree corresponding to each second node keyword from the at least one keyword function tree based on the at least one second node keyword;

Determining a second target bifurcation node and at least one vehicle-mounted function corresponding to the second target bifurcation node from a second keyword function tree based on the at least one second node keyword under the condition that keyword function trees corresponding to the at least one second node keyword are the same, wherein the second keyword function tree is a keyword function tree corresponding to the at least one second node keyword;

If the first keyword function tree is the same as the second keyword function tree, and the second target bifurcation node and the first target bifurcation node are located on the same path, or the second target bifurcation node and the first target bifurcation node belong to the same father node, closing at least one vehicle-mounted function corresponding to the first target bifurcation node, and executing at least one vehicle-mounted function corresponding to the second target bifurcation node.

Optionally, the method further comprises:

If the first keyword function tree is different from the second keyword function tree, or if the first keyword function tree is the same as the second keyword function tree, but the second target bifurcation node and the first target bifurcation node are located in different paths and the second target bifurcation node and the first target bifurcation node are not affiliated to the same father node, determining whether there is an execution conflict between at least one vehicle-mounted function corresponding to the second target bifurcation node and at least one vehicle-mounted function corresponding to the first target bifurcation node;

If at least one vehicle-mounted function corresponding to the second target bifurcation node does not have execution conflict with at least one vehicle-mounted function corresponding to the first target bifurcation node, executing at least one vehicle-mounted function corresponding to the second target bifurcation node;

If at least one vehicle-mounted function corresponding to the second target bifurcation node has execution conflict with at least one vehicle-mounted function corresponding to the first target bifurcation node, closing at least one vehicle-mounted function corresponding to the first target bifurcation node, and executing at least one vehicle-mounted function corresponding to the second target bifurcation node.
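The handling of the second voice instruction described above can be condensed into a single decision. The following is a minimal sketch, assuming the four conditions have already been evaluated elsewhere; the function name and its boolean parameters are illustrative and do not come from the application. In every branch the vehicle-mounted functions of the second target bifurcation node are executed, so the only open question is whether the functions of the first target bifurcation node are closed first.

```python
def must_close_first(same_tree: bool, same_path: bool,
                     same_parent: bool, has_conflict: bool) -> bool:
    """Return True if the functions of the first target bifurcation node
    should be closed before those of the second one are executed.

    All four flags are assumed to be computed by the caller (hypothetical).
    """
    # Same tree and related nodes: the second instruction supersedes the first.
    if same_tree and (same_path or same_parent):
        return True
    # Different trees, or unrelated nodes in one tree: close the first
    # functions only if they conflict with the second ones.
    return has_conflict
```

For instance, two unrelated nodes whose functions do not conflict leave the first functions running, while any execution conflict causes them to be closed.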

Optionally, before the receiving the first voice instruction, the method further includes:

displaying a function setting interface, wherein the function setting interface is used for prompting a user to set node keywords corresponding to each bifurcation node in the at least one keyword function tree and vehicle-mounted functions;

and acquiring node keywords and vehicle-mounted functions corresponding to each bifurcation node in the at least one keyword function tree from the function setting interface.

In another aspect, a voice interaction apparatus is provided, the apparatus comprising:

the voice receiving module is used for receiving a first voice instruction;

The keyword recognition module is used for recognizing at least one first node keyword in the first voice instruction;

the function tree determining module is used for determining a keyword function tree corresponding to each first node keyword from at least one keyword function tree included in the keyword function database based on the at least one first node keyword;

The node function determining module is used for determining a first target bifurcation node and at least one vehicle-mounted function corresponding to the first target bifurcation node from a first keyword function tree based on the at least one first node keyword under the condition that the keyword function trees corresponding to the at least one first node keyword are the same, wherein the first keyword function tree is a keyword function tree corresponding to the at least one first node keyword;

And the function execution module is used for executing at least one vehicle-mounted function corresponding to the first target bifurcation node.

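Optionally, the node function determining module is specifically configured to: determine a first bifurcation node corresponding to each first node keyword from the first keyword function tree based on the at least one first node keyword, to obtain at least one first bifurcation node; and determine the first bifurcation node with the largest depth in the at least one first bifurcation node as the first target bifurcation node in the case that the at least one first bifurcation node is located on the same path in the first keyword function tree.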

Optionally, the apparatus further comprises: an alarm module;

The alarm module is configured to send alarm information and end voice interaction when the keyword function trees corresponding to the at least one first node keyword are different, or the at least one first bifurcation node is located on a different path in the first keyword function tree.

Optionally, the voice receiving module is further configured to receive a second voice instruction;

The keyword recognition module is further used for recognizing at least one second node keyword in the second voice instruction;

The function tree determining module is further configured to determine, from the at least one keyword function tree, a keyword function tree corresponding to each second node keyword based on the at least one second node keyword;

The node function determining module is further configured to determine, based on the at least one second node keyword, a second target bifurcation node and at least one vehicle-mounted function corresponding to the second target bifurcation node from a second keyword function tree in the case that the keyword function trees corresponding to the at least one second node keyword are the same, where the second keyword function tree is the keyword function tree corresponding to the at least one second node keyword;

The function execution module is further configured to close at least one vehicle-mounted function corresponding to the first target bifurcation node and execute at least one vehicle-mounted function corresponding to the second target bifurcation node if the first keyword function tree is the same as the second keyword function tree, and the second target bifurcation node and the first target bifurcation node either are located on the same path or belong to the same parent node.

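Optionally, the function execution module is specifically configured to: determine whether there is an execution conflict between at least one vehicle-mounted function corresponding to the second target bifurcation node and at least one vehicle-mounted function corresponding to the first target bifurcation node if the first keyword function tree is different from the second keyword function tree, or if the two trees are the same but the second target bifurcation node and the first target bifurcation node are located on different paths and do not belong to the same parent node; execute the at least one vehicle-mounted function corresponding to the second target bifurcation node if there is no execution conflict; and close the at least one vehicle-mounted function corresponding to the first target bifurcation node and execute the at least one vehicle-mounted function corresponding to the second target bifurcation node if there is an execution conflict.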

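Optionally, the apparatus further comprises a function setting module, which is used for: displaying a function setting interface, wherein the function setting interface is used for prompting a user to set the node keywords and vehicle-mounted functions corresponding to each bifurcation node in the at least one keyword function tree; and acquiring the node keywords and vehicle-mounted functions corresponding to each bifurcation node in the at least one keyword function tree from the function setting interface.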

In another aspect, a vehicle is provided, the vehicle including a memory for storing a computer program and a processor for executing the computer program stored on the memory to implement the steps of the voice interaction method described above.

In another aspect, a computer readable storage medium is provided, in which a computer program is stored, which when executed by a processor, implements the steps of the voice interaction method described above.

In another aspect, a computer program product is provided comprising instructions which, when run on a computer, cause the computer to perform the steps of the voice interaction method described above.

The technical scheme provided by the application has at least the following beneficial effects:

Each keyword function tree in the keyword function database comprises a plurality of bifurcation nodes, and each bifurcation node corresponds to a node keyword and at least one vehicle-mounted function. Therefore, when a first voice instruction is received, a first target bifurcation node can be determined from the keyword function trees in the keyword function database based on the first node keyword in the first voice instruction, and the vehicle-mounted function corresponding to the first target bifurcation node can then be executed. A vehicle-mounted function can thus be executed through a single voice instruction without frequent voice interaction, which improves voice interaction efficiency as well as the user experience and driving safety.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic illustration of an implementation environment provided by an embodiment of the present application;

FIG. 2 is a flowchart of a voice interaction method according to an embodiment of the present application;

FIG. 3 is a schematic diagram of a keyword function tree provided by an embodiment of the present application;

FIG. 4 is a schematic structural diagram of a voice interaction device according to an embodiment of the present application;

FIG. 5 is a schematic structural diagram of a vehicle according to an embodiment of the present application.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the following detailed description of the embodiments of the present application will be given with reference to the accompanying drawings.

Before explaining the voice interaction method provided by the embodiment of the application in detail, an implementation environment related to the embodiment of the application is introduced.

Referring to fig. 1, fig. 1 is a schematic diagram illustrating an implementation environment according to an exemplary embodiment. The implementation environment includes a voice interaction terminal 101, a processor 102 and at least one functional module 103, the processor 102 being communicatively coupled to the voice interaction terminal 101 and the functional module 103, respectively. The communication connection may be a wired or wireless connection, which is not limited by the embodiments of the present application.

The voice interaction terminal 101 is used to implement voice interaction with a user. By way of example, the voice interaction terminal 101 may comprise a microphone through which the reception of voice instructions is achieved.

In some embodiments, the voice interaction terminal 101 may further include a speaker, so as to implement transmission of the function execution result and the alarm information through the speaker.

The processor 102 is configured to execute the corresponding vehicle-mounted function through the functional module 103 based on the voice command. For example, the processor 102 may include a keyword function database and use it to determine the vehicle-mounted function corresponding to the voice instruction: it determines the target bifurcation node corresponding to a node keyword in the voice instruction, takes the vehicle-mounted function corresponding to that target bifurcation node as the vehicle-mounted function corresponding to the voice instruction, and then executes that function through the corresponding functional module 103.

The processor 102 may be a general-purpose CPU (Central Processing Unit), an NP (Network Processor), or a microprocessor, or may be one or more integrated circuits for implementing aspects of the present application, such as an ASIC (Application-Specific Integrated Circuit), a PLD (Programmable Logic Device), or a combination thereof. The PLD may be a CPLD (Complex Programmable Logic Device), an FPGA (Field-Programmable Gate Array), a GAL (Generic Array Logic), or any combination thereof.

The function module 103 is used for realizing corresponding functions. In the case of different application scenarios in the embodiment of the present application, the functional module 103 may be a functional module on a different device. For example, in the case where the embodiment of the present application is applied to a vehicle voice interaction, the function module 103 may be an execution module of a vehicle-mounted function of a vehicle, so as to implement different vehicle-mounted functions through the execution module.

There may be a plurality of functional modules 103, so that different functions are implemented by different functional modules. Still taking voice interaction in a vehicle as an example, the functional module 103 may include an air conditioning fan of the vehicle, for turning the air conditioning on and off; the functional module 103 may further include a wiper controller of the vehicle, for turning the wipers on and off.

It should be understood by those skilled in the art that the above-mentioned voice interaction terminal 101, processor 102 and functional module 103 are merely examples; other voice interaction terminals, processors or functional modules that exist now or may appear in the future, where applicable to the present application, also fall within the scope of protection of the present application.

It should be noted that, the application scenario and the implementation environment described in the embodiment of the present application are for more clearly describing the technical solution of the embodiment of the present application, and do not constitute a limitation on the technical solution provided in the embodiment of the present application, and those skilled in the art can know that, with the appearance of a new application scenario and the evolution of the implementation environment, the technical solution provided in the embodiment of the present application is equally applicable to similar technical problems.

The voice interaction method provided by the embodiment of the application is explained in detail below.

Fig. 2 is a flowchart of a voice interaction method according to an embodiment of the present application, where the method is applied to the processor 102. Referring to fig. 2, the method includes the following steps.

Step 201: a first voice command is received.

The first voice command may be issued by the user. In some embodiments, the processor may be in an always-on state so as to receive the first voice command issued by the user in real time. In other embodiments, in order to reduce the power consumption of the processor, and considering that the user only intends to turn on a specific function by voice in specific scenarios, the working conditions under which voice commands are received may be limited: under a non-interactive condition the processor is in a sleep state and does not accept voice commands from the user, and under an interactive condition it receives them.

For example, the processor may monitor only the instruction for entering the interaction condition in the sleep state, and enter the voice interaction condition when receiving the opening instruction of the user (i.e., the instruction for entering the interaction condition), so as to start receiving the voice instruction sent by the user.

The opening instruction may be a key instruction or a voice instruction. For example, the voice interaction working condition may be entered by pressing a specific key, or by uttering speech containing a specific keyword (such as "voice assistant" or "start voice interaction"). The specific way of entering the voice interaction working condition may be selected based on actual use requirements, which is not limited in the embodiment of the present application.

In some embodiments, if no first voice command is received from the user after the opening instruction is received, and the time elapsed without receiving one is greater than or equal to a duration threshold, the processor re-enters the sleep state. The duration threshold may be determined based on actual use requirements, for example 3 seconds or 5 seconds.
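The sleep and wake behaviour described above can be sketched as a small state holder. This is only an illustration under stated assumptions: the class name, method names and the use of plain floating-point timestamps in seconds are hypothetical, and the application does not prescribe any particular implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoiceSession:
    """Tracks whether the processor is asleep or listening (hypothetical model)."""
    timeout_s: float = 5.0               # duration threshold, e.g. 3 or 5 seconds
    awake_since: Optional[float] = None  # None means the sleep state

    def on_opening_instruction(self, now: float) -> None:
        # A key press or a wake phrase enters the voice interaction condition.
        self.awake_since = now

    def accepts_instruction(self, now: float) -> bool:
        # Asleep: only the opening instruction is monitored.
        if self.awake_since is None:
            return False
        # No instruction for at least the threshold: re-enter the sleep state.
        if now - self.awake_since >= self.timeout_s:
            self.awake_since = None
            return False
        return True
```

A caller would pass a monotonic clock reading as `now`; passing the time explicitly keeps the sketch easy to test.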

Step 202: at least one first node keyword in the first voice instruction is identified.

In some embodiments, the first voice instruction may be converted into a text instruction, and then based on text analysis, semantic segmentation and feature extraction are performed on the text instruction, thereby determining at least one first node keyword in the voice instruction; in other embodiments, the at least one first node keyword in the first voice instruction may also be determined directly based on semantic analysis and feature extraction of the voice.

For example, the voice feature information in the first voice command may be obtained based on feature extraction, and then the node keywords matched with the voice feature information may be determined based on the feature matching model, so as to obtain at least one first node keyword in the first voice command.
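As a rough illustration of the text-based variant of step 202, the sketch below assumes the voice instruction has already been transcribed to text and replaces the feature matching model with plain substring matching. The function name and the example keywords are hypothetical and are not taken from the application.

```python
from typing import List, Set


def find_node_keywords(text: str, known_keywords: Set[str]) -> List[str]:
    """Return the known node keywords found in text, ordered by first appearance.

    Substring matching is a stand-in for the feature matching model.
    """
    lowered = text.lower()
    hits = []
    for keyword in known_keywords:
        position = lowered.find(keyword.lower())
        if position != -1:
            hits.append((position, keyword))
    # Sorting by position preserves the order in which keywords were spoken.
    return [keyword for _, keyword in sorted(hits)]
```

Keeping the order of appearance is useful later, because one of the selection strategies for multiple keywords depends on which keyword was spoken last.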

Step 203: based on the at least one first node keyword, determining a keyword function tree corresponding to each first node keyword from at least one keyword function tree included in the keyword function database.

Each keyword function tree comprises a plurality of bifurcation nodes, each bifurcation node corresponds to a node keyword and at least one vehicle-mounted function, and the node keywords corresponding to the bifurcation nodes are different.

In some embodiments, since the node keywords corresponding to the bifurcation nodes of each keyword function tree are different, the keyword function tree to which each bifurcation node corresponding to each first node keyword belongs may be determined by traversing the node keywords corresponding to each bifurcation node in the keyword function tree, so as to obtain the keyword function tree corresponding to each first node keyword.

The keyword function tree may be a tree structure as shown in fig. 3, where the letters in the figure may be understood as different node keywords and the numbers as different vehicle-mounted function combinations, each combination including at least one vehicle-mounted function. Whether a keyword function tree is the keyword function tree corresponding to a given first node keyword is then determined by traversing all bifurcation nodes in that tree.

For example, taking the keyword function tree shown in fig. 3 as an example, if the first node keyword in the first voice instruction is C, determining that the keyword function tree includes a bifurcation node corresponding to the node keyword C by traversing bifurcation nodes in the keyword function tree, and determining that the keyword function tree is a keyword function tree corresponding to the first node keyword C; if the first node keyword in the first voice instruction is Z, determining that the keyword function tree does not comprise a bifurcation node with the corresponding node keyword Z by traversing bifurcation nodes in the keyword function tree, and considering that the keyword function tree is not the keyword function tree corresponding to the first node keyword Z, and continuing traversing the next keyword function tree.

In some embodiments, for any first node keyword, in order to improve the determination efficiency of determining the keyword function tree corresponding to the first node keyword, a plurality of keyword function trees may be traversed at the same time, and if a certain keyword function tree includes a bifurcation node where the corresponding node keyword is the first node keyword, the traversing may be ended.
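The traversal in step 203 can be sketched as follows. The `Node` structure, the function names and the sequential loop over trees are assumptions made for illustration; the application also allows several trees to be traversed at the same time, which this sketch does not attempt.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A bifurcation node: one node keyword plus at least one function."""
    keyword: str
    functions: List[str]
    children: List["Node"] = field(default_factory=list)


def find_node(root: Node, keyword: str) -> Optional[Node]:
    """Depth-first search for the bifurcation node carrying the keyword."""
    stack = [root]
    while stack:
        node = stack.pop()
        if node.keyword == keyword:
            return node
        stack.extend(node.children)
    return None


def find_tree(trees: List[Node], keyword: str) -> Optional[int]:
    """Return the index of the first tree containing the keyword, else None."""
    for index, root in enumerate(trees):
        if find_node(root, keyword) is not None:
            return index  # stop as soon as a matching tree is found
    return None
```

Because node keywords are distinct, a keyword can match at most one bifurcation node, so stopping at the first match loses nothing.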

In some embodiments, after determining the keyword function tree corresponding to each first node keyword through the steps above, if the keyword function tree corresponding to the at least one first node keyword is the same, executing the subsequent steps; if the keyword function trees corresponding to the at least one first node keyword are different, the fact that the function intention indicated by the first voice instruction is unclear and the vehicle-mounted function cannot be executed is indicated, and the warning information needs to be sent and voice interaction is ended.

It should be noted that, in the case that the first voice command includes only one first node keyword, the keyword function tree corresponding to the at least one first node keyword may be considered to be the same.

Step 204: in the case that the keyword function trees corresponding to the at least one first node keyword are the same, a first target bifurcation node and at least one vehicle-mounted function corresponding to the first target bifurcation node are determined from the first keyword function tree based on the at least one first node keyword, where the first keyword function tree is the keyword function tree corresponding to the at least one first node keyword.
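The check that all first node keywords map to one tree can be expressed compactly. In this sketch, each keyword is assumed to have already been resolved to a tree identifier (or `None` when no tree contains it); the function name and the use of integer identifiers are illustrative.

```python
from typing import List, Optional


def common_tree(tree_ids: List[Optional[int]]) -> Optional[int]:
    """Return the shared tree identifier, or None when the keywords map to
    different trees, to no tree at all, or when there are no keywords."""
    unique = set(tree_ids)
    if len(unique) == 1 and None not in unique:
        return unique.pop()
    return None
```

A single keyword trivially yields one shared tree, which matches the note above. A `None` result corresponds to the case in which alarm information is sent and the voice interaction is ended.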

In some embodiments, in the case that only one first node keyword is included in the first voice instruction, a corresponding bifurcation node of the first node keyword in the first keyword function tree may be directly determined as the first target bifurcation node.

In some embodiments, in the case that the first voice command includes a plurality of first node keywords, the manner of determining the first target bifurcation node from those keywords may be chosen in combination with specific usage requirements.

For example, if there are a plurality of first node keywords in the first voice command, the last first node keyword may be determined to be a first target node keyword based on the occurrence sequence of the plurality of first node keywords in the first voice command, and further, based on the first keyword function tree, the bifurcation node corresponding to the first target node keyword may be determined to be a first target bifurcation node.

In another example, a priority may be set for each node keyword, where in the case that there are a plurality of first node keywords in the first voice instruction, a node keyword with a highest priority is determined to be a first target node keyword, and further, based on the first keyword function tree, a bifurcation node corresponding to the first target node keyword is determined to be a first target bifurcation node.
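The two selection strategies just described can be sketched side by side. The function names, the example keywords, and the convention that a larger number means a higher priority are assumptions for illustration only.

```python
from typing import Dict, List


def pick_last(keywords_in_order: List[str]) -> str:
    """Select the node keyword that appears last in the voice instruction."""
    return keywords_in_order[-1]


def pick_by_priority(keywords: List[str], priority: Dict[str, int]) -> str:
    """Select the node keyword with the highest configured priority.

    A larger number is assumed to mean a higher priority.
    """
    return max(keywords, key=lambda keyword: priority[keyword])
```

Either function yields the first target node keyword, whose bifurcation node in the first keyword function tree then becomes the first target bifurcation node.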

In some embodiments, a first bifurcation node corresponding to each first node keyword is determined from the first keyword function tree based on the at least one first node keyword, yielding at least one first bifurcation node; in the case that the at least one first bifurcation node is located on the same path in the first keyword function tree, the first bifurcation node with the largest depth is determined as the first target bifurcation node.

In some embodiments, the logical relationship between the on-board functions corresponding to different bifurcation nodes may be expressed by a tree structure of the keyword function tree, for example, the on-board function corresponding to the child bifurcation node is a lower-level function (or a finer-level function) of the on-board function corresponding to the parent bifurcation node.

For example, the vehicle-mounted function corresponding to a parent node is to turn off the air conditioner; the vehicle-mounted function corresponding to its child node is to turn off the air conditioner and open the driver's window by 1/4; the vehicle-mounted function corresponding to one child of that child node is to turn off the air conditioner and open all windows by 1/3; and the vehicle-mounted function corresponding to the other child of that child node is to turn off the air conditioner, open the driver's window by 1/4, and play music.

Therefore, when the logical relationship between the in-vehicle functions corresponding to the bifurcation nodes is expressed by the tree structure of the keyword function tree, in the case that the plurality of first bifurcation nodes are bifurcation nodes with different depths in the same path in the first keyword function tree, since the first bifurcation node with the largest depth generally has more abundant bearing information (the corresponding in-vehicle function is more and finer), the first bifurcation node with the largest depth can be determined as the first target bifurcation node.
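The deepest-node rule can be sketched as follows, assuming each bifurcation node stores a reference to its parent. The `Node` structure and the helper names are hypothetical. The same-path test relies on the fact that nodes lie on one root-to-leaf path exactly when, after sorting by depth, each node is an ancestor of the next.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)
class Node:
    keyword: str
    parent: Optional["Node"] = None


def depth(node: Node) -> int:
    """Number of edges between the node and the root of its tree."""
    count = 0
    while node.parent is not None:
        node = node.parent
        count += 1
    return count


def is_ancestor_or_self(upper: Node, lower: Optional[Node]) -> bool:
    """Walk up from lower and report whether upper is encountered."""
    while lower is not None:
        if lower is upper:
            return True
        lower = lower.parent
    return False


def pick_deepest_on_path(nodes: List[Node]) -> Optional[Node]:
    """Return the deepest node if all nodes share one path, else None."""
    ordered = sorted(nodes, key=depth)
    for upper, lower in zip(ordered, ordered[1:]):
        if not is_ancestor_or_self(upper, lower):
            return None  # different paths: the intention is unclear
    return ordered[-1] if ordered else None
```

Returning `ordered[0]` instead of `ordered[-1]` would give the smallest-depth variant.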

In other embodiments, the first bifurcation node with the smallest depth in the at least one first bifurcation node may also be determined as the first target bifurcation node based on actual usage requirements.

In some embodiments, in a case that the keyword function tree corresponding to the at least one first node keyword is different, or the at least one first bifurcation node is located on a different path in the first keyword function tree, alarm information is sent and voice interaction is ended.

It should be noted that, when the keyword function trees corresponding to the plurality of first node keywords in the first voice command are different, or the corresponding keyword function trees are the same but the bifurcation nodes corresponding to the plurality of node keywords are located on different paths in the keyword function tree, the vehicle functions corresponding to the plurality of bifurcation nodes are not generally logically associated at this time, so that it can be considered that the function execution intention corresponding to the voice command cannot be determined. In order to misjudge the execution intention of the function and execute the wrong vehicle-mounted function, the interaction experience of the user is further affected, at the moment, any vehicle-mounted function can not be executed, an alarm message is sent to prompt that the current voice command cannot be executed, and voice interaction is ended.

The alarm information can be set in combination with actual use requirements, such as 'I don't hear, please say again, 'currently unable to be executed, please resend voice', etc.

In some embodiments, in combination with the different receiving manners of the first voice command by the processor, if the processor enters the voice interaction working condition when receiving the opening command of the user, the processor may not end the voice interaction after sending the alarm information, and continuously monitor the voice command sent by the user until the voice command is not received within the duration threshold, and then enter the sleep state.

Step 205: executing at least one vehicle-mounted function corresponding to the first target bifurcation node.

In some embodiments, considering that vehicle-mounted functions are executed in a wide variety of scenarios, in order to avoid execution conflicts between vehicle-mounted functions, when executing the vehicle-mounted function corresponding to the first target bifurcation node, it is further required to determine whether there is an execution conflict between that vehicle-mounted function and a vehicle-mounted function currently executed by the vehicle. In a case where at least one vehicle-mounted function corresponding to the first target bifurcation node conflicts with a vehicle-mounted function currently executed by the vehicle, since the vehicle-mounted function corresponding to the first target bifurcation node reflects the function intention of the current real-time voice instruction, the currently executed vehicle-mounted function needs to be closed and the vehicle-mounted function corresponding to the first target bifurcation node is executed.
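The conflict handling described above can be sketched as follows. The application does not specify how execution conflicts are detected, so the pairwise table `CONFLICT_PAIRS`, the function names and the function identifiers are assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple

# Hypothetical table of function pairs that cannot run at the same time.
CONFLICT_PAIRS = {
    frozenset({"ac_cooling", "ac_heating"}),
    frozenset({"sunroof_open", "ac_cooling"}),
}


def in_conflict(a: str, b: str) -> bool:
    """True if the two functions are listed as mutually exclusive."""
    return frozenset({a, b}) in CONFLICT_PAIRS


def execute_target(target_functions: Iterable[str],
                   running: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Close running functions that conflict with the target, then start the target.

    Returns (functions running afterwards, functions that were closed).
    """
    targets = set(target_functions)
    closed = {r for r in running if any(in_conflict(r, t) for t in targets)}
    now_running = (set(running) - closed) | targets
    return now_running, closed
```

The newly requested functions take priority because they reflect the current real-time voice instruction.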

In some embodiments, a second voice instruction may also be received; identifying at least one second node keyword in the second voice instruction; determining a keyword function tree corresponding to each second node keyword from the at least one keyword function tree based on the at least one second node keyword; determining a second target bifurcation node and at least one vehicle-mounted function corresponding to the second target bifurcation node from a second keyword function tree based on the at least one second node keyword under the condition that the keyword function trees corresponding to the at least one second node keyword are the same, wherein the second keyword function tree is the keyword function tree corresponding to the at least one second node keyword; if the first keyword function tree is the same as the second keyword function tree, and the second target bifurcation node and the first target bifurcation node are located on the same path, or the second target bifurcation node and the first target bifurcation node belong to the same father node, closing at least one vehicle-mounted function corresponding to the first target bifurcation node, and executing at least one vehicle-mounted function corresponding to the second target bifurcation node.

It should be noted that, for the manner of recognizing at least one second node keyword in the second voice instruction, reference may be made to the manner of recognizing at least one first node keyword in the first voice instruction in step 202. Similarly, for the manner of determining the keyword function tree corresponding to each second node keyword based on the second node keywords, reference may be made to the manner of determining the keyword function tree corresponding to each first node keyword based on the first node keywords in step 203. For the manner of determining the second target bifurcation node from the second keyword function tree based on the second node keywords, reference may also be made to the corresponding description of step 204, which is not repeated herein.

In some embodiments, for the second node keywords in the second voice instruction, it is still necessary to ensure that all the second node keywords correspond to the same keyword function tree, and that the bifurcation nodes corresponding to all the second node keywords are located on the same path in the second keyword function tree. If a certain second node keyword corresponds to a different keyword function tree, or the bifurcation node corresponding to a certain second node keyword is located on a different path of the keyword function tree, the function intention indicated by the current second voice instruction is unclear and the vehicle-mounted function cannot be executed, so the alarm information needs to be sent and voice interaction is ended.

It should be noted that, when the second target bifurcation node and the first target bifurcation node are located on the same path or belong to the same parent node, the two nodes have a strong logical association in the keyword function tree, so the function intention of the second voice instruction can be considered to be: switching from the currently executed vehicle-mounted function corresponding to the first target bifurcation node to the vehicle-mounted function corresponding to the second target bifurcation node, that is, closing the vehicle-mounted function corresponding to the first target bifurcation node and executing the vehicle-mounted function corresponding to the second target bifurcation node.
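The positional test between the two target bifurcation nodes can be sketched as below, again assuming parent links between nodes. `Node`, `chain_to_root`, `is_switch` and the example keywords are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)
class Node:
    """Hypothetical bifurcation node with a parent link."""
    keyword: str
    parent: Optional["Node"] = None


def chain_to_root(node: Node) -> List[Node]:
    """Return the node followed by all of its ancestors."""
    chain = [node]
    while chain[-1].parent is not None:
        chain.append(chain[-1].parent)
    return chain


def is_switch(first: Node, second: Node) -> bool:
    """True if the second instruction should replace the first one's functions.

    Both conditions imply that the two nodes are in the same keyword function tree.
    """
    same_path = first in chain_to_root(second) or second in chain_to_root(first)
    same_parent = first.parent is not None and first.parent is second.parent
    return same_path or same_parent
```

When `is_switch` returns `True`, the functions of the first target bifurcation node are closed and those of the second are executed; otherwise a separate execution conflict check is needed.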

In some embodiments, the at least one vehicle-mounted function corresponding to the first target bifurcation node may also be a function started in a conventional manner, such as a function started based on a key instruction and a conventional voice instruction. If the currently started vehicle-mounted function of the vehicle is at least one vehicle-mounted function corresponding to a third bifurcation node in the keyword function tree, the third bifurcation node can be considered to be the first target bifurcation node, and further, if the third bifurcation node and the second target bifurcation node are located on the same path of the same keyword function tree or belong to the same father node, the at least one vehicle-mounted function corresponding to the third bifurcation node can be closed, and the at least one vehicle-mounted function corresponding to the second target bifurcation node can be executed.

Illustratively, the vehicle-mounted functions that the vehicle has currently turned on include A1, A2, A3, A4, A5 and A6. By traversing the keyword function database, it is determined that the vehicle-mounted functions corresponding to one bifurcation node in a certain keyword function tree are A3 and A6, and this bifurcation node is determined as the third bifurcation node. If a second voice instruction is then received, and the second target bifurcation node determined based on the second voice instruction and the third bifurcation node are located on the same path of the same keyword function tree or belong to the same parent node, the at least one vehicle-mounted function corresponding to the third bifurcation node is closed (that is, the vehicle-mounted functions A3 and A6 are closed), and the at least one vehicle-mounted function corresponding to the second target bifurcation node is executed.
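The example above can be sketched as follows. The matching rule (a node qualifies when all of its functions are currently turned on), the choice of the first match, and the names `find_third_node`, `switch_functions`, `node_y` and `B1` are assumptions made for illustration.

```python
from typing import Dict, Optional, Set


def find_third_node(node_functions: Dict[str, Set[str]],
                    running: Set[str]) -> Optional[str]:
    """Traverse the database and return the first node whose functions are all running."""
    for node_id, functions in node_functions.items():
        if functions and functions <= running:
            return node_id
    return None


def switch_functions(running: Set[str],
                     third_functions: Set[str],
                     second_functions: Set[str]) -> Set[str]:
    """Close the third node's functions and start the second target node's functions."""
    return (running - third_functions) | second_functions


# Usage with the values from the example above; node ids and "B1" are hypothetical.
running = {"A1", "A2", "A3", "A4", "A5", "A6"}
database = {"node_x": {"A7"}, "node_y": {"A3", "A6"}}
third = find_third_node(database, running)
after = switch_functions(running, database[third], {"B1"})
```

In a full implementation the switch would be applied only after confirming that the third bifurcation node and the second target bifurcation node are on the same path or share a parent node.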

In some embodiments, if the first keyword function tree is different from the second keyword function tree, or if the first keyword function tree is the same as the second keyword function tree but the second target bifurcation node is located on a different path from the first target bifurcation node and does not belong to the same parent node as the first target bifurcation node, it is determined whether there is an execution conflict between at least one vehicle-mounted function corresponding to the second target bifurcation node and at least one vehicle-mounted function corresponding to the first target bifurcation node. If there is no execution conflict, the at least one vehicle-mounted function corresponding to the second target bifurcation node is executed. If there is an execution conflict, the at least one vehicle-mounted function corresponding to the first target bifurcation node is closed, and the at least one vehicle-mounted function corresponding to the second target bifurcation node is executed.

It should be noted that, when the first keyword function tree is different from the second keyword function tree, or the first keyword function tree is the same as the second keyword function tree, but the second target bifurcation node and the first target bifurcation node are neither located on the same path nor belong to the same parent node, the second target bifurcation node and the first target bifurcation node generally have no logical association or have weaker logical association at this time, so the function intention of the second voice instruction may be considered as: and executing the vehicle-mounted function corresponding to the second target bifurcation node.

At this time, in order to avoid execution conflict of the vehicle-mounted functions, it is necessary to further determine whether the vehicle-mounted function corresponding to the second target bifurcation node conflicts with the vehicle-mounted function corresponding to the first target bifurcation node, and if there is no execution conflict, the vehicle-mounted function corresponding to the second target bifurcation node may be directly executed; if there is an execution conflict, since the vehicle-mounted function corresponding to the second target bifurcation node is the function intention of the current real-time voice command, that is, the latest function intention, it is necessary to close the executed vehicle-mounted function and execute the vehicle-mounted function corresponding to the second target bifurcation node.

In some embodiments, if there are multiple vehicle-mounted functions corresponding to the first target bifurcation node, in the case that there is an execution conflict between a certain vehicle-mounted function of the multiple vehicle-mounted functions and a vehicle-mounted function corresponding to the second target bifurcation node, only the certain vehicle-mounted function may be closed, or all the vehicle-mounted functions corresponding to the first target bifurcation node may be closed, which may be specifically and flexibly set in combination with actual use requirements.
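The two closing policies can be sketched with a single flag. The pairwise conflict table, the function `functions_to_close`, the `close_all` parameter and the function identifiers are hypothetical.

```python
from typing import FrozenSet, Iterable, Set


def functions_to_close(first_functions: Iterable[str],
                       second_functions: Iterable[str],
                       conflict_pairs: Set[FrozenSet[str]],
                       close_all: bool = False) -> Set[str]:
    """Decide which functions of the first target node to close.

    close_all=False closes only the functions that actually conflict;
    close_all=True closes every function of the first node once any conflict exists.
    """
    first = set(first_functions)
    second = set(second_functions)
    clashing = {f for f in first
                if any(frozenset({f, s}) in conflict_pairs for s in second)}
    if not clashing:
        return set()  # no execution conflict: nothing needs to be closed
    return first if close_all else clashing
```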

In some embodiments, when executing at least one vehicle-mounted function corresponding to the second target bifurcation node, it is further required to determine whether at least one vehicle-mounted function corresponding to the second target bifurcation node conflicts with all vehicle-mounted functions currently executed by the vehicle, if a certain vehicle-mounted function corresponding to the second target bifurcation node conflicts with a certain vehicle-mounted function currently executed by the vehicle, the vehicle-mounted function currently executed by the vehicle is closed, and the vehicle-mounted function corresponding to the second target bifurcation node is opened.

In some embodiments, a function setting interface may be displayed, where the function setting interface is configured to prompt a user to set a node keyword corresponding to each bifurcation node in the at least one keyword function tree and a vehicle-mounted function; and acquiring node keywords and vehicle-mounted functions corresponding to each bifurcation node in the at least one keyword function tree from the function setting interface.

In order to ensure that the vehicle-mounted function corresponding to the bifurcation node can be accurately executed, when there are a plurality of vehicle-mounted functions corresponding to a bifurcation node, it is necessary to ensure that there is no execution conflict among the plurality of vehicle-mounted functions corresponding to the bifurcation node.

Therefore, in some embodiments, when the user sets the vehicle-mounted function corresponding to each bifurcation node in the keyword function tree, collision detection may be performed on the vehicle-mounted function corresponding to the bifurcation node, and if there is an execution collision between a plurality of vehicle-mounted functions corresponding to a bifurcation node currently set by the user, alarm information is sent to prompt the user to modify the vehicle-mounted function corresponding to the bifurcation node and having the execution collision.

Similarly, in some embodiments, in order to avoid that the repeated use of the node keywords results in that the function execution intention of the voice instruction cannot be accurately identified, when the user sets the node keywords corresponding to each bifurcation node in the keyword function tree, conflict detection may be performed on the node keywords, and if the node keywords currently set by the user are repeated with any node keyword in the keyword function tree, alarm information is sent to prompt the user to modify the node keywords.
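Both setting-time checks can be sketched in one validation routine. The pairwise conflict table, the function `validate_node` and the message texts are assumptions; the application only states that alarm information is sent.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set


def validate_node(keyword: str,
                  functions: Iterable[str],
                  existing_keywords: Set[str],
                  conflict_pairs: Set[FrozenSet[str]]) -> List[str]:
    """Return alarm messages for a node the user is setting (empty list means valid)."""
    problems = []
    # A node keyword must not repeat any keyword already used in the tree.
    if keyword in existing_keywords:
        problems.append(f"duplicate keyword: {keyword}")
    # Functions attached to one node must not conflict with each other.
    for a, b in combinations(sorted(set(functions)), 2):
        if frozenset({a, b}) in conflict_pairs:
            problems.append(f"execution conflict: {a} / {b}")
    return problems
```

A non-empty result would be shown on the function setting interface so that the user can modify the node keyword or the conflicting functions.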

In some embodiments, the user may also modify, through the function setting interface, a node keyword and a vehicle-mounted function corresponding to any bifurcation node in the keyword function tree, e.g., delete a bifurcation node in the keyword function tree, move a bifurcation node in the keyword function tree, modify a node keyword and/or a vehicle-mounted function corresponding to a bifurcation node in the keyword function tree, etc.

In the embodiment of the application, when the first voice instruction is received, the first node keyword in the first voice instruction is identified, and the first target bifurcation node and the vehicle-mounted function corresponding to the first target bifurcation node are determined from the keyword function tree included in the keyword function database based on the first node keyword, so that the vehicle-mounted function is executed. The function can therefore be executed based on a single voice interaction rather than multiple voice interactions, which improves voice interaction efficiency when executing the vehicle-mounted function, and improves the user experience and driving safety. In addition, considering the randomness of voice instructions, in order to ensure that the corresponding vehicle-mounted function can be accurately executed based on the first voice instruction, when a plurality of first node keywords are present in the first voice instruction, if the plurality of first node keywords correspond to different keyword function trees, or the first bifurcation nodes corresponding to the plurality of first node keywords are located on different paths of the keyword function tree, the plurality of first node keywords have no logical relationship and the first target bifurcation node is no longer determined. This avoids degrading the user experience through erroneous execution of a vehicle-mounted function, and thus further improves the user experience.

In addition, considering that in different usage scenarios the user's function execution intention may change and a voice instruction (i.e., a second voice instruction) may be sent again, in the embodiment of the present application, when the second voice instruction is received, a second target bifurcation node and the vehicle-mounted function corresponding to the second target bifurcation node may be determined from a keyword function tree included in the keyword function database through the second node keyword in the second voice instruction, and the vehicle-mounted function corresponding to the second target bifurcation node is executed. Moreover, considering that a certain logical relationship may exist between the second voice instruction and the first voice instruction, and that an execution conflict may exist between the vehicle-mounted functions corresponding to their function execution intentions, whether a logical relationship exists between the first target bifurcation node and the second target bifurcation node is determined from the positional relationship between the two nodes in the keyword function tree, and whether an execution conflict exists between the vehicle-mounted function corresponding to the first target bifurcation node and that corresponding to the second target bifurcation node is determined based on conflict analysis. The execution mode of the vehicle-mounted function corresponding to the second target bifurcation node is then determined based on this logical relationship and this execution conflict relationship, so that the influence of changes in the voice interaction environment on vehicle-mounted function execution is fully considered when the function intention corresponding to the voice instruction is executed, further improving voice interaction efficiency and user experience when the vehicle-mounted function is executed.

Fig. 4 is a schematic structural diagram of a voice interaction device according to an embodiment of the present application, where the voice interaction device may be implemented by software, hardware, or a combination of both as part or all of a voice interaction device, and the voice interaction device may be a processor shown in fig. 1. Referring to fig. 4, the apparatus includes: a voice receiving module 401, a keyword recognition module 402, a function tree determination module 403, a node function determination module 404, and a function execution module 405.

A voice receiving module 401, configured to receive a first voice instruction;

A keyword recognition module 402, configured to recognize at least one first node keyword in the first voice instruction;

A function tree determining module 403, configured to determine, based on the at least one first node keyword, a keyword function tree corresponding to each first node keyword from at least one keyword function tree included in the keyword function database;

A node function determining module 404, configured to determine, in a case where the keyword function trees corresponding to the at least one first node keyword are the same, a first target bifurcation node and at least one vehicle-mounted function corresponding to the first target bifurcation node from a first keyword function tree based on the at least one first node keyword, where the first keyword function tree is the keyword function tree corresponding to the at least one first node keyword;

A function execution module 405, configured to execute at least one vehicle-mounted function corresponding to the first target bifurcation node.

Optionally, the node function determining module 404 is specifically configured to:

Optionally, the apparatus further comprises: an alarm module 406;

The alarm module 406 is configured to send alarm information and end voice interaction when the keyword function tree corresponding to the at least one first node keyword is different, or the at least one first bifurcation node is located on a different path in the first keyword function tree.

Optionally, the voice receiving module 401 is further configured to receive a second voice instruction;

The keyword recognition module 402 is further configured to recognize at least one second node keyword in the second voice instruction;

The function tree determining module 403 is further configured to determine, from the at least one keyword function tree, a keyword function tree corresponding to each second node keyword based on the at least one second node keyword;

The node function determining module 404 is further configured to determine, in a case where the keyword function trees corresponding to the at least one second node keyword are the same, a second target bifurcation node and at least one vehicle-mounted function corresponding to the second target bifurcation node from a second keyword function tree based on the at least one second node keyword, where the second keyword function tree is the keyword function tree corresponding to the at least one second node keyword;

The function execution module 405 is further configured to close at least one vehicle-mounted function corresponding to the first target bifurcation node and execute at least one vehicle-mounted function corresponding to the second target bifurcation node if the first keyword function tree is the same as the second keyword function tree, and the second target bifurcation node and the first target bifurcation node are located on the same path, or the second target bifurcation node and the first target bifurcation node belong to the same parent node.

Optionally, the function execution module 405 is specifically configured to:

If the at least one vehicle-mounted function corresponding to the second target bifurcation node does not have execution conflict with the at least one vehicle-mounted function corresponding to the first target bifurcation node, executing the at least one vehicle-mounted function corresponding to the second target bifurcation node;

Optionally, the apparatus further comprises: a function setting module 407, the function setting module 407 being configured to:

It should be noted that, when the voice interaction device provided in the above embodiment implements voice interaction, the division into the above functional modules is used only for illustration. In practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the voice interaction device provided in the above embodiment and the voice interaction method embodiment belong to the same concept; its specific implementation process is detailed in the method embodiment and is not described herein again.
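As a non-limiting sketch, the division into modules 401 to 405 could be mirrored in software as one class with one method per module. Speech recognition is outside the scope of this sketch, so the instruction is assumed to arrive as a text transcript and keywords are found by substring matching. Choosing the deepest matched node as the target is also an assumption. All class, method and keyword names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(eq=False)
class Node:
    keyword: str
    functions: List[str] = field(default_factory=list)
    parent: Optional["Node"] = None


def chain_to_root(node: Node) -> List[Node]:
    chain = [node]
    while chain[-1].parent is not None:
        chain.append(chain[-1].parent)
    return chain


class VoiceInteractionDevice:
    def __init__(self, nodes: List[Node]):
        self.index = {n.keyword: n for n in nodes}  # stands in for the keyword function database
        self.executed: List[str] = []

    def receive(self, transcript: str) -> str:           # voice receiving module 401
        return transcript.lower()

    def recognize(self, text: str) -> List[str]:         # keyword recognition module 402
        return [k for k in self.index if k in text]

    def trees(self, keywords: List[str]) -> Set[int]:    # function tree determining module 403
        return {id(chain_to_root(self.index[k])[-1]) for k in keywords}

    def target(self, keywords: List[str]) -> Optional[Node]:  # node function determining module 404
        nodes = sorted((self.index[k] for k in keywords),
                       key=lambda n: len(chain_to_root(n)))
        on_one_path = all(a in chain_to_root(b) for a, b in zip(nodes, nodes[1:]))
        return nodes[-1] if nodes and on_one_path else None

    def handle(self, transcript: str) -> Optional[Node]:  # function execution module 405
        keywords = self.recognize(self.receive(transcript))
        if len(self.trees(keywords)) != 1:
            return None  # different trees or no keyword: alarm case
        node = self.target(keywords)
        if node is not None:
            self.executed.extend(node.functions)
        return node
```

A return value of `None` corresponds to the case in which alarm information would be sent by the optional alarm module 406.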

Fig. 5 is a block diagram of a vehicle 500 according to an embodiment of the present application.

In general, the vehicle 500 includes: a processor 501 and a memory 502.

Processor 501 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like. The processor 501 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 501 may also include a main processor and a coprocessor, where the main processor is a processor for processing data in an awake state, also called a CPU (Central Processing Unit), and the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 501 may integrate a GPU (Graphics Processing Unit) for rendering and drawing the content to be displayed by the display screen. In some embodiments, the processor 501 may also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.

Memory 502 may include one or more computer-readable storage media, which may be non-transitory. Memory 502 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 502 is used to store at least one instruction for execution by processor 501 to implement the voice interaction method provided by the method embodiments of the present application.

In some embodiments, vehicle 500 may optionally further include: a peripheral interface 503 and at least one peripheral. The processor 501, memory 502, and peripheral interface 503 may be connected by buses or signal lines. The individual peripheral devices may be connected to the peripheral device interface 503 by buses, signal lines or circuit boards. Specifically, the peripheral device includes: at least one of radio frequency circuitry 504, a display 505, a camera assembly 506, audio circuitry 507, a positioning assembly 508, and a power supply 509.

Peripheral interface 503 may be used to connect at least one Input/Output (I/O) related peripheral to processor 501 and memory 502. In some embodiments, processor 501, memory 502, and peripheral interface 503 are integrated on the same chip or circuit board; in some other embodiments, either or both of the processor 501, memory 502, and peripheral interface 503 may be implemented on separate chips or circuit boards, which is not limited in this embodiment.

The radio frequency circuit 504 is configured to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. The radio frequency circuit 504 communicates with a communication network and other communication devices via electromagnetic signals. The radio frequency circuit 504 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 504 includes: antenna systems, RF transceivers, one or more amplifiers, tuners, oscillators, digital signal processors, codec chipsets, subscriber identity module cards, and so forth. The radio frequency circuit 504 may communicate with other computer devices via at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to: the World Wide Web, metropolitan area networks, intranets, mobile communication networks of each generation (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 504 may further include NFC (Near Field Communication) related circuits, which is not limited in the embodiments of the present application.

The display 505 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display 505 is a touch display, the display 505 also has the ability to collect touch signals at or above the surface of the display 505. The touch signal may be input as a control signal to the processor 501 for processing. At this time, the display 505 may also be used to provide virtual buttons and/or virtual keyboards, also referred to as soft buttons and/or soft keyboards.

The camera assembly 506 is used to capture images or video. Optionally, the camera assembly 506 includes a front camera and a rear camera. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting and Virtual Reality (VR) shooting functions or other fused shooting functions. In some embodiments, camera assembly 506 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation at different color temperatures.

The audio circuitry 507 may include a microphone and a speaker. The microphone is used for collecting sound waves of users and environments, converting the sound waves into electric signals, and inputting the electric signals to the processor 501 for processing, or inputting the electric signals to the radio frequency circuit 504 for voice communication. For purposes of stereo acquisition or noise reduction, the microphones may be provided in a plurality of different portions of the vehicle 500. The microphone may also be an array microphone or an omni-directional pickup microphone. The speaker is used to convert electrical signals from the processor 501 or the radio frequency circuit 504 into sound waves. The speaker may be a conventional thin film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, not only the electric signal can be converted into a sound wave audible to humans, but also the electric signal can be converted into a sound wave inaudible to humans for ranging and other purposes. In some embodiments, audio circuitry 507 may also include a headphone jack.

The positioning component 508 is used to locate the current geographic location of the vehicle 500 for navigation or LBS (Location Based Service). The positioning component 508 may be a positioning component based on GPS (Global Positioning System), the BeiDou system or the Galileo system.

The power supply 509 is used to power the various components in the vehicle 500. The power supply 509 may be an alternating current, a direct current, a disposable battery, or a rechargeable battery. When the power supply 509 comprises a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.

Those skilled in the art will appreciate that the configuration shown in fig. 5 is not limiting of the vehicle 500 and may include more or fewer components than shown, or may combine certain components, or may employ a different arrangement of components.

In some embodiments, there is also provided a computer readable storage medium having stored therein a computer program which, when executed by a processor, implements the steps of the voice interaction method of the above embodiments. For example, the computer readable storage medium may be ROM, RAM, CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.

It is noted that the computer readable storage medium mentioned in the embodiments of the present application may be a non-volatile storage medium, in other words, may be a non-transitory storage medium.

It should be understood that all or part of the steps of the above-described embodiments may be implemented by software, hardware, firmware, or any combination thereof. When implemented in software, the steps may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The computer instructions may be stored in the computer-readable storage medium described above.

That is, in some embodiments, there is also provided a computer program product containing instructions that, when run on a computer, cause the computer to perform the steps of the voice interaction method described above.

It should be understood that references herein to "at least one" mean one or more, and "a plurality" means two or more. In the description of the embodiments of the present application, unless otherwise indicated, "/" means "or"; for example, A/B may represent A or B. "And/or" herein merely describes an association relationship between associated objects and means that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist together, or B exists alone. In addition, to facilitate a clear description of the technical solution of the embodiments of the present application, the words "first", "second", etc. are used to distinguish identical or similar items having substantially the same function and effect. Those skilled in the art will appreciate that the words "first", "second", and the like do not limit quantity or order of execution, and that items labeled "first" and "second" are not necessarily different.

It should be noted that, the information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data for analysis, stored data, presented data, etc.), and signals related to the embodiments of the present application are all authorized by the user or are fully authorized by the parties, and the collection, use, and processing of the related data is required to comply with the relevant laws and regulations and standards of the relevant countries and regions.

The above embodiments are not intended to limit the present application, and any modifications, equivalent substitutions, improvements, etc. within the spirit and principle of the present application should be included in the scope of the present application.

Claims

1. A method of voice interaction, the method comprising:

Receiving a first voice instruction;

identifying at least one first node keyword in the first voice instruction;

2. The method of claim 1, wherein the determining a first target bifurcation node from a first keyword function tree based on the at least one first node keyword comprises:

3. The method of claim 2, wherein the method further comprises:

4. A method according to any one of claims 1 to 3, wherein after said executing at least one in-vehicle function corresponding to said first target bifurcation node, the method further comprises:

Receiving a second voice instruction;

Identifying at least one second node keyword in the second voice instruction;

5. The method of claim 4, wherein the method further comprises:

6. A method according to any one of claims 1 to 3, wherein prior to said receiving the first voice instruction, the method further comprises:

7. A voice interaction device, the device comprising:

the voice receiving module is used for receiving a first voice instruction;

8. The apparatus of claim 7, wherein the node function determination module is specifically configured to:

9. A vehicle comprising a memory for storing a computer program and a processor for executing the computer program stored on the memory to implement the steps of the method of any of the preceding claims 1-6.

10. A computer-readable storage medium, characterized in that the storage medium has stored therein a computer program which, when executed by a processor, implements the steps of the method of any of claims 1-6.