CN117012186A - Voice control function generation method, device and equipment

Info

Publication number
CN117012186A
Authority
CN
China
Prior art keywords
information
target
voice control
determining
function
Prior art date
Legal status
Pending
Application number
CN202311052663.3A
Other languages
Chinese (zh)
Inventor
张金洋 (Zhang Jinyang)
徐键 (Xu Jian)
郭亚玲 (Guo Yaling)
Current Assignee
Zhejiang Geely Holding Group Co Ltd
Zhejiang Remote Commercial Vehicle R&D Co Ltd
Zhejiang Geely Remote New Energy Commercial Vehicle Group Co Ltd
Original Assignee
Zhejiang Geely Holding Group Co Ltd
Zhejiang Remote Commercial Vehicle R&D Co Ltd
Zhejiang Geely Remote New Energy Commercial Vehicle Group Co Ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Geely Holding Group Co Ltd, Zhejiang Remote Commercial Vehicle R&D Co Ltd, Zhejiang Geely Remote New Energy Commercial Vehicle Group Co Ltd filed Critical Zhejiang Geely Holding Group Co Ltd
Priority to CN202311052663.3A
Publication of CN117012186A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G10L15/18 - Speech classification or search using natural language modelling
    • G10L15/1822 - Parsing for meaning understanding
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B60 - VEHICLES IN GENERAL
    • B60R - VEHICLES, VEHICLE FITTINGS, OR VEHICLE PARTS, NOT OTHERWISE PROVIDED FOR
    • B60R16/00 - Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; Arrangement of elements of electric or fluid circuits specially adapted for vehicles and not otherwise provided for
    • B60R16/02 - Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; Arrangement of elements of electric or fluid circuits specially adapted for vehicles and not otherwise provided for electric constitutive elements
    • B60R16/037 - Electric or fluid circuits specially adapted for vehicles and not otherwise provided for; Arrangement of elements of electric or fluid circuits specially adapted for vehicles and not otherwise provided for electric constitutive elements for occupant comfort, e.g. for automatic adjustment of appliances according to personal settings, e.g. seats, mirrors, steering wheel
    • B60R16/0373 - Voice control
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 - Execution procedure of a spoken command
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 - Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 - Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Mechanical Engineering (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

An embodiment of the application provides a method, a device and equipment for generating a voice control function. The method comprises the following steps: acquiring function description information of a voice control function to be added in a voice control program; performing semantic recognition processing on the function description information to obtain target intention information and target reply information; performing object recognition processing on the function description information to determine an operated object and a target operation type for operating the operated object; determining a target execution instruction and a target interaction interface according to the operated object and the target operation type; and generating target configuration information corresponding to the voice control function according to the target intention information, the target reply information, the target execution instruction and the target interaction interface, and adding the target configuration information into a configuration file of the voice control program. The method of the application makes the generation process of the voice control function more flexible.

Description

Voice control function generation method, device and equipment
Technical Field
The embodiment of the application relates to the technical field of voice recognition, in particular to a method, a device and equipment for generating a voice control function.
Background
To improve the user's driving experience, an automobile provider may configure a voice control program (e.g., a voice assistant) on the vehicle head unit so that the user can control some functions of the vehicle (e.g., a navigation function, a multimedia function, an air-conditioning function, etc.) through voice interaction.
In the related art, an automobile provider may configure the voice control program of the vehicle head unit through a finished Android application package (APK) provided by a voice provider.
However, in the above process, using the finished APK package solidifies the voice control functions of the voice control program, resulting in low flexibility when generating voice control functions.
Disclosure of Invention
The embodiment of the application provides a method, a device and equipment for generating a voice control function, which are used to solve the problem of low flexibility in generating voice control functions caused by the solidification of the voice control functions of a voice control program when a finished APK software package is adopted.
In a first aspect, an embodiment of the present application provides a method for generating a voice control function, including:
acquiring function description information of a voice control function to be added in a voice control program;
Carrying out semantic recognition processing on the function description information to obtain target intention information and target reply information;
performing object recognition processing on the function description information to determine an operated object and a target operation type for operating the operated object;
determining a target execution instruction and a target interaction interface according to the operated object and the target operation type;
and generating target configuration information corresponding to the voice control function according to the target intention information, the target reply information, the target execution instruction and the target interactive interface, and adding the target configuration information into a configuration file of the voice control program.
In one possible implementation manner, the semantic recognition processing is performed on the function description information to obtain target intention information and target reply information, including:
carrying out semantic recognition processing on the function description information to obtain at least one information group to be selected, wherein the information group to be selected comprises intention information to be selected and reply information to be selected;
displaying the at least one information group to be selected and operation controls corresponding to each information group to be selected, wherein the operation controls comprise a confirmation control and a modification control;
And determining the target intention information and the target reply information in response to the operation input to the operation control corresponding to at least one information group to be selected.
In one possible implementation manner, the determining the target intention information and the target reply information in response to the operation of the operation control input corresponding to at least one information group to be selected includes:
in response to a selection operation input to a confirmation control corresponding to a first information group, determining the intention information to be selected in the first information group as the target intention information, and determining the reply information to be selected in the first information group as the target reply information;
wherein the at least one set of information to be selected comprises the first set of information to be selected.
In one possible implementation manner, the determining the target intention information and the target reply information in response to the operation of the operation control input corresponding to at least one information group to be selected includes:
responding to a selection operation input to a modification control corresponding to a second information group to be selected, and displaying a modification interface corresponding to the second information group to be selected, wherein at least one information group to be selected comprises the second information group to be selected;
Responding to the modification operation input in the modification interface, and acquiring an updated second information group to be selected;
and responding to the selected operation input to the confirmation control corresponding to the updated second candidate information group, determining the candidate intention information in the updated second candidate information group as the target intention information, and determining the candidate reply information in the updated second candidate information group as the target reply information.
In one possible implementation manner, performing object recognition processing on the function description information to determine an operated object and a target operation type for operating the operated object includes:
performing word segmentation processing on the function description information to obtain a plurality of segmented words;
determining the vocabulary type corresponding to each word segmentation;
determining word segmentation with the vocabulary type being a first preset vocabulary type as the operated object;
and determining the word segmentation with the vocabulary type being the second preset vocabulary type as the target operation type.
In one possible implementation manner, determining a target execution instruction and a target interaction interface according to the operated object and the target operation type includes:
acquiring a first corresponding relation corresponding to the operated object, wherein the first corresponding relation comprises a plurality of execution instructions and operation types corresponding to each execution instruction;
Determining the target execution instruction according to the target operation type and the first corresponding relation;
and determining the target interaction interface in a plurality of interaction interfaces corresponding to the operated object according to the target execution instruction.
In one possible implementation manner, generating the target configuration information corresponding to the voice control function according to the target intention information, the target reply information, the target execution instruction and the target interaction interface includes:
acquiring initial configuration information, wherein the initial configuration information comprises a plurality of filling areas;
determining filling areas corresponding to the function description information, the target intention information, the target reply information, the target execution instruction and the target interaction interface in the plurality of filling areas;
and filling the function description information, the target intention information, the target reply information, the target execution instruction and the target interaction interface in the corresponding filling area in the initial configuration information to obtain the target configuration information.
In one possible implementation manner, obtaining function description information of a voice control function to be added in a voice control program includes:
Displaying a configuration page, wherein the configuration page comprises a newly added control;
responding to the operation of the input of the new control, and displaying a new input box;
and determining the information input by the user in the newly added input box as the function description information.
In one possible embodiment, the method further comprises:
displaying a configuration page, wherein the configuration page comprises a first update control;
responding to the operation input to the first updating control, and displaying the identification of the existing voice control functions and the second updating control corresponding to each existing voice control function;
responding to the operation of inputting the second updating control corresponding to a first voice control function, displaying an updating page corresponding to the first voice control function, wherein the existing voice control function comprises the first voice control function;
and acquiring updating information input by a user on the updating page, and updating the first voice control function according to the updating information.
In a second aspect, an embodiment of the present application provides a device for generating a voice control function, where the device includes:
the acquisition module is used for acquiring function description information of a voice control function to be added in the voice control program;
The semantic identification module is used for carrying out semantic identification processing on the function description information to obtain target intention information and target reply information;
the object identification module is used for carrying out object identification processing on the function description information so as to determine an operated object and a target operation type for operating the operated object;
the determining module is used for determining a target execution instruction and a target interaction interface according to the operated object and the target operation type;
the generating module is used for generating target configuration information corresponding to the voice control function according to the target intention information, the target reply information, the target execution instruction and the target interaction interface, and adding the target configuration information into a configuration file of the voice control program.
In one possible implementation manner, the semantic recognition module is specifically configured to:
carrying out semantic recognition processing on the function description information to obtain at least one information group to be selected, wherein the information group to be selected comprises intention information to be selected and reply information to be selected;
displaying the at least one information group to be selected and operation controls corresponding to each information group to be selected, wherein the operation controls comprise a confirmation control and a modification control;
And determining the target intention information and the target reply information in response to the operation input to the operation control corresponding to at least one information group to be selected.
In a possible implementation manner, the semantic recognition module is specifically further configured to:
in response to a selection operation input to a confirmation control corresponding to a first information group, determining the intention information to be selected in the first information group as the target intention information, and determining the reply information to be selected in the first information group as the target reply information;
wherein the at least one set of information to be selected comprises the first set of information to be selected.
In a possible implementation manner, the semantic recognition module is specifically further configured to:
responding to a selection operation input to a modification control corresponding to a second information group to be selected, and displaying a modification interface corresponding to the second information group to be selected, wherein at least one information group to be selected comprises the second information group to be selected;
responding to the modification operation input in the modification interface, and acquiring an updated second information group to be selected;
and responding to the selected operation input to the confirmation control corresponding to the updated second candidate information group, determining the candidate intention information in the updated second candidate information group as the target intention information, and determining the candidate reply information in the updated second candidate information group as the target reply information.
In one possible implementation manner, the object identification module is specifically configured to:
performing word segmentation processing on the function description information to obtain a plurality of segmented words;
determining the vocabulary type corresponding to each word segmentation;
determining word segmentation with the vocabulary type being a first preset vocabulary type as the operated object;
and determining the word segmentation with the vocabulary type being the second preset vocabulary type as the target operation type.
In one possible implementation manner, the determining module is specifically configured to:
acquiring a first corresponding relation corresponding to the operated object, wherein the first corresponding relation comprises a plurality of execution instructions and operation types corresponding to each execution instruction;
determining the target execution instruction according to the target operation type and the first corresponding relation;
and determining the target interaction interface in a plurality of interaction interfaces corresponding to the operated object according to the target execution instruction.
In one possible implementation manner, the generating module is specifically configured to:
acquiring initial configuration information, wherein the initial configuration information comprises a plurality of filling areas;
determining filling areas corresponding to the function description information, the target intention information, the target reply information, the target execution instruction and the target interaction interface in the plurality of filling areas;
And filling the function description information, the target intention information, the target reply information, the target execution instruction and the target interaction interface in the corresponding filling area in the initial configuration information to obtain the target configuration information.
In one possible implementation manner, the acquiring module is specifically configured to:
displaying a configuration page, wherein the configuration page comprises a newly added control;
responding to the operation of the input of the new control, and displaying a new input box;
and determining the information input by the user in the newly added input box as the function description information.
In one possible embodiment, the apparatus further comprises:
the display module is used for displaying a configuration page, and the configuration page comprises a first update control;
the response module is used for responding to the operation input to the first updating control and displaying the identification of the existing voice control functions and the second updating control corresponding to each existing voice control function;
the response module is further used for responding to the operation of inputting the second updating control corresponding to the first voice control function and displaying an updating page corresponding to the first voice control function, and the existing voice control function comprises the first voice control function;
The acquisition module is also used for acquiring the update information input by the user on the update page;
and the updating module is used for updating the first voice control function according to the updating information.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor, and a memory communicatively coupled to the processor; wherein,
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored in the memory to implement the method of any one of the first aspects.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having stored therein computer-executable instructions for performing the method according to any of the first aspects when executed by a processor.
In a fifth aspect, embodiments of the present application provide a computer program product comprising a computer program which, when executed by a processor, implements a method according to any of the first aspects.
The method, the device and the equipment for generating the voice control function provided by the embodiment of the application can acquire the function description information of the voice control function to be added in the voice control program; carrying out semantic recognition processing on the function description information to obtain target intention information and target reply information; performing object recognition processing on the function description information to determine an operated object and a target operation type for operating the operated object; determining a target execution instruction and a target interaction interface according to the operated object and the target operation type; and generating target configuration information corresponding to the voice control function according to the target intention information, the target reply information, the target execution instruction and the target interactive interface, and adding the target configuration information into a configuration file of the voice control program. In the above process, the user can set the target configuration information of the voice control function in a self-defined manner, and the target configuration information is added into the configuration file corresponding to the voice control program, so that the voice control program can call the target configuration information to generate the corresponding voice control function, and the generation process of the voice control function is more flexible.
Drawings
Fig. 1 is a schematic diagram of an application scenario provided in an embodiment of the present application;
fig. 2 is a flow chart of a method for generating a voice control function according to an embodiment of the present application;
FIG. 3 is a flowchart of a method for obtaining target intention information and target reply information according to an embodiment of the present application;
FIG. 4 is a schematic diagram of another method for generating a voice control function according to an embodiment of the present application;
fig. 5 is a flow chart of a method for updating a voice control function according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a configuration page according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a voice interaction process according to an embodiment of the present application;
fig. 8 is a schematic diagram of an installation position of a sound collection device according to the present application;
fig. 9 is a schematic structural diagram of a generating device with a voice control function according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of another voice control function generating device according to an embodiment of the present application;
fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
For ease of understanding, the terms involved in the present application are explained.
Android application package (APK): the application package file format used by the Android operating system for the distribution and installation of mobile applications and middleware.
Fig. 1 is a schematic diagram of an application scenario provided in an embodiment of the present application. Referring to fig. 1, the scenario includes an electronic device on which a voice control program runs.
The user can perform personalized adjustment on the configuration file of the voice control functions in the voice control program. For example, assume the electronic device is a vehicle-mounted device. When the vehicle-mounted device is configured in an ordinary vehicle such as a car, the configuration file of voice control functions generally includes configuration information of some general voice control functions, for example, a driving mode switching function, a multimedia control function, and a navigation function. When the vehicle-mounted device is configured in a commercial vehicle such as a heavy truck, besides the configuration information of the general voice control functions, the user can use the method provided by the application to add configuration information corresponding to personalized voice control functions to the original configuration file, for example, a load mode adjustment function, a parking air-conditioner control function, a fuel anti-theft switch control function, an electronic front blind-spot mirror adjustment function, and an electronic rearview mirror visual field follow-up adjustment function.
The user can also update the configuration information of the existing voice control function in the original configuration file. For example, in fig. 1, the user may update the configuration information corresponding to the navigation function, and obtain the updated configuration information of the navigation function.
In the embodiment of the application, the electronic device may be an intelligent vehicle-mounted device or a vehicle head unit, and may also be a smartphone, a smart watch, a tablet computer, a laptop, a desktop computer, or a virtual reality/augmented reality/mixed reality device, and the like.
In embodiments of the application, the voice control program may be built based on a voice software development kit (Software Development Kit, SDK). The voice SDK discloses a number of voice service functions that can support the development of voice control programs. The voice SDK may be used in a variety of programming languages and in a variety of platforms. The voice SDK may be adapted for real-time and non-real-time schemes.
In some embodiments, after the voice control program is built based on the SDK, a virtual intelligent assistant (Virtual Personal Assistant, VPA) may also be built based on the SDK according to different business requirements. The virtual intelligent assistant may be used for voice interaction between the user and the voice control program; the VPA may be displayed as different avatars for different scenes and function requirements, and each avatar may have various actions and expressions, so as to provide personalized voice interaction services for the user.
Currently, a voice provider can provide a finished APK software package, which has the advantages of being mature, stable, and general-purpose. In the related art, an automobile provider can configure the voice control program of the vehicle head unit through the finished APK software package provided by the voice provider, which shortens the development time of the head unit's voice control program and is convenient and quick.
However, in the above process, using the finished APK package solidifies the voice control functions of the voice control program, resulting in low flexibility when generating voice control functions.
In view of this, the method for generating a voice control function provided in the embodiment of the present application may obtain function description information of a voice control function to be added in a voice control program; carrying out semantic recognition processing on the function description information to obtain target intention information and target reply information; performing object recognition processing on the function description information to determine an operated object and a target operation type for operating the operated object; determining a target execution instruction and a target interaction interface according to the operated object and the target operation type; and generating target configuration information corresponding to the voice control function according to the target intention information, the target reply information, the target execution instruction and the target interactive interface, and adding the target configuration information into a configuration file of the voice control program.
In the above process, the user can set the target configuration information of the voice control function in a self-defined manner, and the target configuration information is added into the configuration file corresponding to the voice control program, so that the voice control program can call the target configuration information to generate the corresponding voice control function, and the generation process of the voice control function is more flexible.
The method according to the present application will be described below by way of specific examples. It should be noted that the following embodiments may exist alone or in combination with each other, and for the same or similar content, the description will not be repeated in different embodiments.
Fig. 2 is a flow chart of a method for generating a voice control function according to an embodiment of the present application. Referring to fig. 2, the method may include:
s201, function description information of a voice control function to be added in a voice control program is acquired.
The execution body of the embodiment of the application can be electronic equipment or a generation device of a voice control function arranged in the electronic equipment. The generation device of the voice control function can be realized by software, or can be realized by a combination of software and hardware.
There may be multiple pieces of function description information for each voice control function. Next, the function description information corresponding to newly added voice control functions is described, taking as an example an electronic device that is an intelligent vehicle-mounted device configured in a heavy truck.
For example, referring to fig. 1, the following voice control functions may be newly added to the voice control program of an intelligent vehicle-mounted device configured in a heavy truck: a driving mode switching function, a load mode adjustment function, a fuel anti-theft switch control function, an electronic front blind-spot mirror adjustment function, an electronic rearview mirror visual field follow-up adjustment function, a parking air-conditioner control function, and the like.
For ease of understanding, the following exemplifies the function description information of these voice control functions, taking table 1 as an example:
TABLE 1
S202, carrying out semantic recognition processing on the function description information to obtain target intention information and target reply information.
The semantic recognition processing may be performed using a semantic recognition model, which may be any semantic recognition model in the related art; the application does not limit this.
Each item of target intention information may correspond to multiple items of target reply information. For example, the correspondence between the target intention information and the target reply information may be as shown in table 2:
TABLE 2
For example, in table 2: because the body of a heavy truck is large, the relative visual field of the rearview mirror needs to be adjusted in time according to the steering condition of the vehicle to minimize the range of the driver's visual field blind spot, so a voice control function for electronic rearview mirror visual field follow-up adjustment may be newly added to the voice control program of the heavy truck. When the identified target intention information is "turn on the electronic rearview mirror visual field follow-up adjustment switch", the corresponding target reply information may be "The electronic rearview mirror visual field follow-up adjustment switch has been turned on for you; the relative visual field of the rearview mirror will be adjusted automatically according to the current steering situation of the vehicle".
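For illustration only, the following Python sketch shows one way such a correspondence could be represented; the dictionary keys and reply strings are assumptions based on the example above, not configuration data from the patent.

    # Illustrative sketch: one item of target intention information may map to
    # several items of target reply information (cf. table 2). All strings are
    # assumed for illustration.
    INTENT_TO_REPLIES = {
        "turn on electronic rearview mirror visual field follow-up adjustment switch": [
            "The electronic rearview mirror visual field follow-up adjustment "
            "switch has been turned on for you; the relative visual field of the "
            "rearview mirror will be adjusted automatically according to the "
            "current steering situation of the vehicle.",
        ],
        "switch driving mode to economy mode": [
            "Switching the driving mode to economy mode for you.",
            "Economy mode is now on.",
        ],
    }

    def pick_reply(intent: str) -> str:
        """Return the first configured reply for a recognized intent."""
        replies = INTENT_TO_REPLIES.get(intent)
        return replies[0] if replies else "Sorry, I did not understand that."

    print(pick_reply("switch driving mode to economy mode"))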
S203, performing object recognition processing on the function description information to determine an operated object and a target operation type for operating the operated object.
The operated object can be a driving mode switching controller, a multimedia controller, a parking air-conditioner controller, a load mode adjustment controller, a fuel anti-theft switch controller, an electronic rearview mirror visual field follow-up adjustment controller, an electronic front blind-spot mirror adjustment controller, an intelligent scene-rest mode switch controller, and the like; the operated object can also be a control interface corresponding to one of these controllers.
The target operation type may be an operation type such as turn on (start), turn off, adjust, or switch to mode A.
For example, if the function description information is "switch driving mode to economy mode", then after the object recognition processing, the determined operated object may be the "driving mode switching controller", and the target operation type is "switch to economy mode".
S204, determining a target execution instruction and a target interaction interface according to the operated object and the target operation type.
The target execution instructions and target interaction interface may be determined by: acquiring a first corresponding relation corresponding to the operated object; determining a target execution instruction according to the target operation type and the first corresponding relation; and determining a target interaction interface in a plurality of interaction interfaces corresponding to the operated object according to the target execution instruction.
The first correspondence may include a plurality of execution instructions and an operation type corresponding to each execution instruction.
For example, if the target operation type is "switch to economy mode", the target execution instruction may be: switch the driving mode to the economy mode. The target interaction interface may be the human-machine interface (Human Machine Interface, HMI) displayed when the driving mode is the economy mode.
S205, generating target configuration information corresponding to the voice control function according to the target intention information, the target reply information, the target execution instruction and the target interactive interface, and adding the target configuration information into a configuration file of the voice control program.
The target configuration information may be stored in tabular form. For example, the target configuration information may be as shown in table 3:
TABLE 3
After adding the target configuration information corresponding to the voice control function to the configuration file of the voice control program, the voice control program can call the target configuration information to generate the corresponding voice control function.
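As a rough illustration of adding target configuration information to a configuration file, the Python sketch below assumes a JSON file of records; the field names, values and file name are hypothetical, since the patent does not specify a storage format.

    import json
    from pathlib import Path

    # Hypothetical record layout for one voice control function (cf. table 3).
    target_configuration = {
        "function_description": "switch driving mode to economy mode",
        "target_intention": "switch driving mode to economy mode",
        "target_reply": "Switching the driving mode to economy mode for you.",
        "target_execution_instruction": "switch_driving_mode_to_economy",
        "target_interaction_interface": "hmi/driving_mode_economy",
    }

    def add_to_configuration_file(record: dict, path: Path) -> None:
        """Append one voice-control-function record to the configuration file."""
        entries = json.loads(path.read_text()) if path.exists() else []
        entries.append(record)
        path.write_text(json.dumps(entries, indent=2, ensure_ascii=False))

    add_to_configuration_file(target_configuration, Path("voice_control_config.json"))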
The method for generating the voice control function provided by the embodiment of the application can acquire the function description information of the voice control function to be added in the voice control program; carrying out semantic recognition processing on the function description information to obtain target intention information and target reply information; performing object recognition processing on the function description information to determine an operated object and a target operation type for operating the operated object; determining a target execution instruction and a target interaction interface according to the operated object and the target operation type; and generating target configuration information corresponding to the voice control function according to the target intention information, the target reply information, the target execution instruction and the target interactive interface, and adding the target configuration information into a configuration file of the voice control program. In the above process, the user can set the target configuration information of the voice control function in a self-defined manner, and the target configuration information is added into the configuration file corresponding to the voice control program, so that the voice control program can call the target configuration information to generate the corresponding voice control function, and the generation process of the voice control function is more flexible.
Next, with reference to fig. 3, a process of performing semantic recognition processing on the function description information to obtain target intention information and target reply information will be described in detail.
Fig. 3 is a flowchart of a method for obtaining target intention information and target reply information according to an embodiment of the present application. Referring to fig. 3, the method specifically includes:
s301, carrying out semantic recognition processing on the function description information to obtain at least one information group to be selected.
The candidate information group may include candidate intention information and candidate reply information.
The functional descriptive information may be semantically identified based on natural language processing (Natural Language Processing, NLP) methods. The NLP method includes two parts, natural language understanding (Natural Language Understanding, NLU) and natural language generation (Natural Language Generation, NLG). In practical use, keywords in the function description information can be extracted, and the voice control program can generate at least one information group to be selected for selection by a user according to the extracted keywords.
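A minimal sketch of this candidate-group generation is given below, assuming a trivial keyword extractor and reply templates in place of the real NLU/NLG components, which the patent does not detail.

    def extract_keywords(description: str) -> list:
        # Stand-in for a real NLU keyword extractor (assumption).
        stop_words = {"the", "a", "an", "to", "please"}
        return [w for w in description.lower().split() if w not in stop_words]

    def candidate_groups(description: str) -> list:
        """Generate (intention, reply) candidate information groups for the user."""
        intention = " ".join(extract_keywords(description))
        return [
            {"intention": intention, "reply": "OK, " + intention + " for you."},
            {"intention": intention, "reply": intention.capitalize() + " done."},
        ]

    for group in candidate_groups("turn on the fuel anti-theft switch"):
        print(group)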
S302, at least one information group to be selected and operation controls corresponding to the information groups to be selected are displayed.
The operation controls may include a confirmation control and a modification control.
The target intention information and the target reply information may be determined in response to an operation input to an operation control corresponding to the at least one information group to be selected.
The specific process of determining the target intention information and the target reply information differs according to the operation control used. If the operation control is the confirmation control, S303 is executed; if the operation control is the modification control, S304 to S306 are executed.
S303, determining the to-be-selected intention information in the first to-be-selected information group as target intention information and determining the to-be-selected reply information in the first to-be-selected information group as target reply information in response to the selected operation input to the corresponding confirmation control of the first to-be-selected information group.
Wherein the at least one set of information to be selected comprises a first set of information to be selected.
The user can determine the target intention information and the target reply information by selecting the confirmation control to confirm the candidate intention information and the candidate reply information in at least one candidate information group. The user may also modify the candidate intention information and the candidate reply information in at least one candidate information group by selecting the modification control; for the specific modification process, refer to the execution process of S304 to S306.
Illustratively, the selection operation may be a click operation or a slide operation, or the like.
S304, responding to the selected operation input to the modification control corresponding to the second information group to be selected, and displaying a modification interface corresponding to the second information group to be selected.
Wherein the at least one set of information to be selected comprises a second set of information to be selected.
S305, responding to the modification operation input in the modification interface, and acquiring the updated second information group to be selected.
The modification interface may further be provided with a text input box and a determination button. The user can first modify the candidate intention information and the candidate reply information of the second candidate information group in the text input box, and then, after the modification is completed, select the determination button in the modification interface to obtain the updated second candidate information group.
S306, determining the candidate intention information in the updated second candidate information group as target intention information and determining the candidate reply information in the updated second candidate information group as target reply information in response to the selected operation input to the confirmation control corresponding to the updated second candidate information group.
The method for obtaining the target intention information and the target reply information provided by the embodiment of the application can carry out semantic recognition processing on the function description information to obtain at least one information group to be selected, display the at least one information group to be selected and the operation control corresponding to each information group to be selected, and respond to the operation input to the operation control corresponding to the at least one information group to be selected to determine the target intention information and the target reply information. In the method, the user can operate the determining control and the modifying control corresponding to at least one information group to be selected according to the semantic recognition result so as to personally set the target intention information and the target reply information corresponding to the function description information, so that the generated target intention information and target reply information can better meet the use requirement of the user.
Fig. 4 is a schematic diagram of another method for generating a voice control function according to an embodiment of the present application. Referring to fig. 4, the method includes:
s401, acquiring function description information of a voice control function to be added in a voice control program.
Alternatively, the function description information may be acquired as follows: displaying a configuration page, wherein the configuration page comprises a newly added control; responding to the operation of the input of the new control, and displaying a new input box; and determining the information input by the user in the newly added input box as function description information.
The user can select the new control in the configuration page to input the function description information through the new input box.
The user can refine the function description information when inputting it. For example, the function description information corresponding to the navigation function may be refined to "zoom the navigation map of the navigation system", and the function description information corresponding to the window control function may be refined to "open the front passenger window by one quarter".
S402, carrying out semantic recognition processing on the function description information to obtain at least one information group to be selected.
The candidate information group may include candidate intention information and candidate reply information.
S403, displaying at least one information group to be selected and operation controls corresponding to each information group to be selected.
The operation controls may include a confirmation control and a modification control.
S404, determining target intention information and target reply information in response to the operation input to the operation control corresponding to at least one information group to be selected.
It should be noted that, the specific execution process of S404 may refer to the specific execution processes of S303 to S306, and repeated descriptions are not repeated here.
Next, in connection with S405 to S408, the execution process of performing object recognition processing on the function description information to determine the operated object and the target operation type of the operation on the operated object will be described in detail.
S405, performing word segmentation processing on the function description information to obtain a plurality of segmented words.
For example, the function description information may be "turn on the fuel anti-theft switch immediately"; word segmentation may yield the segmented words "immediately", "turn on", and "fuel anti-theft switch".
S406, determining the vocabulary type corresponding to each word segmentation.
Vocabulary types may include nouns, verbs, and adverbs.
For example, for the segmented words "immediately", "turn on", and "fuel anti-theft switch": "immediately" is an adverb, "turn on" is a verb, and "fuel anti-theft switch" is a noun.
S407, determining the word segmentation with the vocabulary type being the first preset vocabulary type as an operated object.
The first preset vocabulary type may include nouns. For example, since the segmented word "fuel anti-theft switch" is a noun, the fuel anti-theft switch may be determined as the operated object.
S408, determining the word segmentation with the vocabulary type being the second preset vocabulary type as the target operation type.
The second preset vocabulary type may include verbs, verbs+adverbs, and verbs+nouns.
For example, if the segmented word is the verb "turn on", the target operation type is to turn on; if the segmented words are the verb plus adverb "turn on immediately", the target operation type is to turn on immediately; if the segmented words are the verb plus noun "switch sport mode", the target operation type is to switch to sport mode.
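Putting S405 to S408 together, the sketch below uses a toy part-of-speech lexicon in place of a real word segmenter (the patent does not name one; for Chinese input, a tokenizer such as jieba could play this role):

    # Toy lexicon mapping segmented words to vocabulary types (an assumption;
    # a real system would use a word segmenter and part-of-speech tagger).
    LEXICON = {
        "immediately": "adverb",
        "turn on": "verb",
        "fuel anti-theft switch": "noun",
    }

    FIRST_PRESET = {"noun"}             # vocabulary types of the operated object
    SECOND_PRESET = {"verb", "adverb"}  # vocabulary types of the operation

    def recognize(segments: list) -> tuple:
        """Return (operated_object, target_operation_type) from segmented words."""
        operated = [s for s in segments if LEXICON.get(s) in FIRST_PRESET]
        operation = [s for s in segments if LEXICON.get(s) in SECOND_PRESET]
        return " ".join(operated), " ".join(operation)

    obj, op_type = recognize(["immediately", "turn on", "fuel anti-theft switch"])
    print(obj)      # fuel anti-theft switch
    print(op_type)  # immediately turn on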
Next, a process of determining the target execution instruction and the target interactive interface according to the operated object and the target operation type will be described in detail with reference to S409 to S411.
S409, acquiring a first corresponding relation corresponding to the operated object.
The first corresponding relation comprises a plurality of execution instructions and operation types corresponding to each execution instruction.
For example, the first correspondence may be as shown in table 4:
TABLE 4
S410, determining a target execution instruction according to the target operation type and the first corresponding relation.
For example, in table 4, if the target operation type is "turn on", it may be determined that the target execution instruction is "open the fuel anti-theft switch controller".
S411, determining a target interaction interface in a plurality of interaction interfaces corresponding to the operated object according to the target execution instruction.
There may be a one-to-one correspondence between the target execution instructions and the target interaction interface.
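Since the body of table 4 is not reproduced above, the sketch below illustrates the two lookups of S409 to S411 with assumed instruction and interface names:

    # Hypothetical first correspondence for the operated object
    # "fuel anti-theft switch": operation type -> execution instruction.
    FIRST_CORRESPONDENCE = {
        "turn on": "open_fuel_anti_theft_switch",
        "turn off": "close_fuel_anti_theft_switch",
    }

    # Assumed one-to-one mapping from execution instruction to interaction interface.
    INSTRUCTION_TO_INTERFACE = {
        "open_fuel_anti_theft_switch": "hmi/fuel_anti_theft_on",
        "close_fuel_anti_theft_switch": "hmi/fuel_anti_theft_off",
    }

    def resolve(operation_type: str) -> tuple:
        """Determine the target execution instruction and interaction interface."""
        instruction = FIRST_CORRESPONDENCE[operation_type]
        interface = INSTRUCTION_TO_INTERFACE[instruction]
        return instruction, interface

    print(resolve("turn on"))  # ('open_fuel_anti_theft_switch', 'hmi/fuel_anti_theft_on')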
Next, a process of generating target configuration information corresponding to the voice control function according to the target intention information, the target reply information, the target execution instruction, and the target interactive interface will be described in detail with reference to S412 to S414.
S412, acquiring initial configuration information.
The initial configuration information may include a plurality of padding areas therein.
S413, determining filling areas corresponding to the function description information, the target intention information, the target reply information, the target execution instruction and the target interaction interface in the filling areas.
Each fill area may correspond to a fill area identification. And filling the function description information, the target intention information, the target reply information, the target execution instruction and the target interaction interface into the corresponding filling areas according to the filling area identification.
S414, filling function description information, target intention information, target reply information, target execution instructions and a target interaction interface in the corresponding filling area in the initial configuration information to obtain target configuration information.
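The filling step of S412 to S414 can be pictured as template substitution; in the sketch below, the placeholder names stand in for the filling area identifications, which the patent leaves unspecified:

    # Hypothetical initial configuration with named filling areas.
    INITIAL_CONFIGURATION = (
        '{{"description": "{function_description}", '
        '"intention": "{target_intention}", '
        '"reply": "{target_reply}", '
        '"instruction": "{target_execution_instruction}", '
        '"interface": "{target_interaction_interface}"}}'
    )

    def fill(template: str, **fields: str) -> str:
        """Fill each filling area of the initial configuration by its identifier."""
        return template.format(**fields)

    print(fill(
        INITIAL_CONFIGURATION,
        function_description="turn on the fuel anti-theft switch",
        target_intention="turn on fuel anti-theft switch",
        target_reply="The fuel anti-theft switch is now on.",
        target_execution_instruction="open_fuel_anti_theft_switch",
        target_interaction_interface="hmi/fuel_anti_theft_on",
    ))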
S415, adding target configuration information into a configuration file of the voice control program.
The method for generating the voice control function provided by the embodiment of the application can acquire the function description information of the voice control function to be added in the voice control program; carrying out semantic recognition processing on the function description information to obtain target intention information and target reply information; performing object recognition processing on the function description information to determine an operated object and a target operation type for operating the operated object; determining a target execution instruction and a target interaction interface according to the operated object and the target operation type; and generating target configuration information corresponding to the voice control function according to the target intention information, the target reply information, the target execution instruction and the target interactive interface, and adding the target configuration information into a configuration file of the voice control program. In the above process, the user can set the target configuration information of the voice control function in a self-defined manner, and the target configuration information is added into the configuration file corresponding to the voice control program, so that the voice control program can call the target configuration information to generate the corresponding voice control function, and the generation process of the voice control function is more flexible.
The method for generating the voice control function provided by the embodiment of the application can also update the existing voice control function. On the basis of any of the above embodiments, a detailed procedure for updating an existing voice control function will be described below with reference to fig. 5.
Fig. 5 is a flow chart of a method for updating a voice control function according to an embodiment of the present application. Referring to fig. 5, the method may include:
s501, displaying a configuration page.
A first update control may be included in the configuration page. The first update control may be a button control or a slider control.
S502, responding to the operation input to the first updating control, and displaying the identification of the existing voice control functions and the second updating control corresponding to each existing voice control function.
For ease of understanding, a configuration page displaying an identification of existing voice control functions and a second update control corresponding to each existing voice control function will be described below with reference to fig. 6.
Fig. 6 is a schematic diagram of a configuration page according to an embodiment of the present application. Referring to fig. 6, the configuration page includes an identification of a plurality of existing voice control functions, each corresponding to a second update control.
S503, responding to the operation of inputting a second update control corresponding to the first voice control function, and displaying an update page corresponding to the first voice control function.
The existing voice control functions include a first voice control function.
S504, acquiring update information input by a user on the update page, and updating the first voice control function according to the update information.
The update page can be further provided with a text input box and a determination control, a user can input update information in the text input box, and after the input is completed, the determination control in the update page is selected to update the configuration information corresponding to the first voice control function, so that the update of the first voice control function is realized.
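A possible shape for this update step, again assuming the hypothetical JSON configuration file from the earlier sketch, is:

    import json
    from pathlib import Path

    def update_voice_control_function(path: Path, function_id: str,
                                      update_info: dict) -> None:
        """Overwrite fields of one existing voice control function's record."""
        entries = json.loads(path.read_text())
        for entry in entries:
            if entry.get("id") == function_id:
                entry.update(update_info)
        path.write_text(json.dumps(entries, indent=2, ensure_ascii=False))

    # Example (hypothetical identifiers): update the navigation function's reply.
    # update_voice_control_function(Path("voice_control_config.json"),
    #                               "navigation",
    #                               {"target_reply": "Navigation is starting."})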
The method for updating the existing voice control function provided by the embodiment of the application can display the configuration page; responding to the operation of inputting the first updating control on the configuration page, and displaying the identification of the existing voice control functions and the second updating control corresponding to each existing voice control function; responding to the operation of inputting a second update control corresponding to any one of the existing voice control functions, and displaying an update page corresponding to the first voice control function; and acquiring update information input by a user on the update page, and updating the first voice control function according to the update information. In the above process, the user can update the configuration information of the existing voice control function, so as to update the first voice control function, and the voice control function can be flexibly updated according to the user requirement.
In practical applications, a vehicle supplier can also adopt the method provided by the application: based on the voice SDK, the supplier sets the configuration information corresponding to a voice control function (including intention information, reply information, an execution instruction and an interaction interface) in a personalized way according to the vehicle's usage scenarios and service requirements, and packages the configuration information, thereby adapting and developing the voice control program on the vehicle side. After this adaptation development, the supplier can test the developed voice control program in multiple respects.
In this way, the vehicle supplier can develop and test the vehicle-side voice control program based on the voice SDK without depending on an APK installation package provided by a third party, and can update the voice control program after subsequent service updates, which reduces the later maintenance cost of the voice control program.
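A packaging step of this kind could be as simple as bundling the customized configuration file for delivery with the voice SDK; the sketch below uses Python's standard zipfile module, and both file names are assumptions of the example.

```python
import zipfile

# Illustrative packaging step: bundle the customized configuration file so it
# can be shipped with the voice SDK to the vehicle side.
def package_configuration(config_path: str = "voice_functions.json",
                          bundle_path: str = "voice_sdk_config.zip") -> None:
    with zipfile.ZipFile(bundle_path, "w", zipfile.ZIP_DEFLATED) as bundle:
        bundle.write(config_path)

package_configuration()
```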
Next, a process of voice interaction between the vehicle-side voice control program and the user will be described with reference to fig. 7.
Fig. 7 is a schematic diagram of a voice interaction process according to an embodiment of the present application. Referring to fig. 7, the vehicle side includes a sound collection device and a voice control program.
The driver can issue the interactive voice "Hello, Remote, turn on the fuel anti-theft switch" to the vehicle terminal. The sound collection device on the vehicle side can collect the interactive voice and wake up the voice control program.
The voice control program can be provided with a voice recognition model, which performs voice recognition processing on the interactive voice to obtain the voice text corresponding to the interactive voice. The voice control program can also determine the sound source position according to the sound collection device that collected the interactive voice.
The voice control program can perform semantic recognition processing and object recognition processing on the voice text corresponding to the interactive voice to obtain the intention information, reply information, execution instruction and interaction interface corresponding to the interactive voice. It can be understood that semantic recognition processing of the voice text yields the intention information, while object recognition processing of the voice text determines the operated object and the target operation type. The voice control program can then determine the target execution instruction and the target interaction interface according to the operated object and the target operation type, send the target execution instruction to the operated object, and display the target interaction interface. The voice control program may also determine the reply information based on the intention information and the execution result of the target execution instruction.
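The following condensed, self-contained Python sketch illustrates this runtime pipeline under the assumption that the recognizers can be reduced to trivial keyword rules; none of the function names, the rule table or the instruction identifiers come from the application itself.

```python
# Toy rule table: operated object -> operation type -> (instruction, interface).
RULES = {
    "window": {"open": ("CMD_WINDOW_OPEN", "window_panel")},
}

def recognize_intent(text: str) -> str:
    """Stand-in for semantic recognition: derive intention information."""
    return "open_window" if "window" in text else "unknown"

def recognize_object(text: str) -> tuple[str, str]:
    """Stand-in for object recognition: derive operated object and op type."""
    obj = "window" if "window" in text else "unknown"
    op = "open" if "open" in text else "unknown"
    return obj, op

def send_instruction(obj: str, instruction: str) -> bool:
    print(f"dispatch {instruction} to {obj}")   # send to the operated object
    return True                                 # pretend execution succeeded

def handle_interaction(text: str) -> str:
    intent = recognize_intent(text)             # semantic recognition
    obj, op = recognize_object(text)            # object recognition
    instruction, interface = RULES[obj][op]     # resolve instruction + UI
    print(f"display {interface}")               # show target interaction interface
    ok = send_instruction(obj, instruction)     # send target execution instruction
    return f"{intent}: done" if ok else f"{intent}: failed"  # reply information

print(handle_interaction("please open the window"))
```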
The voice control program may perform intelligent voice synthesis processing on the reply information to generate an emotionally rich reply voice, and play the reply voice to the user.
Sound collection devices may be installed at multiple positions in the vehicle; the installation positions are described schematically below with reference to fig. 8. Fig. 8 is a schematic diagram of the installation positions of the sound collection devices according to the present application. Referring to fig. 8, each sound collection device may be a microphone. A main driving microphone can be installed in the driver area, a co-driving microphone in the co-driver area, and corresponding microphones in the different sleeper areas, so as to collect interactive voice at different positions.
In one possible implementation, the vehicle-mounted device may recognize the interactive voice collected by the sound collection devices, extract characteristic information of the interactive voice (e.g., sound intensity), and determine the sound source position of the interactive voice based on the characteristic information.
By locating the sound source position, the operated object can be determined accurately. For example, when the co-driver initiates the voice request "Hello, Remote, please open the window", the voice assistant can collect the request through the co-driving microphone and determine, based on the request and its source, that the operated object is the window on the co-driver side.
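A minimal sketch of such zone selection follows, assuming one intensity reading per microphone and a pick-the-loudest rule; the application only states that characteristic information such as sound intensity is used, so the zone names and the rule are assumptions of the example.

```python
def locate_speaker(intensities: dict[str, float]) -> str:
    """Return the zone whose microphone picked up the strongest signal."""
    return max(intensities, key=intensities.get)

readings = {"driver": 0.42, "co_driver": 0.87, "sleeper_1": 0.10}
zone = locate_speaker(readings)              # -> "co_driver"
print(f"operated object resolved relative to zone: {zone}")
```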
Fig. 9 is a schematic structural diagram of a voice control function generating device according to an embodiment of the present application. Referring to fig. 9, the voice control function generating apparatus 10 may include:
an acquisition module 11, configured to acquire function description information of a voice control function to be added to a voice control program;
a semantic recognition module 12, configured to perform semantic recognition processing on the function description information to obtain target intention information and target reply information;
an object recognition module 13, configured to perform object recognition processing on the function description information to determine an operated object and a target operation type for operating on the operated object;
a determining module 14, configured to determine a target execution instruction and a target interaction interface according to the operated object and the target operation type;
a generating module 15, configured to generate target configuration information corresponding to the voice control function according to the target intention information, the target reply information, the target execution instruction and the target interaction interface, and add the target configuration information to a configuration file of the voice control program.
The voice control function generating device provided by the embodiment of the application can execute the technical solutions shown in the foregoing method embodiments; the implementation principles and beneficial effects are similar and are not repeated here.
In one possible implementation, the semantic recognition module 12 is specifically configured to:
carrying out semantic recognition processing on the function description information to obtain at least one information group to be selected, wherein the information group to be selected comprises intention information to be selected and reply information to be selected;
displaying the at least one information group to be selected and operation controls corresponding to each information group to be selected, wherein the operation controls comprise a confirmation control and a modification control;
and determining the target intention information and the target reply information in response to the operation input to the operation control corresponding to at least one information group to be selected.
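For illustration, the confirm/modify flow over candidate (intention, reply) groups can be reduced to a single function call per user decision, as in the Python sketch below; the data layout and field names are assumptions, and the real implementation would be driven by the page controls described above.

```python
# Sketch: one candidate (intention, reply) group is confirmed, optionally
# after a modification. 'choice' stands for which group's control was used;
# 'modification' stands for the edits made in the modification interface.
def choose_candidate(groups: list[dict], choice: int,
                     modification: dict | None = None) -> tuple[str, str]:
    group = dict(groups[choice])             # copy the selected candidate group
    if modification:                         # user pressed the modify control
        group.update(modification)           # apply edits, then confirm
    return group["intent"], group["reply"]   # confirmed target intention/reply

groups = [
    {"intent": "fuel_anti_theft_on", "reply": "The anti-theft switch is on."},
    {"intent": "fuel_anti_theft_off", "reply": "The anti-theft switch is off."},
]
# Confirm the first group after editing its reply text in the modify interface:
intent, reply = choose_candidate(groups, 0, {"reply": "Anti-theft enabled."})
print(intent, reply)
```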
In a possible implementation, the semantic recognition module 12 is specifically further configured to:
in response to a selection operation input on the confirmation control corresponding to a first information group to be selected, determining the intention information to be selected in the first information group to be selected as the target intention information, and determining the reply information to be selected in the first information group to be selected as the target reply information;
wherein the at least one information group to be selected comprises the first information group to be selected.
In a possible implementation, the semantic recognition module 12 is specifically further configured to:
responding to a selection operation input on the modification control corresponding to a second information group to be selected, displaying a modification interface corresponding to the second information group to be selected, wherein the at least one information group to be selected comprises the second information group to be selected;
responding to a modification operation input in the modification interface, acquiring an updated second information group to be selected;
and responding to a selection operation input on the confirmation control corresponding to the updated second information group to be selected, determining the intention information to be selected in the updated second information group to be selected as the target intention information, and determining the reply information to be selected in the updated second information group to be selected as the target reply information.
In one possible implementation, the object recognition module 13 is specifically configured to:
performing word segmentation processing on the function description information to obtain a plurality of segmented words;
determining the vocabulary type corresponding to each segmented word;
determining the segmented word whose vocabulary type is a first preset vocabulary type as the operated object;
and determining the segmented word whose vocabulary type is a second preset vocabulary type as the target operation type.
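A toy Python sketch of this step follows: each token produced by a naive segmentation is looked up in a small lexicon that maps it to a vocabulary type, and the tokens of the two preset types become the operated object and the target operation type. The lexicon contents and type names are invented for the example.

```python
# Hypothetical lexicon mapping tokens to vocabulary types.
LEXICON = {
    "window": "object",      # first preset vocabulary type -> operated object
    "seat": "object",
    "open": "operation",     # second preset vocabulary type -> operation type
    "close": "operation",
}

def extract_object_and_operation(description: str):
    operated_object = operation_type = None
    for token in description.lower().split():    # naive word segmentation
        kind = LEXICON.get(token)
        if kind == "object":
            operated_object = token
        elif kind == "operation":
            operation_type = token
    return operated_object, operation_type

print(extract_object_and_operation("Open the window"))  # ('window', 'open')
```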
In one possible implementation, the determining module 14 is specifically configured to:
acquiring a first correspondence corresponding to the operated object, wherein the first correspondence comprises a plurality of execution instructions and the operation type corresponding to each execution instruction;
determining the target execution instruction according to the target operation type and the first correspondence;
and determining the target interaction interface among a plurality of interaction interfaces corresponding to the operated object according to the target execution instruction.
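The two lookups can be pictured as nested tables, as in the sketch below: the first correspondence maps an operation type to an execution instruction, and a second table selects the interaction interface among those of the operated object. All table contents are assumptions of the example.

```python
# First correspondence: per operated object, operation type -> instruction.
FIRST_CORRESPONDENCE = {
    "window": {"open": "CMD_WINDOW_OPEN", "close": "CMD_WINDOW_CLOSE"},
}
# Per operated object, instruction -> target interaction interface.
INTERFACES = {
    "window": {"CMD_WINDOW_OPEN": "window_opening_panel",
               "CMD_WINDOW_CLOSE": "window_closing_panel"},
}

def resolve(operated_object: str, operation_type: str) -> tuple[str, str]:
    instruction = FIRST_CORRESPONDENCE[operated_object][operation_type]
    interface = INTERFACES[operated_object][instruction]
    return instruction, interface

print(resolve("window", "open"))  # ('CMD_WINDOW_OPEN', 'window_opening_panel')
```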
In one possible implementation, the generating module 15 is specifically configured to:
acquiring initial configuration information, wherein the initial configuration information comprises a plurality of filling areas;
determining filling areas corresponding to the function description information, the target intention information, the target reply information, the target execution instruction and the target interaction interface in the plurality of filling areas;
and filling the function description information, the target intention information, the target reply information, the target execution instruction and the target interaction interface in the corresponding filling area in the initial configuration information to obtain the target configuration information.
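As an illustration, the initial configuration information can be treated as a template whose filling areas are named placeholders; the sketch below fills each value into its matching area. The placeholder names repeat the hypothetical schema used in the earlier sketches.

```python
# Sketch: initial configuration information as a template of filling areas.
INITIAL_CONFIGURATION = {
    "function_description": None,   # filling area for the description
    "target_intent": None,          # filling area for the intention info
    "target_reply": None,           # filling area for the reply info
    "target_instruction": None,     # filling area for the execution instruction
    "target_interface": None,       # filling area for the interaction interface
}

def fill_configuration(values: dict) -> dict:
    config = dict(INITIAL_CONFIGURATION)    # start from the template
    for area, value in values.items():
        if area in config:                  # each value goes to its own area
            config[area] = value
    return config

target = fill_configuration({
    "function_description": "Turn on the fuel anti-theft switch",
    "target_intent": "fuel_anti_theft_on",
    "target_reply": "The fuel anti-theft switch has been turned on.",
    "target_instruction": "CMD_FUEL_ANTI_THEFT_ON",
    "target_interface": "fuel_anti_theft_panel",
})
print(target)
```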
Fig. 10 is a schematic structural diagram of another voice control function generating device according to an embodiment of the present application. Referring to fig. 10, on the basis of the apparatus shown in fig. 9, the apparatus further includes:
A display module 16, configured to display a configuration page, where the configuration page includes a first update control;
a response module 17, configured to display, in response to an operation input to the first update control, an identifier of an existing voice control function and a second update control corresponding to each existing voice control function;
the response module 17 is further configured to display, in response to an operation input on the second update control corresponding to a first voice control function, an update page corresponding to the first voice control function, wherein the existing voice control functions include the first voice control function;
the acquiring module 11 is further configured to acquire update information input by a user on the update page;
an updating module 18, configured to update the first voice control function according to the updating information.
The voice control function generating device provided by the embodiment of the application can execute the technical solutions shown in the foregoing method embodiments; the implementation principles and beneficial effects are similar and are not repeated here.
Fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application. Referring to fig. 11, the electronic device 20 may include a memory 21 and a processor 22. Illustratively, the memory 21 and the processor 22 are interconnected by a bus 23.
The memory 21 is used for storing program instructions;
the processor 22 is configured to execute the program instructions stored in the memory, so as to cause the electronic device 20 to execute the method shown in the above-described method embodiment.
The electronic device provided by the embodiment of the application can execute the technical solutions shown in the foregoing method embodiments; the implementation principles and beneficial effects are similar and are not repeated here.
Embodiments of the present application provide a computer-readable storage medium having stored therein computer-executable instructions for implementing the method shown in the above-described method embodiments when the computer-executable instructions are executed by a processor.
Embodiments of the present application may also provide a computer program product comprising a computer program which, when executed by a processor, performs the method shown in the above-mentioned method embodiments.
All or part of the steps of the above method embodiments may be implemented by hardware associated with program instructions. The foregoing program may be stored in a readable memory; when executed, the program performs the steps of the above method embodiments. The aforementioned memory (storage medium) includes: read-only memory (ROM), random-access memory (RAM), flash memory, hard disk, solid-state disk, magnetic tape, floppy disk, optical disk, and any combination thereof.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processing unit of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that the user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) involved in the present application are information and data authorized by the user or fully authorized by all parties. The collection, use and processing of the related data must comply with the relevant laws, regulations and standards of the relevant countries and regions, and corresponding operation entries are provided for the user to choose to authorize or refuse.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.
In the present disclosure, the term "include" and its variations denote non-limiting inclusion; the term "or" and its variations denote "and/or". The terms "first", "second", and the like are used to distinguish between similar objects and do not necessarily describe a particular sequence or chronological order. In the present application, "a plurality of" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate that A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.

Claims (13)

1. A method for generating a voice control function, comprising:
acquiring function description information of a voice control function to be added in a voice control program;
carrying out semantic recognition processing on the function description information to obtain target intention information and target reply information;
performing object recognition processing on the function description information to determine an operated object and a target operation type for operating on the operated object;
determining a target execution instruction and a target interaction interface according to the operated object and the target operation type;
and generating target configuration information corresponding to the voice control function according to the target intention information, the target reply information, the target execution instruction and the target interactive interface, and adding the target configuration information into a configuration file of the voice control program.
2. The method according to claim 1, wherein performing semantic recognition processing on the function description information to obtain target intention information and target reply information comprises:
carrying out semantic recognition processing on the function description information to obtain at least one information group to be selected, wherein the information group to be selected comprises intention information to be selected and reply information to be selected;
displaying the at least one information group to be selected and operation controls corresponding to each information group to be selected, wherein the operation controls comprise a confirmation control and a modification control;
and determining the target intention information and the target reply information in response to the operation input to the operation control corresponding to at least one information group to be selected.
3. The method according to claim 2, wherein determining the target intention information and the target reply information in response to an operation input on the operation control corresponding to at least one of the information groups to be selected comprises:
in response to a selection operation input on the confirmation control corresponding to a first information group to be selected, determining the intention information to be selected in the first information group to be selected as the target intention information, and determining the reply information to be selected in the first information group to be selected as the target reply information;
wherein the at least one information group to be selected comprises the first information group to be selected.
4. The method according to claim 2, wherein determining the target intention information and the target reply information in response to an operation input on the operation control corresponding to at least one of the information groups to be selected comprises:
responding to a selection operation input on the modification control corresponding to a second information group to be selected, displaying a modification interface corresponding to the second information group to be selected, wherein the at least one information group to be selected comprises the second information group to be selected;
responding to a modification operation input in the modification interface, acquiring an updated second information group to be selected;
and responding to a selection operation input on the confirmation control corresponding to the updated second information group to be selected, determining the intention information to be selected in the updated second information group to be selected as the target intention information, and determining the reply information to be selected in the updated second information group to be selected as the target reply information.
5. The method according to any one of claims 1 to 4, wherein performing object recognition processing on the function description information to determine an operated object and a target operation type for operating on the operated object comprises:
performing word segmentation processing on the function description information to obtain a plurality of segmented words;
determining the vocabulary type corresponding to each segmented word;
determining the segmented word whose vocabulary type is a first preset vocabulary type as the operated object;
and determining the segmented word whose vocabulary type is a second preset vocabulary type as the target operation type.
6. The method according to any one of claims 1-5, wherein determining a target execution instruction and a target interaction interface according to the operated object and the target operation type comprises:
acquiring a first corresponding relation corresponding to the operated object, wherein the first corresponding relation comprises a plurality of execution instructions and operation types corresponding to each execution instruction;
Determining the target execution instruction according to the target operation type and the first corresponding relation;
and determining the target interaction interface in a plurality of interaction interfaces corresponding to the operated object according to the target execution instruction.
7. The method according to any one of claims 1-6, wherein generating target configuration information corresponding to the voice control function according to the target intention information, the target reply information, the target execution instruction and the target interaction interface comprises:
acquiring initial configuration information, wherein the initial configuration information comprises a plurality of filling areas;
determining filling areas corresponding to the function description information, the target intention information, the target reply information, the target execution instruction and the target interaction interface in the plurality of filling areas;
and filling the function description information, the target intention information, the target reply information, the target execution instruction and the target interaction interface in the corresponding filling area in the initial configuration information to obtain the target configuration information.
8. The method according to any one of claims 1-7, wherein acquiring function description information of a voice control function to be added in a voice control program comprises:
displaying a configuration page, wherein the configuration page comprises an add control;
displaying a new input box in response to an operation input on the add control;
and determining the information input by the user in the new input box as the function description information.
9. The method according to any one of claims 1-8, further comprising:
displaying a configuration page, wherein the configuration page comprises a first update control;
displaying, in response to an operation input on the first update control, identifiers of the existing voice control functions and a second update control corresponding to each existing voice control function;
displaying, in response to an operation input on the second update control corresponding to a first voice control function, an update page corresponding to the first voice control function, wherein the existing voice control functions comprise the first voice control function;
and acquiring update information input by the user on the update page, and updating the first voice control function according to the update information.
10. A voice control function generating apparatus, the apparatus comprising:
an acquisition module, configured to acquire function description information of a voice control function to be added in a voice control program;
a semantic recognition module, configured to perform semantic recognition processing on the function description information to obtain target intention information and target reply information;
an object recognition module, configured to perform object recognition processing on the function description information to determine an operated object and a target operation type for operating on the operated object;
a determining module, configured to determine a target execution instruction and a target interaction interface according to the operated object and the target operation type;
and a generating module, configured to generate target configuration information corresponding to the voice control function according to the target intention information, the target reply information, the target execution instruction and the target interaction interface, and add the target configuration information to a configuration file of the voice control program.
11. An electronic device, comprising: a processor, and a memory communicatively coupled to the processor; wherein,
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored in the memory to implement the method of any one of claims 1 to 9.
12. A computer readable storage medium having stored therein computer executable instructions which when executed by a processor are adapted to carry out the method of any one of claims 1 to 9.
13. A computer program product comprising a computer program which, when executed by a processor, implements the method of any one of claims 1 to 9.