CN112309373A - System and method for self-defining vehicle-mounted voice technology - Google Patents

Info

Publication number
CN112309373A
Authority
CN
China
Prior art keywords
user
voice
skill
voice skill
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011039892.8A
Other languages
Chinese (zh)
Inventor
谢志华
王满红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huizhou Desay SV Automotive Co Ltd
Original Assignee
Huizhou Desay SV Automotive Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huizhou Desay SV Automotive Co Ltd
Priority to CN202011039892.8A
Publication of CN112309373A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L2015/0638 Interactive procedures
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Abstract

The invention relates to a system for customizing vehicle-mounted voice skills. The system comprises a custom voice skill training engine, which is used for triggering, training, verifying and generating user-defined voice skills and for matching them to scenes; a custom voice skill execution engine, which is used for the input, parsing, scene recognition, semantic arbitration, matching and execution of the user's voice requests; a custom voice skill management public module, which is responsible for the unified storage of the voice skill configurations generated by training and provides the corresponding retrieval service when a custom voice skill is used; and a custom voice skill display public module, which is responsible for interface interaction and dialogue corpus management while custom voice skills are trained and used. Through custom voice skills, the system and method of the invention let the user experience functions that the original vehicle-mounted voice control product does not provide, better meet the actual needs of individual users, and improve the skill recognition rate, especially the recognition rate of fuzzy intentions.

Description

System and method for self-defining vehicle-mounted voice technology
Technical Field
The invention relates to the technical field of automotive electronics, in particular to a system and a method for customizing vehicle-mounted voice skills.
Background
Many mid-range and high-end vehicle platforms provide a voice control function so that the driver or front passenger can operate them conveniently while driving. At present, vehicle-mounted voice mainly relies on the original voice skills of a third-party SDK (Software Development Kit) or on an aggregation of third-party voice skills. Although this approach meets the common requirements of vehicle-mounted voice control, the voice control functions of different vehicles and manufacturers differ little, so the competitive advantage from product differentiation is not obvious. Existing third-party voice skills cover the common voice functions, but because they lack awareness of the actual in-vehicle scene, the way a skill is actually executed may not meet the user's need for linked actions in that scene. In addition, less frequently used third-party voice skills are not fully covered, and when a user has a preferred way of phrasing a request for an existing skill, the experience does not feel intelligent. In existing third-party voice solutions, the CP (Content Provider) resources cannot be modified once they are fixed. In practice, users may want different CPs for the same in-vehicle function: for listening to music, for example, some people prefer QQ music and others prefer cool music. User participation in existing vehicle-mounted voice products is limited to personalized settings and the collection of feedback on new requirements; users cannot define product functions themselves and experience them promptly.
From the above, existing vehicle-mounted voice control systems have the following defects:
(1) most adopt finished third-party voice solutions, so function configuration is standardized and homogenization is serious; there are essentially no personalized voice control skills specific to a user, and product differentiation is difficult to achieve;
(2) the functional experience is uniform and lacks awareness of the real in-vehicle scene, so personalized linkage in the corresponding scene cannot be realized; in addition, because different users phrase requests for the same function differently, existing voice skills cannot cover every user's phrasing, particularly where fuzzy intentions must be recognized;
(3) the APP or service provider is fixed at the factory, and the user cannot switch to a preferred APP or service provider;
(4) although some functions such as personalized TTS (Text To Speech, a speech synthesis technology) settings exist, the user is only permitted to use the voice skills; there is no entry point for adding personalized voice functions, and interactivity is mediocre.
To address these problems, a system and a method for customizing vehicle-mounted voice skills are provided.
Disclosure of Invention
The invention aims to solve the following problems of existing vehicle-mounted voice control systems: they have essentially no personalized voice control skills specific to a user; product differentiation is difficult to achieve; they lack awareness of the real in-vehicle scene; they cannot cover every user's phrasing; the user cannot switch to a preferred APP or service provider; and although functions such as personalized TTS settings exist, the user is only permitted to use the voice skills and there is no entry point for adding personalized voice functions. The specific solution is as follows:
A system for customizing vehicle-mounted voice skills, comprising:
a custom voice skill training engine, which serves as the part in which the user defines personalized functions and is used for triggering, training, verifying and generating user-defined voice skills and for matching them to scenes;
a custom voice skill execution engine, which serves as the part in which the user uses personalized functions and is used for the input, parsing, scene recognition, semantic arbitration, matching and execution of the user's voice requests;
a custom voice skill management public module, which serves as the shared storage module for training and using custom voice skills, is responsible for the unified storage of the voice skill configurations generated by training, and provides the corresponding retrieval service when a custom voice skill is used;
a custom voice skill display public module, which serves as the shared display module for training and using custom voice skills and is responsible for interface interaction and dialogue corpus management during training and use;
wherein the custom voice skill training engine, the custom voice skill execution engine, the custom voice skill management public module and the custom voice skill display public module are coupled to one another through software interfaces.
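To illustrate the role of the management public module (unified storage plus retrieval), the following minimal Python sketch keeps skill configurations in memory and looks them up by spoken phrase. The class name `SkillStore`, its methods and the dictionary layout are hypothetical and are not taken from the patent; exact string matching stands in for whatever matching the real system would use.

```python
from typing import Dict, Optional


class SkillStore:
    """Hypothetical stand-in for the custom voice skill management public module."""

    def __init__(self) -> None:
        self._by_phrase: Dict[str, dict] = {}

    def save(self, config: dict) -> None:
        # Index the same configuration under its name and every similar instruction.
        for phrase in [config["name"], *config.get("similar", [])]:
            self._by_phrase[phrase.strip().lower()] = config

    def find(self, utterance: str) -> Optional[dict]:
        # Exact-match lookup (an assumption; a real system would match semantically).
        return self._by_phrase.get(utterance.strip().lower())


store = SkillStore()
store.save({
    "name": "early peak mode",
    "similar": ["good morning"],
    "steps": ["listen to headline news", "play my favorite music"],
})
print(store.find("Good morning")["name"])  # early peak mode
```

Both the training engine (which writes) and the execution engine (which reads) would share one such store, which is why the patent treats it as a public module.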
Further, the custom voice skill training engine comprises:
a custom voice skill triggering module, which is responsible for responding to, arbitrating and analyzing the user's request to start defining a custom voice skill;
a custom voice skill training module, which is responsible for learning the specific flow of the user's custom voice skill through multi-turn dialogue;
a vehicle-mounted voice skill verification module, which is responsible for verifying the validity of the custom voice skill;
a custom voice skill generation module, which is responsible for converting the user's custom voice skill into a unified configuration protocol format;
a vehicle-mounted scene selection module, which is responsible for offering the user a preset selection of scenes in which the custom voice skill may be used;
the custom voice skill execution engine comprises:
a voice request input module, which is responsible for responding to and distributing the user's voice instructions;
an original voice skill analysis module, which is responsible for parsing the intentions of the voice skills built in at the factory;
a vehicle-mounted scene recognition module, which is responsible for recognizing the current vehicle-mounted voice interaction scene, including the dialogue context, the user position, the state of the in-vehicle system and the state of vehicle body components;
a vehicle-mounted voice skill arbitration module, which is responsible for arbitrating between custom voice skills and original voice skills;
a custom voice skill matching module, which is responsible for acquiring the corresponding custom voice skill configuration;
and a custom voice skill execution module, which is responsible for parsing the configuration and executing the related voice skill response flow.
A method for customizing vehicle-mounted voice skills uses the above system for customizing vehicle-mounted voice skills and comprises a custom voice skill training method and a custom voice skill using method, wherein the custom voice skill training method is carried out according to the following steps:
step 1, the user inputs a request to define a custom voice skill, and the system analyzes and judges the request and guides the user to make a selection;
step 2, the custom voice skill training flow is started, guiding the user to complete the configuration of a single step of the custom voice skill;
step 3, the system verifies the validity of the custom voice skill and judges whether it is supported; if it is not supported, the system reminds the user and guides the user to set a currently supported skill; the system asks the user whether the setup is complete or training should continue; if the user confirms completion, the training flow ends; if the user indicates that the setup is not complete or chooses to continue, the method returns to step 2;
step 4, the system generates and stores the custom voice skill configuration, pops up a skill confirmation interface, and asks the user whether to add similar instructions or to confirm completion; if the user chooses to add similar instructions, the method goes to step 2; if the user confirms completion, the method proceeds to the next step;
step 5, the new custom voice skill is generated, and the user is prompted to select a corresponding vehicle-mounted scene;
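The training steps above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `train_skill`, the sentinel answer `"done"`, the `is_supported` callback and the returned dictionary layout are all hypothetical, and the scripted list of answers replaces a real multi-turn voice dialogue.

```python
from typing import Callable, Iterable, List, Optional


def train_skill(name: str,
                answers: Iterable[str],
                is_supported: Callable[[str], bool]) -> Optional[dict]:
    """Collect one action per dialogue turn until the user ends training."""
    steps: List[str] = []
    for answer in answers:            # step 2: one single-step configuration per turn
        if answer == "done":          # hypothetical phrase that ends the training
            break
        if is_supported(answer):      # step 3: validity check
            steps.append(answer)
        # Unsupported requests are dropped here; the user would be asked again.
    if not steps:
        return None
    # Steps 4 and 5: produce the configuration; all scenes apply by default.
    return {"name": name, "steps": steps, "similar": [], "scenes": ["all"]}


supported = {"listen to headline news", "play my favorite music"}
config = train_skill(
    "early peak mode",
    ["listen to headline news", "make coffee", "play my favorite music", "done"],
    lambda request: request in supported,
)
print(config["steps"])  # ['listen to headline news', 'play my favorite music']
```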
the custom voice skill using method comprises the following steps:
step 6, the user inputs a voice instruction;
step 7, the system distributes the instruction and obtains the recognition results;
step 8, the system recognizes the current vehicle-mounted scene;
step 9, the system performs semantic arbitration, giving priority to the custom voice skill: if a custom voice skill matches and the scene is applicable, the method proceeds to the next step; if no custom voice skill matches, the original voice skill is selected; if neither a custom nor an original voice skill matches, the user is guided to train a new skill and the method goes to step 2;
step 10, the system acquires the custom voice skill configuration;
and step 11, the system executes the custom voice skill instruction.
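The arbitration in step 9 can be sketched as a three-way decision. The function below is a hedged illustration: the name `arbitrate`, the return labels and the `"scenes"` field are hypothetical, and falling back to the original skill when a custom skill exists but its scene does not apply is one reading of the step, not something the text states explicitly.

```python
from typing import Optional


def arbitrate(custom: Optional[dict], original: Optional[str], scene: str) -> str:
    """Return which branch of step 9 applies: 'custom', 'original' or 'train'."""
    if custom is not None and (
        "all" in custom["scenes"] or scene in custom["scenes"]
    ):
        return "custom"      # custom skill has priority when its scene applies
    if original is not None:
        return "original"    # fall back to the factory-installed skill
    return "train"           # nothing matched: guide the user to train a new skill


skill = {"name": "early peak mode", "scenes": ["navigation"]}
print(arbitrate(skill, None, "navigation"))       # custom
print(arbitrate(skill, "play music", "charging")) # original
print(arbitrate(None, None, "navigation"))        # train
```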
Further, in step 1, the system receives the request to define a custom voice skill through the custom voice skill triggering module and analyzes whether the requested skill already exists. If it already exists, the system asks the user whether it should be updated, and the custom voice skill training flow is entered after the user chooses to update; if it does not exist, the custom voice skill training flow is entered directly. The custom voice skill triggering module calls the custom voice skill display public module to obtain the custom voice skill training guide interface and corpus.
Further, in step 2, the custom voice skill training flow is a multi-turn dialogue, and the interactive interface and corpus used in the flow are obtained from the custom voice skill display public module.
Further, in step 4, the system calls the custom voice skill display public module through the custom voice skill generation module, obtains the skill definition confirmation interface and prompt, and converts the multi-turn dialogue into a custom voice skill configuration file or data format. The user can choose to add a similar voice instruction manually or by voice; if so, the custom voice skill configuration file or data format is updated after the user finishes entering the similar instruction. While defining a skill, the user can manually change whether each step must complete before the next step begins; waiting is required by default, and for certain specific skills the user is not allowed to modify the wait flag.
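The conversion described for step 4 can be illustrated with a small serializer. This is a sketch under assumptions: the patent does not specify the configuration format, so JSON and the field names `name`, `similar`, `steps`, `order`, `request` and `wait` are hypothetical, as is the function name `build_skill_config`.

```python
import json
from typing import Iterable, List


def build_skill_config(name: str, steps: List[str], similar: List[str],
                       no_wait: Iterable[int] = ()) -> str:
    """Serialize a trained dialogue into a configuration document (JSON assumed)."""
    skip_wait = set(no_wait)
    config = {
        "name": name,
        "similar": similar,           # similar instructions added by the user
        "steps": [
            # "wait" defaults to True: finish this step before starting the next.
            {"order": i, "request": request, "wait": i not in skip_wait}
            for i, request in enumerate(steps, start=1)
        ],
    }
    return json.dumps(config, indent=2)


text = build_skill_config(
    "early peak mode",
    ["listen to headline news", "play my favorite music"],
    ["good morning"],
    no_wait=[2],
)
print(text)
```

A system that forbids changing the wait flag for certain skills could simply ignore `no_wait` entries for those steps; that policy is left out of the sketch.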
Further, in step 5, the custom voice skill generation module calls the vehicle-mounted scene selection module to select an applicable scene for the current custom voice skill. By default the skill applies to all scenes; the user can also select a sub-scene, and sub-scenes can be subdivided according to the identity of the user. The custom voice skill generation module stores the path of the generated configuration file, or the configuration data, in the custom voice skill management public module.
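The scene rule in step 5 (all scenes by default, optionally narrowed to a sub-scene) can be sketched as a constraint check. The function name `scene_applies` and the key `identity` are hypothetical; the patent only states that sub-scenes can be subdivided by user identity and does not define a data layout.

```python
from typing import Dict


def scene_applies(skill_scene: Dict[str, str], current: Dict[str, str]) -> bool:
    """True when every constraint the skill declares matches the current scene."""
    if not skill_scene:
        return True  # no constraints declared: the skill applies in all scenes
    return all(current.get(key) == value for key, value in skill_scene.items())


current_scene = {"identity": "owner"}
print(scene_applies({}, current_scene))                         # True
print(scene_applies({"identity": "owner"}, current_scene))      # True
print(scene_applies({"identity": "passenger"}, current_scene))  # False
```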
Further, in step 7, the system distributes the user's voice instruction to the custom voice skill matching module and the original voice skill analysis module at the same time, and obtains the results returned by both modules.
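The simultaneous distribution in step 7 can be sketched with a thread pool from the Python standard library. The function name `dispatch` and the two callbacks are hypothetical placeholders for the custom voice skill matching module and the original voice skill analysis module; using threads is an assumption about how "at the same time" might be realized.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional, Tuple

Recognizer = Callable[[str], Optional[str]]


def dispatch(utterance: str, match_custom: Recognizer,
             parse_original: Recognizer) -> Tuple[Optional[str], Optional[str]]:
    """Send one utterance to both recognizers concurrently and collect both results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        custom = pool.submit(match_custom, utterance)
        original = pool.submit(parse_original, utterance)
        return custom.result(), original.result()


results = dispatch(
    "early peak mode",
    lambda u: u if u == "early peak mode" else None,  # custom skill matches
    lambda u: None,                                   # original skills do not
)
print(results)  # ('early peak mode', None)
```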
Further, in step 8, the system recognizes the current scene through the vehicle-mounted scene recognition module according to the user position and the system state, and sends the scene, together with the recognition results received in the previous step, to the vehicle-mounted voice skill arbitration module.
Further, in step 10, the system reads the specific configuration file content of the custom voice skill from the custom voice skill management public module through the custom voice skill matching module, and sends that content to the custom voice skill execution module.
In summary, the technical solution of the invention has the following beneficial effects:
The invention solves the problems that existing vehicle-mounted voice control systems have essentially no personalized voice control skills specific to a user, make product differentiation difficult, lack awareness of the real in-vehicle scene, cannot cover every user's phrasing, do not let the user switch to a preferred APP or service provider, and, although they offer functions such as personalized TTS settings, only permit the user to use the voice skills without any entry point for adding personalized voice functions. The invention has the following advantages:
(1) through custom voice skills, the user can experience functions that the original vehicle-mounted voice control product does not provide, and the actual needs of individual users are better met;
(2) personalized linkage requirements in the in-vehicle scene can be met, and the user's preferred phrasing can be covered;
(3) the skill recognition rate, particularly the recognition rate of fuzzy intentions, is improved;
(4) users' desire to use voice control is stimulated, and their sense of participation and product experience are improved.
The innovation of the invention is that a user can directly create or modify personalized vehicle-mounted voice skills by training them through multi-turn voice dialogue, so that the vehicle-mounted voice function offers a different experience to each user. The custom voice skill training engine and execution engine are suited to scenes that existing vehicle-mounted voice control cannot recognize or in which it does not meet the user's expectations. The user can define personalized voice skills through the training engine: for scenes that cannot be recognized, the training engine actively guides the user through the entire training flow of the custom voice skill; for scenes that are recognized but do not meet expectations, the user can modify the original voice skill to produce a vehicle-mounted voice control skill of their own. When using the vehicle-mounted voice function, the user can use the original voice skills and, in trained scenes, their own skills. The method addresses the serious homogenization and lack of personalization of existing vehicle-mounted voice control, compensates for its incomplete intention recognition and its failure to meet user expectations, increases how often users interact with vehicle voice control, and improves the user stickiness of vehicle-mounted voice products. The method therefore makes up for the shortcomings of existing vehicle-mounted voice products, improves the personalized experience and intelligence of the product, and is highly practical.
By allowing the user to define voice skills, the invention covers scenes that the original vehicle-mounted voice function cannot cover and also realizes the user's personalized voice control functions. At the factory, the voice functions are fixed and limited in number; by allowing the user to define voice skills, the set of vehicle-mounted voice control skills can keep growing, and every user can have their own voice control functions, such as voice-controlled linkage scenes. Product differentiation becomes obvious, and intelligence and practicality are markedly improved. In addition, by defining custom voice skills, the user can train voice control functions that match their own language habits, which improves the skill recognition rate, particularly for fuzzy intentions, and can replace a service provider or APP of the original voice function that does not meet personal needs, so that personal preferences are fully satisfied. Finally, defining custom voice skills greatly stimulates the user's desire to use voice control and improves the user's sense of participation and product experience.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments of the present invention will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
FIG. 1 is a block diagram of a system for customizing vehicle-mounted speech skills in accordance with the present invention;
FIG. 2 is a flow chart of a method for customizing vehicle-mounted speech skills in accordance with the present invention.
Description of reference numerals:
11-custom voice skill triggering module, 12-custom voice skill training module, 13-vehicle-mounted voice skill verification module, 14-custom voice skill generation module, 15-vehicle-mounted scene selection module, 21-voice request input module, 22-original voice skill analysis module, 23-vehicle-mounted scene recognition module, 24-vehicle-mounted voice skill arbitration module, 25-custom voice skill matching module, 26-custom voice skill execution module, 100-custom voice skill training engine, 200-custom voice skill execution engine, 300-custom voice skill management public module, 400-custom voice skill display public module.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in FIG. 1, a system for customizing vehicle-mounted voice skills comprises:
The custom voice skill training engine 100, which serves as the part in which the user defines personalized functions and is used for triggering, training, verifying and generating user-defined voice skills and for matching them to scenes.
The custom voice skill execution engine 200, which serves as the part in which the user uses personalized functions and is used for the input, parsing, scene recognition, semantic arbitration, matching and execution of the user's voice requests.
The custom voice skill management public module 300, which serves as the shared storage module for training and using custom voice skills, is responsible for the unified storage of the voice skill configurations generated by training, and provides the corresponding retrieval service when a custom voice skill is used.
The custom voice skill display public module 400, which serves as the shared display module for training and using custom voice skills and is responsible for interface interaction and dialogue corpus management during training and use.
The custom voice skill training engine 100, the custom voice skill execution engine 200, the custom voice skill management public module 300 and the custom voice skill display public module 400 are coupled to one another through software interfaces.
Further, the custom voice skill training engine 100 includes:
The custom voice skill triggering module 11, which is responsible for responding to, arbitrating and analyzing the user's request to start defining a custom voice skill.
The custom voice skill training module 12, which is responsible for learning the specific flow of the user's custom voice skill through multi-turn dialogue.
The vehicle-mounted voice skill verification module 13, which is responsible for verifying the validity of the custom voice skill.
The custom voice skill generation module 14, which is responsible for converting the user's custom voice skill into a unified configuration protocol format.
The vehicle-mounted scene selection module 15, which is responsible for offering the user a preset selection of scenes in which the custom voice skill may be used.
The custom voice skill execution engine 200 includes:
The voice request input module 21, which is responsible for responding to and distributing the user's voice instructions.
The original voice skill analysis module 22, which is responsible for parsing the intentions of the voice skills built in at the factory.
The vehicle-mounted scene recognition module 23, which is responsible for recognizing the current vehicle-mounted voice interaction scene, including the dialogue context, the user position, the state of the in-vehicle system and the state of vehicle body components.
The vehicle-mounted voice skill arbitration module 24, which is responsible for arbitrating between custom voice skills and original voice skills.
The custom voice skill matching module 25, which is responsible for acquiring the corresponding custom voice skill configuration.
The custom voice skill execution module 26, which is responsible for parsing the configuration and executing the related voice skill response flow.
As shown in FIG. 2, a method for customizing vehicle-mounted voice skills uses the above system for customizing vehicle-mounted voice skills and includes a custom voice skill training method and a custom voice skill using method, wherein the custom voice skill training method is performed according to the following steps:
Step S1, the user inputs a request to define a custom voice skill, and the system analyzes and judges the request and guides the user to make a selection;
(The system receives the request to define a custom voice skill through the custom voice skill triggering module 11 and analyzes whether the requested skill already exists. If it exists, the system asks the user whether it should be updated, and the custom voice skill training flow is entered after the user chooses to update; if it does not exist, the training flow is entered directly. The custom voice skill triggering module 11 calls the custom voice skill display public module 400 to obtain the custom voice skill training guide interface and corpus.)
Step S2, the custom voice skill training flow is started, guiding the user to complete the configuration of a single step of the custom voice skill; (The custom voice skill training flow is a multi-turn dialogue, and the interactive interface and corpus used in the flow are obtained from the custom voice skill display public module 400.)
Step S3, the system (through the vehicle-mounted voice skill verification module 13) verifies the validity of the custom voice skill and judges whether it is supported; if it is not supported, the system reminds the user and guides the user to set a currently supported skill; S3-1, the system asks the user whether the setup is complete or training should continue; S3-2, if the user confirms completion, the training flow ends; S3-3, if the user indicates that the setup is not complete or chooses to continue, the method returns to step S2;
(When verification fails, the user is allowed to retry 3 to 5 times; if the skill is still invalid, the system reports that defining the custom voice skill has failed.)
Step S4, the system generates and stores the custom voice skill configuration and pops up a skill confirmation interface; S4-1, the system asks the user whether to add similar instructions or to confirm completion; if the user chooses to add similar instructions, the method goes to step S2; if the user confirms completion, the method proceeds to the next step;
(The system calls the custom voice skill display public module 400 through the custom voice skill generation module 14 to obtain the skill definition confirmation interface and prompt, and converts the multi-turn dialogue into a custom voice skill configuration file or data format. The user can choose to add a similar voice instruction manually or by voice, and the custom voice skill configuration file or data format is updated after the user finishes entering the similar instruction.)
Step S5, the new custom voice skill is generated, and the user is prompted to select a corresponding vehicle-mounted scene;
(The custom voice skill generation module 14 calls the vehicle-mounted scene selection module 15 to select an applicable scene for the current custom voice skill. By default the skill applies to all scenes; the user can also select a sub-scene. Sub-scenes can be subdivided according to the identity of the user, such as vehicle owner or front passenger; according to the current system user mode, such as adult mode or child mode; or according to the vertical skill of the system application, such as a navigation scene, a driving recording scene or a charging scene. The custom voice skill generation module 14 stores the path of the generated configuration file, or the configuration data, in the custom voice skill management public module 300.)
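The retry rule noted under step S3 can be sketched as a bounded loop. The function name `verify_with_retries` and its parameters are hypothetical; the patent gives a range of 3 to 5 retries, and the default of 3 total attempts used here is only one possible reading of that range.

```python
from typing import Callable


def verify_with_retries(get_request: Callable[[], str],
                        is_valid: Callable[[str], bool],
                        max_attempts: int = 3) -> bool:
    """Ask for a skill request up to max_attempts times; True once one is valid."""
    for _ in range(max_attempts):
        if is_valid(get_request()):
            return True
    return False  # the caller would report that defining the skill failed


attempts = iter(["make coffee", "fly the car", "play my favorite music"])
ok = verify_with_retries(lambda: next(attempts),
                         lambda request: request == "play my favorite music")
print(ok)  # True
```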
The user-defined voice skill using method comprises the following steps:
step S6, the user inputs a voice command (through the voice request input module 21);
step S7, obtaining a system distribution identification result; (the system simultaneously distributes the voice command input by the user to the custom voice skill matching module 25 and the original voice skill analysis module 22 to obtain the return results of the two modules.)
Step S8, the system identifies the current vehicle scene; (the system identifies the current scene according to the user position and the system state through the vehicle-mounted scene identification module 23, and sends the identification result received in the previous step to the vehicle-mounted voice skill arbitration module 24.)
Step S9, the system (through the vehicle-mounted voice skill arbitration module 24) performs semantic arbitration, the system preferentially selects the self-defined voice skill, if the scene is applicable S9-1, the next step is performed, if the scene is not matched with the system S9-2, the original semantic skill is selected, if the scene is not matched with the system S9-3, the user is guided to train a new skill, and the step S2 is switched to;
Step S10, the system acquires the custom voice skill configuration; (through the custom voice skill matching module 25, the system reads the content of the specific configuration file of the custom voice skill from the custom voice skill management public module 300 and sends it to the custom voice skill execution module 26.)
Step S11, the system executes the custom voice skill instruction (through the custom voice skill execution module 26).
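Steps S7 to S11 can be sketched as a small dispatch and arbitration routine. This is a simplified illustration under stated assumptions: the function names, the dictionary layout, and the `"start_training"` marker are hypothetical, and the reading of branches S9-2 and S9-3 (fall back to the original skill if it matches, otherwise guide the user to train) is an interpretation of the text rather than a confirmed detail.

```python
from typing import Dict, List


def arbitrate(custom_hit: bool, scene_ok: bool, original_hit: bool) -> str:
    """Step S9: prefer the custom skill, then the original skill, else train."""
    if custom_hit and scene_ok:
        return "custom"      # S9-1: custom skill exists and the scene applies
    if original_hit:
        return "original"    # S9-2: fall back to the factory skill
    return "train"           # S9-3: nothing usable, guide the user to train


def handle_command(command: str,
                   custom_skills: Dict[str, dict],
                   original_skills: Dict[str, str],
                   current_scene: str) -> List[str]:
    # S7: the command goes to both the custom matcher and the original parser.
    entry = custom_skills.get(command)
    original_hit = command in original_skills
    # S8: the recognized scene is passed in; "all" means the skill is general.
    scene_ok = entry is not None and entry["scene"] in ("all", current_scene)
    decision = arbitrate(entry is not None, scene_ok, original_hit)
    if decision == "custom":
        return list(entry["steps"])          # S10 and S11: fetch config, run steps
    if decision == "original":
        return [original_skills[command]]
    return ["start_training"]                # hypothetical marker for "go to S2"


custom = {"early peak mode": {"scene": "navigation",
                              "steps": ["listen to headline news",
                                        "play my favorite music"]}}
original = {"play music": "music.play"}
print(handle_command("early peak mode", custom, original, "navigation"))
```

Keeping `arbitrate` as a pure function of three booleans makes the priority order easy to test separately from recognition and execution.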
Taking the training of a custom "early peak mode" voice skill as an example, the process is as follows:
The user says "early peak mode". The vehicle-mounted voice assistant does not yet have this skill, so it prompts the user: "Sorry, I can't do that yet. Can you teach me?" If the user answers "yes", the custom voice skill training process starts directly. The assistant asks "What would you like to do in the first step?" and the user answers "listen to headline news". Assuming the assistant has a news skill, it replies "OK, noted. Do you want to continue?" The user answers "continue", and the assistant asks "What should I do next?" The user answers "play my favorite music". Assuming the assistant has a music skill, it replies "OK, noted. Do you want to continue?" The user answers "no", and the assistant shows the user the record of the whole dialogue. Suppose the user then adds the similar instruction "good morning", manually or by voice, and taps to confirm; the assistant converts the multi-turn dialogue above into configuration file content in the following format:
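[Figure BDA0002706325660000171, Figure BDA0002706325660000181: custom voice skill configuration file content]
If the user selects the default scene, the content format of the custom voice skill configuration file is as follows:
[Figure BDA0002706325660000182, Figure BDA0002706325660000191: custom voice skill configuration file content with the default scene]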
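The conversion of a multi-turn training dialogue into a configuration record can be illustrated with a rough sketch. The patent's actual configuration format appears only in its figures, so every field name here (`name`, `similar`, `scene`, `steps`, `order`, `domain`, `utterance`, `wait`) is a hypothetical choice made for illustration, as is the `build_config` helper; only the wait-by-default behaviour and the default "all scenes" choice come from the text.

```python
import json
from typing import Dict, Iterable, List, Tuple


def build_config(name: str,
                 answers: Iterable[str],
                 supported: Dict[str, str],
                 similar: Iterable[str] = (),
                 scene: str = "all") -> Tuple[dict, List[str]]:
    """Turn the user's step-by-step answers into one configuration record.

    `supported` maps an utterance to the skill domain that can serve it and
    stands in for the verification step; unsupported answers are returned
    separately so the user can be guided to a supported skill.
    """
    steps, rejected = [], []
    for answer in answers:
        domain = supported.get(answer)
        if domain is None:
            rejected.append(answer)
            continue
        steps.append({
            "order": len(steps) + 1,
            "domain": domain,
            "utterance": answer,
            "wait": True,  # by default each step finishes before the next starts
        })
    config = {"name": name, "similar": list(similar), "scene": scene, "steps": steps}
    return config, rejected


supported = {"listen to headline news": "news", "play my favorite music": "music"}
config, rejected = build_config(
    "early peak mode",
    ["listen to headline news", "play my favorite music"],
    supported,
    similar=["good morning"],
)
print(json.dumps(config, indent=2))
```

Returning the rejected answers instead of raising an error mirrors the guided nature of the training dialogue, where an unsupported request leads to a prompt rather than a failure.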
Taking the use of the custom "early peak mode" voice skill as an example, the process is as follows:
The user says "early peak mode". This is a custom skill of the vehicle-mounted voice assistant that the original voice skill analysis module does not support, so the returned results show that the custom voice skill was recognized successfully, while the original voice skill analysis module either failed to recognize the command or classified it as a non-vertical skill such as chat. When the assistant recognizes that the current user is the vehicle owner and that the current scene is navigating to the company, it sends the recognized user identity and application scene information to the arbitration module, together with the information that the early peak mode skill exists among the custom skills but not among the original voice skills. From this input, the assistant judges that the user currently wants to execute the custom early peak mode skill. The voice skill execution flow for early peak mode has the following format:
[Figure BDA0002706325660000201: voice skill execution flow for early peak mode]
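Executing a stored skill step by step can be sketched as follows. The execution flow format itself is shown only in the figure, so this is a hypothetical illustration: the record layout, the `execute_skill` function, and the handler functions are assumptions. The per-step wait flag follows the behaviour described in claim 6, where waiting for each step to finish is the default.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def execute_skill(config: dict, handlers: Dict[str, Callable[[str], str]]) -> List[str]:
    """Run the steps of a custom skill in their configured order.

    A step whose "wait" flag is true (the default) must finish before the
    next step is started; a step with "wait" false is left running in the
    background while later steps start.
    """
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = []
        for step in sorted(config["steps"], key=lambda s: s["order"]):
            future = pool.submit(handlers[step["domain"]], step["utterance"])
            if step.get("wait", True):
                future.result()  # block here until this step has completed
            futures.append(future)
        # Collect results in step order once everything has finished.
        return [f.result() for f in futures]


# Hypothetical handlers standing in for the news and music skills.
handlers = {
    "news": lambda utterance: f"news: {utterance}",
    "music": lambda utterance: f"music: {utterance}",
}
config = {"steps": [
    {"order": 2, "domain": "music", "utterance": "play my favorite music", "wait": True},
    {"order": 1, "domain": "news", "utterance": "listen to headline news", "wait": True},
]}
print(execute_skill(config, handlers))
```

Sorting by `order` before dispatch means the stored record does not need to keep its steps in sequence.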
In summary, the technical solution of the invention has the following beneficial effects:
The invention addresses the following problems of existing vehicle-mounted voice control systems: they offer essentially no personalized voice control skills specific to an individual user; they make product differentiation difficult; they lack awareness of the real in-vehicle scene; they cannot cover the different phrasings used by all users; they cannot switch to the user's preferred APP or service provider; and although they offer functions such as personalized TTS settings, the user only has permission to use voice skills and is given no entry point for adding personalized voice functions. The invention has the following advantages:
(1) by customizing voice skills, the user can experience functions that the original vehicle-mounted voice control product does not provide, so that the actual needs of individual users are better met;
(2) personalized linkage requirements in vehicle-mounted scenes can be met, and the user's preferred phrasings can be covered;
(3) the skill recognition rate is improved, particularly for ambiguous intents;
(4) users' desire to use voice is stimulated, which improves their sense of participation and their product experience.
The innovation of the invention is that a user can directly create or modify personalized vehicle-mounted voice skills by training them through multi-turn voice dialogue, so that the vehicle-mounted voice function offers each user an individually tailored experience. The custom vehicle-mounted voice skill training engine and execution engine are suited to scenes that existing vehicle-mounted voice control cannot recognize or handles in a way that does not meet the user's expectations. The user can define personalized voice skills through the training engine: for scenes that cannot be recognized, the training engine actively guides the user through the whole training process of the custom voice skill; for scenes that are recognized but do not meet expectations, the user can modify the original voice skill to produce a vehicle-mounted voice control skill exclusive to that user. When using the vehicle-mounted voice function, the user can use the original voice skills and, in the trained scenes, the user's own exclusive skills. The method addresses the pain points of severe homogenization and lack of personalization in existing vehicle-mounted voice control functions, compensates for incomplete intent recognition and results that do not meet user expectations, increases how often users interact with in-vehicle voice control, and improves the user stickiness of vehicle-mounted voice products. It therefore compensates for the shortcomings in the experience offered by existing vehicle-mounted voice products, improves the personalized experience and intelligence of the product, and is highly practical.
By allowing the user to define voice skills, the invention can cover scenes that the original vehicle-mounted voice function cannot, and can also provide the user with personalized voice control functions. At the factory, the voice functions are fixed and limited in number; by allowing the user to define voice skills, the set of vehicle-mounted voice control skills can grow without a fixed limit, and every user can have voice control functions of their own, such as voice-linked control scenes, so that product differentiation is clear and intelligence and practicality are noticeably improved. In addition, by customizing voice skills, the user can train voice control functions that match the user's own language habits, which improves the skill recognition rate, particularly for ambiguous intents, and the user can change a service provider or APP of the original voice function that does not meet personal needs, so that personal preferences are fully satisfied. Finally, customizing voice skills can greatly stimulate the user's desire to use voice and improves the user's sense of participation and product experience.
The above-described embodiments do not limit the scope of the present invention. Any modification, equivalent replacement, and improvement made within the spirit and principle of the above-described embodiments should be included in the protection scope of the technical solution.

Claims (10)

1. A system for customizing vehicle-mounted voice skills, comprising:
a custom voice skill training engine, which serves as the part in which the user defines personalized functions and is used for triggering, training, verifying, and generating custom voice skills and for matching them to scenes;
a custom voice skill execution engine, which serves as the part in which the user uses personalized functions and is used for the input, analysis, scene recognition, semantic arbitration, matching, and execution of the user's voice requests;
a custom voice skill management public module, which serves as the common storage module for training and using custom voice skills, and is responsible for uniformly storing the configurations of voice skills generated by training and for providing the corresponding retrieval service when custom voice skills are used;
a custom voice skill display public module, which serves as the common display module for training and using custom voice skills, and is responsible for interface interaction and dialogue corpus management during the training and use of custom voice skills;
wherein the custom voice skill training engine, the custom voice skill execution engine, the custom voice skill management public module, and the custom voice skill display public module are coupled to one another through software interfaces.
2. The system for customizing vehicle-mounted voice skills according to claim 1, wherein the custom voice skill training engine comprises:
a custom voice skill triggering module, responsible for responding to, arbitrating, and analyzing the user's request to start a custom voice skill;
a custom voice skill training module, responsible for learning the specific procedure of the user's custom voice skill through multi-turn dialogue;
a vehicle-mounted voice skill verification module, responsible for verifying the validity of the custom voice skill;
a custom voice skill generation module, responsible for converting the custom voice skill into a uniform configuration protocol format;
a vehicle-mounted scene selection module, responsible for providing the user with preset selectable usage scenes for custom voice skills;
and the custom voice skill execution engine comprises:
a voice request input module, responsible for responding to and distributing the user's voice commands;
an original voice skill analysis module, responsible for analyzing the intents of the voice skills provided at the factory;
a vehicle-mounted scene recognition module, responsible for recognizing the current vehicle-mounted voice interaction scene, including the conversation context, the user position, the state of the in-vehicle head unit system, and the state of vehicle body components;
a vehicle-mounted voice skill arbitration module, responsible for making the arbitration decision between custom voice skills and original voice skills;
a custom voice skill matching module, responsible for acquiring the corresponding custom voice skill configuration;
and a custom voice skill execution module, responsible for parsing the configuration and executing the relevant voice skill response process.
3. A method for customizing vehicle-mounted voice skills, which uses the system for customizing vehicle-mounted voice skills according to any one of claims 1 to 2, characterized by comprising a custom voice skill training method and a custom voice skill using method, wherein the custom voice skill training method is carried out according to the following steps:
step 1, the user inputs a custom voice skill request, and the system analyzes it and guides the user to make a selection;
step 2, the custom voice skill training process is started to guide the user through the configuration of a single step of the custom voice skill;
step 3, the system verifies the validity of the custom voice skill by judging whether it is supported, reminds and guides the user to set a currently supported skill, and asks the user whether the setting is finished or training should continue; if the user confirms that training is finished, the training process ends, and if the user indicates that training is not finished or should continue, the process returns to step 2;
step 4, the system configures and stores the custom voice skill, displays a skill confirmation interface, and asks the user whether to add similar instructions and whether to confirm completion; if completion is not confirmed, the process returns to step 2, and if it is confirmed, the process proceeds to the next step;
step 5, a new custom voice skill is generated and the user is prompted to select the corresponding vehicle-mounted scene;
the custom voice skill using method comprises the following steps:
step 6, the user inputs a voice command;
step 7, the system distributes the command and obtains the recognition results;
step 8, the system recognizes the current vehicle-mounted scene;
step 9, the system performs semantic arbitration, preferentially selecting the custom voice skill: if the scene is applicable, the process proceeds to the next step; if the scene is not applicable but the original system matches the command, the original voice skill is selected; if the scene is not applicable and the original system does not match the command either, the user is guided to train a new skill and the process goes to step 2;
step 10, the system acquires the custom voice skill configuration;
and step 11, the system executes the custom voice skill instruction.
4. The method for customizing vehicle-mounted voice skills according to claim 3, wherein: in step 1, the system receives the custom voice skill request through the custom voice skill triggering module and analyzes whether the requested skill is an existing skill; if it is an existing skill, the system asks the user whether it should be updated, and the custom voice skill training process is entered after the user chooses to update; if it is not an existing skill, the custom voice skill training process is entered directly; the custom voice skill triggering module calls the custom voice skill display public module to obtain the custom voice skill training guide interface and corpus.
5. The method for customizing vehicle-mounted voice skills according to claim 3, wherein: in step 2, the custom voice skill training process is a multi-turn dialogue process, and the interactive interface and corpus used in the process are obtained from the custom voice skill display public module.
6. The method for customizing vehicle-mounted voice skills according to claim 3, wherein: in step 4, the system calls the custom voice skill display public module through the custom voice skill generation module, obtains the skill definition confirmation interface and prompts, and converts the multi-turn dialogue process into a custom voice skill configuration file or data format; the user can choose to add a similar voice command by manual or voice input, and if the user does so, the custom voice skill configuration file or data format is updated after the user finishes inputting the similar command; the user can manually change whether each step of the custom skill must finish before the next step starts; waiting is required by default, and for certain specific skills the user is not allowed to modify the wait flag.
7. The method for customizing vehicle-mounted voice skills according to claim 3, wherein: in step 5, the custom voice skill generation module calls the vehicle-mounted scene selection module to select an applicable scene for the current custom voice skill; by default the skill applies to all scenes, and the user can select a sub-scene, wherein sub-scenes can be subdivided according to the identity of the user; and the custom voice skill generation module stores the generated configuration file path or data in the custom voice skill management public module.
8. The method for customizing vehicle-mounted voice skills according to claim 3, wherein: in step 7, the system distributes the user's voice command simultaneously to the custom voice skill matching module and the original voice skill analysis module and obtains the results returned by both modules.
9. The method for customizing vehicle-mounted voice skills according to claim 3, wherein: in step 8, the system recognizes the current scene through the vehicle-mounted scene recognition module according to the user position and the system state, and sends it, together with the recognition results received in the previous step, to the vehicle-mounted voice skill arbitration module.
10. The method for customizing vehicle-mounted voice skills according to claim 3, wherein: in step 10, the system reads the content of the specific configuration file of the custom voice skill from the custom voice skill management public module through the custom voice skill matching module and sends it to the custom voice skill execution module.
CN202011039892.8A 2020-09-28 2020-09-28 System and method for self-defining vehicle-mounted voice technology Pending CN112309373A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011039892.8A CN112309373A (en) 2020-09-28 2020-09-28 System and method for self-defining vehicle-mounted voice technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011039892.8A CN112309373A (en) 2020-09-28 2020-09-28 System and method for self-defining vehicle-mounted voice technology

Publications (1)

Publication Number Publication Date
CN112309373A true CN112309373A (en) 2021-02-02

Family

ID=74488876

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011039892.8A Pending CN112309373A (en) 2020-09-28 2020-09-28 System and method for self-defining vehicle-mounted voice technology

Country Status (1)

Country Link
CN (1) CN112309373A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113380246A (en) * 2021-06-08 2021-09-10 阿波罗智联(北京)科技有限公司 Instruction execution method, related device and computer program product
CN114049894A (en) * 2022-01-11 2022-02-15 广州小鹏汽车科技有限公司 Voice interaction method and device, vehicle and storage medium
CN114089741A (en) * 2021-10-16 2022-02-25 南昌大学 Mobile device with user-defined voice and intelligent efficient and accurate tracking function
EP4086580A1 (en) * 2021-08-27 2022-11-09 Guangzhou Xiaopeng Motors Technology Co., Ltd. Voice interaction method, apparatus and system, vehicle, and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102842306A (en) * 2012-08-31 2012-12-26 深圳Tcl新技术有限公司 Voice control method and device as well as voice response method and device
CN104065882A (en) * 2014-06-23 2014-09-24 惠州Tcl移动通信有限公司 Mobile terminal photographing control method and system on basis of intelligent wearing equipment
CN105845136A (en) * 2015-01-13 2016-08-10 中兴通讯股份有限公司 Voice control method and device, and terminal
CN106773817A (en) * 2016-12-01 2017-05-31 北京光年无限科技有限公司 A kind of command analysis method and robot for intelligent robot
CN110211584A (en) * 2019-06-04 2019-09-06 广州小鹏汽车科技有限公司 Control method for vehicle, device, storage medium and controlling terminal
CN110544471A (en) * 2019-09-09 2019-12-06 扬州莱诺汽车科技有限公司 Intelligent control device for vehicle-mounted electric appliance
CN111063353A (en) * 2019-12-31 2020-04-24 苏州思必驰信息科技有限公司 Client processing method allowing user-defined voice interactive content and user terminal

Similar Documents

Publication Publication Date Title
CN112309373A (en) System and method for self-defining vehicle-mounted voice technology
CN107199971B (en) Vehicle-mounted voice interaction method, terminal and computer readable storage medium
CN107204185B (en) Vehicle-mounted voice interaction method and system and computer readable storage medium
CN109493871A (en) The multi-screen voice interactive method and device of onboard system, storage medium and vehicle device
CN109410927A (en) Offline order word parses the audio recognition method combined, device and system with cloud
CN109036374B (en) Data processing method and device
CN111145721A (en) Personalized prompt language generation method, device and equipment
CN102439661A (en) Service oriented speech recognition for in-vehicle automated interaction
Geutner et al. Design of the VICO Spoken Dialogue System: Evaluation of User Expectations by Wizard-of-Oz Experiments.
CN106030701A (en) Method for acquiring at least two pieces of information to be acquired, comprising information content to be linked, using a speech dialogue device, speech dialogue device, and motor vehicle
CN104144192A (en) Voice interaction method and device and vehicle-mounted communication terminal
CN112735387A (en) User-defined vehicle-mounted voice skill system and method
DE102018126525A1 (en) In-vehicle system, procedure and storage medium
CN111816189A (en) Multi-tone-zone voice interaction method for vehicle and electronic equipment
CN112185379A (en) Voice interaction method and device, electronic equipment and storage medium
CN111833875A (en) Embedded voice interaction system
Huang et al. A study on the application of voice interaction in automotive human machine interface experience design
CN114005447A (en) Voice conversation interaction method, device, vehicle and medium
CN113205811A (en) Conversation processing method and device and electronic equipment
CN112035632A (en) Preferred distribution method and system suitable for multi-conversation robot collaboration task
Carlson et al. Application of speech recognition technology to ITS advanced traveler information systems
DE102018200088B3 (en) Method, device and computer-readable storage medium with instructions for processing a voice input, motor vehicle and user terminal with a voice processing
CN113113002A (en) Vehicle voice interaction method and system and voice updating system
CN115565532B (en) Voice interaction method, server and computer readable storage medium
CN113990322B (en) Voice interaction method, server, voice interaction system and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination