WO2017128775A1

WO2017128775A1 - Voice control system, voice processing method and terminal device

Info

Publication number: WO2017128775A1
Application number: PCT/CN2016/102605
Authority: WO
Inventors: 李向阳
Original assignee: 中兴通讯股份有限公司
Priority date: 2016-01-28
Filing date: 2016-10-19
Publication date: 2017-08-03
Also published as: CN107018228A; CN107018228B

Abstract

Provided are a voice control system, a voice processing method and a terminal device. The voice control system is carried on a terminal device, and the terminal device also carries a plurality of different voice service applications. The voice control system comprises a configuration module and a plurality of voice engine modules. The configuration module is arranged to bind a voice service application to at least one voice engine module according to binding requests from different voice service applications; and the voice engine module is arranged to process input information input into the voice service application, and output a processing result to the corresponding voice service application, so that the voice service application conducts voice control by utilizing the processing result. By providing a voice control system, the embodiments of the present invention provide uniform voice service support for a plurality of voice service applications carried on the same terminal device, so as to satisfy the differing demands of the various voice service applications while reducing resource occupation and improving efficiency.

Description

Voice control system, voice processing method and terminal device

Technical field

The present invention relates to the field of communications technologies, and in particular, to a voice control system, a voice processing method, and a terminal device.

Background technique

With the rapid development of mobile communication technology, the fourth generation of digital communication (4G) has become widespread, mobile terminals have become a necessity of daily life, and the hardware configuration of intelligent mobile terminals keeps improving. Their functions have become extremely complex, and the number of services they carry has increased rapidly. On the one hand, this meets the diverse needs of users: users can obtain a huge amount of information from a small mobile terminal, which satisfies the diverse needs of different user groups. On the other hand, the more functions are embedded in a mobile terminal and the more powerful its modules are, the more complicated its control becomes and the more cumbersome the control process is, which brings great trouble and inconvenience to the user. Intelligent voice technology has great advantages in solving such problems, and can greatly improve the experience of human-computer interaction. Therefore, there are more and more voice products based on embedded terminals.

At present, the voice products based on embedded terminals in the prior art are independent of one another, each including both the voice service and the upper-layer service logic. When the terminal supports multiple voice applications, they occupy a large amount of resources. In addition, current voice service support is generally closed and has a high technical threshold, which greatly reduces the convenience of developing and using it, and also makes differentiated voice services impossible. That is, the current terminal voice service applications are independent of one another, the business logic and the corresponding voice function support are coupled together, and the functional scope is relatively fixed. Even if different voice service software on the same terminal is supported by the same voice engine, the instances are still independent of each other.

Summary of the invention

An object of the present invention is to provide a voice control system, a voice processing method, and a terminal device, which solve the problem that multiple voice applications on a terminal device are independent of each other and occupy a large amount of resources.

In order to achieve the above object, an embodiment of the present invention provides a voice control system, where the voice control system is mounted on a terminal device, and the terminal device is further equipped with a plurality of different voice service applications, where the voice control system includes: a configuration module and a plurality of speech engine modules; wherein

The configuration module is configured to bind the voice service application to at least one voice engine module according to a binding request of a different voice service application;

The voice engine module is configured to process input information input to the voice service application, and output the processing result to a corresponding voice service application, so that the voice service application uses the processing result to perform voice control.

The voice control system further includes:

a business process component module connected to the voice engine module and the configuration module, the business process component module configured to perform logic control of the business process interaction between the voice engine module, the configuration module, and the voice service application.

The speech engine module is a speech recognition ASR module, a speech synthesis TTS module, a natural semantic understanding NLU module, or a voiceprint recognition VPR module.

The voice control system further includes:

one or more of: a voice recognition interface corresponding to the voice recognition ASR module and the natural semantic understanding NLU module, a voice synthesis interface corresponding to the voice synthesis TTS module, and a voiceprint recognition interface corresponding to the voiceprint recognition VPR module.

The voice control system further includes:

An external interface corresponding to the business process component module.

The embodiment of the present invention further provides a voice processing method for multiple voice service applications, where the multiple voice service applications are installed on the same terminal device, and the voice processing method includes:

Binding the voice service application according to a binding request of a different voice service application;

For the bound voice service application, processing input information input to the voice service application, and outputting the processing result to the corresponding voice service application, so that the voice service application uses the processing result to perform voice control.

The plurality of voice service applications are in an active state at different times.

The voice service includes a voice recognition ASR service, a voice synthesis TTS service, a natural semantic understanding NLU service, or a voiceprint recognition VPR service.

The embodiment of the present invention further provides a terminal device, including a voice control system, where the voice control system is mounted on the terminal device, the terminal device is further equipped with a plurality of different voice service applications, and the voice control system includes: a configuration module and a plurality of speech engine modules; wherein

The voice control system further includes:

The above technical solutions of the embodiments of the present invention have at least the following beneficial effects:

The voice control system, the voice processing method, and the terminal device in the embodiment of the present invention provide unified voice service support for multiple voice service applications on the same terminal device by providing a voice control system, thereby satisfying the differing needs of each voice service application while reducing resource occupation and improving efficiency.

DRAWINGS

FIG. 1 is a schematic structural diagram of a voice control system according to an embodiment of the present invention;

FIG. 2 is a flow chart showing the basic steps of a voice processing method according to an embodiment of the present invention;

FIG. 3 is a state transition diagram of speech recognition in a voice control system according to an embodiment of the present invention;

FIG. 4 is a state transition diagram of speech synthesis in a voice control system according to an embodiment of the present invention.

Detailed description

The technical problems, the technical solutions, and the advantages of the embodiments of the present invention will be more clearly described in the following description.

To address the problem that multiple voice applications on a terminal device are independent of each other and occupy a large amount of resources, the embodiment of the present invention provides a voice control system, a voice processing method, and a terminal device. By providing a voice control system, unified voice service support is provided for a plurality of voice service applications on the same terminal device, so as to meet the differing requirements of each voice service application while reducing resource occupation and improving efficiency.

As shown in FIG. 1, an embodiment of the present invention provides a voice control system, where the voice control system is mounted on a terminal device, the terminal device is further equipped with a plurality of different voice service applications, and the voice control system includes: a configuration module 10 and a plurality of voice engine modules 20; wherein

The configuration module 10 is configured to bind the voice service application to at least one voice engine module according to a binding request of a different voice service application;

The voice engine module 20 is configured to process input information input to the voice service application, and output the processing result to a corresponding voice service application, so that the voice service application uses the processing result to perform voice control.

In the foregoing embodiment of the present invention, the configuration module 10 mainly implements the configurability of the voice control system: the voice engines of the voice platform system can be configured according to different demand scenarios, and the combination may be configured to support only one of the voice engine modules 20, or any subset of the optional voice engine modules. The voice control system can also be configured for the voice language, so that the supported voice services can be configured according to the needs of different regions to realize localization of the voice application. Voice service application software at the upper layer that needs voice functions binds the voice control system when it starts, according to the voice functions it implements. For example, an application that only needs voice recognition only needs to be bound to the voice recognition module (a type of voice engine module), and the entire function from audio input to recognition result output can be realized through the voice recognition module. The voice service application only needs to use the recognition results to process its control logic.
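The binding behaviour described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `ConfigModule`, its methods, the application identifier, and the string-based toy engines are all hypothetical, and real engines would operate on audio rather than strings.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# Hypothetical engine type: any callable that turns input into a result.
Engine = Callable[[str], str]


@dataclass
class ConfigModule:
    """Illustrative stand-in for configuration module 10."""
    engines: Dict[str, Engine]  # engine modules configured on this platform
    bindings: Dict[str, Set[str]] = field(default_factory=dict)

    def bind(self, app_id: str, requested: List[str]) -> None:
        """Bind an application to at least one configured engine."""
        if not requested:
            raise ValueError("a binding request must name at least one engine")
        missing = [name for name in requested if name not in self.engines]
        if missing:
            raise KeyError(f"engines not configured on this platform: {missing}")
        self.bindings.setdefault(app_id, set()).update(requested)

    def process(self, app_id: str, engine_name: str, data: str) -> str:
        """Run input through a bound engine and return the result to the app."""
        if engine_name not in self.bindings.get(app_id, set()):
            raise PermissionError(f"{app_id} is not bound to {engine_name}")
        return self.engines[engine_name](data)


# Toy engines for illustration only.
config = ConfigModule(engines={
    "asr": lambda audio: f"text({audio})",
    "tts": lambda text: f"audio({text})",
})
config.bind("voice_assistant", ["asr"])  # this app only needs recognition
result = config.process("voice_assistant", "asr", "call-home.pcm")
```

In this sketch an application that bound only the recognition engine cannot reach the synthesis engine, which mirrors the idea that each application binds just the engines it needs.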

Optionally, the voice control system in the foregoing embodiment of the present invention further includes:

a business process component module 30 connected to the voice engine module 20 and the configuration module 10, the business process component module 30 being configured to perform logic control of the business process interaction between the voice engine module 20, the configuration module 10, and the voice service application.

The business process component module 30 provided by the above embodiment of the present invention includes standard process components for voice functions commonly used on terminal devices. In addition to supporting the functions provided by the plurality of voice engine modules 20, these components also include business process interaction logic control for other commonly used terminal device functions. As shown in FIG. 1, the business process component module 30 includes a plurality of business process components; one business application of the terminal device may correspond to one or more business process components, and one business process component may also serve one or more terminal device business applications, which is not specifically limited herein.

Specifically, in the foregoing embodiment of the present invention, the voice engine module is a speech recognition (ASR) module, a speech synthesis (TTS) module, a natural semantic understanding (NLU) module, or a voiceprint recognition (VPR) module. Speech recognition (ASR) module: the speech recognition module mainly analyzes and recognizes the audio recording input by the user through algorithms such as pattern recognition, finally outputs the recognition result in an agreed text format, and ends the recognition. The speech recognition module includes a voice wake-up submodule, which is configured to continuously recognize the wake-up words preset by the user. As in normal recognition, the voice wake-up submodule analyzes and recognizes the audio input by the user against the wake-up words; after returning the text result in the agreed format, it immediately starts the next round of recording monitoring, so that the user can input audio for recognition at any time.

The speech synthesis TTS module: the speech synthesis module mainly associates the text data with the audio data according to the text data stream input by the user, and finally synthesizes the input text data stream into an audio data stream output.

Natural semantic understanding (NLU) module: the user's audio input is recognized, and on the basis of the recognition result, further semantic analysis is performed to obtain the real intention of the user's utterance, and further information content resources are provided according to the user's intention.

Voiceprint recognition (VPR) module: the voiceprint recognition module first performs data collection and feature extraction on the audio data input by the user, extracts the user's audio features and related parameters and saves them, and then matches and authenticates the user's audio input against them. It is mainly used in user security scenarios.
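The continuous monitoring behaviour of the voice wake-up submodule, described under the speech recognition (ASR) module above, can be sketched as a loop that returns a result and then immediately moves on to the next recording. This is only an illustration under stated assumptions: the function name `wakeup_monitor`, the `WAKEUP:` result format, and the use of already-transcribed strings in place of audio are all hypothetical.

```python
from typing import Iterable, Iterator, Set


def wakeup_monitor(utterances: Iterable[str], wake_words: Set[str]) -> Iterator[str]:
    """Yield a result for each utterance that matches a preset wake-up word.

    `utterances` stands in for successive recordings that have already been
    recognized as text; a real submodule would work on audio frames.
    """
    for text in utterances:
        normalized = text.strip().lower()
        if normalized in wake_words:
            # Return the result in an assumed "agreed format"; the loop then
            # continues with the next recording, so monitoring never stops.
            yield f"WAKEUP:{normalized}"


heard = ["what time is it", "Hello Phone", "play music", "hello phone "]
events = list(wakeup_monitor(heard, {"hello phone"}))
```

Writing the monitor as a generator keeps the "return a result, then keep listening" behaviour explicit without needing threads in the example.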

Preferably, the voice control system in the above embodiment of the present invention further includes:

The voice control system provided by the embodiment of the present invention provides unified interfaces according to voice function: the speech recognition (ASR) function provides a unified voice recognition interface, the speech synthesis (TTS) function provides a unified voice synthesis interface, voice wake-up provides a unified voice wake-up interface, and voiceprint recognition (VPR) provides a unified voiceprint recognition interface.

Optionally, the voice control system provided by the embodiment of the present invention further provides an external interface corresponding to the service process component module 30.

Business application software at the upper layer that needs voice functions binds the voice control system when it starts, according to the voice functions it implements, and calls the corresponding voice function interfaces that it needs. For example, an application that only needs voice recognition can realize the whole function from audio input to recognition result output by calling the voice recognition interface; the application only needs to use the recognition result to process its control logic. Similarly, the application can also call multiple voice function module interfaces supported by the voice platform according to its own needs, to implement the corresponding voice functions. Further, the upper layer application software can also conveniently implement the voice function support and control logic of the corresponding service by calling the external interface corresponding to the business process component module 30 of the voice platform system.
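As a rough illustration of how an application might use the external interface of a business process component instead of calling each voice function interface itself, the Python sketch below chains recognition, an action, and a spoken confirmation. Everything in it is an assumption made for the example: the functions `recognize` and `synthesize`, the class `DialProcess`, the command format, and the contact data are hypothetical and operate on strings rather than audio.

```python
from typing import Callable, Dict, List


def recognize(audio: str) -> str:
    """Stand-in for the unified voice recognition interface (toy logic)."""
    return audio.rsplit(".", 1)[0].replace("-", " ")


def synthesize(text: str) -> str:
    """Stand-in for the unified voice synthesis interface (toy logic)."""
    return f"<audio:{text}>"


class DialProcess:
    """Hypothetical business process component: recognize, act, confirm."""

    def __init__(self, contacts: Dict[str, str], dial: Callable[[str], None]) -> None:
        self.contacts = contacts
        self.dial = dial

    def run(self, audio: str) -> str:
        command = recognize(audio)              # e.g. "call alice"
        verb, _, name = command.partition(" ")
        if verb != "call" or name not in self.contacts:
            return synthesize("sorry, I did not understand")
        self.dial(self.contacts[name])          # application-side action
        return synthesize(f"calling {name}")


dialed: List[str] = []
process = DialProcess({"alice": "555-0100"}, dialed.append)
prompt = process.run("call-alice.pcm")
```

The application only supplies its own data and the dialing action; the interaction logic between recognition and synthesis lives in the component, which is the division of labour the paragraph above describes.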

In summary, the voice control system provided by the embodiment of the present invention provides a unified voice service for the voice service applications on an intelligent terminal. All voice service applications on the terminal can obtain the corresponding voice service by calling the voice control system, without each having to contain a separate speech engine, which saves platform resources. At the same time, the configurability of the voice platform engines can meet the different requirements of different voice services, greatly facilitating the integration of different voice services and improving the user experience of the terminal.

As shown in FIG. 2, the embodiment of the present invention further provides a voice processing method for multiple voice service applications, where the multiple voice service applications are mounted on the same terminal device, and the voice processing method includes:

Step 21: Bind the voice service application according to a binding request of a different voice service application.

Step 22: For the bound voice service application, process input information input to the voice service application, and output the processing result to the corresponding voice service application, so that the voice service application uses the processing result to perform voice control.

Optionally, in the voice processing method provided by the embodiment of the present invention, the multiple voice service applications are in an active state at different time intervals.

Specifically, the voice service includes a voice recognition ASR service, a voice synthesis TTS service, a natural semantic understanding NLU service, or a voiceprint recognition VPR service. The plurality of voice services mentioned in the embodiments of the present invention are a combination of any two or more of the foregoing voice services.

Speech recognition (ASR) service: the speech recognition module mainly analyzes and recognizes the audio input by the user through algorithms such as pattern recognition, finally outputs the recognition result in an agreed text format, and ends the recognition. The speech recognition module includes a voice wake-up submodule, which is configured to continuously recognize the wake-up words preset by the user. As in normal recognition, the voice wake-up submodule analyzes and recognizes the audio input by the user against the wake-up words; after returning the text result in the agreed format, it immediately starts the next round of recording monitoring, so that the user can input audio for recognition at any time.

Speech synthesis TTS service: The speech synthesis module mainly associates the text data with the audio data according to the text data stream input by the user, and finally synthesizes the input text data stream into an audio data stream output.

Natural semantic understanding (NLU) service: the user's audio input is recognized, and on the basis of the recognition result, further semantic analysis is performed to obtain the real intention of the user's utterance, and further information content resources are provided according to the user's intention.

Voiceprint recognition (VPR) service: the voiceprint recognition module first performs data collection and feature extraction on the audio data input by the user, extracts the user's audio features and related parameters and saves them, and then matches and authenticates the user's audio input against them. It is mainly used in user security scenarios.

In the embodiment of the present invention, the recording resources of the terminal device are generally exclusive, and only one application can occupy the recording device at a time. This means that only one application is in an active state at any given time, while applications can be active alternately at different times, using the voice service support of the same voice control system. However, if the user opens two applications at the same time, the application with the higher priority occupies the recording device, and the application with the lower priority is automatically disconnected. It should be noted that the priority of an application can be preset or decided through interaction between the applications, and is not limited to a fixed form.
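The exclusive use of the recording device can be sketched as a small arbiter. This is a minimal example that assumes preset numeric priorities (the text also allows priorities to be decided through interaction between applications); the class name `RecordingArbiter`, the application identifiers, and the priority values are hypothetical.

```python
from typing import Dict, Optional


class RecordingArbiter:
    """Grants the single recording device to one application at a time."""

    def __init__(self, priorities: Dict[str, int]) -> None:
        self.priorities = priorities        # assumed: larger number wins
        self.holder: Optional[str] = None   # application holding the device

    def request(self, app_id: str) -> bool:
        """Return True if app_id holds the recording device after the call."""
        if self.holder is None or self.holder == app_id:
            self.holder = app_id
            return True
        if self.priorities[app_id] > self.priorities[self.holder]:
            # The lower-priority holder is disconnected automatically.
            self.holder = app_id
            return True
        return False

    def release(self, app_id: str) -> None:
        """Give the device back so another application can become active."""
        if self.holder == app_id:
            self.holder = None


arbiter = RecordingArbiter({"voice_assistant": 1, "driving_assistant": 2})
first = arbiter.request("voice_assistant")     # device is free
second = arbiter.request("driving_assistant")  # higher priority takes over
third = arbiter.request("voice_assistant")     # refused while device is held
arbiter.release("driving_assistant")
fourth = arbiter.request("voice_assistant")    # active again at a later time
```

The last two requests show the alternating activity the paragraph describes: the same application is refused while the device is held and succeeds once it has been released.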

An example is as follows:

Here is an example of supporting two voice service applications on a smart terminal platform. Application 1 is a voice assistant, which can perform full voice control of most functions of the mobile phone in a normal use environment, such as making a call, sending a text message, playing music, voice-activated camera, and life service voice search. Application 2 is a driving assistant, which can perform full voice control of functions such as navigation, making a call, sending a text message, and playing music in a driving environment.

In order to save system resources as much as possible, the function configuration that the voice platform system needs to support is first determined according to the requirements of these two applications. Here, three engines need to be supported, namely voice recognition, voice wake-up and voice synthesis. The configuration module then reads the configuration file and builds a version of the voice platform system that meets the needs without redundancy.
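A configuration-driven build of this kind might look like the following Python sketch. The patent does not specify a configuration file format, so the JSON layout, the `engines` key, the engine names, and the function `build_platform` are assumptions made for illustration; the engines themselves are toy callables.

```python
import json
from typing import Callable, Dict

Engine = Callable[[str], str]

# Hypothetical factories for every engine the platform could offer.
ENGINE_FACTORIES: Dict[str, Callable[[], Engine]] = {
    "asr": lambda: (lambda audio: f"text({audio})"),
    "wakeup": lambda: (lambda audio: f"wakeup({audio})"),
    "tts": lambda: (lambda text: f"audio({text})"),
    "nlu": lambda: (lambda text: f"intent({text})"),
    "vpr": lambda: (lambda audio: f"speaker({audio})"),
}


def build_platform(config_text: str) -> Dict[str, Engine]:
    """Instantiate only the engines named in the configuration."""
    config = json.loads(config_text)
    requested = config["engines"]
    unknown = [name for name in requested if name not in ENGINE_FACTORIES]
    if unknown:
        raise ValueError(f"unknown engines in configuration: {unknown}")
    return {name: ENGINE_FACTORIES[name]() for name in requested}


# Assumed configuration content for the two applications in this example.
CONFIG = '{"engines": ["asr", "wakeup", "tts"]}'
platform = build_platform(CONFIG)
```

Engines that are not listed (here the NLU and VPR engines) are never instantiated, which is one way to read "meets the needs without redundancy".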

The calling process of application 1 is as follows:

To use the voice service of the voice platform system, an application must first bind the voice platform system. After the binding operation succeeds, the voice function engines need to be initialized. For speech recognition, the grammar needs to be loaded after initialization, and once the grammar has been loaded successfully, the ready state of speech recognition is reached. Similarly, speech synthesis also requires engine initialization, and after the initialization succeeds, the ready state of speech synthesis is reached. For speech recognition (including voice wake-up), after the ready state is reached, recording starts and the recording is recognized. After the recognition succeeds, the recognition result is returned as text, and the application operates according to the recognition result and either continues to the next voice interaction process or enters the end state, as shown in the state transition diagram of FIG. 3. For speech synthesis, after the ready state is reached, if the application needs to broadcast some text, it can pass that text as a parameter to start speech synthesis; the device broadcasts the text passed in, then performs the related operations and either enters the next round of the voice interaction process or enters the end state, as shown in the state transition diagram of FIG. 4.
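The speech recognition call sequence described above can be written as a small state machine. The sketch below is a reading of the prose only, since FIG. 3 is not reproduced here; the state names, event names, and the helper functions `step` and `run` are hypothetical.

```python
from enum import Enum, auto
from typing import Dict, Iterable, Tuple


class AsrState(Enum):
    UNBOUND = auto()      # before binding the voice platform system
    BOUND = auto()        # binding succeeded
    INITIALIZED = auto()  # recognition engine initialized
    READY = auto()        # grammar loaded, ready state reached
    RECORDING = auto()    # recording and recognizing
    ENDED = auto()        # end state


# Allowed transitions: (current state, event) -> next state.
TRANSITIONS: Dict[Tuple[AsrState, str], AsrState] = {
    (AsrState.UNBOUND, "bind"): AsrState.BOUND,
    (AsrState.BOUND, "init_engine"): AsrState.INITIALIZED,
    (AsrState.INITIALIZED, "load_grammar"): AsrState.READY,
    (AsrState.READY, "start_recording"): AsrState.RECORDING,
    (AsrState.RECORDING, "recognized"): AsrState.READY,  # next interaction
    (AsrState.READY, "finish"): AsrState.ENDED,
}


def step(state: AsrState, event: str) -> AsrState:
    """Apply one event, rejecting anything the table does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not allowed in state {state.name}") from None


def run(events: Iterable[str], state: AsrState = AsrState.UNBOUND) -> AsrState:
    """Apply a sequence of events and return the final state."""
    for event in events:
        state = step(state, event)
    return state


final = run([
    "bind", "init_engine", "load_grammar",
    "start_recording", "recognized",   # first voice interaction
    "start_recording", "recognized",   # a further interaction
    "finish",
])
```

A speech synthesis state machine could follow the same pattern without the grammar-loading step, with a "start synthesis" event taking the place of recording.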

The voice calling process of application 2 is similar to that of application 1. The recording resources of current terminal devices are generally exclusive, and only one application can occupy the recording device at a time, which means that only one application is active at any given time; applications can be active alternately at different times and are supported by the voice services of the same voice platform system.

It should be noted that, similarly to the above, any number of voice service applications with different differentiated functions can be supported as long as the terminal hardware allows; this is not limited to the case described in this embodiment.

In order to achieve the above objective, the embodiment of the present invention further provides a terminal device, including a voice control system, where the voice control system is mounted on the terminal device, the terminal device is further equipped with multiple different voice service applications, and the voice control system includes: a configuration module and a plurality of voice engine modules; wherein

Specifically, the voice control system in the specific embodiment of the present invention further includes:

Specifically, the voice engine module in the specific embodiment of the present invention is a voice recognition ASR module, a voice synthesis TTS module, a natural semantic understanding NLU module, or a voiceprint recognition VPR module.

An external interface corresponding to the business process component module.

It should be noted that the terminal device provided by the foregoing embodiment of the present invention is a terminal device that carries the voice control system and applies the voice processing method; all embodiments of the voice control system and the voice processing method are applicable to the terminal device, and can achieve the same or similar benefits.

The above is a preferred embodiment of the present invention. It should be noted that those skilled in the art can also make several improvements and refinements without departing from the principles of the present invention, and these should also be considered as falling within the scope of protection of the present invention.

Industrial applicability

The foregoing embodiments and the preferred embodiments provide unified voice service support for multiple voice service applications on the same terminal device, so as to meet the differing requirements of each voice service application while reducing resource occupation and improving efficiency.

Claims

1. A voice control system, wherein the voice control system is mounted on a terminal device, the terminal device is further equipped with a plurality of different voice service applications, and the voice control system includes: a configuration module and a plurality of voice engine modules; wherein

The configuration module is configured to bind the voice service application to at least one voice engine module according to a binding request of a different voice service application;

The voice engine module is configured to process input information input to the voice service application, and output the processing result to a corresponding voice service application, so that the voice service application uses the processing result to perform voice control.

2. The voice control system of claim 1, wherein the voice control system further comprises:

a business process component module connected to the voice engine module and the configuration module, the business process component module configured to perform logic control of the business process interaction between the voice engine module, the configuration module, and the voice service application.

3. The voice control system according to claim 1, wherein the voice engine module is a voice recognition ASR module, a voice synthesis TTS module, a natural semantic understanding NLU module, or a voiceprint recognition VPR module.

4. The voice control system of claim 3, wherein the voice control system further comprises:

one or more of: a voice recognition interface corresponding to the voice recognition ASR module and the natural semantic understanding NLU module, a voice synthesis interface corresponding to the voice synthesis TTS module, and a voiceprint recognition interface corresponding to the voiceprint recognition VPR module.

5. The voice control system of claim 2, wherein the voice control system further comprises:

An external interface corresponding to the business process component module.

6. A voice processing method for a plurality of voice service applications, where the plurality of voice service applications are mounted on the same terminal device, and the voice processing method includes:

Binding the voice service application according to a binding request of a different voice service application;

For the bound voice service application, processing input information input to the voice service application, and outputting the processing result to the corresponding voice service application, so that the voice service application uses the processing result to perform voice control.

7. The voice processing method for a plurality of voice service applications according to claim 6, wherein the plurality of voice service applications are in an active state at different time intervals.

8. The voice processing method for a plurality of voice service applications according to claim 7, wherein the voice service comprises a voice recognition ASR service, a voice synthesis TTS service, a natural semantic understanding NLU service, or a voiceprint recognition VPR service.

9. A terminal device, including a voice control system, wherein the voice control system is mounted on the terminal device, the terminal device is further equipped with a plurality of different voice service applications, and the voice control system includes: a configuration module and a plurality of voice engine modules; wherein

The configuration module is configured to bind the voice service application to at least one voice engine module according to a binding request of a different voice service application;

The voice engine module is configured to process input information input to the voice service application, and output the processing result to a corresponding voice service application, so that the voice service application uses the processing result to perform voice control.

10. The terminal device according to claim 9, wherein the voice control system further comprises:

a business process component module connected to the voice engine module and the configuration module, the business process component module configured to perform logic control of the business process interaction between the voice engine module, the configuration module, and the voice service application.

11. The terminal device according to claim 9, wherein the speech engine module is a speech recognition ASR module, a speech synthesis TTS module, a natural semantic understanding NLU module, or a voiceprint recognition VPR module.