WO2017128775A1 - Voice control system, voice processing method and terminal device - Google Patents

Voice control system, voice processing method and terminal device

Info

Publication number
WO2017128775A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
module
control system
service application
voice service
Prior art date
Application number
PCT/CN2016/102605
Other languages
French (fr)
Chinese (zh)
Inventor
李向阳
Original Assignee
中兴通讯股份有限公司
Priority date
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2017128775A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724 User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/247 Telephone sets including user guidance or feature selection means facilitating their use
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/02 Constructional features of telephone sets

Definitions

  • The present invention relates to the field of communications technologies, and in particular to a voice control system, a voice processing method, and a terminal device.
  • In the prior art, voice products based on embedded terminals are independent of one another, each including its own voice service and upper-layer service logic.
  • When a terminal supports multiple voice applications, they therefore occupy a large amount of resources.
  • In addition, current voice service support is generally closed and has a high technical threshold, which greatly reduces the convenience of developing and using it and makes differentiated voice services impractical. That is, current terminal voice service applications are independent: the business logic and the corresponding voice function support are coupled together, and the functional scope is relatively fixed. Even when different voice service software on the same terminal relies on the same voice engine, the applications remain independent of each other.
  • An object of the embodiments of the present invention is to provide a voice control system, a voice processing method, and a terminal device that solve the prior-art problem that multiple voice applications on a terminal device are independent of each other and occupy a large amount of resources.
  • To achieve the above object, an embodiment of the present invention provides a voice control system, where the voice control system is mounted on a terminal device, and the terminal device is further equipped with a plurality of different voice service applications. The voice control system includes a configuration module and a plurality of voice engine modules, wherein:
  • the configuration module is configured to bind a voice service application to at least one voice engine module according to binding requests from the different voice service applications;
  • the voice engine module is configured to process input information input to the voice service application, and to output the processing result to the corresponding voice service application, so that the voice service application uses the processing result to perform voice control.
  • The voice control system further includes:
  • a business process component module connected to the voice engine module and the configuration module, the business process component module being configured to perform logic control of the business process interaction among the voice engine module, the configuration module, and the voice service application.
  • The voice engine module is a voice recognition (ASR) module, a voice synthesis (TTS) module, a natural semantic understanding (NLU) module, or a voiceprint recognition (VPR) module.
  • The voice control system further includes one or more of:
  • a voice recognition interface corresponding to the voice recognition (ASR) module and the natural semantic understanding (NLU) module;
  • a voice synthesis interface corresponding to the voice synthesis (TTS) module; and
  • a voiceprint recognition interface corresponding to the voiceprint recognition (VPR) module.
  • The voice control system further includes:
  • an external interface corresponding to the business process component module.
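  • An embodiment of the present invention further provides a voice processing method for multiple voice service applications, where the multiple voice service applications are installed on the same terminal device, and the voice processing method includes:
  • binding to a voice service application according to binding requests from the different voice service applications; and
  • for a bound voice service application, processing input information input to the voice service application, and outputting the processing result to the corresponding voice service application, so that the voice service application uses the processing result to perform voice control.
  • The plurality of voice service applications are in an active state at different times.
  • The voice service includes a voice recognition (ASR) service, a voice synthesis (TTS) service, a natural semantic understanding (NLU) service, or a voiceprint recognition (VPR) service.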
  • An embodiment of the present invention further provides a terminal device including a voice control system, where the voice control system is mounted on the terminal device, and the terminal device is further equipped with a plurality of different voice service applications. The voice control system includes a configuration module and a plurality of voice engine modules, wherein:
  • the configuration module is configured to bind a voice service application to at least one voice engine module according to binding requests from the different voice service applications;
  • the voice engine module is configured to process input information input to the voice service application, and to output the processing result to the corresponding voice service application, so that the voice service application uses the processing result to perform voice control.
  • The voice control system further includes:
  • a business process component module connected to the voice engine module and the configuration module, the business process component module being configured to perform logic control of the business process interaction among the voice engine module, the configuration module, and the voice service application.
  • The voice engine module is a voice recognition (ASR) module, a voice synthesis (TTS) module, a natural semantic understanding (NLU) module, or a voiceprint recognition (VPR) module.
  • By providing a voice control system, the voice control system, the voice processing method, and the terminal device in the embodiments of the present invention offer unified voice service support for multiple voice service applications on the same terminal device, thereby satisfying the differing requirements of each voice service application while reducing resource occupation and improving efficiency.
  • FIG. 1 is a schematic structural diagram of a voice control system according to an embodiment of the present invention.
  • FIG. 2 is a flow chart showing the basic steps of a voice processing method according to an embodiment of the present invention.
  • FIG. 3 is a speech recognition state transition diagram of a voice control system according to an embodiment of the present invention.
  • FIG. 4 is a speech synthesis state transition diagram of a voice control system according to an embodiment of the present invention.
  • To address the prior-art problem that multiple voice applications on a terminal device are independent of each other and occupy a large amount of resources, the embodiments of the present invention provide a voice control system, a voice processing method, and a terminal device.
  • By providing a voice control system, unified voice service support is offered to the plurality of voice service applications on the same terminal device, which meets the differing requirements of each voice service application while reducing resource occupation and improving efficiency.
  • An embodiment of the present invention provides a voice control system, where the voice control system is mounted on a terminal device, and the terminal device is further equipped with a plurality of different voice service applications. The voice control system includes a configuration module 10 and a plurality of voice engine modules 20, wherein:
  • the configuration module 10 is configured to bind a voice service application to at least one voice engine module according to binding requests from the different voice service applications;
  • the voice engine module 20 is configured to process input information input to the voice service application, and to output the processing result to the corresponding voice service application, so that the voice service application uses the processing result to perform voice control.
  • The configuration module 10 mainly implements the configurability of the voice control system: the voice engines on the voice platform system can be configured in combination according to different demand scenarios,
  • so that the system supports only one of the voice engine modules 20, or any subset of the optional voice engine modules.
  • The voice control system can also be configured by language: the supported voice services can be configured according to the needs of different regions, so as to localize the voice application. Upper-layer voice service application software that needs voice functions must bind to the voice control system when it starts, according to the voice functions it implements.
  • For example, if an application only needs the voice recognition function, it only needs to bind to the voice recognition module (one type of voice engine module); the entire function, from audio input to recognition result output, is then realized by the voice recognition module.
  • The voice service application only needs to use the recognition result to process its control logic.
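  • The binding described above can be sketched roughly as follows. This is an illustration under assumptions, not the implementation disclosed in the application: the class name `VoiceControlSystem`, the method names `bind` and `process`, and the stand-in ASR function are all hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical engine type: takes input data and returns a processing result.
Engine = Callable[[str], str]


class VoiceControlSystem:
    """Illustrative stand-in for the configuration module plus engine modules."""

    def __init__(self, engines: Dict[str, Engine]) -> None:
        self._engines = engines                     # available voice engine modules
        self._bindings: Dict[str, List[str]] = {}   # app id -> bound engine names

    def bind(self, app_id: str, requested: List[str]) -> None:
        """Bind an application to the engine modules named in its binding request."""
        missing = [name for name in requested if name not in self._engines]
        if missing:
            raise KeyError(f"engines not configured: {missing}")
        self._bindings[app_id] = list(requested)

    def process(self, app_id: str, engine_name: str, data: str) -> str:
        """Run input through a bound engine and return the result to the caller."""
        if engine_name not in self._bindings.get(app_id, []):
            raise PermissionError(f"{app_id} is not bound to {engine_name}")
        return self._engines[engine_name](data)


# Stand-in ASR engine that pretends to turn an audio clip into text (assumption).
system = VoiceControlSystem({"asr": lambda audio: f"text from {audio}"})
system.bind("assistant", ["asr"])
result = system.process("assistant", "asr", "clip.pcm")
```

  • In this sketch the application keeps only its own control logic; it receives `result` from the shared system and decides what to do with it.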
  • The voice control system in the foregoing embodiment of the present invention further includes:
  • a business process component module 30 connected to the voice engine module 20 and the configuration module 10, the business process component module 30 being configured to perform logic control of the business process interaction among the voice engine module 20, the configuration module 10, and the voice service application.
  • The business process component module 30 provided by the above embodiment of the present invention includes common standard voice process components that are frequently used on terminal devices. In addition to the functions supported by the plurality of voice engine modules 20, these components also include business process interaction logic control for other commonly used terminal device functions. As shown in FIG. 1, the business process component module 30 includes a plurality of business process components; one business application of the terminal device may correspond to one or more business process components, and one business process component may also serve one or more business applications of the terminal device, which is not specifically limited herein.
  • The voice engine module is a voice recognition (ASR) module, a voice synthesis (TTS) module, a natural semantic understanding (NLU) module, or a voiceprint recognition (VPR) module.
  • Speech recognition (ASR) module: the speech recognition module analyzes and recognizes the audio recorded from the user by means of algorithms such as pattern recognition, outputs the recognition result in an agreed text format, and then ends the recognition.
  • The speech recognition module includes a voice wake-up submodule, which is configured to continuously recognize the wake-up words preset by the user. As in normal recognition, the voice wake-up submodule analyzes and recognizes the user's audio input against the wake-up word; after returning a text result in the agreed format, it immediately starts the next round of recording and listening, so that the user can input audio for recognition at any time.
  • Speech synthesis (TTS) module: the speech synthesis module associates text data with audio data according to the text data stream input by the user, and synthesizes the input text data stream into an audio data stream for output.
  • Natural semantic understanding (NLU) module: the module recognizes the user's audio input and, on the basis of the recognition, performs further semantic analysis to obtain the real intention behind the user's utterance and to provide further information resources according to that intention.
  • Voiceprint recognition (VPR) module: the voiceprint recognition module first performs data collection and feature extraction on the audio data input by the user, extracts and saves the user's audio features and related parameters, and then matches and authenticates the user's audio input against them. It is mainly used in user security scenarios.
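  • The continuous listening behaviour of the voice wake-up submodule described above can be pictured with the following sketch. It is only an illustration: the function name `wakeup_loop` is hypothetical, plain strings stand in for recorded audio clips, and `str.lower` stands in for a real recognizer.

```python
from typing import Callable, Iterable, Iterator


def wakeup_loop(
    segments: Iterable[str],
    recognize: Callable[[str], str],
    wake_word: str,
) -> Iterator[str]:
    """Yield the recognized text whenever a segment contains the wake word."""
    for segment in segments:         # each item stands for one recording round
        text = recognize(segment)    # analyze and recognize the audio
        if wake_word in text:
            yield text               # return the text result to the caller
        # Falling through starts the next listening round immediately.


# Strings stand in for audio clips; str.lower stands in for a recognizer.
heard = list(wakeup_loop(["Music on", "Hello Phone", "noise"], str.lower, "hello phone"))
```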
  • The voice control system in the above embodiment of the present invention further includes one or more of:
  • a voice recognition interface corresponding to the voice recognition (ASR) module and the natural semantic understanding (NLU) module;
  • a voice synthesis interface corresponding to the voice synthesis (TTS) module; and
  • a voiceprint recognition interface corresponding to the voiceprint recognition (VPR) module.
  • The voice control system provided by the embodiment of the present invention provides unified interfaces by voice function: the voice recognition (ASR) function provides a unified voice recognition interface, the voice synthesis (TTS) function provides a unified voice synthesis interface, voice wake-up provides a unified voice wake-up interface,
  • and voiceprint recognition (VPR) provides a unified voiceprint recognition interface.
  • The voice control system provided by the embodiment of the present invention further provides an external interface corresponding to the business process component module 30.
  • Upper-layer business application software that needs voice functions binds to the voice control system when it starts, according to the voice functions it implements, and calls the voice function interfaces it needs. For example, an application that only needs voice recognition
  • can obtain the whole function, from audio input to recognition result output, by calling the voice recognition interface.
  • The application then only needs to use the recognition result to process its control logic.
  • The application can also call the voice interfaces according to its own needs.
  • The upper-layer application software can also conveniently implement the voice function support and control logic of the corresponding service by calling the external interface corresponding to the business process component module 30 of the voice platform system.
  • The voice control system provided by the embodiment of the present invention provides a unified voice service for the voice service applications on an intelligent terminal: all voice service applications on the terminal can obtain the corresponding voice service by calling the voice control system, without each having to contain a separate voice engine, which saves platform resources.
  • The configurability of the voice platform engines can meet the differing requirements of different voice services, greatly facilitating the integration of different voice services and improving the user experience of the terminal.
  • An embodiment of the present invention further provides a voice processing method for multiple voice service applications, where the multiple voice service applications are mounted on the same terminal device, and the voice processing method includes:
  • Step 21: bind to a voice service application according to binding requests from the different voice service applications.
  • Step 22: for a bound voice service application, process input information input to the voice service application, and output the processing result to the corresponding voice service application, so that the voice service application uses the processing result to perform voice control.
  • The multiple voice service applications are in an active state at different times.
  • The voice service includes a voice recognition (ASR) service, a voice synthesis (TTS) service, a natural semantic understanding (NLU) service, or a voiceprint recognition (VPR) service.
  • The plurality of voice services mentioned in the embodiments of the present invention are a combination of any two or more of the foregoing voice services.
  • Speech recognition (ASR) service: the speech recognition module analyzes and recognizes the audio input by the user by means of algorithms such as pattern recognition, outputs the recognition result in an agreed text format, and then ends the recognition.
  • The speech recognition module includes a voice wake-up submodule, which is configured to continuously recognize the wake-up words preset by the user. As in normal recognition, the voice wake-up submodule analyzes and recognizes the user's audio input against the wake-up word; after returning a text result in the agreed format, it immediately starts the next round of recording and listening, so that the user can input audio for recognition at any time.
  • Speech synthesis (TTS) service: the speech synthesis module associates text data with audio data according to the text data stream input by the user, and synthesizes the input text data stream into an audio data stream for output.
  • Natural semantic understanding (NLU) service: the service recognizes the user's audio input and, on the basis of the recognition, performs further semantic analysis to obtain the real intention behind the user's utterance and to provide further information resources according to that intention.
  • Voiceprint recognition (VPR) service: the voiceprint recognition module first performs data collection and feature extraction on the audio data input by the user, extracts and saves the user's audio features and related parameters, and then matches and authenticates the user's audio input against them. It is mainly used in user security scenarios.
  • The recording resources of a terminal device are generally exclusive: only one application can occupy the recording device at a time, which means that only one application is active at any given moment, while different applications can be active alternately at different times.
  • The application with the higher priority occupies the recording device, and the application with the lower priority is automatically disconnected. It should be noted that the priority of an application can be preset or decided through interaction between the applications,
  • and is not limited to a fixed form.
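  • One way to picture the exclusive recording resource and priority rule above is the small arbiter below. It is a sketch under assumptions: the class name `RecorderArbiter`, the numeric priorities, and the rule that equal priority does not preempt are all hypothetical choices, and preset priorities are used although the text also allows priorities decided between applications.

```python
from typing import Dict, Optional


class RecorderArbiter:
    """Grants the single recording device to one application at a time."""

    def __init__(self, priorities: Dict[str, int]) -> None:
        self.priorities = priorities          # preset priorities (assumption)
        self.holder: Optional[str] = None     # app currently holding the recorder

    def acquire(self, app_id: str) -> bool:
        """Return True if app_id now holds the recorder."""
        if self.holder is None or self.holder == app_id:
            self.holder = app_id
            return True
        if self.priorities.get(app_id, 0) > self.priorities.get(self.holder, 0):
            # The higher-priority app takes over; the previous holder is disconnected.
            self.holder = app_id
            return True
        return False

    def release(self, app_id: str) -> None:
        """Free the recorder if app_id is the current holder."""
        if self.holder == app_id:
            self.holder = None


arbiter = RecorderArbiter({"voice_assistant": 1, "driving_assistant": 2})
first = arbiter.acquire("voice_assistant")       # recorder is free, so granted
second = arbiter.acquire("driving_assistant")    # higher priority preempts
third = arbiter.acquire("voice_assistant")       # lower priority is refused
```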
  • Voice service application 1 is a voice assistant, which can perform full voice control of most functions of the mobile phone in a normal use environment, such as making calls, sending text messages, playing music, operating the camera by voice, and voice search for life services; voice service application 2 is a driving assistant, which can perform full voice control of navigation, making calls, sending text messages, playing music, and the like in a driving environment.
  • The function configuration that the voice platform system needs to support is determined:
  • support for three engines is required, namely voice recognition, voice wake-up, and voice synthesis. The configuration module then reads the configuration
  • file and builds a version of the voice platform system that meets these needs without redundancy.
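  • The configuration step above might look like the following sketch. The source does not describe the configuration file format, so the JSON layout, the key `engines`, the engine names, and the function `build_platform` are all assumptions made for illustration.

```python
import json
from typing import Callable, Dict

# Hypothetical factories for every engine the platform could offer.
ENGINE_FACTORIES: Dict[str, Callable[[], object]] = {
    "asr": lambda: "ASR engine",
    "wakeup": lambda: "wake-up engine",
    "tts": lambda: "TTS engine",
    "nlu": lambda: "NLU engine",
    "vpr": lambda: "VPR engine",
}


def build_platform(config_text: str) -> Dict[str, object]:
    """Instantiate only the engines listed in the configuration text."""
    config = json.loads(config_text)
    wanted = config["engines"]
    unknown = [name for name in wanted if name not in ENGINE_FACTORIES]
    if unknown:
        raise ValueError(f"unknown engines in configuration: {unknown}")
    return {name: ENGINE_FACTORIES[name]() for name in wanted}


# Configuration for the example: recognition, wake-up and synthesis only.
platform = build_platform('{"engines": ["asr", "wakeup", "tts"]}')
```

  • Engines that are not listed, such as NLU and VPR here, are never instantiated, which is the sense in which the built platform carries no redundancy.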
  • To use the voice services of the voice platform system, an application must first bind to the voice platform system. After the binding succeeds, the voice function engine needs to be initialized. For voice recognition, the grammar needs to be loaded after initialization; once the grammar is loaded successfully, voice recognition reaches the ready state. Similarly, voice synthesis requires engine initialization, and reaches its ready state once initialization succeeds. For voice recognition (including voice wake-up), recording starts after the ready state is reached, and the recording is recognized. After recognition succeeds, the text recognition result is returned; the application acts on the recognition result and either continues to the next voice interaction or enters the end state, as shown in the state transition diagram of FIG. 3.
  • For voice synthesis, once the ready state is reached, the corresponding text can be passed as a parameter to start voice synthesis. The device plays back the incoming text, performs the related operations, and then either enters the next voice interaction or enters the end state, as shown in the state transition diagram of FIG. 4.
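  • The speech recognition flow described above can be approximated by a small state machine. FIG. 3 itself is not reproduced in this text, so the state names, event names, and the omission of failure transitions below are assumptions inferred from the prose rather than a copy of the figure.

```python
from enum import Enum, auto
from typing import Dict, Iterable, Tuple


class AsrState(Enum):
    UNBOUND = auto()
    BOUND = auto()
    INITIALIZED = auto()
    READY = auto()
    RECORDING = auto()
    RECOGNIZING = auto()
    RESULT = auto()
    ENDED = auto()


# Allowed (state, event) -> next state pairs; event names are hypothetical.
TRANSITIONS: Dict[Tuple[AsrState, str], AsrState] = {
    (AsrState.UNBOUND, "bind"): AsrState.BOUND,
    (AsrState.BOUND, "init_engine"): AsrState.INITIALIZED,
    (AsrState.INITIALIZED, "load_grammar"): AsrState.READY,
    (AsrState.READY, "start_recording"): AsrState.RECORDING,
    (AsrState.RECORDING, "stop_recording"): AsrState.RECOGNIZING,
    (AsrState.RECOGNIZING, "recognized"): AsrState.RESULT,
    (AsrState.RESULT, "next_interaction"): AsrState.READY,
    (AsrState.RESULT, "finish"): AsrState.ENDED,
}


def step(state: AsrState, event: str) -> AsrState:
    """Apply one event, rejecting anything the table does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in state {state.name}") from None


def run(events: Iterable[str], state: AsrState = AsrState.UNBOUND) -> AsrState:
    """Feed a sequence of events through the state machine."""
    for event in events:
        state = step(state, event)
    return state


final = run(["bind", "init_engine", "load_grammar", "start_recording",
             "stop_recording", "recognized", "finish"])
```

  • A speech synthesis flow could be modelled the same way with a shorter table (bind, initialize, ready, synthesize, end), since the text does not mention a grammar-loading step for synthesis.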
  • The voice calling process of application 2 is similar to that of application 1.
  • The recording resources of current terminal devices are generally exclusive: only one application can occupy the recording device at a time, which means that only one application is active at any given moment.
  • At different times, different applications can be active alternately, each supported by voice services from the same voice platform system.
  • Where the terminal hardware allows, the system can support any number of voice service applications with different differentiated functions, and is not limited to the case described in this embodiment.
  • An embodiment of the present invention further provides a terminal device including a voice control system, where the voice control system is mounted on the terminal device, and the terminal device is further equipped with multiple different voice service applications.
  • The voice control system includes a configuration module and a plurality of voice engine modules, wherein:
  • the configuration module is configured to bind a voice service application to at least one voice engine module according to binding requests from the different voice service applications;
  • the voice engine module is configured to process input information input to the voice service application, and to output the processing result to the corresponding voice service application, so that the voice service application uses the processing result to perform voice control.
  • The voice control system in the specific embodiment of the present invention further includes:
  • a business process component module connected to the voice engine module and the configuration module, the business process component module being configured to perform logic control of the business process interaction among the voice engine module, the configuration module, and the voice service application.
  • The voice engine module in the specific embodiment of the present invention is a voice recognition (ASR) module, a voice synthesis (TTS) module, a natural semantic understanding (NLU) module, or a voiceprint recognition (VPR) module.
  • The voice control system in the specific embodiment of the present invention further includes one or more of:
  • a voice recognition interface corresponding to the voice recognition (ASR) module and the natural semantic understanding (NLU) module;
  • a voice synthesis interface corresponding to the voice synthesis (TTS) module; and
  • a voiceprint recognition interface corresponding to the voiceprint recognition (VPR) module.
  • The voice control system in the specific embodiment of the present invention further includes:
  • an external interface corresponding to the business process component module.
  • The terminal device provided by the foregoing embodiment of the present invention is a terminal device that carries the voice control system and applies the voice processing method; all embodiments of the voice control system and the voice processing method are applicable to the terminal device and can achieve the same or similar benefits.
  • The foregoing embodiments and preferred embodiments provide unified voice service support for multiple voice service applications on the same terminal device, so as to meet the differing requirements of each voice service application while reducing resource occupation and improving efficiency.

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Telephonic Communication Services (AREA)
  • Telephone Function (AREA)

Abstract

Provided are a voice control system, a voice processing method and a terminal device. The voice control system is mounted on a terminal device, which also carries a plurality of different voice service applications. The voice control system comprises a configuration module and a plurality of voice engine modules. The configuration module is configured to bind a voice service application to at least one voice engine module according to binding requests from the different voice service applications; each voice engine module is configured to process input information input into the voice service application and to output the processing result to the corresponding voice service application, so that the voice service application performs voice control using the processing result. By providing a voice control system, the embodiments of the present invention offer unified voice service support for a plurality of voice service applications on the same terminal device, satisfying the differing demands of the various voice service applications while reducing resource occupation and improving efficiency.

Description

Voice control system, voice processing method and terminal device
Technical field
The present invention relates to the field of communications technologies, and in particular to a voice control system, a voice processing method, and a terminal device.
Background
With the rapid development of mobile communication technology, the fourth generation of digital communication (4G) has become widespread, and mobile terminals have become a necessity of daily life. The hardware configuration of smart mobile terminals keeps improving; their functions are now extremely complex and their services are multiplying rapidly. On the one hand, this meets users' diverse needs: users can obtain massive amounts of information from a small mobile terminal, which satisfies the varied requirements of different user groups. On the other hand, the more functions are embedded in a mobile terminal and the more powerful each module becomes, the more complex its control and the more cumbersome its control flow, which causes users considerable trouble and inconvenience. Intelligent voice technology shows great advantages in solving such problems and can greatly improve the human-computer interaction experience, so voice products based on embedded terminals are becoming increasingly common.
At present, voice products based on embedded terminals in the prior art are independent of one another, each including its own voice service and upper-layer service logic, so a terminal that supports multiple voice applications occupies a large amount of resources. In addition, current voice service support is generally closed and has a high technical threshold, which greatly reduces the convenience of developing and using it and makes differentiated voice services impractical. That is, current terminal voice service applications are independent: the business logic and the corresponding voice function support are coupled together, and the functional scope is relatively fixed. Even when different voice service software on the same terminal relies on the same voice engine, the applications remain independent of each other.
Summary of the invention
An object of the embodiments of the present invention is to provide a voice control system, a voice processing method, and a terminal device that solve the prior-art problem that multiple voice applications on a terminal device are independent of each other and occupy a large amount of resources.
To achieve the above object, an embodiment of the present invention provides a voice control system, where the voice control system is mounted on a terminal device, and the terminal device is further equipped with a plurality of different voice service applications. The voice control system includes a configuration module and a plurality of voice engine modules, wherein:
the configuration module is configured to bind a voice service application to at least one voice engine module according to binding requests from the different voice service applications;
the voice engine module is configured to process input information input to the voice service application, and to output the processing result to the corresponding voice service application, so that the voice service application uses the processing result to perform voice control.
The voice control system further includes:
a business process component module connected to the voice engine module and the configuration module, the business process component module being configured to perform logic control of the business process interaction among the voice engine module, the configuration module, and the voice service application.
The voice engine module is a voice recognition (ASR) module, a voice synthesis (TTS) module, a natural semantic understanding (NLU) module, or a voiceprint recognition (VPR) module.
The voice control system further includes:
one or more of a voice recognition interface corresponding to the voice recognition (ASR) module and the natural semantic understanding (NLU) module, a voice synthesis interface corresponding to the voice synthesis (TTS) module, and a voiceprint recognition interface corresponding to the voiceprint recognition (VPR) module.
The voice control system further includes:
an external interface corresponding to the business process component module.
An embodiment of the present invention further provides a voice processing method for a plurality of voice service applications installed on the same terminal device, the voice processing method including:
binding with the voice service applications according to the binding requests of the different voice service applications; and
for each bound voice service application, processing the input information that is input to the voice service application and outputting the processing result to the corresponding voice service application, so that the voice service application uses the processing result to perform voice control.
The plurality of voice service applications are active alternately, at different times.
The voice service includes a speech recognition (ASR) service, a speech synthesis (TTS) service, a natural language understanding (NLU) service, or a voiceprint recognition (VPR) service.
An embodiment of the present invention further provides a terminal device including a voice control system. The voice control system is installed on the terminal device, on which a plurality of different voice service applications are also installed, and the voice control system includes a configuration module and a plurality of voice engine modules, wherein:
the configuration module is configured to bind each voice service application to at least one voice engine module according to the binding requests of the different voice service applications; and
the voice engine module is configured to process the input information that is input to the voice service application and to output the processing result to the corresponding voice service application, so that the voice service application uses the processing result to perform voice control.
The voice control system further includes:
a business process component module connected to the voice engine modules and the configuration module, the business process component module being configured to perform logic control over the business process interaction among the voice engine modules, the configuration module, and the voice service applications.
The voice engine module is a speech recognition (ASR) module, a speech synthesis (TTS) module, a natural language understanding (NLU) module, or a voiceprint recognition (VPR) module.
The above technical solutions of the embodiments of the present invention have at least the following beneficial effects:
In the voice control system, voice processing method, and terminal device of the embodiments of the present invention, a single voice control system provides unified voice service support for multiple voice service applications installed on the same terminal device. This meets the differing requirements of the individual voice service applications while reducing resource usage and improving efficiency.
Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of the voice control system provided by an embodiment of the present invention;
FIG. 2 is a flowchart of the basic steps of the voice processing method provided by an embodiment of the present invention;
FIG. 3 is a speech recognition state transition diagram for the voice control system provided by an embodiment of the present invention;
FIG. 4 is a speech synthesis state transition diagram for the voice control system provided by an embodiment of the present invention.
Detailed Description
To make the technical problems to be solved, the technical solutions, and the advantages of the embodiments of the present invention clearer, a detailed description is given below with reference to the accompanying drawings and specific embodiments.
The embodiments of the present invention address the problem in the related art that multiple voice applications on a terminal device are independent of one another and occupy considerable resources. They provide a voice control system, a voice processing method, and a terminal device in which a single voice control system provides unified voice service support for multiple voice service applications installed on the same terminal device. This meets the differing requirements of the individual voice service applications while reducing resource usage and improving efficiency.
As shown in FIG. 1, an embodiment of the present invention provides a voice control system. The voice control system is installed on a terminal device on which a plurality of different voice service applications are also installed, and the voice control system includes a configuration module 10 and a plurality of voice engine modules 20, wherein:
the configuration module 10 is configured to bind each voice service application to at least one voice engine module according to the binding requests of the different voice service applications; and
the voice engine module 20 is configured to process the input information that is input to the voice service application and to output the processing result to the corresponding voice service application, so that the voice service application uses the processing result to perform voice control.
In the above embodiment of the present invention, the configuration module 10 mainly makes the voice control system configurable. The voice engines of the voice platform system can be configured for different requirement scenarios: any combination of the voice engine modules 20 can be configured as needed, so the system may support only one voice engine module 20 or any subset of the available voice engine modules. The voice language of the voice control system is also configurable: the languages of the supported voice services can be configured according to regional requirements, so that voice applications can be localized. Upper-layer voice service application software that needs voice functions must bind to the voice control system when it starts, according to the voice functions it needs. For example, application software that needs only speech recognition only has to bind to the speech recognition module (one kind of voice engine module). The speech recognition module then handles everything from audio input to recognition result output, and the voice service application only needs to use the recognition result to drive its control logic.
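The configuration and binding behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the implementation of the embodiment: the class name `ConfigModule`, its fields, the default language tag, and the application names are all hypothetical, and the engines are represented only by their names.

```python
from dataclasses import dataclass, field

# The four engine types named in the description.
KNOWN_ENGINES = frozenset({"ASR", "TTS", "NLU", "VPR"})


@dataclass
class ConfigModule:
    """Hypothetical configuration module: enabled engine subset plus language."""
    enabled_engines: frozenset
    language: str = "zh-CN"                        # assumed language tag format
    bindings: dict = field(default_factory=dict)   # app name -> bound engine names

    def __post_init__(self):
        self.enabled_engines = frozenset(self.enabled_engines)
        unknown = self.enabled_engines - KNOWN_ENGINES
        if unknown:
            raise ValueError(f"unknown engines: {sorted(unknown)}")

    def bind(self, app, requested):
        """Bind an application, at start-up, to the engines it asks for."""
        requested = frozenset(requested)
        if not requested:
            raise ValueError("a binding request must name at least one engine")
        missing = requested - self.enabled_engines
        if missing:
            raise ValueError(f"engines not configured: {sorted(missing)}")
        self.bindings[app] = requested
        return requested


# A platform configured with only two of the four engines.
config = ConfigModule(enabled_engines={"ASR", "TTS"}, language="en-US")
config.bind("recognition_only_app", ["ASR"])   # needs speech recognition only
config.bind("full_dialog_app", ["ASR", "TTS"])
```

Rejecting a request for an engine that was not configured is one possible policy; the description does not say how such a request is handled.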
Optionally, in the above embodiment of the present invention, the voice control system further includes:
a business process component module 30 connected to the voice engine modules 20 and the configuration module 10, the business process component module 30 being configured to perform logic control over the business process interaction among the voice engine modules 20, the configuration module 10, and the voice service applications.
The business process component module 30 provided by the above embodiment includes general-purpose standard voice process components commonly used on terminal devices. Besides supporting the functions of the voice engine modules 20 described above, these components also contain the business process interaction control logic for other common terminal functions. As shown in FIG. 1, the business process component module 30 contains multiple business process components. One service application on the terminal device may correspond to one or more business process components, and one business process component may serve one or more service applications on the terminal device; no specific limitation is imposed here.
Specifically, in the above embodiment of the present invention, the voice engine module is a speech recognition (ASR) module, a speech synthesis (TTS) module, a natural language understanding (NLU) module, or a voiceprint recognition (VPR) module. Speech recognition (ASR) module: the speech recognition module analyzes and recognizes the audio recorded from the user by means of pattern recognition and other algorithms, then outputs the recognition result in an agreed text format, which ends that recognition pass. The speech recognition module contains a voice wake-up submodule, which is configured to continuously recognize a wake-up word preset by the user. As in ordinary recognition, the voice wake-up submodule analyzes and recognizes the audio that the user inputs for the wake-up word; after returning a text result in the agreed format, it immediately starts the next round of recording and listening, so that the user can input audio for recognition at any time.
Speech synthesis (TTS) module: the speech synthesis module takes the text data stream input by the user, uses a synthesis algorithm to map the text data to audio data, and finally outputs the input text data stream as a synthesized audio data stream.
Natural language understanding (NLU) module: this module recognizes the user's audio input, performs further semantic analysis on the recognition result to obtain the real intention behind the user's utterance, and provides further information content resources according to that intention.
Voiceprint recognition (VPR) module: the voiceprint recognition module first performs data collection and feature extraction on the audio data input by the user, extracts and stores the user's audio features and related parameters, and then matches and authenticates the user's later audio input against them. It is mainly used in security scenarios.
Preferably, in the above embodiment of the present invention, the voice control system further includes:
one or more of a speech recognition interface corresponding to the ASR module and the NLU module, a speech synthesis interface corresponding to the TTS module, and a voiceprint recognition interface corresponding to the VPR module.
The voice control system provided by the embodiment of the present invention encapsulates a unified external interface for each of its voice functions: the speech recognition (ASR) function provides a unified speech recognition interface, the speech synthesis (TTS) function provides a unified speech synthesis interface, voice wake-up provides a unified voice wake-up interface, and voiceprint recognition (VPR) provides a unified voiceprint recognition interface.
Optionally, the voice control system provided by the embodiment of the present invention further provides an external interface corresponding to the business process component module 30.
Upper-layer service application software that needs voice functions binds to the voice control system when it starts, according to the voice functions it needs, and calls the corresponding voice function interfaces. For example, application software that needs only speech recognition can obtain the whole function, from audio input to recognition result output, by calling the speech recognition interface; the application only has to use the recognition result to drive its control logic. Likewise, an application can call several of the voice function module interfaces supported by the voice platform at once, according to its own needs, to obtain the corresponding voice functions. Further, upper-layer application software can also call the external interface of the voice platform system that corresponds to the business process component module 30, which conveniently provides both the voice function support and the control logic for the corresponding service.
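As a rough illustration of the division of labour described above, the sketch below shows an application that calls a recognition interface and a synthesis interface and keeps only its own control logic. The classes `FakeRecognizer` and `FakeSynthesizer`, the function `handle_command`, and the command strings are hypothetical stand-ins; the description does not define the actual interface signatures.

```python
class FakeRecognizer:
    """Stand-in for the unified speech recognition interface (returns canned text)."""

    def __init__(self, text):
        self._text = text

    def recognize(self, audio):
        # A real engine would analyze the audio; this stub ignores it.
        return self._text


class FakeSynthesizer:
    """Stand-in for the unified speech synthesis interface (records what is spoken)."""

    def __init__(self):
        self.spoken = []

    def speak(self, text):
        self.spoken.append(text)


def handle_command(recognizer, synthesizer, audio, actions):
    """Application-side control logic: recognize, act on the result, confirm by speech."""
    text = recognizer.recognize(audio)
    action = actions.get(text)
    if action is None:
        synthesizer.speak("Sorry, I did not understand.")
        return None
    outcome = action()
    synthesizer.speak(f"Done: {text}")
    return outcome


synth = FakeSynthesizer()
outcome = handle_command(
    FakeRecognizer("play music"),
    synth,
    audio=b"",
    actions={"play music": lambda: "playing"},
)
```

A business process component, as described, would package a flow like `handle_command` inside the platform so that several applications can reuse it instead of each writing its own.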
In summary, the voice control system provided by the embodiment of the present invention provides a unified voice service for the voice service applications on a smart terminal. Every voice service application on the terminal can obtain the voice service it needs by calling the voice control system and no longer has to contain its own voice engine, which greatly reduces resource usage. At the same time, the configurability of the voice platform engines meets the differing requirements of different voice services, greatly simplifies the integration of different voice services, and improves the user experience of the terminal.
To better achieve the above object, as shown in FIG. 2, an embodiment of the present invention further provides a voice processing method for a plurality of voice service applications installed on the same terminal device, the voice processing method including:
Step 21: binding with the voice service applications according to the binding requests of the different voice service applications;
Step 22: for each bound voice service application, processing the input information that is input to the voice service application and outputting the processing result to the corresponding voice service application, so that the voice service application uses the processing result to perform voice control.
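Steps 21 and 22 can be sketched as a small dispatcher. This is only an illustration under stated assumptions: the class `VoiceControlSystem`, its `bind` and `submit` methods, the callback mechanism for delivering results, and the toy engines are all hypothetical, and each application is bound to a single engine for brevity.

```python
class VoiceControlSystem:
    """Hypothetical dispatcher: bind applications, then route input and results."""

    def __init__(self, engines):
        self._engines = engines   # engine name -> callable(input) -> result
        self._bound = {}          # app name -> (engine name, result callback)

    def bind(self, app, engine, on_result):
        """Step 21: accept a binding request from an application."""
        if engine not in self._engines:
            raise KeyError(f"no such engine: {engine}")
        self._bound[app] = (engine, on_result)

    def submit(self, app, data):
        """Step 22: process input for a bound app and hand the result back to it."""
        if app not in self._bound:
            raise PermissionError(f"{app} has not bound to the voice control system")
        engine, on_result = self._bound[app]
        result = self._engines[engine](data)
        on_result(result)         # result goes only to the app that submitted it
        return result


# Toy engines that merely describe their input instead of doing real processing.
system = VoiceControlSystem({
    "ASR": lambda audio: f"text<{len(audio)} bytes>",
    "TTS": lambda text: f"audio<{text}>",
})

received_by_a, received_by_b = [], []
system.bind("app_a", "ASR", received_by_a.append)
system.bind("app_b", "TTS", received_by_b.append)
system.submit("app_a", b"\x00\x01")
system.submit("app_b", "hello")
```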
Optionally, in the voice processing method provided by the embodiment of the present invention, the plurality of voice service applications are active alternately, at different times.
Specifically, the voice service includes a speech recognition (ASR) service, a speech synthesis (TTS) service, a natural language understanding (NLU) service, or a voiceprint recognition (VPR) service. The plurality of voice services mentioned in the embodiments of the present invention are a combination of any two or more of these voice services.
Speech recognition (ASR) service: the speech recognition module analyzes and recognizes the audio recorded from the user by means of pattern recognition and other algorithms, then outputs the recognition result in an agreed text format, which ends that recognition pass. The speech recognition module contains a voice wake-up submodule, which is configured to continuously recognize a wake-up word preset by the user. As in ordinary recognition, the voice wake-up submodule analyzes and recognizes the audio that the user inputs for the wake-up word; after returning a text result in the agreed format, it immediately starts the next round of recording and listening, so that the user can input audio for recognition at any time.
Speech synthesis (TTS) service: the speech synthesis module takes the text data stream input by the user, uses a synthesis algorithm to map the text data to audio data, and finally outputs the input text data stream as a synthesized audio data stream.
Natural language understanding (NLU) service: this service recognizes the user's audio input, performs further semantic analysis on the recognition result to obtain the real intention behind the user's utterance, and provides further information content resources according to that intention.
Voiceprint recognition (VPR) service: the voiceprint recognition module first performs data collection and feature extraction on the audio data input by the user, extracts and stores the user's audio features and related parameters, and then matches and authenticates the user's later audio input against them. It is mainly used in security scenarios.
In the embodiments of the present invention, the recording resource of the terminal device is generally exclusive: only one application can occupy the recording device at a time. This means that only one application is active at any given moment, while applications can be active alternately at different times, all supported by the voice services of the same voice control system. If the user opens two applications at the same time, the application with the higher priority occupies the recording device and the application with the lower priority is disconnected automatically. It should be noted that the priorities may be preset or determined through interaction between the applications and are not limited to one fixed scheme.
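The exclusive-recorder rule can be sketched as a small arbiter. The class `RecorderArbiter`, the numeric priorities, and the application names are hypothetical. The description does not say what happens when two applications have equal priority; this sketch assumes the current holder keeps the device.

```python
class RecorderArbiter:
    """Hypothetical arbiter for the single recording device."""

    def __init__(self, priorities):
        self._priorities = priorities   # app name -> priority (higher wins)
        self.holder = None              # app currently holding the recorder

    def request(self, app):
        """Return True if `app` holds the recorder after this request."""
        if self.holder is None or self.holder == app:
            self.holder = app
            return True
        if self._priorities[app] > self._priorities[self.holder]:
            # The lower-priority holder is disconnected automatically.
            self.holder = app
            return True
        return False                    # assumed: ties keep the current holder

    def release(self, app):
        """Free the recorder so another app can become active later."""
        if self.holder == app:
            self.holder = None


arbiter = RecorderArbiter({"app_low": 1, "app_high": 2})
first = arbiter.request("app_low")      # device is free, so app_low gets it
second = arbiter.request("app_high")    # higher priority takes over
third = arbiter.request("app_low")      # lower priority is refused
arbiter.release("app_high")
fourth = arbiter.request("app_low")     # active again at a later time
```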
An example follows.
Take a smart terminal platform that supports two voice service application products. Application one is a voice assistant, which provides full voice control of most functions of the mobile phone in a normal usage environment, such as making calls, sending text messages, playing music, voice-activated photography, and voice search for lifestyle services. Application two is a driving assistant, which provides full voice control of functions such as navigation, making calls, sending text messages, and playing music in a driving environment.
To save as many system resources as possible, the functional configuration that the voice platform system must support is first determined from the requirements of these two applications. Here, three engines are needed: speech recognition, voice wake-up, and speech synthesis. The configuration module then reads a configuration file and builds a version of the voice platform system that meets these requirements without redundancy.
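Building a platform that contains only the engines listed in a configuration file might look like the sketch below. The JSON layout, the key names `engines` and `language`, the engine identifiers, and the factory table are assumptions made for illustration; the description does not specify the configuration file format. The factories return placeholder strings rather than real engines.

```python
import json

# Hypothetical factories, one per engine the platform could include.
ENGINE_FACTORIES = {
    "asr": lambda lang: f"asr-engine[{lang}]",
    "wakeup": lambda lang: f"wakeup-engine[{lang}]",
    "tts": lambda lang: f"tts-engine[{lang}]",
    "nlu": lambda lang: f"nlu-engine[{lang}]",
    "vpr": lambda lang: f"vpr-engine[{lang}]",
}


def build_platform(config_text):
    """Instantiate only the engines named in the configuration, nothing more."""
    cfg = json.loads(config_text)
    lang = cfg.get("language", "zh-CN")   # assumed default language
    engines = {}
    for name in cfg["engines"]:
        if name not in ENGINE_FACTORIES:
            raise ValueError(f"unsupported engine in configuration: {name}")
        engines[name] = ENGINE_FACTORIES[name](lang)
    return engines


# The three engines the two example applications need.
CONFIG = '{"language": "zh-CN", "engines": ["asr", "wakeup", "tts"]}'
platform = build_platform(CONFIG)
```

Engines that are not listed, such as NLU and VPR in this configuration, are never created, which is the sense in which the built version is free of redundancy.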
The calling flow of application one is as follows.
To use the voice services of the voice platform system, application one must first bind to the voice platform system. After the binding succeeds, each voice function engine must be initialized. For speech recognition, a grammar must also be loaded after initialization; once the grammar has been loaded successfully, speech recognition reaches the ready state. Similarly, speech synthesis requires engine initialization and reaches its ready state once initialization succeeds. For speech recognition (including voice wake-up), recording starts after the ready state is reached and the recording is recognized. When recognition succeeds, the text recognition result is returned; the application acts on this result and then either continues with the next voice interaction or enters the end state, as shown in the state transition diagram of FIG. 3. For speech synthesis, once the ready state is reached and the application needs to announce some text, it passes that text as a parameter to start speech synthesis. The device announces the text by voice, then performs the related operation and either moves on to the next voice interaction or enters the end state, as shown in the state transition diagram of FIG. 4.
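The recognition flow just described can be written down as a transition table. FIG. 3 is not reproduced in this text, so the state and event names below are assumptions derived from the prose, not labels taken from the figure. The synthesis flow would be analogous, without the grammar-loading step.

```python
from enum import Enum, auto


class AsrState(Enum):
    UNBOUND = auto()
    BOUND = auto()
    INITIALIZED = auto()
    READY = auto()        # grammar loaded, recognition can start
    RECORDING = auto()
    RECOGNIZED = auto()   # text result returned to the application
    ENDED = auto()


# (current state, event) -> next state, following the prose description.
TRANSITIONS = {
    (AsrState.UNBOUND, "bind"): AsrState.BOUND,
    (AsrState.BOUND, "init_engine"): AsrState.INITIALIZED,
    (AsrState.INITIALIZED, "load_grammar"): AsrState.READY,
    (AsrState.READY, "start_recording"): AsrState.RECORDING,
    (AsrState.RECORDING, "recognition_ok"): AsrState.RECOGNIZED,
    (AsrState.RECOGNIZED, "next_interaction"): AsrState.READY,
    (AsrState.RECOGNIZED, "finish"): AsrState.ENDED,
}


def advance(state, event):
    """Apply one event, rejecting anything the table does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} not allowed in state {state.name}") from None


def run(events, state=AsrState.UNBOUND):
    """Feed a sequence of events through the state machine."""
    for event in events:
        state = advance(state, event)
    return state


final_state = run([
    "bind", "init_engine", "load_grammar",
    "start_recording", "recognition_ok", "next_interaction",   # first interaction
    "start_recording", "recognition_ok", "finish",             # second, then end
])
```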
The voice calling flow of application two is similar to that of application one. On current terminal devices the recording resource is generally exclusive: only one application can occupy the recording device at a time. This means that only one application is active at any given moment, while different applications can be active alternately at different times, all supported by the voice services of the same voice platform system.
It should be noted that, in the same way, the embodiments of the present invention can support any number of voice service applications with differentiated functions, as far as the terminal hardware allows, and are not limited to the case described in this embodiment.
To better achieve the above object, an embodiment of the present invention further provides a terminal device including a voice control system. The voice control system is installed on the terminal device, on which a plurality of different voice service applications are also installed, and the voice control system includes a configuration module and a plurality of voice engine modules, wherein:
the configuration module is configured to bind each voice service application to at least one voice engine module according to the binding requests of the different voice service applications; and
the voice engine module is configured to process the input information that is input to the voice service application and to output the processing result to the corresponding voice service application, so that the voice service application uses the processing result to perform voice control.
Specifically, in this embodiment of the present invention, the voice control system further includes:
a business process component module connected to the voice engine modules and the configuration module, the business process component module being configured to perform logic control over the business process interaction among the voice engine modules, the configuration module, and the voice service applications.
Specifically, in this embodiment of the present invention, the voice engine module is a speech recognition (ASR) module, a speech synthesis (TTS) module, a natural language understanding (NLU) module, or a voiceprint recognition (VPR) module.
Specifically, in this embodiment of the present invention, the voice control system further includes:
one or more of a speech recognition interface corresponding to the ASR module and the NLU module, a speech synthesis interface corresponding to the TTS module, and a voiceprint recognition interface corresponding to the VPR module.
Specifically, in this embodiment of the present invention, the voice control system further includes:
an external interface corresponding to the business process component module.
It should be noted that the terminal device provided by the above embodiment of the present invention is a terminal device that carries the above voice control system and voice processing method. All embodiments of the voice control system and the voice processing method therefore apply to the terminal device and achieve the same or similar beneficial effects.
The above describes preferred embodiments of the present invention. It should be noted that a person of ordinary skill in the art may make improvements and refinements without departing from the principles of the present invention, and such improvements and refinements shall also fall within the protection scope of the present invention.
Industrial Applicability
The above embodiments and preferred implementations provide unified voice service support for multiple voice service applications installed on the same terminal device, which meets the differing requirements of the individual voice service applications while reducing resource usage and improving efficiency.

Claims (11)

  1. A voice control system, installed on a terminal device on which a plurality of different voice service applications are also installed, the voice control system comprising a configuration module and a plurality of voice engine modules, wherein:
    the configuration module is configured to bind each voice service application to at least one voice engine module according to the binding requests of the different voice service applications; and
    the voice engine module is configured to process the input information that is input to the voice service application and to output the processing result to the corresponding voice service application, so that the voice service application uses the processing result to perform voice control.
  2. The voice control system according to claim 1, wherein the voice control system further comprises:
    a business process component module connected to the voice engine modules and the configuration module, the business process component module being configured to perform logic control over the business process interaction among the voice engine modules, the configuration module, and the voice service applications.
  3. The voice control system according to claim 1, wherein the voice engine module is a speech recognition (ASR) module, a speech synthesis (TTS) module, a natural language understanding (NLU) module, or a voiceprint recognition (VPR) module.
  4. The voice control system according to claim 3, wherein the voice control system further comprises:
    one or more of a speech recognition interface corresponding to the ASR module and the NLU module, a speech synthesis interface corresponding to the TTS module, and a voiceprint recognition interface corresponding to the VPR module.
  5. The voice control system according to claim 2, wherein the voice control system further comprises:
    an external interface corresponding to the business process component module.
  6. A voice processing method for a plurality of voice service applications, the plurality of voice service applications being installed on the same terminal device, the voice processing method comprising:
    binding with the voice service applications according to binding requests from different voice service applications;
    for a bound voice service application, processing input information entered into the voice service application and outputting the processing result to the corresponding voice service application, so that the voice service application uses the processing result to perform voice control.
  7. The voice processing method for a plurality of voice service applications according to claim 6, wherein the plurality of voice service applications are active in an interleaved manner at different times.
  8. The voice processing method for a plurality of voice service applications according to claim 7, wherein the voice service comprises a voice recognition ASR service, a voice synthesis TTS service, a natural semantic understanding NLU service, or a voiceprint recognition VPR service.
  9. A terminal device comprising a voice control system, the voice control system being installed on the terminal device, the terminal device further carrying a plurality of different voice service applications, the voice control system comprising: a configuration module and a plurality of voice engine modules; wherein
    the configuration module is configured to bind a voice service application to at least one voice engine module according to binding requests from different voice service applications;
    the voice engine module is configured to process input information entered into the voice service application and to output the processing result to the corresponding voice service application, so that the voice service application uses the processing result to perform voice control.
  10. The terminal device according to claim 9, wherein the voice control system further comprises:
    a business process component module connected to the voice engine module and the configuration module, the business process component module being configured to perform logic control over the business process interactions among the voice engine module, the configuration module, and the voice service application.
  11. The terminal device according to claim 9, wherein the voice engine module is a voice recognition ASR module, a voice synthesis TTS module, a natural semantic understanding NLU module, or a voiceprint recognition VPR module.
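
The binding and dispatch behaviour recited in the claims can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the class `VoiceControlSystem`, its methods `bind` and `process`, the application identifiers, and the placeholder engines are all hypothetical names that do not appear in the patent, and the engines simply wrap strings rather than perform real recognition or synthesis.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# An engine is modelled as a function from input information to a result.
Engine = Callable[[str], str]


@dataclass
class VoiceControlSystem:
    """Hypothetical stand-in for the claimed system; not the patent's API."""
    engines: Dict[str, Engine]
    bindings: Dict[str, List[str]] = field(default_factory=dict)

    def bind(self, app_id: str, engine_names: List[str]) -> None:
        """Configuration module role: honour an application's binding request."""
        unknown = [n for n in engine_names if n not in self.engines]
        if not engine_names or unknown:
            raise ValueError(f"invalid binding request: {engine_names}")
        self.bindings[app_id] = list(engine_names)

    def process(self, app_id: str, engine_name: str, input_info: str) -> str:
        """Engine module role: process input for a bound application only."""
        if engine_name not in self.bindings.get(app_id, []):
            raise PermissionError(f"{app_id} is not bound to {engine_name}")
        # The result is returned to the calling application.
        return self.engines[engine_name](input_info)


# Placeholder engines (assumption); real engines would wrap actual models.
system = VoiceControlSystem(engines={
    "ASR": lambda audio: f"text({audio})",
    "TTS": lambda text: f"audio({text})",
})
system.bind("navigation_app", ["ASR", "TTS"])
system.bind("dialer_app", ["ASR"])

print(system.process("navigation_app", "TTS", "turn left"))  # audio(turn left)
print(system.process("dialer_app", "ASR", "call-home.wav"))  # text(call-home.wav)
```

In this sketch several applications share one set of engines, and each application can reach only the engines it has been bound to, which mirrors the claimed split between a configuration module and the voice engine modules.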
PCT/CN2016/102605 2016-01-28 2016-10-19 Voice control system, voice processing method and terminal device WO2017128775A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610061640.2 2016-01-28
CN201610061640.2A CN107018228B (en) 2016-01-28 2016-01-28 Voice control system, voice processing method and terminal equipment

Publications (1)

Publication Number Publication Date
WO2017128775A1 (en) 2017-08-03

Family

ID=59397325

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/102605 WO2017128775A1 (en) 2016-01-28 2016-10-19 Voice control system, voice processing method and terminal device

Country Status (2)

Country Link
CN (1) CN107018228B (en)
WO (1) WO2017128775A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114553922A (en) * 2022-02-07 2022-05-27 中煤信息技术(北京)有限公司 Voice-controlled coal mine comprehensive automation system and method

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107657031A (en) * 2017-09-28 2018-02-02 四川长虹电器股份有限公司 Method based on android system management intelligent sound box voice technical ability
CN107818778A (en) * 2017-11-15 2018-03-20 安徽声讯信息技术有限公司 A kind of interactive system based on intelligent sound mouse
CN108133701B (en) * 2017-12-25 2021-11-12 江苏木盟智能科技有限公司 System and method for robot voice interaction
CN108257590B (en) * 2018-01-05 2020-10-02 携程旅游信息技术(上海)有限公司 Voice interaction method and device, electronic equipment and storage medium
CN110827453A (en) * 2019-11-18 2020-02-21 成都启英泰伦科技有限公司 Fingerprint and voiceprint double authentication method and authentication system
CN110928588A (en) * 2019-11-19 2020-03-27 珠海格力电器股份有限公司 Method and device for adjusting terminal configuration, mobile terminal and storage medium
CN111128125A (en) * 2019-12-30 2020-05-08 深圳市优必选科技股份有限公司 Voice service configuration system and voice service configuration method and device thereof
CN111261156A (en) * 2019-12-30 2020-06-09 北京梧桐车联科技有限责任公司 Voice acquisition method and device and computer readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1723487A (en) * 2002-12-13 2006-01-18 摩托罗拉公司 Method and apparatus for selective speech recognition
KR20120063372A (en) * 2010-12-07 2012-06-15 현대자동차주식회사 Standalone voice recognition method and system using abstraction api layer
CN103117058A (en) * 2012-12-20 2013-05-22 四川长虹电器股份有限公司 Multi-voice engine switch system and method based on intelligent television platform
CN103714814A (en) * 2013-12-11 2014-04-09 四川长虹电器股份有限公司 Voice introducing method of voice recognition engine

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104050966B (en) * 2013-03-12 2019-01-01 百度国际科技(深圳)有限公司 The voice interactive method of terminal device and the terminal device for using this method
CN104318924A (en) * 2014-11-12 2015-01-28 沈阳美行科技有限公司 Method for realizing voice recognition function

Also Published As

Publication number Publication date
CN107018228A (en) 2017-08-04
CN107018228B (en) 2020-03-31

Similar Documents

Publication Publication Date Title
WO2017128775A1 (en) Voice control system, voice processing method and terminal device
US9525767B2 (en) System and method for answering a communication notification
CN107004411B (en) Voice application architecture
AU2013252518B2 (en) Embedded system for construction of small footprint speech recognition with user-definable constraints
TWI489372B (en) Voice control method and mobile terminal apparatus
CN106409283B (en) Man-machine mixed interaction system and method based on audio
US10811005B2 (en) Adapting voice input processing based on voice input characteristics
US20140064464A1 (en) Real Time Automatic Caller Speech Profiling
CN103929539B (en) A kind of mobile terminal notepad processing method based on speech recognition and system
US20100311345A1 (en) Method And System For Executing An Internet Radio Application Within A Vehicle
CN104104789A (en) Voice answering method and mobile terminal device
CN105975063B (en) A kind of method and apparatus controlling intelligent terminal
CN106991106A (en) Reduce as the delay caused by switching input mode
US7680514B2 (en) Wireless speech recognition
CN108418744A (en) A kind of electronics seat system for promoting electrical power services quality
CN107731231A (en) A kind of method for supporting more high in the clouds voice services and a kind of storage device
CN108806688A (en) Sound control method, smart television, system and the storage medium of smart television
US7496693B2 (en) Wireless enabled speech recognition (SR) portable device including a programmable user trained SR profile for transmission to external SR enabled PC
CN109712623A (en) Sound control method, device and computer readable storage medium
CN110175016A (en) Start the method for voice assistant and the electronic device with voice assistant
CN111094924A (en) Data processing apparatus and method for performing voice-based human-machine interaction
CN109360565A (en) A method of precision of identifying speech is improved by establishing resources bank
KR20140067687A (en) Car system for interactive voice recognition
CN109660672A (en) Conversion method, equipment and the computer readable storage medium of sound-type
CN108877799A (en) A kind of phonetic controller and method

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 16887648; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 16887648; Country of ref document: EP; Kind code of ref document: A1)