CN103117058A

CN103117058A - Multi-voice engine switch system and method based on intelligent television platform

Info

Publication number: CN103117058A
Application number: CN201210558320XA
Authority: CN
Inventors: 陈冠霖; 赵波; 刘贤洪; 杨金峰; 毕端
Original assignee: Sichuan Changhong Electric Co Ltd
Current assignee: Sichuan Changhong Electric Co Ltd
Priority date: 2012-12-20
Filing date: 2012-12-20
Publication date: 2013-05-22
Anticipated expiration: 2032-12-20
Also published as: CN103117058B

Abstract

The invention relates to smart TV software platforms and discloses a multi-speech-engine switching method based on a smart TV platform. The method automatically finds the speech engine that currently has the highest recognition efficiency and switches to it, which improves the user's voice interaction experience. When the user runs a speech application and uses the speech recognition function, a speech engine selection module obtains the collected speech data through a voice application interface and transmits the data to each speech engine module. The response time with which each speech engine module returns its recognition result is recorded and compared, and the speech engine module with the shortest response time is selected and switched to. The invention further discloses a corresponding switching system, suitable for implementing fast speech recognition on a smart TV.

Description

Multi-speech-engine switching system and method based on a smart TV platform

Technical field

The present invention relates to smart TV software platforms, and specifically to a multi-speech-engine switching system and method based on a smart TV platform.

Background technology

As television terminals become more intelligent and networked, the content available on smart TVs has become much richer and their functions more diverse, so controlling the TV has become more frequent and more complex. Applying speech recognition technology to smart TVs greatly simplifies the user's operations and substantially improves the user experience. Because speech recognition consumes substantial system resources, smart TVs currently connect to a cloud server over the network to implement the speech recognition function.

The speech recognition engine on the server that implements the speech recognition function consists of a speech detection module, a feature extraction module and a recognition search module.

The speech detection module detects and preprocesses the speech signal. The TV sends the raw speech data it has collected to this module, where the data are converted to a standard data format (for example, 8 kHz, 16-bit). The module also uses an efficient signal detection algorithm to determine the start and end points of the speech.

The feature extraction module receives the audio data stream after detection and extracts from it a stream of feature vectors for the speech signal. Speech features are the information, extracted from the speech signal by digital signal processing, that best reflects its essential properties. In this module the speech signal undergoes pre-emphasis, framing, windowing, product and transform operations, cepstral transformation, differencing and similar processing, finally yielding feature vectors of roughly several dozen dimensions.

The recognition search module matches the received features of the unknown speech signal against the acoustic model library, the dictionary/lexicon and the recognition grammar in the engine, and obtains the word sequence that best fits the unknown speech features. The process can be briefly described as follows. By looking up the dictionary/lexicon, each candidate sentence is decomposed from a word sequence into a phoneme sequence. Combining this phoneme sequence with the acoustic model yields an acoustic model unit sequence that better reflects its essential properties. The feature vectors of the raw speech are then matched against the acoustic model unit sequences of all possible candidate sentences, the matching probability of each is computed, and the acoustic model unit sequence with the maximum a posteriori probability is selected. The corresponding word sequence is obtained from this unit sequence; this is the word sequence that the engine outputs to the TV.
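The front-end steps described above (endpoint detection, pre-emphasis, framing, windowing) can be sketched as follows. This is a minimal illustration, not the engine's actual algorithm: the patent does not name its detection algorithm, so a simple short-time-energy threshold is assumed here, together with a pre-emphasis coefficient of 0.97, 25 ms frames with a 10 ms hop at 8 kHz, and a Hamming window. All function names are hypothetical, and the cepstral transformation and differencing steps are omitted.

```python
import math

SAMPLE_RATE = 8000  # 8 kHz as in the text; samples assumed already converted from 16-bit PCM to floats
FRAME_LEN = 200     # assumed 25 ms frames at 8 kHz
HOP = 80            # assumed 10 ms hop at 8 kHz


def frame_signal(samples, frame_len=FRAME_LEN, hop=HOP):
    """Split a signal into overlapping frames; a trailing partial frame is dropped."""
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]


def detect_endpoints(frames, threshold):
    """Assumed energy-based endpoint detection.

    Returns (first, last) indices of the frames whose short-time energy exceeds
    the threshold, or None if no frame does.
    """
    active = [i for i, frame in enumerate(frames)
              if sum(x * x for x in frame) > threshold]
    return (active[0], active[-1]) if active else None


def pre_emphasis(samples, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1]; boosts the high-frequency part of the spectrum."""
    if not samples:
        return []
    return [samples[0]] + [samples[n] - alpha * samples[n - 1]
                           for n in range(1, len(samples))]


def hamming(frame):
    """Multiply a frame by a Hamming window."""
    n = len(frame)
    if n < 2:
        return list(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i, x in enumerate(frame)]


def front_end(samples, energy_threshold):
    """Find the speech segment, then pre-emphasize, frame and window it."""
    span = detect_endpoints(frame_signal(samples), energy_threshold)
    if span is None:
        return []
    first, last = span
    speech = samples[first * HOP:last * HOP + FRAME_LEN]
    return [hamming(frame) for frame in frame_signal(pre_emphasis(speech))]
```

For example, with 800 samples of silence, 1,600 samples of a constant 0.5 test signal and another 800 samples of silence, an energy threshold of 1.0 marks frames 8 to 29 as speech, and `front_end` returns 22 windowed frames of 200 samples each.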

Because the server hosts several speech recognition engines, always using a single fixed engine for speech recognition does not help improve the speech recognition efficiency of the smart TV and leads to a poor voice interaction experience. Finding the currently most efficient speech recognition engine among several engines and switching to it is therefore a problem that urgently needs solving in voice interaction applications.

Summary of the invention

The technical problem to be solved by the present invention is to provide a multi-speech-engine switching system and method based on a smart TV platform that automatically finds the speech engine with the currently highest recognition efficiency and switches to it, thereby improving the user's voice interaction experience.

The technical solution adopted by the present invention to solve the above technical problem is as follows. The multi-speech-engine switching system based on a smart TV platform comprises a speech engine selection module and at least two speech engine modules. All speech engine modules are encapsulated by a unified speech engine interface and are connected to the speech engine selection module through that interface. The speech engine selection module is connected to the speech application through a voice application interface.
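This architecture can be sketched in Python with an abstract base class standing in for the unified speech engine interface and an adapter wrapping each vendor engine. It is an illustration only: the patent specifies no API, so `SpeechEngine`, `recognize`, `VendorAClient`, `transcribe`, `EchoEngine`, `SpeechEngineSelector` and `on_voice_data` are all hypothetical names.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class SpeechEngine(ABC):
    """Hypothetical unified speech engine interface shared by all engine modules."""

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    def recognize(self, audio: bytes) -> str:
        """Return the recognition result for one utterance."""


class VendorAClient:
    """Stand-in for a third-party engine with its own, non-uniform API."""

    def transcribe(self, pcm: bytes) -> dict:
        return {"text": f"A:{len(pcm)} bytes"}


class VendorAEngine(SpeechEngine):
    """Adapter that encapsulates VendorAClient behind the unified interface."""

    def __init__(self) -> None:
        super().__init__("vendor-a")
        self._client = VendorAClient()

    def recognize(self, audio: bytes) -> str:
        return self._client.transcribe(audio)["text"]


class EchoEngine(SpeechEngine):
    """Trivial engine showing a second module behind the same interface."""

    def recognize(self, audio: bytes) -> str:
        return f"{self.name}:{len(audio)} bytes"


class SpeechEngineSelector:
    """Speech engine selection module: it sees engines only through SpeechEngine."""

    def __init__(self, engines: list[SpeechEngine]) -> None:
        if len(engines) < 2:
            raise ValueError("the system requires at least two speech engine modules")
        self.engines = list(engines)

    def on_voice_data(self, audio: bytes) -> dict[str, str]:
        """Entry point behind the voice application interface: pass audio to every engine."""
        return {engine.name: engine.recognize(audio) for engine in self.engines}
```

Because the selector depends only on `SpeechEngine`, adding a further engine means writing one more adapter; the selector and the application side stay unchanged.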

Further, each speech engine module obtains, through the speech engine interface, the speech data passed on by the speech engine selection module, recognizes the speech data, and returns the recognition result to the speech engine selection module. When the speech application uses the speech recognition function, the speech engine selection module obtains the collected speech data through the voice application interface, sends the speech data to each speech engine module through the speech engine interface, and receives the recognition results returned by all speech engine modules. It records and compares the response time with which each speech engine module returns its recognition result and switches to the speech engine module with the shortest response time, so that the speech application can call the speech engine module with the highest recognition efficiency.
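The record-and-compare behavior of the selection module can be sketched as below: the same audio is submitted to every engine concurrently, each engine's response time is measured from dispatch until its result returns, and the engine with the smallest time is chosen. Assumptions: a thread pool provides the concurrency, `time.perf_counter` is the clock, and `FakeEngine` with its fixed delays is a hypothetical stand-in for real engine modules.

```python
from __future__ import annotations

import time
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class FakeEngine:
    """Hypothetical engine module; `delay` simulates its processing time in seconds."""
    name: str
    delay: float

    def recognize(self, audio: bytes) -> str:
        time.sleep(self.delay)
        return f"{self.name} heard {len(audio)} bytes"


def measure_response_times(engines, audio: bytes) -> dict[str, tuple[float, str]]:
    """Send the same audio to every engine at once; return {name: (seconds, result)}."""

    def timed_call(engine):
        start = time.perf_counter()  # dispatch time for this engine
        result = engine.recognize(audio)
        return engine.name, time.perf_counter() - start, result

    # One worker per engine, so no engine waits for another to finish.
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        rows = list(pool.map(timed_call, engines))
    return {name: (elapsed, result) for name, elapsed, result in rows}


def select_fastest(timings: dict[str, tuple[float, str]]) -> str:
    """Name of the engine with the shortest response time (ties: first engine listed)."""
    return min(timings, key=lambda name: timings[name][0])
```

With simulated delays of 0.15 s, 0.02 s and 0.3 s for engines A, B and C, `select_fastest` picks B. The tie-breaking rule is an assumption of this sketch; the patent does not address ties.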

Further, selecting and switching to the speech engine module with the shortest response time means that the speech engine selection module connects to that module through the speech engine interface and simultaneously disconnects from the other speech engine modules.
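The switching rule (connect to the fastest module, disconnect all others) can be modeled as a small piece of state. A minimal sketch, assuming each engine module exposes hypothetical `connect`/`disconnect` operations; the patent does not say what a connection physically is (a socket, a session, a service binding), so here it is only a boolean flag.

```python
from __future__ import annotations


class EngineConnection:
    """Hypothetical handle for the link between the selector and one engine module."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.connected = False

    def connect(self) -> None:
        self.connected = True

    def disconnect(self) -> None:
        self.connected = False


class EngineSwitcher:
    """Keeps exactly one engine module connected: the one with the shortest response time."""

    def __init__(self, connections: list[EngineConnection]) -> None:
        self.connections = {c.name: c for c in connections}
        self.active: str | None = None

    def switch_to_fastest(self, response_times: dict[str, float]) -> str:
        fastest = min(response_times, key=response_times.get)
        # Assumed order: connect first, so there is never a moment with no engine.
        self.connections[fastest].connect()
        for name, conn in self.connections.items():
            if name != fastest and conn.connected:
                conn.disconnect()  # drop every other engine module
        self.active = fastest
        return fastest
```

Calling `switch_to_fastest({"A": 0.4, "B": 0.1, "C": 0.3})` leaves only B connected; a later call in which A is fastest moves the single connection to A.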

In addition, the invention provides a corresponding multi-speech-engine switching method based on a smart TV platform, comprising:

A. When the user runs a speech application and uses the speech recognition function, the speech engine selection module obtains the collected speech data through the voice application interface;

B. The speech engine selection module sends the speech data to each speech engine module through the speech engine interface;

C. Each speech engine module recognizes the speech data and returns the recognition result to the speech engine selection module;

D. The speech engine selection module records and compares the response time with which each speech engine module returns its recognition result, and switches to the speech engine module with the shortest response time.
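Step D can be written compactly. The notation is introduced here for illustration and does not appear in the patent: $N \ge 2$ speech engine modules indexed by $i$, $t^{\mathrm{send}}$ the moment the selection module sends the speech data (the embodiment states that all modules receive the same data simultaneously), and $t_i^{\mathrm{ret}}$ the moment module $i$ returns its recognition result.

$$
\Delta t_i = t_i^{\mathrm{ret}} - t^{\mathrm{send}}, \qquad i = 1, \dots, N,
$$

$$
i^{*} = \arg\min_{1 \le i \le N} \Delta t_i .
$$

The selection module then connects to module $i^{*}$ and disconnects from every module $i \ne i^{*}$. The patent does not specify how ties in $\Delta t_i$ are broken.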

Further, in step D, selecting and switching to the speech engine module with the shortest response time means that the speech engine selection module connects to that module through the speech engine interface and simultaneously disconnects from the other speech engine modules.

The beneficial effects of the invention are as follows. By comparing the response times (that is, the recognition speeds) with which the speech engine modules return their recognition results and switching to the module with the shortest response time, the speech application can call the speech engine module with the highest recognition efficiency, which improves the overall efficiency of speech recognition. Moreover, because the connection carrier between the speech application and the speech engine selection module (the voice application interface) remains unchanged, the speech application does not need to know which speech engine module it has been switched to, which ensures the stability and continuity of speech recognition.
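The stability argument rests on the speech application holding a reference only to the voice application interface, never to a particular engine module. A minimal sketch of that idea, with hypothetical names (`VoiceApplicationInterface`, `recognize`, `_switch`); engine modules are modeled as plain callables for brevity.

```python
from __future__ import annotations

from typing import Callable

Engine = Callable[[bytes], str]  # hypothetical: an engine module modeled as a callable


class VoiceApplicationInterface:
    """Stable entry point held by the speech application.

    The selection module can replace the engine behind it without the
    application noticing, because the application never sees the engine itself.
    """

    def __init__(self, engines: dict[str, Engine], active: str) -> None:
        self._engines = engines
        self._active = active

    def recognize(self, audio: bytes) -> str:
        """The only call the speech application makes."""
        return self._engines[self._active](audio)

    def _switch(self, name: str) -> None:
        """Called by the speech engine selection module, never by the application."""
        if name not in self._engines:
            raise KeyError(name)
        self._active = name
```

The application's call `api.recognize(audio)` is identical before and after a switch; only the selection module calls `_switch`.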

Description of drawings

Fig. 1 is an implementation architecture diagram of the multi-speech-engine switching system based on a smart TV platform according to the present invention;

Fig. 2 is a flowchart of the multi-speech-engine switching method based on a smart TV platform according to the present invention.

Embodiment

The implementation principle of the present invention is as follows. Because the speech engine modules in the system differ in performance, some process speech data faster than others. A speech engine selection module can therefore be provided to record and compare the time each speech engine module takes to process the speech data, identify the module with the shortest processing time (that is, the fastest response), and then switch the connection to that module. Moreover, because the application interface between the speech engine selection module and the speech application never changes, introducing the selection module also addresses the stability of the system.

Referring to Fig. 1, the multi-speech-engine switching system based on a smart TV platform according to the present invention comprises a speech engine selection module and a plurality of speech engine modules. All speech engine modules are encapsulated by a unified speech engine interface and are connected to the speech engine selection module through that interface. The speech engine selection module is connected to the speech application through a voice application interface.

Specifically, each speech engine module obtains, through the speech engine interface, the speech data passed on by the speech engine selection module, recognizes the speech data, and returns the recognition result to the speech engine selection module. When the speech application uses the speech recognition function, the speech engine selection module obtains the collected speech data through the voice application interface, sends the speech data to each speech engine module through the speech engine interface, and receives the recognition results returned by all speech engine modules. It records and compares the response time with which each speech engine module returns its recognition result and switches to the speech engine module with the shortest response time, so that the speech application can call the speech engine module with the highest recognition efficiency.

Fig. 2 shows the flow of the corresponding switching method, which comprises the following steps:

A. When the user runs a speech application and uses the speech recognition function, the speech engine selection module obtains the collected speech data through the voice application interface. The speech data are the sound source signal collected by the voice capture device of the smart TV;

B. The speech engine selection module sends the speech data to each speech engine module through the speech engine interface. Because all modules are encapsulated by the unified speech engine interface, each speech engine module can receive the same speech data simultaneously;

C. Each speech engine module recognizes the speech data and returns the recognition result to the speech engine selection module;

D. The speech engine selection module records and compares the response time with which each speech engine module returns its recognition result, and switches to the speech engine module with the shortest response time: the speech engine selection module connects to that module through the speech engine interface and simultaneously disconnects from the other speech engine modules. Thereafter, the speech application can perform fast speech recognition by calling the speech engine module with the shortest response time, improving the user's voice interaction experience.

Claims

1. A multi-speech-engine switching system based on a smart TV platform, characterized in that it comprises a speech engine selection module and at least two speech engine modules; all speech engine modules are encapsulated by a unified speech engine interface and are connected to the speech engine selection module through the speech engine interface; and the speech engine selection module is connected to the speech application through a voice application interface.

2. The multi-speech-engine switching system based on a smart TV platform according to claim 1, characterized in that each speech engine module is configured to obtain, through the speech engine interface, the speech data passed on by the speech engine selection module, recognize the speech data, and return the recognition result to the speech engine selection module; and the speech engine selection module is configured, when the speech application uses the speech recognition function, to obtain the collected speech data through the voice application interface, send the speech data to each speech engine module through the speech engine interface, receive the recognition results returned by all speech engine modules, record and compare the response time with which each speech engine module returns its recognition result, and switch to the speech engine module with the shortest response time, so that the speech application can call the speech engine module with the highest recognition efficiency.

3. The multi-speech-engine switching system based on a smart TV platform according to claim 2, characterized in that selecting and switching to the speech engine module with the shortest response time means that the speech engine selection module connects to the speech engine module with the shortest response time through the speech engine interface and simultaneously disconnects from the other speech engine modules.

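4. A multi-speech-engine switching method based on a smart TV platform, characterized in that it comprises:

A. when the user runs a speech application and uses the speech recognition function, the speech engine selection module obtains the collected speech data through the voice application interface;

B. the speech engine selection module sends the speech data to each speech engine module through the speech engine interface;

C. each speech engine module recognizes the speech data and returns the recognition result to the speech engine selection module;

D. the speech engine selection module records and compares the response time with which each speech engine module returns its recognition result, and switches to the speech engine module with the shortest response time.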

5. The multi-speech-engine switching method based on a smart TV platform according to claim 4, characterized in that, in step D, selecting and switching to the speech engine module with the shortest response time means that the speech engine selection module connects to the speech engine module with the shortest response time through the speech engine interface and simultaneously disconnects from the other speech engine modules.