CN103117058B

CN103117058B - Multi-speech-engine switching system and method based on a smart TV platform

Info

Publication number: CN103117058B
Application number: CN201210558320.XA
Authority: CN
Inventors: 陈冠霖; 赵波; 刘贤洪; 杨金峰; 毕端
Original assignee: Sichuan Changhong Electric Co Ltd
Current assignee: Sichuan Changhong Electric Co Ltd
Priority date: 2012-12-20
Filing date: 2012-12-20
Publication date: 2015-12-09
Anticipated expiration: 2032-12-20
Also published as: CN103117058A

Abstract

The present invention relates to smart TV software platforms and discloses a multi-speech-engine switching method based on a smart TV platform. The method automatically finds the speech engine with the highest current recognition efficiency and switches to it, improving the user's voice interaction experience. The method can be summarized as follows: when a user runs a voice application and uses its speech recognition function, the speech engine selection module obtains the captured speech data through the voice application interface and sends it to each speech engine module; it then records and compares the response time each speech engine module takes to return a recognition result, and switches to the speech engine module with the shortest response time. The invention also discloses a corresponding switching system, which is suitable for smart TVs and provides fast speech recognition.

Description

Multi-speech-engine switching system and method based on a smart TV platform

Technical field

The present invention relates to smart TV software platforms, and in particular to a multi-speech-engine switching system and method based on a smart TV platform.

Background

With the development of intelligent, networked television terminals, the content available on smart TVs has grown enormously and their functions have diversified, so operating the TV has become more frequent and more complex. Applying speech recognition on smart TVs greatly simplifies the user's operations and substantially improves the user experience. Because speech recognition consumes substantial system resources, smart TVs today generally implement speech recognition by connecting to a cloud server over the network.

The speech recognition engine on the server that implements the speech recognition function consists of a speech detection module, a feature extraction module, and a recognition search module.

The speech detection module detects and preprocesses the speech signal. The TV sends the captured raw speech data to this module, where the speech signal data is converted to a standard data format (for example, 8 kHz, 16-bit). At the same time, an efficient signal detection algorithm determines the start point and end point of the speech.

The feature extraction module receives the audio data stream after detection and extracts from it the feature vector stream of the speech signal. Speech features are obtained with digital signal processing, which extracts from the speech signal the information that best reflects its essential properties. In this module the speech signal undergoes pre-emphasis, framing, windowing, product and conversion, cepstral transform, differencing, and similar processing, finally yielding feature vectors of a few dozen dimensions.

The recognition search module matches the received features of the unknown speech signal against the engine's acoustic model library, dictionary, and recognition grammar information to obtain the word sequence that best fits the unknown speech features. This process can be summarized as follows. By looking up the dictionary, a word sequence (a sentence) can be decomposed into a phoneme sequence. Combining this phoneme sequence with the acoustic model yields the acoustic model unit sequence that better reflects its essential properties. The feature vectors of the raw speech are then matched against the acoustic model unit sequences of all candidate sentences, the matching probability of each is computed, and the acoustic model unit sequence with the maximum a posteriori probability is selected. From this unit sequence the corresponding word sequence is obtained, and this word sequence is what the engine outputs to the TV.
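The first feature extraction steps named above (pre-emphasis, framing, windowing) can be sketched as follows. This is only an illustration, not the engine's actual front end: the pre-emphasis coefficient 0.97, the 25 ms frame length, the 10 ms hop, and the Hamming window are common choices assumed here, and the function names are invented for the example. The later steps (cepstral transform, differencing) are not shown.

```python
import math


def pre_emphasis(samples, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1]; the first sample passes through unchanged.

    alpha=0.97 is an assumed, commonly used value, not taken from the source.
    """
    if not samples:
        return []
    return [samples[0]] + [
        samples[i] - alpha * samples[i - 1] for i in range(1, len(samples))
    ]


def frame_signal(samples, frame_len, hop):
    """Split the signal into overlapping frames; a trailing partial frame is dropped."""
    return [
        samples[start:start + frame_len]
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


def hamming(frame):
    """Multiply one frame by a Hamming window to reduce edge discontinuities."""
    n = len(frame)
    if n == 1:
        return list(frame)
    return [
        x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
        for i, x in enumerate(frame)
    ]


# One second of a 440 Hz tone at the 8 kHz rate mentioned in the text.
SAMPLE_RATE = 8000
signal = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE)]

# Assumed framing: 25 ms frames (200 samples) with a 10 ms hop (80 samples).
frames = [
    hamming(f)
    for f in frame_signal(pre_emphasis(signal), frame_len=200, hop=80)
]
```

With these assumed settings, one second of 8 kHz audio yields 98 frames of 200 samples each; a real front end would go on to compute cepstral features from every frame.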

Because the server hosts multiple speech recognition engines, always using one fixed engine for speech recognition does not help improve the speech recognition efficiency of the smart TV and leads to a poor voice interaction experience. How to find the currently most efficient engine among multiple speech recognition engines and switch to it is therefore a problem that urgently needs solving in voice interaction applications.

Summary of the invention

The technical problem to be solved by the present invention is to provide a multi-speech-engine switching system and method based on a smart TV platform that automatically finds the speech engine with the highest current recognition efficiency and switches to it, improving the user's voice interaction experience.

The solution adopted by the present invention to the above technical problem is a multi-speech-engine switching system based on a smart TV platform, comprising a speech engine selection module and at least two speech engine modules. All speech engine modules are encapsulated behind a unified speech engine interface and are connected to the speech engine selection module through that interface. The speech engine selection module is connected to the voice application through the voice application interface.

Further, each speech engine module obtains, through the speech engine interface, the speech data sent by the speech engine selection module, recognizes the speech data, and returns the recognition result to the speech engine selection module. When the voice application uses the speech recognition function, the speech engine selection module obtains the captured speech data through the voice application interface, sends it to each speech engine module through the speech engine interface, and receives the recognition results returned by all speech engine modules. It records and compares the response time each speech engine module takes to return its recognition result and switches to the speech engine module with the shortest response time, so that the voice application calls the speech engine module with the highest recognition efficiency.

Further, switching to the speech engine module with the shortest response time means that the speech engine selection module connects to that speech engine module through the speech engine interface and, at the same time, disconnects from the other speech engine modules.
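One way to picture the "unified speech engine interface" is as an adapter layer over engines whose native call styles differ. The sketch below is a hypothetical illustration only: the patent does not specify any programming interface, and `SpeechEngine`, `VendorAClient`, `VendorBClient`, their methods, and the returned text are all invented for the example.

```python
from abc import ABC, abstractmethod
from typing import List


class SpeechEngine(ABC):
    """Unified speech engine interface (hypothetical name and signature)."""

    @abstractmethod
    def recognize(self, audio: bytes) -> str:
        """Return the recognized text for one utterance."""


# Two stand-in vendor clients with incompatible call styles (both invented).
class VendorAClient:
    def transcribe(self, pcm: bytes) -> dict:
        return {"text": "volume up"}


class VendorBClient:
    def run(self, payload: bytes, lang: str) -> str:
        return "volume up"


class VendorAEngine(SpeechEngine):
    """Adapter: exposes VendorAClient through the unified interface."""

    def __init__(self, client: VendorAClient) -> None:
        self._client = client

    def recognize(self, audio: bytes) -> str:
        return self._client.transcribe(audio)["text"]


class VendorBEngine(SpeechEngine):
    """Adapter: exposes VendorBClient through the unified interface."""

    def __init__(self, client: VendorBClient, lang: str = "zh-CN") -> None:
        self._client = client
        self._lang = lang

    def recognize(self, audio: bytes) -> str:
        return self._client.run(audio, self._lang)


# The selection module can now treat every engine identically.
engines: List[SpeechEngine] = [
    VendorAEngine(VendorAClient()),
    VendorBEngine(VendorBClient()),
]
results = [engine.recognize(b"\x00\x01") for engine in engines]
```

Because every adapter accepts the same `bytes` input and returns plain text, the selection module can send identical speech data to all engines without knowing anything vendor-specific.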

The invention also provides a corresponding multi-speech-engine switching method based on a smart TV platform, comprising:

a. When the user runs a voice application and uses the speech recognition function, the speech engine selection module obtains the captured speech data through the voice application interface;

b. The speech engine selection module sends the speech data to each speech engine module through the speech engine interface;

c. Each speech engine module recognizes the speech data and returns its recognition result to the speech engine selection module;

d. The speech engine selection module records and compares the response time each speech engine module takes to return its recognition result, and switches to the speech engine module with the shortest response time.

Further, in step d, switching to the speech engine module with the shortest response time means that the speech engine selection module connects to that speech engine module through the speech engine interface and, at the same time, disconnects from the other speech engine modules.
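The timing comparison in the method can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: engines are modeled as plain callables, the fake engines simulate latency with `time.sleep`, and the names `pick_fastest`, `timed_call`, and `make_fake_engine` are invented here. Error handling (for example, an engine that fails or never answers) is deliberately left out.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple

Recognizer = Callable[[bytes], str]


def timed_call(recognize: Recognizer, audio: bytes) -> Tuple[str, float]:
    """Run one engine and return (recognized text, response time in seconds)."""
    start = time.perf_counter()
    text = recognize(audio)
    return text, time.perf_counter() - start


def pick_fastest(
    engines: Dict[str, Recognizer], audio: bytes
) -> Tuple[str, Dict[str, float]]:
    """Send the same audio to every engine and return the fastest engine's name."""
    if not engines:
        raise ValueError("at least one engine is required")
    # One worker per engine so that all engines receive the data at once.
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        futures = {
            name: pool.submit(timed_call, recognize, audio)
            for name, recognize in engines.items()
        }
        timings = {name: future.result()[1] for name, future in futures.items()}
    fastest = min(timings, key=timings.get)
    return fastest, timings


def make_fake_engine(delay_s: float, text: str = "volume up") -> Recognizer:
    """Build a stand-in engine that simulates latency (for illustration only)."""

    def recognize(audio: bytes) -> str:
        time.sleep(delay_s)
        return text

    return recognize


fastest, timings = pick_fastest(
    {"engine_a": make_fake_engine(0.15), "engine_b": make_fake_engine(0.01)},
    b"\x00\x01",
)
```

Here `fastest` names the engine whose simulated response was shortest. The sketch waits for every engine to answer so that all response times can be recorded and compared, as the method describes.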

The beneficial effects of the invention are as follows. By comparing the response time (that is, the recognition speed) each speech engine module takes to return a recognition result and switching to the module with the shortest response time, the voice application calls the speech engine module with the highest recognition efficiency, which improves the overall efficiency of speech recognition. Moreover, because the connection between the voice application and the speech engine selection module (the voice application interface) remains unchanged, the voice application does not need to know which speech engine module has been switched to when a switch occurs, which ensures the stability and continuity of speech recognition.

Brief description of the drawings

Fig. 1 is an architecture diagram of the multi-speech-engine switching system based on a smart TV platform according to the invention;

Fig. 2 is a flowchart of the multi-speech-engine switching method based on a smart TV platform according to the invention.

Detailed description

The principle of the invention is as follows. Because the speech engine modules in the system differ in performance, they process speech data at different speeds. A speech engine selection module can therefore be provided to record and compare the time each speech engine module takes to process the speech data, find the speech engine module with the shortest processing time and fastest response, and then switch the connection to that module. Because the application interface between the speech engine selection module and the voice application never changes, introducing the selection module also addresses the stability of the system.

Referring to Fig. 1, the multi-speech-engine switching system based on a smart TV platform according to the invention comprises a speech engine selection module and multiple speech engine modules. All speech engine modules are encapsulated behind a unified speech engine interface and are connected to the speech engine selection module through that interface. The speech engine selection module is connected to the voice application through the voice application interface.

Each speech engine module obtains, through the speech engine interface, the speech data sent by the speech engine selection module, recognizes the speech data, and returns the recognition result to the speech engine selection module. When the voice application uses the speech recognition function, the speech engine selection module obtains the captured speech data through the voice application interface, sends it to each speech engine module through the speech engine interface, and receives the recognition results returned by all speech engine modules. It records and compares the response time each speech engine module takes to return its recognition result and switches to the speech engine module with the shortest response time, so that the voice application calls the speech engine module with the highest recognition efficiency.

Fig. 2 shows the flow of the corresponding switching method, which comprises the following steps:

a. When the user runs a voice application and uses the speech recognition function, the speech engine selection module obtains the captured speech data through the voice application interface; this speech data comes from the sound source signal captured by the smart TV's voice capture device;

b. The speech engine selection module sends the speech data to each speech engine module through the speech engine interface; because a unified speech engine interface is used for encapsulation, every speech engine module can receive the same speech data at the same time;

d. The speech engine selection module records and compares the response time each speech engine module takes to return its recognition result, and switches to the speech engine module with the shortest response time: the speech engine selection module connects to that speech engine module through the speech engine interface and, at the same time, disconnects from the other speech engine modules. After this, the voice application can perform fast speech recognition by calling the speech engine module with the shortest response time, improving the user's voice interaction experience.
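The switching step itself, together with the unchanged application-facing interface, might look like the following sketch. Everything here is assumed for illustration: the class names `Engine` and `EngineSelector`, the `connected` flag used to model a connection, the returned text, and the response times passed in (made-up numbers standing in for measurements the selection module would have recorded).

```python
from typing import Dict, Iterable, Optional


class Engine:
    """Stand-in speech engine module that tracks its connection state."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.connected = True  # every engine starts out connected
        self.calls = 0

    def recognize(self, audio: bytes) -> str:
        if not self.connected:
            raise RuntimeError(f"{self.name} is disconnected")
        self.calls += 1
        return "channel 5"  # placeholder recognition result


class EngineSelector:
    """Selection module; recognize() plays the role of the voice application interface."""

    def __init__(self, engines: Iterable[Engine]) -> None:
        self._engines = {engine.name: engine for engine in engines}
        self._active: Optional[Engine] = None

    def switch_to_fastest(self, response_times: Dict[str, float]) -> str:
        """Connect to the engine with the shortest response time, disconnect the rest."""
        fastest = min(response_times, key=response_times.get)
        active = self._engines[fastest]
        for name, engine in self._engines.items():
            engine.connected = name == fastest
        self._active = active
        return fastest

    def recognize(self, audio: bytes) -> str:
        """The only call the application makes; it never names a specific engine."""
        if self._active is None:
            raise RuntimeError("no engine has been selected yet")
        return self._active.recognize(audio)


engine_a, engine_b, engine_c = Engine("a"), Engine("b"), Engine("c")
selector = EngineSelector([engine_a, engine_b, engine_c])

# Made-up response times in seconds, standing in for recorded measurements.
chosen = selector.switch_to_fastest({"a": 0.42, "b": 0.18, "c": 0.31})
text = selector.recognize(b"\x00\x01")
```

After the switch only engine "b" remains connected, and the application keeps calling `selector.recognize` exactly as before, which is the stability property the description attributes to the unchanged voice application interface.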

Claims

1. A multi-speech-engine switching system based on a smart TV platform, characterized by comprising: a speech engine selection module and at least two speech engine modules; all speech engine modules are encapsulated behind a unified speech engine interface and are connected to the speech engine selection module through the speech engine interface; the speech engine selection module is connected to the voice application through the voice application interface;

each speech engine module is configured to obtain, through the speech engine interface, the speech data sent by the speech engine selection module, recognize the speech data, and return the recognition result to the speech engine selection module; the speech engine selection module is configured to, when the voice application uses the speech recognition function, obtain the captured speech data through the voice application interface, send the speech data to each speech engine module through the speech engine interface, receive the recognition results returned by all speech engine modules, record and compare the response time each speech engine module takes to return its recognition result, and switch to the speech engine module with the shortest response time, so that the voice application calls the speech engine module with the highest recognition efficiency;

switching to the speech engine module with the shortest response time means that the speech engine selection module connects to that speech engine module through the speech engine interface and, at the same time, disconnects from the other speech engine modules.

2. A multi-speech-engine switching method based on a smart TV platform, applied in the system of claim 1, characterized by comprising:

d. The speech engine selection module records and compares the response time each speech engine module takes to return its recognition result, and switches to the speech engine module with the shortest response time;

in step d, switching to the speech engine module with the shortest response time means that the speech engine selection module connects to that speech engine module through the speech engine interface and, at the same time, disconnects from the other speech engine modules.