CN104506944B

CN104506944B - Voice interaction assistance method and system based on TV scene and voice assistant

Info

Publication number: CN104506944B
Application number: CN201410634174.3A
Authority: CN
Inventors: 黄海兵
Original assignee: iFlytek Co Ltd
Current assignee: iFlytek Co Ltd
Priority date: 2014-11-12
Filing date: 2014-11-12
Publication date: 2018-09-21
Anticipated expiration: 2034-11-12
Also published as: CN104506944A

Abstract

The present invention relates to a voice interaction assistance method and system based on a TV scene and a voice assistant. The TV playback software and the voice assistant run independently of each other. The voice assistant obtains the scene information of the running TV playback software and matches the speech recognition result against that scene information; for the matched scene information, the TV playback software then carries out the operation according to the scene element information, the scene state information, and the voice information. Because the method and system operate on and make use of the TV's real-time scene information, voice-controlled television becomes genuinely intelligent. Because the voice assistant runs separately from the TV playback software, a single voice assistant can work with multiple TV playback programs, which greatly saves system resources. In addition, the speech engine can be updated and improved conveniently, which promotes the development of voice technology toward greater intelligence.

Description

Voice interaction assistance method and system based on TV scene and voice assistant

Technical field

The present invention relates to a voice interaction assistance method and system, and more particularly to a voice interaction assistance method and system based on a TV scene and a voice assistant.

Background art

Although emerging technologies such as smartphones and the Internet have greatly changed how people work and live, the television still holds an irreplaceable position as an information medium in the home. As science and technology have advanced, television technology has also made significant progress and has now reached the smart stage, with smart TVs increasingly widely used in daily life. With the development of voice technology, voice-controlled TVs are also becoming part of people's lives. At present, a voice-controlled TV typically embeds a voice module in the TV playback software to perform voice-controlled operations, and in most cases only specific, predefined operation items can be carried out. Because scene information changes as the TV software runs, the existing embedded approach cannot operate on and make use of the TV's real-time scene information. In addition, when multiple TV playback programs are installed on a smart TV platform, each program needs its own complex embedded voice development before voice control can be used. Each loaded program also occupies a large amount of memory, and loading several TV playback programs at once requires substantial memory resources, which degrades system performance. As speech recognition accuracy rises, speech engines grow ever larger and voice control ever more intelligent, which requires the speech engine itself to be updated and developed continuously. Embedding voice in each program clearly and severely limits the development of voice control.

Summary of the invention

The technical problem solved by the present invention is to provide a voice interaction assistance method and system based on a TV scene and a voice assistant, overcoming the problems of the prior art that the TV's real-time scene information cannot be operated on and used, that system performance is degraded, and that the development of voice control on TVs is limited.

The technical solution of the present invention is to provide a voice interaction assistance method based on a TV scene and a voice assistant, involving TV playback software and a voice assistant that run independently of each other. The voice interaction assistance method includes the following steps:

Obtain scene information: the voice assistant obtains the scene information of the running TV playback software, the scene information including scene element information or scene state information;

Input voice: the voice assistant collects voice information and performs speech recognition conversion on it;

Match and execute: the voice assistant matches the speech recognition result against the obtained scene information. If the scene element information of the running TV playback software is identical or similar to the speech recognition result in relevant information, the voice assistant transmits the matched scene element information to the TV playback software, which executes the item corresponding to that scene element information. If the scene state information of the running TV playback software is identical or similar to the speech recognition result in relevant information, the voice assistant calls a pre-built scene state template for the item information, then transmits the information of the corresponding scene state template to the TV playback software according to the voice information, and the TV playback software executes the item corresponding to that template information.

In a further technical solution of the present invention, the TV playback software and the voice assistant establish a communication connection through a reserved interface of the TV playback software, or through a proprietary protocol.

In a further technical solution of the present invention, the TV playback software comprises multiple independently running TV playback programs, and the voice assistant works with the currently active one.

In a further technical solution of the present invention, a network server is also included. The voice assistant uploads the collected scene information to the network server, which matches the scene information against pre-stored information and transmits the matched information to the voice assistant.

In a further technical solution of the present invention, being identical or similar in relevant information means that the relevant information is identical or similar in pronunciation, wording, word meaning, category, or operation information, or that parts of the information on each of the two matched sides are identical or similar in pronunciation, wording, word meaning, category, or operation information.

The technical solution of the present invention also provides a voice interaction assistance system based on a TV scene and a voice assistant, comprising TV playback software and a voice assistant that run independently of each other. The TV playback software includes an acquisition module that collects scene information, a communication module that communicates with the voice assistant, and an execution module. The voice assistant includes an information acquisition module that obtains the scene information of the running TV playback software, a voice acquisition module that collects voice information, a speech recognition module that performs speech recognition conversion, a matching module, and a transmission module. The information acquisition module obtains the scene information of the running TV playback software, the scene information including scene element information or scene state information. The voice acquisition module collects voice information, and the speech recognition module performs speech recognition conversion on it. The matching module matches the speech recognition result against the obtained scene information. If the scene element information of the running TV playback software is identical or similar to the speech recognition result in relevant information, the transmission module transmits the matched scene element information to the TV playback software, and the execution module executes the item corresponding to that scene element information. If the scene state information of the running TV playback software is identical or similar to the speech recognition result in relevant information, the voice assistant calls a pre-built scene state template for the item information, the transmission module transmits the information of the corresponding scene state template to the TV playback software according to the voice information, and the execution module executes the item corresponding to that template information.

In a further technical solution of the present invention, the TV playback software includes a first information output module, or the voice assistant includes a second information output module.

The technical effect of the present invention is as follows. A voice interaction assistance method and system based on a TV scene and a voice assistant is provided, involving TV playback software and a voice assistant that run independently of each other. The voice assistant obtains the scene information of the running TV playback software, the scene information including scene element information or scene state information. The voice assistant collects voice information and performs speech recognition conversion on it, then matches the speech recognition result against the obtained scene information. If the scene element information of the running TV playback software is identical or similar to the speech recognition result in relevant information, the voice assistant transmits the matched scene element information to the TV playback software, which executes the item corresponding to that scene element information. If the scene state information of the running TV playback software is identical or similar to the speech recognition result in relevant information, the voice assistant calls a pre-built scene state template for the item information, then transmits the information of the corresponding scene state template to the TV playback software according to the voice information, and the TV playback software executes the item corresponding to that template information. In short, for the matched scene information, the TV playback software carries out the operation according to the scene element information, the scene state information, and the voice information. Because the method and system operate on and make use of the TV's real-time scene information, voice-controlled television becomes genuinely intelligent. Because the voice assistant runs separately from the TV playback software, a single voice assistant can work with multiple TV playback programs, which greatly saves system resources. In addition, the speech engine can be updated and improved conveniently, which promotes the development of voice technology toward greater intelligence.

Brief description of the drawings

Fig. 1 is a structural schematic diagram of the present invention.

Fig. 2 is a structural schematic diagram of a preferred embodiment of the present invention.

Detailed description

The technical solution of the present invention is further described below with reference to specific embodiments.

As shown in Figure 1, a specific embodiment of the present invention provides a voice interaction assistance method based on a TV scene and a voice assistant, involving TV playback software 1 and a voice assistant 2 that run independently of each other. The voice interaction assistance method includes the following steps:

Obtain scene information: the voice assistant 2 obtains the scene information of the running TV playback software 1, the scene information including scene element information or scene state information.

The specific implementation is as follows. The voice assistant 2 can obtain the scene information of the running TV playback software 1 in two ways. In the first, the TV playback software 1 collects its own running scene information in the background; this way of collecting information is comprehensive, accurate, and fast, and is the preferred way. In the second, the voice assistant 2 collects the scene information of the running TV playback software 1 through a reserved interface of the TV playback software 1; how much information can be collected this way depends on the functions of the reserved interface. Scene information collected by the TV playback software 1 is transmitted by the TV playback software 1 to the voice assistant 2, which completes the acquisition of the scene information. When the voice assistant 2 collects the scene information through the reserved interface of the TV playback software 1, the collection itself is the acquisition of the scene information. The scene information includes scene element information or scene state information. The scene element information includes the visual information presented on the running detail interface, specifically the text, pictures, video titles, and so on of the running interface; the text on the running detail interface is the most important information. The scene state information mainly includes the operation information that the running interface involves, such as playing a video, playing music, or running a game. In specific embodiments, the collected element information is usually converted into text information.
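The two kinds of scene information described above can be pictured as a small data structure. The following is a minimal sketch only; the class names, field names, and the "text"/"picture"/"video" labels are assumptions made for illustration and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SceneElement:
    """One visible item on the running interface, reduced to text."""
    title: str          # on-screen text, or the title of a picture or video
    kind: str = "text"  # hypothetical label: "text", "picture" or "video"


@dataclass
class SceneInfo:
    """Scene information reported for the running TV playback software."""
    # Scene element information: what the interface currently shows.
    elements: List[SceneElement] = field(default_factory=list)
    # Scene state information: what the interface is doing, if known.
    state: Optional[str] = None


def element_texts(scene: SceneInfo) -> List[str]:
    """Return the text form of every collected element."""
    return [element.title for element in scene.elements]


# Illustrative snapshot of a playback screen.
scene = SceneInfo(
    elements=[SceneElement("Happy Camp", "video"),
              SceneElement("If You Are the One", "video")],
    state="playing video",
)
```

Keeping the elements as plain text mirrors the remark that collected element information is usually converted into text before matching.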

Input voice: the voice assistant 2 collects voice information and performs speech recognition conversion on it.

The specific implementation is as follows. Voice information is input through an external voice input device; the voice assistant 2 collects the voice information and then performs speech recognition conversion on it. In specific embodiments, the speech recognition result includes text information and may also involve operation information. For example, for "Open Happy Camp", the speech recognition result contains both operation information and text information.
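As a rough illustration of a recognition result that carries both operation information and text information, the sketch below splits already-recognized text into an optional leading operation word and the remaining text. The word list and the function name are hypothetical; the patent does not say how the two parts are separated, and no actual speech recognition is performed here.

```python
from typing import NamedTuple, Optional

# Hypothetical operation words that a recognized sentence may start with.
OPERATION_WORDS = ("open", "play", "pause", "close")


class RecognitionResult(NamedTuple):
    text: str                 # text information, e.g. a program name
    operation: Optional[str]  # operation information, or None if absent


def split_recognition(recognized: str) -> RecognitionResult:
    """Split recognized text into a leading operation word and the rest."""
    cleaned = recognized.strip()
    words = cleaned.split(maxsplit=1)
    if words and words[0].lower() in OPERATION_WORDS:
        rest = words[1] if len(words) > 1 else ""
        return RecognitionResult(text=rest, operation=words[0].lower())
    # No known operation word: treat the whole utterance as text.
    return RecognitionResult(text=cleaned, operation=None)
```

A real system would more likely obtain this split from the speech engine's language understanding component than from a fixed word list.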

Match and execute: the voice assistant 2 matches the speech recognition result against the obtained scene information. If the scene element information of the running TV playback software 1 is identical or similar to the speech recognition result in relevant information, the voice assistant 2 transmits the matched scene element information to the TV playback software 1, which executes the item corresponding to that scene element information. If the scene state information of the running TV playback software 1 is identical or similar to the speech recognition result in relevant information, the voice assistant 2 calls a pre-built scene state template for the item information, then transmits the information of the corresponding scene state template to the TV playback software 1 according to the voice information, and the TV playback software 1 executes the item corresponding to that template information.

The specific implementation is as follows. The voice assistant 2 matches the speech recognition result against the obtained scene information, mainly by comparing the pronunciation, wording, word meaning, or operation information of the relevant information on each side. The scene element information includes one or more of the following: the title of the scene element, the category it belongs to, the production personnel involved, and the content it concerns. Being identical or similar in relevant information includes the relevant information being identical or similar in pronunciation, wording, word meaning, category, or operation information. For example, if the current scene element information is "Happy Camp", matching can be performed on the pronunciation or wording of "Happy Camp". It can also be performed on the category ("Happy Camp" is a variety show), on the hosts of the show, on the TV station that broadcasts it, and so on. Alternatively, parts of the information on each of the two matched sides can be identical or similar in pronunciation, wording, word meaning, category, or operation information. For example, if the current scene element information is "Happy Camp", its parts "Happy" and "Camp" can be used for matching: if the speech recognition result contains "Happy" or "Camp", "Happy Camp" can also be matched as relevant. Once a match is found, the voice assistant 2 transmits the matched scene element information to the TV playback software 1, which executes the item corresponding to that scene element information. For example, if the scene element information shows the program "Happy Camp", then after the match the voice assistant 2 transmits the "Happy Camp" information to the TV playback software 1, which executes the "Happy Camp" program; execution includes operations such as selecting and clicking.
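The whole-field and partial matching described above might be sketched as follows. This is an illustration under stated assumptions, not the patented matcher: similarity of wording is approximated with Python's standard `difflib.SequenceMatcher`, the 0.8 threshold is arbitrary, pronunciation and word-meaning matching are left out, and the `Element` fields and sample data are hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Optional


@dataclass
class Element:
    title: str
    category: str = ""  # category the item belongs to, e.g. "variety show"
    host: str = ""      # a person associated with the item


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Identical or similar wording; the threshold is an assumption."""
    a, b = a.lower(), b.lower()
    return bool(a) and bool(b) and SequenceMatcher(None, a, b).ratio() >= threshold


def match_element(spoken: str, elements: List[Element]) -> Optional[Element]:
    """Return the first element related to the spoken text, if any."""
    # Pass 1: the spoken text matches a whole field (title, category or host).
    for element in elements:
        if any(similar(spoken, value)
               for value in (element.title, element.category, element.host)):
            return element
    # Pass 2: partial match, where one word of the title appears in the speech.
    spoken_words = spoken.lower().split()
    for element in elements:
        if any(word in spoken_words for word in element.title.lower().split()):
            return element
    return None


# Illustrative data; "Host A" is a placeholder name.
elements = [Element("Happy Camp", "variety show", "Host A"),
            Element("Evening News", "news")]
```

Running the whole-field pass first means an exact title wins over a looser partial hit; a production matcher would presumably rank candidates rather than return the first one.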

If the scene state information of the running TV playback software 1 is identical or similar to the speech recognition result in relevant information, the voice assistant 2 calls a pre-built scene state template for the item information, then transmits the information of the corresponding scene state template to the TV playback software 1 according to the voice information, and the TV playback software 1 executes the item corresponding to that template information. For example, if the currently collected scene state information is "playing If You Are the One", the voice assistant 2 calls the pre-built video playback template, which contains operation information related to video playback such as "Play", "Fast Forward", "Rewind", "Volume Up", "Volume Down", "Contrast Up", and "Contrast Down". If the speech recognition result contains "increase the volume", its meaning is understood as "Volume Up", so the voice assistant 2 sends "Volume Up" to the TV playback software 1, which then performs the volume-up operation.
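A scene state template can be thought of as a list of operations that are valid in a given state, plus some way of mapping what the user said onto one of them. The sketch below assumes a simple lookup table for both; the state key, the operation names, and the phrase table are illustrative stand-ins, and the phrase table is only a placeholder for the meaning-based matching the text describes.

```python
from typing import Dict, List, Optional

# Hypothetical pre-built templates: scene state -> operations it supports.
STATE_TEMPLATES: Dict[str, List[str]] = {
    "playing video": ["Play", "Fast Forward", "Rewind", "Volume Up",
                      "Volume Down", "Contrast Up", "Contrast Down"],
}

# Hypothetical phrase table standing in for meaning-based matching.
PHRASE_TO_OPERATION: Dict[str, str] = {
    "increase the volume": "Volume Up",
    "louder": "Volume Up",
    "decrease the volume": "Volume Down",
    "quieter": "Volume Down",
}


def resolve_operation(scene_state: str, spoken: str) -> Optional[str]:
    """Map spoken text to an operation of the template for this state."""
    operations = STATE_TEMPLATES.get(scene_state)
    if operations is None:
        return None  # no template was built for this scene state
    phrase = spoken.strip().lower()
    # Direct hit on an operation name in the template.
    for operation in operations:
        if operation.lower() == phrase:
            return operation
    # Otherwise fall back to the phrase table, but only accept
    # operations that the current template actually contains.
    candidate = PHRASE_TO_OPERATION.get(phrase)
    return candidate if candidate in operations else None
```

Restricting the result to the current template is the point of the design: the same phrase can resolve differently, or not at all, depending on what the TV is doing.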

As shown in Figure 1, in a preferred embodiment of the present invention, the TV playback software 1 and the voice assistant 2 establish a communication connection for transmitting information in one of two ways: through a reserved interface of the TV playback software 1, or through a proprietary protocol. The voice assistant 2 obtains the collected running scene information in one of two ways: the TV playback software 1 transmits it to the voice assistant 2, or the voice assistant 2 collects it directly from the TV playback software 1. When the TV playback software 1 collects the running scene information, the TV playback software 1 and the voice assistant 2 establish a communication connection, and the TV playback software 1 then transmits the collected running scene information to the voice assistant 2. The voice assistant 2 can also establish a communication connection with the TV playback software 1 through an interface reserved by the TV playback software 1 and collect the running scene information directly from it. At present, most software reserves communication interfaces for certain specific functions: for example, some software reserves a read-aloud interface for elderly users who cannot see clearly, and some reserves an assistive operation interface for blind users. The voice assistant 2 establishes a communication connection with the TV playback software 1 through these functional interfaces. Alternatively, the voice assistant 2 establishes a communication connection with the TV playback software 1 through a proprietary protocol: a proprietary protocol is built for communication between the voice assistant 2 and the TV playback software 1, and the connection between them is realized over it.
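The patent does not specify the wire format of the proprietary protocol. Purely as an illustration, the sketch below assumes newline-delimited JSON messages with a "type" and a "payload" field; the message type names and both function names are hypothetical.

```python
import json
from typing import Any, Dict


def encode_message(msg_type: str, payload: Dict[str, Any]) -> bytes:
    """Serialize one message as a single UTF-8 JSON line (assumed format)."""
    body = json.dumps({"type": msg_type, "payload": payload}, ensure_ascii=False)
    return (body + "\n").encode("utf-8")


def decode_message(line: bytes) -> Dict[str, Any]:
    """Parse one message line and check that both expected fields exist."""
    message = json.loads(line.decode("utf-8"))
    if not isinstance(message, dict) or "type" not in message or "payload" not in message:
        raise ValueError("malformed message")
    return message


# The player reports its scene; the assistant later sends back a command.
scene_report = encode_message("scene_info", {"elements": ["Happy Camp"],
                                             "state": "playing video"})
command = encode_message("execute", {"item": "Volume Up"})
```

The same two message shapes cover both directions of the exchange: scene information flowing to the assistant, and the matched item or operation flowing back to the player.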

As shown in Figure 1, in a preferred embodiment of the present invention, the TV playback software comprises multiple independently running TV playback programs, and the voice assistant works with the currently active one. The specific implementation is as follows. The TV playback software 1 may be any of several independently running TV playback programs, and the voice assistant 2 works with the currently active TV playback software 1. If only one TV playback software 1 is running in the current environment, the voice assistant 2 works with it. If multiple TV playback programs are running in the current system environment, the voice assistant 2 obtains the current TV playback software 1 from the current system, for example the Android system, then establishes a communication connection with it and carries out the related work.
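The selection rule above can be sketched in a few lines. The `in_foreground` flag here is an assumption that stands in for whatever the operating system reports about the currently active program; no real Android API is called, and the program names are made up.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PlayerApp:
    name: str
    in_foreground: bool  # stand-in for the system's "currently active" answer


def pick_active_player(running: List[PlayerApp]) -> Optional[PlayerApp]:
    """Choose the TV playback program the assistant should work with."""
    if len(running) == 1:
        return running[0]  # only one program is running: use it
    for app in running:
        if app.in_foreground:
            return app     # several are running: use the active one
    return None            # nothing running, or none reported as active


running = [PlayerApp("Player A", False), PlayerApp("Player B", True)]
```

Because the assistant picks its partner at run time, one assistant instance can serve every installed playback program without any of them embedding voice code.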

As shown in Fig. 2, a preferred embodiment of the present invention further includes a network server 3. The voice assistant 2 uploads the collected scene information to the network server 3, which matches the scene information against pre-stored information and transmits the matched information to the voice assistant 2. For example, if the scene information is "If You Are the One", the network server 3 has pre-stored information related to "If You Are the One", such as an introduction to the program, information about its hosts, and links to its songs. The network server 3 transmits this related information to the voice assistant 2, which organizes it into an information list. The list can be displayed and output directly for the user to view, play, and so on; it can be transmitted to the TV playback software 1, which displays and outputs it for use; or it can be transmitted to a mobile terminal, which displays and outputs it for use.
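The server-side lookup and the assistant's information list might look roughly like this. The stored record is an in-memory dictionary with placeholder values, not real program data, and the function names are hypothetical; a real deployment would query a remote service instead.

```python
from typing import Dict, List

# Hypothetical pre-stored records on the network server (placeholder values).
STORED_INFO: Dict[str, Dict[str, str]] = {
    "If You Are the One": {
        "introduction": "(program introduction)",
        "hosts": "(host information)",
        "songs": "(song links)",
    },
}


def server_lookup(scene_text: str) -> Dict[str, str]:
    """Server side: return the pre-stored record matching the scene, if any."""
    return STORED_INFO.get(scene_text, {})


def to_info_list(record: Dict[str, str]) -> List[str]:
    """Assistant side: arrange the returned record as a displayable list."""
    return [f"{key}: {value}" for key, value in record.items()]


info_list = to_info_list(server_lookup("If You Are the One"))
```

The resulting list is plain text, so the same output can be shown by the assistant, by the TV playback software, or on a mobile terminal.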

As shown in Figure 1, a specific embodiment of the present invention provides a voice interaction assistance system based on a TV scene and a voice assistant, comprising TV playback software 1 and a voice assistant 2 that run independently of each other. The TV playback software 1 includes an acquisition module 11 that collects scene information, a communication module 12 that communicates with the voice assistant, and an execution module 13. The voice assistant 2 includes an information acquisition module 21 that obtains the scene information of the running TV playback software 1, a voice acquisition module 22 that collects voice information, a speech recognition module 23 that performs speech recognition conversion, a matching module 24, and a transmission module 25. The information acquisition module 21 obtains the scene information of the running TV playback software 1, the scene information including scene element information or scene state information. The voice acquisition module 22 collects voice information, and the speech recognition module 23 performs speech recognition conversion on it. The matching module 24 matches the speech recognition result against the obtained scene information. If the scene element information of the running TV playback software 1 is related to the speech recognition result in pronunciation, wording, word meaning, or operation information, the transmission module 25 transmits the matched scene element information to the TV playback software 1, and the execution module 13 executes the item corresponding to that scene element information. If the scene state information of the running TV playback software 1 is related to the speech recognition result in pronunciation, wording, word meaning, or operation information, the voice assistant 2 calls a pre-built scene state template for the item information, the transmission module 25 transmits the information of the corresponding scene state template to the TV playback software 1 according to the voice information, and the execution module 13 executes the item corresponding to that template information.
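To show how the modules fit together, here is a deliberately small wiring sketch. It collapses each numbered module into a method or a comment, uses exact case-insensitive title matching in place of the fuller matching rules, and calls the player object directly instead of going through a communication channel; all class and method names are assumptions.

```python
from typing import List, Optional


class TVPlayer:
    """Stand-in for the TV playback software 1."""

    def __init__(self, visible_titles: List[str]) -> None:
        self.visible_titles = visible_titles
        self.executed: List[str] = []

    def collect_scene(self) -> List[str]:
        # Role of acquisition module 11: report what is on screen.
        return list(self.visible_titles)

    def execute(self, item: str) -> None:
        # Role of execution module 13: act on the item (recorded here).
        self.executed.append(item)


class VoiceAssistant:
    """Stand-in for the voice assistant 2, separate from the player."""

    def __init__(self, player: TVPlayer) -> None:
        self.player = player

    def handle(self, recognized_text: str) -> Optional[str]:
        # Role of information acquisition module 21: fetch the scene.
        scene = self.player.collect_scene()
        # Role of matching module 24: exact, case-insensitive title match.
        for title in scene:
            if title.lower() == recognized_text.lower():
                # Role of transmission module 25: hand the match to the player.
                self.player.execute(title)
                return title
        return None


player = TVPlayer(["Happy Camp", "Evening News"])
assistant = VoiceAssistant(player)
```

The assistant only depends on the two player methods, which is what lets one assistant be paired with any playback program that exposes them.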

As shown in Figure 1, the specific implementation of the present invention is as follows. The information acquisition module 21 can obtain the scene information of the running TV playback software 1 in two ways. In the first, the TV playback software 1 collects its own running scene information in the background; this way of collecting information is comprehensive, accurate, and fast, and is the preferred way. In the second, the voice assistant 2 collects the scene information of the running TV playback software 1 through a reserved interface of the TV playback software 1; how much information can be collected this way depends on the functions of the reserved interface. Scene information collected by the TV playback software 1 is transmitted by the TV playback software 1 to the voice assistant 2, which completes the acquisition of the scene information. When the voice assistant 2 collects the scene information through the reserved interface of the TV playback software 1, the collection itself is the acquisition of the scene information. The scene information includes scene element information or scene state information. The scene element information includes the visual information presented on the running detail interface, specifically the text, pictures, video titles, and so on of the running interface; the text on the running detail interface is the most important information. The scene state information mainly includes the operation information that the running interface involves, such as playing a video, playing music, or running a game. In specific embodiments, the collected element information is usually converted into text information.

Voice information is input through an external voice input device; the voice acquisition module 22 collects the voice information, and the speech recognition module 23 then performs speech recognition conversion on it. In specific embodiments, the speech recognition result includes text information and may also involve operation information. For example, for "Open Happy Camp", the speech recognition result contains both operation information and text information.

The matching module 24 matches the speech recognition result against the obtained scene information, mainly by comparing the pronunciation, wording, word meaning, or operation information of the relevant information on each side. The scene element information includes one or more of the following: the title of the scene element, the category it belongs to, the production personnel involved, and the content it concerns. Being identical or similar in relevant information includes the relevant information being identical or similar in pronunciation, wording, word meaning, category, or operation information. For example, if the current scene element information is "Happy Camp", matching can be performed on the pronunciation or wording of "Happy Camp". It can also be performed on the category ("Happy Camp" is a variety show), on the hosts of the show, on the TV station that broadcasts it, and so on. Alternatively, parts of the information on each of the two matched sides can be identical or similar in pronunciation, wording, word meaning, category, or operation information. For example, if the current scene element information is "Happy Camp", its parts "Happy" and "Camp" can be used for matching: if the speech recognition result contains "Happy" or "Camp", "Happy Camp" can also be matched as relevant. Once a match is found, the transmission module 25 transmits the matched scene element information to the TV playback software 1, and the execution module 13 executes the item corresponding to that scene element information. For example, if the scene element information shows the program "Happy Camp", then after the match the voice assistant 2 transmits the "Happy Camp" information to the TV playback software 1, and the execution module 13 executes the "Happy Camp" program; execution includes operations such as selecting and clicking.

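The two matching modes described above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `SceneElement` structure, the `similar` threshold, and the use of plain string similarity are all assumptions, and pronunciation-based or meaning-based matching is left out.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class SceneElement:
    """One item shown on the current screen (hypothetical structure)."""
    title: str
    category: str = ""
    host: str = ""
    station: str = ""
    parts: list = field(default_factory=list)  # title fragments for partial matching


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Treat two strings as 'same or similar' above an assumed ratio threshold."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def match_element(recognized: str, elements: list):
    """Return the first scene element related to the recognized text, or None."""
    text = recognized.lower()
    for el in elements:
        # Mode 1: whole-field match on title, type, host, or station.
        fields = [el.title, el.category, el.host, el.station]
        if any(f and (f.lower() in text or similar(f, recognized)) for f in fields):
            return el
        # Mode 2: partial match on a fragment of the title.
        if any(p.lower() in text for p in el.parts):
            return el
    return None


elements = [
    SceneElement("Happy Base Camp", category="variety show",
                 parts=["Happy", "Base Camp"]),
    SceneElement("Evening News", category="news"),
]
```

With this data, an utterance containing only "base camp" still resolves to the "Happy Base Camp" element through the partial-match branch.
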
If the scene state information of the running TV playback software 1 is the same as or similar to the speech recognition result in relevant information, the voice assistant 2 calls the pre-built scene state template for that item, the transmission module 25 sends the corresponding scene state template information to the TV playback software 1 according to the voice information, and the execution module 13 executes the item corresponding to that template information. For example, if the currently acquired scene state information is "playing If You Are the One", the voice assistant 2 calls the pre-built video playback module, which contains the operation information related to video playback, such as "play", "fast forward", "rewind", "volume up", "volume down", "contrast up", and "contrast down". If the speech recognition result contains "increase the volume", its meaning is understood to be "volume up"; the transmission module 25 then sends "volume up" to the TV playback software 1, and the execution module 13 performs the volume-up operation.
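
One way to picture the scene state template lookup is the sketch below. The template table, the `SYNONYMS` phrasing table, and the function name `resolve_operation` are hypothetical; the source does not say how the meaning of an utterance is mapped to a template operation, so a fixed paraphrase table stands in for that step.

```python
# Hypothetical scene state templates: scene state -> operations it supports.
SCENE_STATE_TEMPLATES = {
    "video_playback": ["play", "fast forward", "rewind", "volume up",
                       "volume down", "contrast up", "contrast down"],
}

# Hypothetical table mapping spoken variants to template operations.
SYNONYMS = {
    "increase the volume": "volume up",
    "louder": "volume up",
    "decrease the volume": "volume down",
    "quieter": "volume down",
}


def resolve_operation(scene_state: str, recognized: str):
    """Map recognized speech to an operation allowed in the current scene state."""
    template = SCENE_STATE_TEMPLATES.get(scene_state, [])
    text = recognized.lower()
    # Direct hit: the utterance names a template operation verbatim.
    for op in template:
        if op in text:
            return op
    # Meaning-based hit: a known paraphrase maps to a template operation.
    for phrase, op in SYNONYMS.items():
        if phrase in text and op in template:
            return op
    return None
```

Restricting the result to operations listed in the active template means an utterance only resolves when it makes sense for what the TV is currently doing.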

As shown in Fig. 1, in a preferred embodiment of the present invention, the TV playback software 1 comprises multiple independently running TV playback programs, and the voice assistant 2 cooperates with the currently active one. The implementation is as follows. If only one TV playback software 1 is running in the current environment, the voice assistant 2 cooperates with it. If several TV playback software 1 are running in the current system environment, the voice assistant 2 obtains the current TV playback software 1 from the system environment through the current system (for example, Android), establishes a communication connection with it, and carries out the related work.
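
The selection rule in this embodiment can be sketched as below. The `TvApp` record and its `in_foreground` flag are assumptions standing in for whatever the platform reports; on a real system the active program would be queried from the operating system rather than stored in a flag.

```python
from dataclasses import dataclass


@dataclass
class TvApp:
    """A running TV playback program (hypothetical record)."""
    name: str
    in_foreground: bool


def current_tv_app(running: list):
    """Pick the TV playback software the voice assistant should cooperate with."""
    if not running:
        return None
    # Only one program running: cooperate with it directly.
    if len(running) == 1:
        return running[0]
    # Several programs running: use the one the system reports as active.
    for app in running:
        if app.in_foreground:
            return app
    return None
```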

As shown in Fig. 2, in a preferred embodiment of the present invention, the system further includes a network server 3. The voice assistant 2 uploads the acquired scene information to the network server 3; the network server 3 matches the scene information against pre-stored information and sends the matched information to the voice assistant 2. For example, if the scene information is "If You Are the One", the network server 3 has pre-stored related information such as an introduction to the program, information about its host, and links to its songs. The network server 3 sends this information to the voice assistant 2, which organizes it into an information list. The list can be displayed directly by the second information output module 26 for the user to view, play, and so on; it can be sent to the TV playback software 1 and displayed by the first information output module 14; or it can be sent to a mobile terminal and displayed there.
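
The server lookup and the three output routes can be sketched as follows. Everything here is illustrative: the in-memory `SERVER_STORE`, the placeholder program title and values, and the function names are assumptions, and a real deployment would make a network request instead of a dictionary lookup.

```python
# Hypothetical pre-stored records on the network server, keyed by scene information.
SERVER_STORE = {
    "Example Show": {
        "introduction": "<program introduction>",
        "host": "<host information>",
        "song_links": ["<song link>"],
    },
}

# Output routes named in the description.
OUTPUT_TARGETS = {
    "assistant": "second information output module 26",
    "tv_software": "first information output module 14",
    "mobile": "mobile terminal",
}


def build_info_list(scene_info: str) -> list:
    """Look up related information and organize it into a list of (kind, value)."""
    record = SERVER_STORE.get(scene_info, {})
    return list(record.items())


def route_output(info_list: list, target: str = "assistant") -> dict:
    """Describe where the information list is displayed."""
    if target not in OUTPUT_TARGETS:
        raise ValueError(f"unknown output target: {target}")
    return {"display_on": OUTPUT_TARGETS[target], "items": info_list}
```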

The technical effect of the present invention is as follows. A voice interaction assistance method and system based on TV scenes and a voice assistant is constructed, comprising TV playback software 1 and a voice assistant 2 that run independently of each other. The voice assistant 2 obtains the scene information of the running TV playback software 1, the scene information including scene element information or scene state information. The voice assistant 2 collects voice information and performs speech recognition conversion on it, then matches the speech recognition result against the acquired scene information. If the scene element information of the running TV playback software 1 is related to the speech recognition result in pronunciation, wording, meaning, or operation information, the voice assistant 2 sends the matched scene element information to the TV playback software 1, which executes the item corresponding to that scene element information. If the scene state information of the running software is related to the speech recognition result in pronunciation, wording, meaning, or operation information, the voice assistant 2 calls the pre-built scene state template of the item information and sends the corresponding scene state template information to the TV playback software 1 according to the voice information, and the TV playback software 1 executes the item corresponding to that template information.

In this method and system, the TV playback software 1 and the voice assistant 2 run independently. The voice assistant 2 obtains the scene information of the running TV playback software 1 and matches the speech recognition result against it; for the matched scene information, the TV playback software 1 then performs the operation according to the scene element information, the scene state information, and the voice information. Because operation follows the TV's real-time scene information, voice-controlled TV becomes genuinely intelligent. Because the voice assistant 2 runs separately from the TV playback software 1, a single voice assistant 2 can work with multiple TV playback software 1, which greatly saves system resources. In addition, the speech engine is easy to update and improve, which promotes the development of voice technology toward intelligence.
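
The overall branching summarized above (scene element first, then scene state template) might look like the following sketch. The JSON message shape, the `dispatch` name, and the use of substring tests in place of real pronunciation or meaning matching are all assumptions made for illustration; the source does not define the message format used between the voice assistant and the TV playback software.

```python
import json


def dispatch(recognized, scene_elements, scene_state, templates):
    """Return the JSON message the assistant would send, or None if nothing matches."""
    text = recognized.lower()
    # Branch 1: the utterance refers to an element visible on screen.
    for element in scene_elements:
        if element.lower() in text:
            return json.dumps({"kind": "element", "value": element})
    # Branch 2: the utterance names an operation valid for the current scene state.
    for operation in templates.get(scene_state, []):
        if operation in text:
            return json.dumps({"kind": "operation", "value": operation})
    return None
```

Because the assistant only emits a small message, the same assistant process can serve any TV playback software that understands that message, which is the resource-saving point made above.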

The above is a further detailed description of the present invention in connection with specific preferred embodiments, and the specific implementation of the invention is not to be regarded as limited to these descriptions. A person of ordinary skill in the art to which the invention belongs may make a number of simple deductions or substitutions without departing from the inventive concept, and all of these shall be regarded as falling within the scope of protection of the invention.

Claims

1. A voice interaction assistance method based on TV scenes and a voice assistant, in which TV playback software and a voice assistant run independently, and the TV playback software and the voice assistant establish a communication connection either through a reserved interface of the TV playback software or through a proprietary protocol, characterized in that the voice interaction assistance method comprises the following steps:

Obtaining scene information: the voice assistant obtains the scene information of the running TV playback software, the scene information including scene element information or scene state information; the voice assistant obtains the scene information of the running TV playback software in one of two ways: in one, the TV playback software collects its own running scene information in the background; in the other, the voice assistant collects the scene information of the running TV playback software through the reserved interface of the TV playback software; the scene element information includes the visual information presented by the running detail interface, specifically the text information, picture information, and video information titles of the running interface; the scene state information includes the operation information involved in the running interface, specifically: playing video, playing music, and running a game;

Inputting voice: the voice assistant collects voice information and performs speech recognition conversion on the voice information;

Matching and executing: the voice assistant matches the speech recognition result against the acquired scene information; if the scene element information of the running TV playback software is the same as or similar to the speech recognition result in relevant information, the voice assistant sends the matched scene element information to the TV playback software, and the TV playback software executes the item corresponding to the scene element information; if the scene state information of the running TV playback software is the same as or similar to the speech recognition result in relevant information, the voice assistant calls the pre-built scene state template of the item information, the voice assistant then sends the corresponding scene state template information to the TV playback software according to the voice information, and the TV playback software executes the item corresponding to the scene state template information.

2. The voice interaction assistance method based on TV scenes and a voice assistant according to claim 1, characterized in that the TV playback software comprises multiple independently running TV playback programs, and the voice assistant cooperates with the currently active TV playback software.

3. The voice interaction assistance method based on TV scenes and a voice assistant according to claim 1, characterized in that the voice assistant uploads the acquired scene information to a network server, and the network server matches the scene information against pre-stored information and sends the matched information to the voice assistant.

4. The voice interaction assistance method based on TV scenes and a voice assistant according to claim 1, characterized in that being the same or similar in relevant information includes the relevant information being the same or similar in pronunciation, wording, meaning, type, or operation information, or part of each matching side's information being the same or similar in pronunciation, wording, meaning, type, or operation information.

5. A voice interaction assistance system based on TV scenes and a voice assistant, characterized in that it comprises TV playback software and a voice assistant that run independently; the TV playback software and the voice assistant establish a communication connection either through a reserved interface of the TV playback software or through a proprietary protocol; the TV playback software includes a collection module that collects scene information, a communication module that communicates with the voice assistant, and an execution module; the voice assistant includes an information acquisition module that obtains the scene information of the running TV playback software, a voice collection module that collects voice information, a speech recognition module that performs speech recognition conversion, a matching module, and a transmission module; the information acquisition module obtains the scene information of the running TV playback software, the scene information including scene element information or scene state information; the voice assistant obtains the scene information of the running TV playback software in one of two ways: in one, the TV playback software collects its own running scene information in the background; in the other, the voice assistant collects the scene information of the running TV playback software through the reserved interface of the TV playback software; the scene element information includes the visual information presented by the running detail interface, specifically the text information, picture information, and video information titles of the running interface; the scene state information includes the operation information involved in the running interface, specifically: playing video, playing music, and running a game; the voice collection module collects voice information, and the speech recognition module performs speech recognition conversion on the voice information; the matching module matches the speech recognition result against the acquired scene information; if the scene element information of the running TV playback software is the same as or similar to the speech recognition result in relevant information, the transmission module sends the matched scene element information to the TV playback software, and the execution module executes the item corresponding to the scene element information; if the scene state information of the running TV playback software is the same as or similar to the speech recognition result in relevant information, the voice assistant calls the pre-built scene state template of the item information, the transmission module sends the corresponding scene state template information to the TV playback software according to the voice information, and the execution module executes the item corresponding to the scene state template information.

6. The voice interaction assistance system based on TV scenes and a voice assistant according to claim 5, characterized in that the TV playback software comprises multiple independently running TV playback programs, and the voice assistant cooperates with the currently active TV playback software.

7. The voice interaction assistance system based on TV scenes and a voice assistant according to claim 5, characterized in that it further comprises a network server; the voice assistant uploads the acquired scene information to the network server, and the network server matches the scene information against pre-stored information and sends the matched information to the voice assistant.

8. The voice interaction assistance system based on TV scenes and a voice assistant according to claim 7, characterized in that the TV playback software includes a first information output module or the voice assistant includes a second information output module.