CN104506906B - Voice interaction assistance method and system based on TV scene elements and a voice assistant - Google Patents

Publication number
CN104506906B
Authority
CN
China
Prior art keywords
information
software
televising
voice
voice assistant
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410634282.0A
Other languages
Chinese (zh)
Other versions
CN104506906A (en)
Inventor
黄海兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd filed Critical iFlytek Co Ltd
Priority to CN201410634282.0A priority Critical patent/CN104506906B/en
Publication of CN104506906A publication Critical patent/CN104506906A/en
Application granted granted Critical
Publication of CN104506906B publication Critical patent/CN104506906B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/422 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N21/42203 Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS] sound input device, e.g. microphone
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/439 Processing of audio elementary streams
    • H04N21/4394 Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440236 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by media transcoding, e.g. video is transformed into a slideshow of still pictures, audio is converted into text
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/443 OS processes, e.g. booting an STB, implementing a Java virtual machine in an STB or power management in an STB
    • H04N21/4432 Powering on the client, e.g. bootstrap loading using setup parameters being stored locally or received from the server
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47202 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content for requesting content on demand, e.g. video on demand

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The present invention relates to a voice interaction assistance method and system based on TV scene elements and a voice assistant. The TV playback software and the voice assistant run independently. The voice assistant obtains the scene information of the running TV playback software and matches the speech recognition result against that scene information; for the matched scene information, the TV playback software then carries out the operation according to the scene element information, the scene state information, and the voice information. Because the method and system operate on the TV's real-time scene information, voice-controlled TV becomes genuinely intelligent. Because the voice assistant runs separately from the TV playback software, a single voice assistant can work with multiple TV playback programs, which greatly saves system resources. In addition, the speech engine can be updated and improved conveniently, which promotes the development of intelligent voice technology.

Description

Voice interaction assistance method and system based on TV scene elements and a voice assistant
Technical field
The present invention relates to a voice interaction assistance method and system, and in particular to a voice interaction assistance method and system based on TV scene elements and a voice assistant.
Background art
Although emerging technologies such as smartphones and the Internet have greatly changed how people work and live, the television still holds an irreplaceable position as an information medium in the home. With the advance of science and technology, TV technology has also developed considerably and has now entered the intelligent stage, and smart TVs are used ever more widely in daily life. With the development of voice technology, voice-controlled TVs are also becoming part of people's lives. At present, voice-controlled TVs typically embed voice software in the TV playback module to perform voice control, and most can only operate on specific, predefined items. Because scene information changes in real time as the TV software runs, existing embedded voice control cannot operate on the TV's real-time scene information. In addition, when multiple TV playback programs are loaded on a smart TV platform, each program needs its own complex embedded voice development before it can be used. Loaded software also occupies a large amount of memory, especially when several TV playback programs are loaded at once, which degrades system performance. As speech recognition accuracy rises, speech engines grow ever larger and voice control ever more intelligent, which requires the speech engine itself to be updated and developed continuously; embedding voice in each program clearly limits the development of voice control.
Summary of the invention
The technical problem solved by the present invention is to construct a voice interaction assistance method and system based on TV scene elements and a voice assistant, overcoming the problems that the prior art cannot operate on the TV's real-time scene information, degrades system performance, and limits the development of voice control on TVs.
The technical solution of the present invention is to provide a voice interaction assistance method based on TV scene elements and a voice assistant, involving TV playback software and a voice assistant that run independently of each other. The method includes the following steps:
Obtain scene information: the voice assistant obtains the scene information of the running TV playback software; the scene information includes scene element information.
Input voice: the voice assistant collects voice information and performs speech recognition conversion on it.
Match and execute: the voice assistant matches the speech recognition result against the obtained scene information. If the scene element information of the running TV playback software and the speech recognition result are identical or similar in related information, the voice assistant transmits the matched scene element information to the TV playback software, which executes the item corresponding to that scene element information.
In a further technical solution of the present invention, the TV playback software and the voice assistant establish a communication connection through a reserved interface of the TV playback software, or through a proprietary protocol.
In a further technical solution of the present invention, the TV playback software comprises multiple independently running TV playback programs, and the voice assistant works with the currently active one.
In a further technical solution of the present invention, the method further involves a network server. The voice assistant uploads the collected scene information to the network server, which matches the scene information against pre-stored information and transmits the matched information to the voice assistant.
In a further technical solution of the present invention, being identical or similar in related information means that the related information is identical or similar in pronunciation, text, text meaning, category, or operation information, or that parts of the information on each side of the match are identical or similar in pronunciation, text, text meaning, category, or operation information.
The technical solution of the present invention is also to construct a voice interaction assistance system based on TV scene elements and a voice assistant, including TV playback software and a voice assistant that run independently of each other. The TV playback software includes an acquisition module that collects scene information, a communication module that communicates with the voice assistant, and an execution module. The voice assistant includes an information acquisition module that obtains the scene information of the running TV playback software, a voice collection module that collects voice information, a speech recognition module that performs speech recognition conversion, a matching module, and a transmission module. The information acquisition module obtains the scene information of the running TV playback software; the scene information includes scene element information. The voice collection module collects voice information, and the speech recognition module performs speech recognition conversion on it. The matching module matches the speech recognition result against the obtained scene information. If the scene element information of the running TV playback software and the speech recognition result are identical or similar in related information, the transmission module transmits the matched scene element information to the TV playback software, and the execution module executes the item corresponding to that scene element information.
In a further technical solution of the present invention, the TV playback software comprises multiple independently running TV playback programs, and the voice assistant works with the currently active one.
In a further technical solution of the present invention, the system further includes a network server. The voice assistant uploads the collected scene information to the network server, which matches the scene information against pre-stored information and transmits the matched information to the voice assistant.
In a further technical solution of the present invention, the TV playback software includes a first information output module, or the voice assistant includes a second information output module.
The technical effect of the present invention is as follows. A voice interaction assistance method and system based on TV scene elements and a voice assistant is constructed, involving TV playback software and a voice assistant that run independently. The voice assistant obtains the scene information of the running TV playback software, the scene information including scene element information; the voice assistant collects voice information and performs speech recognition conversion on it; and the voice assistant matches the speech recognition result against the obtained scene information. If the scene element information of the running TV playback software and the speech recognition result are identical or similar in related information, the voice assistant transmits the matched scene element information to the TV playback software, which executes the corresponding item. For the matched scene information, the TV playback software carries out the operation according to the scene element information, the scene state information, and the voice information. Because the method and system operate on the TV's real-time scene information, voice-controlled TV becomes genuinely intelligent. Because the voice assistant runs separately from the TV playback software, one voice assistant can work with multiple TV playback programs, which greatly saves system resources. In addition, the speech engine can be updated and improved conveniently, which promotes the development of intelligent voice technology.
Brief description of the drawings
Fig. 1 is a structural schematic diagram of the present invention.
Fig. 2 is a structural schematic diagram of a preferred embodiment of the present invention.
Detailed description of the embodiments
The technical solution of the present invention is further described below with reference to specific embodiments.
As shown in Fig. 1, a specific embodiment of the present invention provides a voice interaction assistance method based on TV scene elements and a voice assistant, involving TV playback software 1 and a voice assistant 2 that run independently of each other. The method includes the following steps:
Obtain scene information: the voice assistant 2 obtains the scene information of the running TV playback software 1; the scene information includes scene element information.
The specific implementation is as follows. The voice assistant 2 obtains the scene information of the running TV playback software 1 in one of two ways. In the first, the TV playback software 1 collects its own running scene information in the background; this collection mode is comprehensive, accurate, and fast, and is the preferred mode. In the second, the voice assistant 2 collects the scene information of the running TV playback software 1 through a reserved interface of the TV playback software 1; how much information can be collected depends on what the reserved interface supports. In the first case, the TV playback software 1 transmits the collected scene information to the voice assistant 2, which completes the acquisition. In the second case, the collection through the reserved interface is itself the acquisition of scene information. The scene information includes scene element information. The scene element information includes the visual information presented on the running details interface, specifically the text information, picture information, video titles, and so on of the running interface; the text information of the details interface is the most important. The scene state information mainly comprises the operation information associated with the running interface, such as playing video, playing music, or running a game. In specific embodiments, most of the collected element information is usually converted to text information.
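To make the distinction between scene element information and scene state information concrete, the following is a minimal Python sketch of how a scene snapshot might be represented. The class names `SceneElement` and `SceneInfo`, their fields, and the `as_text` helper are assumptions made for illustration; the patent does not define a data format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SceneElement:
    """One visible item on the running interface (hypothetical structure)."""
    title: str             # text shown on the interface, e.g. a program name
    kind: str = "text"     # "text", "picture", or "video"
    category: str = ""     # category the item belongs to, e.g. "variety show"


@dataclass
class SceneInfo:
    """Snapshot of what the TV playback software is currently showing."""
    elements: List[SceneElement] = field(default_factory=list)
    # Scene state information: operations the running interface supports.
    operations: List[str] = field(default_factory=list)

    def as_text(self) -> List[str]:
        # The description notes that collected element information is mostly
        # converted to text; here that is simply the non-empty titles.
        return [e.title for e in self.elements if e.title]


scene = SceneInfo(
    elements=[
        SceneElement("Happy Base Camp", kind="video", category="variety show"),
        SceneElement("", kind="picture"),  # a picture with no usable text
    ],
    operations=["play video"],
)
print(scene.as_text())
```

Keeping the elements as text is what lets the later matching step compare them directly with the speech recognition result.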
Input voice: the voice assistant 2 collects voice information and performs speech recognition conversion on it.
The specific implementation is as follows. Voice information is input through an external voice input device; the voice assistant 2 collects the voice information and then performs speech recognition conversion on it. In specific embodiments, the speech recognition result includes text information and may also involve operation information. For example, for the utterance "open Happy Base Camp", the speech recognition result contains both operation information ("open") and text information ("Happy Base Camp").
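As an illustration of a recognition result that carries both operation information and text information, the sketch below splits already-recognized text into the two parts. The operation vocabulary `OPERATION_WORDS` and the function name are hypothetical; the patent does not enumerate operation words or prescribe how the split is done, and real speech recognition is outside the scope of this sketch.

```python
from typing import Optional, Tuple

# Hypothetical operation vocabulary; not taken from the patent.
OPERATION_WORDS = ("open", "play", "select", "search")


def split_recognition_result(text: str) -> Tuple[Optional[str], str]:
    """Split recognized text into (operation, remaining text).

    If the first word is a known operation word it is returned as the
    operation; otherwise the operation is None and all text is kept.
    """
    words = text.strip().split()
    if words and words[0].lower() in OPERATION_WORDS:
        return words[0].lower(), " ".join(words[1:])
    return None, " ".join(words)


operation, target = split_recognition_result("Open Happy Base Camp")
```

Here `operation` holds the operation information and `target` holds the text that is later matched against the scene elements.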
Match and execute: the voice assistant 2 matches the speech recognition result against the obtained scene information. If the scene element information of the running TV playback software 1 and the speech recognition result are identical or similar in related information, the voice assistant 2 transmits the matched scene element information to the TV playback software 1, which executes the item corresponding to that scene element information.
The specific implementation is as follows. The voice assistant 2 matches the speech recognition result against the obtained scene information, mainly by comparing the pronunciation, text, text meaning, or operation information of the related information on each side. The scene element information includes one or more of the following: the title of the scene element, the category it belongs to, the production personnel involved, and the content it concerns. Being identical or similar in related information means that the related information is identical or similar in pronunciation, text, text meaning, category, or operation information. For example, if the current scene element information is "Happy Base Camp", matching can be done on the pronunciation or text of "Happy Base Camp"; it can also be done on its category (for example, "Happy Base Camp" is a variety show), on its host, on the TV station it belongs to, and so on. Alternatively, parts of the information on each side of the match are identical or similar in pronunciation, text, text meaning, category, or operation information. For example, if the current scene element information is "Happy Base Camp", its parts "happy" and "base camp" can be used for matching: if the speech recognition result contains "happy" or "base camp", "Happy Base Camp" can also be treated as a match. After a match is found, the voice assistant 2 transmits the matched scene element information to the TV playback software 1, which executes the item corresponding to that scene element information. For example, if the scene element information shows the program "Happy Base Camp", then after a match is found the voice assistant 2 transfers the "Happy Base Camp" information to the TV playback software 1, and the TV playback software 1 executes the "Happy Base Camp" program; execution includes operations such as selecting and clicking.
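The matching rules described above (whole-title match, partial-information match, and category match) can be sketched roughly as follows. This is an illustrative approximation, not the patent's algorithm: text similarity is approximated with `difflib.SequenceMatcher` from the Python standard library, the 0.8 threshold is an arbitrary assumption, pronunciation and meaning-based matching are omitted, and the dictionary keys `title` and `category` are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional


def is_similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Approximate 'identical or similar in text' with a similarity ratio."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def match_scene_element(
    spoken: str, elements: List[Dict[str, str]]
) -> Optional[Dict[str, str]]:
    """Return the first scene element that matches the recognized text."""
    spoken_lower = spoken.lower()
    spoken_words = set(spoken_lower.split())
    # Pass 1: the whole title is identical or similar to the spoken text.
    for element in elements:
        if is_similar(spoken, element["title"]):
            return element
    # Pass 2: partial information, i.e. a word of the title was spoken.
    for element in elements:
        if spoken_words & set(element["title"].lower().split()):
            return element
    # Pass 3: the spoken text names the element's category.
    for element in elements:
        category = element.get("category", "").lower()
        if category and category in spoken_lower:
            return element
    return None


elements = [
    {"title": "Evening News", "category": "news"},
    {"title": "Happy Base Camp", "category": "variety show"},
]
matched = match_scene_element("happy", elements)
```

Ordering the passes from strict to loose means an exact title always wins over a partial or category hit. A production matcher would also need to filter common words in the partial pass, which this sketch does not do.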
As shown in Fig. 1, in a preferred embodiment of the present invention, when the TV playback software 1 and the voice assistant 2 exchange information, they establish a communication connection in one of two ways: through a reserved interface of the TV playback software 1, or through a proprietary protocol. The voice assistant 2 obtains the collected running scene information in one of two ways: the TV playback software 1 transmits it to the voice assistant 2, or the voice assistant 2 collects it directly from the TV playback software 1. In the first case, the TV playback software 1 collects the running scene information, establishes a communication connection with the voice assistant 2, and transfers the collected scene information to it. In the second case, the voice assistant 2 establishes a communication connection with the TV playback software 1 through the interface that the software reserves, and directly collects the running scene information of the TV playback software 1. At present, much software reserves communication interfaces for particular functions: for example, some software reserves a read-aloud interface for elderly users who cannot see clearly, and some reserves an auxiliary operation interface for blind users. The voice assistant 2 can establish a communication connection with the TV playback software 1 through such functional interfaces. Alternatively, the connection is established through a proprietary protocol: a proprietary protocol for communication between the voice assistant 2 and the TV playback software 1 is defined, and the connection is realized through it.
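The patent does not specify the proprietary protocol. Purely as an illustration of what the exchange could look like, the sketch below uses JSON messages with hypothetical message types `get_scene`, `scene`, `execute`, and `ack`, and calls the playback software in-process instead of over a real transport.

```python
import json
from typing import Dict, List


def encode_message(msg_type: str, payload: Dict) -> str:
    """Serialize one protocol message (hypothetical format)."""
    return json.dumps({"type": msg_type, "payload": payload})


def decode_message(raw: str) -> Dict:
    """Parse a message and check that the required fields are present."""
    message = json.loads(raw)
    if "type" not in message or "payload" not in message:
        raise ValueError("malformed message")
    return message


class PlaybackSoftware:
    """Stand-in for the TV playback software side of the connection."""

    def __init__(self, titles: List[str]) -> None:
        self.titles = titles
        self.executed: List[str] = []

    def handle(self, raw: str) -> str:
        message = decode_message(raw)
        if message["type"] == "get_scene":
            # The assistant pulls the current scene information.
            return encode_message("scene", {"elements": self.titles})
        if message["type"] == "execute":
            # The assistant sends back the matched scene element.
            self.executed.append(message["payload"]["element"])
            return encode_message("ack", {"ok": True})
        return encode_message("error", {"reason": "unknown type"})


tv = PlaybackSoftware(["Happy Base Camp"])
reply = decode_message(tv.handle(encode_message("get_scene", {})))
tv.handle(encode_message("execute", {"element": reply["payload"]["elements"][0]}))
```

The same message shapes would serve the push mode as well: the playback software would send a `scene` message on its own whenever the interface changes, rather than waiting for `get_scene`.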
As shown in Fig. 1, in a preferred embodiment of the present invention, the TV playback software comprises multiple independently running TV playback programs, and the voice assistant works with the currently active one. The specific implementation is as follows. The TV playback software 1 comprises multiple independently running TV playback programs, and the voice assistant 2 works with the currently active TV playback software 1. If only one TV playback program is running in the current environment, the voice assistant 2 works with it. If several TV playback programs are running in the current system environment, the voice assistant 2 obtains the current TV playback software 1 from the system (for example, the Android system), establishes a communication connection with it, and carries out the related work.
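The selection rule in this embodiment can be sketched as a small function. How the system reports the active program is platform-specific and not described in the patent, so the `RunningApp` record and its `in_foreground` flag are assumptions standing in for whatever the operating system provides.

```python
from typing import List, NamedTuple, Optional


class RunningApp(NamedTuple):
    """Hypothetical record describing one running TV playback program."""
    name: str
    in_foreground: bool


def current_playback_software(apps: List[RunningApp]) -> Optional[RunningApp]:
    """Pick the program the voice assistant should work with."""
    if len(apps) == 1:
        # Only one program is running, so it is the one to work with.
        return apps[0]
    for app in apps:
        if app.in_foreground:
            # Several are running; use the one the system reports as active.
            return app
    return None


apps = [RunningApp("player_a", False), RunningApp("player_b", True)]
active = current_playback_software(apps)
```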
As shown in Fig. 2, in a preferred embodiment of the present invention, the system further includes a network server 3. The voice assistant 2 uploads the collected scene information to the network server 3, which matches the scene information against pre-stored information and transmits the matched information to the voice assistant 2. For example, if the scene information is "If You Are the One", the network server 3 has pre-stored information related to "If You Are the One", such as an introduction to the program, information about its hosts, and links to its songs. The network server 3 transfers this related information to the voice assistant 2, which organizes it into an information list. The voice assistant 2 can display the list directly for the user to view, play, and so on; it can also transmit the list to the TV playback software 1, which displays it for use; or it can transfer the list to a mobile terminal, which displays it for use.
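The server-side lookup and the assistant-side information list can be sketched as below. The in-memory dictionary `PRE_STORED`, the placeholder title "Example Program", and the function names are assumptions for illustration; the patent does not describe the server's storage or transport.

```python
from typing import Dict, List

# Hypothetical pre-stored information, keyed by scene title.
PRE_STORED: Dict[str, List[str]] = {
    "Example Program": ["program introduction", "host information", "song links"],
}


def server_lookup(scene_titles: List[str]) -> Dict[str, List[str]]:
    """Server side: return pre-stored information for each known title."""
    return {t: PRE_STORED[t] for t in scene_titles if t in PRE_STORED}


def build_info_list(matched: Dict[str, List[str]]) -> List[str]:
    """Assistant side: flatten the matched information into a display list."""
    return [f"{title}: {item}" for title, items in matched.items() for item in items]


info_list = build_info_list(server_lookup(["Example Program", "Unknown Title"]))
```

The resulting list could then be shown by the voice assistant, by the TV playback software, or on a mobile terminal, as the embodiment describes.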
As shown in Fig. 1, a specific embodiment of the present invention constructs a voice interaction assistance system based on TV scene elements and a voice assistant, including TV playback software 1 and a voice assistant 2 that run independently of each other. The TV playback software 1 includes an acquisition module 11 that collects scene information, a communication module 12 that communicates with the voice assistant, and an execution module 13. The voice assistant 2 includes an information acquisition module 21 that obtains the scene information of the running TV playback software 1, a voice collection module 22 that collects voice information, a speech recognition module 23 that performs speech recognition conversion, a matching module 24, and a transmission module 25. The information acquisition module 21 obtains the scene information of the running TV playback software 1; the scene information includes scene element information. The voice collection module 22 collects voice information, and the speech recognition module 23 performs speech recognition conversion on it. The matching module 24 matches the speech recognition result against the obtained scene information. If the scene element information of the running TV playback software 1 and the speech recognition result are related in pronunciation, text, text meaning, or operation information, the transmission module 25 transmits the matched scene element information to the TV playback software 1, and the execution module 13 executes the item corresponding to that scene element information.
As shown in Fig. 1, the specific implementation of the present invention is as follows. The information acquisition module 21 obtains the scene information of the running TV playback software 1 in one of two ways. In the first, the TV playback software 1 collects its own running scene information in the background; this collection mode is comprehensive, accurate, and fast, and is the preferred mode. In the second, the voice assistant 2 collects the scene information of the running TV playback software 1 through a reserved interface of the TV playback software 1; how much information can be collected depends on what the reserved interface supports. In the first case, the TV playback software 1 transmits the collected scene information to the voice assistant 2, which completes the acquisition. In the second case, the collection through the reserved interface is itself the acquisition of scene information. The scene information includes scene element information. The scene element information includes the visual information presented on the running details interface, specifically the text information, picture information, video titles, and so on of the running interface; the text information of the details interface is the most important. The scene state information mainly comprises the operation information associated with the running interface, such as playing video, playing music, or running a game. In specific embodiments, most of the collected element information is usually converted to text information.
Voice information is input through an external voice input device. The voice acquisition module 22 collects the voice information, and the speech recognition module 23 then performs speech recognition conversion on it. In specific embodiments, the speech recognition conversion result includes text information and may also involve operation information. For example, for the utterance "Open Happy Base Camp", the speech recognition conversion result involves operation information and also includes text information.
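One way to picture a conversion result that carries both operation information and text information is the sketch below. The small list of operation words and the rule that the operation word comes first are assumptions for illustration, not something the description prescribes.

```python
from typing import Optional, Tuple

# Hypothetical set of operation words; a real system would use its own vocabulary.
OPERATION_WORDS = ("open", "play", "run")


def split_recognition_result(recognized_text: str) -> Tuple[Optional[str], str]:
    """Split recognized text into (operation information, text information)."""
    words = recognized_text.strip().split()
    # Assumption: the operation, if any, is the first word of the utterance.
    if words and words[0].lower() in OPERATION_WORDS:
        return words[0].lower(), " ".join(words[1:])
    return None, " ".join(words)


operation, text = split_recognition_result("Open Happy Base Camp")
```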
The matching module 24 matches the speech recognition conversion result against the obtained scene information, mainly by comparing the pronunciation, text, text meaning, or operation information of the related information on each side. The scene element information includes one or more of the following: the name of the scene element information, the type to which the scene element information belongs, the producer that the scene element information involves, and the content information that the scene element information involves. Being identical or similar in related information covers the related information being identical or similar in pronunciation, text, text meaning, type, or operation information. For example, if the current scene element information is "Happy Base Camp", matching can look for the same or similar pronunciation or text of "Happy Base Camp"; it can also be done on the type to which it belongs ("Happy Base Camp" is a variety show), on its host, on the TV station it belongs to, and so on. Alternatively, part of the information on each side is identical or similar in pronunciation, text, text meaning, type, or operation information. For example, if the current scene element information is "Happy Base Camp", its parts "happy" and "base camp" can be taken for matching; if the speech recognition result contains "happy" or "base camp", "Happy Base Camp" can also be matched as related. Once a match is found, the transmission module 25 transmits the matched scene element information to the TV playback software 1, and the execution module 13 executes the item corresponding to that scene element information. For example, if the scene element information includes a displayed program "Happy Base Camp", then after it is matched as related, the voice assistant 2 transmits the "Happy Base Camp" information to the TV playback software 1, and the execution module 13 executes the "Happy Base Camp" program; execution includes operations such as selecting and clicking.
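The matching rules described above (whole name, part of the name, type, host, TV station) can be sketched as a single function. The field names, the order in which the rules are tried, and the use of plain text comparison are assumptions for illustration; pronunciation and text-meaning matching are left out because the description does not say how they are computed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SceneElement:
    """Hypothetical record for one scene element and its related information."""
    name: str
    name_parts: List[str] = field(default_factory=list)
    category: str = ""
    host: str = ""
    station: str = ""


def match_reason(recognized_text: str, element: SceneElement) -> Optional[str]:
    """Return which kind of related information matched, or None if unrelated."""
    spoken = recognized_text.lower()
    # Whole-name text match.
    if element.name.lower() in spoken:
        return "name"
    # Partial match: any listed part of the name appears in the utterance.
    for part in element.name_parts:
        if part.lower() in spoken:
            return "partial name"
    # Match on other related information: type, host, TV station.
    for label, value in (("type", element.category),
                         ("host", element.host),
                         ("station", element.station)):
        if value and value.lower() in spoken:
            return label
    return None


element = SceneElement(name="Happy Base Camp",
                       name_parts=["happy", "base camp"],
                       category="variety show")
```

A caller would forward `element` to the TV playback software whenever `match_reason` returns something other than `None`.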
As shown in Figure 1, in a preferred embodiment of the invention the TV playback software 1 comprises multiple independently running TV playback software programs, and the voice assistant works with the currently active one. The specific implementation process is as follows. The TV playback software 1 is multiple independently running TV playback software programs, and the voice assistant 2 works with the currently active TV playback software 1. If only one TV playback software 1 is running in the current environment, the voice assistant 2 works with it. If several TV playback software 1 are running in the current system environment, the voice assistant 2 obtains the current TV playback software 1 from the system environment through the current system (for example, the Android system), then establishes a communication connection with the current TV playback software 1 and carries out the related work.
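The selection of the playback software to work with can be sketched as below. The `running` and `foreground` flags are hypothetical stand-ins for whatever the operating system reports; no real Android API is shown or implied.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PlaybackApp:
    """Hypothetical description of one TV playback program as reported by the system."""
    name: str
    running: bool
    foreground: bool = False


def select_current_app(apps: List[PlaybackApp]) -> Optional[PlaybackApp]:
    running = [app for app in apps if app.running]
    # Only one program is running: work with it directly.
    if len(running) == 1:
        return running[0]
    # Several are running: take the one the system reports as currently active.
    for app in running:
        if app.foreground:
            return app
    return None


apps = [PlaybackApp("player_a", running=True),
        PlaybackApp("player_b", running=True, foreground=True),
        PlaybackApp("player_c", running=False)]
current = select_current_app(apps)
```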
As shown in Figure 2, a preferred embodiment of the invention further includes a network server 3. The voice assistant 2 uploads the collected scene information to the network server 3; the network server 3 matches the scene information against pre-stored information and transmits the matched information to the voice assistant 2. For example, if the scene information is "If You Are the One" and the network server 3 has pre-stored related information for "If You Are the One", such as an introduction to the program, information about its host, and links to its songs, the network server 3 transmits this related information to the voice assistant 2. The voice assistant 2 organizes the information into an information list, which the second information output module 26 can display directly for the user to use, including operations such as viewing and playing. The list can also be transmitted to the TV playback software 1 and displayed by the first information output module 14, or transmitted to a mobile terminal and displayed there.
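The server-side lookup and the assembly of the information list can be sketched as follows. The in-memory dictionary standing in for the network server's pre-stored information, the exact-title lookup, and the placeholder program title are all assumptions for illustration.

```python
from typing import Dict, List

# Hypothetical pre-stored information, keyed by lower-cased program title.
PRESTORED: Dict[str, List[str]] = {
    "example program": ["Program introduction", "Host information", "Song links"],
}


def lookup_related(scene_text: str, store: Dict[str, List[str]]) -> List[str]:
    # Server side: match the uploaded scene information against pre-stored entries.
    return list(store.get(scene_text.strip().lower(), []))


def build_info_list(title: str, items: List[str]) -> str:
    # Assistant side: organize the returned information into a numbered list for display.
    lines = [f"Related to {title}:"]
    lines += [f"{number}. {item}" for number, item in enumerate(items, start=1)]
    return "\n".join(lines)


related = lookup_related("Example Program", PRESTORED)
info_list = build_info_list("Example Program", related)
```

The resulting text could then be shown by the assistant itself, by the TV playback software, or on a mobile terminal, which are the three output paths the description names.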
The technical effect of the invention is as follows. A voice interaction assistance method and system based on TV scene elements and a voice assistant is constructed, comprising TV playback software 1 and a voice assistant 2 that run independently of each other. The voice assistant 2 obtains the scene information of the running TV playback software 1, which includes scene element information; it collects voice information and performs speech recognition conversion on it; and it matches the speech recognition conversion result against the obtained scene information. If the scene element information of the running TV playback software 1 is related to the speech recognition result in pronunciation, text, text meaning, or operation information, the voice assistant 2 transmits the matched scene element information to the TV playback software 1, which executes the item corresponding to that scene element information. For the matched scene information, the TV playback software 1 carries out the operation according to the scene element information, the scene state information, and the voice information. Because the method and system operate on and make use of the TV's real-time scene information, voice-controlled television moves toward genuine intelligence. At the same time, because the voice assistant 2 runs separately from and independently of the TV playback software 1, one voice assistant 2 can be used with multiple TV playback software 1, which greatly saves system resources. In addition, the speech engine is easy to update and improve, which promotes the development of voice technology toward intelligence.
The above is a further detailed description of the invention in connection with specific preferred embodiments, and the specific implementation of the invention should not be regarded as limited to these descriptions. Those of ordinary skill in the art to which the invention belongs may make a number of simple deductions or substitutions without departing from the concept of the invention, and all of these shall be regarded as falling within the protection scope of the invention.

Claims (8)

1. A voice interaction assistance method based on TV scene elements and a voice assistant, involving TV playback software and a voice assistant that run independently of each other, wherein the TV playback software and the voice assistant establish a communication connection through a reserved interface of the TV playback software, or the TV playback software and the voice assistant establish a communication connection through a proprietary protocol, characterized in that the voice interaction assistance method comprises the following steps:
Obtaining scene information: the voice assistant obtains the scene information of the running TV playback software, the scene information including scene element information; the ways in which the voice assistant obtains the scene information of the running TV playback software include two: in one, the TV playback software collects its own running scene information in the background; in the other, the voice assistant collects the scene information of the running TV playback software through the reserved interface of the TV playback software; the scene element information includes the visual information presented by the running details interface;
Inputting voice: the voice assistant collects voice information, and the voice assistant performs speech recognition conversion on the voice information;
Matching and executing: the voice assistant matches the speech recognition conversion result against the obtained scene information; if the scene element information of the running TV playback software and the speech recognition result are identical or similar in related information, the voice assistant transmits the matched scene element information to the TV playback software, and the TV playback software executes the item corresponding to the scene element information.
2. The voice interaction assistance method based on TV scene elements and a voice assistant according to claim 1, characterized in that the TV playback software comprises multiple independently running TV playback software programs, and the voice assistant works with the currently active TV playback software.
3. The voice interaction assistance method based on TV scene elements and a voice assistant according to claim 1, characterized in that it further includes a network server; the voice assistant uploads the collected scene information to the network server, and the network server matches the scene information against pre-stored information and transmits the matched information to the voice assistant.
4. The voice interaction assistance method based on TV scene elements and a voice assistant according to claim 1, characterized in that being identical or similar in related information includes the related information being identical or similar in pronunciation, text, text meaning, type, or operation information, or part of the information on each matching side being identical or similar in pronunciation, text, text meaning, type, or operation information.
5. A voice interaction assistance system based on TV scene elements and a voice assistant, characterized in that it comprises TV playback software and a voice assistant that run independently of each other, wherein the TV playback software and the voice assistant establish a communication connection through a reserved interface of the TV playback software, or the TV playback software and the voice assistant establish a communication connection through a proprietary protocol; the TV playback software includes an acquisition module that collects scene information, a communication module that communicates with the voice assistant, and an execution module; the voice assistant includes an information acquisition module that obtains the scene information of the running TV playback software, a voice acquisition module that collects voice information, a speech recognition module that performs speech recognition conversion, a matching module, and a transmission module; the information acquisition module obtains the scene information of the running TV playback software, the scene information including scene element information; the voice acquisition module collects voice information, and the speech recognition module performs speech recognition conversion on the voice information; the matching module matches the speech recognition conversion result against the obtained scene information; if the scene element information of the running TV playback software and the speech recognition result are identical or similar in related information, the transmission module transmits the matched scene element information to the TV playback software, and the execution module executes the item corresponding to the scene element information; the ways in which the voice assistant obtains the scene information of the running TV playback software include two: in one, the TV playback software collects its own running scene information in the background; in the other, the voice assistant collects the scene information of the running TV playback software through the reserved interface of the TV playback software; the scene element information includes the visual information presented by the running details interface.
6. The voice interaction assistance system based on TV scene elements and a voice assistant according to claim 5, characterized in that the TV playback software comprises multiple independently running TV playback software programs, and the voice assistant works with the currently active TV playback software.
7. The voice interaction assistance system based on TV scene elements and a voice assistant according to claim 5, characterized in that it further includes a network server; the voice assistant uploads the collected scene information to the network server, and the network server matches the scene information against pre-stored information and transmits the matched information to the voice assistant.
8. The voice interaction assistance system based on TV scene elements and a voice assistant according to claim 7, characterized in that the TV playback software includes a first information output module, or the voice assistant includes a second information output module.
CN201410634282.0A 2014-11-12 2014-11-12 Interactive voice householder method and system based on tv scene element and voice assistant Active CN104506906B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410634282.0A CN104506906B (en) 2014-11-12 2014-11-12 Interactive voice householder method and system based on tv scene element and voice assistant

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410634282.0A CN104506906B (en) 2014-11-12 2014-11-12 Interactive voice householder method and system based on tv scene element and voice assistant

Publications (2)

Publication Number Publication Date
CN104506906A CN104506906A (en) 2015-04-08
CN104506906B true CN104506906B (en) 2019-01-18

Family

ID=52948610

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410634282.0A Active CN104506906B (en) 2014-11-12 2014-11-12 Interactive voice householder method and system based on tv scene element and voice assistant

Country Status (1)

Country Link
CN (1) CN104506906B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105550225B (en) * 2015-12-07 2019-05-28 百度在线网络技术(北京)有限公司 Index structuring method, querying method and device
CN107644641B (en) * 2017-07-28 2021-04-13 深圳前海微众银行股份有限公司 Dialog scene recognition method, terminal and computer-readable storage medium
CN108766436A (en) * 2018-05-31 2018-11-06 广州酷狗计算机科技有限公司 A kind of sound control method and system of multimedia equipment
CN109600675A (en) * 2019-01-24 2019-04-09 合肥盛东信息科技有限公司 A kind of AI voice endowment interactive television control system
CN113253970B (en) * 2021-07-09 2021-10-12 广州小鹏汽车科技有限公司 Voice interaction method and device, voice interaction system, vehicle and medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000250575A (en) * 1999-03-01 2000-09-14 Matsushita Electric Ind Co Ltd Speech understanding device and method for automatically selecting bidirectional tv receiver
CN101465994A (en) * 2008-11-14 2009-06-24 深圳创维数字技术股份有限公司 Set-top box and method for implementing voice search therein
CN102075797A (en) * 2010-12-29 2011-05-25 深圳市同洲电子股份有限公司 Channel or program voice browsing method and digital television receiving terminal
CN102395013A (en) * 2011-11-07 2012-03-28 康佳集团股份有限公司 Voice control method and system for intelligent television
CN103064936A (en) * 2012-12-24 2013-04-24 北京百度网讯科技有限公司 Voice-input-based image information extraction analysis method and device
CN103187058A (en) * 2011-12-28 2013-07-03 上海博泰悦臻电子设备制造有限公司 Speech conversational system in vehicle
CN103869931A (en) * 2012-12-10 2014-06-18 三星电子(中国)研发中心 Method and device for controlling user interface through voice

Also Published As

Publication number Publication date
CN104506906A (en) 2015-04-08

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20191219

Address after: 400000 floor 2, building a, No. 99, Century Avenue, Chayuan New District, Nan'an District, Chongqing

Patentee after: Chongqing Xunfei Huiyu Artificial Intelligence Technology Research Institute Co., Ltd.

Address before: 230000 No. 666 Wangjiang West Road, hi tech Development Zone, Anhui, Hefei

Patentee before: Iflytek Co., Ltd.

TR01 Transfer of patent right

Effective date of registration: 20210616

Address after: 230088 666 Wangjiang West Road, Hefei hi tech Development Zone, Anhui

Patentee after: IFLYTEK Co.,Ltd.

Address before: 400000 2nd floor, building a, 99 Century Avenue, Chayuan New District, Nan'an District, Chongqing

Patentee before: Chongqing Xunfei Huiyu Artificial Intelligence Technology Research Institute Co.,Ltd.
