WO2018006489A1 - Method and device for voice interaction of a terminal

Method and device for voice interaction of a terminal

Info

Publication number
WO2018006489A1
WO2018006489A1 · PCT/CN2016/098147 · CN2016098147W
Authority
WO
WIPO (PCT)
Prior art keywords
matching
information
terminal
text information
cloud server
Prior art date
Application number
PCT/CN2016/098147
Other languages
English (en)
Chinese (zh)
Inventor
韩菁
Original Assignee
深圳Tcl数字技术有限公司 (Shenzhen TCL Digital Technology Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 深圳Tcl数字技术有限公司 (Shenzhen TCL Digital Technology Co., Ltd.)
Publication of WO2018006489A1


Classifications

    • H04N 21/42203: Input-only peripherals connected to specially adapted client devices; sound input device, e.g. microphone
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • H04N 21/27: Server based end-user applications
    • H04N 21/42206: User interfaces specially adapted for controlling a client device through a remote control device, characterized by hardware details

Definitions

  • the present invention relates to the field of terminal technologies, and in particular, to a voice interaction method and apparatus for a terminal.
  • the intelligent interactive systems on existing televisions are implemented in a customized mode: the TV manufacturer states its requirements and a third-party recognition provider builds the system to order.
  • Voice recognition and semantic recognition generally adopt a bound arrangement, so a TV manufacturer can only select a single service provider on a TV to complete both voice and semantic recognition during voice interaction. For traditional TV companies this implementation mode is too restrictive: it cannot be adjusted according to demand, and its flexibility is poor.
  • the main purpose of the present invention is to provide a voice interaction method and device for a terminal, aiming to solve the problem that a TV manufacturer can only select one service provider on a television to complete both voice and semantic recognition in voice interaction, an arrangement that is overly restrictive, cannot be adjusted according to demand, and lacks flexibility.
  • the present invention provides a method for voice interaction of a terminal, including the steps of:
  • the terminal receives the output information returned by the cloud server and outputs the output information.
  • the method further includes:
  • the terminal performs a matching operation according to the text information and the information pre-stored in the local database of the terminal;
  • a response control operation corresponding to the control information is performed.
  • the step of performing the matching operation according to the text information and the information pre-stored by the terminal local database comprises:
  • the terminal calculates a matching parameter according to the text information and the information of the current page collected in advance;
  • the calculating, by the terminal, the matching parameters according to the text information and the information of the current page collected in advance includes:
  • in the process of transmitting the audio stream to the terminal, the terminal's background page-control collection algorithm collects the text information of the controllable controls of the current television page;
  • after acquiring the text information, the terminal calculates matching parameters from the text information and the text collected from the combined scene controls.
  • the method further includes:
  • after matching against the current page entries fails, the terminal matches the matching parameter with the global static terms; after a global static term is matched successfully, the terminal sets a tag that matches the global static term.
  • the matching parameter is matched with the application information, and after matching with the application information, setting a label matching the application information;
  • the cloud server searches for output information corresponding to the text information, including:
  • after receiving the text information, the cloud server identifies the search parameter of the text information according to the semantic parsing engine it loads;
  • the method further comprises the steps of:
  • the cloud server determines the service type corresponding to the search parameter, and accesses the information provider corresponding to the service type to provide the information service.
  • the present invention further provides a voice interaction device for a terminal, including:
  • a receiving module configured to receive an audio stream output by the voice input device
  • An obtaining module configured to acquire text information corresponding to the audio stream
  • a sending module configured to upload the text information to a cloud server built by the operator corresponding to the terminal, so as to search, by the cloud server, for output information corresponding to the text information and return it to the terminal;
  • the receiving module is further configured to receive output information returned by the cloud server;
  • An output module for outputting output information returned by the cloud server.
  • the method further comprises:
  • a matching module configured to perform a matching operation on the text information against the information pre-stored in the terminal's local database;
  • the obtaining module is further configured to: after the matching operation is successful, obtain control information corresponding to the matching operation;
  • a response module configured to execute a response control operation corresponding to the control information
  • the sending module is configured to upload the text information to a cloud server of the terminal after the matching operation fails.
  • the matching module comprises:
  • a calculating unit configured to calculate a matching parameter according to the text information and information of a pre-acquired current page
  • a matching unit configured to match the matching parameter with a current page entry, after the current page entry is successfully matched
  • a setting unit configured to set a label that matches the current page entry.
  • the calculating unit is further configured to, in the process of transmitting the audio stream to the terminal, run the background page-control collection algorithm to collect the text information of the controllable controls of the current television page; the calculating unit is further configured to calculate the matching parameters from the text information and the text collected from the combined scene controls.
  • the matching module further includes: a prompting unit,
  • the matching unit is further configured to: after the current page entry matching fails, match the matching parameter with a global static term;
  • the setting unit is further configured to: after matching with the global static term, set a tag that matches the global static term;
  • the matching unit is further configured to: after the matching with the global static term fails, match the matching parameter with the application information;
  • the setting unit is further configured to: after matching the application information, set a label that matches the application information;
  • the prompting unit is configured to prompt that the matching operation has failed after matching with the application information fails.
  • the process of the cloud server acquiring the output information includes: after receiving the text information, identifying a search parameter of the text information according to a semantic analysis engine loaded by itself; and searching for corresponding output information according to the search parameter.
  • the cloud server determines a service type corresponding to the search parameter, and accesses an information provider corresponding to the service type to provide an information service.
  • the terminal of the invention sets up voice interaction on its own terminal platform and, using the television server as the interface, independently selects and accesses the voice recognition service and the semantic parsing engine. Voice recognition is thereby separated from semantic recognition rather than bound to it, and the semantic recognition operation runs on the terminal's own server, without relying on a third-party service provider; the system can be adjusted according to demand, and flexibility is greatly increased.
  • FIG. 1 is a schematic flowchart of a first embodiment of a voice interaction method of a terminal according to the present invention
  • FIG. 2 is a schematic flowchart of a second embodiment of a voice interaction method of a terminal according to the present invention
  • FIG. 3 is a schematic flowchart of a matching operation according to an embodiment of the present invention.
  • FIG. 4 is a schematic flowchart of a third embodiment of a voice interaction method of a terminal according to the present invention.
  • FIG. 5 is a schematic diagram of functional modules of a first embodiment of a voice interaction device of a terminal according to the present invention.
  • FIG. 6 is a schematic diagram of functional modules of a second embodiment of a voice interaction device of a terminal according to the present invention.
  • FIG. 7 is a schematic diagram of a refinement function module of an embodiment of the matching module of FIG. 6;
  • FIG. 8 is a schematic diagram of logic of a voice interaction service according to an embodiment of the present invention.
  • FIG. 9 is a schematic flowchart of voice interaction according to an embodiment of the present invention.
  • the main solution of the embodiment of the present invention is: the terminal establishes voice interaction on its own terminal platform and, using the TV server as the interface, independently selects and accesses the voice recognition service and the semantic parsing engine, separating voice recognition from semantic recognition without binding. The semantic recognition operation runs on the terminal's own server, without relying on third-party service providers; it can be adjusted according to demand, and flexibility is greatly increased.
  • the present invention provides a voice interaction method for a terminal.
  • FIG. 1 is a schematic flowchart diagram of a first embodiment of a voice interaction method of a terminal according to the present invention.
  • the voice interaction method of the terminal includes:
  • Step S10 The terminal receives the audio stream output by the voice input device, and acquires text information corresponding to the audio stream.
  • the voice input device is a mobile phone or a remote controller, and the mobile phone can input voice to the terminal by using a WeChat voice or a multi-screen interactive voice module;
  • the remote controller is a remote controller capable of supporting a voice input function.
  • the terminal is preferably a television set, and may also be a controlled display device.
  • when the user needs to interact with the television, the user connects to the television through a mobile phone; the connection may be wireless or wired. After the connection is established, the user enters voice through the mobile phone, and the phone either converts the recorded voice into an audio stream in real time and transmits it to the television, or transmits the converted audio stream to the television after a voice recording is completed.
  • the television acquires text information corresponding to the audio stream.
  • the obtaining process includes, but is not limited to: 1) the television uploads the audio stream to a third-party voice recognition server, which recognizes the audio stream, obtains its text information, and feeds the text information back to the television; 2) the TV manufacturer customizes or purchases a voice recognition service and saves its database on the TV's local end; after receiving the audio stream, the television recognizes the text information of the audio stream through this local database, completing the audio-stream-to-text conversion locally. These processes for obtaining the text information of the audio stream are merely exemplary, and the present invention is not limited to them.
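The two acquisition options above can be sketched as follows. All function and variable names here are illustrative (the patent names no APIs), and the local-database-first ordering is an assumption made for the sketch, not something the text prescribes:

```python
from typing import Optional

def recognize_via_local_db(audio_stream: bytes, local_db: dict) -> Optional[str]:
    """Option 2: look the stream up in the locally saved recognition database."""
    return local_db.get(audio_stream)

def recognize_via_third_party(audio_stream: bytes) -> str:
    """Option 1: placeholder for uploading the stream to a third-party STT server."""
    raise NotImplementedError("network call not modelled in this sketch")

def audio_stream_to_text(audio_stream: bytes, local_db: dict) -> str:
    """Prefer the on-TV database; fall back to the third-party server."""
    text = recognize_via_local_db(audio_stream, local_db)
    if text is not None:
        return text
    return recognize_via_third_party(audio_stream)
```

Either path alone would also satisfy the description; the fallback arrangement merely shows how the two exemplary processes could coexist on one device.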
  • Step S20 the terminal uploads the text information to a cloud server built by the operator corresponding to the terminal, so that the cloud server searches for output information corresponding to the text information and returns it to the terminal;
  • the television has its own cloud server loaded with a semantic parsing engine for identifying the semantics of the textual information of the audio stream.
  • the text information is uploaded to the cloud server of the terminal.
  • the cloud server identifies the search parameter of the text information according to the semantic parsing engine loaded by itself (for semantic recognition, and identifies the semantics of the user from the voice sent by the user through the voice input device).
  • the search parameter is keyword information of text information or user demand information, for example, an on-demand service, a song search service, or an e-commerce service.
  • taking keyword information as the example search parameter, the cloud server searches for output information corresponding to the text information according to the keyword information; the output information may be a resource stored in the local server database or may be provided by a third-party service provider. After the search, the cloud server returns the found output information to the TV.
  • the output information may be e-commerce push information, product advertisement information, or the like.
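The cloud-server lookup of step S20 might be sketched like this; `parse_semantics` is a toy stand-in for the semantic parsing engine, and the keyword rule and resource table are illustrative assumptions rather than the patent's mechanism:

```python
def parse_semantics(text: str) -> dict:
    """Toy 'semantic parsing engine': extract a keyword search parameter."""
    keywords = [w for w in text.split() if w not in {"play", "the", "a"}]
    return {"keyword": " ".join(keywords) or text}

def search_output_info(search_param: dict, local_resources: dict, third_party=None):
    """Search the cloud server's own database first, else ask a provider."""
    hit = local_resources.get(search_param["keyword"])
    if hit is not None:
        return hit
    if third_party is not None:          # resource supplied by a third-party provider
        return third_party(search_param)
    return None

def handle_text_info(text: str, local_resources: dict):
    """Full step-S20 path: parse semantics, then search for output information."""
    return search_output_info(parse_semantics(text), local_resources)
```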
  • Step S30 the terminal receives the information returned by the cloud server and outputs the information.
  • the television receives the output information returned by the cloud server and outputs it, including displaying it directly or pushing it to other terminals connected to the television (e.g., mobile phones, tablets) for display or playback.
  • the terminal establishes voice interaction on its own terminal platform and, using the TV server as the interface, independently selects and accesses the voice recognition service and the semantic parsing engine, separating voice recognition from semantic recognition without binding. The semantic recognition operation runs on the terminal's own server, without relying on services from a third-party provider; it can be adjusted according to demand, and flexibility is greatly increased.
  • FIG. 2 is a schematic flowchart diagram of a second embodiment of a voice interaction method of a terminal according to the present invention. Based on the first embodiment of the voice interaction method of the terminal, after the step S10, the method further includes:
  • Step S40 The terminal performs a matching operation according to the text information and information pre-stored in the local database of the terminal;
  • Step S50 After the matching operation is successful, obtain control information corresponding to the matching operation;
  • step S60 a response control operation corresponding to the control information is performed.
  • the matching operation of the television control is performed first.
  • the television stores a database of control information, such as control information including volume addition and subtraction, up and down left and right control, play, pause, fast forward or rewind.
  • the text information is matched against the information pre-stored in the terminal's local database. After the matching operation succeeds, the terminal acquires the control information corresponding to the matching operation and performs the response control operation corresponding to that control information; after the matching operation fails, the terminal performs the process of uploading the text information to its cloud server.
  • the process of performing the matching operation according to the text information and the information pre-stored by the terminal local database includes:
  • Step S41 the terminal calculates a matching parameter according to the text information and the information of the current page collected in advance;
  • Step S42 Match the matching parameter with the current page entry, and after the current page term is successfully matched, set a tag that matches the current page entry.
  • Step S43 After the current page term matching fails, the matching parameter is matched with the global static term, and after the global static term is successfully matched, the tag matching the global static term is set;
  • Step S44 After the matching with the global static term fails, the matching parameter is matched with the application information, and after the matching with the application information is successful, setting a label matching the application information;
  • step S45 after the matching with the application information fails, prompt that the matching operation has failed.
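The priority cascade of steps S42–S45, together with the playback-control stage described in the embodiment, can be sketched as follows. The tag names are taken from the embodiment; exact string matching over plain lists is a simplification standing in for the fuzzy matching:

```python
def match_with_priority(text, page_entries, global_terms, player_terms, app_entries):
    """Return the first matching tag in the embodiment's priority order."""
    if text in page_entries:       # S42: current-page entries first
        return "FUZZY_MATCH"
    if text in global_terms:       # S43: then global static terms (volume, direction, ...)
        return "GLOBAL_MATCH"
    if text in player_terms:       # then playback control terms (play, pause, ...)
        return "PLAYER_MATCH"
    if text in app_entries:        # S44: finally locally installed application terms
        return "APP_MATCH"
    return "FAIL_MATCH"            # S45: prompt failure; fall through to the cloud
```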
  • in the process of transmitting the audio stream to the terminal, the TV's background page-control collection algorithm starts collecting the text information of the controllable controls of the TV's current page.
  • the TV's local fuzzy matching operation is then performed. The fuzzy matching algorithm calculates the character match count, the source matching degree, and the target matching degree as the matching parameters for the text information and the text collected from the combined scene controls. The required target matching degree is set per scenario according to requirements and performance; for example, it can be 0.67 or 1.
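The patent names three matching parameters (character match count, source matching degree, target matching degree) but gives no formulas. One plausible reading, offered purely as an assumption, counts the characters shared between the recognised text and a control's collected text and normalises by each string's distinct-character count:

```python
def matching_parameters(source: str, target: str):
    """Assumed definitions: shared distinct characters, normalised per side."""
    matched = sum(1 for ch in set(source) if ch in target)    # character match count
    src_deg = matched / len(set(source)) if source else 0.0   # source matching degree
    tgt_deg = matched / len(set(target)) if target else 0.0   # target matching degree
    return matched, src_deg, tgt_deg

def entry_matches(source: str, target: str, required_degree: float = 0.67) -> bool:
    """An entry matches when the target matching degree reaches the configured
    requirement (0.67 in this embodiment's example; 1.0 would demand exactness)."""
    _, _, tgt_deg = matching_parameters(source, target)
    return tgt_deg >= required_degree
```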
  • a matching priority order is also set. The current page entries are matched first, with a required matching degree of 0.67. After current-page matching fails, the global static terms are matched; the global static terms include some preset global control commands, for example volume up/down and directional control. If global static matching fails, playback control terms such as pause, play, fast forward, or rewind are matched. The last match is against the application terms of the machine's locally installed applications.
  • the current page matching success label is FUZZY_MATCH
  • the global static matching success label is GLOBAL_MATCH
  • the local application matching success label is APP_MATCH
  • the playback control matching success label is PLAYER_MATCH
  • the fuzzy matching failure label is FAIL_MATCH.
  • when the matching success label is FUZZY_MATCH, the corresponding current-page control operation is performed; when the matching success label is PLAYER_MATCH, the match was completed among the playback control entries, and the corresponding playback control instruction is processed. Once local fuzzy matching succeeds and the corresponding control command is completed, the voice interaction ends.
  • if the matching operation fails, the terminal turns to interaction with the cloud service to obtain the information required by the user.
  • the matching operation is performed locally in the terminal, and the third party is not required to complete the matching operation and control, thereby effectively improving the local control efficiency of the terminal.
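The local handling of the match labels might look like the following dispatch table; the handler bodies are placeholders, since the patent does not define a dispatch API, and only the label strings come from the embodiment:

```python
def dispatch(label: str, payload: str) -> str:
    """Route a successful local match to its handler; anything else goes to the cloud."""
    handlers = {
        "FUZZY_MATCH": lambda p: f"operate page control {p!r}",
        "GLOBAL_MATCH": lambda p: f"execute global command {p!r}",
        "APP_MATCH": lambda p: f"launch application {p!r}",
        "PLAYER_MATCH": lambda p: f"playback control {p!r}",
    }
    handler = handlers.get(label)
    if handler is None:                  # FAIL_MATCH: fall back to the cloud server
        return "upload to cloud server"
    return handler(payload)
```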
  • FIG. 4 is a schematic flowchart diagram of a third embodiment of a voice interaction method of a terminal according to the present invention. The method also includes the steps of:
  • Step S70 After identifying the search parameter, the cloud server determines a service type corresponding to the search parameter, and accesses an information provider corresponding to the service type to provide an information service.
  • the cloud server determines the service type corresponding to the search parameter, for example, whether an on-demand service, a song search service, or an e-commerce service is required.
  • the cloud server selects and accesses an information provider providing information service corresponding to the service type according to the service type corresponding to the identified search parameter.
  • the service type may be customized according to requirements, and the server of the terminal selects an appropriate information provider to provide the service.
  • the step S70 can also be performed before or after other steps, and the order can be adjusted according to actual needs.
  • this embodiment is based on the terminal's cloud platform: the cloud platform provides the interface, customizes extended service types, and selects suitable information providers to provide information services, avoiding the limitation of outsourcing to a single voice service provider and improving control flexibility.
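Step S70's routing can be sketched as follows; the `classify_service` rules and the provider registry are illustrative assumptions, since the patent only states that service types are customisable and a suitable provider is selected per type:

```python
def classify_service(search_param: str) -> str:
    """Toy service-type classifier for the three example services."""
    if "song" in search_param:
        return "song_search"
    if "buy" in search_param:
        return "e_commerce"
    return "on_demand"

# Registry of information providers, one per (customisable) service type.
PROVIDERS = {
    "song_search": lambda q: f"song results for {q!r}",
    "e_commerce": lambda q: f"shop results for {q!r}",
    "on_demand": lambda q: f"VOD results for {q!r}",
}

def access_provider(search_param: str) -> str:
    """Determine the service type, then access the matching information provider."""
    service_type = classify_service(search_param)
    return PROVIDERS[service_type](search_param)
```

Because the registry is just a mapping, the operator can add or swap providers per service type without touching the routing logic, which is the flexibility the embodiment claims.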
  • the invention further provides a voice interaction device for a terminal.
  • FIG. 5 is a schematic diagram of functional modules of a first embodiment of a voice interaction apparatus of a terminal according to the present invention.
  • the device includes: a receiving module 10, an obtaining module 20, a sending module 30, and an output module 40.
  • the receiving module 10 is configured to receive an audio stream output by the voice input device.
  • the obtaining module 20 is configured to acquire text information corresponding to the audio stream
  • the voice input device is a mobile phone or a remote controller, and the mobile phone can input voice to the terminal by using a WeChat voice or a multi-screen interactive voice module;
  • the remote controller is a remote controller capable of supporting a voice input function.
  • the terminal is preferably a television set, and may also be a controlled display device.
  • when the user needs to interact with the television, the user connects to the television through a mobile phone; the connection may be wireless or wired. After the connection is established, the user enters voice through the mobile phone, and the phone either converts the recorded voice into an audio stream in real time and transmits it to the television, or transmits the converted audio stream to the television after a voice recording is completed.
  • the receiving module 10 receives the audio stream output by the voice input device, and the acquiring module 20 acquires the text information corresponding to the audio stream.
  • the acquisition process of the obtaining module 20 includes, but is not limited to: 1) uploading the audio stream to a third-party voice recognition server, which recognizes the audio stream, obtains its text information, and feeds the text information back; 2) the TV manufacturer customizes or purchases a voice recognition service and saves its database on the TV's local end; the text information of the audio stream is then recognized through this local database, completing the conversion locally. These processes for obtaining the text information of the audio stream are merely exemplary, and the present invention is not limited to them.
  • the sending module 30 is configured to upload the text information to a cloud server built by the operator corresponding to the terminal, so as to search, by the cloud server, for output information corresponding to the text information and return it;
  • the television has its own cloud server loaded with a semantic parsing engine for identifying the semantics of the textual information of the audio stream.
  • the text information is uploaded to the cloud server of the terminal.
  • the cloud server identifies the search parameter of the text information according to the semantic parsing engine loaded by itself (for semantic recognition, and identifies the semantics of the user from the voice sent by the user through the voice input device).
  • the search parameter is keyword information of text information or user demand information, for example, an on-demand service, a song search service, or an e-commerce service.
  • taking keyword information as the example search parameter, the cloud server searches for output information corresponding to the text information according to the keyword information; the output information may be a resource stored in the local server database or may be provided by a third-party service provider. After the search, the cloud server returns the found output information to the TV.
  • the information may be e-commerce push information, product advertisement information, and the like.
  • the receiving module 10 is further configured to receive output information returned by the cloud server;
  • the output module 40 is configured to output output information returned by the cloud server.
  • the receiving module 10 receives the information returned by the cloud server and outputs it through the output module 40.
  • the output manner includes direct display, or pushing the information to other terminals connected to the television (e.g., mobile phones, tablets) for display or playback.
  • the terminal establishes voice interaction on its own terminal platform and, using the TV server as the interface, independently selects and accesses the voice recognition service and the semantic parsing engine, separating voice recognition from semantic recognition without binding. The semantic recognition operation runs on the terminal's own server, without relying on services from a third-party provider; it can be adjusted according to demand, and flexibility is greatly increased.
  • FIG. 6 is a schematic diagram of functional modules of a second embodiment of a voice interaction apparatus of a terminal according to the present invention. Also included: a matching module 50 and a response module 60,
  • the matching module 50 is configured to perform a matching operation according to the text information and information pre-stored in the local database of the terminal;
  • the obtaining module 20 is further configured to: after the matching operation is successful, obtain control information corresponding to the matching operation;
  • the response module 60 is configured to perform a response control operation corresponding to the control information.
  • the matching operation of the television control is performed first.
  • the television stores in advance a database of control information, such as volume up/down, directional control, play, pause, fast forward, and rewind.
  • the text information is matched against the information pre-stored in the terminal's local database. After the matching operation succeeds, the control information corresponding to the matching operation is acquired and the response control operation corresponding to that control information is performed; after the matching operation fails, the process of uploading the text information to the terminal's cloud server is performed.
  • the matching module 50 includes:
  • the calculating unit 51 is configured to calculate a matching parameter according to the text information and the information of the current page collected in advance;
  • the matching unit 52 is configured to match the matching parameter with the current page entry, after the current page entry is successfully matched;
  • the setting unit 53 is configured to set a label that matches the current page entry.
  • the matching unit 52 is further configured to: after the current page entry matching fails, match the matching parameter with a global static term;
  • the setting unit 53 is further configured to: after matching with the global static term, set a tag that matches the global static term;
  • the matching unit 52 is further configured to: after the matching with the global static term fails, match the matching parameter with the application information;
  • the setting unit 53 is further configured to: after matching the application information, set a label that matches the application information;
  • the prompting unit 54 is configured to prompt that the matching operation has failed after matching with the application information fails.
  • in the process of transmitting the audio stream, the background page-control collection algorithm of the calculating unit 51 starts to collect the text information of the controllable controls of the television's current page.
  • the matching unit 52 performs the TV's local fuzzy matching operation: the calculating unit 51 uses the fuzzy matching algorithm to calculate the character match count, the source matching degree, and the target matching degree as the matching parameters for the text information and the text collected from the combined scene controls. The required target matching degree is set per scenario according to requirements and performance; for example, it can be 0.67 or 1. Similarly, a matching priority order is set.
  • The current page entries are matched first, with a matching-degree requirement of 0.67. If current page entry matching fails, the global static terms are matched; the global static terms include preset global control commands, for example volume up/down and directional (up, down, left, right) control. If global static term matching fails, the playback control terms, such as pause, play, fast forward, or rewind, are matched. The last match is against the local application terms.
  • the current page matching success label is FUZZY_MATCH
  • the global static matching success label is GLOBAL_MATCH
  • the local application matching success label is APP_MATCH
  • the playback control matching success label is PLAYER_MATCH
  • the fuzzy matching failure label is FAIL_MATCH.
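The five match labels can be grouped into a simple enumeration. The label names come from the source; the Python representation and the descriptive values are only a sketch.

```python
from enum import Enum

class MatchLabel(Enum):
    """Labels set by the matching module for each local matching outcome."""
    FUZZY_MATCH = "current page entry matched"
    GLOBAL_MATCH = "global static term matched"
    APP_MATCH = "local application matched"
    PLAYER_MATCH = "playback control matched"
    FAIL_MATCH = "local fuzzy matching failed"
```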
  • If the match is found among the current page entries, the matching success label is FUZZY_MATCH.
  • If the match is found among the playback control entries, the matching success label is PLAYER_MATCH, and the corresponding playback control instruction is processed.
  • When local fuzzy matching succeeds, the voice interaction ends once the corresponding control command has been executed.
  • When matching fails, the prompting unit 54 prompts that the matching operation has failed, and the process hands over to the cloud service interaction to obtain the information the user requires.
  • The matching operation is performed locally in the terminal, with no third party needed to complete the matching and control, thereby effectively improving the terminal's local control efficiency.
  • The cloud server determines the service type corresponding to the search parameter and accesses the information provider corresponding to that service type to provide the information service.
  • The cloud server determines the service type corresponding to the search parameter, for example, whether an on-demand service, a song search service, or an e-commerce service is required.
  • The cloud server selects and accesses an information provider offering the information service corresponding to the service type, according to the service type identified from the search parameter.
  • The service type may be customized according to requirements, and the terminal's server selects an appropriate information provider to provide the service.
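The service-type dispatch can be pictured as a lookup table on the terminal's server. The type and provider names below are hypothetical; the source only gives on-demand, song search, and e-commerce as example service types.

```python
# Hypothetical service-type-to-provider table; real deployments would
# register providers through the cloud platform's interface.
SERVICE_PROVIDERS = {
    "video_on_demand": "video_content_provider",
    "song_search": "music_content_provider",
    "e_commerce": "shopping_provider",
}

def select_provider(service_type: str) -> str:
    """Pick the information provider for an identified service type,
    falling back to a default when the type is not registered."""
    return SERVICE_PROVIDERS.get(service_type, "default_provider")
```

Because the table lives on the terminal operator's own server, new service types can be added without depending on a third-party voice service provider.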
  • This embodiment is based on the cloud platform of the terminal: the cloud platform provides an interface, allows extended service types to be customized, and selects an appropriate information provider to provide information services, avoiding the limitations caused by outsourcing to a voice service provider and improving control flexibility.
  • the service logic diagram of the voice interaction includes:
  • the system includes several parts: a voice input module, a local fuzzy matching module, a local control module, a service display module, and a cloud service module;
  • voice input is performed through a voice input device;
  • the voice input devices supported by this system include mobile phones and remote controls;
  • a mobile phone can input voice via WeChat voice or the multi-screen interactive voice module; on the remote-control side, all remote controllers that support voice input are supported.
  • The local fuzzy matching module is the key to implementing local control; it comprises the collection of local terms and the term fuzzy matching algorithm. After the user's voice input is converted into text, the fuzzy matching algorithm is applied first to determine whether the user's current instruction matches a local term; if matching succeeds, the matching type and matching ID are returned. For local fuzzy matching, a matching priority is set for the local scene: first, the current page control entries are matched; if unsuccessful, the preset static entries are matched; if still unsuccessful, the playback control entries are matched; if still unsuccessful, the local application terms are matched; and if matching remains unsuccessful, the text is submitted to the cloud platform for semantic understanding.
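The four-stage priority just described can be sketched as a loop over term sets. `difflib.SequenceMatcher` stands in for the unspecified fuzzy matching algorithm, and the 0.67 threshold follows the example given earlier in the description.

```python
from difflib import SequenceMatcher

def match_degree(a: str, b: str) -> float:
    # Stand-in similarity measure; the patent does not specify the formula.
    return SequenceMatcher(None, a, b).ratio()

def local_fuzzy_match(text, page, static, playback, apps, threshold=0.67):
    """Try the four local term sets in the priority order described above.
    Returns (match_type, entry) on success, or ("FAIL_MATCH", None) to
    indicate the text should be submitted to the cloud platform."""
    stages = [
        ("FUZZY_MATCH", page),       # 1. current page control entries
        ("GLOBAL_MATCH", static),    # 2. preset global static entries
        ("PLAYER_MATCH", playback),  # 3. playback control entries
        ("APP_MATCH", apps),         # 4. local application entries
    ]
    for label, entries in stages:
        for entry in entries:
            if match_degree(text, entry) >= threshold:
                return label, entry
    return "FAIL_MATCH", None
```

Ordering the stages this way means an on-screen button name always wins over a global command with the same wording, which matches the priority the module sets for the local scene.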
  • The local control module performs the local control function: according to the result of the fuzzy matching, it finds the control corresponding to the matching result and completes the control operation.
  • The local control module includes a lookup algorithm and control instructions.
  • The service display module presents the results fed back by the cloud platform beyond local control, such as movie lists, song lists, and product lists;
  • the cloud platform module includes all server-side processing.
  • the cloud platform includes a local server and a third-party server.
  • the local server is responsible for interfacing with the terminal service and connecting with the third-party server.
  • the third-party server includes a voice recognition server, a semantic understanding service, and a third-party content provider.
  • Step S100: The user inputs a voice command, and the collection algorithm collects the term information of the controllable controls on the system's current page.
  • The text information produced by voice recognition is passed to the local fuzzy matching algorithm. If matching succeeds, the local control module executes the corresponding control function, completing one voice interaction; if matching fails, the text information is transmitted to the cloud.
  • The semantic understanding server feeds the semantic understanding result back to the local server; the local server searches the resource library for content according to the keywords fed back from semantic understanding and passes the results to the service display module; the service display module displays the content fed back by the local server on the terminal appropriately, thereby completing one voice interaction.
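The cloud leg of the flow reduces to three handoffs, sketched here with the semantic understanding server, the resource library search, and the service display module modeled as assumed callable interfaces (the real components are networked services, not local functions).

```python
def cloud_interaction(text, semantic_understand, search_resources, display):
    """Sketch of the cloud leg: semantic understanding feeds back keywords,
    the local server searches the resource library, and the service display
    module renders the results on the terminal."""
    keywords = semantic_understand(text)   # semantic understanding server
    content = search_resources(keywords)   # local server queries resource library
    return display(content)                # service display module on the terminal
```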
  • The system builds a set of standard frameworks for speech recognition, semantic understanding, and business content access on the TV platform. As a traditional TV manufacturer, partner access can be chosen independently: the voice recognition service engine can be selected independently, and the terminal service access types can be planned independently.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Telephonic Communication Services (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A terminal voice interaction method is provided, comprising the following steps: a terminal receives an audio stream output by a voice input device and obtains text information corresponding to the audio stream; the terminal uploads the text information to a cloud server built by an operator corresponding to the terminal, so that the cloud server searches for output information corresponding to the text information and returns it to the terminal; and the terminal receives the output information returned by the cloud server and outputs it. A terminal voice interaction device is also provided. The semantic recognition operation of the present invention performs recognition in a server of the terminal without depending on a service provided by a third-party service provider. The present invention can further be adjusted according to requirements, which considerably increases flexibility.
PCT/CN2016/098147 2016-07-06 2016-09-06 Procédé et dispositif d'interaction vocale de terminal WO2018006489A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610529267.9A CN106101789B (zh) 2016-07-06 2016-07-06 终端的语音交互方法及装置
CN201610529267.9 2016-07-06

Publications (1)

Publication Number Publication Date
WO2018006489A1 true WO2018006489A1 (fr) 2018-01-11

Family

ID=57213435

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/098147 WO2018006489A1 (fr) 2016-07-06 2016-09-06 Procédé et dispositif d'interaction vocale de terminal

Country Status (2)

Country Link
CN (1) CN106101789B (fr)
WO (1) WO2018006489A1 (fr)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109584870A (zh) * 2018-12-04 2019-04-05 安徽精英智能科技有限公司 一种智能语音交互服务方法及系统
CN111176607A (zh) * 2019-12-27 2020-05-19 国网山东省电力公司临沂供电公司 一种基于电力业务的语音交互系统及方法
CN111223485A (zh) * 2019-12-19 2020-06-02 深圳壹账通智能科技有限公司 智能交互方法、装置、电子设备及存储介质
CN111367492A (zh) * 2020-03-04 2020-07-03 深圳市腾讯信息技术有限公司 网页页面展示方法及装置、存储介质
CN111801731A (zh) * 2019-01-22 2020-10-20 京东方科技集团股份有限公司 语音控制方法、语音控制装置以及计算机可执行非易失性存储介质
CN115396709A (zh) * 2022-08-22 2022-11-25 海信视像科技股份有限公司 显示设备、服务器及免唤醒语音控制方法

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108109618A (zh) * 2016-11-25 2018-06-01 宇龙计算机通信科技(深圳)有限公司 语音交互方法、系统以及终端设备
CN106782561A (zh) * 2016-12-09 2017-05-31 深圳Tcl数字技术有限公司 语音识别方法和系统
CN106792047B (zh) * 2016-12-20 2020-05-05 Tcl科技集团股份有限公司 一种智能电视的语音控制方法及系统
CN107845384A (zh) * 2017-10-30 2018-03-27 江西博瑞彤芸科技有限公司 一种语音识别方法
CN109785844A (zh) * 2017-11-15 2019-05-21 青岛海尔多媒体有限公司 用于智能电视交互操作的方法及装置
CN109741749B (zh) * 2018-04-19 2020-03-27 北京字节跳动网络技术有限公司 一种语音识别的方法和终端设备
CN110444200B (zh) * 2018-05-04 2024-05-24 北京京东尚科信息技术有限公司 信息处理方法、电子设备、服务器、计算机系统及介质
CN108877797A (zh) * 2018-06-26 2018-11-23 上海早糯网络科技有限公司 主动交互式的智能语音系统
CN110164411A (zh) * 2018-07-18 2019-08-23 腾讯科技(深圳)有限公司 一种语音交互方法、设备及存储介质
CN110795175A (zh) * 2018-08-02 2020-02-14 Tcl集团股份有限公司 模拟控制智能终端的方法、装置及智能终端
CN109979449A (zh) * 2019-02-15 2019-07-05 江门市汉的电气科技有限公司 一种智能灯具的语音控制方法、装置、设备和存储介质
CN109859761A (zh) * 2019-02-22 2019-06-07 安徽卓上智能科技有限公司 一种智能语音交互控制方法
CN109785840B (zh) * 2019-03-05 2021-01-29 湖北亿咖通科技有限公司 自然语言识别的方法、装置及车载多媒体主机、计算机可读存储介质
CN110335602A (zh) * 2019-07-10 2019-10-15 青海中水数易信息科技有限责任公司 一种具有语音识别功能的河长制信息化系统
CN110517690A (zh) * 2019-08-30 2019-11-29 四川长虹电器股份有限公司 语音控制功能的引导方法及系统
CN110600003A (zh) * 2019-10-18 2019-12-20 北京云迹科技有限公司 机器人的语音输出方法、装置、机器人和存储介质
CN111475241B (zh) * 2020-04-02 2022-03-11 深圳创维-Rgb电子有限公司 一种界面的操作方法、装置、电子设备及可读存储介质
CN111627440A (zh) * 2020-05-25 2020-09-04 红船科技(广州)有限公司 一种基于三维虚拟人物和语音识别实现交互的学习系统
CN112767943A (zh) * 2021-02-26 2021-05-07 湖北亿咖通科技有限公司 一种语音交互系统

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102740014A (zh) * 2011-04-07 2012-10-17 青岛海信电器股份有限公司 语音控制电视机、电视系统及通过语音控制电视机的方法
CN102855872A (zh) * 2012-09-07 2013-01-02 深圳市信利康电子有限公司 基于终端及互联网语音交互的家电控制方法及系统
CN102957711A (zh) * 2011-08-16 2013-03-06 广州欢网科技有限责任公司 在电视上通过语音进行网址定位的方法及系统
CN103093755A (zh) * 2012-09-07 2013-05-08 深圳市信利康电子有限公司 基于终端及互联网语音交互的网络家电控制方法及系统
CN103176591A (zh) * 2011-12-21 2013-06-26 上海博路信息技术有限公司 一种基于语音识别的文本定位和选择方法
CN105609104A (zh) * 2016-01-22 2016-05-25 北京云知声信息技术有限公司 一种信息处理方法、装置及智能语音路由控制器

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103188409A (zh) * 2011-12-29 2013-07-03 上海博泰悦臻电子设备制造有限公司 语音自动应答云端服务器、系统及方法
CN104506901B (zh) * 2014-11-12 2018-06-15 科大讯飞股份有限公司 基于电视场景状态及语音助手的语音辅助方法及系统
CN104599669A (zh) * 2014-12-31 2015-05-06 乐视致新电子科技(天津)有限公司 一种语音控制方法和装置
CN105161106A (zh) * 2015-08-20 2015-12-16 深圳Tcl数字技术有限公司 智能终端的语音控制方法、装置及电视机系统
CN105512182B (zh) * 2015-11-25 2019-03-12 深圳Tcl数字技术有限公司 语音控制方法及智能电视
CN105551488A (zh) * 2015-12-15 2016-05-04 深圳Tcl数字技术有限公司 语音控制方法及系统

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102740014A (zh) * 2011-04-07 2012-10-17 青岛海信电器股份有限公司 语音控制电视机、电视系统及通过语音控制电视机的方法
CN102957711A (zh) * 2011-08-16 2013-03-06 广州欢网科技有限责任公司 在电视上通过语音进行网址定位的方法及系统
CN103176591A (zh) * 2011-12-21 2013-06-26 上海博路信息技术有限公司 一种基于语音识别的文本定位和选择方法
CN102855872A (zh) * 2012-09-07 2013-01-02 深圳市信利康电子有限公司 基于终端及互联网语音交互的家电控制方法及系统
CN103093755A (zh) * 2012-09-07 2013-05-08 深圳市信利康电子有限公司 基于终端及互联网语音交互的网络家电控制方法及系统
CN105609104A (zh) * 2016-01-22 2016-05-25 北京云知声信息技术有限公司 一种信息处理方法、装置及智能语音路由控制器

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109584870A (zh) * 2018-12-04 2019-04-05 安徽精英智能科技有限公司 一种智能语音交互服务方法及系统
CN111801731A (zh) * 2019-01-22 2020-10-20 京东方科技集团股份有限公司 语音控制方法、语音控制装置以及计算机可执行非易失性存储介质
CN111801731B (zh) * 2019-01-22 2024-02-13 京东方科技集团股份有限公司 语音控制方法、语音控制装置以及计算机可执行非易失性存储介质
CN111223485A (zh) * 2019-12-19 2020-06-02 深圳壹账通智能科技有限公司 智能交互方法、装置、电子设备及存储介质
CN111176607A (zh) * 2019-12-27 2020-05-19 国网山东省电力公司临沂供电公司 一种基于电力业务的语音交互系统及方法
CN111367492A (zh) * 2020-03-04 2020-07-03 深圳市腾讯信息技术有限公司 网页页面展示方法及装置、存储介质
CN111367492B (zh) * 2020-03-04 2023-07-18 深圳市腾讯信息技术有限公司 网页页面展示方法及装置、存储介质
CN115396709A (zh) * 2022-08-22 2022-11-25 海信视像科技股份有限公司 显示设备、服务器及免唤醒语音控制方法

Also Published As

Publication number Publication date
CN106101789A (zh) 2016-11-09
CN106101789B (zh) 2020-04-24

Similar Documents

Publication Publication Date Title
WO2018006489A1 (fr) Procédé et dispositif d'interaction vocale de terminal
WO2020060325A1 (fr) Dispositif électronique, système et procédé pour utiliser un service de reconnaissance vocale
WO2017143692A1 (fr) Téléviseur intelligent et son procédé de commande vocale
WO2020222444A1 (fr) Serveur pour déterminer un dispositif cible sur la base d'une entrée vocale d'un utilisateur et pour commander un dispositif cible, et procédé de fonctionnement du serveur
WO2019051902A1 (fr) Procédé de commande de terminal, climatiseur et support d'informations lisible par un ordinateur
WO2018043991A1 (fr) Procédé et appareil de reconnaissance vocale basée sur la reconnaissance de locuteur
WO2014003283A1 (fr) Dispositif d'affichage, procédé de commande de dispositif d'affichage, et système interactif
WO2017028601A1 (fr) Procédé et dispositif de commande vocale pour un terminal intelligent et système de télévision
WO2014107101A1 (fr) Appareil d'affichage et son procédé de commande
WO2019085543A1 (fr) Système de télévision et procédé de commande de télévision
WO2018023926A1 (fr) Procédé et système d'interaction pour téléviseur et terminal mobile
WO2014187158A1 (fr) Procédé, serveur, et terminal pour contrôler le partage de données de terminal en nuage
WO2019062113A1 (fr) Procédé et dispositif de commande pour appareil ménager, appareil ménager et support de stockage lisible
WO2019114262A1 (fr) Procédé de chargement d'interface utilisateur, téléviseur intelligent, et support de stockage lisible par ordinateur
WO2016058258A1 (fr) Procédé et système de commande à distance de terminal
WO2017054488A1 (fr) Procédé de commande de lecture de télévision, serveur et système de commande de lecture de télévision
WO2017206377A1 (fr) Procédé et dispositif de lecture synchrone de programme
WO2019062112A1 (fr) Procédé et dispositif de commande d'un appareil de climatisation, appareil de climatisation et support lisible par ordinateur
WO2019041851A1 (fr) Procédé de conseil après-vente d'appareil ménager, dispositif électronique et support de stockage lisible par ordinateur
WO2019114127A1 (fr) Procédé et dispositif de sortie vocale pour conditionneur d'air
WO2018036057A1 (fr) Procédé et dispositif de mise à niveau adaptative en arrière-plan de logiciel
WO2018233221A1 (fr) Procédé de sortie sonore multi-fenêtre, télévision et support de stockage lisible par ordinateur
WO2017045441A1 (fr) Procédé et appareil de lecture audio utilisant une télévision intelligente
WO2017036208A1 (fr) Procédé et système pour extraire des informations dans une interface d'affichage
WO2018006581A1 (fr) Procédé et appareil de lecture de télévision intelligente

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16907993

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 27.05.2019)

122 Ep: pct application non-entry in european phase

Ref document number: 16907993

Country of ref document: EP

Kind code of ref document: A1