WO2014106433A1 - Voice recognition method, interaction device, server and system - Google Patents

Voice recognition method, interaction device, server and system

Info

Publication number
WO2014106433A1
Authority
WO
WIPO (PCT)
Prior art keywords
party application
semantics
unit
target
text
Prior art date
Application number
PCT/CN2013/090033
Other languages
English (en)
French (fr)
Inventor
李善甫
董明杰
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to KR1020157020795A (KR101838095B1)
Publication of WO2014106433A1
Priority to US14/792,502 (US10229684B2)
Priority to US16/256,872 (US10971156B2)
Priority to US17/190,218 (US11676605B2)

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/26: Speech to text systems
    • G10L 15/08: Speech classification or search
    • G10L 15/18: Speech classification or search using natural language modelling
    • G10L 15/1822: Parsing for meaning understanding
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223: Execution procedure of a spoken command
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/30: Semantic analysis
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M 1/00: Substation equipment, e.g. for use by subscribers
    • H04M 1/72: Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M 1/724: User interfaces specially adapted for cordless or mobile telephones
    • H04M 1/72403: User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M 2250/00: Details of telephonic subscriber devices
    • H04M 2250/74: Details of telephonic subscriber devices with voice recognition means

Definitions

  • The present invention relates to the field of interactive device applications, and more particularly to a voice recognition method, device, and system. Background Art
  • A user can download various applications for an interactive device (User Equipment, UE for short), and these applications are not necessarily provided by the device manufacturer.
  • A third-party application is an assembly or instruction set that is independent of the operating system but can be called and executed in the operating system's runtime environment. It is these numerous third-party applications that greatly enrich the user's choice of applications.
  • The operating systems of the two most common kinds of interactive devices are Google's Android and Apple's iOS, both of which support third-party applications.
  • On the Android platform, a third party can upload a developed application to the Android Market for sale; on the iOS platform, a third party can upload a developed application to the App Store for sale.
  • Users of interactive devices can download applications from the Android Market or the App Store to the interactive device by connecting to a network. These applications may be free or paid, and include the most common map applications, QQ social communication software, Skype instant messaging software, Taobao shopping software, and so on.
  • The above applications are presented on the terminal in the form of icons. When the user needs to start a third-party application, tapping the corresponding icon on the touch screen triggers the third-party application to launch.
  • Embodiments of the present invention provide a voice recognition method, an interaction device, a server, and a system, to solve the problem of starting a third-party application by voice recognition.
  • A method for speech recognition is provided, comprising: parsing one or more first texts to obtain first target semantics, the first text being converted from a first speech signal; determining, according to the acquired first target semantics, a third-party application object corresponding to the first target semantics, where the third-party application associated with the third-party application object is a program that is not authorized, when the device leaves the factory, to be launched by voice; and obtaining, from a third-party application registry, the third-party application associated with the determined third-party application object and launching the third-party application.
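As a purely illustrative sketch (all names, tables, and the keyword-based parser are hypothetical, not taken from the patent), the claimed flow of parsing text into target semantics, mapping the semantics to a third-party application object, looking the object up in a registry, and launching the application could be modeled as:

```python
# Hypothetical sketch of the claimed flow; every name here is illustrative.

# Third-party application registry: maps an application object (identifier)
# to the program (here, a callable) that launches it.
APP_REGISTRY = {
    "skype": lambda: "Skype started",
    "taobao": lambda: "Taobao started",
}

# Mapping from parsed target semantics to third-party application objects.
SEMANTICS_TO_APP = {
    "call_contact": "skype",
    "shop_online": "taobao",
}

def parse_semantics(first_text):
    """Parse the first text (converted from the first speech signal)
    to obtain the first target semantics (crude keyword matching)."""
    lowered = first_text.lower()
    if "call" in lowered or "online" in lowered:
        return "call_contact"
    if "buy" in lowered:
        return "shop_online"
    raise ValueError("no target semantics recognized")

def launch_by_voice(first_text):
    """Determine the application object for the target semantics,
    fetch the associated application from the registry, and launch it."""
    semantics = parse_semantics(first_text)
    app_object = SEMANTICS_TO_APP[semantics]
    third_party_app = APP_REGISTRY[app_object]
    return third_party_app()

print(launch_by_voice("Is Old Hu online?"))
```

A real implementation would replace the keyword matcher with a semantic parser and the registry callables with platform launch entry points; the structure of the lookup chain is what the sketch illustrates.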
  • Before parsing the first text to obtain the first target semantics, the method further comprises: acquiring the first speech signal and converting the first speech signal to obtain the one or more first texts.
  • The semantics of the first text whose semantic score is the highest among the at least two first texts and exceeds a preset threshold is obtained as the first target semantics.
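The candidate-selection rule above (score each candidate text, take the highest score, but only accept it if it exceeds a preset threshold) might be sketched as follows. The scoring of naturalness and coherence is not specified by the patent, so the scores and the threshold value here are placeholders supplied by the caller:

```python
# Illustrative sketch: pick the first target semantics from scored candidates.
# Scores (naturalness/coherence) are assumed to be computed elsewhere.

PRESET_THRESHOLD = 0.5  # placeholder value, not from the patent

def select_first_target(candidates, threshold=PRESET_THRESHOLD):
    """Return the candidate first text with the highest semantic score,
    provided that score exceeds the preset threshold; otherwise None."""
    if not candidates:
        return None
    best_text = max(candidates, key=candidates.get)
    if candidates[best_text] > threshold:
        return best_text
    return None

scores = {"Old Hu online?": 0.9, "Old Hu first?": 0.4, "Tiger online?": 0.2}
print(select_first_target(scores))
```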
  • Determining, according to the acquired first target semantics, the third-party application object corresponding to the first target semantics specifically includes: determining, according to the first target semantics, all third-party application objects corresponding to the first target semantics; if the number of third-party application objects is greater than or equal to two, obtaining all third-party application object information associated with the first target semantics; issuing prompt information that includes the third-party application object information, to prompt the user to issue a further selection instruction according to the third-party application object information, so as to select a target third-party application object from all third-party application objects associated with the first target semantics; and receiving the selection instruction and selecting the corresponding target third-party application object according to the selection instruction. Correspondingly, obtaining the third-party application associated with the determined third-party application object from the third-party application registry and launching it specifically includes: obtaining, from the third-party application registry, the third-party application associated with the target third-party application object and launching that third-party application.
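When the target semantics matches two or more application objects, the prompt-and-select round trip described above could be sketched like this; the prompt string and the `select` callback stand in for whatever user interface actually collects the further selection instruction:

```python
# Illustrative sketch of disambiguation when >= 2 application objects match.

def resolve_app_object(matching_objects, select):
    """If more than one third-party application object matches the first
    target semantics, emit prompt information and apply the user's further
    selection instruction; otherwise return the single match."""
    if len(matching_objects) >= 2:
        prompt = "Which application? " + ", ".join(matching_objects)
        choice = select(prompt)  # user's further selection instruction
        if choice not in matching_objects:
            raise ValueError("selection does not match any prompted object")
        return choice
    return matching_objects[0]

# Simulated selection instruction: the user picks "skype".
picked = resolve_app_object(["skype", "qq"], select=lambda prompt: "skype")
print(picked)
```

In the patent the selection instruction may itself be a second voice signal, in which case `select` would run the same convert-and-parse pipeline on that signal.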
  • Selecting the corresponding target third-party application object according to the received selection instruction specifically includes: converting the second speech signal to obtain a second text; parsing the second text to obtain second target semantics; and determining the corresponding target third-party application object according to the second target semantics.
  • The method further includes: generating the third-party application registry before obtaining, from the third-party application registry, the third-party application associated with the determined third-party application object and launching the third-party application.
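Generating the registry ahead of any lookup might amount to scanning installed applications and recording, per application object, how to reach the associated program. This sketch assumes a simple in-memory structure; the "installed apps" records and their entry-point strings are invented for illustration:

```python
# Illustrative sketch: build a third-party application registry before lookups.
# "Installed apps" and their launch entry points are simulated data.

def generate_registry(installed_apps):
    """Produce a registry mapping each third-party application object
    to the entry point used to launch the associated application."""
    registry = {}
    for app in installed_apps:
        registry[app["object"]] = app["entry_point"]
    return registry

apps = [
    {"object": "skype", "entry_point": "com.skype.Main"},
    {"object": "taobao", "entry_point": "com.taobao.Main"},
]
registry = generate_registry(apps)
print(registry["skype"])
```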
  • The method further includes: sending a semantic command of the first target semantics to the third-party application, so that the third-party application invokes, according to the semantic command of the first target semantics, a third-party application interface (API) that matches the launched third-party application.
  • The method further includes: the third-party application receives a third voice signal from the user; converts the third voice signal to obtain a third text; and invokes, according to the semantic command of the third text, a third-party application interface (API) that matches the launched third-party application, so that the launched third-party application obtains the required hardware or software resources through the API.
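Dispatching a semantic command to a matching API so the launched application can reach a hardware or software resource could look roughly like this; the command-to-API table and the resource accessors are invented for illustration:

```python
# Illustrative sketch: a launched third-party app dispatches a semantic
# command (parsed from the third text) to a matching application interface.
# The functions below stand in for real hardware/software resource accessors.

def camera_api():
    return "camera resource acquired"

def contacts_api():
    return "contacts resource acquired"

# Table matching semantic commands to APIs of the launched application.
COMMAND_TO_API = {
    "take_photo": camera_api,
    "find_contact": contacts_api,
}

def handle_semantic_command(command):
    """Invoke the API matching the semantic command so the launched
    application obtains the hardware or software resource it needs."""
    api = COMMAND_TO_API.get(command)
    if api is None:
        raise KeyError("no API matches semantic command %r" % command)
    return api()

print(handle_semantic_command("find_contact"))
```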
  • The method is performed by an interaction device.
  • Parsing the one or more first texts to obtain the first target semantics is performed by a server; determining, according to the acquired first target semantics, the third-party application object corresponding to the first target semantics, obtaining the third-party application associated with the determined third-party application object from the third-party application registry, and launching the third-party application are performed by the interaction device.
  • A voice recognition interaction device is provided, including a parsing unit, a determining unit, a starting unit, and a storage unit: the parsing unit is configured to parse the first text sent by one or more speech recognition units to obtain first target semantics, the first text being converted from a first voice signal; the determining unit is configured to determine, according to the first target semantics acquired by the parsing unit, a third-party application object corresponding to the first target semantics, where
  • the third-party application associated with the third-party application object is a program that is not authorized, when the device leaves the factory, to be launched by voice; the starting unit is configured to access the storage unit, obtain, from the third-party application registry stored in the storage unit, the third-party application associated with the third-party application object determined by the determining unit, and launch the third-party application.
  • The interaction device further includes a voice recognition unit, configured to convert the first speech signal to obtain the one or more first texts and send them to the parsing unit.
  • The interaction device further includes a voice recognition unit: the voice recognition unit is configured to convert the first voice signal to obtain at least two first texts and send them to the parsing unit; the parsing unit is configured to: according to a predetermined scoring rule, score the semantics of each of the at least two first texts according to their semantic naturalness and coherence, where the level of the score reflects the naturalness and coherence of the semantics; and obtain, as the first target semantics, the semantics of the first text with the highest semantic score among the at least two first texts.
  • The parsing unit is specifically configured to obtain, as the first target semantics, the semantics of the first text whose semantic score is the highest among the at least two first texts and exceeds a preset threshold.
  • The interaction device further includes a sending unit and a receiving unit: the determining unit is specifically configured to determine, according to the first target semantics, all third-party application objects corresponding to the first target semantics and, if the number of third-party application objects is greater than or equal to two, obtain all third-party application object information associated with the first target semantics; the sending unit is configured to issue prompt information that includes the third-party application object information, to prompt the user to issue a further selection instruction according to the third-party application object information, so that the determining unit selects the target third-party application object from all third-party application objects associated with the first target semantics; the receiving unit is configured to receive the selection instruction sent by the user, and the determining unit is configured to select the corresponding target third-party application object according to the received selection instruction; the starting unit obtains, from the third-party application registry stored in the storage unit, the third-party application associated with the target third-party application object and launches the third-party application.
  • The voice recognition unit is further configured to: when the selection instruction is a second voice signal, convert the second voice signal to obtain a second text and send it to the parsing unit; the parsing unit is further configured to parse the second text to obtain second target semantics; the determining unit is further configured to determine the corresponding target third-party application object according to the second target semantics.
  • The interaction device further includes a generating unit: before the starting unit obtains, from the third-party application registry, the third-party application associated with the determined third-party application object and launches the third-party application, the generating unit generates the third-party application registry and sends it to the storage unit.
  • In conjunction with the second aspect or any of the foregoing possible implementations of the second aspect, in a seventh possible implementation, after the starting unit launches the third-party application, a semantic command of the first target semantics is sent to the third-party application, so that the third-party application invokes, according to the semantic command of the first target semantics, a third-party application interface (API) that matches the launched third-party application.
  • A server for speech recognition is provided, including a parsing unit, a determining unit, a starting unit, and a storage unit: the parsing unit is configured to parse one or more first texts to obtain first target semantics, the first text being obtained by converting a first voice signal; the determining unit is configured to determine, according to the first target semantics acquired by the parsing unit, a third-party application object corresponding to the first target semantics, where the third-party application associated with the third-party application object is a program that is not authorized, when the device leaves the factory, to be launched by voice; the starting unit is configured to obtain, from the third-party application registry stored in the storage unit, the third-party application associated with the third-party application object determined by the determining unit and launch the third-party application;
  • The storage unit is configured to store the third-party application registry and third-party applications.
  • The parsing unit is specifically configured to: according to a predetermined scoring rule, score the semantics of each of the at least two first texts obtained by converting the first speech signal, according to their semantic naturalness and coherence, where the level of the score reflects the naturalness and coherence of the semantics; and obtain, as the first target semantics, the semantics of the first text with the highest semantic score among the at least two first texts.
  • The parsing unit is specifically configured to obtain, as the first target semantics, the semantics of the first text whose semantic score is the highest among the at least two first texts and exceeds a preset threshold.
  • The server further includes a sending unit and a receiving unit: the determining unit is specifically configured to determine, according to the first target semantics, all third-party application objects corresponding to the first target semantics and, if the number of third-party application objects is greater than or equal to two, obtain all third-party application object information associated with the first target semantics; the sending unit is configured to issue prompt information that includes the third-party application object information, to prompt the user to issue a further selection instruction according to the third-party application object information, so that the determining unit selects the target third-party application object from all third-party application objects associated with the first target semantics; the receiving unit is configured to receive the selection instruction sent by the user; the determining unit is specifically configured to select the corresponding target third-party application object according to the received selection instruction; the starting unit obtains, from the third-party application registry stored in the storage unit, the third-party application associated with the target third-party application object and launches it.
  • The parsing unit is further configured to: when the selection instruction is a second speech signal, parse the second text to obtain second target semantics, where the second text is obtained by converting the second speech signal; the determining unit is specifically configured to determine the corresponding target third-party application object according to the second target semantics.
  • The server further includes a generating unit: before the starting unit obtains, from the third-party application registry, the third-party application associated with the determined third-party application object and launches the third-party application, the generating unit generates the third-party application registry and sends it to the storage unit.
  • A semantic command of the first target semantics is sent to the third-party application, to cause the third-party application to invoke, according to the semantic command of the first target semantics, a third-party application interface (API) that matches the launched third-party application.
  • The third-party application converts the third voice signal to obtain a third text.
  • A third-party application interface (API) that matches the launched third-party application is invoked according to the semantic command of the third text, so that the launched third-party application obtains the required hardware or software resources through the API.
  • The receiving unit is further configured to receive the first text from the interaction device before the parsing of the first text to obtain the first target semantics.
  • The sending unit is further configured to send response feedback to the interaction device after the third-party application is launched.
  • A fourth aspect provides a device for voice recognition, comprising a processor and a memory: the processor parses one or more first texts to obtain first target semantics, the first text being converted from a first voice signal; determines, according to the acquired first target semantics, a third-party application object corresponding to the first target semantics, where the third-party application associated with the third-party application object is a program that is not authorized, when the device leaves the factory, to be launched by voice; and accesses the memory, obtains, from the third-party application registry stored in the memory, the third-party application associated with the determined third-party application object, and launches the third-party application. The memory stores the third-party application registry, the third-party application, and the instructions for the processor to perform the above method.
  • Before the processor parses the first text to obtain the first target semantics, the first speech signal is acquired and converted to obtain the one or more first texts.
  • The processor converts the first voice signal to obtain at least two first texts and, according to a predetermined scoring rule, scores the semantics of each of the at least two first texts according to their semantic naturalness and coherence.
  • The level of the score reflects the naturalness and coherence of the semantics.
  • The processor acquires, as the first target semantics, the semantics of the first text whose semantic score is the highest among the at least two first texts and exceeds a preset threshold.
  • The processor determines, according to the first target semantics, all third-party application objects corresponding to the first target semantics; if the number of third-party application objects is greater than or equal to two, all third-party application object information associated with the first target semantics is obtained; prompt information including the third-party application object information is issued, to prompt the user to issue a further selection instruction according to the third-party application object information.
  • Obtaining the third-party application associated with the determined third-party application object from the third-party application registry and launching the third-party application specifically includes: obtaining, from the third-party application registry, the third-party application associated with the target third-party application object and launching that third-party application.
  • The selection instruction received by the processor is a second voice signal; correspondingly, selecting the corresponding target third-party application object according to the received selection instruction includes: converting the second speech signal to obtain a second text; parsing the second text to obtain second target semantics; and determining the corresponding target third-party application object according to the second target semantics.
  • The processor generates the third-party application registry, and sends it to the memory, before obtaining the third-party application associated with the determined third-party application object from the third-party application registry and launching the third-party application.
  • The method further includes: sending a semantic command of the first target semantics to the third-party application, to enable the third-party application to invoke, according to the semantic command of the first target semantics, a third-party application interface (API) that matches the launched third-party application.
  • After the processor starts the third-party application, the third-party application receives a third voice signal from the user; converts the third voice signal to obtain a third text; and invokes, according to a semantic command of the third text, a third-party application interface (API) matching the launched third-party application, so that the launched third-party application obtains the required hardware or software resources through the API.
  • A system for speech recognition is provided, comprising any of the above-described interaction devices, servers, or devices.
  • With the voice recognition function, the third-party application corresponding to a user's voice signal can be determined through semantic analysis; the third-party application registry information is searched and the third-party program is started, so that the user does not need to click the third-party application to launch the corresponding program. This provides users with smarter services and greater convenience.
  • Figure 1 is a schematic flow chart of a speech recognition method according to Embodiment 1 of the present invention.
  • Figure 2 is a schematic flow chart of a speech recognition method according to Embodiment 2 of the present invention.
  • Figure 3 is a schematic diagram of the launched third-party application calling the hardware of the interaction device in an embodiment of the present invention.
  • Figure 4 is a schematic diagram of a third-party application supporting a voice recognition function according to Embodiment 3 of the present invention.
  • Figure 5 is a schematic block diagram of a speech recognition system according to Embodiment 4 of the present invention.
  • Figure 6 is a schematic flow chart of a speech recognition method according to Embodiment 5 of the present invention.
  • Figure 7 is a schematic block diagram of an interaction device for speech recognition according to Embodiment 7 of the present invention.
  • Figure 8 is a schematic block diagram of an interaction device for speech recognition according to Embodiment 8 of the present invention.
  • Figure 9 is a schematic block diagram of a server for speech recognition according to Embodiment 9 of the present invention.
  • Figure 10 is a schematic block diagram of a server for speech recognition according to Embodiment 10 of the present invention.
  • Figure 11 is a schematic block diagram of an apparatus for speech recognition according to Embodiment 11 of the present invention. Detailed Description
  • Speech recognition technology enables a machine, that is, a device, to convert a voice signal into corresponding text or commands through a process of recognition and understanding.
  • Speech recognition technology mainly includes three aspects: feature extraction technology, pattern matching criteria, and model training technology.
  • speech recognition technology generally outputs recognition results in the form of text or commands.
  • The entire speech recognition process takes a speech signal as input and, after processing, outputs text content or commands that drive other devices, thereby replacing the traditional mouse or keyboard input mode.
  • Interactive devices in the prior art may include, for example, smart phones, tablet computers, game machines, and the like.
  • The above interactive devices usually have a voice recognition function, and the speech-to-text accuracy of the voice recognition function is constantly improving.
  • In the prior art, however, interactive devices must be triggered by a touch or a mouse click to launch a third-party application.
  • Users want to improve the intelligence level of interactive devices, for example, by being able to launch a third-party application directly through voice recognition.
  • the embodiment of the present invention provides a method for directly starting a third-party application according to a user's voice signal.
  • Figure 1 is a schematic flow chart of a voice recognition method according to Embodiment 1 of the present invention, which includes the following content.
  • Steps S11 and S12 are also referred to as semantic analysis in what follows.
  • The third-party application may be downloaded onto the user's interactive device or hosted on a server; the voice recognition function for converting the voice signal into text may reside on the interactive device or on the server; and parsing the text to obtain the target semantics, determining the third-party application object corresponding to the target semantics, and launching the third-party application may be performed by the interactive device or by the server.
  • This server is used to provide support services for interactive devices, unlike the third-party servers, described below, that support third-party applications.
  • The embodiment of the present invention utilizes a voice recognition function: according to a user's voice signal, the third-party application corresponding to the voice signal can be determined through semantic analysis; the third-party application registry information is searched and the third-party program is started, so that the user does not need to click the third-party application to launch the corresponding program. This provides users with smarter services and greater convenience.
  • Before parsing the first text to obtain the first target semantics, the first speech signal is acquired and converted to obtain the one or more first texts.
  • Correspondingly, converting the first voice signal yields at least two first texts; according to a predetermined scoring rule, the semantics of each of the at least two first texts are scored according to their semantic naturalness and coherence, and the level of the score reflects the naturalness and coherence of the semantics; the semantics of the first text with the highest semantic score among the at least two first texts is obtained as the first target semantics.
  • Acquiring the semantics of the first text with the highest semantic score among the at least two first texts as the first target semantics includes: acquiring, as the first target semantics, the semantics of the first text whose score is the highest among the at least two first texts and exceeds a preset threshold.
  • Determining the third-party application object corresponding to the first target semantics according to the acquired first target semantics specifically includes: determining, according to the first target semantics, all third-party application objects corresponding to the first target semantics; if the number of third-party application objects is greater than or equal to two, acquiring all third-party application object information associated with the first target semantics; issuing prompt information that includes the third-party application object information, to prompt the user to issue a further selection instruction according to the third-party application object information, so as to select a target third-party application object from all third-party application objects associated with the first target semantics; and receiving a selection instruction and selecting the corresponding target third-party application object according to the received selection instruction. Correspondingly, obtaining the third-party application associated with the determined third-party application object from the third-party application registry and launching the third-party application specifically includes: obtaining, from the third-party application registry, the third-party application associated with the target third-party application object and launching that third-party application.
  • The selection instruction is a second voice signal.
  • Selecting the corresponding target third-party application object according to the received selection instruction specifically includes: converting the second voice signal to obtain a second text; parsing the second text to obtain second target semantics; and determining the corresponding target third-party application object according to the second target semantics.
  • The method further includes: generating the third-party application registry before obtaining the third-party application associated with the determined third-party application object from the third-party application registry and launching the third-party application.
  • The method further includes: sending a semantic command of the first target semantics to the third-party application, so that the third-party application invokes, according to the semantic command of the first target semantics, a third-party application interface (API) that matches the launched third-party application.
  • The method further includes: the third-party application converts the third voice signal to obtain a third text; a third-party application interface (API) that matches the launched third-party application is invoked according to the semantic command of the third text, so that the launched third-party application obtains the required hardware or software resources through the API.
  • The method is performed by an interaction device.
  • parsing the one or more first texts to obtain the first target semantics is performed by the server, while determining, according to the acquired first target semantics, the third-party application object corresponding to the first target semantics, and acquiring the third-party application associated with the determined third-party application object from the third-party application registry and launching it, are performed by the interaction device.
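The claimed pipeline (convert the voice signal to text, parse semantics, determine the application object, look it up in the registry, launch) can be sketched as follows. All function names, the example utterances, and the registry contents are illustrative assumptions for this sketch, not part of the patent text.

```python
# Hypothetical end-to-end sketch of the claimed speech-to-launch pipeline.

def speech_to_text(voice_signal):
    # Stand-in for the speech recognition unit: returns candidate first texts.
    return ["Old Hu online?", "Old Hu first?", "Tiger online?"]

def parse_semantics(first_texts):
    # Stand-in for the parsing unit: picks the first target semantics.
    return "online"

def determine_app_object(semantics):
    # Stand-in for the determining unit: maps semantics to an application object.
    return {"online": "Skype"}.get(semantics)

# Stand-in for the third-party application registry (registration information table).
APP_REGISTRY = {"Skype": "/apps/skype"}

def launch(app_object):
    # Stand-in for the starting unit: looks up the registry entry and launches it.
    return f"launched {APP_REGISTRY[app_object]}"

texts = speech_to_text(b"...waveform...")
semantics = parse_semantics(texts)
app_object = determine_app_object(semantics)
print(launch(app_object))  # launched /apps/skype
```

The split between interaction device and server described above only changes which side each function runs on; the data flow stays the same.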
  • Fig. 2 is a schematic flow chart of a speech recognition method according to a second embodiment of the present invention.
  • the interactive device executes the method, and the user downloads the third party application to the interactive device.
  • the interactive device includes at least a voice recognition unit, a parsing unit, a determining unit, a starting unit, and a storage unit.
  • the interaction device receives the first voice signal.
  • The first voice signal here is what the user says, for example "Old Hu online?"
  • The interactive device can receive the first voice signal through a microphone.
  • the interaction device converts the first voice signal to obtain one or more first texts.
  • After receiving the first voice signal, the voice recognition unit of the interactive device converts the voice waveform into text (speech to text); this text corresponds to the first text.
  • The first text output by the speech recognition unit may be "Old Hu online?", "Old Hu first?" and/or "Tiger online?", etc., which is sent to the parsing unit for processing.
  • the interaction device includes a voice recognition unit, a parsing unit, a determining unit, and a starting unit, and respectively performs specific steps in Embodiment 2 of the present invention.
  • Alternatively, the speech recognition unit on the one hand, and the parsing unit, determining unit, and starting unit on the other, may be included in different devices.
  • The interaction device scores the semantics of each of the at least two first texts according to a predetermined scoring rule, based on the naturalness and coherence of those semantics.
  • The parsing unit of the interactive device lists the corresponding candidate semantics according to the first text; for example, the phone book of the interactive device contains "Old Hu", and words such as "online" occurred with high probability in the user's previous voice input. The parsing unit sorts out various combinations, such as "Old Hu online?", "Old Hu first?" and "Tiger online?", and scores each combination according to the naturalness and coherence of its semantics; the level of the score reflects how natural and coherent the semantics are.
  • the parsing unit may set a scoring threshold and score the various combinations of semantics of the first text according to a predetermined scoring rule.
  • Task analysis is performed and the storage unit is accessed only when the semantics of a first text both scores highest and exceeds the predetermined threshold.
  • If the interactive device converts the first voice signal and obtains only one first text, the first target semantics is determined directly from the semantics of that first text; the scoring processes S23 and S24 may be omitted, and step S25 and subsequent steps are then performed.
  • The interaction device acquires, as the first target semantics, the semantics of the first text with the highest semantic score among the at least two first texts.
  • The parsing unit of the interactive device acquires only the semantics of the first text with the highest score, for example "Old Hu online"; then S25 is performed.
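The scoring step above can be sketched as follows. The concrete scoring rule (rewarding words found in the user's contact list and frequently used vocabulary) is an assumption for illustration; the patent only requires "a predetermined scoring rule" and an optional threshold.

```python
# Illustrative scoring of candidate first texts by semantic naturalness/coherence.

CONTACTS = {"old hu"}          # e.g. names in the device phone book (assumed)
FREQUENT_WORDS = {"online"}    # high-probability words from past input (assumed)
THRESHOLD = 1.0                # predetermined scoring threshold (assumed)

def score(text):
    # Higher score = more natural/coherent under this toy rule.
    t = text.lower().rstrip("?")
    s = sum(1.0 for name in CONTACTS if name in t)
    s += sum(1.0 for w in FREQUENT_WORDS if w in t)
    return s

def first_target_semantics(candidates):
    best = max(candidates, key=score)
    if score(best) > THRESHOLD:   # proceed only past the threshold (S24/S25)
        return best
    return None  # otherwise feed back a request for more information

cands = ["Old Hu online?", "Old Hu first?", "Tiger online?"]
print(first_target_semantics(cands))  # Old Hu online?
```

When no candidate clears the threshold, the `None` branch corresponds to the feedback loop described next, where the user is asked to supply more information.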
  • Otherwise, the parsing unit may feed back to the speech recognition unit a command requesting that more information be entered.
  • For example, according to semantic context analysis, "online" is closely related to both text communication and voice communication, so the parsing unit feeds back commands to the speech recognition unit such as "Do you want voice online communication or text online communication?"
  • The voice recognition unit can present the command to the user by voice or text, letting the user enter further requirements. For example, if the user answers "voice", the "voice" requirement is added, and the parsing unit uses "voice" and "online" together as the first target semantics.
  • the interaction device determines, according to the first target semantic, a third-party application object corresponding to the first target semantic.
  • The determining unit of the interaction device determines the related task according to the selected first target semantics; for example, the task corresponding to "online" is a social task, and the starting unit needs to start the corresponding third-party application, such as QQ and/or Skype.
  • the candidate third-party application is referred to as a third-party application object.
  • The determining unit may select the third-party application object from the first target semantics based on a predefined association between semantics and third-party applications, or based on the frequency of use of third-party applications, which is not limited in the present invention.
  • only a third party application object corresponding to the first target semantics is determined based on the first target semantics, for example, Skype instant messaging software.
  • The interaction device may further include a sending unit and a receiving unit. If the number of determined third-party application objects is greater than or equal to two, information on all third-party application objects associated with the first target semantics is acquired; the sending unit sends prompt information that includes this third-party application object information, prompting the user to issue a further selection instruction, so that the determining unit can select the target third-party application object from all third-party application objects associated with the first target semantics. The receiving unit receives the selection instruction, and the determining unit selects the corresponding target third-party application object according to it; correspondingly, the starting unit acquires the third-party application associated with the target third-party application object and starts it.
  • For example, the determining unit can feed back to the voice recognition unit a command requesting more information according to the context, such as "Do you want Skype instant messaging or QQ social communication?"
  • The speech recognition unit can present the command to the user by voice or text, letting the user input more requirements. If the user answers "Skype", the parsing unit uses "Skype" as the second target semantics, and the determining unit then determines the target third-party application object to be the Skype instant messaging software according to the second target semantics.
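The disambiguation flow above, where two or more application objects match and the user's answer ("Skype") is resolved into a target object, can be sketched like this. The semantics-to-application mapping and message wording are assumptions.

```python
# Sketch of selecting a target third-party application object when several match.

SEMANTICS_TO_APPS = {"online": ["QQ", "Skype"]}  # assumed predefined association

def candidates_for(semantics):
    # Determining unit: all application objects associated with the semantics.
    return SEMANTICS_TO_APPS.get(semantics, [])

def prompt(apps):
    # Sending unit: prompt information surfaced to the user by voice or text.
    return "Do you want " + " or ".join(apps) + "?"

def resolve_selection(second_target_semantics, apps):
    # Determining unit: match the parsed second target semantics (from the
    # second voice signal) against the candidate application objects.
    for app in apps:
        if app.lower() == second_target_semantics.lower():
            return app
    return None

apps = candidates_for("online")
print(prompt(apps))                      # Do you want QQ or Skype?
print(resolve_selection("skype", apps))  # Skype
```

When `candidates_for` returns a single object, the prompt step is skipped and that object is used directly, matching the single-object case described earlier.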
  • the interaction device obtains a third-party application associated with the determined third-party application object from a third-party application registry and starts the third-party application.
  • The starting unit of the interactive device looks up the determined third-party application in the third-party application registration information table and launches it.
  • After the starting unit determines the specific third-party application, it accesses the storage unit.
  • The storage unit stores a third-party application registration information table generated when the interactive device is powered on or when the programs were installed; the corresponding program, such as Skype, is found and started.
  • The third-party application registration information table can be generated when the third-party application is installed, or when the device is powered on after installation, as long as the third-party application associated with the determined third-party application object can be obtained from the third-party application registry and started.
  • the third-party application registration information table is stored in the interaction device.
  • The third-party application registration information table may also be stored on devices other than the interaction device, or on both the interaction device and other devices.
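A minimal model of the registration information table described above: entries are registered when applications are installed (or while scanning installed applications at power-on), and the starting unit consults the table to launch. The class shape and entry values are made-up examples.

```python
# Toy third-party application registration information table.

class AppRegistry:
    def __init__(self):
        self._table = {}

    def register(self, app_object, entry_point):
        # Called at install time, or at power-on while scanning installed apps.
        self._table[app_object] = entry_point

    def lookup(self, app_object):
        # Used by the starting unit to find the determined application.
        return self._table.get(app_object)

registry = AppRegistry()
registry.register("Skype", "com.skype.Main")
registry.register("QQ", "com.tencent.qq.Main")
print(registry.lookup("Skype"))  # com.skype.Main
```

Storing the table on the device, on a server, or on both only changes where `AppRegistry` lives; the lookup contract is the same.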
  • FIG. 3 is a schematic diagram of calling the hardware of the interaction device after the third-party application is started in the embodiment of the present invention.
  • After launching the third-party application 32, the interactive device 31 communicates with it via the defined API 33.
  • For example, a third-party application that needs to activate video and voice chat functions does so through the video API, audio API, and other interfaces predefined by the interactive device operating system 34, calling the camera 35 and microphone 36 of the interactive device.
  • Thousands of APIs are defined in current interactive device operating systems to support the various functions of third-party applications. Through these APIs, third-party applications can communicate with the interactive device platform and invoke the interactive device's various resources.
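The idea that a third-party application reaches device hardware only through APIs predefined by the operating system can be sketched as follows. The API names ("video", "audio") and the return values are assumptions for illustration.

```python
# Sketch: hardware resources reachable only through OS-predefined APIs.

class OperatingSystem:
    def __init__(self):
        # APIs the operating system predefines for third-party applications.
        self._apis = {
            "video": lambda: "camera stream",      # e.g. calls the camera
            "audio": lambda: "microphone stream",  # e.g. calls the microphone
        }

    def call(self, api_name):
        if api_name not in self._apis:
            raise KeyError(f"no such predefined API: {api_name}")
        return self._apis[api_name]()

os_layer = OperatingSystem()
# A chat application activating video and voice chat through the defined APIs:
print(os_layer.call("video"))  # camera stream
print(os_layer.call("audio"))  # microphone stream
```

An undeclared resource cannot be reached: requesting an API the OS never defined simply fails, which is the isolation property the paragraph describes.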
  • The launched third-party application receives the semantic command from the parsing unit and completes the corresponding operation according to that command.
  • For example, the parsing unit sends a semantic command to QQ (the command may also be delivered through a predefined API); after receiving the command, QQ finds the friend named "Old Hu" in the QQ buddy list.
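A toy version of delivering the first target semantics as a semantic command to the launched application, here the QQ example of finding the buddy "Old Hu". The command format and buddy list are assumptions, not a documented QQ interface.

```python
# Sketch: a launched application handling a semantic command.

BUDDY_LIST = ["Old Hu", "Xiao Li"]  # assumed contents of the buddy list

def handle_semantic_command(command):
    # The application receives the command (possibly over a predefined API)
    # and completes the corresponding operation.
    if command["action"] == "find_buddy":
        name = command["target"]
        return name if name in BUDDY_LIST else None
    return None

print(handle_semantic_command({"action": "find_buddy", "target": "Old Hu"}))
```

Any authentication step (account and password, mentioned below) would happen before commands like this are accepted.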
  • The third-party application may involve user authentication and require a user account and password; such information requests are fed back to the user by the third-party application through the parsing unit and the speech recognition unit, and the user enters the corresponding account and password by voice or keyboard.
  • The user only needs to issue a command by voice; the interactive device recognizes the voice, performs semantic analysis, and launches the third-party application, and after launching it can further feed back to the user status information related to the third-party application. This provides a new way to launch third-party applications and greatly improves convenience for the user.
  • The textual information exchanged between the respective units, such as the voice recognition unit and the parsing unit, may be decoded, and the form presented to the user may be played voice, on-screen text, or the like.
  • The embodiment of the present invention utilizes a voice recognition function: according to a user's voice signal, the third-party application corresponding to the voice signal can be determined through semantic analysis, found in the third-party application registry information, and started. The user no longer needs to click on a third-party application to launch the corresponding program, which provides smarter services and is more convenient for users.
  • FIG. 4 is a schematic structural diagram of a third-party application supporting a voice recognition function according to Embodiment 3 of the present invention.
  • the interaction device has a voice recognition function, and the user downloads the third party application to the interaction device.
  • In Embodiment 3, the third-party application itself also supports the voice recognition function and can perform related operations according to the user's voice input.
  • The process of identifying the third-party application corresponding to the user's voice signal is the same in Embodiment 3 as in Embodiment 2 and is not described again here. The difference lies in the execution process of the third-party application in Embodiment 3 after it is started.
  • the third-party application 41 can directly receive the user's voice signal 42 and obtain a corresponding command according to the recognized voice signal 42 to perform related operations.
  • The third-party application receives the user's voice signal 42, converts it to text, completes semantic parsing of the text, and performs the semantically corresponding task.
  • Alternatively, the third-party application 41 can place semantic analysis and task execution on its corresponding third-party server 43, and contact the interaction device through the API 46 defined by the operating system 45 of the interaction device to start related resources, such as calling the camera 47 and the microphone 48 to perform video communication.
  • the communication between the third-party application and the third-party server may be a private interface and protocol defined by the third party.
  • For example, if the third-party application is a map application, it needs to invoke the GPS information of the interactive device and can query the address information of the interactive device through the corresponding API; the specific map information is obtained by the third-party application accessing its own third-party server.
  • FIG. 5 is a schematic block diagram of a speech recognition system 50 of Embodiment 4 of the present invention.
  • System 50 includes an interaction device 51 and a server 52.
  • the interactive device 51 may include a voice recognition unit 511, a storage unit 512, an operating system 513, a camera 514, a microphone 515, and the like.
  • The server 52 may include third-party applications 522, a storage unit 523, and a parsing unit, a determining unit, and a starting unit, which are collectively indicated at 521 in FIG. 5.
  • the interactive device 51 is connected to the server 52 via the network 53.
  • Operating system 513 defines a plurality of APIs 516 for supporting various third party applications 522.
  • In this embodiment, the interactive device only has the speech recognition function; semantic analysis and the launching of third-party applications are completed by the server, and the third-party applications are stored on the server.
  • these third-party applications do not support speech recognition.
  • The storage unit may be included in the interaction device or the server, or in both, in which case the starting unit may access either storage unit. The interaction device is connected to the server through the network, which provides the communication channel between the two; the network can be wireless or wired.
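The device-server split in this embodiment, where the device performs only speech recognition and sends first texts to the server, which parses, determines the application object, and starts the application, can be sketched as two message-passing halves. The message shapes and example values are assumptions.

```python
# Sketch of the Embodiment-4 split between interactive device and server.

def device_side(voice_signal):
    # Speech recognition unit on the device: voice signal -> first texts.
    first_texts = ["Old Hu online?"]  # assumed recognition result
    return {"first_texts": first_texts}  # request sent over the network

def server_side(request):
    # Parsing / determining / starting units on the server.
    semantics = None
    if any("online" in t.lower() for t in request["first_texts"]):
        semantics = "online"
    app = "Skype" if semantics == "online" else None
    # Response feedback returned to the interaction device.
    return {"launched": app, "feedback": f"started {app}"}

response = server_side(device_side(b"...waveform..."))
print(response["feedback"])  # started Skype
```

Whether the transport is a wireless or wired network does not change this request/response shape; only the channel underneath differs.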
  • Fig. 6 is a schematic flow chart of the speech recognition method in the fifth embodiment of the present invention.
  • the interaction device converts the first voice signal to obtain one or more first texts, and sends the first text to the server.
  • the first voice signal here is what the user said, for example, "Old Hu Online?”
  • The voice recognition unit of the interaction device converts the voice waveform into the first text; for example, the first text output by the voice recognition unit may be "Old Hu online?", "Old Hu first?" and/or "Tiger online?", etc., which is sent to the server for processing.
  • the server parses one or more first texts to obtain a first target semantic. This step can include the following specific content.
  • the server includes a parsing unit, a determining unit, and a starting unit.
  • The parsing unit of the server receives the at least two first texts obtained by converting the first speech signal sent by the interaction device and, according to a predetermined scoring rule, scores the semantics of each first text based on the naturalness and coherence of those semantics; the level of the score reflects the naturalness and coherence of the semantics. The semantics of the first text with the highest score is acquired as the first target semantics.
  • The parsing unit lists the corresponding candidate semantics according to the first text; for example, the phone book of the interactive device contains "Old Hu", and "online" occurred with high probability in the user's previous voice input. The parsing unit sorts out various combinations, such as "Old Hu online?", "Old Hu first?" and "Tiger online?", and scores each combination according to the naturalness and coherence of its semantics; the level of the score reflects the naturalness and coherence of the semantics.
  • Alternatively, the parsing unit may set a scoring threshold, score the various candidate semantics of the first text according to a predetermined scoring rule, and use as the first target semantics the semantics of the first text that both scores highest among the at least two first texts and exceeds the preset threshold.
  • the parsing unit takes the semantics of the first text as the first target semantics.
  • Otherwise, the parsing unit may feed back, through the speech recognition unit of the interaction device, a command requesting that more information be input. As in the above example, if the score of the semantics "online" does not satisfy the scoring threshold, the parsing unit analyzes the context: "online" is closely related to both text communication and voice communication, so it feeds back commands to the speech recognition unit such as "Do you want voice online communication or text online communication?" The voice recognition unit can present the command to the user by voice or text, letting the user input more requirements; for example, if the user answers "voice", the "voice" requirement is added, and the parsing unit uses "voice" and "online" as the first target semantics.
  • When the interactive device converts the first voice signal, obtains only one first text, and sends it to the server, the server directly determines the first target semantics from the semantics of that first text, and then performs step S63 and subsequent steps.
  • the server determines, according to the first target semantic, a third-party application object corresponding to the first target semantic.
  • the determining unit determines related tasks according to the selected first target semantics, for example, the task corresponding to “online” is a social task, and the startup unit needs to start a corresponding third-party application, such as QQ and/or Skype.
  • the candidate third-party application is referred to as a third-party application object.
  • The determining unit may select the third-party application object from the first target semantics based on a predefined association between semantics and third-party applications, or based on the frequency of use of third-party applications, which is not limited in the present invention.
  • only a third party application object corresponding to the first target semantics is determined according to the first target semantics, for example, Skype instant messaging software.
  • The server may further include a sending unit and a receiving unit. If the number of determined third-party application objects is greater than or equal to two, information on all third-party application objects associated with the first target semantics is acquired; the sending unit sends prompt information that includes this third-party application object information, prompting the user to issue a further selection instruction, so that the determining unit can select the target third-party application object from all third-party application objects associated with the first target semantics. The receiving unit receives the selection instruction, and the determining unit selects the corresponding target third-party application object according to it; correspondingly, the starting unit acquires the third-party application associated with the target third-party application object and starts it.
  • For example, the determining unit can feed back to the voice recognition unit a command requesting more information, such as "Do you want Skype instant messaging or QQ social communication?"
  • the speech recognition unit can feed the command to the user in a voice or text manner, allowing the user to input more requests. For example, the user answers "Skype", whereby the parsing unit uses "Skype" as the second target semantic.
  • the determining unit determines the target third party application object as the Skype instant messaging software according to the second target semantics.
  • the server obtains a third-party application associated with the determined third-party application object from a third-party application registry, starts the third-party application, and sends a response feedback to the interaction device.
  • The server's starting unit can access either the storage unit of the server or the storage unit of the interactive device, and finds and launches the determined third-party application according to the stored third-party application registration information table.
  • After the starting unit determines the specific third-party application, it accesses the storage unit.
  • the storage unit of the interactive device stores a third-party application registration information table
  • the storage unit of the server stores a third-party application and a third-party application registration information table.
  • the third-party application registration information table can be generated when the third-party application is installed, or can be generated when the device is powered on after installation.
  • the startup unit of the server needs to access the storage unit of the interaction device to acquire the third-party application and start the third-party application.
  • the interaction device receives response feedback sent by the server.
  • The sending unit sends a semantic command of the first target semantics to the third-party application, so that the third-party application, according to the semantic command of the first target semantics, invokes the application programming interface (API) that matches the launched third-party application.
  • The launched third-party application receives the semantic command from the parsing unit and completes the corresponding operation according to that command.
  • For example, the parsing unit sends a semantic command to QQ (the command may also be delivered through a predefined API); after receiving the command, QQ finds the friend named "Old Hu" in the QQ buddy list.
  • The third-party application may involve user authentication and require input of a user account and password; such information requests are fed back to the user by the third-party application through the sending unit of the server and the voice recognition unit of the interactive device, and the user enters the corresponding account and password by voice or keyboard.
  • The textual information exchanged between the respective units, such as the voice recognition unit and the parsing unit, may be decoded, and the form presented to the user may be played voice, on-screen text, or the like.
  • The embodiment of the present invention utilizes a voice recognition function: according to a user's voice signal, the third-party application corresponding to the voice signal can be determined through semantic analysis, found in the third-party application registry information, and started. The user no longer needs to click on a third-party application to launch the corresponding program, which provides smarter services and is more convenient for users.
  • Embodiment 6 differs from Embodiments 4 and 5 in that the third-party application itself also supports the voice recognition function and can perform related operations according to the user's voice input.
  • The process of recognizing the corresponding third-party application according to the user's voice signal in Embodiment 6 is the same as in the foregoing embodiments (see Embodiment 5) and is not described again here.
  • the execution process of the third-party application to the task in Embodiment 6 is different after the third-party application is started.
  • After the third-party application is launched, it can directly receive the user's voice signal thanks to its own voice recognition function and perform related operations according to the command.
  • the third-party application receives the user's voice signal, converts the text, completes the semantic parsing of the text, and performs semantic-corresponding tasks.
  • Alternatively, third-party applications may place semantic analysis and task execution on their corresponding third-party servers, for example Tencent's servers; the communication between third-party applications and third-party servers can use private interfaces and protocols defined by the third parties themselves.
  • For example, if the third-party application is a map application, it needs to invoke the GPS information of the interactive device and can query the address information of the interactive device through the corresponding API; the specific map information is obtained by the third-party application accessing its third-party server.
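The map example above, where the application reads the device position through an API and then fetches map data from its own third-party server, can be sketched as follows. The coordinates and the server call are invented for illustration.

```python
# Sketch: GPS via a device API, map data via the application's own server.

def gps_api():
    # Stand-in for the operating system's predefined API that returns
    # the interactive device's position (latitude, longitude).
    return (39.9042, 116.4074)

def third_party_map_server(position):
    # Stand-in for the application's own server, reached over a private
    # interface/protocol defined by the third party itself.
    lat, lon = position
    return f"map tile around ({lat}, {lon})"

position = gps_api()
print(third_party_map_server(position))  # map tile around (39.9042, 116.4074)
```

The split mirrors the text: device-local state comes through the OS API; domain data (map tiles) comes from the third party's own backend.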
  • FIG. 7 is a schematic block diagram of an interactive device 70 for speech recognition in accordance with a seventh embodiment of the present invention.
  • The interactive device 70 may include a parsing unit 71, a determining unit 72, a starting unit 73, a storage unit 74, and the like.
  • the interactive device can be a device such as a mobile terminal or a tablet PC.
  • Parsing unit 71 parses one or more first texts to obtain first target semantics, the first text being converted from the first speech signal.
  • The determining unit 72 determines, according to the first target semantics acquired by the parsing unit, the third-party application object corresponding to the first target semantics, where the third-party application associated with the third-party application object is one that is not preinstalled or authorized on the device when it leaves the factory.
  • the initiating unit 73 accesses the storage unit, acquires a third-party application associated with the determined third-party application object from the third-party application registration table stored in the storage unit, and starts the third-party application.
  • the storage unit 74 stores a third party application registry and the third party application.
  • the interaction device 70 can perform the voice recognition method in Embodiment 1 of the present invention, and details are not described herein again.
  • The embodiment of the present invention utilizes a voice recognition function: according to a user's voice signal, the third-party application corresponding to the voice signal can be determined through semantic analysis, found in the third-party application registry information, and started. The user no longer needs to click on a third-party application to launch the corresponding program, which provides smarter services and is more convenient for users.
  • FIG. 8 is a schematic block diagram of an interactive device 80 for voice recognition according to Embodiment 8 of the present invention.
  • The interactive device 80 includes a voice recognition unit 81, a parsing unit 82, a determining unit 83, a starting unit 84, a storage unit 85, a generating unit 86, a transmitting unit 87, and a receiving unit 88.
  • The parsing unit 82, determining unit 83, starting unit 84, and storage unit 85 of the interaction device 80 are the same as or similar to the parsing unit 71, determining unit 72, starting unit 73, and storage unit 74 of the interaction device 70, except that the interaction device 80 further includes the voice recognition unit 81, the generating unit 86, the transmitting unit 87, and the receiving unit 88.
  • the speech recognition unit 81 converts the first speech signal to obtain one or more first texts, and transmits the same to the parsing unit 82.
  • The generating unit 86 generates the third-party application registry and sends it to the storage unit 85 before the interaction device acquires the third-party application associated with the determined third-party application object from the third-party application registry and starts it.
  • Specifically, the first speech signal is converted to obtain at least two first texts; the parsing unit 82 scores, according to a predetermined scoring rule, the semantics of each of the at least two first texts based on their naturalness and coherence, the level of the score reflecting that naturalness and coherence, and acquires as the first target semantics the semantics of the first text with the highest semantic score among the at least two first texts.
  • Alternatively, the parsing unit 82 acquires as the first target semantics the semantics of the first text whose semantic score is the highest among the at least two first texts and exceeds a preset threshold.
  • The determining unit 83 determines, according to the first target semantics, all third-party application objects corresponding to the first target semantics; if there are two or more, information on all third-party application objects associated with the first target semantics is acquired. The sending unit 87 sends prompt information that includes the third-party application object information, prompting the user to issue a further selection instruction, so that the determining unit 83 can select the target third-party application object from all third-party application objects associated with the first target semantics. The receiving unit 88 receives the selection instruction, and the determining unit 83 selects the corresponding target third-party application object according to it; the starting unit then acquires the third-party application associated with the target third-party application object from the third-party application registry stored by the storage unit and starts it.
  • When the selection instruction is a second speech signal, the speech recognition unit 81 further converts the second speech signal to obtain a second text and sends it to the parsing unit 82; the parsing unit 82 parses the second text to acquire the second target semantics; and the determining unit 83 determines the corresponding target third-party application object according to the second target semantics.
  • After launching the third-party application, the startup unit 84 sends a semantic command of the first target semantics to the third-party application, so that the third-party application invokes, according to the semantic command of the first target semantics, a third-party application programming interface (API) matching the launched third-party application.
  • The interaction device 80 can implement the speech recognition method of Embodiment 1, 2, or 3 of the present invention; the specific details are not described herein again.
  • By using the speech recognition function, this embodiment of the present invention can determine, through semantic analysis of a user's speech signal, the third-party application corresponding to the speech signal, look up the third-party application registry information, and launch the third-party program, so that the user does not need to launch the corresponding program by tapping the third-party application; this provides the user with a smarter service and is convenient for the user.
  • FIG. 9 is a schematic block diagram of a server 90 for speech recognition in accordance with a ninth embodiment of the present invention.
  • The server 90 includes a parsing unit 91, a determining unit 92, a startup unit 93, and a storage unit 94.
  • The server 90 provides support services for an interaction device.
  • The parsing unit 91 parses one or more first texts to obtain first target semantics, where the first text is obtained by converting a first speech signal.
  • The determining unit 92 determines, according to the first target semantics acquired by the parsing unit, a third-party application object corresponding to the first target semantics, where the third-party application associated with the third-party application object is a program that is not authorized, when the device leaves the factory, to be launched by voice.
  • The startup unit 93 acquires, from the third-party application registry stored in the storage unit 94, the third-party application associated with the determined third-party application object and launches the third-party application.
  • The storage unit 94 stores the third-party application registry and the third-party application.
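The startup unit's registry lookup and launch step can be sketched as follows; the registry layout (application object name mapped to a program path) and the install paths are assumptions, since the patent only requires that the registry associate application objects with their programs.

```python
# Minimal sketch of the third-party application registry kept by the
# storage unit and the startup unit's lookup-and-launch step. The
# name -> executable-path layout and the paths are assumptions.
import subprocess

registry = {
    "Skype": "/opt/apps/skype",  # hypothetical install paths
    "QQ": "/opt/apps/qq",
}

def launch(app_object, registry, runner=subprocess.Popen):
    """Look up the application associated with the object and start it."""
    program = registry.get(app_object)
    if program is None:
        raise KeyError(f"{app_object} not found in third-party registry")
    return runner([program])
```

Passing a `runner` makes the launch mechanism replaceable; on a real device the startup unit would hand the program to the operating system instead.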
  • The server 90 can implement the speech recognition method of Embodiment 1 of the present invention; the details are not described herein again.
  • By using the speech recognition function, this embodiment of the present invention can determine, through semantic analysis of a user's speech signal, the third-party application corresponding to the speech signal, look up the third-party application registry information, and launch the third-party program, so that the user does not need to launch the corresponding program by tapping the third-party application; this provides the user with a smarter service and is convenient for the user.
  • FIG. 10 is a schematic block diagram of a server 100 for speech recognition according to a tenth embodiment of the present invention.
  • The server 100 includes a parsing unit 101, a determining unit 102, a startup unit 103, a storage unit 104, a generating unit 105, a sending unit 106, and a receiving unit 107.
  • The parsing unit 101, the determining unit 102, the startup unit 103, and the storage unit 104 of the server 100 are the same as or similar to the parsing unit 91, the determining unit 92, the startup unit 93, and the storage unit 94 of the server 90.
  • The difference is that the server 100 further includes the generating unit 105, the sending unit 106, and the receiving unit 107.
  • The generating unit 105 generates the third-party application registry and sends it to the storage unit 104 before the server acquires, from the third-party application registry, the third-party application associated with the determined third-party application object and launches the third-party application.
  • The parsing unit 101 specifically scores, according to a predetermined scoring rule, the semantics of each of at least two first texts obtained by converting the first speech signal, based on the semantic naturalness and coherence of the at least two first texts, the level of each score reflecting the naturalness and coherence of the semantics; and acquires, as the first target semantics, the semantics of the first text with the highest semantic score among the at least two first texts.
  • The parsing unit 101 may specifically acquire, as the first target semantics, the semantics of the first text whose semantic score is the highest among the at least two first texts and exceeds a preset threshold.
  • The determining unit 102 further determines, according to the first target semantics, all third-party application objects corresponding to the first target semantics; if there are two or more third-party application objects, it acquires information about all third-party application objects associated with the first target semantics, and the sending unit 106 sends prompt information so that the determining unit 102 can select a target third-party application object from all the third-party application objects associated with the first target semantics. The receiving unit 107 receives a selection instruction, and the determining unit 102 selects the corresponding target third-party application object according to the received selection instruction. The startup unit 103 then acquires, from the third-party application registry stored in the storage unit 104, the third-party application associated with the target third-party application object and launches the third-party application.
  • When the selection instruction is a second speech signal, the parsing unit 101 further parses a second text to acquire second target semantics, where the second text is obtained by converting the second speech signal; the determining unit 102 determines the corresponding target third-party application object according to the second target semantics.
  • After launching the third-party application, the startup unit 103 sends a semantic command of the first target semantics to the third-party application, so that the third-party application invokes, according to the semantic command of the first target semantics, a third-party application programming interface (API) matching the launched third-party application.
  • After the startup unit 103 launches the third-party application, the third-party application converts a third speech signal to obtain a third text, and invokes, according to a semantic command of the third text, a third-party API matching the launched third-party application, so that the launched third-party application obtains required hardware or software resources through the API.
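The API invocation described above, in which a semantic command derived from the user's speech selects a matching API through which the launched application obtains resources, can be sketched as follows; the command vocabulary and resource functions are illustrative assumptions, not the actual operating-system APIs.

```python
# Sketch of how a launched third-party application might map a semantic
# command (e.g. derived from the third text) onto one of its APIs,
# through which it obtains hardware/software resources. The commands
# and resource functions are illustrative assumptions.

def open_camera():
    return "camera stream"

def open_microphone():
    return "audio stream"

# Each semantic command maps to the APIs that supply its resources.
API_TABLE = {
    "video chat": [open_camera, open_microphone],
    "voice chat": [open_microphone],
}

def dispatch(semantic_command):
    """Invoke every API matching the semantic command; return resources."""
    apis = API_TABLE.get(semantic_command, [])
    return [api() for api in apis]
```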
  • The receiving unit 107 also receives the first text sent by the interaction device before the first text is parsed to obtain the first target semantics.
  • The sending unit 106 sends response feedback to the interaction device after the third-party application is launched.
  • The server 100 can implement the speech recognition method of Embodiment 1, 5, or 6 of the present invention; the specific details are not described herein again.
  • By using the speech recognition function, this embodiment of the present invention can determine, through semantic analysis of a user's speech signal, the third-party application corresponding to the speech signal, look up the third-party application registry information, and launch the third-party program, so that the user does not need to launch the corresponding program by tapping the third-party application; this provides the user with a smarter service and is convenient for the user.
  • FIG. 11 is a schematic block diagram of a device 110 for speech recognition according to an eleventh embodiment of the present invention.
  • The device 110 includes a processor 111, a memory 112, and the like.
  • The memory 112 may be a RAM or a ROM, or any fixed or removable storage medium, and is configured to store programs that can execute the embodiments of the present invention, data to be processed in the embodiments of the present invention, third-party applications, and the like.
  • The processor 111 communicates bidirectionally with other devices by means of a bus.
  • The processor 111 parses one or more first texts to obtain first target semantics, where the first text is obtained by converting a first speech signal; determines, according to the acquired first target semantics, a third-party application object corresponding to the first target semantics, where the third-party application associated with the third-party application object is a program that is not authorized, when the device leaves the factory, to be launched by voice; and accesses the memory, acquires, from the third-party application registry stored in the memory, the third-party application associated with the determined third-party application object, and launches the third-party application.
  • The memory 112 stores the third-party application registry and the third-party application, and the processor 111 executes the instructions of the foregoing method.
  • The processor 111 converts the first speech signal, acquired by a microphone or the like, to obtain the one or more first texts.
  • The processor 111 specifically converts the first speech signal to obtain at least two first texts; scores, according to a predetermined scoring rule, the semantics of each of the at least two first texts based on their semantic naturalness and coherence, the level of each score reflecting the naturalness and coherence of the semantics; and acquires, as the first target semantics, the semantics of the first text with the highest semantic score among the at least two first texts.
  • The processor 111 may specifically acquire, as the first target semantics, the semantics of the first text whose semantic score is the highest among the at least two first texts and exceeds a preset threshold.
  • The processor 111 further determines, according to the first target semantics, all third-party application objects corresponding to the first target semantics; if there are two or more third-party application objects, it acquires information about all third-party application objects associated with the first target semantics; issues prompt information, which includes the third-party application object information, to prompt the user to issue a further selection instruction according to the third-party application object information, so as to select a target third-party application object from the third-party application objects associated with the first target semantics; receives the selection instruction and selects the corresponding target third-party application object according to the received selection instruction; and acquires, from the third-party application registry stored in the memory 112, the third-party application associated with the target third-party application object and launches the third-party application.
  • When the selection instruction is a second speech signal, the processor 111 further converts the second speech signal to obtain a second text, parses the second text to acquire second target semantics, and determines the corresponding target third-party application object according to the second target semantics.
  • The processor 111 generates the third-party application registry before acquiring, from the third-party application registry, the third-party application associated with the determined third-party application object and launching the third-party application.
  • After launching the third-party application, the processor 111 sends a semantic command of the first target semantics to the third-party application, so that the third-party application invokes, according to the semantic command of the first target semantics, a third-party application programming interface (API) matching the launched third-party application.
  • The processor 111 generates the third-party application registry and sends it to the memory 112 before the device 110 acquires, from the third-party application registry, the third-party application associated with the determined third-party application object and launches the third-party application.
  • When the device 110 does not include a component, such as a microphone, that collects speech signals, the processor 111 receives, before parsing the first text to obtain the first target semantics, the first text sent by another device; parses the first text to obtain the first target semantics, where, optionally, at least two first texts are obtained by converting the first speech signal; determines, according to the acquired first target semantics, the third-party application object corresponding to the first target semantics; and acquires, from the third-party application registry stored in the memory 112, the third-party application associated with the determined third-party application object and launches the third-party application.
  • The memory 112 stores the third-party application registry and the third-party application, and the processor 111 executes the instructions of the foregoing method.
  • The processor 111 generates the third-party application registry and sends it to the memory 112 when the device 110 installs a third-party application or when the device 110 is powered on.
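Generating the registry at installation time or at power-on, as described above, might look like the following sketch; the install-directory layout and the per-application entry point are assumptions, since the embodiment does not define the registry's concrete format.

```python
# Sketch of generating the third-party application registry when a
# third-party application is installed or at power-on: scan an install
# directory and record application name -> program path. The directory
# layout and the "main" entry point are assumed for illustration.
import os

def generate_registry(install_dir):
    """Build {application name: program path} from installed packages."""
    registry = {}
    for name in sorted(os.listdir(install_dir)):
        path = os.path.join(install_dir, name)
        if os.path.isdir(path):
            registry[name] = os.path.join(path, "main")  # assumed entry point
    return registry
```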
  • The processor 111 specifically scores, according to a predetermined scoring rule, the semantics of each of the at least two first texts obtained by converting the first speech signal, based on the semantic naturalness and coherence of the at least two first texts, the level of each score reflecting the naturalness and coherence of the semantics; and acquires, as the first target semantics, the semantics of the first text with the highest semantic score among the at least two first texts.
  • The processor 111 may specifically acquire, as the first target semantics, the semantics of the first text whose semantic score is the highest among the at least two first texts and exceeds a preset threshold.
  • The processor 111 further determines, according to the first target semantics, all third-party application objects corresponding to the first target semantics; if there are two or more third-party application objects, it acquires information about all third-party application objects associated with the first target semantics; issues prompt information, which includes the third-party application object information, to prompt the user to issue a further selection instruction according to the third-party application object information, so as to select a target third-party application object from the third-party application objects associated with the first target semantics; receives the selection instruction and selects the corresponding target third-party application object according to the received selection instruction; and acquires, from the third-party application registry stored in the memory 112, the third-party application associated with the target third-party application object and launches the third-party application.
  • When the selection instruction is a second speech signal, the processor 111 further parses a second text to acquire second target semantics, where the second text is obtained by converting the second speech signal, and determines the corresponding target third-party application object according to the second target semantics.
  • After launching the third-party application, the processor 111 further sends a semantic command of the first target semantics to the third-party application, so that the third-party application invokes, according to the semantic command of the first target semantics, a third-party application programming interface (API) matching the launched third-party application.
  • The processor 111 causes the third-party application to convert a third speech signal to obtain a third text, and to invoke, according to a semantic command of the third text, a third-party API matching the launched third-party application, so that the launched third-party application obtains required hardware or software resources through the API.
  • The processor 111 sends response feedback to the other device after the third-party application is launched.
  • The device 110 can implement the speech recognition method of Embodiment 1, 2, 3, 5, or 6 of the present invention.
  • The specific implementation may be, for example, a mobile terminal, a tablet computer, or a server.
  • By using the speech recognition function, this embodiment of the present invention can determine, through semantic analysis of a user's speech signal, the third-party application corresponding to the speech signal, look up the third-party application registry information, and launch the third-party program, so that the user does not need to launch the corresponding program by tapping the third-party application; this provides the user with a smarter service and is convenient for the user.
  • An embodiment of the present invention further provides a speech recognition system, which may include any one of the device 110, the interaction device 70, the interaction device 80, the server 90, and the server 100 in the foregoing embodiments. For brevity, the details are not described here again.
  • The disclosed systems, devices, and methods may be implemented in other ways.
  • The device embodiments described above are merely illustrative.
  • The division of the units is only a logical functional division; in actual implementation there may be another division manner. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed.
  • The coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
  • The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • The functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • The functions, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium.
  • Based on such an understanding, the technical solutions of the present invention essentially, or the part contributing to the prior art, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention.
  • The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.


Abstract

A speech recognition method, device, and system are provided. The speech recognition method includes: parsing one or more first texts to obtain first target semantics, where the first text is obtained by converting a first speech signal (S11); determining, according to the acquired first target semantics, a third-party application object corresponding to the first target semantics, where a third-party application associated with the third-party application object is a program that is not authorized, when the device leaves the factory, to be launched by voice (S12); and acquiring, from a third-party application registry, the third-party application associated with the determined third-party application object and launching the third-party application (S13). According to the speech recognition method, the third-party application corresponding to a user's speech signal can be determined through semantic analysis, the third-party application registry information can be looked up, and the third-party program can be launched, so that the user does not need to launch the corresponding program by tapping the third-party application. This provides the user with a smarter service and is convenient for the user.

Description

Method for Speech Recognition, Interaction Device, Server, and System

This application claims priority to Chinese Patent Application No. 201310003504.4, filed with the Chinese Patent Office on January 6, 2013 and entitled "Method for speech recognition, interaction device, server, and system", which is incorporated herein by reference in its entirety.

Technical Field

The present invention relates to the field of interaction device applications and, more specifically, to a speech recognition method, an interaction device, a server, and a system.

Background

In the prior art, a user of an interaction device (User Equipment, UE) can flexibly and autonomously download various applications, which are not necessarily provided by the terminal vendor. An application that does not ship with the interaction device and is obtained by the user, such as an application from Apple's App Store or the Android Market, is called a third-party application; it may also be described as a program that is not authorized, when the device leaves the factory, to be launched by voice. A third-party application is a program set or instruction set that is independent of the operating system but can be invoked and executed in the operating environment of the operating system. These numerous third-party applications greatly enrich the user's choice of applications. The two most common operating systems of interaction devices at present, Android and Apple's iOS, both support third-party applications. On the Android platform, a third party can upload a developed application to the Android Market for sale; on the iOS platform, a third party can upload a developed application to the App Store for sale.

A user of an interaction device can connect to a network and download applications from the Android Market or the App Store to the interaction device. These applications may be free or paid, and include the most common map applications, the QQ social communication software, the Skype instant messaging software, the Taobao shopping software, and the like. The applications are presented on the terminal in the form of icons; when the user needs to launch a third-party application, the user taps the corresponding icon on the touchscreen to trigger the launch of that third-party application.

For an interaction device, how to launch a third-party application more intelligently through speech recognition, without tapping the corresponding icon, is a difficult problem.

Summary

In view of this, embodiments of the present invention provide a speech recognition method, an interaction device, a server, and a system, to solve the problem of launching a third-party application through speech recognition.
According to a first aspect, a speech recognition method is provided, including: parsing one or more first texts to obtain first target semantics, where the first text is obtained by converting a first speech signal; determining, according to the acquired first target semantics, a third-party application object corresponding to the first target semantics, where a third-party application associated with the third-party application object is a program that is not authorized, when the device leaves the factory, to be launched by voice; and acquiring, from a third-party application registry, the third-party application associated with the determined third-party application object and launching the third-party application.

In a first possible implementation, before the parsing of the first text to obtain the first target semantics, the method further includes: acquiring the first speech signal and converting the first speech signal to obtain the one or more first texts.

With reference to the first aspect or the first possible implementation, in a second possible implementation, the first speech signal is converted to obtain at least two first texts; the semantics of each of the at least two first texts are scored according to a predetermined scoring rule based on the semantic naturalness and coherence of the at least two first texts, the level of each score reflecting the naturalness and coherence of the semantics.

With reference to the second possible implementation of the first aspect, in a third possible implementation, the semantics of the first text whose semantic score is the highest among the at least two first texts and exceeds a preset threshold are acquired as the first target semantics.

With reference to the third possible implementation of the first aspect, in a fourth possible implementation: all third-party application objects corresponding to the first target semantics are determined according to the first target semantics; if there are two or more third-party application objects, information about all third-party application objects associated with the first target semantics is acquired; prompt information including the third-party application object information is issued to prompt the user to issue a further selection instruction according to the third-party application object information, so as to select a target third-party application object from all the third-party application objects associated with the first target semantics; and a selection instruction is received, and the corresponding target third-party application object is selected according to the selection instruction. Correspondingly, the acquiring, from the third-party application registry, of the third-party application associated with the determined third-party application object and the launching of the third-party application specifically include: acquiring, from the third-party application registry, the third-party application associated with the target third-party application object and launching the third-party application.

With reference to the fourth possible implementation of the first aspect, in a fifth possible implementation, the selection instruction is a second speech signal; correspondingly, the selecting of the corresponding target third-party application object according to the received selection instruction specifically includes: converting the second speech signal to obtain a second text; parsing the second text to acquire second target semantics; and determining the corresponding target third-party application object according to the second target semantics.

With reference to the first aspect or the foregoing possible implementations of the first aspect, in a sixth possible implementation, the method further includes: generating the third-party application registry before the acquiring, from the third-party application registry, of the third-party application associated with the determined third-party application object and the launching of the third-party application.

With reference to the first aspect or the foregoing possible implementations of the first aspect, in a seventh possible implementation, after the third-party application is launched, the method further includes: sending a semantic command of the first target semantics to the third-party application, so that the third-party application invokes, according to the semantic command of the first target semantics, a third-party application programming interface (API) matching the launched third-party application.

With reference to the first aspect or the foregoing possible implementations of the first aspect, in an eighth possible implementation, the method further includes: receiving, by the third-party application, a third speech signal of the user; converting the third speech signal to obtain a third text; and invoking, according to a semantic command of the third text, a third-party API matching the launched third-party application, so that the launched third-party application obtains required hardware or software resources through the API.

With reference to the first aspect or the foregoing possible implementations of the first aspect, in a ninth possible implementation, the method is performed by an interaction device.

With reference to the first aspect or the foregoing possible implementations of the first aspect, in a tenth possible implementation, the parsing of the one or more first texts to obtain the first target semantics is performed by a server; the determining, according to the acquired first target semantics, of the third-party application object corresponding to the first target semantics, and the acquiring, from the third-party application registry, of the third-party application associated with the determined third-party application object and the launching of the third-party application are performed by an interaction device.
According to a second aspect, an interaction device for speech recognition is provided, including a parsing unit, a determining unit, a startup unit, and a storage unit: the parsing unit is configured to parse one or more first texts sent by a speech recognition unit to obtain first target semantics, where the first text is obtained by converting a first speech signal; the determining unit is configured to determine, according to the first target semantics acquired by the parsing unit, a third-party application object corresponding to the first target semantics, where a third-party application associated with the third-party application object is a program that is not authorized, when the device leaves the factory, to be launched by voice; the startup unit is configured to access the storage unit, acquire, from a third-party application registry stored in the storage unit, the third-party application associated with the third-party application object determined by the determining unit, and launch the third-party application; and the storage unit is configured to store the third-party application registry and the third-party application.

In a first possible implementation, the interaction device further includes a speech recognition unit, configured to convert the first speech signal to obtain the one or more first texts and send them to the parsing unit.

With reference to the second aspect or the first possible implementation of the second aspect, in a second possible implementation, the interaction device further includes a speech recognition unit: the speech recognition unit is configured to convert the first speech signal to obtain at least two first texts and send them to the parsing unit; the parsing unit is specifically configured to score, according to a predetermined scoring rule, the semantics of each of the at least two first texts based on the semantic naturalness and coherence of the at least two first texts, the level of each score reflecting the naturalness and coherence of the semantics, and to acquire, as the first target semantics, the semantics of the first text with the highest semantic score among the at least two first texts.

With reference to the second possible implementation of the second aspect, in a third possible implementation, the parsing unit is specifically configured to acquire, as the first target semantics, the semantics of the first text whose semantic score is the highest among the at least two first texts and exceeds a preset threshold.

With reference to the second aspect or the foregoing possible implementations of the second aspect, in a fourth possible implementation, the interaction device further includes a sending unit and a receiving unit: the determining unit is specifically configured to determine, according to the first target semantics, all third-party application objects corresponding to the first target semantics and, if there are two or more third-party application objects, acquire information about all third-party application objects associated with the first target semantics; the sending unit is configured to issue prompt information, which includes the third-party application object information, to prompt the user to issue a further selection instruction according to the third-party application object information, so that the determining unit selects a target third-party application object from all the third-party application objects associated with the first target semantics; the receiving unit is configured to receive the selection instruction sent by the user; the determining unit is specifically configured to select the corresponding target third-party application object according to the received selection instruction; and the startup unit acquires, from the third-party application registry stored in the storage unit, the third-party application associated with the target third-party application object and launches the third-party application.

With reference to the fourth possible implementation of the second aspect, in a fifth possible implementation, the speech recognition unit is further configured to convert a second speech signal to obtain a second text and send it to the parsing unit when the selection instruction is the second speech signal; the parsing unit is further configured to parse the second text to acquire second target semantics; and the determining unit is further configured to determine the corresponding target third-party application object according to the second target semantics.

With reference to the foregoing possible implementations of the second aspect, in a sixth possible implementation, the interaction device further includes a generating unit: the generating unit generates the third-party application registry and sends it to the storage unit before the startup unit acquires, from the third-party application registry, the third-party application associated with the determined third-party application object and launches the third-party application.

With reference to the second aspect or the foregoing possible implementations of the second aspect, in a seventh possible implementation, after launching the third-party application, the startup unit sends a semantic command of the first target semantics to the third-party application, so that the third-party application invokes, according to the semantic command of the first target semantics, a third-party application programming interface (API) matching the launched third-party application.
According to a third aspect, a server for speech recognition is provided, including a parsing unit, a determining unit, a startup unit, and a storage unit: the parsing unit is configured to parse one or more first texts to obtain first target semantics, where the first text is obtained by converting a first speech signal; the determining unit is configured to determine, according to the first target semantics acquired by the parsing unit, a third-party application object corresponding to the first target semantics, where a third-party application associated with the third-party application object is a program that is not authorized, when the device leaves the factory, to be launched by voice; the startup unit is configured to acquire, from a third-party application registry stored in the storage unit, the third-party application associated with the third-party application object determined by the determining unit, and launch the third-party application; and the storage unit is configured to store the third-party application registry and the third-party application.

In a first possible implementation, the parsing unit is specifically configured to: score, according to a predetermined scoring rule, the semantics of each of at least two first texts obtained by converting the first speech signal, based on the semantic naturalness and coherence of the at least two first texts, the level of each score reflecting the naturalness and coherence of the semantics; and acquire, as the first target semantics, the semantics of the first text with the highest semantic score among the at least two first texts.

With reference to the first possible implementation of the third aspect, in a second possible implementation, the parsing unit is specifically configured to acquire, as the first target semantics, the semantics of the first text whose semantic score is the highest among the at least two first texts and exceeds a preset threshold.

With reference to the third aspect or the foregoing possible implementations of the third aspect, in a third possible implementation, the server further includes a sending unit and a receiving unit: the determining unit is specifically configured to determine, according to the first target semantics, all third-party application objects corresponding to the first target semantics and, if there are two or more third-party application objects, acquire information about all third-party application objects associated with the first target semantics; the sending unit is configured to issue prompt information, which includes the third-party application object information, to prompt the user to issue a further selection instruction according to the third-party application object information, so that the determining unit selects a target third-party application object from all the third-party application objects associated with the first target semantics; the receiving unit is configured to receive the selection instruction sent by the user; the determining unit is specifically configured to select the corresponding target third-party application object according to the received selection instruction; and the startup unit acquires, from the third-party application registry stored in the storage unit, the third-party application associated with the target third-party application object and launches the third-party application.

With reference to the third possible implementation of the third aspect, in a fourth possible implementation, the parsing unit is further configured to parse a second text to acquire second target semantics when the selection instruction is a second speech signal, where the second text is obtained by converting the second speech signal; and the determining unit is specifically configured to determine the corresponding target third-party object according to the second target semantics.

With reference to the third aspect or the foregoing possible implementations of the third aspect, in a fifth possible implementation, the server further includes a generating unit: the generating unit generates the third-party application registry and sends it to the storage unit before the startup unit acquires, from the third-party application registry, the third-party application associated with the determined third-party application object and launches the third-party application.

With reference to the third aspect or any of the foregoing possible implementations of the third aspect, in a sixth possible implementation, after the startup unit launches the third-party application, a semantic command of the first target semantics is sent to the third-party application, so that the third-party application invokes, according to the semantic command of the first target semantics, a third-party application programming interface (API) matching the launched third-party application.

With reference to the third aspect or the foregoing possible implementations of the third aspect, in a seventh possible implementation, after the startup unit launches the third-party application, the third-party application converts a third speech signal to obtain a third text and invokes, according to a semantic command of the third text, a third-party API matching the launched third-party application, so that the launched third-party application obtains required hardware or software resources through the API.

With reference to the third aspect or the foregoing possible implementations of the third aspect, in an eighth possible implementation, the receiving unit is further configured to receive, before the first text is parsed to obtain the first target semantics, the first text sent by the interaction device.

With reference to the third aspect or the foregoing possible implementations of the third aspect, in a ninth possible implementation, the sending unit is further configured to send response feedback to the interaction device after the third-party application is launched.
According to a fourth aspect, an apparatus for speech recognition is provided, including a processor and a memory: the processor parses one or more first texts to obtain first target semantics, where the first text is obtained by converting a first speech signal; determines, according to the acquired first target semantics, a third-party application object corresponding to the first target semantics, where a third-party application associated with the third-party application object is a program that is not authorized, when the device leaves the factory, to be launched by voice; and accesses the memory, acquires, from a third-party application registry stored in the memory, the third-party application associated with the determined third-party application object, and launches the third-party application. The memory stores the third-party application registry and the third-party application, as well as instructions for the processor to perform the foregoing method.

In a first possible implementation, before parsing the first text to obtain the first target semantics, the processor acquires the first speech signal and converts the first speech signal to obtain the one or more first texts.

With reference to the fourth aspect or the first possible implementation, in a second possible implementation, the processor converts the first speech signal to obtain at least two first texts and scores, according to a predetermined scoring rule, the semantics of each of the at least two first texts based on the semantic naturalness and coherence of the at least two first texts, the level of each score reflecting the naturalness and coherence of the semantics.

With reference to the second possible implementation of the fourth aspect, in a third possible implementation, the processor acquires, as the first target semantics, the semantics of the first text whose semantic score is the highest among the at least two first texts and exceeds a preset threshold.

With reference to the third possible implementation of the fourth aspect, in a fourth possible implementation, the processor determines, according to the first target semantics, all third-party application objects corresponding to the first target semantics; if there are two or more third-party application objects, it acquires information about all third-party application objects associated with the first target semantics; issues prompt information, which includes the third-party application object information, to prompt the user to issue a further selection instruction according to the third-party application object information, so as to select a target third-party application object from all the third-party application objects associated with the first target semantics; and receives a selection instruction and selects the corresponding target third-party application object according to the selection instruction. Correspondingly, the acquiring, from the third-party application registry, of the third-party application associated with the determined third-party application object and the launching of the third-party application specifically include: acquiring, from the third-party application registry, the third-party application associated with the target third-party application object and launching the third-party application.

With reference to the fourth possible implementation of the fourth aspect, in a fifth possible implementation, the selection instruction is a second speech signal; correspondingly, the selecting of the corresponding target third-party application object according to the received selection instruction specifically includes: converting the second speech signal to obtain a second text; parsing the second text to acquire second target semantics; and determining the corresponding target third-party application object according to the second target semantics.

With reference to the fourth aspect or the foregoing possible implementations of the fourth aspect, in a sixth possible implementation, the processor generates the third-party application registry and sends it to the memory before acquiring, from the third-party application registry, the third-party application associated with the determined third-party application object and launching the third-party application.

With reference to the fourth aspect or the foregoing possible implementations of the fourth aspect, in a seventh possible implementation, after the processor launches the third-party application, the method further includes: sending a semantic command of the first target semantics to the third-party application, so that the third-party application invokes, according to the semantic command of the first target semantics, a third-party application programming interface (API) matching the launched third-party application.

With reference to the fourth aspect or the foregoing possible implementations of the fourth aspect, in an eighth possible implementation, after launching the third-party application, the processor causes the third-party application to receive a third speech signal of the user; the third speech signal is converted to obtain a third text; and a third-party API matching the launched third-party application is invoked according to a semantic command of the third text, so that the launched third-party application obtains required hardware or software resources through the API.

According to a fifth aspect, a speech recognition system is provided, including any one of the foregoing interaction devices, servers, or apparatuses.

Through the foregoing technical solutions, the third-party application corresponding to a user's speech signal can be determined through semantic analysis; the third-party application registry information is looked up and the third-party program is launched, so that the user does not need to launch the corresponding program by tapping the third-party application. This provides the user with a smarter service and is convenient for the user.

Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the following briefly introduces the accompanying drawings required for the embodiments. Apparently, the accompanying drawings described below are only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.

FIG. 1 is a schematic flowchart of a speech recognition method according to Embodiment 1 of the present invention.

FIG. 2 is a schematic flowchart of a speech recognition method according to Embodiment 2 of the present invention.

FIG. 3 is a schematic diagram of a third-party application invoking interaction device hardware after being launched according to an embodiment of the present invention.

FIG. 4 is a schematic diagram of a third-party application supporting the speech recognition function according to Embodiment 3 of the present invention.

FIG. 5 is a schematic architecture diagram of a speech recognition system according to Embodiment 4 of the present invention.

FIG. 6 is a schematic flowchart of a speech recognition method according to Embodiment 5 of the present invention.

FIG. 7 is a schematic block diagram of an interaction device for speech recognition according to Embodiment 7 of the present invention.

FIG. 8 is a schematic block diagram of an interaction device for speech recognition according to Embodiment 8 of the present invention.

FIG. 9 is a schematic block diagram of a server for speech recognition according to Embodiment 9 of the present invention.

FIG. 10 is a schematic block diagram of a server for speech recognition according to Embodiment 10 of the present invention.

FIG. 11 is a schematic block diagram of an apparatus for speech recognition according to Embodiment 11 of the present invention.

Detailed Description
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. Apparently, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.

The term "and/or" in this specification merely describes an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may indicate the following three cases: only A exists, both A and B exist, and only B exists. In addition, the character "/" in this specification generally indicates an "or" relationship between the associated objects.

Speech recognition technology enables a machine, that is, an apparatus, to convert a speech signal into corresponding text or commands through a recognition and understanding process. Speech recognition technology mainly involves three aspects: feature extraction, pattern matching criteria, and model training.

In the prior art, speech recognition technology generally outputs the recognition result in the form of text or commands. The whole speech recognition process takes a speech signal as input and, after processing, outputs text content, or outputs commands used to drive other devices, thereby replacing the traditional mouse or keyboard input.

Prior-art interaction devices may include, for example, smartphones, tablet computers, and game consoles. These interaction devices usually have a speech recognition function, and the speech-to-text accuracy of that function keeps improving. However, to launch a third-party application on an interaction device, a touch tap or mouse click is required to invoke the program. Users therefore hope to raise the intelligence of interaction devices, for example, by launching a third-party application directly through speech recognition. Embodiments of the present invention provide such a method, which can launch a third-party application directly according to a user's speech signal.
FIG. 1 is a schematic flowchart of a speech recognition method according to Embodiment 1 of the present invention, which includes the following content.

S11. Parse one or more first texts to obtain first target semantics, where the first text is obtained by converting a first speech signal.

S12. Determine, according to the acquired first target semantics, a third-party application object corresponding to the first target semantics, where a third-party application associated with the third-party application object is a program that is not authorized, when the device leaves the factory, to be launched by voice.

S11 and S12 are also referred to in this context as semantic analysis.

S13. Acquire, from a third-party application registry, the third-party application associated with the determined third-party application object and launch the third-party application.

In different embodiments, the third-party application may be downloaded onto the user's interaction device or onto a server; the speech recognition function that converts the speech signal into text may reside on the interaction device or on a server; and the parsing of the text to obtain the target semantics, the determining of the third-party application object corresponding to the target semantics, and the launching of the third-party application may be performed by the interaction device or by a server. This server provides support services for the interaction device and differs from the third-party server, mentioned below, that provides support for the third-party application.

By using the speech recognition function, this embodiment of the present invention can determine, through semantic analysis of a user's speech signal, the third-party application corresponding to the speech signal, look up the third-party application registry information, and launch the third-party program, so that the user does not need to launch the corresponding program by tapping the third-party application; this provides the user with a smarter service and is convenient for the user.

Optionally, in different embodiments, before the first text is parsed to obtain the first target semantics, the first speech signal is acquired and converted to obtain the one or more first texts.

Optionally, in different embodiments, the first speech signal is converted to obtain at least two first texts; correspondingly, the semantics of each of the at least two first texts are scored according to a predetermined scoring rule based on the semantic naturalness and coherence of the at least two first texts, the level of each score reflecting the naturalness and coherence of the semantics; and the semantics of the first text with the highest semantic score among the at least two first texts are acquired as the first target semantics.

Optionally, in different embodiments, the acquiring, as the first target semantics, of the semantics of the first text with the highest semantic score among the at least two first texts includes: acquiring, as the first target semantics, the semantics of the first text whose semantic score is the highest among the at least two first texts and exceeds a preset threshold.

Optionally, in different embodiments, the determining, according to the acquired first target semantics, of the third-party application object corresponding to the first target semantics specifically includes: determining, according to the first target semantics, all third-party application objects corresponding to the first target semantics; if there are two or more third-party application objects, acquiring information about all third-party application objects associated with the first target semantics; issuing prompt information, which includes the third-party application object information, to prompt the user to issue a further selection instruction according to the third-party application object information, so as to select a target third-party application object from all the third-party application objects associated with the first target semantics; and receiving a selection instruction and selecting the corresponding target third-party application object according to the received selection instruction. Correspondingly, the acquiring, from the third-party application registry, of the third-party application associated with the determined third-party application object and the launching of the third-party application specifically include: acquiring, from the third-party application registry, the third-party application associated with the target third-party application object and launching the third-party application.

Optionally, in different embodiments, the selection instruction is a second speech signal; correspondingly, the selecting of the corresponding target third-party application object according to the received selection instruction specifically includes: converting the second speech signal to obtain a second text; parsing the second text to acquire second target semantics; and determining the corresponding target third-party application object according to the second target semantics.

Optionally, in different embodiments, the method further includes: generating the third-party application registry before the acquiring, from the third-party application registry, of the third-party application associated with the determined third-party application object and the launching of the third-party application.

Optionally, in different embodiments, after the third-party application is launched, the method further includes: sending a semantic command of the first target semantics to the third-party application, so that the third-party application invokes, according to the semantic command of the first target semantics, a third-party application programming interface (API) matching the launched third-party application.

Optionally, in different embodiments, after the third-party application is launched, the method further includes: converting, by the third-party application, a third speech signal to obtain a third text; and invoking, according to a semantic command of the third text, a third-party API matching the launched third-party application, so that the launched third-party application obtains required hardware or software resources through the API.

Optionally, in different embodiments, the method is performed by an interaction device.

Optionally, in different embodiments, the parsing of the one or more first texts to obtain the first target semantics is performed by a server; the determining, according to the acquired first target semantics, of the third-party application object corresponding to the first target semantics, and the acquiring, from the third-party application registry, of the third-party application associated with the determined third-party application object and the launching of the third-party application are performed by an interaction device.
图 2是本发明实施例 2的语音识别方法的示意流程图。 在实施例 2中, 交互设备执行该方法, 用户将第三方应用程序下载到交互设备。 交互设备至 少包括语音识别单元、 解析单元、 确定单元、 启动单元和存储单元等。
S21 , 交互设备接收第一语音信号。
此处的第一语音信号是用户说的话, 例如 "老胡在线吗"。 交互设备可 以通过麦克风接收第一语音信号。 522, 交互设备转换第一语音信号得到一个或多个第一文本。
交互设备的语音识别单元接收到第一信号后, 将语音波形转换为文字 (Speech to text), 该文字对应上述第一文本。 语音识别单元转换后输出的第 一文本可以是 "老胡在线吗"、 "老胡在先吗" 和 /或 "老虎在线吗" 等, 发送 到确定单元进行处理。 在本发明实施例中, 交互设备包括语音识别单元以及 解析单元、 确定单元和启动单元, 分别执行本发明实施例 2中的具体步骤。 在其他实施例中语音识别单元和其余的解析单元、 确定单元、 启动单元可以 包括在不同的设备中。
523 , 交互设备按照预定的评分规则, 根据至少两个第一文本的语义 (英 文为 semantic)的自然性及连贯性分别对至少两个第一文本的语义进行评分。
交互设备的解析单元根据第一文本列举出对应的可选语义,如交互设备 的电话簿中有 "老胡", 以前用户的语音输入中有 "在线" 等出现概率较高 的词;解析单元排序出多种组合,例如 "老胡在线吗"、 "老胡在先吗"和 "老 虎在线吗", 并根据语义的自然性及连贯性, 对每个组合打分, 评分的高低 体现所述语义的自然性及连贯性的优劣。
可选的, 作为不同的实施例, 解析单元可以设置一个评分阈值, 并根据 预定的评分规则对第一文本的语义的各种组合评定分值。仅仅当该第一文本 的语义的评分最高且超过预定的阈值时, 才做任务分析, 并访问存储单元。
S28 , 当交互设备转换第一语音信号仅得到一个第一文本时, 通过第一 文本的语义直接确定第一目标语义, 可以省略评分过程 S23和 S24, 进而执 行步骤 S25及以后的步骤。
524, 交互设备获取至少两个第一文本的语义评分最高的第一文本的语 义作为第一目标语义。
可选的, 在一个实施例中, 如果交互设备的解析单元只获取了一个评分 最高的第一文本的语义例如 "老胡在线", 则接下来执行 S25。
在另一实施例中, 当没有一个第一文本的语义的评分超过预定的阈值 时, 或有两个以上的第一文本的语义的评分超过预定的阈值时, 解析单元根 据上下文, 可以向语音识别单元反馈需要输入更多信息的命令。 可选的, 解 析单元根据语义的上下文分析, "在线" 与文字通信和语音通信紧密相关, 即向语音识别单元反馈命令, 如 "你希望是语音在线通信" 还是 "文字在线 通信"。 语音识别单元可以将该命令以语音或文字的方式反馈给用户, 让用 户输入更多的要求。 例如用户回答 "语音", 由此增加 "语音" 要求, 解析 单元将 "语音" 与 "在线" 作为第一目标语义。
S25, 交互设备根据第一目标语义确定与第一目标语义相对应的第三方应用对象。
交互设备的确定单元根据选择出的第一目标语义确定相关的任务, 如
"在线 "对应的任务为社交任务,需要启动单元启动对应的第三方应用程序, 如 QQ和 /或 Skype等。 为方便说明,本文中将候选的第三方应用程序称为第 三方应用对象。确定单元可以基于预先定义的语义和第三方应用程序之间的 关联关系,也可以是根据第三方应用程序的使用频率自行选择来通过第一目 标语义确定第三方应用对象, 本发明对此不做限定。
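这种 "由第一目标语义确定第三方应用对象" 的关联查找可用如下示意性片段表示(非专利原文实现; 语义-应用关联表与使用频率数据均为假设值, 实际关联关系可预先定义或按使用频率自行选择):

```python
# 假设的语义关键词到候选第三方应用对象的关联表
SEMANTIC_APP_MAP = {"在线": ["QQ", "Skype"], "导航": ["某地图应用"]}
# 假设的各应用使用频率, 用于在多个候选间排序
USAGE_FREQ = {"QQ": 120, "Skype": 45}

def find_app_objects(target_semantics):
    """返回与第一目标语义相对应的所有第三方应用对象, 按使用频率降序。"""
    objects = []
    for keyword, apps in SEMANTIC_APP_MAP.items():
        if keyword in target_semantics:
            objects.extend(apps)
    return sorted(set(objects), key=lambda a: USAGE_FREQ.get(a, 0), reverse=True)

print(find_app_objects("老胡在线吗"))  # ['QQ', 'Skype']
```

若结果多于一个, 即进入下文所述的提示用户进一步选择的流程。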
在一个实施例中,根据第一目标语义仅确定了一个与第一目标语义相对 应的第三方应用对象, 举例来说, Skype即时通信软件。
在另一个实施例中, 交互设备还可以包括发送单元和接收单元。 若所确定的第三方应用对象大于等于两个, 则获取所有与第一目标语义相关联的第三方应用对象信息; 发送单元发出提示信息, 所述提示信息包含所述第三方应用对象信息, 用以提示用户根据所述第三方应用对象信息发出进一步的选择指令, 以便确定单元从所有与所述第一目标语义相关联的第三方应用对象中选择目标第三方应用对象; 接收单元接收选择指令, 确定单元具体根据接收的所述选择指令选择相应的目标第三方应用对象; 相应, 启动单元获取与目标第三方应用对象相关联的第三方应用程序并启动该第三方应用程序。
如果根据第一目标语义查找到至少两个第三方应用对象, 举例来说,
Skype即时通信软件和 QQ社交通信软件。 则确定单元根据上下文, 可以向 语音识别单元反馈需要输入更多信息的命令, 比如向语音识别单元反馈命 令, 如 "你希望是 Skype即时通信"还是 "QQ社交通信"。 语音识别单元可 以将该命令以语音或文字的方式反馈给用户, 让用户输入更多的要求。 假如 用户回答 "Skype" , 由此解析单元将 "Skype" 作为第二目标语义。 接下来, 确定单元根据第二目标语义确定目标第三方应用对象为 Skype 即时通信软 件。
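上述 "多个候选对象时发出提示、再按第二目标语义选定目标对象" 的分支可以用如下示意性片段说明(非专利原文实现; 提示语格式与按名称匹配的选择规则均为假设的简化处理):

```python
def build_prompt(app_objects):
    """生成包含第三方应用对象信息的提示信息, 供语音或文字输出给用户。"""
    return "你希望是" + "还是".join(f"“{a}”" for a in app_objects) + "?"

def choose_target(app_objects, second_target_semantics):
    """根据第二目标语义(由第二语音信号转换、解析得到)确定目标第三方应用对象。"""
    if len(app_objects) == 1:
        return app_objects[0]
    for app in app_objects:
        if app.lower() in second_target_semantics.lower():
            return app
    return None  # 未能匹配时可再次提示用户

apps = ["Skype", "QQ"]
print(build_prompt(apps))
print(choose_target(apps, "Skype"))  # Skype
```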
S26, 交互设备从第三方应用注册表中获取与所确定的第三方应用对象相关联的第三方应用程序并启动该第三方应用程序。
交互设备的启动单元在第三方应用注册信息表中查找确定的第三方应 用程序并启动该第三方应用程序。
启动单元确定具体的第三方应用程序后, 再去访问存储单元。 存储单元 存储交互设备开机时生成的第三方应用注册信息表或者安装程序时生成的 第三方应用注册信息表, 查找到对应的程序如 Skype并启动该程序。 与交互 设备的出厂时自带的缺省应用的注册表不同, 第三方应用注册信息表可以在 安装第三方应用程序时生成, 也可以在安装第三方应用程序后开机时生成, 只要在从第三方应用注册表中获取与所确定的第三方应用对象相关联的第 三方应用程序并启动第三方应用程序之前就可以。 本实施例中, 第三方应用 注册信息表存储在交互设备。 在其他实施例中, 第三方应用注册信息表也可 以存储在交互设备之外的其他设备, 或同时存储在交互设备和其他设备上。
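第三方应用注册信息表的生成与查找启动可以用如下示意性片段表示(非专利原文实现; register_app 在安装或开机扫描时写表, launch 仅以返回字符串模拟启动进程, 路径为假设值):

```python
third_party_registry = {}  # 第三方应用注册表, 区别于出厂自带的缺省应用注册表

def register_app(name, launch_path):
    """安装第三方应用程序时(或开机扫描时)写入注册信息。"""
    third_party_registry[name] = launch_path

def launch(app_object):
    """从注册表中获取与应用对象关联的程序并启动; 未注册则返回 None。"""
    path = third_party_registry.get(app_object)
    if path is None:
        return None
    return f"launching {path}"  # 实际设备上此处会创建对应进程

register_app("Skype", "/apps/skype/bin/skype")  # 假设的安装路径
print(launch("Skype"))
```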
S27, 交互设备启动第三方应用程序后, 向第三方应用程序发送第一目 标语义的语义命令, 以使第三方应用程序根据第一目标语义的语义命令调用 与被启动的第三方应用程序相匹配的 API (英文为 Application Programming Interface ,中文为应用编程接口 )且通过 API获取所需的硬件和 /或软件资源。
图 3是本发明实施例的第三方应用程序启动后调用交互设备硬件的示意 图。 交互设备 31启动第三方应用程序 32后, 通过定义好的 API 33与第三 方应用程序 32进行通信。如第三方应用程序需要启动视频和语音聊天功能, 即通过交互设备操作系统 34预先定义好的视频 API和音频 API接口等, 调 用交互设备的摄像机 ( camera ) 35和麦克风( microphone ) 36。 在目前的交 互设备操作系统中定义了几千种 API用于支持各种第三方应用程序,从而实 现不同的功能, 利用这些 API, 第三方应用程序可以与交互设备平台进行通 信、 调用交互设备的各种资源。
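操作系统预定义 API 供第三方应用调用设备资源的机制, 可以用如下示意性片段勾勒(非专利原文实现; API 名称 video.open、audio.open 及其返回值均为本示例的假设):

```python
# 假设的操作系统预定义 API 表: 第三方应用按名称调用, 获取硬件或软件资源
DEVICE_APIS = {
    "video.open": lambda: "camera started",      # 对应调用摄像机
    "audio.open": lambda: "microphone started",  # 对应调用麦克风
}

def call_api(name):
    """第三方应用程序通过预定义 API 获取所需的硬件或软件资源。"""
    api = DEVICE_APIS.get(name)
    if api is None:
        raise KeyError(f"未定义的 API: {name}")
    return api()

print(call_api("video.open"))
print(call_api("audio.open"))
```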
举例来说, 第三方应用程序被启动后, 接收解析单元的语义命令, 并根据该语义命令完成对应的操作。 例如启动第三方应用程序 QQ后, 解析单元发送语义命令给 QQ, 该命令的传递也可以通过预先定义好的 API进行; QQ接收到该命令后, 在 QQ好友列表中查找好友名为 "老胡" 的通信对象的状态; 并将该状态信息反馈给解析单元, 解析单元将状态信息翻译成对应的语义, 通过语音识别单元将该语义输出给用户, 如最后以语音信息的形式输出告知用户, 如 "老胡不在线"。 在此过程中, 第三方应用程序可能涉及用户鉴权的问题, 即需要输入用户账号和密码, 这类信息由第三方应用程序经过解析单元、 语音识别单元反馈给用户; 用户通过语音或者键盘的方式输入对应的账户号码和密码。
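"解析单元向被启动的应用发送语义命令、应用执行后反馈状态信息" 的往返流程可用如下示意性桩代码表示(非专利原文实现; 语义命令的字典格式与好友列表数据均为假设):

```python
class QQApp:
    """极简的第三方应用桩(stub), 仅演示语义命令的接收与状态反馈流程。"""
    friends = {"老胡": "offline"}  # 假设的好友状态数据

    def handle_command(self, semantic_command):
        # 假设语义命令形如 {"action": "query_online", "target": "老胡"}
        target = semantic_command["target"]
        status = self.friends.get(target)
        # 将查询结果翻译成可输出给用户的语义
        return f"{target}在线" if status == "online" else f"{target}不在线"

app = QQApp()
print(app.handle_command({"action": "query_online", "target": "老胡"}))  # 老胡不在线
```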
在本发明实施例 2中, 用户只需通过语音发出命令, 交互设备就可以对 该语音进行识别、 进而进行语义分析, 最后启动第三方应用程序, 此外还可 以在启动第三方应用程序后进一步地向用户反馈与第三方应用程序相关的 状态信息, 提供了一种新的启动第三方应用程序的方法, 极大方便了用户的 使用。
在上述各个单元例如语音识别单元和解析单元之间交互的文本的信息 可以是译码, 对用户呈现的形式可以是播放的语音或屏幕文字等。
本发明实施例利用语音识别功能, 根据用户的语音信号, 可以通过语义 分析, 确定语音信号对应的第三方应用程序; 查找第三方应用注册表信息并 启动第三方程序, 从而无需用户通过点击第三方应用程序来启动对应程序, 为用户提供了更智能的服务, 方便了用户的使用。
图 4是本发明实施例 3的第三方应用程序支持语音识别功能的示意架构图。 实施例 3中, 交互设备具有语音识别功能, 用户将第三方应用程序下载到交互设备, 与实施例 2不同的是, 第三方应用程序本身也支持语音识别功能, 可以根据用户的语音输入执行相关的操作。 这样, 实施例 2和实施例 3根据用户的语音信号识别与启动对应的第三方应用程序的过程相同, 出于简洁, 此处不再赘述。 不同之处在于第三方应用程序启动后, 实施例 3中第三方应用程序对任务的执行过程。
参考图 4, 第三方应用程序 41启动后, 由于其自身带有语音识别功能, 能够直接接收用户的语音信号 42, 并根据识别语音信号 42获得对应命令, 执行相关的操作。 第三方应用程序接收用户的语音信号 42, 转换得到文本, 完成文本的语义解析, 并执行语义对应的任务。 可选的, 第三方应用程序 41 可以将语义分析和任务执行放置在其对应的第三方服务器 43上实现。 当根 据文本的语义命令, 第三方应用程序 41需要调用交互设备 44的资源时, 通 过交互设备的操作系统 45定义的 API 46与交互设备联系, 并启动相关的资 源, 如调用摄像机 47和麦克风 48进行视频通信。 举例来说, 如果交互设备 启动的第三方应用程序为 QQ, 对应的第三方服务器例如为腾讯公司的服务 器, 第三方应用程序与第三方服务器之间的通信可以为第三方定义的私有接 口和协议。 第三方应用程序如果为地图, 该第三方应用程序需要调用交互设 备的 GPS信息, 可以通过对应的 API查询交互设备的地址信息, 同时具体 的地图信息可以由第三方应用程序访问自己的第三方服务器, 获取对应地图 信息。
图 5是本发明实施例 4的语音识别系统 50的示意架构图。 系统 50包括 交互设备 51和服务器 52。 交互设备 51可以包括语音识别单元 511、 存储单 元 512、 操作系统 513、 摄像机 514和麦克风 515等。 服务器 52可以包括第 三方应用程序 522、 存储单元 523以及解析单元、 确定单元、 启动单元, 其 中解析单元、 确定单元和启动单元在图 5中以 521指示。 交互设备 51与服 务器 52通过网络 53连接。 操作系统 513定义了多个 API 516用于支持各种 第三方应用程序 522。
在图 5的语音识别方法系统中, 交互设备仅具备语音识别功能, 而语义 分析、 启动第三方应用程序的功能都由服务器完成, 且第三方应用程序存储 在服务器上。 可选的, 这些第三方应用程序不支持语音识别功能。 在交互设 备或者服务器上可以同时包括存储单元、 或者放置在二者之一上, 启动单元 访问存储单元时可以择一地选择一个存储单元; 交互设备通过网络与服务器 连接, 网络提供二者通信的管道, 可以为无线网络或者有线网络。
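实施例 4 中 "交互设备只做语音转文本、语义解析与应用启动由服务器完成" 的分工, 可用如下示意性片段模拟(非专利原文实现; 此处以函数调用代替网络传输, 返回的候选文本与选取规则均为假设):

```python
def device_speech_to_text(speech_signal):
    """交互设备侧: 转换第一语音信号得到第一文本(此处直接返回假设的候选文本)。"""
    return ["老胡在线吗", "老胡在先吗"]

def server_handle(first_texts):
    """服务器侧: 解析候选文本得到第一目标语义, 启动应用后返回响应反馈。"""
    # 简化选取: 优先含 "在线"、其次含 "老胡" 的候选
    target = max(first_texts, key=lambda t: ("在线" in t, "老胡" in t))
    return {"target_semantics": target, "launched": "QQ"}  # "QQ" 为假设的启动结果

texts = device_speech_to_text(b"...waveform...")
print(server_handle(texts))
```

交互设备收到该响应反馈后, 再经语音识别单元以语音或文字形式呈现给用户。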
基于图 5中系统 50的架构, 图 6是本发明实施例 5的语音识别方法的 示意流程图。
S61, 交互设备转换第一语音信号得到一个或多个第一文本, 将第一文本发送至服务器。
此处的第一语音信号是用户说的话, 例如 "老胡在线吗", 通过交互设 备的麦克风接收后, 由交互设备的语音识别单元将语音波形转换为第一文 本, 例如语音识别单元转换后输出的第一文本可以是 "老胡在线吗"、 "老胡 在先吗" 和 /或 "老虎在线吗" 等, 发送到服务器进行处理。
S62, 服务器解析一个或多个第一文本以得到第一目标语义。 该步骤可以包括以下具体内容。
在本实施例中, 服务器包括解析单元、 确定单元和启动单元。 服务器的 解析单元接收交互设备发送的转换第一语音信号得到的至少两个第一文本, 按照预定的评分规则,根据至少两个第一文本的语义的自然性及连贯性对所 述至少两个第一文本中的每个第一文本的语义进行评分, 所述评分的高低体 现所述语义的自然性及连贯性的优劣; 获取评分最高的第一文本的语义作为 第一目标语义。 可选的, 解析单元根据第一文本列举出对应的可选语义, 如交互设备的 电话簿中有 "老胡", 以前用户的语音输入中有 "在线" 等出现概率较高的 词; 处理单元排序出多种组合, 例如 "老胡在线吗"、 "老胡在先吗" 和 "老 虎在线吗"。 根据语义的自然性及连贯性, 对每个组合打分, 评分的高低体 现所述语义的自然性及连贯性的优劣。
可选的, 作为不同的实施例, 解析单元可以设置一个评分阈值, 并根据 预定的评分规则对第一文本的语义的各种组合评定分值,且将至少两个第一 文本的语义评分最高且超过预设阈值的所述第一文本的语义作为所述的第 一目标语义。
在一个实施例中, 如果只有一个评分最高的第一文本的语义例如 "老胡 在线", 则解析单元将第一文本的语义作为第一目标语义。
在另一实施例中, 当没有一个第一文本的语义的评分超过预定的阈值时, 或有两个以上的第一文本的语义的评分超过预定的阈值时, 解析单元根据上下文, 可以向交互设备的语音识别单元反馈需要输入更多信息的命令; 如上例所述, 语义 "在线" 评分后, 不满足语义的评分阈值, 解析单元根据语义的上下文分析, "在线" 与文字通信和语音通信紧密相关, 即向语音识别单元反馈命令, 如 "你希望是语音在线通信" 还是 "文字在线通信"。 语音识别单元可以将该命令以语音或文字的方式反馈给用户, 让用户输入更多的要求。 例如用户回答 "语音", 由此增加 "语音" 要求, 解析单元将 "语音" 与 "在线" 作为第一目标语义。
可选的, 作为不同的实施例, 当交互设备转换第一语音信号仅得到一个 第一文本并发送到服务器时,服务器通过第一文本的语义直接确定第一目标 语义, 进而执行步骤 S63及以后的步骤。
S63 , 服务器根据第一目标语义确定与第一目标语义相对应的第三方应 用对象。
确定单元根据选择出的第一目标语义确定相关的任务, 如 "在线" 对应 的任务为社交任务, 需要启动单元启动对应的第三方应用程序, 如 QQ 和 / 或 Skype等。 为方便说明, 本文中将候选的第三方应用程序称为第三方应用 对象。 确定单元可以基于预先定义的语义和第三方应用程序之间的关联关 系,也可以是根据第三方应用程序的使用频率自行选择来通过第一目标语义 确定第三方应用对象, 本发明对此不做限定。 在一个实施例中,根据第一目标语义仅确定了一个与第一目标语义相对 应的第三方应用对象, 举例来说, Skype即时通信软件。
在另一个实施例中, 服务器还可以包括发送单元和接收单元。 若所确定 的第三方应用对象大于等于两个, 则获取所有与第一目标语义相关联的第三 方应用对象信息; 发送单元发出提示信息, 所述提示信息包含所述第三方应 用对象信息, 用以提示用户根据所述第三方应用对象信息发出进一步的选择 指令, 以便确定单元从所有与所述第一目标语义相关联的第三方应用对象中 选则目标第三方应用对象; 接收单元接收选择指令, 确定单元根据所述选择 指令选择相应的目标第三方应用对象; 相应, 启动单元获取与目标第三方应 用对象相关联的第三方应用程序并启动该第三方应用程序。
如果根据第一目标语义查找到至少两个第三方应用对象, 举例来说,
Skype即时通信软件和 QQ社交通信软件。 则确定单元根据上下文, 可以向语音识别单元反馈需要输入更多信息的命令; 向语音识别单元反馈命令, 如 "你希望是 Skype即时通信" 还是 "QQ社交通信"。 语音识别单元可以将该命令以语音或文字的方式反馈给用户, 让用户输入更多的要求。 例如用户回答 "Skype", 由此解析单元将 "Skype" 作为第二目标语义。 接下来, 确定单元根据第二目标语义确定目标第三方应用对象为 Skype即时通信软件。
S64, 服务器从第三方应用注册表中获取与所确定的第三方应用对象相关联的第三方应用程序并启动该第三方应用程序, 并向交互设备发送响应反馈。
服务器的启动单元可以择一地访问服务器的存储单元或交互设备的存 储单元, 根据所存储的第三方应用注册信息表, 找到所确定的第三方应用程 序并启动该第三方应用程序。
启动单元确定具体的第三方应用程序后, 再去访问存储单元。 交互设备 的存储单元存储第三方应用注册信息表,服务器的存储单元存储第三方应用 程序和第三方应用注册信息表。 与交互设备的出厂时自带的缺省应用的注册 表不同, 第三方应用注册信息表可以在安装第三方应用程序时生成, 也可以 在安装后开机时生成。 本实施例中, 当第三方应用注册信息表存储在交互设 备的存储单元时,服务器的启动单元需要访问交互设备的存储单元以获取第 三方应用程序并启动该第三方应用程序。
S65, 交互设备接收服务器发送的响应反馈。

服务器的启动单元启动第三方应用程序后, 发送单元向第三方应用程序发送第一目标语义的语义命令, 以使第三方应用程序根据第一目标语义的语义命令调用与被启动的第三方应用程序相匹配的 API (英文为 Application Programming Interface, 中文为应用编程接口)且通过 API获取所需的硬件和 /或软件资源。
可选的, 第三方应用程序被启动后, 接收解析单元的语义命令, 并根据该语义命令完成对应的操作。 例如启动第三方应用程序 QQ后, 解析单元发送语义命令给 QQ, 该命令的传递也可以通过预先定义好的 API进行; QQ接收到该命令后, 在 QQ好友列表中查找好友名为 "老胡" 的通信对象的状态; 并将该状态信息反馈给解析单元, 解析单元将状态信息翻译成对应的语义, 通过交互设备的语音识别单元将该语义输出给用户, 如最后以语音信息的形式输出告知用户, 如 "老胡不在线"。 在此过程中, 第三方应用程序可能涉及用户鉴权的问题, 即需要输入用户账号和密码, 这类信息由第三方应用程序经过服务器的发送单元、 交互设备的语音识别单元反馈给用户; 用户通过语音或者键盘的方式输入对应的账户号码和密码。
在上述各个单元例如语音识别单元和解析单元之间交互的文本的信息 可以是译码, 对用户呈现的形式可以是播放的语音或屏幕文字等。
本发明实施例利用语音识别功能, 根据用户的语音信号, 可以通过语义 分析, 确定语音信号对应的第三方应用程序; 查找第三方应用注册表信息并 启动第三方程序, 从而无需用户通过点击第三方应用程序来启动对应程序, 为用户提供了更智能的服务, 方便了用户的使用。
可选的, 作为另一个实施例 6, 与实施例 4和 5的不同之处在于第三方应用程序本身也支持语音识别功能, 可以根据用户的语音输入执行相关的操作。 这样, 实施例 6和实施例 4根据用户的语音信号识别与启动对应的第三方应用程序的过程相同, 参考实施例 5, 出于简洁, 此处不再赘述。 不同之处在于第三方应用程序启动后, 实施例 6中第三方应用程序对任务的执行过程。
第三方应用程序启动后, 由于其自身带有语音识别功能, 能够直接接收 用户的语音信号, 并根据该命令执行相关的操作。 第三方应用程序接收用户 的语音信号,转换得到文本, 完成文本的语义解析,并执行语义对应的任务。 第三方应用程序将语义分析和任务执行放置在其对应的第三方服务器上实 现。 当根据文本的语义命令, 第三方应用程序需要调用交互设备的资源时, 通过交互设备的操作系统定义的 API与交互设备联系, 并启动相关的资源, 如调用摄像机和麦克风进行视频通信, 此时, 如果交互设备启动的第三方应 用程序为 QQ, 对应的第三方服务器例如为腾讯公司的服务器, 第三方应用 程序与第三方服务器之间的通信可以为第三方自己定义的私有接口和协议。 第三方应用程序如果为地图, 该第三方应用程序需要调用交互设备的 GPS 信息, 可以通过对应的 API查询交互设备的地址信息, 同时具体的地图信息 可以由第三方应用程序访问自己的第三方服务器, 获取对应地图信息。
图 7是本发明实施例 7的语音识别的交互设备 70的示意框图。 交互设 备 70可以包括: 解析单元 71、确定单元 72、启动单元 73和存储单元 74等。 举例来说, 交互设备可以是移动终端或平板电脑(Tablet PC )等装置。
解析单元 71解析一个或多个第一文本以得到第一目标语义, 所述第一 文本由第一语音信号转换得到。
确定单元 72根据解析单元所获取的第一目标语义确定与所述第一目标 语义相对应的第三方应用对象, 所述第三方应用对象相关联的第三方应用程 序是设备出厂时未经授信通过语音方式启动的程序。
启动单元 73访问所述存储单元, 从所述存储单元存储的第三方应用注 册表中获取与所确定的第三方应用对象相关联的第三方应用程序并启动所 述第三方应用程序。
存储单元 74存储第三方应用注册表和所述第三方应用程序。
交互设备 70可以执行本发明实施例 1的语音识别方法, 具体细节此处 不再赘述。
本发明实施例利用语音识别功能, 根据用户的语音信号, 可以通过语义 分析, 确定语音信号对应的第三方应用程序; 查找第三方应用注册表信息并 启动第三方程序, 从而无需用户通过点击第三方应用程序来启动对应程序, 为用户提供了更智能的服务, 方便了用户的使用。
图 8是本发明实施例 8的语音识别的交互设备 80的示意框图。 交互设 备 80包括语音识别单元 81、 解析单元 82、 确定单元 83、 启动单元 84、 存 储单元 85和生成单元 86, 以及发送单元 87和接收单元 88。 交互设备 80的 解析单元 82、 确定单元 83、 启动单元 84、 存储单元 85与交互设备 70的解 析单元 71、 确定单元 72、 启动单元 73和存储单元 74相同或相似, 不同之 处在于, 交互设备 80还包括语音识别单元 81和生成单元 86, 以及发送单元 87和接收单元 88。
语音识别单元 81转换第一语音信号得到一个或多个第一文本, 并发送 到所述解析单元 82。
生成单元 86 当所述交互设备在所述从第三方应用注册表中获取与所确 定的第三方应用对象相关联的第三方应用程序并启动所述第三方应用程序 之前, 生成所述第三方应用注册表并发送到所述存储单元 85。
解析单元 82具体转换第一语音信号得到至少两个第一文本; 按照预定 的评分规则,根据所述至少两个第一文本的语义的自然性及连贯性对所述至 少两个第一文本中的每个第一文本的语义进行评分, 所述评分的高低体现所 述语义的自然性及连贯性的优劣; 获取所述至少两个第一文本的语义评分最 高的所述第一文本的语义作为所述的第一目标语义。
解析单元 82具体获取所述至少两个第一文本的语义评分最高且超过预 设阈值的所述第一文本的语义作为所述的第一目标语义。
确定单元 83具体根据所述第一目标语义, 确定与所述第一目标语义相对应的所有第三方应用对象; 若所述第三方应用对象大于等于两个, 则获取所有与所述第一目标语义相关联的第三方应用对象信息; 发送单元 87发出提示信息, 所述提示信息包含所述第三方应用对象信息, 用以提示用户根据所述第三方应用对象信息发出进一步的选择指令, 以便确定单元 83从所有与所述第一目标语义相关联的第三方应用对象中选择目标第三方应用对象; 接收单元 88接收选择指令, 确定单元 83具体根据接收的所述选择指令选择相应的目标第三方应用对象; 启动单元具体从所述存储单元存储的第三方应用注册表中获取与所述目标第三方应用对象相关联的第三方应用程序并启动所述第三方应用程序。
语音识别单元 81还当所述的选择指令为第二语音信号时, 转换所述第 二语音信号得到第二文本并发送到所述解析单元 82; 解析单元 82还解析所 述第二文本获取第二目标语义; 确定单元 83根据所述第二目标语义确定相 应的目标第三方应用对象。
启动单元 84还在启动所述第三方应用程序后, 向所述第三方应用程序发送所述第一目标语义的语义命令, 以使所述第三方应用程序根据所述第一目标语义的语义命令调用与被启动的所述第三方应用程序相匹配的第三方应用接口 API。

交互设备 80实现了本发明实施例 1、 2或 3的语音识别方法, 出于简洁, 具体细节此处不再赘述。
本发明实施例利用语音识别功能, 根据用户的语音信号, 可以通过语义 分析, 确定语音信号对应的第三方应用程序; 查找第三方应用注册表信息并 启动第三方程序, 从而无需用户通过点击第三方应用程序来启动对应程序, 为用户提供了更智能的服务, 方便了用户的使用。
图 9是本发明实施例 9的语音识别的服务器 90的示意框图。 服务器 90 包括解析单元 91、 确定单元 92、 启动单元 93和存储单元 94等。 服务器 90 为交互设备提供支持服务。
解析单元 91解析一个或多个第一文本以得到第一目标语义, 其中所述 第一文本由转换第一语音信号获得。
确定单元 92根据解析单元所获取的第一目标语义确定与所述第一目标 语义相对应的第三方应用对象, 所述第三方应用对象相关联的第三方应用程 序是设备出厂时未经授信通过语音方式启动的程序。
启动单元 93从所述存储单元 94存储的第三方应用注册表中获取与所确定的第三方应用对象相关联的第三方应用程序并启动所述第三方应用程序。
存储单元 94存储所述第三方应用注册表和所述第三方应用程序。
服务器 90实现了本发明实施例 1的语音识别方法, 出于简洁, 具体细节此处不再赘述。
本发明实施例利用语音识别功能, 根据用户的语音信号, 可以通过语义 分析, 确定语音信号对应的第三方应用程序; 查找第三方应用注册表信息并 启动第三方程序, 从而无需用户通过点击第三方应用程序来启动对应程序, 为用户提供了更智能的服务, 方便了用户的使用。
图 10是本发明实施例 10的语音识别的服务器 100的示意框图。服务器 100包括解析单元 101、 确定单元 102、 启动单元 103、 存储单元 104和生成 单元 105 ,以及发送单元 106和接收单元 107等。服务器 100的解析单元 101、 确定单元 102、 启动单元 103和存储单元 104与服务器 90的解析单元 91、 确定单元 92、 启动单元 93和存储单元 94相同或相似。 不同的是, 服务器 100还包括生成单元 105以及发送单元 106和接收单元 107。
生成单元 105当所述服务器在所述从第三方应用注册表中获取与所确定 的第三方应用对象相关联的第三方应用程序并启动所述第三方应用程序之 前, 生成所述第三方应用注册表并发送到所述存储单元 104。
解析单元 101具体按照预定的评分规则,根据转换第一语音信号得到的 至少两个所述第一文本的语义的自然性及连贯性对所述至少两个第一文本 中的每个第一文本的语义进行评分,所述评分的高低体现所述语义的自然性 及连贯性的优劣; 获取所述至少两个第一文本的语义评分最高的所述第一文 本的语义作为所述的第一目标语义。
解析单元 101具体获取所述至少两个第一文本的语义评分最高且超过预 设阈值的所述第一文本的语义作为所述的第一目标语义。
确定单元 102还根据所述第一目标语义, 确定与所述第一目标语义相对应的所有第三方应用对象; 若所述第三方应用对象大于等于两个, 则获取所有与所述第一目标语义相关联的第三方应用对象信息; 发送单元 106发出提示信息, 所述提示信息包含所述第三方应用对象信息, 用以提示用户根据所述第三方应用对象信息发出进一步的选择指令, 以便确定单元 102从所有与所述第一目标语义相关联的第三方应用对象中选择目标第三方应用对象; 接收单元 107接收选择指令, 确定单元 102根据接收的所述选择指令选择相应的目标第三方应用对象; 启动单元 103从所述存储单元 104存储的第三方应用注册表中获取与所述目标第三方应用对象相关联的第三方应用程序并启动所述第三方应用程序。
解析单元 101还当所述的选择指令为第二语音信号时,解析所述第二文 本获取第二目标语义, 其中所述第二文本由转换所述第二语音信号获得; 确 定单元 102根据所述第二目标语义确定相应的目标第三方应用对象。
启动单元 103还在启动所述的第三方应用程序后, 向所述第三方应用程 序发送所述第一目标语义的语义命令, 以使所述第三方应用程序根据所述第 一目标语义的语义命令调用与被启动的所述第三方应用程序相匹配的第三 方应用接口 API。
启动单元 103在启动所述第三方应用程序后, 以使所述第三方应用程序 转换所述第三语音信号以得到第三文本; 根据所述第三文本的语义命令调用 与被启动的所述第三方应用程序相匹配的第三方应用接口 API, 以使所述被 启动的第三方应用程序通过所述 API获取所需的硬件或软件资源。
接收单元 107还在解析第一文本以得到第一目标语义前,接收由交互设 备发送的所述第一文本。 发送单元 106在启动所述第三方应用程序后向所述交互设备发送响应反 馈。
服务器 100实现了本发明实施例 1、 5或 6的语音识别方法, 出于简洁, 具体细节此处不再赘述。
本发明实施例利用语音识别功能, 根据用户的语音信号, 可以通过语义 分析, 确定语音信号对应的第三方应用程序; 查找第三方应用注册表信息并 启动第三方程序, 从而无需用户通过点击第三方应用程序来启动对应程序, 为用户提供了更智能的服务, 方便了用户的使用。
图 11是本发明实施例 11的语音识别的装置 110的示意框图。 装置 110 包括处理器 111和存储器 112等。
存储器 112可以是包括 RAM和 ROM、 或任何固定的存储介质、 或可移动的存储介质, 用于存储可以执行本发明实施例的程序, 或本发明实施例中待处理的数据, 或第三方应用程序等, 并可以通过如总线的方式与其他装置双向通信。
作为一种实现方式的装置 110, 处理器 111解析一个或多个第一文本以 得到第一目标语义, 所述第一文本由第一语音信号转换得到; 根据所获取的 第一目标语义确定与所述第一目标语义相对应的第三方应用对象, 所述第三 方应用对象相关联的第三方应用程序是设备出厂时未经授信通过语音方式 启动的程序; 访问存储器, 从所述存储器存储的第三方应用注册表中获取与 所确定的第三方应用对象相关联的第三方应用程序并启动所述第三方应用 程序。 其中存储器 112存储第三方应用注册表和所述第三方应用程序, 以及 处理器 111执行上述方法的指令。
当装置 110还可以包括麦克风等采集语音信号的器件或模块等时, 处理 器 111转换麦克风等采集的第一语音信号得到一个或多个第一文本。
此外, 处理器 111具体转换第一语音信号得到至少两个第一文本; 按照 预定的评分规则,根据所述至少两个第一文本的语义的自然性及连贯性对所 述至少两个第一文本中的每个第一文本的语义进行评分, 所述评分的高低体 现所述语义的自然性及连贯性的优劣; 获取所述至少两个第一文本的语义评 分最高的所述第一文本的语义作为所述的第一目标语义。
处理器 111具体获取所述至少两个第一文本的语义评分最高且超过预设 阈值的所述第一文本的语义作为所述的第一目标语义。
处理器 111还根据所述第一目标语义, 确定与所述第一目标语义相对应的所有第三方应用对象; 若所述第三方应用对象大于等于两个, 则获取所有与所述第一目标语义相关联的第三方应用对象信息; 发出提示信息, 所述提示信息包含所述第三方应用对象信息, 用以提示用户根据所述第三方应用对象信息发出进一步的选择指令, 以从所有与所述第一目标语义相关联的第三方应用对象中选择目标第三方应用对象; 接收选择指令, 并根据接收的所述选择指令选择相应的目标第三方应用对象; 从所述存储器 112存储的第三方应用注册表中获取与所述目标第三方应用对象相关联的第三方应用程序并启动所述第三方应用程序。
处理器 111还当所述的选择指令为第二语音信号时, 转换所述第二语音 信号得到第二文本; 解析所述第二文本获取第二目标语义; 根据所述第二目 标语义确定相应的目标第三方应用对象。
处理器 111在从第三方应用注册表中获取与所确定的第三方应用对象相 关联的第三方应用程序并启动第三方应用程序之前, 生成第三方应用注册 表。
处理器 111还在启动所述第三方应用程序后, 向所述第三方应用程序发 送所述第一目标语义的语义命令, 以使所述第三方应用程序根据所述第一目 标语义的语义命令调用与被启动的所述第三方应用程序相匹配的第三方应 用接口 API。
处理器 111当装置 110在所述从第三方应用注册表中获取与所确定的第 三方应用对象相关联的第三方应用程序并启动所述第三方应用程序之前, 生 成所述第三方应用注册表并发送到所述存储器 112。
作为另一种实现方式, 当装置 110不包括麦克风等采集语音信号的器件或模块等时, 处理器 111还在解析第一文本以得到第一目标语义前, 接收由另一装置发送的所述第一文本; 解析第一文本以得到第一目标语义, 其中可选的, 所述第一文本为由转换第一语音信号获得的至少两个第一文本; 根据所获取的第一目标语义确定与所述第一目标语义相对应的第三方应用对象, 从所述存储器 112存储的第三方应用注册表中获取与所确定的第三方应用对象相关联的第三方应用程序并启动所述第三方应用程序。 其中存储器 112存储所述第三方应用注册表和所述第三方应用程序, 以及处理器 111执行上述方法的指令。
处理器 111当装置 110安装第三方应用程序时或当装置 110开机后生成所述第三方应用注册表并发送到所述存储器 112。
处理器 111具体按照预定的评分规则, 根据转换第一语音信号得到的至 少两个第一文本的语义的自然性及连贯性对所述至少两个第一文本中的每 个第一文本的语义进行评分, 所述评分的高低体现所述语义的自然性及连贯 性的优劣; 获取所述至少两个第一文本的语义评分最高的所述第一文本的语 义作为所述的第一目标语义。
处理器 111具体获取所述至少两个第一文本的语义评分最高且超过预设 阈值的所述第一文本的语义作为所述的第一目标语义。
处理器 111还根据所述第一目标语义, 确定与所述第一目标语义相对应的所有第三方应用对象; 若所述第三方应用对象大于等于两个, 则获取所有与所述第一目标语义相关联的第三方应用对象信息; 发出提示信息, 所述提示信息包含所述第三方应用对象信息, 用以提示用户根据所述第三方应用对象信息发出进一步的选择指令, 以从所有与所述第一目标语义相关联的第三方应用对象中选择目标第三方应用对象; 接收选择指令, 并根据接收的所述选择指令选择相应的目标第三方应用对象; 从所述存储器 112存储的第三方应用注册表中获取与所述目标第三方应用对象相关联的第三方应用程序并启动所述第三方应用程序。
处理器 111还当所述的选择指令为第二语音信号时, 解析所述第二文本 获取第二目标语义, 其中所述第二文本由转换所述第二语音信号获得; 根据 所述第二目标语义确定相应的目标第三方应用对象。
处理器 111还启动所述的第三方应用程序后, 向所述第三方应用程序发 送所述第一目标语义的语义命令, 以使所述第三方应用程序根据所述第一目 标语义的语义命令调用与被启动的所述第三方应用程序相匹配的第三方应 用接口 API。
处理器 111在启动所述第三方应用程序后, 以使所述第三方应用程序转 换所述第三语音信号以得到第三文本; 根据所述第三文本的语义命令调用与 被启动的所述第三方应用程序相匹配的第三方应用接口 API , 以使所述被启 动的第三方应用程序通过所述 API获取所需的硬件或软件资源。
处理器 111 在启动所述第三方应用程序后向上述另一装置发送响应反 馈。
装置 110实现了本发明实施例 1、 2、 3、 5或 6中的语音识别的方法, 作为具体实现形式可以是例如移动终端、 平板电脑或服务器等。
本发明实施例利用语音识别功能, 根据用户的语音信号, 可以通过语义 分析, 确定语音信号对应的第三方应用程序; 查找第三方应用注册表信息并 启动第三方程序, 从而无需用户通过点击第三方应用程序来启动对应程序, 为用户提供了更智能的服务, 方便了用户的使用。
本发明实施例还提供了一种语音识别系统, 可以包括上述实施例中的装置 110、 交互设备 70、 交互设备 80、 服务器 90和服务器 100中任一装置。 出于简洁, 具体细节此处不再赘述。
本领域普通技术人员可以意识到, 结合本文中所公开的实施例描述的各 示例的单元及算法步骤, 能够以电子硬件、 或者计算机软件和电子硬件的结 合来实现。 这些功能究竟以硬件还是软件方式来执行, 取决于技术方案的特 定应用和设计约束条件。 专业技术人员可以对每个特定的应用来使用不同方 法来实现所描述的功能, 但是这种实现不应认为超出本发明的范围。
所属领域的技术人员可以清楚地了解到, 为描述的方便和简洁, 上述描述的系统、 装置和单元的具体工作过程, 可以参考前述方法实施例中的对应过程, 在此不再赘述。
在本申请所提供的几个实施例中, 应该理解到, 所揭露的系统、 装置和 方法, 可以通过其它的方式实现。 例如, 以上所描述的装置实施例仅仅是示 意性的, 例如, 所述单元的划分, 仅仅为一种逻辑功能划分, 实际实现时可 以有另外的划分方式, 例如多个单元或组件可以结合或者可以集成到另一个 系统, 或一些特征可以忽略, 或不执行。 另一点, 所显示或讨论的相互之间 的耦合或直接耦合或通信连接可以是通过一些接口, 装置或单元的间接耦合 或通信连接, 可以是电性, 机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作 为单元显示的部件可以是或者也可以不是物理单元, 即可以位于一个地方, 或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或 者全部单元来实现本实施例方案的目的。
另外, 在本发明各个实施例中的各功能单元可以集成在一个处理单元 中, 也可以是各个单元单独物理存在, 也可以两个或两个以上单元集成在一 个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使 用时, 可以存储在一个计算机可读取存储介质中。 基于这样的理解, 本发明 的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部 分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质 中, 包括若干指令用以使得一台计算机设备(可以是个人计算机, 服务器, 或者网络设备等)执行本发明各个实施例所述方法的全部或部分步骤。 而前 述的存储介质包括: U盘、移动硬盘、只读存储器( ROM , Read-Only Memory )、 随机存取存储器(RAM, Random Access Memory ), 磁碟或者光盘等各种可 以存储程序代码的介质。
以上所述, 仅为本发明的具体实施方式, 但本发明的保护范围并不局限于此, 任何熟悉本技术领域的技术人员在本发明揭露的技术范围内, 可轻易想到变化或替换, 都应涵盖在本发明的保护范围之内。 因此, 本发明的保护范围应以权利要求的保护范围为准。

Claims

权利要求
1、 一种语音识别的方法, 其特征在于:
解析一个或多个第一文本以得到第一目标语义, 所述第一文本由第一语 音信号转换得到;
根据所获取的第一目标语义确定与所述第一目标语义相对应的第三方 应用对象, 所述第三方应用对象相关联的第三方应用程序是设备出厂时未经 授信通过语音方式启动的程序;
从第三方应用注册表中获取与所确定的第三方应用对象相关联的第三 方应用程序并启动所述第三方应用程序。
2、 根据权利要求 1所述的方法, 其特征在于, 在解析所述第一文本以 得到第一目标语义之前, 所述方法还包括: 获取第一语音信号并转换所述第 一语音信号得到所述一个或多个第一文本。
3、 根据权利要求 1或 2所述的方法, 其特征在于, 所述转换所述第一 语音信号得到多个第一文本包括:
转换第一语音信号得到至少两个第一文本;
相应, 所述解析所述第一文本以得到第一目标语义包括:
按照预定的评分规则,根据所述至少两个第一文本的语义的自然性及连 贯性对所述至少两个第一文本中的每个第一文本的语义进行评分, 所述评分 的高低体现所述语义的自然性及连贯性的优劣;
获取所述至少两个第一文本的语义中评分最高的所述第一文本的语义 作为所述的第一目标语义。
4、 根据权利要求 3所述的方法, 其特征在于, 所述获取所述至少两个 第一文本的语义中评分最高的所述第一文本的语义作为所述的第一目标语 义, 包括:
获取所述至少两个第一文本的语义中评分最高且超过预设阈值的所述 第一文本的语义作为所述的第一目标语义。
5、 根据权利要求 1至 4任意一项所述的方法, 其特征在于, 所述根据 所获取的第一目标语义确定与所述第一目标语义相对应的第三方应用对象 具体包括:
根据所述第一目标语义,确定与所述第一目标语义相对应的所有第三方 应用对象; 若所述第三方应用对象大于等于两个, 则获取所有与所述第一目标语义 相关联的第三方应用对象信息;
发出提示信息, 所述提示信息包含所述第三方应用对象信息, 用以提示用户根据所述第三方应用对象信息发出进一步的选择指令, 以从所有与所述第一目标语义相关联的第三方应用对象中选择目标第三方应用对象;
接收选择指令, 并根据所述选择指令选择相应的目标第三方应用对象; 相应, 所述从第三方应用注册表中获取与所确定的第三方应用对象相关 联的第三方应用程序并启动所述第三方应用程序具体包括:
从第三方应用注册表中获取与所述目标第三方应用对象相关联的第三 方应用程序并启动所述第三方应用程序。
6、 根据权利要求 5所述的方法, 其特征在于, 所述的选择指令为第二 语音信号; 相应,
所述根据接收的所述选择指令选择相应的目标第三方应用对象具体包 括:
转换所述第二语音信号得到第二文本;
解析所述第二文本获取第二目标语义;
根据所述第二目标语义确定相应的目标第三方应用对象。
7、 根据权利要求 1至 6任一项所述的方法, 其特征在于, 在所述从第 三方应用注册表中获取与所确定的第三方应用对象相关联的第三方应用程 序并启动所述第三方应用程序之前, 所述方法还包括:
生成第三方应用注册表。
8、 根据权利要求 1至 7任意一项所述的方法, 其特征在于, 所述启动 所述的第三方应用程序后还包括:
向所述第三方应用程序发送所述第一目标语义的语义命令, 以使所述第 三方应用程序根据所述第一目标语义的语义命令调用与被启动的所述第三 方应用程序相匹配的第三方应用接口 API, 以使所述被启动的第三方应用程 序通过所述 API获取所需的硬件或软件资源。
9、 根据权利要求 1至 7任意一项所述的方法, 其特征在于, 所述启动 所述第三方应用程序后, 所述方法还包括:
所述第三方应用程序接收用户的第三语音信号;
转换所述第三语音信号以得到第三文本; 根据所述第三文本的语义命令调用与被启动的所述第三方应用程序相 匹配的第三方应用接口 API, 以使所述被启动的第三方应用程序通过所述 API获取所需的硬件或软件资源。
10、 根据权利要求 1至 9任意一项所述的方法, 其特征在于, 所述方法 由交互设备执行。
11、 根据权利要求 1至 9任意一项所述的方法, 其特征在于: 所述解析一个或多个第一文本以得到第一目标语义由服务器执行; 所述根据所获取的第一目标语义确定与所述第一目标语义相对应的第 三方应用对象, 以及所述从第三方应用注册表中获取与所确定的第三方应用 对象相关联的第三方应用程序并启动所述第三方应用程序由交互设备执行。
12、 一种语音识别的交互设备, 其特征在于, 所述交互设备包括解析单 元、 确定单元、 启动单元和存储单元:
所述解析单元, 用于解析一个或多个第一文本以得到第一目标语义, 所 述第一文本由第一语音信号转换得到;
所述确定单元, 用于根据所述解析单元所获取的第一目标语义确定与所 述第一目标语义相对应的第三方应用对象, 所述第三方应用对象相关联的第 三方应用程序是设备出厂时未经授信通过语音方式启动的程序;
所述启动单元, 用于访问所述存储单元, 从所述存储单元存储的第三方 应用注册表中获取与所述确定单元所确定的第三方应用对象相关联的第三 方应用程序并启动所述第三方应用程序;
所述存储单元, 用于存储第三方应用注册表和所述第三方应用程序。
13、 根据权利要求 12所述的交互设备, 其特征在于, 所述交互设备还包括语音识别单元:
所述语音识别单元, 用于转换第一语音信号得到一个或多个第一文本, 并发送到所述解析单元。
14、 根据权利要求 12所述的交互设备, 其特征在于, 所述交互设备还 包括语音识别单元:
所述语音识别单元用于转换第一语音信号得到至少两个第一文本并发 送到所述解析单元;
所述解析单元具体用于按照预定的评分规则,根据所述至少两个第一文 本的语义的自然性及连贯性对所述至少两个第一文本中的每个第一文本的 语义进行评分, 所述评分的高低体现所述语义的自然性及连贯性的优劣; 获 取所述至少两个第一文本的语义评分最高的所述第一文本的语义作为所述 的第一目标语义。
15、 根据权利要求 14所述的交互设备, 其特征在于, 所述解析单元具 体用于:
获取所述至少两个第一文本的语义评分最高且超过预设阈值的所述第 一文本的语义作为所述的第一目标语义。
16、 根据权利要求 12至 15任意一项所述的交互设备, 其特征在于, 所 述交互设备还包括发送单元和接收单元:
所述确定单元, 具体用于根据所述第一目标语义, 确定与所述第一目标 语义相对应的所有第三方应用对象; 若所述第三方应用对象大于等于两个, 则获取所有与所述第一目标语义相关联的第三方应用对象信息;
所述发送单元, 用于发出提示信息, 所述提示信息包含所述第三方应用对象信息, 用以提示用户根据所述第三方应用对象信息发出进一步的选择指令, 以便所述确定单元从所有与所述第一目标语义相关联的第三方应用对象中选择目标第三方应用对象;
所述接收单元, 用于接收用户发送的选择指令;
所述确定单元, 具体用于根据接收的所述选择指令选择相应的目标第三 方应用对象; 所述启动单元从所述存储单元存储的第三方应用注册表中获取 与所述目标第三方应用对象相关联的第三方应用程序并启动所述第三方应 用程序。
17、 根据权利要求 16所述的交互设备, 其特征在于:
所述语音识别单元还用于当所述的选择指令为第二语音信号时,转换所 述第二语音信号得到第二文本并发送到所述解析单元;
所述解析单元还用于解析所述第二文本获取第二目标语义; 所述确定单 元还用于根据所述第二目标语义确定相应的目标第三方应用对象。
18、 根据权利要求 12至 17任一项所述的交互设备, 其特征在于, 所述 交互设备还包括生成单元:
所述生成单元在所述启动单元从第三方应用注册表中获取与所确定的 第三方应用对象相关联的第三方应用程序并启动所述第三方应用程序之前, 生成所述第三方应用注册表并发送到所述存储单元。
19、 根据权利要求 12至 18任一项所述的交互设备, 其特征在于: 所述启动单元启动所述第三方应用程序后, 向所述第三方应用程序发送所述第一目标语义的语义命令, 以使所述第三方应用程序根据所述第一目标语义的语义命令调用与被启动的所述第三方应用程序相匹配的第三方应用接口 API。
20、 一种语音识别的服务器, 其特征在于, 包括解析单元、 确定单元、 启动单元和存储单元:
所述解析单元, 用于解析一个或多个第一文本以得到第一目标语义, 其 中所述第一文本由转换第一语音信号获得;
所述确定单元, 用于根据所述解析单元所获取的第一目标语义确定与所 述第一目标语义相对应的第三方应用对象, 所述第三方应用对象相关联的第 三方应用程序是设备出厂时未经授信通过语音方式启动的程序;
所述启动单元, 用于从所述存储单元存储的第三方应用注册表中获取与 所述确定单元所确定的第三方应用对象相关联的第三方应用程序并启动所 述第三方应用程序;
所述存储单元用于存储所述第三方应用注册表和所述第三方应用程序。
21、 根据权利要求 20所述的服务器, 其特征在于, 所述解析单元具体 用于:
按照预定的评分规则,根据转换第一语音信号得到的至少两个所述第一 文本的语义的自然性及连贯性对所述至少两个第一文本中的每个第一文本 的语义进行评分, 所述评分的高低体现所述语义的自然性及连贯性的优劣; 获取所述至少两个第一文本的语义评分最高的所述第一文本的语义作为所 述的第一目标语义。
22、 根据权利要求 21所述的服务器, 其特征在于, 所述解析单元具体 用于:
获取所述至少两个第一文本的语义评分最高且超过预设阈值的所述第 一文本的语义作为所述的第一目标语义。
23、 根据权利要求 20至 22任意一项所述的服务器, 其特征在于, 所述 服务器还包括发送单元和接收单元:
所述确定单元, 具体用于根据所述第一目标语义, 确定与所述第一目标语义相对应的所有第三方应用对象; 若所述第三方应用对象大于等于两个, 则获取所有与所述第一目标语义相关联的第三方应用对象信息; 所述发送单元, 用于发出提示信息, 所述提示信息包含所述第三方应用对象信息, 用以提示用户根据所述第三方应用对象信息发出进一步的选择指令, 以便所述确定单元从所有与所述第一目标语义相关联的第三方应用对象中选择目标第三方应用对象;
所述接收单元, 用于接收用户发送的选择指令;
所述确定单元, 具体用于根据接收的所述选择指令选择相应的目标第三 方应用对象; 所述启动单元从所述存储单元存储的第三方应用注册表中获取 与所述目标第三方应用对象相关联的第三方应用程序并启动所述第三方应 用程序。
24、 根据权利要求 23所述的服务器, 其特征在于:
所述解析单元还用于当所述的选择指令为第二语音信号时,解析所述第 二文本获取第二目标语义, 其中所述第二文本由转换所述第二语音信号获 得; 所述确定单元具体用于根据所述第二目标语义确定相应的目标第三方应 用对象。
25、 根据权利要求 20至 24任一项所述的服务器, 其特征在于, 所述服 务器还包括生成单元:
所述生成单元在所述启动单元从第三方应用注册表中获取与所确定的 第三方应用对象相关联的第三方应用程序并启动所述第三方应用程序之前, 生成所述第三方应用注册表并发送到所述存储单元。
26、 根据权利要求 20至 25任一项所述的服务器, 其特征在于:
所述启动单元启动所述的第三方应用程序后, 向所述第三方应用程序发 送所述第一目标语义的语义命令, 以使所述第三方应用程序根据所述第一目 标语义的语义命令调用与被启动的所述第三方应用程序相匹配的第三方应 用接口 API。
27、 根据权利要求 20至 26任意一项所述的服务器, 其特征在于: 所述启动单元启动所述第三方应用程序后, 以使所述第三方应用程序转 换所述第三语音信号以得到第三文本; 根据所述第三文本的语义命令调用与 被启动的所述第三方应用程序相匹配的第三方应用接口 API , 以使所述被启 动的第三方应用程序通过所述 API获取所需的硬件或软件资源。
28、 根据权利要求 20至 27任意一项所述的服务器, 其特征在于, 所述 接收单元还用于在解析第一文本以得到第一目标语义前,接收由交互设备发 送的所述第一文本。
29、 根据权利要求 20至 28任意一项所述的服务器, 其特征在于: 所述发送单元还用于在启动所述第三方应用程序后向所述交互设备发 送响应反馈。
30、 一种语音识别的系统, 其特征在于, 包括如权利要求 12至 19任一 项所述的交互设备或如权利要求 20至 29任一项所述的服务器。
PCT/CN2013/090033 2013-01-06 2013-12-20 语音识别的方法、交互设备、服务器和系统 WO2014106433A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
KR1020157020795A KR101838095B1 (ko) 2013-01-06 2013-12-20 음성 인식을 위한 방법, 상호 작용 장치, 서버 및 시스템
US14/792,502 US10229684B2 (en) 2013-01-06 2015-07-06 Method, interaction device, server, and system for speech recognition
US16/256,872 US10971156B2 (en) 2013-01-06 2019-01-24 Method, interaction device, server, and system for speech recognition
US17/190,218 US11676605B2 (en) 2013-01-06 2021-03-02 Method, interaction device, server, and system for speech recognition

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201310003504.4A CN103915095B (zh) 2013-01-06 2013-01-06 语音识别的方法、交互设备、服务器和系统
CN201310003504.4 2013-01-06

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US14/792,502 Continuation US10229684B2 (en) 2013-01-06 2015-07-06 Method, interaction device, server, and system for speech recognition

Publications (1)

Publication Number Publication Date
WO2014106433A1 true WO2014106433A1 (zh) 2014-07-10

Family

ID=51040723

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2013/090033 WO2014106433A1 (zh) 2013-01-06 2013-12-20 语音识别的方法、交互设备、服务器和系统

Country Status (4)

Country Link
US (3) US10229684B2 (zh)
KR (1) KR101838095B1 (zh)
CN (1) CN103915095B (zh)
WO (1) WO2014106433A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109408679A (zh) * 2018-09-28 2019-03-01 平安科技(深圳)有限公司 智能管理应用程序的方法、装置、电子设备及存储介质
CN111583956A (zh) * 2020-04-30 2020-08-25 联想(北京)有限公司 语音处理方法和装置
US11482218B2 (en) * 2019-01-22 2022-10-25 Beijing Boe Technology Development Co., Ltd. Voice control method, voice control device, and computer-executable non-volatile storage medium

Families Citing this family (28)

Publication number Priority date Publication date Assignee Title
JP6491326B2 (ja) 2014-09-19 2019-03-27 華為技術有限公司Huawei Technologies Co.,Ltd. アプリケーションプログラムを実行するための方法および装置
CN104318924A (zh) * 2014-11-12 2015-01-28 沈阳美行科技有限公司 一种实现语音识别功能的方法
CN104683456B (zh) * 2015-02-13 2017-06-23 腾讯科技(深圳)有限公司 业务处理方法、服务器及终端
US20170147286A1 (en) * 2015-11-20 2017-05-25 GM Global Technology Operations LLC Methods and systems for interfacing a speech dialog with new applications
CN105912725A (zh) * 2016-05-12 2016-08-31 上海劲牛信息技术有限公司 一种通过自然语言交互调用海量智慧应用的系统
CN106373571A (zh) * 2016-09-30 2017-02-01 北京奇虎科技有限公司 一种语音控制方法及装置
CN106486119B (zh) * 2016-10-20 2019-09-20 海信集团有限公司 一种识别语音信息的方法和装置
CN107122179A (zh) * 2017-03-31 2017-09-01 阿里巴巴集团控股有限公司 语音的功能控制方法和装置
CN108288468B (zh) * 2017-06-29 2019-07-19 腾讯科技(深圳)有限公司 语音识别方法及装置
CN107492374A (zh) * 2017-10-11 2017-12-19 深圳市汉普电子技术开发有限公司 一种语音控制方法、智能设备及存储介质
CN107783705A (zh) * 2017-10-20 2018-03-09 珠海市魅族科技有限公司 展示应用程序的方法、装置、计算机装置及存储介质
CN107944954A (zh) * 2017-11-15 2018-04-20 联想(北京)有限公司 信息处理方法及其装置
KR102617265B1 (ko) 2018-03-13 2023-12-26 삼성전자주식회사 사용자 음성 입력을 처리하는 장치
CN110286955A (zh) * 2018-03-16 2019-09-27 青岛海尔多媒体有限公司 应用程序启动的方法、装置及计算机可读存储介质
JP7111818B2 (ja) 2018-03-21 2022-08-02 グーグル エルエルシー セキュアな処理環境内のデータ転送
CN108847242B (zh) * 2018-05-30 2021-05-25 Oppo广东移动通信有限公司 电子设备控制方法、装置、存储介质及电子设备
US10811009B2 (en) * 2018-06-27 2020-10-20 International Business Machines Corporation Automatic skill routing in conversational computing frameworks
CN109166582A (zh) * 2018-10-16 2019-01-08 深圳供电局有限公司 一种语音识别的自动控制系统及方法
CN109147775A (zh) * 2018-10-18 2019-01-04 深圳供电局有限公司 一种基于神经网络的语音识别方法及装置
CN109872714A (zh) * 2019-01-25 2019-06-11 广州富港万嘉智能科技有限公司 一种提高语音识别准确性的方法、电子设备及存储介质
CN110310648A (zh) * 2019-05-21 2019-10-08 深圳壹账通智能科技有限公司 移动终端的控制方法、装置、移动终端及可读存储介质
CN110503959B (zh) * 2019-09-03 2022-02-22 腾讯科技(深圳)有限公司 语音识别数据分发方法、装置、计算机设备及存储介质
CN111629164B (zh) * 2020-05-29 2021-09-14 联想(北京)有限公司 一种视频录制生成方法及电子设备
CN111968640A (zh) * 2020-08-17 2020-11-20 北京小米松果电子有限公司 语音控制方法、装置、电子设备及存储介质
US11430467B1 (en) * 2020-09-21 2022-08-30 Amazon Technologies, Inc. Interaction emotion determination
CN113012695B (zh) * 2021-02-18 2022-11-25 北京百度网讯科技有限公司 智能控制方法、装置、电子设备及计算机可读存储介质
US11978449B2 (en) * 2021-03-02 2024-05-07 Samsung Electronics Co., Ltd. Electronic device for processing user utterance and operation method therefor
CN113496703A (zh) * 2021-07-23 2021-10-12 北京百度网讯科技有限公司 通过语音方式控制程序的方法、设备及程序产品

Citations (12)

Publication number Priority date Publication date Assignee Title
CN1389852A (zh) * 2001-06-06 2003-01-08 松下电器产业株式会社 使用语音识别和自然语言对家居活动的自动控制
CN101067780A (zh) * 2007-06-21 2007-11-07 腾讯科技(深圳)有限公司 智能设备的文字输入系统及方法
CN101276586A (zh) * 2007-03-26 2008-10-01 京达国际科技股份有限公司 语音控制装置及方法
CN101795318A (zh) * 2009-01-05 2010-08-04 三星电子株式会社 移动终端和用于提供该移动终端的应用程序的方法
CN102394976A (zh) * 2011-11-01 2012-03-28 宇龙计算机通信科技(深圳)有限公司 基于联系人启动通讯应用的方法及移动终端
CN102520788A (zh) * 2011-11-16 2012-06-27 歌尔声学股份有限公司 一种语音识别控制方法
CN102541574A (zh) * 2010-12-13 2012-07-04 鸿富锦精密工业(深圳)有限公司 应用程序开启系统及方法
KR20120090151A (ko) * 2011-02-05 2012-08-17 박재현 음성인식기술을 이용한 스마트폰에서의 어플리케이션 실행 방법
CN202533866U (zh) * 2011-11-16 2012-11-14 歌尔声学股份有限公司 一种语音识别控制系统
CN102830915A (zh) * 2012-08-02 2012-12-19 聚熵信息技术(上海)有限公司 语义输入控制系统及其方法
CN102868827A (zh) * 2012-09-15 2013-01-09 潘天华 一种利用语音命令控制手机应用程序启动的方法
CN103024169A (zh) * 2012-12-10 2013-04-03 深圳市永利讯科技股份有限公司 一种通讯终端应用程序的语音启动方法和装置

Family Cites Families (33)

Publication number Priority date Publication date Assignee Title
US6839669B1 (en) * 1998-11-05 2005-01-04 Scansoft, Inc. Performing actions identified in recognized speech
US6195636B1 (en) * 1999-02-19 2001-02-27 Texas Instruments Incorporated Speech recognition over packet networks
US6330537B1 (en) * 1999-08-26 2001-12-11 Matsushita Electric Industrial Co., Ltd. Automatic filtering of TV contents using speech recognition and natural language
CA2387079C (en) * 1999-10-19 2011-10-18 Sony Electronics Inc. Natural language interface control system
US6748361B1 (en) * 1999-12-14 2004-06-08 International Business Machines Corporation Personal speech assistant supporting a dialog manager
US7203645B2 (en) * 2001-04-27 2007-04-10 Intel Corporation Speech recognition system loading different recognition engines for different applications
US6801604B2 (en) * 2001-06-25 2004-10-05 International Business Machines Corporation Universal IP-based and scalable architectures across conversational applications using web services for speech and audio processing resources
US7167831B2 (en) 2002-02-04 2007-01-23 Microsoft Corporation Systems and methods for managing multiple grammars in a speech recognition system
US7398209B2 (en) * 2002-06-03 2008-07-08 Voicebox Technologies, Inc. Systems and methods for responding to natural language speech utterance
US7519534B2 (en) * 2002-10-31 2009-04-14 Agiletv Corporation Speech controlled access to content on a presentation medium
US7461352B2 (en) * 2003-02-10 2008-12-02 Ronald Mark Katsuranis Voice activated system and methods to enable a computer user working in a first graphical application window to display and control on-screen help, internet, and other information content in a second graphical application window
US7356472B2 (en) * 2003-12-11 2008-04-08 International Business Machines Corporation Enabling speech within a multimodal program using markup
US8706501B2 (en) * 2004-12-09 2014-04-22 Nuance Communications, Inc. Method and system for sharing speech processing resources over a communication network
US7409344B2 (en) * 2005-03-08 2008-08-05 Sap Aktiengesellschaft XML based architecture for controlling user interfaces with contextual voice commands
DE102005016853A1 (de) * 2005-04-12 2006-10-19 Siemens Ag Verfahren zur Kontrolle von sprachgesteuerten Applikationen und zugehöriges Kontrollsystem
EP1920588A4 (en) * 2005-09-01 2010-05-12 Vishal Dhawan PLATFORM OF NETWORKS OF VOICE APPLICATIONS
US9456068B2 (en) * 2005-09-01 2016-09-27 Xtone, Inc. System and method for connecting a user to business services
US7899673B2 (en) * 2006-08-09 2011-03-01 Microsoft Corporation Automatic pruning of grammars in a multi-application speech recognition interface
US7742922B2 (en) * 2006-11-09 2010-06-22 Goller Michael D Speech interface for search engines
US8000969B2 (en) * 2006-12-19 2011-08-16 Nuance Communications, Inc. Inferring switching conditions for switching between modalities in a speech application environment extended for interactive text exchanges
US20090327979A1 (en) * 2008-06-30 2009-12-31 Nokia Corporation User interface for a peripheral device
US8811965B2 (en) * 2008-10-14 2014-08-19 Todd Michael Cohan System and method for automatic data security back-up and control for mobile devices
US8639513B2 (en) * 2009-08-05 2014-01-28 Verizon Patent And Licensing Inc. Automated communication integrator
US8786664B2 (en) * 2010-04-28 2014-07-22 Qualcomm Incorporated System and method for providing integrated video communication applications on a mobile computing device
US8731939B1 (en) * 2010-08-06 2014-05-20 Google Inc. Routing queries based on carrier phrase registration
US20120059655A1 (en) * 2010-09-08 2012-03-08 Nuance Communications, Inc. Methods and apparatus for providing input to a speech-enabled application program
KR20120063372A (ko) 2010-12-07 2012-06-15 현대자동차주식회사 추상화 api 층위를 이용한 독립형 음성인식 방법 및 시스템
US20120209413A1 (en) * 2011-02-14 2012-08-16 Microsoft Corporation Background Audio on Mobile Devices
US8924219B1 (en) * 2011-09-30 2014-12-30 Google Inc. Multi hotword robust continuous voice command detection in mobile devices
US9292253B2 (en) * 2012-08-02 2016-03-22 Nuance Communications, Inc. Methods and apparatus for voiced-enabling a web application
US10276157B2 (en) * 2012-10-01 2019-04-30 Nuance Communications, Inc. Systems and methods for providing a voice agent user interface
US20140108010A1 (en) * 2012-10-11 2014-04-17 Intermec Ip Corp. Voice-enabled documents for facilitating operational procedures
US9741343B1 (en) * 2013-12-19 2017-08-22 Amazon Technologies, Inc. Voice interaction application selection

Patent Citations (12)

Publication number Priority date Publication date Assignee Title
CN1389852A (zh) * 2001-06-06 2003-01-08 松下电器产业株式会社 使用语音识别和自然语言对家居活动的自动控制
CN101276586A (zh) * 2007-03-26 2008-10-01 京达国际科技股份有限公司 语音控制装置及方法
CN101067780A (zh) * 2007-06-21 2007-11-07 腾讯科技(深圳)有限公司 智能设备的文字输入系统及方法
CN101795318A (zh) * 2009-01-05 2010-08-04 三星电子株式会社 移动终端和用于提供该移动终端的应用程序的方法
CN102541574A (zh) * 2010-12-13 2012-07-04 鸿富锦精密工业(深圳)有限公司 应用程序开启系统及方法
KR20120090151A (ko) * 2011-02-05 2012-08-17 박재현 음성인식기술을 이용한 스마트폰에서의 어플리케이션 실행 방법
CN102394976A (zh) * 2011-11-01 2012-03-28 宇龙计算机通信科技(深圳)有限公司 基于联系人启动通讯应用的方法及移动终端
CN102520788A (zh) * 2011-11-16 2012-06-27 歌尔声学股份有限公司 一种语音识别控制方法
CN202533866U (zh) * 2011-11-16 2012-11-14 歌尔声学股份有限公司 一种语音识别控制系统
CN102830915A (zh) * 2012-08-02 2012-12-19 聚熵信息技术(上海)有限公司 语义输入控制系统及其方法
CN102868827A (zh) * 2012-09-15 2013-01-09 潘天华 一种利用语音命令控制手机应用程序启动的方法
CN103024169A (zh) * 2012-12-10 2013-04-03 深圳市永利讯科技股份有限公司 一种通讯终端应用程序的语音启动方法和装置

Cited By (4)

Publication number Priority date Publication date Assignee Title
CN109408679A (zh) * 2018-09-28 2019-03-01 平安科技(深圳)有限公司 Method and apparatus for intelligent application management, electronic device, and storage medium
US11482218B2 (en) * 2019-01-22 2022-10-25 Beijing Boe Technology Development Co., Ltd. Voice control method, voice control device, and computer-executable non-volatile storage medium
CN111583956A (zh) * 2020-04-30 2020-08-25 联想(北京)有限公司 Speech processing method and device
CN111583956B (zh) * 2020-04-30 2024-03-26 联想(北京)有限公司 Speech processing method and device

Also Published As

Publication number Publication date
KR101838095B1 (ko) 2018-03-13
US11676605B2 (en) 2023-06-13
US10229684B2 (en) 2019-03-12
CN103915095B (zh) 2017-05-31
US20210366483A1 (en) 2021-11-25
US20190156833A1 (en) 2019-05-23
CN103915095A (zh) 2014-07-09
US10971156B2 (en) 2021-04-06
KR20150103222A (ko) 2015-09-09
US20150310864A1 (en) 2015-10-29

Similar Documents

Publication Publication Date Title
US11676605B2 (en) Method, interaction device, server, and system for speech recognition
US10509829B2 (en) Contextual search using natural language
US10115396B2 (en) Content streaming system
KR102490776B1 (ko) Techniques for completing tasks headlessly within a digital personal assistant
JP2023115067A (ja) Voice user interface shortcuts for assistant applications
US20190034542A1 (en) Intelligent agent system and method of accessing and delivering digital files
KR102043365B1 (ko) Local maintenance of data for voice actions that are selectively executable offline on a speech-recognition electronic device
WO2018077214A1 (zh) Information search method and apparatus
US20120059655A1 (en) Methods and apparatus for providing input to a speech-enabled application program
KR20190012255A (ko) Providing a personal assistant module with a selectively steerable state machine
KR20180115937A (ko) Interactive content provision system
US20180329588A1 (en) Predictive Determination of Actions
MX2008008855A (es) Social interaction system
JP2017517904A (ja) Video-based interaction method, terminal, server, and system
TW201344479A (zh) Method for providing suggestions during a conversation, electronic device using the method, and computer program product
US10594840B1 (en) Bot framework for channel agnostic applications
WO2017156934A1 (zh) Intelligent interconnection method and intelligent terminal
WO2014194647A1 (en) Data exchange method, device, and system for group communication
WO2022206704A1 (zh) Voice interaction method and electronic device
US10997963B1 (en) Voice based interaction based on context-based directives
WO2017000929A1 (zh) Client-based media information delivery method and apparatus
WO2018145574A1 (zh) Information processing method and apparatus, terminal, server, and storage medium
KR20130125064A (ko) Voice call processing method for launching a related application through keyword speech recognition, and mobile terminal executing the same
WO2017193544A1 (zh) Resource download method, apparatus, and electronic device
EP3972263A1 (en) System and method to control a media client using a message service

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13869990

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 20157020795

Country of ref document: KR

Kind code of ref document: A

122 Ep: pct application non-entry in european phase

Ref document number: 13869990

Country of ref document: EP

Kind code of ref document: A1