WO2019218903A1 - Method and apparatus for voice control - Google Patents

Method and apparatus for voice control

Info

Publication number
WO2019218903A1
Authority
WO
WIPO (PCT)
Prior art keywords
text data
keyword
voice
action
user
Application number
PCT/CN2019/085905
Other languages
English (en)
French (fr)
Inventor
李鹏 (Li Peng)
罗永浩 (Luo Yonghao)
Original Assignee
北京字节跳动网络技术有限公司 (Beijing Bytedance Network Technology Co., Ltd.)
Application filed by 北京字节跳动网络技术有限公司 (Beijing Bytedance Network Technology Co., Ltd.)
Publication of WO2019218903A1
Priority claimed by US application 17/020,509, published as US2020/0411008A1

Classifications

    • G10L 15/26: Speech to text systems (under G10L 15/00, Speech recognition; G10L covers speech analysis or synthesis, speech recognition, and speech or voice processing techniques)
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223: Execution procedure of a spoken command
    • G06F 3/167: Audio in a user interface, e.g. using voice commands for navigating, audio feedback (under G06F 3/16, Sound input; sound output)
    • G06F 40/35: Discourse or dialogue representation (under G06F 40/30, Semantic analysis of natural language data)


Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A method and apparatus for voice control. The method includes: receiving voice data in response to a trigger operation on an interactive interface, the trigger operation being an operation recognized by a client on the interactive interface as triggering voice control (201); converting the received voice data into text data (202); generating a control instruction based on the converted text data (203); and executing the generated control instruction (204). While interacting with a client, the user can trigger the input of voice data directly from any area of the interactive interface, without being restricted to a dedicated voice input interface. The user therefore no longer needs to perform additional operations to switch the terminal's display from the interactive interface to a voice input interface, which reduces the number of steps the user must perform, improves the efficiency of interaction between the user and the client, and improves the user experience.

Description

Method and apparatus for voice control
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to Chinese Patent Application No. 201810456387.X, filed on May 14, 2018 and entitled "Method and Apparatus for Voice Control", which is incorporated herein by reference in its entirety.
TECHNICAL FIELD
The present application relates to the field of voice control technologies, and in particular to a method and apparatus for voice control.
BACKGROUND
With the development of technology, interacting with applications on a smart terminal by voice has become increasingly popular with users. In an existing voice interaction process, the user starts a voice control service by clicking a control of that service; the smart terminal then presents a voice input interface, on which the user speaks to input voice data, so that the smart terminal operates the corresponding application according to the voice data input by the user, thereby implementing various interactions between the user and the applications on the smart terminal.
However, every time the user interacts with an application, the smart terminal must first present the voice input interface before voice interaction can take place. The smart terminal therefore cannot engage in voice interaction with the user quickly, and the user experience is poor.
SUMMARY
In view of this, embodiments of the present application provide a method and apparatus for voice control, so as to improve the efficiency of voice interaction between a user and a smart terminal.
To solve the above problem, the technical solutions provided by the embodiments of the present application are as follows:
In a first aspect, an embodiment of the present application provides a method for voice control, the method including:
receiving voice data in response to a trigger operation on an interactive interface, the trigger operation being an operation recognized by a client on the interactive interface as triggering voice control;
converting the voice data into text data;
generating a control instruction based on the text data; and
executing the control instruction.
In some possible implementations, converting the voice data into text data includes:
converting the voice data into initial text data; and
adjusting the initial text data by performing semantic analysis on it, and using the adjusted initial text data as the text data.
In some possible implementations, generating a control instruction based on the text data includes:
matching the text data against preset instruction-type text data, and generating the control instruction based on the matched instruction-type text data.
In some possible implementations, the method further includes:
determining action keywords and/or object keywords in the adjusted initial text data by performing semantic analysis on the initial text data; and
generating a control instruction based on the text data includes:
generating the control instruction based on the action keywords and/or object keywords.
In some possible implementations, where the text data includes an action keyword and an object keyword, matching the text data against the preset instruction-type text data and generating the control instruction based on the matched instruction-type text data includes:
matching the action keyword in the text data against the action keywords in the preset instruction-type text data to determine a first action keyword, the first action keyword being the action keyword matched in the preset instruction-type text data;
matching the object keyword in the text data against the object keywords in the preset instruction-type text data to determine a first object keyword, the first object keyword being the object keyword matched in the preset instruction-type text data; and
generating the control instruction based on the first action keyword and the first object keyword.
In some possible implementations, where the text data includes an action keyword, matching the text data against the preset instruction-type text data and generating the control instruction based on the matched instruction-type text data includes:
matching the action keyword in the text data against the action keywords in the preset instruction-type text data to determine a second action keyword, the second action keyword being the action keyword matched in the preset instruction-type text data;
determining a second object keyword according to the operation object of the trigger operation; and
generating the control instruction based on the second action keyword and the second object keyword.
In some possible implementations, where the text data includes an object keyword, matching the text data against the preset instruction-type text data and generating the control instruction based on the matched instruction-type text data includes:
matching the object keyword in the text data against the object keywords in the preset instruction-type text data to determine a third object keyword, the third object keyword being the object keyword matched in the preset instruction-type text data;
determining a third action keyword according to the third object keyword; and
generating the control instruction based on the third action keyword and the third object keyword.
In some possible implementations, generating a control instruction based on the text data includes:
performing semantic analysis on the text data to determine a fourth action keyword;
determining a fourth object keyword according to the operation object of the trigger operation; and
generating the control instruction based on the fourth action keyword and the fourth object keyword.
In some possible implementations, the method further includes:
presenting a voice-entry popup window,
where the presentation form of the voice-entry popup window while the voice data is being received differs from its presentation form while no voice data is being received. In some possible implementations, determining the third action keyword according to the third object keyword includes: determining the action keyword with the highest applicability to the third object keyword as the third action keyword.
In a second aspect, an embodiment of the present application further provides an apparatus for voice control, the apparatus including:
a receiving module, configured to receive voice data in response to a trigger operation on an interactive interface, the trigger operation being an operation recognized by a client on the interactive interface as triggering voice control;
a conversion module, configured to convert the voice data into text data;
a generating module, configured to generate a control instruction based on the text data; and
an execution module, configured to execute the control instruction.
In some possible implementations, the conversion module includes:
a conversion unit, configured to convert the voice data into initial text data; and
an adjusting unit, configured to adjust the initial text data by performing semantic analysis on it, and to use the adjusted initial text data as the text data.
In some possible implementations, the generating module is further configured to:
match the text data against preset instruction-type text data, and generate the control instruction based on the matched instruction-type text data.
In some possible implementations, the apparatus further includes:
a determining module, configured to determine action keywords and/or object keywords in the adjusted initial text data by performing semantic analysis on the initial text data; and the generating module is further configured to generate the control instruction based on the action keywords and/or object keywords.
In some possible implementations, where the text data includes an action keyword and an object keyword, the generating module includes:
a first matching unit, configured to match the action keyword in the text data against the action keywords in the preset instruction-type text data to determine a first action keyword, the first action keyword being the action keyword matched in the preset instruction-type text data;
a second matching unit, configured to match the object keyword in the text data against the object keywords in the preset instruction-type text data to determine a first object keyword, the first object keyword being the object keyword matched in the preset instruction-type text data; and
a first generating unit, configured to generate the control instruction based on the first action keyword and the first object keyword.
In some possible implementations, where the text data includes an action keyword, the generating module includes:
a third matching unit, configured to match the action keyword in the text data against the action keywords in the preset instruction-type text data to determine a second action keyword, the second action keyword being the action keyword matched in the preset instruction-type text data;
a first determining unit, configured to determine a second object keyword according to the operation object of the trigger operation; and
a second generating unit, configured to generate the control instruction based on the second action keyword and the second object keyword.
In some possible implementations, where the text data includes an object keyword, the generating module includes:
a fourth matching unit, configured to match the object keyword in the text data against the object keywords in the preset instruction-type text data to determine a third object keyword, the third object keyword being the object keyword matched in the preset instruction-type text data;
a second determining unit, configured to determine a third action keyword according to the third object keyword; and
a third generating unit, configured to generate the control instruction based on the third action keyword and the third object keyword.
In some possible implementations, the generating module includes:
a third determining unit, configured to perform semantic analysis on the text data to determine a fourth action keyword;
a fourth determining unit, configured to determine a fourth object keyword according to the operation object of the trigger operation; and
a fourth generating unit, configured to generate the control instruction based on the fourth action keyword and the fourth object keyword.
In some possible implementations, the apparatus further includes:
a presentation module, configured to present a voice-entry popup window,
where the presentation form of the voice-entry popup window while the voice data is being received differs from its presentation form while no voice data is being received.
In some possible implementations, the second determining unit is further configured to determine the action keyword with the highest applicability to the third object keyword as the third action keyword.
It can thus be seen that the embodiments of the present application have the following beneficial effects:
In the embodiments of the present application, the reception of voice data is triggered by a trigger operation recognized by the client, which reduces the operation steps the user must perform and thereby improves the efficiency of interaction between the user and the client. Specifically, when the user needs to interact with a client on a terminal by voice control, the terminal can receive voice data in response to a trigger operation on the interactive interface, the trigger operation being an operation recognized by the client on the interactive interface as triggering voice control; the terminal can then convert the received voice data into text data, generate a control instruction for operating the application according to the text data, and execute it, thereby implementing the interaction between the user and the application. It can be seen that, during the interaction between the user and the client, because the client can recognize the voice-control trigger operation, the user can trigger the input of voice data directly from any area of the interactive interface, without being restricted to a dedicated voice input interface. The user therefore no longer needs to perform additional operations to switch the terminal's display from the interactive interface to a voice input interface. Compared with the prior art, the user does not need to exit the display window or search for the control of the voice control service, which reduces the operation steps the user must perform, improves the efficiency of interaction between the user and the client, and improves the user experience.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic diagram of an exemplary application scenario provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of a method for voice control provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of the software architecture of an exemplary application scenario provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an apparatus for voice control provided by an embodiment of the present application.
DETAILED DESCRIPTION
In the existing voice interaction process, because the user must input voice data on a dedicated voice input interface every time, the terminal must first present that interface to the user before any interaction with applications can take place, which lowers the efficiency of interaction between the user and the applications. In particular, when the user is accessing a service provided by an application and wishes to interact with it by voice control, the user must first exit the current application on the smart terminal and then input the voice data for that application on the voice input interface presented by the smart terminal. Being able to input voice data only on a dedicated voice input interface forces the user to perform many operations, so the interaction between the user and the application is inefficient and the user experience is also poor.
For example, when the user wants to maximize a display window, the user must exit the current display window (leaving it running in the background), find the control that starts the voice control service on the terminal's display interface, and click it; in response to the click, the terminal presents the voice input interface, on which the user inputs the voice data "maximize the display window" so that the terminal maximizes the background window based on the voice data. This process requires many operations from the user and reduces the efficiency of interacting with the display window.
To solve the above technical problem, an embodiment of the present application provides a voice control method in which the reception of voice data is triggered by a trigger operation recognized by the client, so that fewer operation steps are required of the user and the efficiency of interaction between the user and the client is improved. Specifically, when the user needs to interact with a client on the terminal by voice control, the terminal can receive voice data in response to a trigger operation on the interactive interface, the trigger operation being an operation recognized by the client on the interactive interface as triggering voice control; the terminal can then convert the received voice data into text data, generate a control instruction for operating the application according to the text data, and execute it, thereby implementing the interaction between the user and the application. During this interaction, because the client can recognize the voice-control trigger operation, the user can trigger the input of voice data directly from any area of the interactive interface without being restricted to a dedicated voice input interface; the user therefore no longer needs to perform additional operations to switch the terminal's display from the interactive interface to a voice input interface. Compared with the prior art, the user does not need to exit the display window or search for the control of the voice control service, which reduces the operation steps the user must perform, improves the efficiency of interaction between the user and the client, and improves the user experience.
Taking the maximization of a display window as an example again: the user can click the display window directly, the display window recognizes the click and determines that interaction with the user is required, and the user can then input the voice data "maximize the display window" directly on the current interactive interface, so that the terminal maximizes the display window based on the voice data. The user does not need to exit the current display window but can perform the voice-control trigger operation directly on the current interactive interface, which reduces the operation steps required of the user and improves the efficiency of interacting with the display window.
As an example, the voice control method of the embodiments of the present application can be applied in the application scenario shown in FIG. 1. In this scenario, when a user 101 needs to interact by voice with a client on a terminal 102, the user 101 can perform a trigger operation on the interactive interface of the terminal 102; the trigger operation can be recognized by the client on the terminal 102 and determined to be an operation triggering voice control. After the terminal 102 responds to the trigger operation, it can receive the voice data input by the user 101 and convert the voice data into text data; the terminal 102 can then generate a corresponding control instruction according to the text data and execute it, so as to implement the interaction between the client on the terminal 102 and the user 101.
Of course, the above scenario is merely illustrative and is not intended to limit the embodiments of the present application, which can also be applied in other suitable scenarios.
To enable those skilled in the art to better understand the technical solutions of the present application, the technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
Referring now to FIG. 2, FIG. 2 is a schematic flowchart of a method for voice control provided by an embodiment of the present application. The method may specifically include the following steps.
S201: Receive voice data in response to a trigger operation on the interactive interface, the trigger operation being an operation recognized by the client on the interactive interface as triggering voice control.
As an exemplary implementation, when the user needs to interact with a client on the terminal, the user may perform a trigger operation on the terminal's interactive interface, such as long-pressing a specific area of the interface. The trigger operation indicates that the user wants to interact with the client by voice control. The client on the terminal can then judge the trigger operation performed by the user, specifically by matching it against a preset trigger operation; if the match succeeds, the trigger operation is determined to be the operation that starts voice control. After the client recognizes the trigger operation, it triggers the start-up of a voice receiver configured on the terminal (such as a microphone) to receive the voice data input by the user.
It can be understood that, because the client on the terminal can autonomously recognize the trigger operation that triggers voice control and thereby automatically start the voice receiver to receive the user's voice data, the user can input voice data directly on the interactive interface without having to do so on a dedicated voice input interface. The user thus does not need to perform excessive operation steps, which improves the user experience.
It should be noted that the client interacting with the user may include not only third-party software on the terminal but also various applications on the terminal, such as the terminal's desktop, display windows, and the various functional programs built into the operating system. The interactive interface generally refers to the display interface on which the terminal presents the client interacting with the user.
In some possible implementations, the trigger operation performed by the user may be an operation directed at the interactive interface: for example, a single click, double click, or long press on a client icon on the interactive interface, or a double click, long press, or swipe performed in a blank area of the interactive interface (that is, an area where no client icon is displayed). It can be understood that the form of the trigger operation can be set in advance, and any operation the user performs on the terminal can be set as the trigger operation for triggering voice control. In practice, however, to make the feature convenient to use while minimizing changes to existing operating conventions, the trigger operation may be chosen to differ from operations the user performs frequently on the terminal. For example, users usually swipe left or right on the terminal's touch screen to switch the client icons displayed on the interactive interface but rarely swipe upward, so an upward swipe on the touch screen can be set in advance as the operation that starts voice control.
Further, to improve the user experience, a voice-entry popup window may be used to prompt the user to input voice data. Specifically, in this embodiment, after responding to the user's trigger operation on the interactive interface, a voice-entry popup window may be presented to the user; this popup prompts the user that voice input is possible and feeds the voice recording status back to the user. It should be noted that, after the popup appears, in order to show the user the difference between voice data being received and not being received, the presentation form of the popup while the user is inputting voice data may be changed so that it differs from its presentation form while the user is not inputting voice data.
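As a minimal illustration of S201 and of the popup behavior just described, the Python sketch below matches an incoming gesture against a preset trigger and, on success, starts a voice receiver and shows a recording popup whose presentation changes once audio arrives. Every name in it (the gesture strings, `RecordingPopup`, `start_receiver`) is a hypothetical stand-in for illustration, not an API defined by the patent.

```python
# Sketch: recognizing a preset trigger operation and starting voice reception.
# All class and callback names here are hypothetical stand-ins.

PRESET_TRIGGER = ("swipe_up", "touch_screen")  # e.g. an upward swipe


class RecordingPopup:
    """Voice-entry popup whose presentation differs while audio arrives."""

    def __init__(self) -> None:
        self.state = "idle"

    def show(self, receiving: bool) -> None:
        # Different presentation form with / without incoming voice data.
        self.state = "pulsing" if receiving else "waiting"
        print(f"[popup] {self.state}")


def on_gesture(gesture: str, target: str, start_receiver) -> bool:
    """Return True if the gesture was consumed as the voice-control trigger."""
    if (gesture, target) != PRESET_TRIGGER:
        return False  # an ordinary gesture, handled by the normal UI path
    popup = RecordingPopup()
    popup.show(receiving=False)  # prompt the user to speak
    # Start the microphone; flip the popup state once audio chunks arrive.
    start_receiver(lambda chunk: popup.show(receiving=True))
    return True


# Example: an upward swipe on the touch screen starts voice reception.
on_gesture("swipe_up", "touch_screen", start_receiver=lambda cb: cb(b"audio"))
```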
S202: Convert the received voice data into text data.
In practice, the terminal may be configured with a speech recognition engine; after receiving the voice data input by the user through the voice receiver, the terminal can have the speech recognition engine recognize the voice data and convert it into text data. For example, if the user inputs voice data whose spoken content is "da kai weixin", the terminal can use the speech recognition engine to convert the voice data into the Chinese text "打开微信" ("open WeChat"). Here, "da kai weixin" merely describes the Chinese pronunciation of the voice data input by the user; the same applies to similar notation below.
As an exemplary implementation, the terminal can convert the received voice data into initial text data through the speech recognition engine. Considering that a speech recognition engine cannot achieve 100% recognition accuracy in practice, after the initial text data is obtained it can additionally be semantically analyzed, and the initial text data can be adjusted according to the result of the semantic analysis so that its content is more universal and/or more logical and better matches the voice content actually input by the user. For example, suppose there is a client named "悦读" (a reading app). When the user inputs voice data with the spoken content "da kai yue du", the initial text data recognized by the speech recognition engine is usually "打开阅读" ("open reading"), but no client named "阅读" exists on the terminal. Through semantic analysis, the initial text data can be adjusted to "打开悦读" so that the terminal can subsequently open the "悦读" client successfully; the adjusted initial text data is then used as the text data converted from the voice data. At the same time, semantic analysis can also be used to analyze the adjusted initial text data and segment out its predicate and/or object, yielding the action keyword corresponding to the predicate and/or the object keyword corresponding to the object.
In some possible scenarios, the content of the converted text data may also differ somewhat from the content of the voice data input by the user. For example, if the user's spoken input is "qing da kai wo de weixin" ("please open my WeChat"), the initial text data obtained by the speech recognition engine is "请打开我的微信", but after semantic analysis only the action keyword and object keyword in the initial text data may be retained, so the adjusted initial text data may be "打开微信" ("open WeChat"), and "打开微信" is used as the text data converted from the voice data.
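The adjustment and segmentation just described can be pictured with the short sketch below: it corrects an initial transcription whose object matches no installed client, then splits the result into a predicate (action keyword) and an object (object keyword). The app list, predicate list, and the use of string similarity are assumptions made for illustration; they are not how the patent requires the semantic analysis to be implemented.

```python
# Sketch of the S202 adjustment step (word lists are hypothetical).
import difflib

INSTALLED_CLIENTS = ["悦读", "微信"]
ACTION_WORDS = ["打开", "运行", "启动"]


def segment(text: str) -> tuple[str | None, str | None]:
    """Split text into (action keyword, object keyword) by known predicates."""
    for action in ACTION_WORDS:
        if text.startswith(action):
            return action, text[len(action):] or None
    return None, text or None


def adjust_initial_text(initial: str) -> str:
    """Replace an unknown object ("阅读") with the closest client ("悦读")."""
    action, obj = segment(initial)
    if action and obj and obj not in INSTALLED_CLIENTS:
        close = difflib.get_close_matches(obj, INSTALLED_CLIENTS, n=1, cutoff=0.5)
        if close:
            return action + close[0]
    return initial


print(adjust_initial_text("打开阅读"))  # -> 打开悦读
print(segment("打开微信"))              # -> ('打开', '微信')
```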
S203: Generate a control instruction based on the converted text data.
After the voice data has been converted into text data, a corresponding control instruction can be generated based on the converted text data.
For the specific implementation of generating a control instruction based on the converted text data, this embodiment provides the following two exemplary implementations.
In one exemplary implementation, the text data can be matched against preset instruction-type text data, and the control instruction can be generated based on the matched instruction-type text data.
Here, the preset instruction-type text data refers to text data that is preset inside the terminal and can be used to generate control instructions. In practice, a corresponding control instruction can be generated from a specific piece of text data: for example, if the specific text data is "启动微信" ("start WeChat"), a control instruction for starting and running WeChat is generated from it; if the specific text data is "播放音乐" ("play music"), a control instruction for playing the first song in the current music list is generated; and so on. Such specific text data can therefore serve as the preset instruction-type text data, which in a concrete implementation can be set by technicians according to actual needs.
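A plausible minimal representation of such preset instruction-type text data is a lookup table mapping canonical texts to the instructions they generate, as in the sketch below. The specific entries and field names are invented for illustration and are not mandated by the patent.

```python
# Sketch: preset instruction-type text data as a lookup table mapping
# canonical text to an executable control instruction (entries illustrative).
PRESET_INSTRUCTIONS = {
    "启动微信": {"action": "run", "object": "wechat_client"},
    "播放音乐": {"action": "play", "object": "first_song_in_current_list"},
}


def instruction_for(text_data: str) -> dict | None:
    """Return the control instruction for an exact instruction-type match."""
    entry = PRESET_INSTRUCTIONS.get(text_data)
    return dict(entry) if entry else None


print(instruction_for("启动微信"))  # {'action': 'run', 'object': 'wechat_client'}
```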
In this embodiment, after the text data is obtained, it can be matched against the preset instruction-type text data, and whether a corresponding control instruction can be generated is determined from the matching result. This embodiment provides the following non-limiting examples of matching text data against instruction-type text data. Specifically, in one matching example, the text data converted from the voice data includes an action keyword and an object keyword. The terminal can match the action keyword in the text data against the action keywords in the instruction-type text data and take the matched action keyword as the first action keyword; at the same time, it matches the object keyword in the text data against the object keywords in the instruction-type text data and takes the matched object keyword as the first object keyword. Then, based on the matched first action keyword and first object keyword, the corresponding control instruction can be generated.
It should be noted that the reason the action keyword and object keyword in the text data need to be matched against the instruction-type text data is that not all text data obtained from the voice data input by users is suitable for directly generating a control instruction. It can be understood that, for the same control instruction, different users may input different voice data, and the converted text data may differ accordingly. It is therefore necessary to match the action keyword and object keyword in the converted text data against the instruction-type text data to determine the execution action and execution object of the control instruction, so that even if different users input different voice data, the same interaction with the client can be achieved.
For example, the content of the voice data input by user A is "打开微信软件" ("open the WeChat software"), that input by user B is "运行微信应用程序" ("run the WeChat application"), and that input by user C is "启动微信客户端" ("start the WeChat client"). Although the voice data input by users A, B, and C differ, all of them are intended to make the terminal run the client WeChat, so they all correspond to the same control instruction of running WeChat. By matching against the action keywords in the instruction-type text data, the action keywords of users A, B, and C, namely "打开" ("open"), "运行" ("run"), and "启动" ("start"), can all be successfully matched to the action keyword "运行" in the instruction-type text data; and the object keywords of users A, B, and C, namely "微信软件", "微信应用程序", and "微信客户端", can all be successfully matched to the object keyword "微信客户端" ("WeChat client") in the instruction-type text data. The control instructions corresponding to users A, B, and C are thus all the control instruction for running the client WeChat, so users A, B, and C can achieve the same interaction with the client.
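One way to realize the A/B/C example is to treat each instruction-type keyword as a canonical form with a set of acceptable variants, as sketched below; the synonym sets are assumptions made for illustration.

```python
# Sketch: mapping differently worded requests onto one control instruction,
# as in the users A/B/C example (synonym sets are assumptions).
ACTION_SYNONYMS = {"运行": {"打开", "运行", "启动"}}
OBJECT_SYNONYMS = {"微信客户端": {"微信软件", "微信应用程序", "微信客户端"}}


def canonical(word: str, table: dict[str, set[str]]) -> str | None:
    """Return the instruction-type keyword whose variants contain `word`."""
    return next((c for c, variants in table.items() if word in variants), None)


def match_keywords(action: str, obj: str) -> dict | None:
    first_action = canonical(action, ACTION_SYNONYMS)  # "first action keyword"
    first_object = canonical(obj, OBJECT_SYNONYMS)     # "first object keyword"
    if first_action and first_object:
        return {"action": first_action, "object": first_object}
    return None


# Users A, B, and C all yield the same "run the WeChat client" instruction.
for action, obj in [("打开", "微信软件"), ("运行", "微信应用程序"), ("启动", "微信客户端")]:
    assert match_keywords(action, obj) == {"action": "运行", "object": "微信客户端"}
```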
Considering that in some practical scenarios the text data obtained from the voice data input by the user may not contain an object keyword, the object keyword can in that case be determined from the operation object of the trigger operation performed by the user. Therefore, in another matching example, the text data converted from the voice data may include an action keyword; the terminal can match this action keyword against the action keywords in the preset instruction-type text data and take the matched action keyword as the second action keyword, and at the same time determine the second object keyword according to the operation object of the trigger operation performed by the user, thereby generating the corresponding control instruction from the second action keyword and the second object keyword. In this implementation, the user may perform the trigger operation on a client icon on the interactive interface, and the operation object of that trigger operation is usually the client the user wants to interact with, so the second object keyword can be determined from the operation object of the trigger operation.
For example, the user can double-click the WeChat icon on the interactive interface and input voice data whose spoken content is "打开" ("open"); understandably, the interaction the user expects is to open WeChat. The terminal can then match the action keyword "打开" in the text data against the action keywords in the instruction-type text data and successfully match the second action keyword "运行" ("run"); at the same time, based on the operation object of the user's double-click, the WeChat icon, it determines the second object keyword "微信客户端" ("WeChat client"). Based on the second action keyword and the second object keyword, a control instruction for running the WeChat client can be generated.
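This case can be sketched as below, where the operation object of the trigger (the icon that was double-clicked) supplies the missing object keyword; the icon-to-client mapping is an assumption for illustration.

```python
# Sketch: text data carries only an action keyword; the object keyword is
# taken from the target of the trigger operation (mapping is hypothetical).
ICON_TO_CLIENT = {"wechat_icon": "微信客户端"}


def instruction_from_action(second_action: str, trigger_target: str) -> dict | None:
    second_object = ICON_TO_CLIENT.get(trigger_target)  # "second object keyword"
    if second_object is None:
        return None
    return {"action": second_action, "object": second_object}


# Double-clicking the WeChat icon while saying "打开" (matched to "运行"):
print(instruction_from_action("运行", "wechat_icon"))
# {'action': '运行', 'object': '微信客户端'}
```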
而在实际应用的另一些场景中,基于用户输入的语音数据所得到的文本数据中可能并不包含动作关键词,此时,可以基于文本数据中的对象关键词确定动作关键词。因此,在另一种匹配的示例中,基于语音数据所转换得到的文本数据可以包括有对象关键词,则终端可以将该对象关键词与预设的指令型文本数据中的对象关键词进行匹配,并将所匹配到的对象关键词作为第三对象关键词,同时,可以根据第三对象关键词确定第三动作关键词,从而根据该第三动作关键词与第三对象关键词,生成相应的控制指令。本实施方式中,考虑到部分应用场景下,用户与客户端进行交互时,所需要控制客户端执行的操作通常只有一种操作,或者该操作的适用性最高,则终端可以该客户端(也即第三对象关键词),确定出需要对客户端进行执行的操作,即确定出生成控制指令的第三动作关键词。
For example, if WeChat is not running on the terminal and the user inputs voice data whose content is "微信客户端" (WeChat client), it can usually be assumed that the user wants the terminal to run the WeChat client; that is, the operation to be performed on the WeChat client is typically the operation of running it. In this case, based on the third object keyword "微信客户端", the terminal can determine that the third action keyword is "运行" (run), and then generate a control instruction to run the WeChat client from the third object keyword and the third action keyword.
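This branch is sketched below under the assumption of a per-object default-action table, which plays the role of the "most applicable operation" reasoning described above.

```python
# Third matching branch: only an object keyword is present; the action with
# the highest applicability to that object is chosen. Both tables are
# illustrative assumptions.
OBJECT_SYNONYMS = {"微信客户端": "微信客户端"}
DEFAULT_ACTION = {"微信客户端": "运行"}  # most applicable operation per object

def match_object_only(object_kw):
    third_object = OBJECT_SYNONYMS.get(object_kw)
    if third_object is None:
        return None
    third_action = DEFAULT_ACTION[third_object]  # third action keyword
    return {"action": third_action, "object": third_object}

print(match_object_only("微信客户端"))  # {'action': '运行', 'object': '微信客户端'}
```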
In the above implementations, the action keyword and the object keyword used to generate the control instruction are determined by matching the text data against the preset instruction-type text data. In some other implementations, they may instead be determined by performing semantic analysis on the text data.
Specifically, in another exemplary implementation, semantic analysis may be performed on the text data to determine, according to certain rules, the fourth action keyword from the text data; the client the user wants to interact with, i.e., the fourth object keyword, may be determined from the operation object of the trigger operation performed by the user; and the corresponding control instruction may then be generated based on the determined fourth action keyword and fourth object keyword.
For example, the user may double-click a blank area of the interaction interface (i.e., an area where no client icon is displayed) and input voice data whose content is "太亮了" (it's too bright). Through semantic analysis, the terminal learns that the user wants to lower the brightness, i.e., the action keyword is lowering the brightness. Further, from the user's double-click on the blank area of the interaction interface, the terminal can determine that the user wants to lower the brightness of the display screen, i.e., the object keyword is the display screen. Thus, based on the determined action keyword and object keyword, a control instruction to lower the brightness of the display screen can be generated.
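The rule tables in the following sketch are toy stand-ins for a real semantic analyzer; they merely illustrate how an intent inferred from the text can be combined with an object inferred from where the trigger operation landed.

```python
# Semantic-analysis branch: the action is inferred from the text itself and
# the object from where the trigger operation landed. The two rule tables
# are illustrative assumptions, not a real analyzer.
INTENT_RULES = {"太亮了": "降低亮度", "太暗了": "提高亮度"}
TARGET_BY_TRIGGER_AREA = {"blank_area": "显示屏幕"}

def analyse(text, trigger_area):
    fourth_action = INTENT_RULES.get(text)                    # fourth action keyword
    fourth_object = TARGET_BY_TRIGGER_AREA.get(trigger_area)  # fourth object keyword
    if fourth_action and fourth_object:
        return {"action": fourth_action, "object": fourth_object}
    return None

print(analyse("太亮了", "blank_area"))  # {'action': '降低亮度', 'object': '显示屏幕'}
```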
Of course, the above implementations are merely illustrative and are not intended to limit this embodiment. In fact, various other implementations of generating a control instruction based on text data exist besides those above. For example, the terminal may determine the action keyword and the object keyword directly from the voice data input by the user, or may use sentence-to-sentence matching to determine which control instruction needs to be generated, and so on.
S204: Execute the generated control instruction.
In this embodiment, the terminal may send the generated control instruction to the corresponding application program, so that the application program executes the control instruction. For example, if the generated control instruction is one such as turning on Bluetooth or increasing display brightness, the terminal may send it to the system settings application for execution; if the generated control instruction is one such as decompressing or copying files, the terminal may send it to the file manager for execution; and if the generated control instruction is one to maximize or minimize a display window, the terminal may send it to the window manager for execution.
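A dispatch of this kind can be pictured as a routing table from instructions to the responsible application, as in the hedged sketch below; the instruction strings and application names are illustrative only.

```python
# Routing generated instructions to the responsible application. The
# instruction strings and application names are illustrative assumptions.
DISPATCH_TABLE = {
    "打开蓝牙": "system_settings",
    "提高显示屏亮度": "system_settings",
    "解压文件": "file_manager",
    "拷贝文件": "file_manager",
    "最大化显示窗口": "window_manager",
    "最小化显示窗口": "window_manager",
}

def execute(instruction):
    app = DISPATCH_TABLE.get(instruction, "unknown")
    return f"sent '{instruction}' to {app}"

print(execute("打开蓝牙"))  # sent '打开蓝牙' to system_settings
print(execute("解压文件"))  # sent '解压文件' to file_manager
```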
In this embodiment, the reception of voice data is triggered by a trigger operation recognized by the client, which reduces the operation steps the user needs to perform and thereby improves the interaction efficiency between the user and the client. Specifically, when the user needs to interact with a client on the terminal by means of voice control, the terminal may receive voice data in response to a trigger operation directed at the interaction interface, the trigger operation being an operation for triggering voice control that the client recognizes on the interaction interface. The terminal may then convert the received voice data into text data, generate from that text data a control instruction for operating the application, and execute it, thereby realizing the interaction between the user and the application. It can be seen that, during the interaction between the user and the client, since the client can recognize the voice-control trigger operation, the user can trigger the input of voice data directly from any area of the interaction interface without being restricted to a specific voice input interface. The user therefore no longer needs to perform additional operations to switch the terminal's display from the interaction interface to a voice input interface; compared with the prior art, the user does not need to exit the display window or search for the control of the voice control service. This reduces the operation steps the user needs to perform, improves the interaction efficiency between the user and the client, and improves the user experience.
To describe the technical solution of the present application in more detail, the embodiments of the present application are described below with reference to a specific software architecture. Referring to FIG. 3, FIG. 3 is a schematic diagram of an exemplary software architecture to which the voice control method in the embodiments of the present application is applied; in some scenarios, this software architecture can be applied on a terminal.
The software architecture may include a voice interaction service module, a voice receiver, a speech recognition engine, a text semantic analysis module, and various clients, all of which may be created in the system. The clients may include not only third-party software on the terminal but also various application programs on the terminal, such as the terminal's desktop, system settings, the Dock, display windows, and various functional programs built into the operating system.
The voice interaction service module may establish communication connections with the voice receiver, the speech recognition engine, the text semantic analysis module, and the various clients. It chains together the mutually independent voice receiver, speech recognition engine, and text semantic analysis module, and forwards the corresponding data to each client, forming callbacks and control.
When the user wants to interact with a client by means of voice control, the user may perform a trigger operation directed at the interaction interface on the terminal, and the client recognizes the trigger operation. Having recognized the trigger operation, the client may notify the voice interaction service module through a system interface, and the voice interaction service module may start the voice receiver by sending a startup instruction. The voice receiver may then begin receiving the voice data input by the user and send the voice data to the voice interaction service module. Here, the interaction interface usually refers to the display interface on which the terminal presents the clients that interact with the user.
The voice interaction service module then sends the received voice data to the speech recognition engine, which recognizes the voice data and converts it into initial text data. After obtaining the initial text data, the speech recognition engine sends it to the voice interaction service module.
Considering that the speech recognition engine cannot achieve one hundred percent recognition accuracy, the voice interaction service module may further send the initial text data to the text semantic analysis module, which performs semantic analysis on the initial text data and adjusts it so that the adjusted initial text data is more universally applicable and/or more logical. Meanwhile, the text semantic analysis module may also analyze the adjusted initial text data, segmenting out its predicate and/or object to obtain the action keyword corresponding to the predicate and/or the object keyword corresponding to the object. The text semantic analysis module then sends the finally obtained text data (i.e., the adjusted initial text data) to the voice interaction service module.
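Condensing the FIG. 3 flow so far, the sketch below chains stub components in the order the voice interaction service module coordinates them; every class is a placeholder, and the real modules are independent components communicating through system interfaces rather than direct method calls.

```python
# Condensed, runnable sketch of the FIG. 3 flow. Every class is a stub
# standing in for an independent module; none of this is the application's
# actual API.
class Receiver:
    def record(self):
        return b"...pcm..."  # placeholder audio

class Recognizer:
    def transcribe(self, audio):
        return "请打开我的微信"  # assumed recognition result

class Analyzer:
    def adjust_and_segment(self, text):
        # adjusted text, action keyword, object keyword (all assumed)
        return "打开微信", "打开", "微信客户端"

class VoiceInteractionService:
    def __init__(self):
        self.receiver = Receiver()
        self.recognizer = Recognizer()
        self.analyzer = Analyzer()

    def on_trigger(self, trigger_target=None):
        audio = self.receiver.record()               # started by the trigger
        initial = self.recognizer.transcribe(audio)  # voice -> initial text
        _, action_kw, object_kw = self.analyzer.adjust_and_segment(initial)
        object_kw = object_kw or trigger_target      # fall back to trigger target
        return {"action": action_kw, "object": object_kw}  # instruction to dispatch

print(VoiceInteractionService().on_trigger())
# -> {'action': '打开', 'object': '微信客户端'}
```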
After receiving the text data, the voice interaction service module may match the action keyword and/or the object keyword in the text data against the action keywords and object keywords in the instruction-type text data, and generate a control instruction based on the matched instruction-type text data. Here, the preset instruction-type text data refers to text data preset inside the terminal that can be used to generate control instructions.
Specifically, in one example, the voice interaction service module may match the action keyword in the text data against the action keywords in the instruction-type text data and take the matched action keyword as the first action keyword; meanwhile, it may match the object keyword in the text data against the object keywords in the instruction-type text data and take the matched object keyword as the first object keyword. Then, based on the matched first action keyword and first object keyword, a corresponding control instruction can be generated.
Of course, there are various implementations by which the voice interaction service module generates a corresponding control instruction from the received text data; for details, reference may be made to the relevant descriptions in the foregoing embodiments, which are not repeated here.
After generating the control instruction, the voice interaction service module may send it to the corresponding application program, so that the application program performs the corresponding operation on the client. For example, if the generated control instruction is one such as turning on Bluetooth or increasing display brightness, the voice interaction service module may send it to the system settings application for execution; if the generated control instruction is one such as decompressing or copying files, the voice interaction service module may send it to the file manager for execution; and if the generated control instruction is one to maximize or minimize a display window, the voice interaction service module may send it to the window manager for execution.
It can be seen that, during the interaction between the user and the client, since the client can recognize the voice-control trigger operation, the user can trigger the input of voice data directly from any area of the interaction interface without being restricted to a specific voice input interface. The user therefore no longer needs to perform additional operations to switch the terminal's display from the interaction interface to a voice input interface; compared with the prior art, the user does not need to exit the display window or search for the control of the voice control service. This reduces the operation steps the user needs to perform, improves the interaction efficiency between the user and the client, and improves the user experience.
In addition, an embodiment of the present application further provides a voice control apparatus. Referring to FIG. 4, FIG. 4 is a schematic structural diagram of a voice control apparatus in an embodiment of the present application. The apparatus 400 includes:
a receiving module 401, configured to receive voice data in response to a trigger operation directed at an interaction interface, the trigger operation being an operation for triggering voice control that a client recognizes on the interaction interface;
a conversion module 402, configured to convert the voice data into text data;
a generation module 403, configured to generate a control instruction based on the text data; and
an execution module 404, configured to execute the control instruction.
In some possible implementations, the conversion module 402 includes:
a conversion unit, configured to convert the voice data into initial text data; and
an adjustment unit, configured to adjust the initial text data by performing semantic analysis on the initial text data, and to use the adjusted initial text data as the text data.
In some possible implementations, the generation module 403 is further configured to:
match the text data against preset instruction-type text data, and generate the control instruction based on the matched instruction-type text data.
In some possible implementations, the apparatus 400 further includes:
a determination module, configured to determine the action keyword and/or the object keyword in the adjusted initial text data by performing semantic analysis on the initial text data; and the generation module is further configured to generate the control instruction based on the action keyword and/or the object keyword.
In some possible implementations, the text data includes an action keyword and an object keyword, and the generation module 403 includes:
a first matching unit, configured to match the action keyword in the text data against the action keywords in the preset instruction-type text data to determine a first action keyword, the first action keyword being the action keyword matched in the preset instruction-type text data;
a second matching unit, configured to match the object keyword in the text data against the object keywords in the preset instruction-type text data to determine a first object keyword, the first object keyword being the object keyword matched in the preset instruction-type text data; and
a first generation unit, configured to generate the control instruction based on the first action keyword and the first object keyword.
In some possible implementations, the text data includes an action keyword, and the generation module 403 includes:
a third matching unit, configured to match the action keyword in the text data against the action keywords in the preset instruction-type text data to determine a second action keyword, the second action keyword being the action keyword matched in the preset instruction-type text data;
a first determination unit, configured to determine a second object keyword according to the operation object of the trigger operation; and
a second generation unit, configured to generate the control instruction based on the second action keyword and the second object keyword.
In some possible implementations, the text data includes an object keyword, and the generation module 403 includes:
a fourth matching unit, configured to match the object keyword in the text data against the object keywords in the preset instruction-type text data to determine a third object keyword, the third object keyword being the object keyword matched in the preset instruction-type text data;
a second determination unit, configured to determine a third action keyword according to the third object keyword; and
a third generation unit, configured to generate the control instruction based on the third action keyword and the third object keyword.
In some possible implementations, the generation module 403 includes:
a third determination unit, configured to perform semantic analysis on the text data to determine a fourth action keyword;
a fourth determination unit, configured to determine a fourth object keyword according to the operation object of the trigger operation; and
a fourth generation unit, configured to generate the control instruction based on the fourth action keyword and the fourth object keyword.
In some possible implementations, the apparatus 400 further includes:
a presentation module, configured to present a voice recording pop-up window,
wherein the presentation form of the voice recording pop-up window when the voice data is being received differs from its presentation form when no voice data is being received.
In the embodiments of the present application, since the client can recognize the voice-control trigger operation, the user can trigger the input of voice data directly from any area of the interaction interface without being restricted to a specific voice input interface. The user therefore no longer needs to perform additional operations to switch the terminal's display from the interaction interface to a voice input interface; compared with the prior art, the user does not need to exit the display window or search for the control of the voice control service. This reduces the operation steps the user needs to perform, improves the interaction efficiency between the user and the client, and improves the user experience.
It should be noted that the embodiments in this specification are described in a progressive manner, each embodiment focusing on its differences from the others; for identical or similar parts among the embodiments, reference may be made between them. Since the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and reference may be made to the description of the method for the relevant parts.
It should also be noted that, herein, relational terms such as first and second are used merely to distinguish one entity or operation from another and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (19)

  1. A voice control method, wherein the method comprises:
    receiving voice data in response to a trigger operation directed at an interaction interface, the trigger operation being an operation for triggering voice control that a client recognizes on the interaction interface;
    converting the voice data into text data;
    generating a control instruction based on the text data; and
    executing the control instruction.
  2. The method according to claim 1, wherein converting the voice data into text data comprises:
    converting the voice data into initial text data; and
    adjusting the initial text data by performing semantic analysis on the initial text data, and using the adjusted initial text data as the text data.
  3. The method according to claim 1, wherein generating a control instruction based on the text data comprises:
    matching the text data against preset instruction-type text data, and generating the control instruction based on the matched instruction-type text data.
  4. The method according to claim 2, wherein the method further comprises:
    determining an action keyword and/or an object keyword in the adjusted initial text data by performing semantic analysis on the initial text data,
    wherein generating a control instruction based on the text data comprises:
    generating the control instruction based on the action keyword and/or the object keyword.
  5. The method according to claim 3, wherein the text data includes an action keyword and an object keyword, and matching the text data against preset instruction-type text data and generating the control instruction based on the matched instruction-type text data comprises:
    matching the action keyword in the text data against action keywords in the preset instruction-type text data to determine a first action keyword, the first action keyword being the action keyword matched in the preset instruction-type text data;
    matching the object keyword in the text data against object keywords in the preset instruction-type text data to determine a first object keyword, the first object keyword being the object keyword matched in the preset instruction-type text data; and
    generating the control instruction based on the first action keyword and the first object keyword.
  6. The method according to claim 3, wherein the text data includes an action keyword, and matching the text data against preset instruction-type text data and generating the control instruction based on the matched instruction-type text data comprises:
    matching the action keyword in the text data against action keywords in the preset instruction-type text data to determine a second action keyword, the second action keyword being the action keyword matched in the preset instruction-type text data;
    determining a second object keyword according to an operation object of the trigger operation; and
    generating the control instruction based on the second action keyword and the second object keyword.
  7. The method according to claim 3, wherein the text data includes an object keyword, and matching the text data against preset instruction-type text data and generating the control instruction based on the matched instruction-type text data comprises:
    matching the object keyword in the text data against object keywords in the preset instruction-type text data to determine a third object keyword, the third object keyword being the object keyword matched in the preset instruction-type text data;
    determining a third action keyword according to the third object keyword; and
    generating the control instruction based on the third action keyword and the third object keyword.
  8. The method according to claim 1, wherein generating a control instruction based on the text data comprises:
    performing semantic analysis on the text data to determine a fourth action keyword;
    determining a fourth object keyword according to an operation object of the trigger operation; and
    generating the control instruction based on the fourth action keyword and the fourth object keyword.
  9. The method according to claim 1, wherein the method further comprises:
    presenting a voice recording pop-up window,
    wherein a presentation form of the voice recording pop-up window when the voice data is being received differs from its presentation form when no voice data is being received.
  10. The method according to claim 7, wherein determining a third action keyword according to the third object keyword comprises:
    determining, as the third action keyword, the action keyword with the highest applicability to the third object keyword.
  11. A voice control apparatus, wherein the apparatus comprises:
    a receiving module, configured to receive voice data in response to a trigger operation directed at an interaction interface, the trigger operation being an operation for triggering voice control that a client recognizes on the interaction interface;
    a conversion module, configured to convert the voice data into text data;
    a generation module, configured to generate a control instruction based on the text data; and
    an execution module, configured to execute the control instruction.
  12. The apparatus according to claim 11, wherein the conversion module comprises:
    a conversion unit, configured to convert the voice data into initial text data; and
    an adjustment unit, configured to adjust the initial text data by performing semantic analysis on the initial text data, and to use the adjusted initial text data as the text data.
  13. The apparatus according to claim 11, wherein the generation module is further configured to:
    match the text data against preset instruction-type text data, and generate the control instruction based on the matched instruction-type text data.
  14. The apparatus according to claim 12, wherein the apparatus further comprises:
    a determination module, configured to determine an action keyword and/or an object keyword in the adjusted initial text data by performing semantic analysis on the initial text data; and
    the generation module is further configured to:
    generate the control instruction based on the action keyword and/or the object keyword.
  15. The apparatus according to claim 13, wherein the text data includes an action keyword and an object keyword, and the generation module comprises:
    a first matching unit, configured to match the action keyword in the text data against action keywords in the preset instruction-type text data to determine a first action keyword, the first action keyword being the action keyword matched in the preset instruction-type text data;
    a second matching unit, configured to match the object keyword in the text data against object keywords in the preset instruction-type text data to determine a first object keyword, the first object keyword being the object keyword matched in the preset instruction-type text data; and
    a first generation unit, configured to generate the control instruction based on the first action keyword and the first object keyword.
  16. The apparatus according to claim 13, wherein the text data includes an action keyword, and the generation module comprises:
    a third matching unit, configured to match the action keyword in the text data against action keywords in the preset instruction-type text data to determine a second action keyword, the second action keyword being the action keyword matched in the preset instruction-type text data;
    a first determination unit, configured to determine a second object keyword according to an operation object of the trigger operation; and
    a second generation unit, configured to generate the control instruction based on the second action keyword and the second object keyword.
  17. The apparatus according to claim 13, wherein the text data includes an object keyword, and the generation module comprises:
    a fourth matching unit, configured to match the object keyword in the text data against object keywords in the preset instruction-type text data to determine a third object keyword, the third object keyword being the object keyword matched in the preset instruction-type text data;
    a second determination unit, configured to determine a third action keyword according to the third object keyword; and
    a third generation unit, configured to generate the control instruction based on the third action keyword and the third object keyword.
  18. A device, comprising:
    one or more processors; and
    a storage apparatus, configured to store one or more programs,
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the voice control method according to any one of claims 1 to 10.
  19. A computer-readable medium having a computer program stored thereon, wherein, when executed by a processor, the program implements the voice control method according to any one of claims 1 to 10.
PCT/CN2019/085905 2018-05-14 2019-05-07 一种语音控制的方法及装置 WO2019218903A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/020,509 US20200411008A1 (en) 2018-05-14 2020-09-14 Voice control method and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810456387.XA CN109741737B (zh) 2018-05-14 2018-05-14 一种语音控制的方法及装置
CN201810456387.X 2018-05-14

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/020,509 Continuation US20200411008A1 (en) 2018-05-14 2020-09-14 Voice control method and device

Publications (1)

Publication Number Publication Date
WO2019218903A1 true WO2019218903A1 (zh) 2019-11-21

Family

ID=66354307

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/085905 WO2019218903A1 (zh) 2018-05-14 2019-05-07 一种语音控制的方法及装置

Country Status (3)

Country Link
US (1) US20200411008A1 (zh)
CN (2) CN111627436B (zh)
WO (1) WO2019218903A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112135294A (zh) * 2020-09-21 2020-12-25 Oppo广东移动通信有限公司 无线加密方法及其客户终端设备

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220148574A1 (en) * 2019-02-25 2022-05-12 Faurecia Clarion Electronics Co., Ltd. Hybrid voice interaction system and hybrid voice interaction method
CN110532412A (zh) * 2019-08-28 2019-12-03 维沃移动通信有限公司 一种文件处理方法及移动终端
CN111309283B (zh) * 2020-03-25 2023-12-05 北京百度网讯科技有限公司 用户界面的语音控制方法、装置、电子设备及存储介质
CN113643697A (zh) * 2020-04-23 2021-11-12 百度在线网络技术(北京)有限公司 一种语音控制方法、装置、电子设备及存储介质
CN113035194B (zh) * 2021-03-02 2022-11-29 海信视像科技股份有限公司 一种语音控制方法、显示设备及服务器
CN113223556A (zh) * 2021-03-25 2021-08-06 惠州市德赛西威汽车电子股份有限公司 一种用于车载语音系统的语句合成测试方法
CN114121013A (zh) * 2021-12-07 2022-03-01 杭州逗酷软件科技有限公司 语音控制方法、装置、电子设备及存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130226590A1 (en) * 2012-02-29 2013-08-29 Pantech Co., Ltd. Voice input apparatus and method
CN103442138A (zh) * 2013-08-26 2013-12-11 华为终端有限公司 语音控制方法、装置及终端
CN105957530A (zh) * 2016-04-28 2016-09-21 海信集团有限公司 一种语音控制方法、装置和终端设备
CN106250474A (zh) * 2016-07-29 2016-12-21 Tcl集团股份有限公司 一种语音控制的处理方法及系统
CN106504748A (zh) * 2016-10-08 2017-03-15 珠海格力电器股份有限公司 一种语音控制方法和装置
CN107948698A (zh) * 2017-12-14 2018-04-20 深圳市雷鸟信息科技有限公司 智能电视的语音控制方法、系统及智能电视

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9256396B2 (en) * 2011-10-10 2016-02-09 Microsoft Technology Licensing, Llc Speech recognition for context switching
US20130325466A1 (en) * 2012-05-10 2013-12-05 Clickberry, Inc. System and method for controlling interactive video using voice
CN102750087A (zh) * 2012-05-31 2012-10-24 华为终端有限公司 控制语音识别功能的方法、装置和终端设备
CN103488401A (zh) * 2013-09-30 2014-01-01 乐视致新电子科技(天津)有限公司 一种语音助手激活方法和装置
CN104599669A (zh) * 2014-12-31 2015-05-06 乐视致新电子科技(天津)有限公司 一种语音控制方法和装置
CN105094644B (zh) * 2015-08-11 2018-07-10 百度在线网络技术(北京)有限公司 用于应用程序的语音搜索方法和系统
CN105551487A (zh) * 2015-12-07 2016-05-04 北京云知声信息技术有限公司 一种语音控制方法及装置
US20190258318A1 (en) * 2016-06-28 2019-08-22 Huawei Technologies Co., Ltd. Terminal for controlling electronic device and processing method thereof
CN107799115A (zh) * 2016-08-29 2018-03-13 法乐第(北京)网络科技有限公司 一种语音识别方法及装置
CN107507614B (zh) * 2017-07-28 2018-12-21 北京小蓦机器人技术有限公司 结合ui执行自然语言命令的方法、设备、系统与存储介质


Also Published As

Publication number Publication date
US20200411008A1 (en) 2020-12-31
CN109741737A (zh) 2019-05-10
CN111627436A (zh) 2020-09-04
CN109741737B (zh) 2020-07-21
CN111627436B (zh) 2023-07-04


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19802547; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19802547; Country of ref document: EP; Kind code of ref document: A1)