WO2018133307A1 - Method and terminal for implementing voice control - Google Patents

Method and terminal for implementing voice control (一种实现语音控制的方法和终端)

Info

Publication number
WO2018133307A1
Authority
WO
WIPO (PCT)
Prior art keywords
keyword
text
terminal
server
keyword text
Prior art date
Application number
PCT/CN2017/088150
Other languages
English (en)
French (fr)
Inventor
李念
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority to US16/479,796 (US11238860B2)
Priority to CN201780084159.8A (CN110235087B)
Priority to EP17893471.7A (EP3561643B1)
Publication of WO2018133307A1

Classifications

    • G: PHYSICS
      • G10: MUSICAL INSTRUMENTS; ACOUSTICS
        • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
          • G10L 15/00: Speech recognition
            • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
            • G10L 15/08: Speech classification or search
              • G10L 15/18: Speech classification or search using natural language modelling
                • G10L 15/1822: Parsing for meaning understanding
              • G10L 2015/088: Word spotting
            • G10L 15/28: Constructional details of speech recognition systems
              • G10L 15/30: Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
            • G10L 2015/223: Execution procedure of a spoken command
      • G06: COMPUTING; CALCULATING OR COUNTING
        • G06F: ELECTRIC DIGITAL DATA PROCESSING
          • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
            • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer

Definitions

  • the present invention relates to the field of electronic technologies, and in particular, to a method and a terminal for implementing voice control.
  • Voice recognition and control are relatively mature and widely used, for example in mobile phone input methods and in-vehicle electrical control.
  • Smart home is an upgrade of traditional home appliances: appliances can be remotely controlled through smart terminals such as mobile phones and computers, and multiple devices can be controlled at the same time or controlled automatically and repeatedly. Voice control is now widely implemented for such operation.
  • The voice cloud could obtain all of the user's custom keywords, such as device names and room names, by reading data from the IoT cloud and then use them for recognition and parsing, but this method greatly increases cost and introduces security problems.
  • A method and terminal for implementing voice control according to some embodiments of the present invention are directed to solving the problems of a low success rate and poor security when voice control involves user-personalized settings.
  • In a first aspect, an embodiment of the present invention provides a method for implementing voice control. The method includes: the terminal records a correspondence between a first keyword text and a second keyword text; when a user inputs voice, the terminal sends the voice input by the user to the first server for semantic parsing and logical parsing; when the first server returns a parsing failure, the terminal acquires the parsed text returned by the first server, replaces the second keyword in the parsed text with the first keyword according to the correspondence, and sends the result to the first server; the terminal then receives the control command structure returned after the first server's logical parsing succeeds, executes a function according to the control command structure, and plays a notification voice.
  • Because the terminal records the correspondence between the first keyword text and the second keyword text, when the first server cannot parse the second keyword text the terminal replaces it with the first keyword text according to the correspondence and resends the text to the first server for semantic parsing and logical parsing. This provides the user with personalized voice commands without adding processing complexity or cost to the first server, and improves recognition accuracy.
  • Moreover, the second server is not required to provide a description of the relationship between the first keyword text and the second keyword text, so the user's private information and the enterprise's user data are not exposed externally, which improves security.
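The send/fail/substitute/retry flow described above can be sketched in a few lines. This is only an illustrative sketch, not the patented implementation: `parse_on_server`, the response dictionary shape, and the example keywords are all invented for illustration.

```python
def handle_voice(voice_text, correspondence, parse_on_server):
    """Send recognized text to the first server for semantic and logical
    parsing; on failure, replace user-defined (second) keywords with the
    original (first) keywords and retry, keeping a replacement record."""
    result = parse_on_server(voice_text)
    if result["status"] == "ok":
        return result["command"], {}
    # Parsing failed: substitute each second keyword with its first keyword.
    text = result["parsed_text"]
    replacement_record = {}
    for first_kw, second_kw in correspondence.items():
        if second_kw in text:
            text = text.replace(second_kw, first_kw)
            replacement_record[first_kw] = second_kw
    retry = parse_on_server(text)
    if retry["status"] != "ok":
        raise ValueError("logical parsing failed even after substitution")
    return retry["command"], replacement_record
```

For example, if the user renamed the device "light" to "reading lamp", the command "turn on the reading lamp" fails on the first attempt, is rewritten to "turn on the light", and succeeds on the retry; the replacement record remembers that "light" stood for "reading lamp".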
  • Optionally, the terminal records the correspondence between the first keyword text and the second keyword text when it detects the operation of the user modifying the first keyword text into the second keyword text. The terminal can thus record the correspondence at any time according to the user's operations without relying on external devices; the operation is convenient and updates are fast.
  • Optionally, the terminal acquires and records the correspondence between the first keyword text and the second keyword text from the second server. The second server may be an Internet of Things server that records the keyword modification operations the user performs on a terminal; when the terminal interacts with the second server, it can acquire the correspondence from the second server. This relieves the terminal of collecting and recording modification operations in real time and reduces the processing-logic complexity of the terminal.
  • Optionally, the terminal records the first keyword text, the second keyword text, and their correspondence in a vocabulary list.
  • Optionally, acquiring the parsed text returned by the first server and sending it to the first server after replacing the second keyword with the first keyword according to the correspondence includes: the terminal matches the second keyword texts in the vocabulary list against the parsed text, replaces each matched second keyword in the parsed text with the corresponding first keyword, and sends the result to the first server.
  • Optionally, the terminal records the first keyword text, the second keyword text, and their correspondence in different vocabulary lists according to the type of the first keyword text.
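One minimal way to keep such per-type vocabulary lists is sketched below. The type names ("device", "room") and the class interface are illustrative assumptions, not taken from the patent:

```python
from collections import defaultdict

class VocabularyLists:
    """Records (first keyword -> second keyword) correspondences,
    grouped by keyword type (e.g. device name vs. room name)."""
    def __init__(self):
        self._lists = defaultdict(dict)  # type -> {first_kw: second_kw}

    def record(self, kw_type, first_kw, second_kw):
        self._lists[kw_type][first_kw] = second_kw

    def substitute(self, parsed_text):
        """Replace every matched second keyword in parsed_text with its
        first keyword; return the new text and the replacement record."""
        record = {}
        for pairs in self._lists.values():
            for first_kw, second_kw in pairs.items():
                if second_kw in parsed_text:
                    parsed_text = parsed_text.replace(second_kw, first_kw)
                    record[first_kw] = second_kw
        return parsed_text, record
```

Grouping by type keeps each list small and would also let the terminal restrict matching to the keyword types that are plausible at a given position in the parsed text.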
  • Optionally, executing a function according to the control command structure and playing the notification voice includes: the terminal replaces the first keyword text in the control command structure with the corresponding second keyword text; the terminal generates an executable control command according to the replaced control command structure and executes the control command; and the terminal generates a notification voice according to the replaced control command structure and plays the notification voice.
  • Executing the function may include sending the command to the device directly or through the second server, which enables the device or the second server to more easily understand the meaning of the command; playing the second keyword text in the notification voice avoids misunderstandings caused by vocabulary changes, improving the user experience.
  • Optionally, after the terminal replaces a matched second keyword in the parsed text with the corresponding first keyword, the terminal records a replacement record of the second keyword and the corresponding first keyword; replacing the first keyword text in the control command structure with the corresponding second keyword text is then performed according to the replacement record.
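Restoring the user's own wording in the returned control command structure via the replacement record could look like the following sketch. The flat dictionary shape of the command structure and the notification template are illustrative assumptions:

```python
def restore_user_keywords(command_struct, replacement_record):
    """Walk the control command structure returned by the first server and
    swap each first keyword back to the second keyword the user actually
    said, so the notification voice uses the user's own wording."""
    restored = {}
    for field, value in command_struct.items():
        if isinstance(value, str) and value in replacement_record:
            restored[field] = replacement_record[value]
        else:
            restored[field] = value
    return restored

def notification_text(command_struct):
    # Hypothetical template for the notification voice.
    return f"Turning {command_struct['action']} the {command_struct['device']}."
```

Using the replacement record (rather than the full vocabulary list) keeps the reverse substitution limited to keywords that were actually rewritten for this command.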
  • Optionally, when the terminal sends the voice input by the user to the first server for semantic parsing and logical parsing, the terminal also sends the correspondence between the first keyword text and the second keyword text to the first server.
  • In a second aspect, an embodiment of the present invention provides a terminal for implementing voice control. The terminal includes at least one processor and at least one memory storing a plurality of instructions; when the processor executes the instructions, the terminal performs at least the following steps: recording a correspondence between a first keyword text and a second keyword text; when the user inputs voice, sending the voice input by the user to the first server for semantic parsing and logical parsing; when the first server returns a parsing failure, acquiring the parsed text returned by the first server, replacing the second keyword in the parsed text with the first keyword according to the correspondence, and sending the result to the first server; receiving the control command structure returned after the first server's logical parsing succeeds; and executing a function according to the control command structure and playing a notification voice.
  • Optionally, the processor executes the instructions to cause the terminal to record the correspondence between the first keyword text and the second keyword text when the user modifies the first keyword text into the second keyword text.
  • Optionally, the processor executes the instructions to cause the terminal to acquire and record the correspondence between the first keyword text and the second keyword text from the second server.
  • Optionally, the processor executes the instructions to cause the terminal to record the first keyword text, the second keyword text, and their correspondence in a vocabulary list.
  • Optionally, the processor executes the instructions to cause the terminal to record the first keyword text, the second keyword text, and their correspondence in different vocabulary lists according to the type of the first keyword text.
  • Optionally, the processor executes the instructions to cause the terminal to match the second keyword texts in the vocabulary list against the parsed text, replace each matched second keyword in the parsed text with the corresponding first keyword, and send the result to the first server.
  • Optionally, the processor executes the instructions to cause the terminal to replace the first keyword text in the control command structure with the corresponding second keyword text, generate an executable control command according to the replaced control command structure and execute the control command, and generate a notification voice according to the replaced control command structure and play the notification voice.
  • Optionally, the processor executes the instructions to cause the terminal to record a replacement record of the second keyword and the corresponding first keyword; replacing the first keyword text in the control command structure with the corresponding second keyword text is then performed according to the replacement record.
  • Optionally, the processor further executes the instructions to cause the terminal, when the voice input by the user is sent to the first server for semantic parsing and logical parsing, to also send the correspondence between the first keyword text and the second keyword text to the first server.
  • In a third aspect, an embodiment of the present invention provides a terminal for implementing voice control, where the terminal includes a recording unit, a first sending unit, a replacing unit, and an executing unit. The recording unit is configured to record a correspondence between a first keyword text and a second keyword text. When the user inputs voice: the first sending unit is configured to send the voice input by the user to the first server for semantic parsing and logical parsing; the replacing unit is configured to, when the first server returns a parsing failure, acquire the parsed text returned by the first server, replace the second keyword in the parsed text with the first keyword according to the correspondence, and send the result to the first server; and the executing unit is configured to receive the control command structure returned after the first server's logical parsing succeeds, execute a function according to the control command structure, and play a notification voice.
  • Optionally, the recording unit includes a first recording subunit, configured to record the correspondence between the first keyword text and the second keyword text when the user modifies the first keyword text into the second keyword text.
  • the recording unit includes: a second recording subunit, configured to acquire and record a correspondence between the first keyword text and the second keyword text from the second server.
  • Optionally, the recording unit further includes a third recording subunit, configured to record the first keyword text, the second keyword text, and their correspondence in a vocabulary list.
  • Optionally, the third recording subunit is configured to record the first keyword text, the second keyword text, and their correspondence in different vocabulary lists according to the type of the first keyword text.
  • Optionally, the replacing unit includes: a matching subunit, configured to match the second keyword texts in the vocabulary list against the parsed text; and a replacement subunit, configured to replace each matched second keyword in the parsed text with the corresponding first keyword and send the result to the first server.
  • Optionally, the executing unit includes: a re-substituting subunit, configured to replace the first keyword text in the control command structure with the corresponding second keyword text; an execution subunit, configured to generate an executable control command according to the replaced control command structure and execute the control command; and a voice generation subunit, configured to generate a notification voice according to the replaced control command structure and play the notification voice.
  • Optionally, the replacing unit further includes a replacement recording subunit, configured to record, after a matched second keyword in the parsed text is replaced with the corresponding first keyword, a replacement record of the second keyword and the corresponding first keyword; the re-substituting subunit is then configured to replace the first keyword text in the control command structure with the corresponding second keyword text according to the replacement record.
  • Optionally, the terminal further includes a second sending unit, configured to, when the voice input by the user is sent to the first server for semantic parsing and logical parsing, send the correspondence between the first keyword text and the second keyword text to the first server.
  • An embodiment of the present invention further provides a computer readable storage medium comprising instructions which, when executed on a computer, cause the computer to perform the method described in the embodiments of the first aspect.
  • An embodiment of the present invention further provides a computer program product comprising instructions which, when executed on a computer, cause the computer to perform the method described in the embodiments of the first aspect.
  • FIG. 1 is a schematic structural diagram of a system for implementing voice control according to an embodiment of the present invention;
  • FIG. 2 is a schematic structural diagram of a terminal according to an embodiment of the present invention;
  • FIG. 3 is a schematic flowchart of a method for implementing voice control according to an embodiment of the present invention;
  • FIG. 4 is a schematic structural diagram of a system for implementing voice control according to an embodiment of the present invention;
  • FIG. 5 is a schematic flowchart of a method for implementing voice control according to an embodiment of the present invention;
  • FIG. 6 is a schematic structural diagram of a terminal for implementing voice control according to an embodiment of the present invention;
  • FIG. 7 is a schematic diagram of a process for implementing voice control according to an embodiment of the present invention;
  • FIG. 8 is a schematic flowchart of a method for implementing voice control according to an embodiment of the present invention.
  • The following describes embodiments of the terminal, the device, the server, and the system of the present invention, in which the terminal implements voice control together with the server to improve the accuracy and success rate of voice control.
  • Speech recognition and control are relatively mature and widely used, for example in mobile phone input methods and vehicle electrical control.
  • The operation control of smart homes also generally implements the voice control function.
  • Smart home is an upgrade of traditional home appliances: appliances can be remotely controlled through smart terminals such as mobile phones and computers, and multiple home appliances can be controlled simultaneously or automatically and repeatedly.
  • At present the voice control function is generally implemented, and the user can operate a home appliance by speaking a control command to a mobile phone or to a control terminal that supports voice input.
  • The latter comes in many forms, such as smart speakers, routers, cameras, and dedicated voice control terminals.
  • In the following, intelligent devices that support voice control are collectively referred to as "terminals" or "voice terminals".
  • The voice cloud mainly consists of server clusters in computing centers.
  • The voice cloud server recognizes and processes the user's voice, converts it into text, a control command data structure, and the like, and returns the result to the terminal.
  • The terminal converts these data into home appliance control commands that carry out the user's control intent.
  • The voice cloud is not only for smart home services; it also supports voice services on mobile phones and in vehicles, and is provided by a separate carrier.
  • Figure 1 shows the current typical smart home system networking diagram.
  • The system architecture of the present invention is a voice control system based on a voice cloud and semantic parsing, and includes a terminal, a device, and one or more servers.
  • The smart device may be a smart home appliance, that is, one of various devices in the home connected through Internet of Things technology, such as audio and video equipment, lighting systems, curtain control, air conditioning control, security systems, digital cinema systems, audio and video servers, video cabinet systems, and network appliances.
  • Such an electronic device has data processing capability and can not only provide traditional living functions but also support intelligent functions such as remote control and timed control performed by the user through the network with a terminal.
  • The smart home appliance may also be another device that needs a network connection and can implement the network connection through a terminal.
  • the smart appliance is, for example, a smart TV.
  • A smart TV also has a processor, memory, and a network connection device, can run a variety of operating systems, and can connect to the Internet; like the terminal, it can support a variety of interactive applications, which the user can install, update, and delete.
  • the terminal may be a portable electronic device that also includes other functions such as a personal digital assistant and/or a music player function, such as a mobile phone, a tablet, a wearable electronic device with wireless communication capabilities (eg, Smart watches) and so on.
  • Exemplary portable electronic devices include, but are not limited to, portable electronic devices carrying various operating systems.
  • the portable electronic device described above may also be other portable electronic devices, such as a laptop having a touch-sensitive surface, such as a touchpad, or the like.
  • the terminal may also be a device that can be used as a mobile security agent, such as a remote controller conforming to the same specification, a smart environment detector, or the like.
  • the terminal in the embodiment of the present invention may be the mobile phone 100.
  • The embodiment will be specifically described below by taking the mobile phone 100 as an example. It should be understood that the illustrated mobile phone 100 is only one example of a terminal; the mobile phone 100 may have more or fewer components than shown in the figures, may combine two or more components, or may have a different component configuration.
  • the various components shown in the figures can be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
  • The mobile phone 100 may specifically include components such as a processor 101, a radio frequency (RF) circuit 102, a memory 103, a touch screen 104, a Bluetooth device 105, one or more sensors 106, a Wi-Fi device 107, a positioning device 108, an audio circuit 109, a peripheral interface 110, and a power system 111. These components can communicate over one or more communication buses or signal lines (not shown in FIG. 2). It will be understood by those skilled in the art that the hardware structure shown in FIG. 2 does not constitute a limitation on the mobile phone 100: the mobile phone 100 may include more or fewer components than illustrated, may combine some components, or may use a different component arrangement.
  • The processor 101 is the control center of the mobile phone 100. It connects the various parts of the mobile phone 100 through various interfaces and lines, and performs the various functions of the mobile phone 100 and processes data by running or executing applications (hereinafter referred to as Apps) stored in the memory 103 and by calling data and instructions stored in the memory 103.
  • processor 101 may include one or more processing units; processor 101 may also integrate an application processor and a modem processor; wherein the application processor primarily processes operating systems, user interfaces, applications, etc.
  • the modem processor primarily handles wireless communications. It can be understood that the above modem processor may not be integrated into the processor 101.
  • the processor 101 can be an integrated chip.
  • the processor 101 may further include a fingerprint verification chip for verifying the collected fingerprint.
  • The radio frequency circuit 102 can be used to receive and transmit wireless signals during the transmission or reception of information or during a call. Specifically, the radio frequency circuit 102 can receive downlink data from the base station and deliver it to the processor 101 for processing, and can send uplink data to the base station.
  • radio frequency circuits include, but are not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like.
  • the radio frequency circuit 102 can also communicate with other devices through wireless communication.
  • The wireless communication can use any communication standard or protocol, including but not limited to the global system for mobile communications, general packet radio service, code division multiple access, wideband code division multiple access, long term evolution, email, short message service, and so on.
  • the memory 103 is used to store applications and data, and the processor 101 executes various functions and data processing of the mobile phone 100 by running applications and data stored in the memory 103.
  • The memory 103 mainly includes a program storage area and a data storage area. The program storage area can store an operating system and the applications required for at least one function (such as a sound playing function or an image playing function); the data storage area can store data created during the use of the mobile phone 100 (such as audio data and the phone book).
  • The memory 103 may include a high-speed random access memory, and may also include a nonvolatile memory such as a magnetic disk storage device, a flash memory device, or another nonvolatile solid-state storage device.
  • The memory 103 can store various operating systems, such as the operating system developed by Apple Inc. or the operating system developed by Google Inc.
  • the touch screen 104 can include a touch panel 104-1 and a display 104-2.
  • The touch panel 104-1 can collect touch events performed by the user of the mobile phone 100 on or near it (for example, an operation performed by the user on or near the touch panel 104-1 with a finger, a stylus, or any other suitable object) and send the collected touch information to another component such as the processor 101.
  • A touch event performed by the user near the touch panel 104-1 may be referred to as a hovering touch; a hovering touch means that the user does not need to directly touch the touchpad to select, move, or drag a target (for example, an icon), but only needs to be near the terminal for the desired function to be performed.
  • The touch panel 104-1 capable of hovering touch can be implemented using capacitive, infrared-sensing, ultrasonic, or similar technologies.
  • The touch panel 104-1 may include two parts: a touch detection device and a touch controller. The touch detection device detects the touch position of the user, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, and sends them to the processor 101. The touch controller can also receive instructions from the processor 101 and execute them.
  • the touch panel 104-1 can be implemented in various types such as resistive, capacitive, infrared, and surface acoustic waves.
  • a display (also referred to as display) 104-2 can be used to display information entered by the user or information provided to the user as well as various menus of the mobile phone 100.
  • the display 104-2 can be configured in the form of a liquid crystal display, an organic light emitting diode, or the like.
  • the touchpad 104-1 can be overlaid on the display 104-2, and when the touchpad 104-1 detects a touch event on or near it, it is transmitted to the processor 101 to determine the type of touch event, and then the processor 101 may provide a corresponding visual output on display 104-2 depending on the type of touch event.
  • Although in FIG. 2 the touchpad 104-1 and the display 104-2 are shown as two separate components implementing the input and output functions of the handset 100, in some embodiments the touchpad 104-1 and the display screen 104-2 are integrated to implement the input and output functions of the mobile phone 100. It can be understood that the touch screen 104 is formed by stacking multiple layers of material; only the touch panel (layer) and the display screen (layer) are shown in the embodiment of the present invention, and the other layers are not described.
  • Optionally, the touch panel 104-1 may be overlaid on the display 104-2 with a size larger than that of the display 104-2, so that the display 104-2 is completely covered by the touch panel 104-1; alternatively, the touch panel 104-1 may be disposed on the front of the mobile phone 100 as a full panel, so that every touch on the front of the mobile phone 100 can be perceived and a full-touch experience is achieved on the front of the phone.
  • In other embodiments, when the touch panel 104-1 is disposed on the front of the mobile phone 100 as a full panel, the display screen 104-2 may also be disposed on the front as a full panel, so that a bezel-less front structure can be realized.
  • the mobile phone 100 may further have a fingerprint recognition function.
  • A fingerprint reader may be disposed on the back of the handset 100 (for example, below the rear camera) or on the front of the handset 100 (for example, below the touch screen 104); details are not described here.
  • The mobile phone 100 may further include a Bluetooth device 105 for data exchange between the mobile phone 100 and other short-distance terminals (such as mobile phones and smart watches).
  • the Bluetooth device in the embodiment of the present invention may be an integrated circuit or a Bluetooth chip or the like.
  • the handset 100 can also include at least one type of sensor 106, such as a light sensor, motion sensor, and other sensors.
  • the light sensor may include an ambient light sensor and a proximity sensor, wherein the ambient light sensor may adjust the brightness of the display of the touch screen 104 according to the brightness of the ambient light, and the proximity sensor may turn off the power of the display when the mobile phone 100 moves to the ear.
  • The accelerometer sensor can detect the magnitude of acceleration in all directions (usually three axes); when stationary, it can detect the magnitude and direction of gravity. It can be used to identify phone gestures (such as switching between landscape and portrait screens, related games, magnetometer attitude calibration) and vibration-recognition-related functions (such as pedometer and tapping), etc.
  • The mobile phone 100 can also be configured with other sensors such as a gyroscope, barometer, hygrometer, thermometer, and infrared sensor, which are not described here.
  • The Wi-Fi device 107 is configured to provide the mobile phone 100 with network access complying with Wi-Fi related standard protocols; the mobile phone 100 can access a Wi-Fi access point through the Wi-Fi device 107, thereby helping the user send and receive emails, browse web pages, access streaming media, and so on, providing the user with wireless broadband Internet access.
  • the Wi-Fi device 107 can also function as a Wi-Fi wireless access point, and can provide Wi-Fi network access to other terminals.
  • The positioning device 108 is configured to provide a geographic location for the mobile phone 100. It can be understood that the positioning device 108 can specifically be a receiver of a positioning system such as the Global Positioning System (GPS), the Beidou satellite navigation system, or the Russian GLONASS. After receiving the geographic location transmitted by the positioning system, the positioning device 108 sends the information to the processor 101 for processing, or sends it to the memory 103 for storage. In some other embodiments, the positioning device 108 may be an Assisted Global Positioning System (AGPS) receiver.
  • AGPS is an operation mode that performs GPS positioning with certain assistance: it can utilize base station signals together with GPS satellite signals to make the mobile phone 100 locate faster. In the AGPS system, the positioning device 108 can obtain positioning assistance by communicating with an assisted positioning server (such as a mobile phone positioning server).
  • The AGPS system acts as an assistance server to assist the positioning device 108 in performing ranging and positioning services; in this case, the assisted positioning server provides positioning assistance by communicating over a wireless communication network with the positioning device 108 (i.e., the GPS receiver) of a terminal such as the handset 100.
  • In some other embodiments, the positioning device 108 can also use Wi-Fi access point based positioning technology. Since each Wi-Fi access point has a globally unique MAC address, the terminal can scan and collect the broadcast signals of surrounding Wi-Fi access points when Wi-Fi is turned on, and thus obtain their MAC addresses; a location server retrieves the geographic location of each Wi-Fi access point, combines it with the strength of the Wi-Fi broadcast signals to calculate the geographic location of the terminal, and sends it to the positioning device 108 of the terminal.
  • the audio circuit 109, the speaker 113, and the microphone 114 can provide an audio interface between the user and the handset 100.
  • On one hand, the audio circuit 109 can convert received audio data into an electrical signal and transmit it to the speaker 113, which converts it into a sound signal for output; on the other hand, the microphone 114 converts a collected sound signal into an electrical signal, which the audio circuit 109 receives and converts into audio data, and then outputs the audio data to the RF circuit 102 for transmission to, for example, another mobile phone, or outputs the audio data to the memory 103 for further processing.
  • the peripheral interface 110 is used to provide various interfaces for external input/output devices (such as a keyboard, a mouse, an external display, an external memory, a subscriber identity module card, etc.).
  • For example, a mouse is connected through a universal serial bus (USB) interface, and a subscriber identity module (SIM) card provided by the telecommunications carrier is connected through metal contacts in the SIM card slot.
  • Peripheral interface 110 can be used to couple the external input/output peripherals described above to processor 101 and memory 103.
  • the mobile phone 100 may further include a power supply device 111 (such as a battery and a power management chip) that supplies power to the various components.
  • The battery may be logically connected to the processor 101 through the power management chip, so that functions such as charging, discharging, and power consumption management are implemented through the power supply device 111.
  • the mobile phone 100 may further include a camera (front camera and/or rear camera), a flash, a micro projection device, a near field communication (NFC) device, and the like, and details are not described herein.
  • The server may be a cloud server. A cloud server is an Internet-based computing service with computing and storage capabilities that provides shared software and hardware resources and information to computer terminals and other devices on demand.
  • the cloud server can be a voice cloud.
  • the system can include the following:
  • Smart home appliances: home appliances that can be networked, remotely controlled, and run automatically on command by the user. Some have programming and timing functions; they are upgrades of traditional home appliances.
  • Control terminal: the control device that runs the control software, usually divided into fixed and mobile types.
  • the mobile control terminal is usually a smart device such as a smart phone or a tablet;
  • the fixed control terminal is usually a non-smart device such as a panel or a switch.
  • The present invention is directed to improvements in the former, i.e., the mobile intelligent terminal.
  • The mobile control terminal can communicate with the device through the home wireless local area network (the "control terminal 1" position in the figure), or with the device from outside the home through the Internet (the "control terminal 2" position in the figure).
  • IoT cloud: in order to process and control the device status when the control terminal cannot communicate directly with the device, the device and the control terminal must communicate through a control server, called the "IoT cloud". Messages and commands between the two are forwarded by the IoT cloud.
  • the IoT cloud also records and executes these messages/commands.
  • Voice cloud: the voice cloud itself is not part of the smart home, but a third-party service provider that provides the ability to convert speech into text and text into an executable command data structure.
  • the voice cloud and smart home system are two independent entities that communicate through the Internet.
  • the communication content is the above-mentioned "voice-->text” interaction process.
  • The smart home system also includes many components, such as the "smart home cloud" ("IoT cloud") that controls and manages home devices, the smart home appliances in a large number of users' homes, and the terminals that control the home appliances (such as a smartphone with control software, or a voice terminal), etc.
  • The subsequent control process is the same as the user's original manual operation on the terminal, so this description involves only the voice cloud and the terminal device; the functions and processes of the other devices are not described further.
  • The current voice cloud has been able to achieve high recognition accuracy and can convert any voice into text; in many public services such as booking and inquiries, speech recognition has achieved a high degree of intelligence and accuracy.
  • the flow of implementing voice control in the smart home service is as shown in FIG. 3.
  • In this case, the voice cloud can correctly identify the operating object as "air conditioning", the location as "living room", the action as "adjust temperature", and the target parameter as "26 degrees", and according to this processing result, returns the following correct data structure:
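The structure returned by the voice cloud is not reproduced in this extract. A minimal sketch of what such a parse result could look like (the field names are illustrative assumptions, not the actual voice cloud schema):

```python
# Hypothetical parse result for "set the living room air conditioner to 26 degrees".
# Field names are assumptions for illustration, not the actual voice-cloud schema.
parse_result = {
    "device": "air conditioning",    # operating object
    "location": "living room",       # room keyword
    "action": "adjust temperature",  # recognized action
    "value": "26 degrees",           # target parameter
}

assert parse_result["location"] == "living room"
```

With all keyword fields present, the terminal can generate the corresponding control command and knows which device to control.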
  • For the voice cloud to parse successfully, the corresponding keywords must be present in the control command; for example, "air conditioning" must be present to determine that "26" is a temperature. After receiving the complete data of such parameters, the terminal can generate the corresponding control command and knows which device to control.
  • The voice cloud has learned and summarized a large number of device types, such as "air conditioners", "refrigerators", and "lights", and has designed a corresponding combination of control parameters for each type of equipment. Rooms have also been defined with standard usages such as "living room", "bedroom", and "corridor".
  • the speech cloud also processes various possible word sequences and modal words, and has high accuracy for these voice commands within the standard expression range.
  • Otherwise, command parsing fails. For example, "open the living room air conditioner" matches the predefined keywords and can be parsed into a command, but when the user replaces "living room" with "large room", if "large room" is not a defined keyword, command parsing fails: the voice cloud returns the parsed text string to the terminal but cannot return the data structure of a control command, returning an error code instead, and the terminal prompts the user "unrecognized command, please speak again".
  • In other words, when the user uses self-defined non-standard keywords, the voice cloud cannot recognize the command; when the user uses the standard keywords defined by the voice recognition system, the voice cloud can recognize the data structure of the command, but because it differs from the user's own descriptions in the smart home, the terminal cannot find the target device to be controlled. To avoid this situation, smart home systems keep asking users to try different control phrases, yet control still fails, causing users to distrust voice control and regard it as unintelligent; meanwhile, to improve and provide more comprehensive functions, voice clouds try to collect as many "non-standard keywords" as possible and standardize them.
  • The location and effective scope of different users' personalized settings (modified keywords) are within each user's own home, and the client operated by the user (i.e., the terminal) can fully perceive and distinguish these modifications.
  • The embodiment of the present invention divides the speech recognition process, which previously could not be solved or was handed over entirely to the voice cloud, into a standard stage and a non-standard stage by utilizing the computing power of the terminal and its ability to process each user's individual characteristics.
  • The non-standard keyword is replaced with a standard keyword on the user's local terminal, and the scope of the personalized information is limited to the terminal.
  • In the original scheme, the personalized part had to be provided to the cloud to be recognized; here, recognition of the voice control command is completed through a two-step iterative process instead of the prior art's single submission with a directly returned result. Thus the user's personalized language modifications take effect immediately for voice control: no matter what value is modified, the voice cloud does not need to be changed and no inter-cloud interface is needed, reducing the risk of information leakage.
  • The invention solves the problem at the root of the user's personalized settings, that is, the user's own modification of the standard keywords: a user's personalized settings are only related to a specific family or individual, and do not need to be centralized to the voice cloud for processing. In the prior art, the voice cloud needs to identify the user, obtain the personalized keywords, and then perform matching, which is in effect personalized processing inside a centralized common processing flow; this is low in efficiency and high in cost.
  • Which standard keyword each user has modified to a non-standard value can be judged on the terminal side; there is no need for the voice cloud to make this distinction. Otherwise, the voice cloud would have to provide different types of keyword modification for different smart home service providers, and handling users' modified keyword values would become multi-level processing, with a large development workload and low execution efficiency.
  • The voice terminal is part of the smart home system: it knows from the design which categories of keywords the user can modify, and can also read whether and how they were actually modified without any information risk to the user.
  • a method for implementing voice control according to an aspect of the present invention includes:
  • Step 11 The terminal records a correspondence between the first keyword text and the second keyword text.
  • Step 12: When the user inputs voice, the terminal cooperates with the server to parse the voice input by the user and performs a function. Step 12 specifically includes:
  • Step 121 The terminal sends the voice input by the user to the first server for semantic analysis and logical analysis.
  • Step 122: When the first server returns a parsing failure, the terminal acquires the parsed text returned by the first server, replaces the second keyword in the parsed text with the first keyword according to the correspondence, and then sends the result to the first server;
  • Step 123: The terminal receives the control command structure returned after the first server's logical parsing succeeds, executes a function according to the control command structure, and plays a notification voice.
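Steps 121 to 123 can be sketched end to end as follows; `voice_cloud_parse` is a stand-in for the first server's recognition service, and every name here is an assumption for illustration:

```python
def voice_cloud_parse(text_or_audio):
    """Stand-in for the first server (voice cloud): succeeds only on standard keywords."""
    if "living room" in text_or_audio:
        return {"ok": True, "command": {"location": "living room", "device": "air conditioning"}}
    # Parsing failed: return the dictation text instead of a command structure.
    return {"ok": False, "text": text_or_audio}

def handle_voice(dictation, correspondence):
    """correspondence maps standard (first) keyword -> non-standard (second) keyword."""
    # Step 121: submit the user's input for semantic and logical parsing.
    result = voice_cloud_parse(dictation)
    if not result["ok"]:
        # Step 122: replace each non-standard keyword with its standard keyword and retry.
        text = result["text"]
        for standard, nonstandard in correspondence.items():
            text = text.replace(nonstandard, standard)
        result = voice_cloud_parse(text)
    # Step 123 would execute the returned command structure and play a notification.
    return result

out = handle_voice("turn on the large room air conditioning", {"living room": "large room"})
assert out["ok"] and out["command"]["location"] == "living room"
```

The second submission succeeds because the retried text contains only standard keywords.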
  • the first server is used interchangeably with the voice cloud
  • The second server is used interchangeably with the IoT cloud.
  • voice recognition and control of user personalized parameters are implemented by performing non-standard keyword recognition, replacement, and reverse replacement at the terminal.
  • The user can modify the standard keywords (such as the room name "living room") into different values according to their own habitual usage.
  • In step 11, when the user modifies the first keyword text into the second keyword text, the terminal records the correspondence between the first keyword text and the second keyword text; or, the terminal acquires and records the correspondence between the first keyword text and the second keyword text from the second server. That is, which keywords are modified and what they are modified into can be obtained by the terminal from the IoT cloud without any problems.
  • In step 12, the terminal cooperates with the server to parse the voice input by the user and performs the function.
  • Compared with the original "voice submission, result return" process, it includes the following processing:
  • Step 121 The terminal uploads the voice input by the user to the voice cloud for voice recognition, including semantic analysis and logical analysis, and waits for the voice cloud to return the recognition result.
  • In this way, when the voice cloud cannot parse a non-standard keyword, the terminal completes the replacement of the non-standard keyword with a standard keyword and sends it to the voice cloud; the voice cloud does not need to establish an inter-cloud database interface to the IoT cloud to look up the corresponding standard keywords, and does not rely on another service provider, reducing costs and improving information security.
  • In step 121, the terminal sends the voice input by the user to the voice cloud for semantic parsing and logical parsing, and can simultaneously upload the user's personalized vocabulary (non-standard keywords) to the voice cloud to achieve a higher speech recognition rate. Depending on the services provided by the voice cloud, uploading a customized vocabulary can increase its recognition accuracy; this is usually a standard function that voice recognition services can provide, and results are better after it is used. The client uploads the voice (a recording file), and the voice cloud performs speech and semantic parsing according to the standard process; with the vocabulary from the previous step, speech parsing has a higher accuracy rate.
  • In step 122, when there is a non-standard keyword in the user's voice, the voice cloud's semantic recognition of the control command may fail due to missing keywords (for example, missing room information).
  • In this case, the terminal performs replacement on the returned speech recognition text according to the non-standard keywords in the vocabulary, replacing the non-standard keyword (such as "large room") with the standard keyword (such as "living room").
  • Any standard keyword of the same category can be used, as long as the terminal keeps a record; this does not affect the voice cloud's recognition or the subsequent reverse replacement, and the program can record this replacement.
  • The terminal re-uploads the replaced, standardized control command text (text string), and the voice cloud performs semantic recognition again; since the text is now standard vocabulary, semantic recognition succeeds.
  • In step 123, the terminal performs reverse replacement according to the previous replacement and generates a control command that corresponds to the actual device.
  • The terminal also generates a notification voice according to the command, using the non-standard keyword, to tell the user the execution result (that is, when notifying the user, the room is referred to by the "large room" name the user uses, which the user can understand).
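The reverse replacement in step 123 can be sketched as follows (a minimal illustration; the replacement record from step 122 is assumed to map standard keywords back to the user's own terms, and all names are hypothetical):

```python
def reverse_replace(command, replacement_record):
    """Restore the user's non-standard terms in the copy used for notification."""
    notified = dict(command)
    for standard, nonstandard in replacement_record.items():
        for field, value in notified.items():
            if value == standard:
                notified[field] = nonstandard
    return notified

command = {"location": "living room", "device": "air conditioning", "temp": "18"}
record = {"living room": "large room"}  # recorded during the forward replacement
notified = reverse_replace(command, record)

# The notification voice uses the user's own wording ("large room").
notice = f"The air conditioning in the {notified['location']} has been set to {notified['temp']} degrees."
assert notified["location"] == "large room"
```

The command sent to the actual device keeps the standard keyword, while the spoken notification keeps the user's wording.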
  • In this way, the speech and semantic recognition process, which used to be completed entirely by the voice cloud, is divided into two stages, and the non-standard part of the semantic recognition process is standardized by using the correspondence between standard and non-standard keywords known to the terminal.
  • a non-standard keyword is a modification content of a standard keyword by a user.
  • the terminal or the IoT cloud records the non-standard keyword and its corresponding category, and can generate different vocabularies according to the category.
  • The process of replacing a non-standard keyword with a standard keyword is as follows: the voice terminal matches non-standard keywords against the speech recognition text returned by the voice cloud. For content that matches, the terminal replaces it with one of the standard keywords of the same category (the terminal's software can specify any of them; usually, for easy understanding and manual identification, the first standard keyword is selected), and records the category that was replaced, so that the actual device can be restored when the voice cloud returns the control command.
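The matching, replacement, and category recording just described might be sketched like this (the list contents and pairings are assumptions based on the examples in this document):

```python
# Each category list holds (standard, non-standard) pairs; contents are illustrative.
vocab = {
    "custom-device-list":   [("air conditioning", "Arctic Wind")],
    "custom-location-list": [("living room", "large room")],
}

def standardize(dictation_text):
    """Replace non-standard keywords with standard ones, recording each replacement."""
    replacements = {}  # category -> (standard, non-standard) actually substituted
    for category, pairs in vocab.items():
        for standard, nonstandard in pairs:
            if nonstandard in dictation_text:
                # Replace with a standard keyword of the same category and
                # record the replacement for the later reverse step.
                dictation_text = dictation_text.replace(nonstandard, standard)
                replacements[category] = (standard, nonstandard)
    return dictation_text, replacements

text, record = standardize("adjust the Arctic Wind of the large room to 18 degrees")
assert text == "adjust the air conditioning of the living room to 18 degrees"
```

The recorded `replacements` is what later drives the reverse replacement when the control command comes back.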
  • In this way, voice recognition of the user's non-standard keywords is completed without modifying the voice cloud's command parsing process and without providing an interface that would allow the voice cloud to obtain information from all users.
  • The software modules required to complete this process are implemented in the terminal; that is, the voice terminal adds a "custom keyword recognition function" to the original voice control module (FIG. 6).
  • The voice control module sends the voice to the voice cloud and handles the voice cloud's returned results differently depending on success or failure; only on success can a voice control command be obtained.
  • the process example is as shown in Figure 7.
  • the voice command processing method is as follows:
  • When setting up a device, the client first provides a standard device parameter template, including standard device names, such as an "air conditioning" list, and standard rooms, such as a "living room" list. If the user modifies an item, the client records the modified items and values in different lists (the user may modify multiple devices in the home): for example, modified device names are recorded in "custom-device-list", and modified device room names are recorded in "custom-location-list". Each distinct modified value is recorded as one entry, each entry in a list is unique, and each record also corresponds to the original standard name.
  • For example, if the user names the air conditioner "Arctic Wind", then "custom-device-list" contains the entry "Arctic Wind, air conditioning"; for a room, any standard room name can be mapped, such as "living room, large room".
  • the order of recording is "standard keywords, non-standard keywords”.
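Recording entries in the order "standard keyword, non-standard keyword" could be sketched as follows (the structure and function names are assumptions for illustration):

```python
# One list per modifiable keyword category; names follow the example above.
custom_lists = {"custom-device-list": [], "custom-location-list": []}

def record_modification(list_name, standard, nonstandard):
    """Record a user's rename as a (standard, non-standard) pair, one entry per distinct value."""
    entry = (standard, nonstandard)
    if entry not in custom_lists[list_name]:
        custom_lists[list_name].append(entry)

record_modification("custom-device-list", "air conditioning", "Arctic Wind")
record_modification("custom-location-list", "living room", "large room")
assert custom_lists["custom-device-list"] == [("air conditioning", "Arctic Wind")]
```

Each record keeps the original standard name alongside the user's value, which is what makes both the forward and reverse replacements possible.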
  • The terminal sends the user's voice to the voice cloud, such as "adjust the Arctic Wind of the large room to 18 degrees".
  • the voice content can be recognized by the voice cloud, but cannot be converted into a control command.
  • The terminal matches the non-standard keywords in all the lists against the text returned by the voice cloud (hereinafter "dictation text"), such as "custom-device-list" and "custom-location-list" above (there can be more lists: however many keywords are allowed to be modified, that many lists exist). Each non-standard keyword is searched for in the dictation text; in this example, the two values "Arctic Wind" and "large room" are found, located in the two lists "custom-device-list" and "custom-location-list" respectively.
  • the terminal obtains the recognition result.
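The search over the dictation text can be sketched as a simple scan of each non-standard keyword (the lists follow the example above; the helper name is an assumption):

```python
# Non-standard keywords per category list, following the "Arctic Wind" example.
custom_lists = {
    "custom-device-list":   ["Arctic Wind"],
    "custom-location-list": ["large room"],
}

def find_nonstandard(dictation_text):
    """Return which non-standard keywords occur in the dictation text, per list."""
    hits = {}
    for list_name, keywords in custom_lists.items():
        for kw in keywords:
            if kw in dictation_text:
                hits[list_name] = kw
    return hits

hits = find_nonstandard("adjust the Arctic Wind of the large room to 18 degrees")
assert hits == {"custom-device-list": "Arctic Wind", "custom-location-list": "large room"}
```

Each hit identifies both the keyword to replace and the category list it came from.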
  • The method for implementing voice recognition by the terminal described above can implement recognition and processing of user-defined keywords and improve the speech and semantic recognition rate of control commands; the solution is independent of whatever modifications the user makes and requires no extra action.
  • The method needs no additional server interfaces and avoids inter-cloud interfaces, reducing the ports exposed to network attack, so network and information security are high.
  • Compared with the prior art, in which the cloud first identifies the user and then processes the user's non-standard information, this local, distributed processing is more efficient.
  • The method standardizes the interface, with low coupling and low dependency between it and the voice service provider, facilitating promotion and flexible choice of supplier; it further reduces the exposure of users' private information and the enterprise's entire user information to the outside world, protecting privacy and trade secrets through software on the terminal side; voice service upgrades and maintenance are not involved, and the cost is low.
  • An embodiment of the present invention provides a terminal for implementing voice control, which includes a recording unit, a first sending unit, a replacing unit, and an execution unit. The recording unit is configured to record the correspondence between the first keyword text and the second keyword text. When the user inputs voice: the first sending unit is configured to send the voice input by the user to the first server for semantic parsing and logical parsing; the replacing unit is configured to acquire, when the first server returns a parsing failure, the parsed text returned by the first server, replace the second keyword in the parsed text with the first keyword according to the correspondence, and then send the result to the first server; the execution unit is configured to receive the control command structure returned after the first server's logical parsing succeeds, execute a function according to the control command structure, and play a notification voice.
  • The recording unit includes: a first recording subunit, configured to record the correspondence between the first keyword text and the second keyword text when the user modifies the first keyword text into the second keyword text.
  • the recording unit includes: a second recording subunit, configured to acquire and record a correspondence between the first keyword text and the second keyword text from the second server.
  • The recording unit further includes: a third recording subunit, configured to record the first keyword text, the second keyword text, and their correspondence in a vocabulary list.
  • The third recording subunit is configured to record the first keyword text, the second keyword text, and their correspondence in different vocabulary lists according to the different types of the first keyword text.
  • The replacing unit includes: a matching subunit, configured to match the second keyword text in the vocabulary list against the parsed text; and a replacing subunit, configured to replace the second keyword matched in the parsed text with the corresponding first keyword and send the result to the first server.
  • The execution unit includes: a replacement subunit, configured to replace the first keyword text in the control command structure with the corresponding second keyword text; an execution subunit, configured to generate an executable control command according to the replaced control command structure and execute the control command; and a voice generating subunit, configured to generate a notification voice according to the replaced control command structure and play the notification voice.
  • The replacing unit further includes: a replacement recording subunit, configured to record, after the second keyword matched in the parsed text is replaced with the corresponding first keyword, a replacement record of the second keyword and the corresponding first keyword; the replacement subunit is then configured to replace the first keyword text in the control command structure with the corresponding second keyword text according to the replacement record.
  • The terminal further includes: a second sending unit, configured to send the correspondence between the first keyword text and the second keyword text to the first server when the voice input by the user is sent to the first server for semantic parsing and logical parsing.
  • The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • When implemented in software, they may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • When the computer program instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the present invention are produced in whole or in part.
  • the computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • The computer instructions can be stored in a computer readable storage medium or transferred from one computer readable storage medium to another; for example, the computer instructions can be transferred from a website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave).
  • The computer readable storage medium can be any available medium accessible by a computer, or a data storage device, such as a server or data center, that integrates one or more available media.
  • the usable medium may be a magnetic medium (eg, a floppy disk, a hard disk, a magnetic tape), an optical medium (eg, a DVD), or a semiconductor medium (such as a solid state disk (SSD)).

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present invention provides a method and terminal for implementing voice control. The terminal first records a correspondence between first keyword text and second keyword text. When the user inputs voice, the terminal sends the voice input by the user to a first server for semantic parsing and logical parsing; then, when the first server returns a parsing failure, the terminal acquires the parsed text returned by the first server, replaces the second keyword in the parsed text with the first keyword according to the correspondence, and sends the result to the first server; the terminal then receives the control command structure returned after the first server's logical parsing succeeds, executes a function according to the control command structure, and plays a notification voice. The method and terminal improve the success rate and security of voice control with user-personalized settings.

Description

A method and terminal for implementing voice control
Technical Field
The present invention relates to the field of electronic technologies, and in particular to a method and terminal for implementing voice control.
Background
Speech recognition and control are relatively mature and widely used, for example in mobile phone input methods and in-vehicle appliance control. The smart home is an upgrade of traditional home appliances: home appliances can be remotely controlled through smart terminals such as mobile phones and computers, multiple appliances can be controlled at the same time, and automatic repeated control can be implemented; at present, voice control functions are also commonly implemented.
Because speech recognition and processing require strong processing capability, large-capacity databases, and real-time responsiveness, speech recognition is currently usually processed in the cloud. However, due to the limitations of current cloud processing capabilities, one approach is to restrict control to standard commands, which limits the practicality of language control for ordinary users: every user or family has its own habitual names for the devices and rooms in the home, and if the cloud recorded all users' different expressions for every keyword, the excessive number of keywords would require complex algorithmic processing, raising costs, affecting processing speed, lowering the recognition rate, and easily causing conflicts that reduce the voice control success rate and degrade the user experience. In addition, adding an extra IoT cloud interface, through which the voice cloud reads the IoT cloud's data to obtain all of a user's customized keyword usages such as device names and room names for recognition and parsing, would substantially increase costs and raise security problems.
Summary of the Invention
Some embodiments of the present invention provide a method and terminal for implementing voice control, aiming to improve the success rate and security of voice control with user-personalized settings.
In a first aspect, an embodiment of the present invention provides a method for implementing voice control, wherein the method includes: the terminal records a correspondence between first keyword text and second keyword text; when the user inputs voice: the terminal sends the voice input by the user to a first server for semantic parsing and logical parsing; then, when the first server returns a parsing failure, the terminal acquires the parsed text returned by the first server, replaces the second keyword in the parsed text with the first keyword according to the correspondence, and sends the result to the first server; the terminal then receives the control command structure returned after the first server's logical parsing succeeds, executes a function according to the control command structure, and plays a notification voice.
By recording the correspondence between the first keyword text and the second keyword text on the terminal, the method enables the terminal, when the first server cannot parse the second keyword text, to replace the second keyword text with the first keyword text according to the correspondence and resend it to the first server for semantic parsing and logical parsing. This provides the user with personalized voice commands without adding extra processing complexity to the first server, without increasing cost, and while improving recognition accuracy; furthermore, there is no need for a second server to supply an explanation of the relationship between the first keyword text and the second keyword text, which reduces the exposure of the user's private information and the enterprise's entire user information to the outside world and improves security.
With reference to the first aspect of the present invention, in a first embodiment of the first aspect, the terminal recording the correspondence between the first keyword text and the second keyword text includes: upon the user's operation of modifying the first keyword text into the second keyword text, the terminal records the correspondence between the first keyword text and the second keyword text. The terminal records the correspondence as the user operates, without relying on external devices; this is convenient to operate and fast to update.
With reference to the first aspect of the present invention, in a second embodiment of the first aspect, the terminal recording the correspondence between the first keyword text and the second keyword text includes: the terminal acquires and records the correspondence between the first keyword text and the second keyword text from a second server. The second server may be an IoT server that records the user's modifications of the first keyword text into the second keyword text made on the terminal; when interacting with the second server, the terminal can acquire the correspondence from it. This frees the terminal from collecting and recording modification operations in real time and reduces the complexity of the terminal's processing logic.
结合本发明第一方面各实施例,在可能的实现方式中,所述终端保存所述第一关键词文本和所述第二关键词文本的对应关系包括:所述终端将所述第一关键词文本和所述第二关键词文本及其对应关系记录在词汇列表中。
结合本发明第一方面前述实施例,在可能的实现方式中,在所述第一服务器返回解析失败时,所述终端获取所述第一服务器返回的解析文本,根据所述对应关系将所述解析文本中的所述第二关键词替换为所述第一关键词后发送给所述第一服务器包括:所述终端将所述词汇列表中的所述第二关键词文本与所述解析文本进行匹配;所述终端将所述解析文本中匹配到的所述第二关键词替换为对应的所述第一关键词后发送给第一服务器。
结合本发明第一方面前述实施例,在可能的实现方式中,所述终端将所述第一关键词文本和所述第二关键词文本及其对应关系记录在词汇列表中包括:所述终端根据所述第一关键词文本不同的类型,将所述第一关键词文本和所述第二关键词文本及其对应关系记录在不同的词汇列表中。
结合本发明第一方面各实施例,在可能的实现方式中,所述终端根据所述控制命令结构执行功能,并播放通知语音包括:所述终端将所述控制命令结构中所述第一关键词文本替换为所对应的所述第二关键词文本;所述终端根据替换后的所述控制命令结构,生成可执行的控制命令,并执行所述控制命令;所述终端根据替换后的所述控制命令结构,生成通知语音,并播放所述通知语音。所述终端替换为第二关键词文本后执行功能,包括将执行功能命令发送给设备或通过第二服务器发送给所述设备,都能够使设备或第二服务器更易理解执行命令的含义,通知语音播放第二关键词文本能够避免词汇变化给用户带来的误解,进而提高用户使用体验。
结合本发明第一方面各实施例,在可能的实现方式中,所述终端将所述解析文本中匹配到的所述第二关键词替换为对应的所述第一关键词后还包括:所述终端记录所述第二关键词和对应的所述第一关键词的替换记录;则所述终端将所述控制命令结构中所述第一关键词文本替换为所对应的所述第二关键词文本包括:所述终端根据所述替换记录将所述控制命令结构中所述第一关键词文本替换为所对应的所述第二关键词文本。
结合本发明第一方面各实施例,在可能的实现方式中,所述方法还包括:在所述终端将所述用户输入的语音发送给第一服务器进行语义解析和逻辑解析时,所述终端将所述第一关键词文本和第二关键词文本的对应关系发送给所述第一服务器。
第二方面,本发明实施例提供了一种实现语音控制的终端,其中,所述终端包括:至少一个处理器;至少一个存储器,所述至少一个存储器包括若干指令;所述处理器执行所述若干指令使所述终端至少执行如下步骤:记录第一关键词文本和第二关键词文本的对应关系;当用户输入语音时:将所述用户输入的语音发送给第一服务器进行语义解析和逻辑解析;在所述第一服务器返回解析失败时,获取所述第一服务器返回的解析文本,根据所述对应关系将所述解析文本中的所述第二关键词替换为所述第一关键词后发送给所述第一服务器;接收所述第一服务器逻辑解析成功后所返回的控制命令结构,并根据所述控制命令结构执行功能,播放通知语音。
结合本发明第二方面,在第二方面的第一实施例中,在记录第一关键词文本和第二关键词文本的对应关系步骤中,所述处理器执行所述若干指令使所述终端至少执行如下步骤:
当用户将第一关键词文本修改为第二关键词文本的操作,记录所述第一关键词文本和所述第二关键词文本的对应关系。
结合本发明第二方面,在第二方面的第二实施例中,在记录第一关键词文本和第二关键词文本的对应关系的步骤中,所述处理器执行所述若干指令使所述终端至少执行如下步骤:
从第二服务器获取并记录第一关键词文本和第二关键词文本的对应关系。
结合本发明第二方面各实施例,在可能的实现方式中,在保存所述第一关键词文本和所述第二关键词文本的对应关系的步骤中,所述处理器执行所述若干指令使所述终端至少执行如下步骤:
将所述第一关键词文本和所述第二关键词文本及其对应关系记录在词汇列表中。
结合本发明第二方面前述实施例,在可能的实现方式中,在将所述第一关键词文本和所述第二关键词文本及其对应关系记录在词汇列表中的步骤中,所述处理器执行所述若干指令使所述终端至少执行如下步骤:根据所述第一关键词文本不同的类型,将所述第一关键词文本和所述第二关键词文本及其对应关系记录在不同的词汇列表中。
结合本发明第二方面各实施例,在可能的实现方式中,在所述第一服务器返回解析失败时,获取所述第一服务器返回的解析文本,根据所述对应关系将所述解析文本中的所述第二关键词替换为所述第一关键词后发送给所述第一服务器的步骤中,所述处理器执行所述若干指令使所述终端至少执行如下步骤:将所述词汇列表中的所述第二关键词文本与所述解析文本进行匹配;将所述解析文本中匹配到的所述第二关键词替换为对应的所述第一关键词后发送给第一服务器。
结合本发明第二方面各实施例,在可能的实现方式中,根据所述控制命令结构执行功能,并播放通知语音的步骤中,所述处理器执行所述若干指令使所述终端至少执行如下步骤:将所述控制命令结构中所述第一关键词文本替换为所对应的所述第二关键词文本;根据替换后的所述控制命令结构,生成可执行的控制命令,并执行所述控制命令;根据替换后的所述控制命令结构,生成通知语音,并播放所述通知语音。
结合本发明第二方面各实施例,在可能的实现方式中,在将所述解析文本中匹配到的所述第二关键词替换为对应的所述第一关键词的步骤后,所述处理器执行所述若干指令使所述终端至少执行如下步骤:记录所述第二关键词和对应的所述第一关键词的替换记录;将所述控制命令结构中所述第一关键词文本替换为所对应的所述第二关键词文本包括:根据所述替换记录将所述控制命令结构中所述第一关键词文本替换为所对应的所述第二关键词文本。
结合本发明第二方面各实施例,在可能的实现方式中,所述处理器还执行所述若干指令使所述终端至少执行如下步骤:
在将所述用户输入的语音发送给第一服务器进行语义解析和逻辑解析时,将所述第一关键词文本和第二关键词文本的对应关系发送给所述第一服务器。
第三方面,本发明实施例提供了一种实现语音控制的终端,其中,包括记录单元、第一发送单元、替换单元和执行单元;所述记录单元用于记录第一关键词文本和第二关键词文本的对应关系;当用户输入语音时:所述第一发送单元用于将所述用户输入的语音发送给第一服务器进行语义解析和逻辑解析;所述替换单元用于在所述第一服务器返回解析失败时,获取所述第一服务器返回的解析文本,根据所述对应关系将所述解析文本中的所述第二关键词替换为所述第一关键词后发送给所述第一服务器;所述执行单元用于接收所述第一服务器逻辑解析成功后所返回的控制命令结构,并根据所述控制命令结构执行功能,播放通知语音。
结合本发明第三方面,在可能的实现方式中,所述记录单元包括:第一记录子单元,用于当用户将第一关键词文本修改为第二关键词文本的操作,记录所述第一关键词文本和所述第二关键词文本的对应关系。
结合本发明第三方面,在可能的实现方式中,所述记录单元包括:第二记录子单元,用于从第二服务器获取并记录第一关键词文本和第二关键词文本的对应关系。
结合本发明第三方面各实施例,在可能的实现方式中,所述记录单元还包括:第三记录子单元,用于将所述第一关键词文本和所述第二关键词文本及其对应关系记录在词汇列表中。
结合本发明第三方面前述各实施例,在可能的实现方式中,所述第三记录子单元用于:根据所述第一关键词文本不同的类型,将所述第一关键词文本和所述第二关键词文本及其对应关系记录在不同的词汇列表中。
结合本发明第三方面前述各实施例,在可能的实现方式中,所述替换单元包括:匹配子单元,用于将所述词汇列表中的所述第二关键词文本与所述解析文本进行匹配;替换子单元,用于将所述解析文本中匹配到的所述第二关键词替换为对应的所述第一关键词后发送给第一服务器。
结合本发明第三方面前述各实施例,在可能的实现方式中,所述执行单元包括:再替换子单元,用于将所述控制命令结构中所述第一关键词文本替换为所对应的所述第二关键词文本;执行子单元,用于根据替换后的所述控制命令结构,生成可执行的控制命令,并执行所述控制命令;语音生成子单元,用于根据替换后的所述控制命令结构,生成通知语音,并播放所述通知语音。
结合本发明第三方面前述实施例,在可能的实现方式中,所述替换单元还包括:替换记录子单元,用于在所述解析文本中匹配到的所述第二关键词替换为对应的所述第一关键词后,记录所述第二关键词和对应的所述第一关键词的替换记录;所述再替换子单元用于根据所述替换记录将所述控制命令结构中所述第一关键词文本替换为所对应的所述第二关键词文本。
结合本发明第三方面前述各实施例,在可能的实现方式中,所述终端还包括:第二发送单元,用于将所述用户输入的语音发送给第一服务器进行语义解析和逻辑解析时,将所述第一关键词文本和第二关键词文本的对应关系发送给所述第一服务器。
第四方面,本发明实施例提供了一种计算机可读存储介质,包括指令,当其在计算机上运行时,使得计算机执行如第一方面各实施例所述的方法。
第五方面,本发明实施例提供了一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行如第一方面各实施例所述的方法。
附图说明
为了更清楚地说明本发明实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简要地介绍。在附图中,相同的标号表示相应的部分。显而易见地,下面描述中的附图仅仅是本发明的一些实施例,而非全部。对于本领域普通技术人员来讲,在没有付出创造性劳动的前提下,还可以根据这些附图获得其它的附图。
图1示出根据本发明实施例提供的一种实现语音控制的系统的结构示意图;
图2示出根据本发明实施例提供的一种终端的结构示意图;
图3示出根据本发明实施例提供的一种实现语音控制的方法流程示意图;
图4示出根据本发明实施例提供的一种实现语音控制的系统的结构示意图;
图5示出根据本发明实施例提供的一种实现语音控制的方法流程示意图;
图6示出根据本发明实施例提供的一种实现语音控制的终端的改进示意图;
图7示出根据本发明实施例提供的一种实现语音控制的过程示意图;
图8示出根据本发明实施例提供的一种实现语音控制方法的流程示意图。
具体实施方式
本发明实施例中所使用的术语只是为了描述特定实施例的目的,而并非旨在作为对本发明的限制。如在本发明的说明书和所附权利要求书中所使用的那样,单数表达形式“一个”、“一种”、“所述”、“上述”和“该”旨在也包括复数表达形式,除非其上下文中明确地有相反指示。还应当理解,本发明中可能使用的术语“和/或”是指并包含一个或多个相关联的列出项目的任何或所有可能组合。
以下介绍本发明实施例中终端、设备、服务器、系统、以及所述终端配合所述服务器实现语音控制的实施例,用来提高语音控制的准确率和成功率。
语音识别和控制已经比较成熟并广泛使用,如手机输入法、车载电器控制等。智能家居是对传统家电的升级,可以通过手机、电脑等智能终端远程控制家电,还能同时控制多个家电或实现自动重复控制;目前在智能家居的操作控制上也普遍实现了语音控制功能,用户通过对手机或支持语音输入的控制终端讲出控制命令,就可以操作家电设备。语音控制终端形态很多,如智能音箱、路由器、摄像头、专用语音控制终端等多种,以下对支持语音控制的智能设备统称“终端”或“语音终端”。
因为语音识别和处理需要强大的处理能力和大容量数据库,以及实时的响应能力,所以目前对语音识别的处理通常都放在云端(以下称为“语音云”,主要是计算中心里的服务器集群,具有强大的存储和处理能力)进行,即,终端把用户的语音发送到语音云,语音云的服务器进行识别和处理,转换成文本、控制命令的数据结构等形式返回给终端,终端再根据这些数据,转换成家电控制命令去执行用户的控制意图。通常语音云并不只是给智能家居服务,还会支持手机、车载的其它语音业务,并且有单独的运营商提供服务。
以智能家庭系统为例,图1中所示为目前通常的智能家庭系统组网图。本发明的系统架构使用基于语音云进行语音和语义识别的语音控制系统,所述系统包括终端、设备和一个或多个服务器。
在本发明一些实施例中,所述智能设备可以是智能家电,即通过物联网技术连接起来的家中各种设备,如音视频设备、照明系统、窗帘控制、空调控制、安防系统、数字影院系统、影音服务器、影柜系统、网络家电等。所述智能家电具有数据处理能力,不仅能够提供传统居住功能,还能够支持用户利用终端通过网络进行远程控制、定时控制等智能功能。还应当理解的是,在本发明其他一些实施例中,所述智能家电还可以是其他需要进行入网连接并可以通过终端配合实现入网连接的设备。
在一些实施例中,智能家电例如为智能电视。智能电视除具有普通电视的显示器、扬声器等装置外,还具有处理器、存储器、网络连接装置,能够搭载各种操作系统,并能够连接互联网,可以类似终端支持多种方式的交互式应用,例如根据用户安装、更新、删除应用。
在本发明一些实施例中,所述终端可以是还包含其它功能诸如个人数字助理和/或音乐播放器功能的便携式电子设备,诸如手机、平板电脑、具备无线通讯功能的可穿戴电子设备(如智能手表)等。便携式电子设备的示例性实施例包括但不限于搭载iOS、Android或者其它操作系统的便携式电子设备。上述便携式电子设备也可以是其它便携式电子设备,诸如具有触敏表面(例如触控板)的膝上型计算机(Laptop)等。还应当理解的是,在本发明其他一些实施例中,所述终端还可以是符合同类规范的遥控器、智能环境检测器等可作为移动安全代理的设备。
如图2所示,本发明实施例中的终端可以为手机100。下面以手机100为例对实施例进行具体说明。应该理解的是,图示手机100仅是终端的一个范例,并且手机100可以具有比图中所示出的更多的或者更少的部件,可以组合两个或更多的部件,或者可以具有不同的部件配置。图中所示出的各种部件可以在包括一个或多个信号处理和/或专用集成电路在内的硬件、软件、或硬件和软件的组合中实现。
如图2所示,手机100具体可以包括:处理器101、射频(RF)电路102、存储器103、触摸屏104、蓝牙装置105、一个或多个传感器106、Wi-Fi装置107、定位装置108、音频电路109、外设接口110以及电源系统111等部件。这些部件可通过一根或多根通信总线或信号线(图2中未示出)进行通信。本领域技术人员可以理解,图2中示出的硬件结构并不构成对手机100的限定,手机100可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置。
下面结合图2对手机100的各个部件进行具体的介绍:
处理器101是手机100的控制中心,利用各种接口和线路连接手机100的各个部分,通过运行或执行存储在存储器103内的应用程序(以下可以简称App),以及调用存储在存储器103内的数据和指令,执行手机100的各种功能和处理数据。在一些实施例中,处理器101可包括一个或多个处理单元;处理器101还可以集成应用处理器和调制解调处理器;其中,应用处理器主要处理操作系统、用户界面和应用程序等,调制解调处理器主要处理无线通信。可以理解的是,上述调制解调处理器也可以不集成到处理器101中。处理器101可以是集成芯片。在本发明一些实施例中,上述处理器101还可以包括指纹验证芯片,用于对采集到的指纹进行验证。
射频电路102可用于在收发信息或通话过程中,无线信号的接收和发送。具体地,射频电路102可以将基站的下行数据接收后,给处理器101处理;另外,将涉及上行的数据发送给基站。通常,射频电路包括但不限于天线、至少一个放大器、收发信机、耦合器、低噪声放大器、双工器等。此外,射频电路102还可以通过无线通信和其他设备通信。所述无线通信可以使用任一通信标准或协议,包括但不限于全球移动通讯系统、通用分组无 线服务、码分多址、宽带码分多址、长期演进、电子邮件、短消息服务等。
存储器103用于存储应用程序以及数据,处理器101通过运行存储在存储器103的应用程序以及数据,执行手机100的各种功能以及数据处理。存储器103主要包括存储程序区以及存储数据区,其中,存储程序区可存储操作系统、至少一个功能所需的应用程序(比如声音播放功能、图像播放功能等);存储数据区可以存储根据使用手机100时所创建的数据(比如音频数据、电话本等)。此外,存储器103可以包括高速随机存取存储器,还可以包括非易失存储器,例如磁盘存储器件、闪存器件或其他非易失性固态存储器件等。存储器103可以存储各种操作系统,例如苹果公司所开发的iOS操作系统,谷歌公司所开发的Android操作系统等。
触摸屏104可以包括触控板104-1和显示器104-2。其中,触控板104-1可采集手机100的用户在其上或附近的触摸事件(比如用户使用手指、触控笔等任何适合的物体在触控板104-1上或在触控板104-1附近的操作),并将采集到的触摸信息发送给其他器件例如处理器101。其中,用户在触控板104-1附近的触摸事件可以称之为悬浮触控;悬浮触控可以是指,用户无需为了选择、移动或拖动目标(例如图标等)而直接接触触控板,而只需用户位于终端附近以便执行所想要的功能。在悬浮触控的应用场景下,术语“触摸”、“接触”等不会暗示用于直接接触触摸屏,而是附近或接近的接触。能够进行悬浮触控的触控板104-1可以采用电容式、红外光感以及超声波等实现。触控板104-1可包括触摸检测装置和触摸控制器两个部分。其中,触摸检测装置检测用户的触摸方位,并检测触摸操作带来的信号,将信号传送给触摸控制器;触摸控制器从触摸检测装置上接收触摸信息,并将它转换成触点坐标,再发送给处理器101,触摸控制器还可以接收处理器101发送的指令并加以执行。此外,可以采用电阻式、电容式、红外线以及表面声波等多种类型来实现触控板104-1。显示器(也称为显示屏)104-2可用于显示由用户输入的信息或提供给用户的信息以及手机100的各种菜单。可以采用液晶显示器、有机发光二极管等形式来配置显示器104-2。触控板104-1可以覆盖在显示器104-2之上,当触控板104-1检测到在其上或附近的触摸事件后,传送给处理器101以确定触摸事件的类型,随后处理器101可以根据触摸事件的类型在显示器104-2上提供相应的视觉输出。虽然在图2中,触控板104-1与显示屏104-2是作为两个独立的部件来实现手机100的输入和输出功能,但是在某些实施例中,可以将触控板104-1与显示屏104-2集成而实现手机100的输入和输出功能。可以理解的是,触摸屏104是由多层的材料堆叠而成,本发明实施例中只展示出了触控板(层)和显示屏(层),其他层在本发明实施例中不予记载。另外,在本发明其他一些实施例中,触控板104-1可以覆盖在显示器104-2之上,并且触控板104-1的尺寸大于显示屏104-2的尺寸,使得显示屏104-2全部覆盖在触控板104-1下面,或者,上述触控板104-1可以以全面板的形式配置在手机100的正面,也即用户在手机100正面的触摸均能被手机感知,这样就可以实现手机正面的全触控体验。在其他一些实施例中,触控板104-1以全面板的形式配置在手机100的正面,显示屏104-2也可以以全面板的形式配置在手机100的正面,这样在手机的正面就能够实现无边框(Bezel)的结构。
在本发明实施例中,手机100还可以具有指纹识别功能。例如,可以在手机100的背面(例如后置摄像头的下方)配置指纹识别器,或者在手机100的正面(例如触摸屏104的下方)配置指纹识别器。不再详述。
手机100还可以包括蓝牙装置105,用于实现手机100与其他短距离的终端(例如手机、智能手表等)之间的数据交换。本发明实施例中的蓝牙装置可以是集成电路或者蓝牙芯片等。
Wi-Fi装置107,用于为手机100提供遵循Wi-Fi相关标准协议的网络接入,手机100可以通过Wi-Fi装置107接入到Wi-Fi接入点,进而帮助用户收发电子邮件、浏览网页和访问流媒体等,它为用户提供了无线的宽带互联网访问。在其他一些实施例中,该Wi-Fi装置107也可以作为Wi-Fi无线接入点,可以为其他终端提供Wi-Fi网络接入。
定位装置108,用于为手机100提供地理位置。可以理解的是,该定位装置108具体可以是全球定位系统(GPS)或北斗卫星导航系统、俄罗斯GLONASS等定位系统的接收器。定位装置108在接收到上述定位系统发送的地理位置后,将该信息发送给处理器101进行处理,或者发送给存储器103进行保存。在另外的一些实施例中,该定位装置108可以是辅助全球卫星定位系统(AGPS)的接收器,AGPS是一种在一定辅助配合下进行GPS定位的运行方式,它可以利用基站的信号,配合GPS卫星信号,可以让手机100定位的速度更快;在AGPS系统中,该定位装置108可通过与辅助定位服务器(例如手机定位服务器)的通信而获得定位辅助。AGPS系统通过作为辅助服务器来协助定位装置108完成测距和定位服务,在这种情况下,辅助定位服务器通过无线通信网络与终端例如手机100的定位装置108(即GPS接收器)通信而提供定位协助。在另外的一些实施例中,该定位装置108也可以是基于Wi-Fi接入点的定位技术。由于每一个Wi-Fi接入点都有一个全球唯一的MAC地址,终端在开启Wi-Fi的情况下即可扫描并收集周围的Wi-Fi接入点的广播信号,因此可以获取到Wi-Fi接入点广播出来的MAC地址;终端将这些能够标示Wi-Fi接入点的数据(例如MAC地址)通过无线通信网络发送给位置服务器,由位置服务器检索出每一个Wi-Fi接入点的地理位置,并结合Wi-Fi广播信号的强弱程度,计算出该终端的地理位置并发送到该终端的定位装置108中。
音频电路109、扬声器113、麦克风114可提供用户与手机100之间的音频接口。音频电路109可将接收到的音频数据转换后的电信号,传输到扬声器113,由扬声器113转换为声音信号输出;另一方面,麦克风114将收集的声音信号转换为电信号,由音频电路109接收后转换为音频数据,再将音频数据输出至RF电路102以发送给比如另一手机,或者将音频数据输出至存储器103以便进一步处理。
外设接口110,用于为外部的输入/输出设备(例如键盘、鼠标、外接显示器、外部存储器、用户识别模块卡等)提供各种接口。例如通过通用串行总线(USB)接口与鼠标连接,通过用户识别模块卡卡槽上的金属触点与电信运营商提供的用户识别模块卡(SIM)卡进行连接。外设接口110可以被用来将上述外部的输入/输出外围设备耦接到处理器101和存储器103。
手机100还可以包括给各个部件供电的电源装置111(比如电池和电源管理芯片),电池可以通过电源管理芯片与处理器101逻辑相连,从而通过电源装置111实现管理充电、放电、以及功耗管理等功能。
尽管图2未示出,手机100还可以包括摄像头(前置摄像头和/或后置摄像头)、闪光灯、微型投影装置、近场通信(NFC)装置等,在此不再赘述。以下实施例均可以在具有上述结构的手机100中实现。
所述服务器可以是云服务器,即一种基于互联网、具有计算和存储能力的服务设备,能够将共享的软硬件资源和信息按需提供给各种计算机终端和其他设备。所述云服务器可以是语音云。
结合图1,在具体的实施例中,所述系统可以包括如下:
智能家电——指能够联网、可远程控制、按命令自动运行的用户所使用的家电设备,有些还具有编程、定时功能,是对传统家电的升级。
控制终端——运行控制软件的控制设备,通常有固定和移动两种。移动的控制终端通常是智能手机、平板等智能设备;固定控制终端通常是面板、开关等非智能设备。本发明针对前者的移动智能终端进行改进。移动控制终端可以在家庭中(图中“控制终端1”位置)通过家庭无线局域网和设备通信,也可以在家庭以外通过互联网和设备通信(图中“控制终端2”位置)。
IoT云(物联网云)——为了实现在控制终端不能和设备直接通信时,仍能对设备状态进行处理和控制,设备和控制终端在通信时都要通过控制服务器,称为“IoT云”,由IoT云转发两者间的消息和命令。IoT云还会对这些消息/命令进行记录和执行。
语音云——语音云本身并不是智能家庭的组成部分,而是第三方服务商,提供把语音转换成文本,以及文本转换成可执行的命令数据结构的功能。
语音云和智能家居系统是两个独立运行的实体,中间通过互联网进行通信,通信内容为上述的“语音-->文本”的交互过程。另外,智能家居系统也包括很多组成部分,如控制和管理家庭设备的“智能家居云”(简称“IoT云”)、大量用户家庭中的智能家电设备、控制家电的终端(如带有控制软件的智能手机或语音终端)等。在本方法中,当终端识别到正确的用户控制命令后,后续控制过程与原先用户在终端上手工操作app界面相同,所以本方法只涉及语音云和终端两个设备,其它设备的功能和处理过程不再描述。
随着数据库技术、人工智能技术、服务器处理能力的发展,当前的语音云已经能达到很高的识别准确率,能把任意一段话音转换成文本,并且在很多如订票、查询等公共业务上,语音识别已经能达到很高的智能程度和准确率。
在本发明一些实施例中,智能家居业务中实现语音控制的流程如图3。
在家电语音控制中,通常使用“操作+设备+目的”的表达方式,如“设置客厅空调26度”,对于这类标准用法,语音云都能正确地识别操作对象为“空调”,位置是“客厅”,动作是“调节温度”,目标参数“26度”,并根据此处理结果返回如下正确的数据结构:
{
    "dev": "空调",
    "op": "set",
    "temp": "26",
    "loc": "客厅"
}
语音云为了完整地识别这类控制命令,控制命令中必须存在相应的关键字,例如必须有“空调”,才能判断“26”是温度。而终端在收到这种参数完整的数据后才能生成相应的控制命令、知道是控制哪个设备。语音云在提供此类业务时,已经学习并归纳了大量的设备,如“空调”、“冰箱”、“灯”等,并且对每类设备都设计了相应的控制参数组合,对家庭中的各个房间也已经定义为诸如“客厅”、“卧室”、“走廊”等标准用法。同时,语音云还对各种可能的词语顺序、语气词进行了相应的处理,对于这些在标准表达范围内的语音命令,具有很高的准确率。
当其中某项缺少或描述不符合预先定义的关键字时,虽然能从语音识别出文本(把语音转换为文字),但命令解析失败。例如,“打开客厅空调”符合预先定义的关键字、可以解析成命令;但用户把“客厅”换成“大房间”时,如果关键字没有定义“大房间”,则命令解析失败,语音云给终端返回解析后的文本字符串,但是无法返回控制命令的数据结构,而是返回错误码,终端提示用户“无法识别的命令,请重新说”。这种情况下,因为语音云上缺少用户自定义的关键字,无论用户如何改变说法,结果总是失败:使用用户自定义的非标准关键字,语音云无法识别;使用语音识别系统预定义的标准关键字,虽然语音云可以识别出命令的数据结构,但与用户在智能家居中的描述不同,终端无法找到应该控制的目标设备。其结果是,智能家居不断要求用户尝试不同的控制语音说法,却始终无法控制成功,造成用户对语音控制的不信任和“低能”的印象。另一种改进思路是让语音云搜集尽可能多的“非标准关键字”并把它们标准化,如对“房间”的描述增加“大房间”、“小房间”等关键字,通过扩大能够处理的集合来提高命令解析的成功率,但这会带来运算复杂度大幅提高、处理速度下降、识别率降低等显而易见的问题。在本发明的实施例中,利用家庭智能业务中不同用户的个性化设置(修改关键字)实际发生的位置和有效范围在其所在家庭内部这一特点,用户操作的客户端(即终端)可对此修改完全感知并能加以区分:通过识别用户对标准关键字的修改,得到与个性化关键字的对应关系,对语音云不能识别的个性化部分加以替换,利用其基础能力实现扩展功能。该方案不限制用户个性化关键字,并不需要把全部用户的个性化关键字提交到云端、不需要云端针对性开发和升级,无论用户如何修改,本地都可以用同一个软件获取修改后的值,并加以替换和反替换;也解决了现有技术三的问题,不需要把整个用户的情况暴露给第三方。
本发明的实施例利用终端自身的计算能力以及终端针对用户个体进行处理的特点,把原先无法解决、或者全部交给语音云进行处理的语音识别过程,分为标准和非标准两个阶段:在用户本地终端上把非标准关键字替换为标准关键字,把个性信息的范围限制在终端上(原先方案必须把个性部分提供给云才能识别),通过两步的迭代过程完成语音控制命令识别,而不是现有技术的一次提交、直接返回结果。从而,用户修改其个性化的语言用法后在语音控制上能立即生效,无论修改成什么值都能立即识别,语音云不需要修改,并且不需要云间接口,减少信息泄漏风险。
本发明从用户个性化设置的根源,即用户自己对标准关键字的修改来解决问题:用户的个性化设置只和特定的家庭或个人有关,不需要集中到语音云进行处理。在现有技术三的解决方法中,语音云需要识别用户、取得个性化关键字,再进行匹配,实际是在集中的共性处理过程中进行个性化处理,效率低、成本高。每个用户把哪个标准关键字修改成非标准值,在终端侧已经可以判断,不需要再到语音云进行区分;否则,语音云在对接不同智能家居服务商时,对方会提供不同关键字的修改范围,需要区别处理,加上用户对关键字的修改值,就变成需要进行多级处理,开发量大且执行效率低。语音终端作为智能家居系统的一个部分,用户可以修改哪些类别的关键字从设计上就已经知道,也能无信息风险地读取到用户实际上是否修改、修改成什么值。
步骤11:所述终端记录第一关键词文本和第二关键词文本的对应关系。
步骤12:当用户输入语音时,所述终端与服务器配合对用户输入的语音进行解析,并执行功能。步骤12具体包括:
步骤121:所述终端将所述用户输入的语音发送给第一服务器进行语义解析和逻辑解析;
步骤122:在所述第一服务器返回解析失败时,所述终端获取所述第一服务器返回的解析文本,根据所述对应关系将所述解析文本中的所述第二关键词替换为所述第一关键词后发送给所述第一服务器;
步骤123:所述终端接收所述第一服务器逻辑解析成功后所返回的控制命令结构,并根据所述控制命令结构执行功能,播放通知语音。
本领域技术人员可以理解,以下为了方便描述,将第一服务器与语音云互换使用、将第二服务器与IoT云互换使用。
本发明实施例通过在终端进行非标准关键字识别、替换和反替换,实现对用户个性化参数的语音识别和控制。之所以存在非标准关键字,是因为智能家居为了符合用户日常习惯,提供了修改设备属性的功能,用户可以把标准关键字(如房间信息“客厅”)修改为符合自己习惯用法的非标准关键字(如“大房间”)。因此,在步骤11中,当用户执行将第一关键词文本修改为第二关键词文本的操作时:所述终端记录所述第一关键词文本和所述第二关键词文本的对应关系;或者,所述终端从第二服务器获取并记录第一关键词文本和第二关键词文本的对应关系。也就是说,哪些关键字被修改、被修改成什么内容,终端可以无障碍地从IoT云获取到。
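步骤11中“记录对应关系”的两种来源(随用户本地修改即时记录、从IoT云批量同步)可以用如下Python代码示意。需要说明的是,其中的类名、函数名和存储结构(KeywordStore、record_modification、sync_from_iot_cloud)均为本文为便于说明而假设的名称,并非本专利限定的实现:

```python
# 示意:终端侧记录"标准关键字(第一关键词)→ 非标准关键字(第二关键词)"的对应关系
class KeywordStore:
    def __init__(self):
        # 按类别(如 "device"、"location")存放词汇表:{类别: {非标准关键字: 标准关键字}}
        self.tables = {}

    def record_modification(self, category, standard, custom):
        """当用户把标准关键字 standard 修改为 custom 时,随操作即时记录对应关系。"""
        self.tables.setdefault(category, {})[custom] = standard

    def sync_from_iot_cloud(self, records):
        """或者从第二服务器(IoT云)批量获取修改记录;records 为 (类别, 标准值, 自定义值) 列表。"""
        for category, standard, custom in records:
            self.record_modification(category, standard, custom)

store = KeywordStore()
store.record_modification("location", "客厅", "大房间")   # 用户在终端上改名
store.sync_from_iot_cloud([("device", "空调", "北极风")])  # 或从IoT云同步
```

按非标准关键字建立索引,是为了后续在解析文本中直接按用户的叫法查找并替换。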
结合图3,在步骤12中,所述终端与服务器配合对用户输入的语音进行解析并执行功能,在原先的“语音提交,结果返回”流程的基础上包括以下处理过程:
步骤121:所述终端将所述用户输入的语音上传到语音云进行语音识别,包括语义解析和逻辑解析,等待语音云返回识别结果;
结合图4可知,通过本发明的实施例,当语音云无法解析非标准关键字时,终端完成非标准关键字向标准关键字的替换,再发给语音云。语音云无需建立一个云间数据库接口向IoT云查找对应的标准关键词,无需依赖另一个服务商,降低成本并且提高信息安全性。
可选地,所述终端将所述用户输入的语音发送给语音云进行语义解析和逻辑解析时,可以同时上传用户个性化的词汇表(非标准关键字)到语音云,以达到较高的语音识别率。依据语音云提供的服务,上传词汇表可以增加自定义词汇的识别准确率,这通常是语音识别服务都能提供的标准功能,使用后效果更好:客户端上传语音(录音文件),语音云按标准流程进行语音和语义解析,因为有了词汇表,语音解析有较高准确率。
当用户语音中存在非标准关键字时,语音云对控制命令的语义识别会因为缺少关键词(如缺少房间信息)而失败。此时在步骤122中,终端根据词汇表中的非标准关键字对返回的语音识别文本进行替换,把非标准关键字(如“大房间”)替换为标准关键字(如“客厅”,也可以为此类别中的其它标准关键字,只要终端有记录,就不影响语音云的识别和后续的反替换),并可以在程序中记录此替换。
随后,终端重新上传替换后的标准化控制命令文本字符串,语音云进行语义识别,此时为标准词汇,语义识别成功。
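上述“匹配非标准关键字、替换为标准关键字并记录替换对”的标准化过程,可以用如下Python代码示意。其中 standardize 为本文假设的函数名,词汇表结构沿用上文示意,且未处理同词多义、重叠匹配等边界情况:

```python
def standardize(text, tables):
    """把解析文本中的非标准关键字替换为同类别的标准关键字,并记录替换对。

    tables: {类别: {非标准关键字: 标准关键字}}
    返回 (替换后文本, 替换记录);替换记录形如 {类别: (标准关键字, 非标准关键字)}。
    """
    replacements = {}
    for category, mapping in tables.items():
        for custom, standard in mapping.items():
            if custom in text:                      # 在听写文本中搜索非标准关键字
                text = text.replace(custom, standard)
                replacements[category] = (standard, custom)  # 记录替换,供反替换使用
    return text, replacements

tables = {"device": {"北极风": "空调"}, "location": {"大房间": "客厅"}}
text, record = standardize("调节大房间的北极风到18度", tables)
# text 变为标准化文本 "调节客厅的空调到18度",可重新提交语音云做语义识别
```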
当语音云返回语义解析后的控制命令结构时,在步骤123中,终端再根据先前的替换进行反替换,并生成实际能对应到设备的控制命令。
然后,终端根据命令执行结果,用非标准关键字生成通知语音告诉用户执行结果(即通知用户时,房间信息为用户所说的“大房间”,用户才能理解)。
此处把原来完全由语音云完成的语音、语义识别过程分成两个阶段,利用终端已知的标准和非标准关键字对应关系,使非标准语义识别过程标准化。
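步骤123中的“反替换”可以用如下Python代码示意:终端对照先前的替换记录,把语音云返回的控制命令结构中的标准关键字还原为用户的非标准关键字,再据此生成控制命令和通知语音。其中 destandardize 及命令结构中的字段名均为本文假设,并非语音云的实际接口:

```python
def destandardize(command, replacements):
    """语义解析成功后,把控制命令结构中的标准关键字反替换回用户的非标准关键字。

    command: 语音云返回的命令字典(字段名为示意)
    replacements: 标准化时记录的 {类别: (标准关键字, 非标准关键字)}
    """
    result = dict(command)
    for category, (standard, custom) in replacements.items():
        # 一句话中同类关键字只出现一次,按类别直接还原即可
        if result.get(category) == standard:
            result[category] = custom
    return result

command = {"device": "空调", "action": "调节温度", "parameter": "18", "location": "客厅"}
replacements = {"device": ("空调", "北极风"), "location": ("客厅", "大房间")}
restored = destandardize(command, replacements)
# restored 中 device/location 还原为用户习惯叫法,可用于匹配实际设备并生成通知语音
```

反替换后的参数与终端和IoT云记录设备时使用的名称一致,因此可以直接定位目标设备;通知语音也使用用户自己的叫法,避免误解。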
上述处理过程如图5所示,虚线中为新增加处理过程,包括:
非标准关键字是用户对标准关键字修改后的内容。在用户修改后保存时,终端或IoT云即记录此非标准关键字及其对应的类别,可以根据类别生成不同的词汇表。
用标准关键字替代非标准关键字的过程,是语音终端在语音云返回的语音识别文本中,通过匹配非标准关键字的方法判断的。对于能匹配的内容,终端替换为同类别的标准关键字之一(终端的软件可任意指定,通常为便于理解和人工识别,选第一个标准关键字),并记录替换的类别,才能在语音云返回控制命令时替换为实际的设备。
因为一句话中,同类关键字只会出现一次(一个命令控制一个房间中的一个设备),所以在语音云成功完成语义解析(图中“语义解析2”)后,把命令结构中相应类别的值替换为前面匹配出的非标准关键字。
通过上述这一过程的几个步骤(对应表示在图6中虚线框内),即完成了用户非标准关键字的语音识别,而不需要对语音云的命令解析过程进行修改,也不需要提供接口让语音云取得所有用户的信息。
结合图6,在本发明一些实施例中,完成这一过程所需的软件模块在终端中实现,即语音终端中需要在原先的语音控制模块中增加“自定义关键字识别功能”(图6虚线框内模块)。
与原先语音控制模块把语音发送到语音云、由语音云返回识别成功或失败的结果、只有成功时才能执行语音控制命令不同,流程举例如图7,本方法对语音命令的处理过程如下:
1)在设置设备时,先提供标准的设备参数模板,其中包括标准的设备名列表,如“空调”等;标准的房间列表,如“客厅”等。如果用户修改其中的项目,客户端将记录用户修改的全部项目和修改值(用户可能修改家中多个设备),分别记录为不同列表,如修改的设备名记入“custom-device-list”,修改的设备房间名记入“custom-location-list”,每个不同的修改值记为其中一项,列表中每项内容互不相同。每一项记录还将对应原先的标准名称,记录顺序为“标准关键字,非标准关键字”:如用户给空调起名为“北极风”,则在“custom-device-list”中,有一项为“空调,北极风”;对于房间,可以任意指定一个标准房间名,如“客厅,大房间”。
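步骤1)中按类别记录“标准关键字,非标准关键字”对的词汇列表,可以用如下Python代码示意。列表名沿用上文的 custom-device-list/custom-location-list,查找函数 find_standard 为本文假设的名称:

```python
# 示意:按类别分别记录修改项,每项为 (标准关键字, 非标准关键字),与正文记录顺序一致
custom_device_list = [("空调", "北极风")]
custom_location_list = [("客厅", "大房间")]

def find_standard(custom_value, table):
    """在某一类别的列表中按非标准关键字查找对应的标准关键字,未找到返回 None。"""
    for standard, custom in table:
        if custom == custom_value:
            return standard
    return None
```

允许修改的关键字有多少个类别,就维护多少个这样的列表,替换时按类别分别匹配。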
2)首先终端把用户的语音发送到语音云,如“调节大房间的北极风到18度”,此时语音内容可被语音云识别,但无法转换成控制命令,语音云返回{执行失败,原因=“缺少关键字”,文本=“调节大房间的北极风到18度”},终端识别此返回结果,进入标准化处理过程。
3)然后终端把所有列表中的非标准关键字与语音云返回的文本(以下简称“听写文本”)进行匹配,如上述的“custom-device-list”和“custom-location-list”(也可以有更多列表,有多少个允许修改的关键字,就有多少个列表),将其中每个非标准关键字在“听写文本”中搜索。在上述语音控制中可以搜索到“北极风”和“大房间”两个值,分别位于“custom-device-list”和“custom-location-list”两个列表中。
4)对此二个值进行标准化替换,即把文本中的“北极风”替换为“空调”,“大房间”替换为“客厅”,并记录替换类型和原始值对“device,北极风”、“location,大房间”,记录为“standardization-list”,用户控制语音的文本变为“调节客厅的空调到18度”。
5)将此文本发送到语音云进行语义识别,语音云可返回正确的识别结果{执行成功,device=“空调”,action=“调节温度”,parameter=“18”,location=“客厅”,文本=“调节客厅的空调到18度”}。
6)终端得到此识别结果,对照“standardization-list”,发现其中“device”和“location”两个值被替换过,则对其进行反替换,忽略无关部分,设备控制参数表变为{device=“北极风”,action=“调节温度”,parameter=“18”,location=“大房间”}。因为终端和IoT云都是按此参数记录和控制设备,按此参数可以正确控制家电设备。
以上过程即完成了非标准化关键字的语音识别过程。
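把上述步骤2)至6)串起来,两阶段迭代识别流程可以用如下端到端的Python代码示意。其中 FakeVoiceCloud 只是为演示而假设的语音云桩实现(只“认识”标准关键字“空调”和“客厅”),其接口和返回字段均为本文虚构,并非真实语音云API:

```python
# 端到端示意:第一次提交失败 → 本地标准化替换 → 二次提交成功 → 反替换还原
class FakeVoiceCloud:
    KNOWN = ("空调", "客厅")  # 假设语音云只预定义了这两个标准关键字

    def parse(self, text):
        if all(k in text for k in self.KNOWN):
            return {"ok": True, "device": "空调", "action": "调节温度",
                    "parameter": "18", "location": "客厅", "text": text}
        return {"ok": False, "reason": "缺少关键字", "text": text}

def voice_control(dictation, tables, cloud):
    first = cloud.parse(dictation)          # 第一次提交
    if first["ok"]:
        return first
    std_text, record = dictation, {}
    for category, mapping in tables.items():  # 标准化替换并记录
        for custom, standard in mapping.items():
            if custom in std_text:
                std_text = std_text.replace(custom, standard)
                record[category] = (standard, custom)
    second = cloud.parse(std_text)          # 二次提交标准化文本
    for category, (standard, custom) in record.items():  # 反替换
        if second.get(category) == standard:
            second[category] = custom
    return second

result = voice_control("调节大房间的北极风到18度",
                       {"device": {"北极风": "空调"}, "location": {"大房间": "客厅"}},
                       FakeVoiceCloud())
```

整个迭代过程对语音云透明:语音云两次都只执行其标准解析流程,个性化信息始终留在终端本地。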
综上所述,根据本发明实施例所述终端实现语音识别的方法可实现用户自定义关键字的识别和处理,提高控制命令的语音和语义识别率;解决方法与用户所做修改的具体内容无关,不需要根据用户修改升级系统,解决个性化设置带来的无法识别语音命令的问题。所述方法不需要额外的服务器接口,避免云间接口,减少受网络攻击的端口,网络和信息安全性高;不需要云端先识别用户、再处理用户非标准信息,本地分散处理效率高。同时,该方法使接口标准化,与语音服务商之间耦合低、依赖少,便于推广和灵活选择供应商。此外,减少用户个人私有信息和企业全部用户信息对外界暴露,保护隐私和商业秘密;通过终端侧的软件实现,不涉及语音服务升级和维护,成本低。
结合以上方法描述,在另一方面,本发明实施例提供了一种实现语音控制的终端,其中,包括记录单元、第一发送单元、替换单元和执行单元;所述记录单元用于记录第一关键词文本和第二关键词文本的对应关系;当用户输入语音时:所述第一发送单元用于将所述用户输入的语音发送给第一服务器进行语义解析和逻辑解析;所述替换单元用于在所述第一服务器返回解析失败时,获取所述第一服务器返回的解析文本,根据所述对应关系将所述解析文本中的所述第二关键词替换为所述第一关键词后发送给所述第一服务器;所述执行单元用于接收所述第一服务器逻辑解析成功后所返回的控制命令结构,并根据所述控制命令结构执行功能,播放通知语音。
在可能的实现方式中,所述记录单元包括:第一记录子单元,用于当用户将第一关键词文本修改为第二关键词文本的操作,记录所述第一关键词文本和所述第二关键词文本的对应关系。
在可能的实现方式中,所述记录单元包括:第二记录子单元,用于从第二服务器获取并记录第一关键词文本和第二关键词文本的对应关系。
在可能的实现方式中,所述记录单元还包括:第三记录子单元,用于将所述第一关键词文本和所述第二关键词文本及其对应关系记录在词汇列表中。
在可能的实现方式中,所述第三记录子单元用于:根据所述第一关键词文本不同的类型,将所述第一关键词文本和所述第二关键词文本及其对应关系记录在不同的词汇列表中。
在可能的实现方式中,所述替换单元包括:匹配子单元,用于将所述词汇列表中的所述第二关键词文本与所述解析文本进行匹配;替换子单元,用于将所述解析文本中匹配到的所述第二关键词替换为对应的所述第一关键词后发送给第一服务器。
在可能的实现方式中,所述执行单元包括:再替换子单元,用于将所述控制命令结构中所述第一关键词文本替换为所对应的所述第二关键词文本;执行子单元,用于根据替换后的所述控制命令结构,生成可执行的控制命令,并执行所述控制命令;语音生成子单元,用于根据替换后的所述控制命令结构,生成通知语音,并播放所述通知语音。
在可能的实现方式中,所述替换单元还包括:替换记录子单元,用于在所述解析文本中匹配到的所述第二关键词替换为对应的所述第一关键词后,记录所述第二关键词和对应的所述第一关键词的替换记录;所述再替换子单元用于根据所述替换记录将所述控制命令结构中所述第一关键词文本替换为所对应的所述第二关键词文本。
在可能的实现方式中,所述终端还包括:第二发送单元,用于将所述用户输入的语音发送给第一服务器进行语义解析和逻辑解析时,将所述第一关键词文本和第二关键词文本的对应关系发送给所述第一服务器。
以上各单元及各子单元执行的步骤具体请参见方法描述,为简明起见,不再赘述。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本发明实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘Solid State Disk(SSD))等。
综上所述,以上实施例仅用以说明本发明的技术方案,而非对其限制;尽管参照上述实施例对本发明进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对上述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本发明各实施例技术方案的精神和范围。

Claims (29)

  1. 一种实现语音控制的方法,其中,所述方法包括:
    所述终端记录第一关键词文本和第二关键词文本的对应关系;
    当用户输入语音时:
    所述终端将所述用户输入的语音发送给第一服务器进行语义解析和逻辑解析;
    在所述第一服务器返回解析失败时,所述终端获取所述第一服务器返回的解析文本,根据所述对应关系将所述解析文本中的所述第二关键词替换为所述第一关键词后发送给所述第一服务器;
    所述终端接收所述第一服务器逻辑解析成功后所返回的控制命令结构,并根据所述控制命令结构执行功能,播放通知语音。
  2. 根据权利要求1所述的方法,其中,所述终端记录第一关键词文本和第二关键词文本的对应关系包括:当用户将第一关键词文本修改为第二关键词文本的操作:所述终端记录所述第一关键词文本和所述第二关键词文本的对应关系。
  3. 根据权利要求1所述的方法,其中,所述终端记录第一关键词文本和第二关键词文本的对应关系包括:所述终端从第二服务器获取并记录第一关键词文本和第二关键词文本的对应关系。
  4. 根据权利要求1至3中任一项所述的方法,其中,所述终端保存所述第一关键词文本和所述第二关键词文本的对应关系的步骤包括:
    所述终端将所述第一关键词文本和所述第二关键词文本及其对应关系记录在词汇列表中。
  5. 根据权利要求4所述的方法,所述终端将所述第一关键词文本和所述第二关键词文本及其对应关系记录在词汇列表中的步骤,包括:
    所述终端根据所述第一关键词文本不同的类型,将所述第一关键词文本和所述第二关键词文本及其对应关系记录在不同的词汇列表中。
  6. 根据权利要求4或5所述的方法,其中,在所述第一服务器返回解析失败时,所述终端获取所述第一服务器返回的解析文本,根据所述对应关系将所述解析文本中的所述第二关键词替换为所述第一关键词后发送给所述第一服务器的步骤,包括:
    所述终端将所述词汇列表中的所述第二关键词文本与所述解析文本进行匹配;
    所述终端将所述解析文本中匹配到的所述第二关键词替换为对应的所述第一关键词后发送给第一服务器。
  7. 根据权利要求1至6中任一项所述的方法,其中,所述终端根据所述控制命令结构执行功能,并播放通知语音包括:
    所述终端将所述控制命令结构中所述第一关键词文本替换为所对应的所述第二关键词文本;
    所述终端根据替换后的所述控制命令结构,生成可执行的控制命令,并执行所述控制命令;
    所述终端根据替换后的所述控制命令结构,生成通知语音,并播放所述通知语音。
  8. 根据权利要求7所述的方法,其中,所述终端将所述解析文本中匹配到的所述第二关键词替换为对应的所述第一关键词后还包括:
    所述终端记录所述第二关键词和对应的所述第一关键词的替换记录;
    所述终端将所述控制命令结构中所述第一关键词文本替换为所对应的所述第二关键词文本包括:
    所述终端根据所述替换记录将所述控制命令结构中所述第一关键词文本替换为所对应的所述第二关键词文本。
  9. 根据权利要求1至8中任一项所述的方法,其中,所述方法还包括:
    在所述终端将所述用户输入的语音发送给第一服务器进行语义解析和逻辑解析时,所述终端将所述第一关键词文本和第二关键词文本的对应关系发送给所述第一服务器。
  10. 一种实现语音控制的终端,其中,所述终端包括:
    至少一个处理器;
    至少一个存储器,所述至少一个存储器包括若干指令;
    所述处理器执行所述若干指令使所述终端至少执行如下步骤:
    记录第一关键词文本和第二关键词文本的对应关系;
    当用户输入语音时:
    将所述用户输入的语音发送给第一服务器进行语义解析和逻辑解析;
    在所述第一服务器返回解析失败时,获取所述第一服务器返回的解析文本,根据所述对应关系将所述解析文本中的所述第二关键词替换为所述第一关键词后发送给所述第一服务器;
    接收所述第一服务器逻辑解析成功后所返回的控制命令结构,并根据所述控制命令结构执行功能,播放通知语音。
  11. 根据权利要求10所述的终端,其中,在记录第一关键词文本和第二关键词文本的对应关系步骤中,所述处理器执行所述若干指令使所述终端至少执行如下步骤:
    当用户将第一关键词文本修改为第二关键词文本的操作,记录所述第一关键词文本和所述第二关键词文本的对应关系。
  12. 根据权利要求11所述的终端,其中,在记录第一关键词文本和第二关键词文本的对应关系的步骤中,所述处理器执行所述若干指令使所述终端至少执行如下步骤:
    从第二服务器获取并记录第一关键词文本和第二关键词文本的对应关系。
  13. 根据权利要求10至12中任一项所述的终端,其中,在保存所述第一关键词文本和所述第二关键词文本的对应关系的步骤中,所述处理器执行所述若干指令使所述终端至少执行如下步骤:
    将所述第一关键词文本和所述第二关键词文本及其对应关系记录在词汇列表中。
  14. 根据权利要求13所述的终端,在将所述第一关键词文本和所述第二关键词文本及其对应关系记录在词汇列表中的步骤中,所述处理器执行所述若干指令使所述终端至少执行如下步骤:
    根据所述第一关键词文本不同的类型,将所述第一关键词文本和所述第二关键词文本及其对应关系记录在不同的词汇列表中。
  15. 根据权利要求10至14中任一项所述的终端,其中,在所述第一服务器返回解析失败时,获取所述第一服务器返回的解析文本,根据所述对应关系将所述解析文本中的所述第二关键词替换为所述第一关键词后发送给所述第一服务器的步骤中,所述处理器执行所述若干指令使所述终端至少执行如下步骤:
    将所述词汇列表中的所述第二关键词文本与所述解析文本进行匹配;
    将所述解析文本中匹配到的所述第二关键词替换为对应的所述第一关键词后发送给第一服务器。
  16. 根据权利要求10至15中任一项所述的终端,其中,根据所述控制命令结构执行功能,并播放通知语音的步骤中,所述处理器执行所述若干指令使所述终端至少执行如下步骤:
    将所述控制命令结构中所述第一关键词文本替换为所对应的所述第二关键词文本;
    根据替换后的所述控制命令结构,生成可执行的控制命令,并执行所述控制命令;
    根据替换后的所述控制命令结构,生成通知语音,并播放所述通知语音。
  17. 根据权利要求16所述的终端,其中,在将所述解析文本中匹配到的所述第二关键词替换为对应的所述第一关键词的步骤后,所述处理器执行所述若干指令使所述终端至少执行如下步骤:
    记录所述第二关键词和对应的所述第一关键词的替换记录;
    将所述控制命令结构中所述第一关键词文本替换为所对应的所述第二关键词文本包括:
    根据所述替换记录将所述控制命令结构中所述第一关键词文本替换为所对应的所述第二关键词文本。
  18. 根据权利要求10至17中任一项所述的终端,其中,所述处理器还执行所述若干指令使所述终端至少执行如下步骤:
    在将所述用户输入的语音发送给第一服务器进行语义解析和逻辑解析时,将所述第一关键词文本和第二关键词文本的对应关系发送给所述第一服务器。
  19. 一种实现语音控制的终端,其中,包括记录单元,第一发送单元、替换单元和执行单元;
    所述记录单元用于记录第一关键词文本和第二关键词文本的对应关系;
    当用户输入语音时:
    所述第一发送单元用于将所述用户输入的语音发送给第一服务器进行语义解析和逻辑解析;
    所述替换单元用于在所述第一服务器返回解析失败时,获取所述第一服务器返回的解析文本,根据所述对应关系将所述解析文本中的所述第二关键词替换为所述第一关键词后发送给所述第一服务器;
    所述执行单元用于接收所述第一服务器逻辑解析成功后所返回的控制命令结构,并根据所述控制命令结构执行功能,播放通知语音。
  20. 根据权利要求19所述的终端,其中,所述记录单元包括:
    第一记录子单元,用于当用户将第一关键词文本修改为第二关键词文本的操作,记录所述第一关键词文本和所述第二关键词文本的对应关系。
  21. 根据权利要求19所述的终端,其中,所述记录单元包括:
    第二记录子单元,用于从第二服务器获取并记录第一关键词文本和第二关键词文本的对应关系。
  22. 根据权利要求19至21中任一项所述的终端,其中,所述记录单元还包括:
    第三记录子单元,用于将所述第一关键词文本和所述第二关键词文本及其对应关系记录在词汇列表中。
  23. 根据权利要求22所述的终端,所述第三记录子单元用于:
    根据所述第一关键词文本不同的类型,将所述第一关键词文本和所述第二关键词文本及其对应关系记录在不同的词汇列表中。
  24. 根据权利要求22或23所述的终端,其中,所述替换单元包括:
    匹配子单元,用于将所述词汇列表中的所述第二关键词文本与所述解析文本进行匹配;
    替换子单元,用于将所述解析文本中匹配到的所述第二关键词替换为对应的所述第一关键词后发送给第一服务器。
  25. 根据权利要求19至24中任一项所述的终端,其中,所述执行单元包括:
    再替换子单元,用于将所述控制命令结构中所述第一关键词文本替换为所对应的所述第二关键词文本;
    执行子单元,用于根据替换后的所述控制命令结构,生成可执行的控制命令,并执行所述控制命令;
    语音生成子单元,用于根据替换后的所述控制命令结构,生成通知语音,并播放所述通知语音。
  26. 根据权利要求25所述的终端,其中,所述替换单元还包括:
    替换记录子单元,用于在所述解析文本中匹配到的所述第二关键词替换为对应的所述第一关键词后,记录所述第二关键词和对应的所述第一关键词的替换记录;
    所述再替换子单元用于根据所述替换记录将所述控制命令结构中所述第一关键词文本替换为所对应的所述第二关键词文本。
  27. 根据权利要求19至26中任一项所述的终端,其中,还包括:
    第二发送单元,用于将所述用户输入的语音发送给第一服务器进行语义解析和逻辑解析时,将所述第一关键词文本和第二关键词文本的对应关系发送给所述第一服务器。
  28. 一种计算机可读存储介质,包括指令,当其在计算机上运行时,使得计算机执行如权利要求1至9中任意一项所述的方法。
  29. 一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机执行如权利要求1至9中任一项所述的方法。



